Re: Who can express URL syntax with BNF

Daniel W. Connolly (connolly@hal.com)
Tue, 26 Apr 1994 12:31:54 -0500


In message <199404261347.AA23770@RA.DEPT.CS.YALE.EDU>, Stan Letovsky writes:
>>
>>$Word = '[^/=;?#]*';
>>
>>$scheme = $1 if s*^([A-Za-z0-9\.-]+):**; # @# syntax of scheme?
>>$hostport = &unescape($1) if s*^//($Word)**;
>>$fragment = &unescape($1) if s*#($Word)$**;
>>$search = &unescape($1) if s*\?($Word)$**;
>>$path = &unescape($_);
>
>Minor question:
>This looks like perl, but I can't quite parse the regexps.
>Is this some variant perl dialect or alternate regexp syntax?

Sorry... it's just perl... Practical Extraction and Reporting Line-noise.
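
As quoted, the substitutions appear to use "*" as the s/// delimiter, which is
most of what makes them hard to read. For anyone who would rather not decode
that, here is a rough Python rendering of the same steps. It is only a sketch:
the names parse_url, _strip and WORD are made up for illustration, and
urllib.parse.unquote stands in for &unescape, whose definition isn't shown
above.

```python
import re
from urllib.parse import unquote

WORD = r"[^/=;?#]*"  # same character class as $Word in the quoted Perl


def _strip(pattern, text):
    """Remove the first match of pattern; return (group 1 or None, remainder)."""
    m = re.search(pattern, text)
    if m is None:
        return None, text
    return m.group(1), text[:m.start()] + text[m.end():]


def parse_url(url):
    # Each step peels one component off the string, in the quoted order.
    scheme, rest = _strip(r"^([A-Za-z0-9.-]+):", url)
    hostport, rest = _strip(r"^//(" + WORD + r")", rest)
    fragment, rest = _strip(r"#(" + WORD + r")$", rest)
    search, rest = _strip(r"\?(" + WORD + r")$", rest)

    def decode(s):
        return None if s is None else unquote(s)

    return {
        "scheme": scheme,
        "hostport": decode(hostport),
        "fragment": decode(fragment),
        "search": decode(search),
        "path": unquote(rest),  # whatever is left over
    }


print(parse_url("http://host.com/database?search#fragment"))
print(parse_url("http://host.com/database#fragment?search"))
```

Because the fragment is stripped from the end of the string before the search
part, this does impose an order: with "?search#fragment" both pieces are
found, while with "#fragment?search" only the search is recognized and
"#fragment" is left in the path.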

>Major question: This reminds me of an issue I stumbled across
>recently, about the possible coexistence of #label and ?query-string
>in the same URL.

Hmmm... from "Universal Resource Identifiers: BNF"
http://info.cern.ch/hypertext/WWW/Addressing/URL/5_URI_BNF.html

the following productions:

    fragmentaddress
        uri [ # fragmentid ]
    uri
        scheme : path [ ? search ]

would suggest
http://host.com/database?search#fragment
is kosher.

The fact that none of the following characters:

    reserved
        = | ; | / | # | ? | : | space

can occur in a fragmentid or search leaves no ambiguity that I can see.

(though it means that http://host.com:3000/ doesn't parse -- the colon
is no good).
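
To make that concrete, here is a small Python sketch that reads the two
productions and the reserved list literally. The function names are
hypothetical, and treating "/" as a separator between path segments is my
assumption about the rest of the BNF, which isn't quoted here.

```python
# The quoted "reserved" production; "space" is taken to mean a literal blank.
RESERVED = set("=;/#?: ")


def split_fragmentaddress(text):
    """fragmentaddress = uri [ # fragmentid ];  uri = scheme : path [ ? search ]"""
    uri, hash_, fragmentid = text.partition("#")
    scheme, colon, rest = uri.partition(":")
    if not colon:
        raise ValueError("missing 'scheme :' prefix")
    path, qmark, search = rest.partition("?")
    return {
        "scheme": scheme,
        "path": path,
        "search": search if qmark else None,
        "fragmentid": fragmentid if hash_ else None,
    }


def reserved_in(component):
    """Return the reserved characters that occur in component, sorted."""
    return sorted(set(component) & RESERVED)


def check(text):
    """Map each offending component to the reserved characters found in it."""
    parts = split_fragmentaddress(text)
    problems = {}
    for name in ("search", "fragmentid"):
        if parts[name] and reserved_in(parts[name]):
            problems[name] = reserved_in(parts[name])
    # Assumption: "/" separates path segments, so test each segment alone.
    for segment in parts["path"].split("/"):
        if reserved_in(segment):
            problems.setdefault("path", []).extend(reserved_in(segment))
    return problems


print(check("http://host.com/database?search#fragment"))
print(check("http://host.com:3000/"))
```

Under those assumptions the first URL comes back with no complaints, and the
second is flagged for the ":" in "host.com:3000".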

We really need a spec that disambiguates cases like this. That's why
I'm building a test suite:
http://www.hal.com/%7Econnolly/url_test/

What's there is pretty old...

> I did some experiments with Mosaic 2.4 that
>suggested it did not recognize both in the same URL (ignored
>the label, I think, although it was ignoring labels in any
>script results when relative URLs were used, so I am not
>positive how it interprets this combination in all contexts.)

Ah yes... the old "see what Mosaic does" test. Hardly satisfying.

>Your regexps do not suggest any exclusion between #label
>and ?query; I can't tell if it imposes an order on them.
>Does anyone know what the official (? is there such a thing?)
>position is on the legality and syntax of combining #label
>and ?query in one URL?

Well... the URI working group completely balked on this sort of
thing... In their game, a URL is just scheme:opaque-string, with the
syntax of the string defined on a per-scheme basis.

In response to that, the WWW team sort of "took their marbles and went
home." Tim's editing a URI standard that has all the WWW mechanisms in it.
See Tim's collected notes:

http://info.cern.ch/hypertext/WWW/Addressing/Addressing.html

Daniel W. Connolly "We believe in the interconnectedness of all things"
Software Engineer, Hal Software Systems, OLIAS project (512) 834-9962 x5010
<connolly@hal.com> http://www.hal.com/%7Econnolly/index.html