Re: Byte ranges -- formal spec proposal

Ronald E. Daniel (rdaniel@acl.lanl.gov)
Wed, 17 May 1995 20:58:31 +0500


Thus spoke Ari Luotonen <luotonen@netscape.com> (at least on Wed, 17 May 1995)

> _________________________________________________________________
>
> BYTE RANGES WITH URLS AND HTTP

We have been putting off the problem of fragment identifiers, and this
is a good start on the problem. I have a few reflex objections about
details - such as preferring 0-based addressing to 1-based - but they
are very minor. My major objection is that I would like to see
byterange addressing as one component in a more general fragment
identification architecture. The "Miscellaneous" section, quoted below,
mentions the possibility of combining different addressing schemes, but
does not provide any specification. I would be a LOT happier if we
could have an overall scheme that byterange, paragraph, row/col, word,
stanza, and other addressing schemes could fit into.

For example, I might want queries such as:

Get the value of the <title> element in an HTML file
http://host/path;generic-id="title"

Get bytes 1-5 of the second paragraph of a file
http://host/path;para=2&byterange=1-5

Get a portion of a JPEG
http://host/path.jpg;rows=37-99&cols=53-200
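To make the examples concrete, here is a rough sketch of how a server might pull those parameters out of a URL. It assumes the parameters follow the first ';' in the path and are separated by '&' (as in the quoted proposal), and that surrounding double quotes on a value can be dropped. The function name is made up for illustration. It keeps the pairs in order, because the order may turn out to matter once we decide what combining them means.

```python
from urllib.parse import urlsplit


def parse_fragment_params(url):
    """Split a URL path such as /path;para=2&byterange=1-5 into its base
    path and an ordered list of (name, value) pairs.

    Assumption: the parameters follow the first ';' in the path and are
    separated by '&'; surrounding double quotes on a value are dropped.
    """
    path = urlsplit(url).path
    base, sep, params = path.partition(";")
    if not sep:
        return base, []
    pairs = []
    for item in params.split("&"):
        name, _, value = item.partition("=")
        pairs.append((name, value.strip('"')))
    return base, pairs


print(parse_fragment_params("http://host/path;para=2&byterange=1-5"))
# ('/path', [('para', '2'), ('byterange', '1-5')])
```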

When we start looking at the addressing needs of a variety of
specifiers (rows/cols, paragraphs, ...) then we may find that
we would prefer different choices of index base, inclusion or
exclusion of the elements at the extremes of the range, etc.
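Whichever conventions we pick, they only need to be translated at one point. A minimal sketch, with a hypothetical helper, that maps a range written in any of these conventions onto a Python slice (always 0-based, end excluded):

```python
def range_to_slice(first, last, base=1, inclusive=True):
    """Translate a range written under a given convention into a Python
    slice (which is always 0-based and excludes its end).

    base=1, inclusive=True matches the proposal's byterange=1-5;
    base=0 and/or inclusive=False are the alternatives mentioned above.
    """
    if first < base or last < first:
        raise ValueError(f"malformed range {first}-{last}")
    start = first - base
    stop = last - base + (1 if inclusive else 0)
    return slice(start, stop)


data = b"HelloWorld"
assert data[range_to_slice(1, 5)] == b"Hello"                           # 1-based, inclusive
assert data[range_to_slice(0, 4, base=0)] == b"Hello"                   # 0-based, inclusive
assert data[range_to_slice(0, 5, base=0, inclusive=False)] == b"Hello"  # 0-based, exclusive
```

The point is that the conventions differ only by an offset of one at either end, so picking one is mostly a matter of taste. The real cost is having several schemes each pick differently.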

> Miscellaneous
>
> There are other kinds of ranges that can be addressed in a similar
> fashion; this document does not define them, but both the URL
> parameter and the Range: header are defined so that it is possible to
> extend them. This byte range specification applies to any
> content-type. There may be range schemes that are meaningful to only
> certain types of documents.
>
> As an example, there might be a linerange URL parameter, with the same
> kind of range specification, and the Range: header would then specify
> the numbers in lines. Example:
>
> http://host/dir/foo;linerange=21-30
>
> The response from a 123 line file would be:
>
> Range: lines 21-30/123
>
> This could be useful for such things as structured text files like
> address lists or digests of mail and news, but isn't meaningful to
> such document types as GIF or PDF.
>
> Other examples might be document format specific ranges, such as
> chapters:
>
> http://host/dir/foo;chapterrange=1-3
>
> Range: chapters 1-3/12
>
> Or just the first chapter:
>
> http://host/dir/foo;chapterrange=1
>
> Range: chapters 1/12
>
> MULTIPLE URL PARAMETERS
>
> If at some point there will be multiple simultaneous URL parameters,
> they should be separated by the ampersand character (just like
> multiple values are encoded in the FORM request).
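As I read the quoted linerange proposal, a server would do roughly the following. This is only a sketch assuming 1-based, inclusive line numbers. The function names are hypothetical, and there is no bounds checking.

```python
def format_range_header(unit, total, first, last=None):
    """Build a Range: response header in the quoted proposal's form,
    e.g. 'Range: lines 21-30/123' or 'Range: chapters 1/12'."""
    spec = str(first) if last is None else f"{first}-{last}"
    return f"Range: {unit} {spec}/{total}"


def select_lines(text, first, last=None):
    """Return (header, body) for a 1-based, inclusive linerange.
    No bounds checking is done here."""
    lines = text.splitlines(keepends=True)
    stop = first if last is None else last
    body = "".join(lines[first - 1:stop])
    return format_range_header("lines", len(lines), first, last), body


doc = "".join(f"line {i}\n" for i in range(1, 124))   # a 123-line file
header, body = select_lines(doc, 21, 30)
print(header)   # Range: lines 21-30/123
```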

We need to define more than just the syntax of how multiple
parameters will be separated. We need to define the semantics
of foo=n1-n2&bar=n3-n4. Does the "bar" parameter apply to the
result of the "foo" parameter? Vice versa? Or do we return the
two selections separately, the way you specify with foo=n1-n2,n3-n4?
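To show that the choice changes the answer, here are both readings applied to plain 1-based, inclusive byte ranges. The function names are hypothetical, and neither reading is defined by the proposal.

```python
def select_sequentially(data, ranges):
    """Each range is applied to the result of the previous one."""
    for first, last in ranges:          # 1-based, inclusive
        data = data[first - 1:last]
    return data


def select_separately(data, ranges):
    """Each range is applied to the original; the parts come back
    separately, as with foo=n1-n2,n3-n4."""
    return [data[first - 1:last] for first, last in ranges]


data = b"abcdefghij"
ranges = [(3, 8), (2, 3)]
print(select_sequentially(data, ranges))   # b'de'
print(select_separately(data, ranges))     # [b'cdefgh', b'bc']
```

The sequential reading is the one my para=2&byterange=1-5 example needs. Under that reading, swapping the two parameters gives a different result.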

How are errors to be handled when we specify a range that is
longer than the file? What about when the starting offset of
the range is greater than the length of the file?
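One possible answer, offered only as an assumption and not taken from the proposal: clamp an end that runs past the end of the file, and reject a range whose start lies past it. A sketch of that policy for 1-based, inclusive byteranges (all names hypothetical):

```python
class RangeNotSatisfiable(ValueError):
    """Hypothetical error: the range starts beyond the end of the file."""


def resolve_byterange(first, last, length):
    """Resolve a 1-based, inclusive byterange against a file of `length`
    bytes under one possible (assumed) policy:
      - a malformed range is rejected;
      - a start past the end is rejected;
      - an end past the end is clamped to the last byte."""
    if first < 1 or last < first:
        raise ValueError(f"malformed range {first}-{last}")
    if first > length:
        raise RangeNotSatisfiable(f"range starts at {first}, file has {length} bytes")
    return first, min(last, length)


print(resolve_byterange(90, 200, 100))   # (90, 100)
```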

Byteranges are pretty nice since they are broadly applicable,
but I am not sure what it means to ask for a byterange of
a database. This problem is even more acute when we get into
parameters such as "paragraph", "row/col", "stanza", etc.
How are we to indicate when a parameter is inappropriate for
a URL, such as paragraph for an image? Usually row/col
will be inappropriate for HTML files, but if we have previously
selected a table then it is the natural way to get a table
element. How do we do that?
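One way to frame both questions is to record, for each scheme, which kinds of object it accepts and what kind of object its selection yields. An inappropriate parameter is then simply one that does not accept the current kind. The sketch below assumes the sequential reading (each parameter applies to the result of the previous one). The scheme names, the "table" selector, and the kinds are all made up for illustration.

```python
# Hypothetical table: for each scheme, which kinds of object it accepts
# and what kind of object its selection yields.
SCHEMES = {
    "byterange": {"text/html": "bytes", "text/plain": "bytes", "image/jpeg": "bytes"},
    "para":      {"text/html": "text/html", "text/plain": "text/plain"},
    "table":     {"text/html": "table"},
    "rows":      {"image/jpeg": "image/jpeg", "table": "table"},
    "cols":      {"image/jpeg": "image/jpeg", "table": "table"},
}


class InappropriateParameter(ValueError):
    pass


def check_chain(content_type, names):
    """Walk the parameters in order, tracking the kind of object each
    selection produces; reject a parameter that does not apply to it."""
    kind = content_type
    for name in names:
        outputs = SCHEMES.get(name, {})
        if kind not in outputs:
            raise InappropriateParameter(f"{name!r} does not apply to {kind!r}")
        kind = outputs[kind]
    return kind


print(check_chain("text/html", ["para", "byterange"]))     # bytes
print(check_chain("text/html", ["table", "rows", "cols"]))  # table
```

Under this framing, rows applied directly to an HTML file is rejected, but the same parameter is accepted once a table has been selected. Starting with only byterange registered and adding entries later would not disturb existing URLs.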

If we do not develop a uniform architecture for fragment
identification, we are going to have a slew of partial solutions before
we wise up and develop a uniform treatment. Then everyone will be
pissed because of differing addressing conventions, code bloat, etc.,
and a total inability to make the uniform scheme match the previous
partial solutions.

My understanding is that HyTime can handle this uniform
fragment identification. Can people knowledgeable about HyTime
talk about the good *and bad* points of using HyTime addressing
for URI fragment identification? Is there a way we can
start small, with just byterange selection, then grow our
capabilities?

Ron Daniel Jr. email: rdaniel@acl.lanl.gov
Advanced Computing Lab voice: (505) 665-0597
MS B-287 TA-3 Bldg. 2011 fax: (505) 665-4939
Los Alamos National Lab http://www.acl.lanl.gov/~rdaniel/
Los Alamos, NM, 87545 tautology: "Conformity is very popular"