Re: Draft: Universal Document Identifiers

Tim Berners-Lee (timbl)
Wed, 4 Mar 92 10:42:32 GMT+0100


> Date: Thu, 27 Feb 92 19:45:42 -0500
> From: jcurran@nnsc.nsf.net

> Even if the exact scheme is not used, the requirement
> discussion contained in the paper is quite valuable.
> I have a few comments:
>

>] Terms
>]
>] The objects on the network which are to be named include
>] objects which can be retrieved, and objects which can be searched.

> Using this definition, one would infer that document identifiers
> would allow reference to a distinct file, a particular mail
> message, news article, etc. I would not anticipate that a document
> identifier would be used to identify a newsgroup, interactive
> service, archive directory, or a wais source. Are we trying to
> define a universal id or a universal document id? Might it be
> better to defer the definition of non-document resources and then
> come back and make the document specific id's be a subset of a
> future general resource identifier?

You are right that the UDIs were intended to be able to refer to any
of those things. (In the W3 world, they all look pretty similar
anyway -- they are all represented as [hyper]text objects.) It is
largely in order to be able to make references to any of those things
that we need a UDI rather than a WAIS-DI and a W3-DI and a news-DI
etc etc. A UDI allows references between systems, and expandability
for the future. My answer would be that we are trying to define a
universal document id, but where "document" has the very wide
interpretation as any data which can be retrieved, viewed or
searched: anything to which you might want to make a reference.
For example, a person is not a document (although to have a document
on the net representing each person might be useful... their
signature/disclaimer with links to their published works, etc etc.)
If we can't cope with the objects which are on the net now, how can
we hope to cope with the weird things to come... video clips from the
news last night, etc.?

] Relevance
]

] The life of a name is limited by any information contained within it which
] may become prematurely invalid. It is therefore necessary to limit the
] contents of a name to the information required for the operations above.
] Other extraneous information about the document (its size, data format,
] authorization details, etc) may in general change with time and should
] not be part of the name.

> The proposed document identifiers have many characteristics which
> may change with time: storage location, access protocol, format,
> etc. If we focus instead on the "information content" of a
> document, then it might be possible to form identifiers that are
> more robust. Many people consider:
>
> file://info.cern.ch./pub/www/doc/udi1.ps and
> file://info.cern.ch./pub/www/doc/udi1.txt
>
> to be the same document; just in different formats.

Precisely. We look forward to the day when a name like

x500:/CH/CERN/CN/TBL/TechNote-15

will be put through a name server which will return a set of
addresses. In the meantime, we don't have that ubiquitous name
server (directory) facility. So we have to make do with physical
addresses. And different versions of the same document look like
different documents. It's a shame. The plan is that UDIs can migrate
from physical addresses to registered names.

> It would be nice to be able to recognize this
> and allow the user (and user interface) to determine which
> instance should be used for retrieval.

Yes. Absolutely. (The neatest way is for the client to send a set of
preferences over with the request, and for the server to decide which
format to send. This is a suggestion for an evolved wais and/or
http protocol.) Another way is for the client to ask a name server
for addresses, and retrieve the headers of each one to find out which
representation he'd prefer -- but I'd prefer all the representations
of the document to have the same name right down to the retrieval
protocol level.
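
The first approach (client sends preferences, server picks the format)
could be sketched roughly as follows. This is only an illustration: the
function name choose_format and the short format labels are hypothetical
and are not part of any existing wais or http protocol.

```python
# Hypothetical sketch: the server picks one representation of a document
# from the client's ranked list of preferred formats.

def choose_format(preferences, available):
    """Return the first preferred format the server actually holds, or None."""
    for fmt in preferences:
        if fmt in available:
            return fmt
    return None


# Illustrative table: the two representations of udi1 quoted in this thread.
available = {
    "ps": "/pub/www/doc/udi1.ps",
    "txt": "/pub/www/doc/udi1.txt",
}

# The client would rather have dvi, then plain text, then PostScript.
chosen = choose_format(["dvi", "txt", "ps"], available)
```

Here the client names the document once, and the choice of
representation stays on the server side of the request.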

> This recognition may only
> be performed if the document id's (now being used as document content
> ids) contain only location and format independent data. It is easy
> to imagine that uniqueness could be assured by combining
> an organization, author, and title:
>
> cern.ch:www-staff:udi1
>
> ietf:osids:archdirectory-00

There are two functions: One, to find out whether two documents are
the same. Two, to derive a (set of) addresses for retrieval of the
document. To be able to do the first, any unique id (like OSF/DCE
UUIDs or RFCxxxx message ids) will work. To be able to do the second,
a directory service is needed.
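
A minimal sketch of the two functions, assuming a directory service that
can be modelled as a simple lookup table. The table contents, and the
names DIRECTORY, same_document and resolve, are hypothetical; no such
service exists yet.

```python
# Hypothetical in-memory table standing in for a directory (name server).
# The pairing of this name with these addresses is illustrative only.
DIRECTORY = {
    "cern.ch:www-staff:udi1": [
        "file://info.cern.ch./pub/www/doc/udi1.ps",
        "file://info.cern.ch./pub/www/doc/udi1.txt",
    ],
}


def same_document(name_a, name_b):
    """Function one: identity needs only a comparison of unique ids."""
    return name_a == name_b


def resolve(name, directory=DIRECTORY):
    """Function two: derive a (possibly empty) list of retrieval addresses."""
    return list(directory.get(name, []))
```

The first function works on the names alone; only the second needs the
directory to be reachable.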

> Note that the actual location of the information might be far
> removed from the point of creation, and the format might be
> changed:
>
>cern.ch:www-staff:udi1;file://ftp.uu.net/doc/univeral-docids.PS.Z
>cern.ch:www-staff:udi1;news:<1992Feb21.121919.1@quake.think.com>
>cern.ch:www-staff:udi1;wais://nnsc.nsf.net/info-retrieval-notes?udi1

I see the usefulness of quoting both the unique identifier and the
physical address. I hope that in the future, though, one will only
need the first part "cern.ch:www-staff:udi1". That, fed into the
directory service, will produce a list of addresses.
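
As a rough illustration of the combined form quoted above, a reference
could be split at the first ";" into the unique name and an optional
physical address. The function name split_reference is hypothetical, and
the ";" separator is taken from the examples in this thread rather than
from any agreed syntax.

```python
def split_reference(ref):
    """Split 'name;address' at the first ';'. Address is None if absent."""
    name, sep, address = ref.partition(";")
    return name, (address if sep else None)


name, address = split_reference(
    "cern.ch:www-staff:udi1;file://ftp.uu.net/doc/univeral-docids.PS.Z"
)
# name is the part to hand to a directory service;
# address is the physical location where the copy was found.
```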

You can, of course, still quote both: "You need document
x500:/cern.ch/www-staff/udi1 which I found on
file://ftp.uu.net/doc/univeral-docids.PS.Z".

I would also suggest that if a document has a unique registered name
then it should certainly contain that name, so that if you find it
some other way, you can refer to it (make links to it) by its official
name.

> That's all
> /John

Good points -- thanks for the input... I think more needs to go in
about registered unique names in the document.

Tim BL