Image types and related issues [was: Re: filetype extensions]

Chris Lilley, Computer Graphics Unit (lilley@v5.cgu.mcc.ac.uk)
Tue, 10 May 1994 19:26:51 GMT


"Daniel W. Connolly" <connolly@hal.com> said:

> "Assume, for the sake of argument, that this caching server implements
> 100% invertible translation between gif and tiff."

I am willing to pretend that you said

"Assume, for the sake of argument, that this caching server implements
100% invertible translation between format A and format B."

The particular example you cited (tiff to gif) had enough problems; the reverse
process is guaranteed not to produce the same image. The information loss going
from a 24 bit TIFF to an 8 bit (at best) GIF is in no way reversible.

> It appears that clients should be able to express
> a tolerance (and lack thereof) for information loss in conversion.

Yes, and this then harks back to the earlier discussion about format conversion
and negotiation on the original server.

> Something like:

> Accept: image/gif; t=1.0

> The Accept: header already specifies things like how much it costs the client
> to deal with the given format, and tolerance on how long it is willing to wait
> for a conversion.

It does? You mean, in theory, or do actual clients and servers generate and use
this information?

> We just add one that says "my tolerance for information loss
> is 1.0, i.e. no information loss is tolerable." For help icons and such, you
> would set t=0.9 or so.

OK, but you need to specify what exactly the different levels of quality mean.
1.0 is clear enough ;-) and so is 0.0 - convert it any old way but give me some
sort of image.

The meaning of the intermediate values needs to be defined. How does 0.7 differ
from 0.4, exactly?
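
A minimal Python sketch of how a server might read the proposed t parameter
makes the gap visible. The parameter name t comes from the proposal quoted
above; the function names parse_accept and decide, and the three result
strings, are hypothetical. Only the two endpoints get a defined answer, and
every intermediate value is reported as undefined.

```python
def parse_accept(header):
    """Split an Accept header value into (media_type, params) pairs."""
    entries = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        params = {}
        for p in parts[1:]:
            if "=" in p:
                key, value = p.split("=", 1)
                params[key.strip().lower()] = value.strip()
        entries.append((parts[0].lower(), params))
    return entries


def decide(t, lossless):
    """Decide what to do with a conversion, given the tolerance t.

    Assumption: t == 1.0 means no information loss is tolerable and
    t == 0.0 means any conversion will do, as in the discussion above.
    Nothing in between has an agreed meaning yet.
    """
    if lossless or t == 0.0:
        return "convert"
    if t == 1.0:
        return "refuse"
    return "undefined"


if __name__ == "__main__":
    for media_type, params in parse_accept("image/gif; t=1.0, image/x-iris-rgb; t=0.4"):
        t = float(params["t"])
        print(media_type, t, decide(t, lossless=False))
```

Until somebody says what 0.4 and 0.7 measure, a server has nothing better to
return than the third answer.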

A further point; I assumed 1.0 to mean the exact same file as on the server.
What if, however, you ask the original server for

Accept: image/x-iris-rgb; t=1.0

(Iris RGB is a lossless 24 bit image format BTW) and the server has TIFF
available? It can do a conversion, and it can guarantee (in most cases) that the
RGB value of each pixel is identical. But it is not the same file.
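
The distinction between "same file" and "same pixels" can be sketched in a few
lines of Python. The two encoders below are toy formats invented for
illustration (they are stand-ins, not TIFF or Iris RGB), and all function
names are hypothetical.

```python
import hashlib


def encode_raw(pixels):
    """Toy format 1: a tag followed by three bytes per (r, g, b) pixel."""
    return b"RAW" + bytes(c for px in pixels for c in px)


def decode_raw(data):
    body = data[3:]
    return [tuple(body[i:i + 3]) for i in range(0, len(body), 3)]


def encode_runs(pixels):
    """Toy format 2: a tag followed by (count, r, g, b) run records."""
    out = bytearray(b"RUN")
    i = 0
    while i < len(pixels):
        run = 1
        while i + run < len(pixels) and pixels[i + run] == pixels[i] and run < 255:
            run += 1
        out.append(run)
        out.extend(pixels[i])
        i += run
    return bytes(out)


def decode_runs(data):
    body = data[3:]
    pixels = []
    for i in range(0, len(body), 4):
        pixels.extend([tuple(body[i + 1:i + 4])] * body[i])
    return pixels


def same_file(a, b):
    """Byte-for-byte identity, checked through a digest."""
    return hashlib.sha256(a).digest() == hashlib.sha256(b).digest()


if __name__ == "__main__":
    pixels = [(255, 0, 0)] * 4 + [(0, 0, 255)]
    a, b = encode_raw(pixels), encode_runs(pixels)
    print("same file:  ", same_file(a, b))
    print("same pixels:", decode_raw(a) == decode_runs(b))
```

A definition of t=1.0 has to say which of these two tests it means.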

And another thing - suppose a server is configured to convert the TIFF to an
Iris RGB, maybe cache it for a day, then delete it to prevent wasting disk space
as multiple formats of the same image build up. A month later, I use the same
URL and get the same conversion done. Fine. Now put a proxy in the way; it
happens to be caching last month's Iris RGB file. Suppose I ask it for that
file, and it asks the original server for the last-modified and expires fields
for foo.rgb.

What should the server respond? The values for the original TIFF? A
last-modified of NOW as it is building the file transparently, on the fly, as we
speak?
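
The two candidate answers can be put side by side in a short Python sketch.
The function name and the policy labels "source" and "now" are hypothetical;
neither policy is being put forward as the right one.

```python
from email.utils import formatdate


def last_modified_header(source_mtime, now, policy):
    """Build a Last-Modified line for a variant converted on the fly.

    policy "source": report the timestamp of the original TIFF.
    policy "now":    report the moment the conversion was run.
    Both arguments are seconds since the epoch.
    """
    if policy == "source":
        stamp = source_mtime
    elif policy == "now":
        stamp = now
    else:
        raise ValueError("unknown policy: " + policy)
    return "Last-Modified: " + formatdate(stamp, usegmt=True)


if __name__ == "__main__":
    import time

    tiff_mtime = time.time() - 30 * 24 * 3600   # original saved a month ago
    for policy in ("source", "now"):
        print(policy, "->", last_modified_header(tiff_mtime, time.time(), policy))
```

With "now", the proxy's month-old copy always looks out of date even though
the pixels have not changed; with "source", it looks current for as long as
the TIFF is untouched.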

Or consider the case where the server does not throw away the conversions, but
keeps them around (disk space is cheap at this theoretical site). Now I alter
the original TIFF. Whose responsibility is it to expire the Iris RGB, JPEG, Utah
RLE, PCX, etc etc formats that (perhaps unbeknownst to me) the server has
created? And later I make an even newer version of the image, but choose to save
it as a Utah RLE. What happens to the previous TIFF? (This is just in case you
were thinking of designating one format as a master format on which the others
depend, like some sort of revision control system or makefile.)
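
The makefile idea, and where it stops working, can be sketched as follows. The
Variant record and the stale_variants function are hypothetical names; the
rule used (a derived file is stale when its master is newer) is the usual
make-style timestamp test, assumed here for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variant:
    name: str
    modified: float                     # seconds since the epoch
    derived_from: Optional[str] = None  # name of the master file, if any


def stale_variants(variants):
    """Return names of derived files that are older than their master."""
    by_name = {v.name: v for v in variants}
    stale = []
    for v in variants:
        master = by_name.get(v.derived_from) if v.derived_from else None
        if master is not None and master.modified > v.modified:
            stale.append(v.name)
    return stale


if __name__ == "__main__":
    files = [
        Variant("foo.tiff", 200.0),              # master, edited at t=200
        Variant("foo.rgb", 100.0, "foo.tiff"),   # converted before the edit
        Variant("foo.jpg", 250.0, "foo.tiff"),   # converted after the edit
    ]
    print(stale_variants(files))

    # A newer version saved directly as Utah RLE has no recorded master,
    # so nothing here marks foo.tiff (or its conversions) as out of date.
    files.append(Variant("foo.rle", 300.0))
    print(stale_variants(files))
```

The second call returns the same list as the first, which is the problem: the
timestamp rule only works while everybody agrees which file is the master.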

[ Note for those who care; Iris RGB and Utah RLE (assumed linear encoding) and
TIFF (assumed generic RGB coding) are all 24 bit lossless formats. Within the
assumptions specified, these 3 can be freely and repeatedly interconverted
without information loss, which is why I picked them as examples.]

To sum up, I am saying that there is a complex interaction between a)
transparent format negotiation and conversion, and b) cache coherency issues
arising from proxy caching. These raise a whole host of issues that urgently
need an interim solution and, in the long term, need to be elegantly sorted out
and documented.

The issues do not seem to have been raised before I started messing with them,
but then I don't work in a computer graphics unit for nothing ;-)

I think that part of the problem comes from the general culture of early
internet users. If images are just little gifs of hands, arrows etc that mean
'back', or they are JPEGs of trains, landscapes and naked ladies to stick in
your root window, the image quality considerations go out the window. But the
internet in general and WWW in particular will not remain the preserve of the
'casual browser at a university' for long.

Once you start getting important or even 'mission critical' images floating
around, these issues need to be solved. That time has not yet arrived - but that
time might be next year; let's sort it out while there is time.

By important I mean publishers shipping 48 MByte TIFFs which will eventually be
brought into a page layout system and appear in a glossy magazine. They don't
want the image content of these to be converted or altered. They may however be
happy if the internal (lossless) compression goes from packbits to lzw, as there
is no information loss. They certainly don't want it converted on the fly from a
40 K GIF that happens to have the same name.

By mission critical I mean things like medical images; if I have just been put
in a CAT scanner and a consultant somewhere on the other side of the world is
teleconferencing with the surgeon who is about to perform brain surgery on me, I
want that consultant to see *exactly* the original image!

<Side_issue>
This cultural heritage also shows in the mime types. image/tiff conveys nothing,
really. Look at the TIFF 6.0 spec. How do I specify that I can handle packbits
and lzw encoding, but not JPEG encoding; that I can handle palette and full
colour generic RGB, CMYK, greyscale, and bilevel images but not YCbCr, or CIELAB
and I would like any tiled images to be converted to strips? These are all
"tiff".
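
To show how much a bare image/tiff leaves unsaid, here is a Python sketch of
the extra information a client would need to send. The record types, field
names and value strings are all hypothetical; they are not part of any MIME
or TIFF specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TiffTraits:
    compression: str   # e.g. "packbits", "lzw", "jpeg"
    photometric: str   # e.g. "rgb", "palette", "cmyk", "ycbcr", "cielab"
    layout: str        # "strips" or "tiles"


@dataclass(frozen=True)
class ClientCaps:
    compressions: frozenset
    photometrics: frozenset
    layouts: frozenset


def mismatches(caps, image):
    """List the traits the server would have to change before sending."""
    problems = []
    if image.compression not in caps.compressions:
        problems.append("compression")
    if image.photometric not in caps.photometrics:
        problems.append("photometric")
    if image.layout not in caps.layouts:
        problems.append("layout")
    return problems


# The client described in the paragraph above.
CAPS = ClientCaps(
    compressions=frozenset({"packbits", "lzw"}),
    photometrics=frozenset({"palette", "rgb", "cmyk", "greyscale", "bilevel"}),
    layouts=frozenset({"strips"}),
)

if __name__ == "__main__":
    print(mismatches(CAPS, TiffTraits("lzw", "rgb", "tiles")))
    print(mismatches(CAPS, TiffTraits("jpeg", "ycbcr", "strips")))
```

Both test images would be labelled image/tiff, yet the first needs only a
re-layout and the second cannot be handled without changing its encoding.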

Of course, TIFF is the most complex format being discussed here.
</Side_issue>

> To reiterate: we need to be able to put this info _in_the_link_markup_, since
> it is not only a function of the client's capabilities, but also of the
> author's intent.

I agree absolutely. The tolerance info will vary from image to image so it
cannot be set once-for-all when installing the browser, for example. The
information must be saved in the document as it cannot in general be inferred.

> For example, when I create a link to a help icon, I don't care if a few
> color bits here and there get changed. But if I'm linking to a medical image,
> I certainly do care!

This agrees with my position.

> We just need to be careful! Keep
> all the issues on the table and allow references to express _exactly_
> what they refer to, and how much "slop" they'll tolerate.

Ok, fine. Once the definitions have been firmed up, tested out and standardised,
this seems to me the way to go.

<Side_issue>
One side effect of this is that document creation becomes, again, more complex.
A consequence of greater penetration of the web is that the range of skills in
both providers and consumers of information increases. You get more gurus, and
more dummies, as well as more folk in the middle.

As a consequence, productivity tools and intelligent, quasi-wysiwyg editors are
becoming more and more essential. The markup is becoming more and more complex.
People need to be shielded from it.

I can see an editor where the writer has just inserted a link to an external RLE
image. Up pops a dialog box with directories and files, so the filename gets
spelled right. Up pops another with some fields for "link text" and "brief
description" (for the ALT attribute) and some checkboxes for image importance: just
decoration, keep similar, keep exact (for example). It then goes and puts four
lines of HTML++ gobbledegook into the file to express all this. And talks to the
revision control system to notify it about the new image file. And so on. People
are just not going to do all this by hand.
</Side_issue>

Comments from anyone about any part of this are solicited.

--
Chris Lilley
+-----------------------------------------------------------------------------+
| Technical Author, ITTI Computer Graphics and Visualisation Training Project |
+-----------------------------------------------------------------------------+
| Computer Graphics Unit,        |  Internet: C.C.Lilley@mcc.ac.uk            |
| Manchester Computing Centre,   |     Janet: C.C.Lilley@uk.ac.mcc            |
| Oxford Road,                   |     Voice: +44 61 275 6045                 |
| Manchester, UK.  M13 9PL       |       Fax: +44 61 275 6040                 |
| X400:  /I=c/S=lilley/O=manchester-computing-centre/PRMD=UK.AC/ADMD= /C=GB/  |
|  <A HREF="http://info.mcc.ac.uk/CGU/staff/lilley/lilley.html">my page</A>   | 
+-----------------------------------------------------------------------------+