Re: iso 8859 or escape sequencies?

Chris Lilley, Computer Graphics Unit
Tue, 12 Apr 1994 12:50:08 GMT

Bert Bos <> writes:

> [original attribution missing]
> |Is there a reason to use the html "escape-sequences" (&oumlaut for |

That should be &ouml; (note the terminating semicolon), by the way. Note
also that the character referred to
has become corrupted to a vertical bar (in my quoted attribution) by the mail
software, a problem that HTTP does not have.

> |etc.) for characters that are also in the iso 8859-1 character-set? Are
> |there browsers that do not support the full iso8859 character set but
> |do support the escape-sequences?
> | -Timo H

> There are several reasons:

>- On many computers Latin-1 is not the default character set, so codes
> above 127 would be mapped incorrectly.

I disagree. It may not be the default character set, but ISOLatin1 is defined as
the character set that HTML uses. It is transferred in 8 bit mode, so it arrives
intact. Browsers which use some other character set in the hope that common
letters will occupy the same code positions are therefore, as I see it, broken.
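
To make the breakage concrete, here is a minimal Python sketch (not taken from
any browser). It assumes CP437, the old IBM PC code page, as the platform's
native character set; that choice and the sample word are illustrative only.

```python
# Byte 0xF6 is "ö" in ISO 8859-1 (Latin-1), the character set HTML uses.
latin1_bytes = "Köln".encode("latin-1")

# Sensible: decode the bytes with the character set the document is defined in.
correct = latin1_bytes.decode("latin-1")

# Naive: pretend the bytes are already in the platform's native code page
# (CP437 is an assumption here). Byte 0xF6 means something else there.
naive = latin1_bytes.decode("cp437")

print(repr(correct))
print(repr(naive))
```

The ASCII letters survive either way, which is exactly why the naive approach
can look as if it works until a code above 127 turns up.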

If a particular platform does not use ISOLatin1 and does not have a font that
uses the ISOLatin1 encoding, it is up to the browser to do something sensible
about it. Naively using a different encoding is not something sensible. Using
code mapping tables, overstrike, and so on is.
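
As a rough illustration of what "code mapping tables" could look like, the
Python sketch below translates Latin-1 bytes into a platform's native character
set and falls back to an approximation when the native set lacks the character.
The function name `to_native`, the `APPROXIMATIONS` table and the default code
page are all hypothetical, chosen for the example.

```python
# Hypothetical fallback table: Latin-1 characters -> plain ASCII approximations.
APPROXIMATIONS = {"ö": "oe", "ü": "ue", "á": "a", "é": "e"}


def to_native(latin1_bytes: bytes, native_charset: str = "cp437") -> bytes:
    """Translate Latin-1 bytes into the platform's native character set."""
    text = latin1_bytes.decode("latin-1")
    out = bytearray()
    for ch in text:
        try:
            # Use the native code position if the character exists there.
            out += ch.encode(native_charset)
        except UnicodeEncodeError:
            # Otherwise approximate it, or mark it as unrepresentable.
            out += APPROXIMATIONS.get(ch, "?").encode("ascii")
    return bytes(out)


print(to_native(b"K\xf6ln"))           # native code page has the character
print(to_native(b"K\xf6ln", "ascii"))  # 7-bit platform: approximation is used
```

The same function works unchanged for an EBCDIC code page such as `cp037`;
only the target table differs, not the logic.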

Bert's statement contains a hidden assumption:

>- On many computers Latin-1 is not the default character set,
[assumption: browsers should/will not do any code translation]
> so codes
> above 127 would be mapped incorrectly.

That assumption is not correct, IMHO.

If browsers on EBCDIC platforms were to display a, b, c etc. incorrectly that
would definitely be considered broken. I submit that just because my native
language happens not to need an a acute or a u umlaut, that is no reason to
consider these characters any less important, or rendering them correctly to be
an optional little detail.

>- Using the SGML entities ensures that the file can be e-mailed (see
> what became of your &oumlaut; above...)

Indeed. So browsers should certainly map these characters to 7 bit clean
versions, or use quoted printable, or base 64 encoding, or whatever when mailing
html files from the browser. That is a separate issue, concerned with reusing
the html file for something other than its original purpose.
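
A small Python sketch of that mailing step, using the standard `quopri` and
`base64` modules. The sample document is made up; the point is only that the
8 bit file can stay as it is and be wrapped in a 7 bit clean encoding at the
moment it is mailed.

```python
import base64
import quopri

# An HTML fragment as served: Latin-1 with a raw 8 bit character in it.
html_8bit = "<p>Köln</p>".encode("latin-1")

# Two 7 bit clean representations suitable for a mail transport.
qp = quopri.encodestring(html_8bit)
b64 = base64.b64encode(html_8bit)

# Both encodings are reversible, so the recipient gets the original bytes back.
restored_from_qp = quopri.decodestring(qp)
restored_from_b64 = base64.b64decode(b64)

print(qp)
print(b64)
```

Quoted printable keeps the text mostly readable in the mail body, while base 64
does not; which one is used is a choice for the mailing software, not the
author of the document.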

I see no reason to insist that people type these things in. For some people, it
is part of their language. They do not normally have to type these things in in
a special way; there are characters on their keyboards with the symbols on, they
press them, and get the correct letter.

Consider, for example, if a s&letterT;an&letterD;ar&letterD;s bo&letterD;y
ou&letterT;side your coun&letterT;ry (which happene&letterD; no&letterT;
&letterT;o use &letterT;he le&letterT;&letterT;ers "t" or "d")
insis&letterT;e&letterD; tha&letterT; you en&letterT;er all &letterT;ex&letterT;
in &letterT;his way

How inconvenient this would be!!

>- Browsers that cannot display the characters, can -- in principle --
> approximate them.

Agree absolutely. Whether the character is transferred as an 8 bit
representation - perfectly valid when the transport is guaranteed to be 8 bit
clean - or as an entity reference (is that the correct term?) is, however,
orthogonal to how the browser chooses to represent or approximate it.
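
The orthogonality can be sketched in Python: both transfer forms reduce to the
same character before any display decision is made. This uses the standard
`html.unescape` function and the standard entity name `&ouml;`; the `render`
function and its `can_display` parameter are hypothetical stand-ins for
whatever a browser does at display time.

```python
import html

# The same word, transferred two ways.
from_8bit = b"K\xf6ln".decode("latin-1")   # raw 8 bit Latin-1 byte
from_entity = html.unescape("K&ouml;ln")   # SGML entity reference


def render(text: str, can_display: bool) -> str:
    """Hypothetical display step: show the character or approximate it."""
    if can_display:
        return text
    return text.replace("ö", "oe")


# The display step never needs to know how the character arrived.
print(render(from_8bit, can_display=False))
print(render(from_entity, can_display=False))
```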

Chris Lilley
| Technical Author, ITTI Computer Graphics and Visualisation Training Project |
| Computer Graphics Unit,        |  Internet:            |
| Manchester Computing Centre,   |     Janet:            |
| Oxford Road,                   |     Voice: +44 61 275 6045                 |
| Manchester, UK.  M13 9PL       |       Fax: +44 61 275 6040                 |
| <A HREF="">click here</A> |