Re: ISO charsets; Unicode

Stavros Macrakis (macrakis@osf.org)
Tue, 27 Sep 1994 15:08:08 -0400


Dave Raggett says:

The LANG attribute is essential for handling text which reads right
to left rather than left to right....

Actually, all that is needed is a unique identification of each
presentation character as right-to-left or left-to-right. If a viewer
encounters the logical sequence of letters Arabic-J Arabic-M Arabic-L,
is presenting them using Arabic script, and has an Arabic font to
present them in, it should display the glyphs (in left-to-right screen
order):

Final-Arabic-L Medial-Arabic-M Initial-Arabic-J

Note that this requires two kinds of information: first, that Arabic
uses distinct glyphs for letters depending on adjacent letters, and
second, that Arabic script is written right-to-left.
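To make those two kinds of information concrete, here is a toy sketch in Python. It is not taken from the Unicode documents: the letters are hypothetical symbolic names rather than real code points, and every letter is assumed to join on both sides (real Arabic has non-joining letters and more than these four forms, as the PPS notes). Forms are chosen in logical order, and only then is the sequence reversed because the script runs right to left.

```python
from typing import List

# Toy model (assumption): every Arabic letter joins on both sides.


def positional_form(index: int, length: int) -> str:
    """Pick a contextual form from the letter's position in a joined word."""
    if length == 1:
        return "Isolated"
    if index == 0:
        return "Initial"
    if index == length - 1:
        return "Final"
    return "Medial"


def shape_arabic_word(letters: List[str]) -> List[str]:
    """Return glyph names in left-to-right screen order for a right-to-left word."""
    # Information 1: the glyph depends on the neighbouring letters.
    glyphs = [f"{positional_form(i, len(letters))}-Arabic-{name}"
              for i, name in enumerate(letters)]
    # Information 2: the script is written right to left, so reverse for display.
    return list(reversed(glyphs))


print(" ".join(shape_arabic_word(["J", "M", "L"])))
# Final-Arabic-L Medial-Arabic-M Initial-Arabic-J
```

Note that the input stays in logical order (J, M, L); nothing about the stored text depends on how it will be drawn.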

If however the viewer cannot display Arabic script, or if the user
prefers Latin script (perhaps s/he doesn't even read Arabic script,
but is consulting the etymology of a word that comes from Arabic in a
dictionary), it may well choose to present it in transliteration as
"jml" in that order. You can see from this example that it doesn't
matter what the _language_ is, it doesn't even matter what the
_letters_ are; what matters is the presentation _script_ used to
represent the letters. Of course, it is critical that Arabic-J be
encoded differently from Latin-J.
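A small Python sketch of the point that the script, not the language, drives presentation. The Unicode code points and names are real; the transliteration table (covering only the three letters of the example), the present() function, and its can_display_arabic flag are hypothetical stand-ins for a viewer's capability check. The stored text is in logical order in both cases; only the presentation changes.

```python
import unicodedata

# Hypothetical transliteration table covering only the letters in the example.
ARABIC_TO_LATIN = {
    "\u062C": "j",  # ARABIC LETTER JEEM
    "\u0645": "m",  # ARABIC LETTER MEEM
    "\u0644": "l",  # ARABIC LETTER LAM
}


def present(text: str, can_display_arabic: bool) -> str:
    """Return text for display: native script if possible, else transliterated."""
    if can_display_arabic:
        return text  # a shaping, bidi-aware renderer takes over from here
    # Transliterate letter by letter, keeping logical order.
    return "".join(ARABIC_TO_LATIN.get(ch, ch) for ch in text)


word = "\u062C\u0645\u0644"  # logical order: jeem, meem, lam
print(present(word, can_display_arabic=False))  # jml

# Arabic-J and Latin-J are distinct characters with distinct names.
print(unicodedata.name("\u062C"), "vs", unicodedata.name("j"))
# ARABIC LETTER JEEM vs LATIN SMALL LETTER J
```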

The algorithm for displaying mixed left-to-right and right-to-left
glyphs is pretty straightforward and is presented in the Unicode
documents. There is NO EXCUSE for storing text in presentation (visual)
order rather than logical order in HTML documents.
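The full Unicode bidirectional algorithm has many more rules, but its core idea can be sketched in a few lines of Python. This greatly simplified version is an assumption-laden illustration, not the Unicode algorithm itself: it assumes a left-to-right paragraph, takes strong directions from Python's unicodedata.bidirectional(), treats everything else (including digits) as neutral, resolves each run of neutrals to its neighbours' direction when both agree (otherwise to the paragraph direction), and reverses each right-to-left run. It ignores numbers, explicit embeddings, and mirrored characters, and it reorders characters only; choosing contextual glyphs is a separate step.

```python
import unicodedata
from typing import List, Optional

PARAGRAPH_DIRECTION = "L"  # assumption: a left-to-right paragraph


def strong_direction(ch: str) -> Optional[str]:
    """Return "L" or "R" for strongly directional characters, else None."""
    bidi = unicodedata.bidirectional(ch)
    if bidi in ("R", "AL"):  # Hebrew-type and Arabic-type letters
        return "R"
    if bidi == "L":
        return "L"
    return None  # spaces, punctuation, digits: neutral in this sketch


def resolve_directions(text: str) -> List[str]:
    """Give every character a direction, resolving neutrals from their neighbours."""
    dirs = [strong_direction(ch) for ch in text]
    resolved: List[str] = []
    i, n = 0, len(text)
    while i < n:
        d = dirs[i]
        if d is not None:
            resolved.append(d)
            i += 1
            continue
        j = i
        while j < n and dirs[j] is None:  # find the end of the neutral run
            j += 1
        before = resolved[-1] if resolved else PARAGRAPH_DIRECTION
        after = dirs[j] if j < n else PARAGRAPH_DIRECTION
        fill = before if before == after else PARAGRAPH_DIRECTION
        resolved.extend([fill] * (j - i))
        i = j
    return resolved


def visual_order(text: str) -> str:
    """Reorder logical-order text into left-to-right display order."""
    resolved = resolve_directions(text)
    out: List[str] = []
    i = 0
    while i < len(text):
        j = i
        while j < len(text) and resolved[j] == resolved[i]:
            j += 1
        run = text[i:j]
        out.append(run[::-1] if resolved[i] == "R" else run)  # reverse RTL runs
        i = j
    return "".join(out)


# Logical order: "x", space, Arabic J M, space, Arabic L M, space, "y"
logical = "x \u062C\u0645 \u0644\u0645 y"
print(ascii(visual_order(logical)))
```

For the mixed line above, the two Arabic words come out in right-to-left order, each with its letters reversed, while the Latin text on either side keeps its place. The document itself only ever stores the logical order.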

-s

PS This is not to say that a language attribute is a bad idea, but it
is orthogonal to the script issue.

PPS Presenting Arabic nicely actually involves more than just Initial,
Medial, and Final forms, but that is peripheral to the issues being
discussed here.