Re: ISO charsets; Unicode

HALLAM-BAKER Phillip (hallam@dxal18.cern.ch)
Thu, 29 Sep 1994 11:35:36 +0100

Next message: Nathaniel Borenstein: "Re: Languages (was Re: Forms support in clients)"
Previous message: Marc VanHeyningen: "Re: Forms support in clients"
Maybe in reply to: Richard L. Goerwitz: "ISO charsets; Unicode"
Next in thread: Richard L. Goerwitz: "Re: ISO charsets; Unicode"

In article <897C@cernvm.cern.ch> you write:

|>I agree with this 100%. The problem that remains to be solved is
|>whether we need a directionality switch. It's been claimed that
|>MIME's default "visual" encoding standard just doesn't fit into
|>HTML and SGML's philosophy, because it anticipates too many aspects
|>of the presentation - the way clients will choose to display things.
|>But can we do without it?
|>
|>It would be theoretically cleaner, for sure, if we all just bit
|>the bullet and did things "right", i.e., used normal byte order
|>for all languages, relying on the clients and servers to negotiate
|>how direction changes will be handled.

Since clients will have to cope with `proper' byte order there is no advantage
in also permitting a reversed scheme to make it `easier'. All that would do
is add in an extra case to cope with.

MIME is a bit different, or rather text/plain is different. Here a `treat
all characters as going left to right' mode is easy to do. But the idea
of block shuffling HTML to work out the logical order of the text gives me
the screaming heebie-jeebies, literally.

<p>
this is some text &aleph;&gimel; ordered aleph gimel <lang=hebrew>&gimel;
&aleph; desrever</lang> as was the last bit followed by the word "reversed".
</p>

The problem with the reverse ordering is that it falls apart where there
are multiple words and a line break intervenes. In general the rule should be
to permit arbitrary characters in arbitrary contexts (so I can refer to the
aleph collaboration inside English text), but proper text ordering requires
a <lang> environment.
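
To make the line break problem concrete, here is a minimal sketch in Python.
It is only an illustration: the names wrap_words and reading_order, the
labels h1..h3 and the width of 5 columns are all made up for the example,
and each word is treated as an opaque fixed-width label.

```python
def wrap_words(words, width):
    """Greedy line filling: start a new line when the next word will not fit."""
    lines, current, used = [], [], 0
    for word in words:
        needed = len(word) if not current else used + 1 + len(word)
        if current and needed > width:
            lines.append(current)
            current, needed = [], len(word)
        current.append(word)
        used = needed
    if current:
        lines.append(current)
    return lines


def reading_order(visual_lines):
    """Order in which a right-to-left reader meets the words, line by line."""
    return [word for line in visual_lines for word in reversed(line)]


# Three right-to-left words in the order they are meant to be read.
logical = ["h1", "h2", "h3"]

# Reversed scheme: the text is stored pre-reversed, then wrapped as is.
stored_reversed = wrap_words(list(reversed(logical)), width=5)

# Proper order: wrap the logical text first, then reverse within each line.
wrapped_logical = [list(reversed(line)) for line in wrap_words(logical, width=5)]

print(reading_order(stored_reversed))   # ['h2', 'h3', 'h1'], scrambled
print(reading_order(wrapped_logical))   # ['h1', 'h2', 'h3'], as intended
```

With the pre-reversed text the last word of the phrase ends up on the first
line, which is exactly the failure described above.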

The rules for formatting are quite easy to work out:

1) The margins of the paragraph are set by the language environment that
the paragraph started in, i.e. range left, range right, center, justify.

This also sets the main scan order, the starting point for the
typesetting after a new line.

2) Typeset each word, working out whether it will fit; if there is not enough
space, start a new line.

3) If the scan order changes then the space remaining on the line is
calculated and typesetting continues as normal, except that we do
not finalise placing until either the line ends or we have
another reversal of the scan order.

When this happens we fill in the offsets on the buffered
segments so that the end of the text in the previous scan
order adjoins the previous text.

So if we have an imaginary text where e is left-to-right scanning and h is
right-to-left :-

eeeeeeee_1 eeeeeeeee_2 eeeeeee_3 hhhhhh_2 hhhhhhh_1 eee_1 hhhhh_1
hhhhh_3 hhhh_2 eeeeee_1 eee_2

Because we started left to right the second block of hhhh continues at
the left margin.
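
The three rules can be sketched in a few lines of Python. This is a rough
illustration under stated assumptions, not a definitive implementation: the
names layout and visual_order are invented here, words are opaque labels in
a fixed-width font, and the "buffering" of rule 3 is done by reversing each
run of opposite-direction words once the line's contents are known.

```python
from itertools import groupby


def visual_order(line, base):
    """Place one line's words left to right on the page.

    `line` is a list of (text, direction) pairs in reading order.
    Runs whose direction differs from the paragraph's base direction
    are held back and placed in reverse (rule 3).
    """
    placed = []
    for direction, run in groupby(line, key=lambda word: word[1]):
        texts = [text for text, _ in run]
        if direction != base:
            texts.reverse()
        placed.extend(texts)
    if base == "rtl":
        # A right-to-left paragraph starts each line at the right margin.
        placed.reverse()
    return placed


def layout(words, width, base="ltr"):
    """Fill lines greedily in reading order (rule 2), then place each line."""
    lines, current, used = [], [], 0
    for text, direction in words:
        needed = len(text) if not current else used + 1 + len(text)
        if current and needed > width:
            lines.append(current)
            current, needed = [], len(text)
        current.append((text, direction))
        used = needed
    if current:
        lines.append(current)
    return [visual_order(line, base) for line in lines]


# The imaginary text above, in reading order; 70 columns is an assumed width.
text = [
    ("eeeeeeee_1", "ltr"), ("eeeeeeeee_2", "ltr"), ("eeeeeee_3", "ltr"),
    ("hhhhhhh_1", "rtl"), ("hhhhhh_2", "rtl"),
    ("eee_1", "ltr"),
    ("hhhhh_1", "rtl"), ("hhhh_2", "rtl"), ("hhhhh_3", "rtl"),
    ("eeeeee_1", "ltr"), ("eee_2", "ltr"),
]
for line in layout(text, width=70):
    print(" ".join(line))
```

Run on the imaginary text with a 70 column line, this should reproduce the
two lines shown above, including the h block that continues at the left
margin of the second line.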

-- Phillip M. Hallam-Baker

Not Speaking for anyone else.
