RE: WWW support for Cyrillic (and UNICODE)

Vladimir Sukonnik, Process Software Corp (sukonnik@elnath.process.com)
Wed, 2 Nov 94 13:12:00 PST


>Several reasons, in my estimation:
>
> 1) Unicode increases overhead, being 16-bit rather than 8
> 2) It is not supported by GUIs
> 3) 16-bit characters aren't supported by compilers
> 4) US/European programmers are cultural chauvinists
>
>Of course most of these objections are spurious:
>
> 1) UTF-8 allows for backwards 8-bit compatibility, adding
> to storage requirements only for characters outside the
> one-byte range
> 2) (valid objection)
> 3) Wide characters should be supported by ANSI-conformant
> compilers and libraries
> 4) US and European programmers aren't stupid; they are just
> not terribly aware of the situation in countries like
> Russia, India, Japan, etc. Give them gentle nudges, and
> they *will* respond....
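Item 1's point about UTF-8 is easy to check with a small sketch. Below is a minimal, hand-written UTF-8 encoder in Python, compared against Python's built-in codec. The names `utf8_encode` and `utf8_string` are hypothetical and for illustration only. The sketch follows the current definition of UTF-8, which stops at U+10FFFF and so differs in detail from the 1994-era form of the encoding.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (illustrative sketch)."""
    if 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
        raise ValueError(f"not a valid Unicode scalar value: U+{cp:04X}")
    if cp < 0x80:
        # One byte: 0xxxxxxx, identical to plain 7-bit ASCII.
        return bytes([cp])
    if cp < 0x800:
        # Two bytes: 110xxxxx 10xxxxxx (covers Cyrillic, Greek, Hebrew, Arabic).
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # Three bytes: 1110xxxx 10xxxxxx 10xxxxxx (rest of the 16-bit range).
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx.
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def utf8_string(text: str) -> bytes:
    """Encode a whole string by concatenating per-code-point encodings."""
    return b"".join(utf8_encode(ord(ch)) for ch in text)


for sample in ["Hello", "Привет", "日本語"]:
    encoded = utf8_string(sample)
    # The hand-written encoder should agree with Python's own codec.
    assert encoded == sample.encode("utf-8")
    print(f"{sample!r}: {len(sample)} chars, {len(encoded)} bytes UTF-8, "
          f"{len(sample.encode('utf-16-le'))} bytes UTF-16")
```

Running it shows ASCII staying at one byte per character, Cyrillic taking two bytes per letter (the same as UTF-16), and Japanese taking three bytes per character (one more than UTF-16).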

>Adding multi-language capability to the Web is going to take time,
>because it will require changes to HTML, to servers, and to clients.
>Given the lack of multilingual support in most GUIs, a lot of new
>widgets will have to be created, and people will have to stretch
>themselves to learn how things like Japanese and Arabic scripts
>work. It's going to take time, but it appears we'll get there.

>Right here several things have been hashed out. For example, we
>all pretty much seem to agree that LANG and CHARSET or CODEPAGE
>attributes will be needed for HTML (with some sensible defaults).
>We've also come to the realization that logical ordering of data
>is the only way to go. The "visual" ordering that MIME allows
>for embedded Hebrew or Arabic just won't work in the long run.
>So we have to bite the bullet and make sure that clients can
>do the visual reordering themselves for mixed right-to-left and
>left-to-right text. (There is, by the way, a terribly explained
>algorithm for doing this in Appendix A of volume 1 of the old
>published Unicode standard; I can supply people with a more
>practical tutorial if anyone wants it.)
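To make "clients do the visual reordering themselves" concrete, here is a deliberately naive Python sketch. It is not the Unicode bidirectional algorithm mentioned above. It assumes an overall left-to-right paragraph, treats only code points in the Hebrew and Arabic blocks (U+0590 to U+06FF) as right-to-left, lets a space join a right-to-left run only when more right-to-left text follows, and ignores digits, mirrored punctuation, and explicit embeddings. The function names `is_rtl` and `visual_order` are hypothetical.

```python
def is_rtl(ch: str) -> bool:
    # Assumption: only the Hebrew (U+0590-U+05FF) and Arabic (U+0600-U+06FF)
    # blocks count as right-to-left; everything else is treated as left-to-right.
    return "\u0590" <= ch <= "\u06ff"


def visual_order(logical: str) -> str:
    """Naively turn a logically ordered, left-to-right paragraph into
    display order by reversing each run of right-to-left characters."""
    out = []
    i, n = 0, len(logical)
    while i < n:
        if not is_rtl(logical[i]):
            out.append(logical[i])
            i += 1
            continue
        # Extend the run; spaces stay inside it only if RTL text follows them.
        end = j = i + 1
        while j < n and (is_rtl(logical[j]) or logical[j] == " "):
            if is_rtl(logical[j]):
                end = j + 1
            j += 1
        out.append(logical[i:end][::-1])  # reverse the run for display
        i = end
    return "".join(out)


# Logical order: "abc", Hebrew alef-bet, a space, gimel, then "xyz".
print(visual_order("abc \u05d0\u05d1 \u05d2 xyz"))
```

Keeping the stored text in logical order and reordering only at display time is what leaves searching, sorting, and editing simple. A real implementation (today specified as Unicode Standard Annex #9) also resolves numbers, neutral punctuation, and nested embeddings, which this sketch does not attempt.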

>Things *are* happening. Be patient, and offer to help. Inject
>comments where you feel they will be appropriate. Cut some code
>if you know how; otherwise, do some research on scripts and
>standards in various countries and help guide the process. Above all,
>though, don't complain. Help us out!

>Richard Goerwitz

Richard,

Thanks for your reply. I agree, in general, with your assessment
of the state of the art on this issue. I just want to point out a few
things. First, the 32-bit Microsoft Windows GUI supports Unicode, and
so does MS Visual C++. I know that this does not solve the problem
for non-Windows platforms. Second, EMWAC supports Unicode in
its release of an HTTP server. I believe (please correct me if I am
wrong) that the NCSA Mosaic browser supports Unicode as well. The URL of
the Unicode testing facility for browser developers is:

http://emwac.ed.ac.uk/html/internet_toolchest/UNICODE.HTM


Best regards,
Vladimir.

+-------------------------------------------------------------+
| Vladimir Sukonnik           Voice: 1-508-879-6994           |
| Principal Software Engineer http://www.process.com          |
| Process Software Corp       Fax: 1-508-879-0042             |
| 959 Concord Street          E-mail: sukonnik@process.com or |
| Framingham, MA 01760 USA            sukonnik@bumetb.bu.edu  |
+-------------------------------------------------------------+