Parsing HTML: Easiest way?

Bowden Wise (wiseb@cs.rpi.edu)
Tue, 17 Oct 1995 13:53:07 -0400


Hello HTML colleagues:

I am doing some research in multimodal interfaces and want to develop
a multimodal Web browser. I want to use sound/speech as well as
visual graphics to present HTML to users.

Since this is a demonstration app for my PhD, I do not need to develop
a full-featured browser.

What I would like to do is parse an HTML file into some structure that
I can use in my app to base my presentation on (either auditory or
visual) so that both presentations are driven by the same high-level
information about the HTML file.
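
As an illustration of the idea above, here is a minimal sketch of one
parse tree feeding two presentations. It is written in Python with the
standard html.parser module purely for brevity; it is not one of the
tools under consideration and would not run on 16-bit Windows. The
names Node, TreeBuilder, parse_html, render_visual and render_auditory
are hypothetical, and the sketch assumes well-nested markup and handles
only headings and paragraphs.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag, so they must not stay open.
VOID_TAGS = {"br", "hr", "img"}


class Node:
    """One element of the tree; text is stored as plain strings in children."""

    def __init__(self, tag, attrs=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a Node tree from well-nested HTML (no real error recovery)."""

    def __init__(self):
        super().__init__()
        self.root = Node("document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open element, if there is one.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if text:
            self.stack[-1].children.append(text)


def parse_html(source):
    builder = TreeBuilder()
    builder.feed(source)
    builder.close()
    return builder.root


def text_of(node):
    """Concatenate all text below a node, ignoring inline markup."""
    parts = [c if isinstance(c, str) else text_of(c) for c in node.children]
    return " ".join(p for p in parts if p)


def render_visual(node, out=None):
    """Visual presentation: underlined headings, paragraphs as plain lines."""
    out = [] if out is None else out
    for child in node.children:
        if isinstance(child, str):
            continue  # stray text outside h1/h2/p is skipped in this sketch
        if child.tag in ("h1", "h2"):
            title = text_of(child)
            out.append(title)
            out.append(("=" if child.tag == "h1" else "-") * len(title))
        elif child.tag == "p":
            out.append(text_of(child))
        else:
            render_visual(child, out)
    return out


def render_auditory(node, out=None):
    """Auditory presentation: (cue, text) pairs a speech engine could consume."""
    out = [] if out is None else out
    for child in node.children:
        if isinstance(child, str):
            continue
        if child.tag in ("h1", "h2"):
            out.append(("heading level " + child.tag[1], text_of(child)))
        elif child.tag == "p":
            out.append(("paragraph", text_of(child)))
        else:
            render_auditory(child, out)
    return out
```

Both renderers walk the same tree, so a change to the parser or to the
tree shape shows up in both presentations at once; only the mapping
from element to output differs.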

I do not have an existing Web browser to build on, so my question is:
what is the best way to parse HTML for my purposes? I am using a
Windows 3.x platform (16-bit).

Some ideas I have considered:

- using sgmls
- using the W3C Reference Library
- using the W3C Line Mode Browser as a base
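
For the first option in the list above, a rough sketch of what
consuming parser output might look like. It assumes sgmls is run as a
separate step and writes its line-oriented output, in which (as far as
I understand the format) a line starting with "(" opens an element, ")"
closes it, "-" carries character data, and "A" lines give the
attributes of the element that follows. The function names read_esis
and unescape are hypothetical, the sketch is in Python for brevity, and
only these four line types are handled.

```python
def unescape(data):
    """Decode backslash escapes in a data line.

    Only \\n (record end) and \\\\ (backslash) are decoded correctly;
    other escapes are passed through crudely in this sketch.
    """
    out, i = [], 0
    while i < len(data):
        if data[i] == "\\" and i + 1 < len(data):
            nxt = data[i + 1]
            out.append("\n" if nxt == "n" else nxt)
            i += 2
        else:
            out.append(data[i])
            i += 1
    return "".join(out)


def read_esis(lines):
    """Turn sgmls-style output lines into nested (tag, attrs, children) tuples."""
    root = ("#document", {}, [])
    stack = [root]
    pending = {}  # attributes arrive before the element they belong to
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        code, rest = line[0], line[1:]
        if code == "A":
            name, _, remainder = rest.partition(" ")
            kind, _, value = remainder.partition(" ")
            if kind != "IMPLIED":  # IMPLIED means no value was given
                pending[name] = value
        elif code == "(":
            node = (rest, pending, [])
            pending = {}
            stack[-1][2].append(node)
            stack.append(node)
        elif code == ")":
            stack.pop()
        elif code == "-":
            stack[-1][2].append(unescape(rest))
        # All other line types are ignored in this sketch.
    return root
```

The attraction of this route is that the hard part, applying the DTD
and inferring omitted tags, stays inside sgmls, and the application
only has to rebuild a tree from a simple event stream.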

Are there any other mechanisms I might use? I haven't much experience
with coding browsers/parsers for HTML, so I welcome your insights
into this dilemma.

Also, I have not subscribed to www-html, so please reply via e-mail.

Many thanks in advance.
Bowden

--------------------------------------------------------------------
G. Bowden Wise
Computer Science Dept, Rensselaer Polytechnic Inst, Troy, NY 12180
Email: wiseb@cs.rpi.edu WWW: http://www.cs.rpi.edu/~wiseb/