Re: Initial Draft --Cascaded Speech Style Sheets

Raman T. V. (raman@mv.us.adobe.com)
Wed, 14 Feb 1996 09:34:26 -0800


Mary Holstege writes:
>
> I think the draft is very interesting and has a lot of good ideas in it.
> And yet...
>
I'm a little confused reading your comments below; perhaps you could
elaborate and clear up my confusion:

> I just can't escape the feeling that attributes such as 'pitch-range' and
> 'richness' look a lot like attributes such as 'serif' or 'dpi'. That is:

> they more properly belong in a font (voice) definition than in a style
> sheet
I don't understand the point that they "don't belong" in the stylesheet.

Setting pitch-range etc. achieves the same effect aurally as switching from
bold to italic does visually.
Hence, it *belongs* in the stylesheet so that a designer can specify a
pitch-range of 0 to render <pre>...</pre> in a monotone voice (this is one of
the examples I have working).
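
As a rough sketch of that <pre> example: the rule below assumes the property
name and value syntax later used for aural properties in CSS2, which may not
match the draft under discussion exactly.

```css
/* Sketch only: syntax assumed from the CSS2 aural properties;
   the draft discussed here may spell this differently. */
pre {
  pitch-range: 0;   /* no pitch variation: a flat, monotone voice */
}
```

The voice itself is left untouched here; only one property of it changes,
much as a visual rule can switch to italic without naming a new font family.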

Resorting to pushing all of these into the "voice-family" would result in a
huge number of voices and total disorganization. Note my comments in the spec
about speech output devices and auditory displays still being in a nascent
state as compared to the world of visual displays.

> specification. In setting styles for rendering a document visually, we
> pick "Times" or "Helvetica" or "Gothic" etc. because we know that the font
> family has certain affectual characteristics that match our needs. Similarly,
> one should be able to select the vocal equivalent of "Times" or "Gothic" for
> the same purpose in the same fashion.
I don't quite understand why you say the above in the context of your earlier
comment.
Given that the visual stylesheet (either at the user end or the remote end)
can set font properties, I think the speech stylesheet should provide
equivalent functionality and flexibility.

>
> More radically, given that typographical features such as bold, italic, and
> font size were invented precisely to render certain auditory features in
> a visual medium, surely the reverse is true? Can we not organize voices
> in a manner analogous to fonts, indexed by a few basic attributes such
> as volume, pitch, and stress rather than trying to make every possible
> variation of speech available at the style sheet level.
The "speech properties" exposed in the spec are no more complex than the
various visual properties exposed at the stylesheet level, and speaking from
the experience of having implemented two large speech output systems, I can
state that they are necessary.

>I suspect this
> would make the style sheet too cumbersome to use (both from an implementor's
> and an author's standpoint).
Not true, since it's already working with W3 to an extent and the code is
*not* complex.
The cascaded speech stylesheet that I am currently using is about a third the
size of the visual stylesheet used by W3 --this is because I only need to set
a few things in the speech domain.

>
> <out-there-radical-notion>
> Indeed --- is it possible to use the *same* style sheet for voice and treat it
> as a font mapping problem? Line spacing and hard line breaks are pauses
> (map points to suitable time units), flush left is send-to-left-channel, left
> margin is...

Such a mapping as you describe above would be one possible stylesheet that
allows the listener to get an accurate view of the visual rendering.
Note however, that such a view may be desirable only in the case where you're
trying to build a mental picture of the visual rendering.
As far as producing pleasant speech is concerned, line breaks and page
breaks are irrelevant --line breaks are a consequence of flowing the text into a
specific container --in this case the screen-- and have little to do with
pauses on the speech side. In fact, doing what you suggest above *all* the
time would result in choppy speech with poor intonational and prosodic
structure.

> </out-there-radical-notion>
>
Radical --true-- but restrictive.
Keeping the speech and visual style sheets separate has a number of advantages,
as pointed out in the introductory section of the specification.
Moreover, a user desirous of producing an auditory rendering that is a true
reflection of the visual layout could use a "speak-verbatim.css"
that does such an explicit mapping from visual to speech properties.
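
Such a stylesheet might look roughly like the sketch below. Everything in it
is hypothetical: the selectors, the class name and the durations are made up
for illustration, and the property names are assumed from the CSS2 aural
properties rather than taken from the draft.

```css
/* Hypothetical speak-verbatim.css: echo the visual layout in speech.
   All values are illustrative, not recommendations. */
br { pause-after: 250ms; }            /* hard line break becomes a short pause */
p  { pause-before: 500ms;
     pause-after: 500ms; }            /* paragraph spacing becomes longer pauses */
.flush-left { azimuth: left; }        /* hypothetical class: left-aligned text
                                         is spoken from the listener's left */
```

Used all the time, rules like these would give the choppy speech described
above, which is why they belong in an optional stylesheet rather than in the
default one.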

> Eh. Probably not.
>
> Still, is what you've done invent a set of *style* sheet attributes or a
> set of *rendering* attributes? This is the difference between, say, a
> word processor style definition and a line drawing specification in that
> same word processor.
>
I think the speech properties defined are rightfully style sheet
properties. Note that some of the confusion you feel may be a result of the
temporal organization of speech and audio.
>
> -- Mary
> Holstege@kset.com
Thanks for your comments and ideas; I certainly appreciate them even though I
disagree with most of them :-)

>
>
> Mary Holstege, PhD
> Manager, Online Engineering
> KnowledgeSet Corporation
> 555 Ellis Street Tel: (415) 254-5452
> Mountain View, CA 94043 FAX: (415) 254-5451
>
>
>

-- 

Best Regards,
____________________________________________________________________________
--raman

Adobe Systems                   Tel: 1 (415) 962 3945 (B-1 115)
Advanced Technology Group       Fax: 1 (415) 962 6063
1585 Charleston Road            Email: raman@adobe.com
Mountain View, CA 94039 -7900          raman@cs.cornell.edu
http://www-atg/People/Raman.html (Internal To Adobe)
http://www.cs.cornell.edu/Info/People/raman/raman.html (Cornell)

Disclaimer: The opinions expressed are my own and in no way should be taken
as representative of my employer, Adobe Systems Inc.
____________________________________________________________________________