T.E.O.'s Draft--Cascading Speech Style Sheets (txt)

Raman T. V. (raman@mv.us.adobe.com)
Wed, 28 Feb 1996 08:58:40 -0800


I'm sorry, but this proposal from Europe is a *joke*! (Apologies if I sound
rude; that is not the intent.)

We're talking about a style sheet specification, not a speech synthesizer.
I'm completely bemused by the assertion at the end that states
"not many people can afford expensive devices so we are making a simple one".

JuanJo Miguez writes:
> T.E.O.'s Draft--Cascading Speech Style Sheets
> K.U. Leuven
>
>
> Ing. to be Juan Jose Miguez Iglesias mailto:Juanjo.Miguez@KULeuven.ac.be
> ir. Filip Evenepoel mailto:Filip.Evenepoel@KULeuven.ac.be
> ir. Bart Bauwens mailto:Bart.Bauwens@KULeuven.ac.be
> Prof.dr.ir Jan Engelen mailto:Jan.Engelen@KULeuven.ac.be
> Prof.ing. Antonio S. Pena from the E.T.S.I. Telecomunicación of Vigo (Spain)
>
>
> A SIMPLE DEFINITION
> -------------------
>
> The T.E.O. group at the Katholieke Universiteit Leuven in Belgium
> believes that the best way to include speech in CSS is to keep it
> simple and general, so that it is easy to use. We agree with T. V. Raman's
> initial draft:
>
> (http://www.eit.com/msgid/199602130050.QAA10031@labrador.mv.us.adobe.com)
>
> that it is very interesting to include speech in CSS, but we don't want
> to make it very complicated. Many people don't even know what decibels
> are, most current speech synthesizers are mono, and it is easier to give
> values to some features as abstract numbers, which are then mapped to
> the real values of each synthesizer. You can also see this page in HTML
> with your browser at:
>
> http://www.esat.kuleuven.ac.be/~juanjo/csss1.html
>
> We have defined the set of properties for Cascading Speech Style Sheets
> following the format of the CSS1 working draft:
>
> Speech
> ------
> Volume
> Value: | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
> Initial: 0
> Applies to: All elements
> Example: volume: 5
>
> The default value is 0 because normally there is no sound; when any
> other value is specified, the speech synthesizer starts working.
> Each speech synthesizer uses its own range of volume values (and of
> every other property), so these abstract values will be mapped onto
> the real values used by the synthesizer.
>
> We think this is easier than Raman's approach, where users writing
> their own style sheets would need to know what decibels are. In
> fact, very few people know this (engineers, physicists and so on).
> To keep it easy, we let people choose from a small set of values
> (0 to 10) that experts will map to the real values of each
> synthesizer.
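
The draft leaves the mapping itself to the experts configuring each synthesizer. A minimal sketch of one possible mapping, assuming a hypothetical synthesizer whose native volume scale runs from 0 to 100 and a simple linear scale (neither is specified in the draft; `map_volume` and the range constants are hypothetical names):

```python
# Hypothetical native volume range of one synthesizer; real devices differ.
SYNTH_VOLUME_MIN = 0
SYNTH_VOLUME_MAX = 100


def map_volume(level: int, lo: int = SYNTH_VOLUME_MIN, hi: int = SYNTH_VOLUME_MAX) -> int:
    """Map an abstract CSSS volume level (0-10) onto a synthesizer's range.

    Level 0, the initial value, means silence: the synthesizer is not used.
    The linear scaling is an assumption, not part of the draft.
    """
    if not isinstance(level, int) or not 0 <= level <= 10:
        raise ValueError(f"volume must be an integer from 0 to 10, got {level!r}")
    # Scale 0..10 linearly onto lo..hi and round to the device's integer steps.
    return lo + round((hi - lo) * level / 10)


if __name__ == "__main__":
    for level in (0, 5, 10):
        print(f"volume: {level} -> device volume {map_volume(level)}")
```

On this hypothetical 0 to 100 scale, "volume: 5" becomes 50. Linear scaling is only the simplest choice; since perceived loudness is roughly logarithmic, a real mapping might well be nonlinear.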
>
> Speed
> Value: | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
> Initial: UA specific
> Applies to: All elements
> Example: speed: 6
>
> Some users (especially blind people) prefer very fast speech,
> because they have very good hearing and can go through web pages
> very quickly that way. That is why we chose such a wide range.
> Of course "speed: 0" is not allowed, because nothing would be
> heard.
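
A similar sketch for speed, assuming a hypothetical synthesizer whose speaking rate can be set from 100 to 400 words per minute. These endpoints, the linear scale and the name `map_speed` are illustrative assumptions, not part of the draft; the sketch does enforce the draft's rule that "speed: 0" is rejected:

```python
# Hypothetical speaking-rate range (words per minute) of one synthesizer.
SYNTH_WPM_SLOWEST = 100
SYNTH_WPM_FASTEST = 400


def map_speed(level: int) -> int:
    """Map an abstract CSSS speed level (1-10) to words per minute.

    "speed: 0" is rejected, as in the draft, because nothing would be heard.
    Linear scaling between the two hypothetical endpoints is an assumption.
    """
    if not isinstance(level, int) or not 1 <= level <= 10:
        raise ValueError(f"speed must be an integer from 1 to 10, got {level!r}")
    span = SYNTH_WPM_FASTEST - SYNTH_WPM_SLOWEST
    # Level 1 maps to the slowest rate, level 10 to the fastest.
    return SYNTH_WPM_SLOWEST + round(span * (level - 1) / 9)
```

With these assumed endpoints, `map_speed(4)` gives 200 words per minute, and the top of the range is reserved for the very fast listeners the draft describes.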
>
> Voice-type
> Value: | child1 | child2 | male1 | male2 | female1 | female2 |
> Initial: UA specific
> Applies to: All elements
> Example: voice-type: female1
>
> This property sets the physical characteristics of the speaking
> voice. For example, the voices of a boy, a woman and a man all
> sound different; this property selects among such voices.
>
> Pitch
> Value: | 1 | 2 | 3 | 4 | 5 | 6 |
> Initial: UA specific
> Applies to: All elements
> Example: pitch: 4
>
> This is a small range of values for the average fundamental frequency
> (F0). The same person (the same voice type) can speak with a lower or
> higher average pitch, which makes it sound like a different voice.
> If we combine "pitch" and "voice-type", for example:
>
> if voice-type=child1, F0=1 (low voice)  --> real average F0: 150 Hz
> if voice-type=child1, F0=6 (high voice) --> real average F0: 350 Hz
> if voice-type=male2,  F0=1 (low voice)  --> real average F0:  50 Hz
> if voice-type=male2,  F0=6 (high voice) --> real average F0: 150 Hz
>
> All these voices sound different. We get a wide range of different
> voices because the pitch level is mapped to different real
> frequencies depending on the voice-type. That is why 6 possible
> pitch values are enough for a simple definition with 36 different
> voices (6 voice types times 6 pitch values).
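
The pitch mapping can be sketched as a per-voice-type table. This sketch fills in only the two F0 ranges the draft gives (child1 and male2) and assumes that intermediate pitch values are linearly interpolated between the pitch 1 and pitch 6 endpoints; the draft gives only the endpoints, and `average_f0` is a hypothetical name:

```python
from itertools import product

VOICE_TYPES = ("child1", "child2", "male1", "male2", "female1", "female2")
PITCH_LEVELS = range(1, 7)  # pitch: 1 .. 6

# Average-F0 endpoints in Hz for pitch 1 and pitch 6. Only these two voice
# types have ranges in the draft; a real table would cover all six.
F0_RANGE_HZ = {
    "child1": (150.0, 350.0),
    "male2": (50.0, 150.0),
}


def average_f0(voice_type: str, pitch: int) -> float:
    """Return the average F0 in Hz for a voice-type/pitch pair.

    Linear interpolation between the endpoints is an assumption.
    """
    if pitch not in PITCH_LEVELS:
        raise ValueError(f"pitch must be an integer from 1 to 6, got {pitch!r}")
    if voice_type not in F0_RANGE_HZ:
        raise KeyError(f"no F0 range defined for voice-type {voice_type!r}")
    low, high = F0_RANGE_HZ[voice_type]
    # Pitch 1 -> low endpoint, pitch 6 -> high endpoint, evenly spaced between.
    return low + (high - low) * (pitch - 1) / 5


# Every voice-type/pitch combination: 6 x 6 = 36 distinct voices.
ALL_VOICES = list(product(VOICE_TYPES, PITCH_LEVELS))
```

Under these assumptions, "pitch: 3" gives 90.0 Hz for male2 but 230.0 Hz for child1, which is why the same relative number yields clearly different voices.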
>
> When users write their own CSSS, they can try any of the available
> values and it will work, because the values are mapped to real,
> typical values. With Raman's specification someone could try an
> average-pitch of 5 Hz, which would sound bad. We prefer to let
> people choose a relative number rather than an exact, and perhaps
> wrong, average pitch.
>
> Prosidy
> Value: | on | off |
> Initial: on
> Applies to: All elements
> Example: prosidy: off
>
> With prosody on, the synthesizer generates the intonation (the
> variation of F0 over time), which can make speech sound hard, soft,
> angry, questioning, and so on. With "prosidy: off" the result sounds
> like a robot's voice (blind people prefer this kind of voice, as
> well as very fast speech).
>
> Language
> Value: defined in the ISO 639 (Codes for the representation of
> the names of languages)
> Initial: en
> Applies to: All elements
> Example: language: fr
>
> Any language can be specified, because the same message is
> pronounced differently in each language (e.g. fr, nl, es, en, ...).
> For example, the Apollo II (a multilingual speech synthesizer)
> supports 7 languages (Russian, English, French, Spanish, ...). The
> default value is English because it is the most widely used language
> on the Web. Although many languages are not supported, and perhaps
> never will be, it is better to include all of them than only a
> small subset.
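
The draft does not say what a user agent should do when a style sheet asks for a language the synthesizer cannot speak. One plausible policy, sketched here purely as an assumption, is to fall back to the initial value `en`. The supported set below is a hypothetical synthesizer offering only the four languages named for the Apollo II, and `resolve_language` is a hypothetical name:

```python
from typing import AbstractSet

DEFAULT_LANGUAGE = "en"  # the draft's initial value for "language"


def resolve_language(requested: str, supported: AbstractSet[str]) -> str:
    """Pick the language a synthesizer should use for an element.

    Falling back to the default when the requested ISO 639 code is not
    supported is an assumption; the draft does not specify this case.
    """
    code = requested.strip().lower()
    if code in supported:
        return code
    if DEFAULT_LANGUAGE in supported:
        return DEFAULT_LANGUAGE
    raise ValueError(
        f"synthesizer supports neither {code!r} nor the default {DEFAULT_LANGUAGE!r}"
    )


# Hypothetical synthesizer: only the four Apollo II languages the draft names.
APOLLO_SUBSET = frozenset({"ru", "en", "fr", "es"})
```

Here "language: fr" is honoured, while "language: nl" would be read in English; a stricter user agent might instead skip the element or warn the user.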
>
> We aim for understandable speech, but we think it is difficult to
> make a speech synthesizer speak all the dialects of all the world's
> countries, as Raman suggests in his draft. It may be possible, but
> not many people could afford it. We are simply trying to make things
> easy for the end user, with the devices most widely used today, so
> that this can work soon: many people (blind or visually impaired
> people) need it very much, as soon as possible.
>
> This is a DRAFT. We have discussed it among ourselves, and now it is
> your turn to say whether you like it as it is or would like to
> discuss some features. I hope you will tell us what you think
> about it. Thank you!
>
>
>
> Kath. Universiteit Leuven--Dept.Electrotechniek (ESAT), T.E.O.
> mailto:Juanjo.Miguez@KULeuven.ac.be
> ----------------------------------------------------------------

-- 

Best Regards,
--raman
____________________________________________________________________________

Adobe Systems                  Tel: 1 (415) 962 3945   (B-1 115)
Advanced Technology Group      Fax: 1 (415) 962 6063
1585 Charleston Road           Email: raman@adobe.com
Mountain View, CA 94039-7900          raman@cs.cornell.edu
http://www-atg/People/Raman.html (Internal To Adobe)
http://www.cs.cornell.edu/Info/People/raman/raman.html (Cornell)

Disclaimer: The opinions expressed are my own and in no way should be
taken as representative of my employer, Adobe Systems Inc.
____________________________________________________________________________