Put the information into the right words
Much of the on-screen information that needs to be spoken is in the form of text that can be read out as it is. However, a lot of information is also contained in images or is implicit in the visual layout. An example is a programme guide presented as a grid, in which the rows and columns add meaning or context (e.g. channel and time) to the items in the cells (e.g. programme name). Blind users should be given all the information they require without being overwhelmed by unnecessary details or repetition.
Directions and techniques
Speak text content as it would be read, unless it can be improved upon (high priority)
Text on screen should usually be spoken just as it appears. Text that is designed to give the required information will usually be suitable for vision impaired users and there is usually no need to paraphrase or embellish the text in the user interface.
However, in some cases the on-screen text may have been abbreviated in order to save screen space and a fuller version will be clearer. Figure 17 shows an example where the on-screen abbreviation AUDIO DES can be spoken as audio description.
Abbreviations should be announced as they would be read. For example, "EPG" should be read letter-by-letter as ee pee jee rather than eppug, whereas "PIN" should be read as a word, as pin rather than pee eye enn. It is not necessary to expand abbreviations. For example, the instruction Enter your PIN should not be expanded to Enter your Personal Identity Number.
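One way to apply these rules is a small pronunciation table that is consulted before text is handed to the speech engine. The following is a minimal sketch only: the table name SPOKEN_FORMS, the function spoken_text and the spelling "E P G" used to force letter-by-letter reading are all assumptions for illustration, and a real speech engine may need different spellings or its own lexicon format.

```python
import re

# Hypothetical pronunciation table (illustrative entries only).
SPOKEN_FORMS = {
    "AUDIO DES": "audio description",  # shortened on screen to save space
    "EPG": "E P G",                    # read letter by letter
    "PIN": "pin",                      # read as a word, not expanded
}

def spoken_text(on_screen):
    """Return the text to pass to the speech engine for a piece of on-screen text."""
    result = on_screen
    # Replace longer entries first so multi-word entries are not split.
    for written in sorted(SPOKEN_FORMS, key=len, reverse=True):
        pattern = r"\b" + re.escape(written) + r"\b"
        result = re.sub(pattern, SPOKEN_FORMS[written], result)
    return result

print(spoken_text("Enter your PIN"))
print(spoken_text("AUDIO DES"))
```

Text that has no entry in the table is passed through unchanged, which matches the default of speaking text just as it appears.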
In some cases, where the meaning is ambiguous, it may be necessary to embellish. For example, an interface may have two Next buttons, one referring to the next programme and the other to the next page of information. If both are spoken simply as Next, the user may not be able to tell which one is selected, so expanding them in the spoken output to Next programme and Next page can be useful.
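The ambiguity check can be sketched as follows. This assumes that each control carries a short context word alongside its label; that pairing, and the function name spoken_labels, are hypothetical and not taken from any particular interface.

```python
from collections import Counter

def spoken_labels(controls):
    """controls is a list of (label, context) pairs for one screen.

    A label is embellished with its context only when the same label
    appears more than once on the screen.
    """
    counts = Counter(label for label, _ in controls)
    return [
        f"{label} {context}" if counts[label] > 1 else label
        for label, context in controls
    ]

screen = [("Next", "programme"), ("Next", "page"), ("Home", "screen")]
print(spoken_labels(screen))
```

Labels that are already unique are left as they are, so the spoken output stays brief wherever there is no ambiguity.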
Provide spoken equivalents for images (high priority)
If images provide information that is not also provided in the text, that information should be spoken.
Speaking the information in an image is not the same thing as describing the image. What it looks like may be irrelevant. It is what it means that is important. For an interactive element, such as a button, the meaning is what it does. This should be spoken briefly, e.g. Home or Next rather than Home button or Select for Next.
A good example is shown in figure 18, where programme names are followed by coloured icons containing numbers that give additional information. The numbers and colours are only visual representations of the information. It is the information that the icons represent that should be spoken, such as: first run and recorded on tuner number 3.
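A sketch of this idea is a lookup from each icon to the information it represents. The colour names and their meanings below are invented for illustration, since the actual scheme in figure 18 is not reproduced here, and the Icon class and spoken_icons function are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Icon:
    colour: str
    number: Optional[int] = None

# Hypothetical mapping from icon colour to the meaning it represents.
ICON_MEANINGS = {
    "green": "first run",
    "red": "recorded on tuner number {number}",
}

def spoken_icons(icons):
    """Speak what the icons mean, not what they look like."""
    phrases = [
        ICON_MEANINGS[icon.colour].format(number=icon.number)
        for icon in icons
        if icon.colour in ICON_MEANINGS  # purely decorative icons stay silent
    ]
    return " and ".join(phrases)

print(spoken_icons([Icon("green"), Icon("red", 3)]))
```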
Describe information that is implicit in the visual layout (high priority)
Information that is not expressed in individual text or graphical items but which can be inferred by sighted users from the visual layout of a screen should be spoken. For example, a programme guide may provide options to be displayed as either a list view or a grid view. A sighted user will immediately recognise which view is being displayed when they see it, due to the layout. This is important information because the expected information and the interaction methods may be different for each view.
This information should be given in speech by describing the view type on opening. For example, programme guide, list view or programme guide, grid view.
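As a minimal sketch, the view type can be held alongside the screen name and appended to the opening announcement. The GuideView enumeration and the function name are assumptions made for this example.

```python
from enum import Enum

class GuideView(Enum):
    LIST = "list view"
    GRID = "grid view"

def announce_screen_opened(screen_name, view):
    """Build the announcement spoken when a screen first opens."""
    return f"{screen_name}, {view.value}"

print(announce_screen_opened("programme guide", GuideView.GRID))
```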
Describe menus, options and values (high priority)
When a menu first appears, speak all of the following information:
- The name of the menu.
- The number of items in the menu.
- The number of the currently selected item, if there is one.
- The name of the currently selected item.
- The current value of the selected item (e.g. on or off), if a value is shown.
When a new menu item is selected, it is sufficient just to speak its name, number and value. When a new value is selected for an item, speak the item and value.
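The three announcements described above (menu opened, item selected, value changed) could be assembled along the following lines. This is a sketch under assumptions: the Menu and MenuItem classes and the function names are hypothetical, and the comma-separated phrasing is only one possible way of ordering the information.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class MenuItem:
    name: str
    value: Optional[str] = None  # e.g. "on", "off", "stereo"

@dataclass
class Menu:
    name: str
    items: List[MenuItem]
    selected: Optional[int] = None  # zero-based index, None if nothing selected

def _selected_parts(menu):
    """Number, name and (if shown) value of the currently selected item."""
    item = menu.items[menu.selected]
    parts = [str(menu.selected + 1), item.name]
    if item.value is not None:
        parts.append(item.value)
    return parts

def announce_menu_opened(menu):
    count = len(menu.items)
    parts = [f"{menu.name} menu", f"{count} item" + ("" if count == 1 else "s")]
    if menu.selected is not None:
        parts += _selected_parts(menu)
    return ", ".join(parts)

def announce_item_selected(menu):
    return ", ".join(_selected_parts(menu))

def announce_value_changed(item):
    return f"{item.name}, {item.value}"

names = ["picture settings", "sound settings", "c", "d", "e", "f", "g", "h"]
menu = Menu("System setup", [MenuItem(n) for n in names], selected=0)
print(announce_menu_opened(menu))
menu.selected = 1
print(announce_item_selected(menu))
```

Keeping each announcement in its own function makes it straightforward to speak the full description only when the menu first appears and the shorter forms afterwards.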
The following collection of screenshots shows a sequence of steps through a menu, with appropriate speech output for each step.
User action: press System Setup button
System setup menu, 8 items, 1, picture settings
User action: press down arrow
2, sound settings
User action: press Select button
Sound settings selected
Sound settings menu, 6 items, audio output, stereo, use left and right arrow keys to change audio output
User action: press right arrow
Audio output, mono
User action: press down arrow
Volume, 3, use left and right arrow keys to change volume
User action: press right arrow
When the volume of the speech output itself is being increased, each increment (speech volume 5, speech volume 6, etc.) should be spoken at the increased volume level.
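The point here is one of ordering: the new level is applied first and the confirmation is spoken afterwards. The sketch below uses a stand-in engine that simply records what was spoken and at what volume. The class, the function and the maximum level of 10 are assumptions for illustration, not a real speech API.

```python
class FakeSpeechEngine:
    """Stand-in for a speech engine; records (volume, text) pairs."""

    def __init__(self, volume):
        self.volume = volume
        self.spoken = []

    def speak(self, text):
        self.spoken.append((self.volume, text))

def increase_speech_volume(engine, step=1, maximum=10):
    # Apply the new level first so the confirmation is heard at that level.
    engine.volume = min(engine.volume + step, maximum)
    engine.speak(f"speech volume {engine.volume}")

engine = FakeSpeechEngine(volume=4)
increase_speech_volume(engine)
increase_speech_volume(engine)
print(engine.spoken)
```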
Read numbers and dates in a natural way
Numbers should be spoken in a way that makes most sense given the context. This could be one of:
- A natural number, e.g. one thousand nine hundred and eighty four;
- A date, e.g. nineteen eighty four;
- Digit by digit, e.g. one nine eight four.
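The three readings listed above can be sketched as three small functions, with the caller choosing one according to context. The function names are hypothetical, the cardinal reading is limited to 0 to 9999 to keep the example short, and the year reading assumes a four-digit year.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def _below_100(n):
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + (" " + ONES[ones] if ones else "")

def as_cardinal(n):
    """Natural number reading for 0..9999."""
    if n < 100:
        return _below_100(n)
    thousands, rest = divmod(n, 1000)
    hundreds, below = divmod(rest, 100)
    parts = []
    if thousands:
        parts.append(ONES[thousands] + " thousand")
    if hundreds:
        parts.append(ONES[hundreds] + " hundred")
    words = " ".join(parts)
    if below:
        words += " and " + _below_100(below)
    return words

def as_year(n):
    """Year reading for 1000..9999."""
    century, rest = divmod(n, 100)
    if century % 10 == 0 and rest < 10:   # e.g. 2000 to 2009
        return as_cardinal(n)
    if rest == 0:
        return _below_100(century) + " hundred"
    if rest < 10:
        return _below_100(century) + " oh " + ONES[rest]
    return _below_100(century) + " " + _below_100(rest)

def as_digits(text):
    """Digit-by-digit reading, e.g. for a PIN or channel number."""
    return " ".join(ONES[int(d)] for d in text)

print(as_cardinal(1984))
print(as_year(1984))
print(as_digits("1984"))
```

The digit-by-digit function takes a string rather than an integer so that leading zeros are preserved.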
The day can be spoken in a number of ways, again depending on what is likely to be most easily understood. Take into account that the user may be unaware of the surrounding contextual information. Possible ways of speaking a day are:
- The name of the day, e.g. Wednesday;
- The date;
- A relative value, such as yesterday, today or tomorrow.
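One possible policy for choosing between these forms is sketched below: a relative word for adjacent days, the day name within the coming week, and the date otherwise. The thresholds, the function name spoken_day and the date wording are assumptions for illustration and should be settled through user testing.

```python
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday",
            "Saturday", "Sunday"]
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

def spoken_day(target, today):
    """Choose how to speak a day, relative to the current date."""
    delta = (target - today).days
    if delta == -1:
        return "yesterday"
    if delta == 0:
        return "today"
    if delta == 1:
        return "tomorrow"
    if 1 < delta < 7:
        # Within the coming week a day name alone is unambiguous.
        return WEEKDAYS[target.weekday()]
    # Otherwise give the date, so no surrounding context is needed.
    return f"{target.day} {MONTHS[target.month - 1]}"

today = date(2024, 5, 15)
print(spoken_day(date(2024, 5, 16), today))
print(spoken_day(date(2024, 5, 17), today))
print(spoken_day(date(2024, 6, 3), today))
```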
How you could test for this
Test prototype speech output with a number of blind users, using a Wizard-of-Oz methodology, to determine whether users find it informative, sufficient, succinct and satisfying. The Wizard-of-Oz methodology enables the tests to be carried out at an early stage, before the speech output is implemented, by using an experimenter to act the part of the speaking user interface. Tests should be task-based, in which users are required to understand and react to the information provided as speech in order to carry out typical viewing, navigation and set-up tasks. While the user operates the visual interface using the remote control, the experimenter speaks what the interface would output in response to each user action. This allows various approaches or variations of speech output to be trialled without the cost of implementing any of them first.