1.4 Provide an auditory description of the visual information in multimedia presentations

WAI checkpoint 1.3

Full WAI text: "Until user agents can automatically read aloud the text equivalent of a visual track, provide an auditory description of the important information of the visual track of a multimedia presentation."

A user agent is a piece of software for accessing web content. User agents include desktop graphical browsers, text browsers, voice browsers, mobile phones, multimedia players, plug-ins, and assistive technologies used in conjunction with browsers, such as screen readers, screen magnifiers, and voice recognition software.

Multimedia is information presented in a combination of forms, including audio (speech, sounds, music, etc.) and images (video, text, graphics, pictures, animations, movies, etc.). On the web, multimedia information can be presented through the web browser in any of these formats or in combinations.

Animations created with a product like Macromedia Flash might combine images and audio. An instructional Flash animation about web design could include images of a website on a computer monitor, a narration or voice-over, and images of a person using software. The images are contained on the image track, while the voice-over is contained on the audio track.

An auditory description is a spoken description of the visual content of a multimedia file. For a video, it describes in speech what is shown on screen, including images, actions, body language, and changes of scene. The auditory description is typically synchronised with the audio content of the file. In the instructional movie on web design, the auditory description would be inserted into the pauses in the voice-over. It could tell the user when the instructor is demonstrating a feature of the design software, or when the voice-over is referring to the image of the website on screen.
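As a rough illustration of fitting descriptions into pauses in the voice-over, the sketch below finds the silent gaps that are long enough to hold a spoken description. The function name find_pauses, the timings, and the two-second minimum are assumptions made for this example; they are not part of the WAI guidelines or of any authoring tool.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds


def find_pauses(speech: List[Interval], duration: float,
                min_gap: float = 2.0) -> List[Interval]:
    """Return the silent gaps in a voice-over that last at least min_gap seconds.

    speech   -- intervals during which the voice-over is speaking (hypothetical data)
    duration -- total length of the presentation in seconds
    """
    pauses = []
    cursor = 0.0  # end of the latest speech interval seen so far
    for start, end in sorted(speech):
        if start - cursor >= min_gap:
            pauses.append((cursor, start))
        cursor = max(cursor, end)  # tolerate overlapping intervals
    if duration - cursor >= min_gap:
        pauses.append((cursor, duration))
    return pauses


# Hypothetical voice-over timings for the instructional movie.
voice_over = [(0.0, 12.5), (15.0, 30.0), (31.0, 48.0)]
print(find_pauses(voice_over, duration=55.0))
# prints [(12.5, 15.0), (48.0, 55.0)]
```

Each returned interval is a candidate slot for a description. In practice the person writing the descriptions decides which slots to use and keeps the wording short enough to fit.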

The auditory description can be presented as pre-recorded or synthesised speech.

A text equivalent is "equivalent" to the visual track when both fulfil essentially the same function or purpose. The text equivalent may be a description of the visual track, i.e., of what it shows. It is presented in text format.

Most browsers do not automatically read out the text equivalent of a visual track. Until they do, you must present the description in both text and audio formats.

Rationale

Users who can't see the visual content of a multimedia presentation require an auditory description. They may be unable to get close enough to the display, have poor eyesight, or be blind.

Users who can't hear the audio track require a text description. They may be working in a quiet environment, like a library, with the sound turned down, or in a noisy environment where the sound is drowned out. They may have poor hearing or be deaf.

Some accessibility features are best handled by the user agent. Ideally, users should be able to set it to ignore any commands embedded in the underlying HTML that might render the content of a web page unusable.

Some newer browsers provide accessibility controls, but support varies from browser to browser. There are no widely adopted, standardised controls for improving accessibility.

Ideally, all user agents would detect whether the equivalent of a visual track is provided in text format and would automatically read it aloud to the user. However, this is not the case, so until browser behaviour is more standardised, it falls to the developer to ensure that a website is usable without relying on proprietary browser features.

Until then, it is necessary to provide an auditory equivalent that is read aloud and embedded in the audio track of the multimedia file.

Directions and Techniques

Provide text descriptions for video, as captions

In this technique, captions carry a text equivalent of the visual content of a video and are displayed in synchrony with it. Some media formats (e.g. QuickTime and SMIL) allow captions and descriptions to be added to the multimedia clip. If you are using these formats, you should take advantage of these features.
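To make the SMIL option more concrete, the following sketch generates a small SMIL 2.0 presentation that plays a video in parallel with a recorded description and a text stream. It assumes a player that honours the SMIL 2.0 test attributes systemAudioDesc and systemCaptions. The function name build_smil, the file names, and the region sizes are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

SMIL_NS = "http://www.w3.org/2001/SMIL20/Language"  # SMIL 2.0 language namespace


def build_smil(video_src: str, description_src: str, text_src: str) -> str:
    """Return a SMIL 2.0 document pairing a video with descriptions (a sketch)."""
    smil = ET.Element("smil", xmlns=SMIL_NS)

    # Layout: one region for the video, one below it for the text stream.
    layout = ET.SubElement(ET.SubElement(smil, "head"), "layout")
    ET.SubElement(layout, "region", id="video", top="0", height="240")
    ET.SubElement(layout, "region", id="captions", top="240", height="60")

    # Children of <par> are played in parallel, which keeps them synchronised.
    par = ET.SubElement(ET.SubElement(smil, "body"), "par")
    ET.SubElement(par, "video", src=video_src, region="video")
    # Played only when the user has asked for audio descriptions.
    ET.SubElement(par, "audio", src=description_src, systemAudioDesc="on")
    # Shown only when the user has turned captions on.
    ET.SubElement(par, "textstream", src=text_src, region="captions",
                  systemCaptions="on")

    return ET.tostring(smil, encoding="unicode")


# Hypothetical media files for the instructional movie.
print(build_smil("lesson.mpg", "lesson-description.wav", "lesson-captions.rt"))
```

Generating the file is optional; the same markup can be written by hand. The point of the sketch is the structure: the description and the text stream are separate tracks that the user can switch on or off, rather than being burned into the video.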

How you could check for this:

Play the presentation with the monitor turned off, or while looking away from the monitor.

If you can't follow the presentation with sound only, then you need an auditory description.

- View WAI checkpoint 1.3