Functional spec for a publishing tool that supports captioning


The publishing tool has two goals:

  1. Bring the software into compliance with ATAG (the W3C Authoring Tool Accessibility Guidelines).
  2. Make it easy for the software users to publish images, sounds, and videos with accessibility features, including long descriptions, captions, transcripts, and audio descriptions.

We’ll concentrate on the latter point in this document, with a further emphasis on captioning.

For images

Add long descriptions

When uploading or simply referring to an image, the software lets you add a longdesc (long description) of that image. The longdesc is preferably stored in a subdirectory called ld/ inside the images directory, with a filename equal to the root of the image filename with -LD appended.
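
The naming convention above can be sketched in a few lines of Python. This is only an illustration: the function name `longdesc_path` and the default `.html` extension are assumptions, since the spec does not say what file type a long description should be.

```python
from pathlib import PurePosixPath


def longdesc_path(image_path: str, extension: str = ".html") -> str:
    """Return the suggested long-description path for an image.

    The extension is an assumption; the spec only fixes the ld/
    subdirectory and the -LD suffix.
    """
    image = PurePosixPath(image_path)
    # The description lives in an ld/ subdirectory beside the image and
    # reuses the image's root filename with -LD appended.
    return str(image.parent / "ld" / f"{image.stem}-LD{extension}")


print(longdesc_path("images/photo.jpg"))  # images/ld/photo-LD.html
```

Because the path is derived rather than typed in, the software could pre-fill it when an image is uploaded and still let the user override it.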

For captioning

Associate a caption file

The user will have somehow created a caption file in either SMIL or SAMI format. Alternatively, the user may have created QTtext or RealText files for QuickTime and RealPlayer respectively, or used some other file type.

The software lets you upload or point to the file.
RealPlayer prefers to handle SMIL files in .ram or .rtm containers. The software should let you upload or point to the SMIL file, then generate the .ram or .rtm files itself and associate those with the video.
Windows Media Player: the software lets you upload or point to the file.
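
For the RealPlayer case, generating the container could be as simple as the sketch below. It assumes that the container is a plain-text metafile holding the URL of the SMIL file, and it treats .ram and .rtm identically, which is a guess. `build_metafile` is a hypothetical helper name, not part of any existing tool.

```python
from pathlib import PurePosixPath
from typing import Tuple


def build_metafile(smil_url: str, extension: str = ".ram") -> Tuple[str, str]:
    """Return (filename, contents) for a metafile pointing at a SMIL file.

    Assumption: the metafile is plain text containing the SMIL URL.
    """
    if extension not in (".ram", ".rtm"):
        raise ValueError(f"unsupported container extension: {extension}")
    # Reuse the SMIL file's root name so the two files are easy to pair up.
    smil_name = smil_url.rsplit("/", 1)[-1]
    filename = PurePosixPath(smil_name).stem + extension
    return filename, smil_url + "\n"


name, contents = build_metafile("http://example.com/video/clip.smil")
print(name)      # clip.ram
print(contents)  # http://example.com/video/clip.smil
```

The software would then write `contents` to `name` alongside the video and record the association, so the user never has to create the container by hand.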

Associate an open-captioned version

In all players, the software should let you designate an alternate video file that has open captions. Ideally the software should provide the viewer with a control to select open- or closed-captioned versions. That control must meet applicable accessibility specifications.

Associate a transcript

This is really not a good way to make video accessible, but for WCAG compliance, the software should let you associate a transcript with the video.

For audio description

Associate a description file

Exactly as with captions, but the source is an audio file and not text.

Associate an open-described version

Exactly as with captions, but the source is an audio file and not text.

For audio only

The software should let you associate a transcript. (The imaginable but unlikely case in which an audio file is turned into a video file with no image except for captions need not be supported.)

Language variations

All of the above have to be variable by language, which adds another branch at a high level on the decision tree.

To make this work, it may be sensible to configure the software to associate any kind of text or audio file with video. That way, users could take a single videoclip and associate:

  1. main audio in n languages
  2. captions in up to n languages (not always a 1:1 relationship)
  3. descriptions in up to n languages
  4. open-captioned or open-described variants in up to n languages
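
One way to picture the "associate any kind of text or audio file with video" idea is a flat list of alternatives hanging off each clip, each tagged with a kind and a language. The sketch below is an assumption about how that could be modelled; the class names, the kind labels, and the example URLs are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

# Kinds taken from the list above; the labels themselves are made up.
KINDS = ("audio", "captions", "description", "open-captioned", "open-described")


@dataclass
class Alternative:
    kind: str      # one of KINDS
    language: str  # language code such as "en" or "fr"
    url: str       # uploaded file or external location


@dataclass
class VideoClip:
    url: str
    alternatives: List[Alternative] = field(default_factory=list)

    def associate(self, kind: str, language: str, url: str) -> None:
        """Attach one alternative to the clip."""
        if kind not in KINDS:
            raise ValueError(f"unknown kind: {kind}")
        self.alternatives.append(Alternative(kind, language, url))

    def languages(self, kind: str) -> List[str]:
        """List the languages available for a given kind, sorted."""
        return sorted({a.language for a in self.alternatives if a.kind == kind})


clip = VideoClip("video/interview.mov")
clip.associate("audio", "en", "audio/interview-en.mp3")
clip.associate("audio", "fr", "audio/interview-fr.mp3")
clip.associate("captions", "en", "captions/interview-en.smil")
print(clip.languages("audio"))     # ['en', 'fr']
print(clip.languages("captions"))  # ['en']
```

Keeping kind and language as independent fields reflects the point that captions and audio are not always in a 1:1 relationship: here the clip has French audio but no French captions.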

For images, the same image may have different-language alt texts (already handled by the software, shurely?!) and long descriptions.

The browsing experience

Once those accessibility alternatives are associated by the software, the actual pages that the software serves should make them readily available.


The embed element is the only sure-fire way to add multimedia to a page, but it isn’t valid HTML. It can only be valid XHTML with a custom-hacked DOCTYPE, which the software should automatically use. If someone wants to be brave and use the object element, this too should be supported. Or the software could suggest another standards-compliant method.
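
Supporting both elements could come down to a single rendering switch, roughly as sketched here. The function name `render_player` and its parameters are hypothetical, and the markup is deliberately minimal: a real page would also need player-specific attributes and the links to captions, descriptions, and transcripts.

```python
from html import escape


def render_player(url: str, mime_type: str, use_object: bool = False) -> str:
    """Return minimal markup for a media file as embed (default) or object."""
    # Escape attribute values so URLs containing & or quotes stay well-formed.
    src = escape(url, quote=True)
    kind = escape(mime_type, quote=True)
    if use_object:
        return f'<object data="{src}" type="{kind}"></object>'
    # Self-closing form, on the assumption that the page is served as XHTML.
    return f'<embed src="{src}" type="{kind}" />'


print(render_player("clip.mov", "video/quicktime"))
print(render_player("clip.mov", "video/quicktime", use_object=True))
```

Defaulting to embed follows the reasoning above that it is the more dependable choice, while the flag lets a braver author opt in to object.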

External vs. embedded players

The software may behave differently for embedded video vs. video files that call up a separate player. The details have yet to be considered.