Joe Clark: Accessibility | Design | Writing

Captioning and iTV


Closed captions are intended for deaf and hard-of-hearing viewers. They are a transcript of a program’s spoken words and non-speech information (that is, important sound effects, like a ringing phone, or manner of speech, like whispering). Since captions are in the same language as the program, they are not subtitles, which translate dialogue into a different written language.

Captions are closed in that they must be turned on or off by the viewer. (The other case – open captions that everyone has no choice but to watch – is exceedingly rare.)


In analogue television, captions are transmitted on both fields of Line 21 of the vertical blanking interval. In MPEG streams, which lack a VBI, Line 21 or the entire VBI is encoded in a private use area and regenerated upon playback. Each frame can carry at most two characters, though caption speed is lower on average because control codes must also be transmitted.
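The two-characters-per-frame ceiling can be worked through as a rough sketch. Two things here are not stated above and are assumptions on our part: the NTSC frame rate of 30000/1001 frames per second, and the byte layout of seven data bits plus one odd-parity bit. The function names are hypothetical.

```python
from fractions import Fraction

# Assumption: NTSC frame rate of 30000/1001 (about 29.97) frames per second.
NTSC_FRAME_RATE = Fraction(30000, 1001)
# From the text: each frame carries at most two characters.
CHARS_PER_FRAME = 2


def max_chars_per_second(frame_rate=NTSC_FRAME_RATE):
    """Upper bound on caption characters per second, before control codes."""
    return float(frame_rate * CHARS_PER_FRAME)


def has_odd_parity(byte):
    """True when the 8-bit value contains an odd number of 1 bits."""
    return bin(byte & 0xFF).count("1") % 2 == 1


def strip_parity(byte):
    """Return the 7 data bits, rejecting bytes that fail the parity check."""
    if not has_odd_parity(byte):
        raise ValueError(f"parity error in byte 0x{byte:02X}")
    return byte & 0x7F


if __name__ == "__main__":
    print(round(max_chars_per_second(), 2))
    print(chr(strip_parity(0xC1)))
```

Under those assumptions the ceiling works out to just under 60 characters per second; real captions run slower because control codes share the same two-byte slots.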

Captions are decoded by the viewer’s own equipment – nearly always a television set, most of which come with decoders built in as a result of a U.S. law that took effect in 1993. (There is no decoder requirement in Canada, but nearly all the same sets are sold here.) TVs with screens smaller than 13 inches diagonally don’t have to include caption decoders, but many do anyway, as do some VCRs. It’s still possible to buy set-top caption decoders, though they are now a niche item.

The captioning specification, known as CEA 608, provides for four streams of captions (CC1 through CC4), four streams of full- or half-screen text (Text1 through Text4), and additional metadata (XDS). On compliant TV sets, you the viewer may select from among those options, though most sets simply default to CC1 (and some provide no other choices). CC1–CC4 and Text1–Text4 are known as channels. Service providers (that is, captioners) can decide which, if any, of those channels to provide, but any and all channels can be present whenever Line 21 is present; you get all those services in one bundle.

Additional facts:

  1. Captions in the main program language are always transmitted on CC1 (on Line 21, field 1).
  2. Captions in a second program language are generally transmitted on CC3 (on Line 21, field 2). Some second-language programming appears on CC2 (Line 21, field 1), though that channel is increasingly reserved for a second version of the program’s main language (e.g., near-verbatim captions on CC1, edited easy-reader captions for children on CC2).
  3. Text channels are almost entirely unused. TV Crossover Links, to the extent that they still exist, are transmitted on Text2 (Line 21, field 1).
  4. XDS data is still transmitted. That data can include “time of day, station call letters, network, [and] name of the current program,” according to one expert, Gary D. Robson. Decoding of XDS is unusual in consumer equipment; it’s useful mostly to higher-end recording decks and inside broadcast control rooms.
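The channel-to-field assignments can be kept in a small lookup table. This is a sketch only: the first dictionary holds the assignments stated above, while the second holds the remaining ones as we understand CEA 608, which should be checked against the specification before use. All names are hypothetical.

```python
# Field assignments stated in the text above.
STATED_IN_TEXT = {"CC1": 1, "CC2": 1, "CC3": 2, "Text2": 1}

# Assumption: the remaining assignments, as we understand CEA 608.
ASSUMED = {"CC4": 2, "Text1": 1, "Text3": 2, "Text4": 2, "XDS": 2}

CHANNEL_FIELD = {**STATED_IN_TEXT, **ASSUMED}

# Most sets simply default to CC1.
DEFAULT_CHANNEL = "CC1"


def field_for(channel):
    """Return the Line 21 field (1 or 2) that carries the given channel."""
    return CHANNEL_FIELD[channel]


def channels_on_field(field):
    """Return the names of all channels carried on one field, sorted."""
    return sorted(name for name, f in CHANNEL_FIELD.items() if f == field)


if __name__ == "__main__":
    print(channels_on_field(1))
    print(channels_on_field(2))
```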


  1. Captions are limited to 15 lines of up to 32 characters each. Caption location is governed by a preamble address code transmitted with each caption. (As we will see with iTV applications, that PAC can be rewritten on the fly to reposition captions.)
  2. Only four lines of text (not necessarily contiguous) can be displayed at a time.
  3. Monospaced fonts are used.
  4. The character set is unique and is not encompassed by any single Unicode block.
  5. The specification calls for a black bounding box around characters (and for an extra blank character leading and trailing each line). White foreground colour is the default, but six other colours are possible. Manufacturers may permit the viewer to change the background colour on a TV set, but the captioner cannot do so by spec; only foreground colours are under captioner control.
  6. Roman, italic, underline, and blink character forms are available. (Blinking captions are thankfully rare.) Turning colours, italics, or underlining (and sometimes combinations of them) on or off requires transmitting a control code that produces a visible space character in the captions.
  7. There are three presentation styles:
    1. Pop-on captions appear all at once as single blocks. They disappear completely or are replaced by other captions.
    2. Roll-up or scroll-up captions (the terms are not always hyphenated) scroll up the screen a line at a time in units of two, three, or four lines.
    3. Paint-on captions are pop-on captions that assemble themselves letter by letter as though they were the output of a daisywheel printer. Also like pop-on captions, they disappear completely or are replaced by other captions. Paint-on captions are believed to provide a slightly longer reading time and are used sparingly (e.g., on the first caption after a commercial break if it is followed quickly by other captions).
    A single program may use multiple presentation styles, e.g., scrollup captions for dialogue and pop-on captions for song lyrics.
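The layout limits in the list above (15 rows, 32 characters per row, four rows displayed at once) can be checked mechanically. A minimal sketch, assuming a caption is represented as a mapping from 1-based row number to the text on that row; that representation and the function name are our own invention, not part of the specification.

```python
MAX_ROWS = 15          # captions are limited to 15 lines
MAX_COLUMNS = 32       # of up to 32 characters each
MAX_VISIBLE_ROWS = 4   # only four lines can be displayed at a time


def caption_problems(rows):
    """Return a list of human-readable layout violations (empty if none).

    `rows` maps a 1-based row number to the text shown on that row.
    """
    problems = []
    if len(rows) > MAX_VISIBLE_ROWS:
        problems.append(f"{len(rows)} rows used; at most {MAX_VISIBLE_ROWS} allowed")
    for row, text in sorted(rows.items()):
        if not 1 <= row <= MAX_ROWS:
            problems.append(f"row {row} is outside 1..{MAX_ROWS}")
        if len(text) > MAX_COLUMNS:
            problems.append(f"row {row} has {len(text)} characters; limit is {MAX_COLUMNS}")
    return problems


if __name__ == "__main__":
    print(caption_problems({14: "Hello.", 15: "[phone rings]"}))
    print(caption_problems({16: "x" * 40}))
```

The rows need not be contiguous, so the check counts occupied rows rather than measuring the span between the first and last.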

Digital captions

Captioning for HDTV is, of course, different. HDTV captions are described by the CEA 708 specification. The same U.S. law that requires Line 21 decoders also requires that HDTV caption decoding be included in TV sets and also in tuners, set-top boxes, and some computer display cards.

708-format captions can be presented in any of eight fonts, with many more colours available. The character set is expanded, and numerous channels of captions can be transmitted, always at hugely increased speed compared to 608 captions.

At present, little or no native HDTV captioning is available; nearly all that’s presented to viewers is analogue 608 captioning translated to 708 format. (Even a program natively transmitted in HDTV may only have 608-format captions; the network or station must translate them to 708, which your television or set-top box then displays.) While Line 21 is a primitive and robust transmission technology (even a snowy, ghosted analogue TV picture can carry near-perfect captions), the task of ensuring that 708 captions actually arrive intact at a viewer’s TV set requires unusually extensive technical setup that has to be maintained without interruption through the entire broadcast chain. As with HDTV itself, the system tends to perfection or absence; either your captions are perfect or they aren’t there at all.

Caption preservation

As an accessibility provision for viewers with certain disabilities, captions are often required by CRTC order or as the result of settlements in human-rights complaints. Moreover, accessibility for people with disabilities is required by the Charter, the Canadian Human Rights Act, and the Broadcasting Act.

Accordingly, pass-through of captioning is not optional. Whenever a television picture is transmitted, any captions that are present in that signal must also be transmitted. An iTV application must take care to preserve captions even in extraordinary viewing conditions.

iTV caption behaviours

We’ve identified four use cases for captioning in iTV and have developed candidate procedures to handle each one. In all cases, audio is not altered.


Squeezeback

In this scenario, the main television image is shown as a picture-in-picture of no more than one-quarter size. The act of squeezing the picture removes Line 21 data. (The resulting full-screen image still has a Line 21, but the original captions aren’t there anymore.) The viewer’s television set receives no captions; our application has to do the decoding.

Preference setting

We have no access to the settings the viewer uses in his or her television set; we have no way of knowing in advance whether or not the viewer usually has captions turned on. In our iTV application, we have to give the viewer an option separate from his or her TV settings. We have a few choices for enabling the display of captions in this mode.

Global preference setting
The iTV application can use a preference setting that controls caption display. That option may be difficult or unworkable given that there is no preference screen in the current application.
Onscreen control
Whenever we switch to squeezeback mode, we present an onscreen control (a button of some kind) to turn captions on or off. For accessibility to mobility-impaired people, the control must be very easy to reach and use via remote control. The button need not necessarily remain onscreen for the entire time the iTV application takes over the display image. (We can do user testing to find the right duration.)
Default to open captions
For Version 1.0 of our application, it may be simpler to show every viewer captions in every case. Each time you enter squeezeback mode, you watch captions.

Whichever option we select must be localized into French and English (and any other languages the application supports).

Display position

We have two options for where to display the captions.

Superimposed
As in the typical use case, captions appear on top of and cover up part of the television picture.
Offscreen or offwindow
Captions are displaced into a windoid or region below the main video.

Superimposed display is the obvious choice, but offscreen display is probably better. Since no more than four lines of caption text can appear at once, and since nearly all captions are displayed in individual chunks (either single pop-on captions or two-, three-, or four-line rollups), it is not difficult to imagine a black rectangle attached to the bottom of the squeezed video image that is large enough to hold at least four lines of caption text. Captions can simply be displayed there.
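The black rectangle described above can be placed with simple arithmetic. A rough sketch, assuming pixel coordinates with the origin at top left and a line height chosen by the application; the `Rect` type, the function name, and the example numbers are all hypothetical.

```python
from typing import NamedTuple

MAX_VISIBLE_LINES = 4  # no more than four lines of caption text at once


class Rect(NamedTuple):
    x: int
    y: int
    width: int
    height: int


def caption_region(video, line_height, screen_height):
    """Return a rectangle attached to the bottom edge of the squeezed video,
    tall enough for four caption lines."""
    region = Rect(
        x=video.x,
        y=video.y + video.height,
        width=video.width,
        height=MAX_VISIBLE_LINES * line_height,
    )
    if region.y + region.height > screen_height:
        raise ValueError("caption region does not fit below the video")
    return region


if __name__ == "__main__":
    squeezed = Rect(x=40, y=30, width=320, height=240)
    print(caption_region(squeezed, line_height=30, screen_height=480))
```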

Offscreen display isn’t hypothetical. It’s already in use in the Rear Window® captioning system in first-run theatres; some software DVD players (like DVD Player under Mac OS X) and TV tuners (many ATI video cards) allow offscreen display of 608 and 708 captions.

The advantage of offscreen display for iTV captions is that the captions can be much larger and easier to read. The minor disadvantage is that they will be unfamiliar to most viewers for the first few minutes.


Captions are always the same proportional size on all televisions. Characters are confined within a box 26 lines tall by 16 dots wide. Not all dots are used for character rendering; the perimeter is used for spacing, for example.

Superimposed positioning

A one-quarter-size squeezeback image results in captions that are six or seven lines tall (depending on your choice of rounding when calculating 26 ÷ 4). Existing fonts will not be legible at that size; our application will require new bitmaps (or, less likely, outline fonts) that are custom-made by a qualified type designer and thoroughly user-tested.
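The 26 ÷ 4 arithmetic, with both rounding choices, can be sketched as follows. The function name is hypothetical, and the divisor of 4 simply follows the calculation in the paragraph above.

```python
import math

CELL_HEIGHT_LINES = 26  # character box height, from the text


def scaled_cell_height(divisor):
    """Return (exact, rounded_down, rounded_up) heights for a scaled cell."""
    exact = CELL_HEIGHT_LINES / divisor
    return exact, math.floor(exact), math.ceil(exact)


if __name__ == "__main__":
    exact, low, high = scaled_cell_height(4)
    print(exact, low, high)
```

Python's built-in `round()` sends exact halves to the nearest even integer, so the sketch uses `floor` and `ceil` to make the choice explicit.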

The good news is that, as ever, captions are the last item to be displayed. Even if the squeezed video image is grainy, pixelated, or jumpy, our caption text will always be a pristine bitmap overlay. Caption text may be more legible than Chyrons on the original telecast.

Offscreen positioning

We have much wider latitude in caption sizing in this scenario. We may wish to sacrifice character width for character height, resulting in a condensed typeface (for which there are many precedents). In this scenario, too, existing fonts will not be legible enough, and we will have to commission and test our own custom fonts.


Squeezeback mode therefore requires:

  1. Software caption decoding
  2. Custom fonts
  3. Possibly increased real estate for offscreen positioning
  4. Preference setting or onscreen caption selection


Overlay

In the overlay scenario, the iTV application displays text or graphics that are much smaller than the full screen – usually a prompt or status display that appears in the bottom quarter of the image, for example.

This use case is identical to the existing scenario in broadcasting in which an emergency weather warning crawls across the screen. Competent broadcasters use a data bridge (a piece of rack-mountable hardware, or a software utility) that rewrites the preamble address code of incoming captions, repositioning them onscreen. (Usually captions move up three lines.)

In the overlay case, our iTV application does precisely the same thing, displacing captions so they will not cover up our prompts. It will not suffice to assume that prompts will only ever occur at screen bottom; the application has to be smart enough to move captions up for bottom-positioned prompts and down for top-positioned prompts.
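The repositioning rule can be sketched at the level of caption rows. This is an illustration under assumptions, not a data-bridge implementation: it works on row numbers (1 through 15) rather than on actual preamble address code bytes, and the function name and the "which half of the screen is the prompt in" test are our own.

```python
MAX_ROW = 15  # captions occupy rows 1 through 15


def reposition(caption_rows, prompt_rows):
    """Return caption rows shifted clear of the prompt.

    Rows are returned unchanged when captions and prompt do not overlap.
    """
    captions = set(caption_rows)
    prompt = set(prompt_rows)
    if not captions & prompt:
        return captions

    # Assumption: a prompt centred in the lower half counts as bottom-positioned.
    prompt_is_low = (min(prompt) + max(prompt)) / 2 > (1 + MAX_ROW) / 2
    if prompt_is_low:
        # Move up so the lowest caption row sits just above the prompt.
        offset = (min(prompt) - 1) - max(captions)
    else:
        # Move down so the highest caption row sits just below the prompt.
        offset = (max(prompt) + 1) - min(captions)

    moved = {row + offset for row in captions}
    if min(moved) < 1 or max(moved) > MAX_ROW:
        raise ValueError("no room to move captions clear of the prompt")
    return moved


if __name__ == "__main__":
    print(sorted(reposition({14, 15}, {13, 14, 15})))
    print(sorted(reposition({1, 2}, {1, 2, 3})))
```

With a three-row prompt at screen bottom, captions on rows 14 and 15 move up three rows, which matches the usual behaviour of a data bridge during a weather crawl. Every caption row shifts by the same offset, so non-contiguous layouts keep their relative spacing.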


Full-screen video

In this use case, the iTV application is running, but has no visible manifestation. Captions are passed through unaffected.


Full-screen interactive

Here, the application takes over the entire screen, with no video displayed from the underlying program. Since there is no video to caption, we block captions from reaching the viewer’s television. (The converse – letting captions through – is already in use by competing platforms and is known to obscure the onscreen application.)
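The four use cases and their candidate caption procedures can be summarized as a lookup. A minimal sketch; the enum and function names are hypothetical, and the behaviours are taken from the descriptions of each use case.

```python
from enum import Enum


class Mode(Enum):
    SQUEEZEBACK = "squeezeback"
    OVERLAY = "overlay"
    FULL_SCREEN_VIDEO = "full-screen video"
    FULL_SCREEN_INTERACTIVE = "full-screen interactive"


class CaptionAction(Enum):
    DECODE_IN_APPLICATION = "decode and draw captions ourselves"
    REPOSITION = "move captions clear of the prompt"
    PASS_THROUGH = "pass captions through unaffected"
    BLOCK = "block captions from reaching the television"


ACTION_FOR_MODE = {
    Mode.SQUEEZEBACK: CaptionAction.DECODE_IN_APPLICATION,
    Mode.OVERLAY: CaptionAction.REPOSITION,
    Mode.FULL_SCREEN_VIDEO: CaptionAction.PASS_THROUGH,
    Mode.FULL_SCREEN_INTERACTIVE: CaptionAction.BLOCK,
}


def caption_action(mode):
    """Return the caption handling for the current display mode."""
    return ACTION_FOR_MODE[mode]


if __name__ == "__main__":
    for mode in Mode:
        print(f"{mode.value}: {caption_action(mode).value}")
```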


In all cases

Mosaic display

Mosaic display (with multiple video feeds shown simultaneously as pictures-in-picture) is a more complex case that may not be handled in Version 1.0 of our captioning support. Initial findings suggest that only the network’s main feed will carry captions, and only that one should be decoded and displayed if there is ever a choice.


Published: April 2005 ¶ Posted: 2006.12.13
