Closed captions are intended for deaf and hard-of-hearing viewers. They are a transcript of a program’s spoken words and non-speech information (that is, important sound effects, like a ringing phone, or manner of speech, like whispering). Since captions are in the same language as the program, they are not subtitles, which translate dialogue into a different written language.
Captions are closed in that they must be turned on or off by the viewer. (The other case – open captions that everyone has no choice but to watch – is exceedingly rare.)
In analogue television, captions are transmitted on both fields of Line 21 of the vertical blanking interval (VBI). In MPEG streams, which lack a VBI, Line 21 or the entire VBI is encoded in a private use area and regenerated upon playback. Each field can carry at most two characters per frame (roughly 60 characters per second), though caption speed is lower on average because control codes must also be transmitted.
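To make the two-characters-per-field limit concrete, here is a minimal sketch of how one Line 21 byte pair might be prepared. It assumes the usual 608 convention of seven data bits plus an odd-parity bit in bit 7. The function names are hypothetical, and the throughput figure uses the nominal NTSC rate of 30000/1001 frames per second.

```python
from fractions import Fraction

# Nominal NTSC frame rate: 30000/1001, about 29.97 frames per second.
NTSC_FRAME_RATE = Fraction(30000, 1001)


def add_odd_parity(value: int) -> int:
    """Set bit 7 so the byte has an odd number of 1 bits, as Line 21 expects."""
    if not 0 <= value <= 0x7F:
        raise ValueError("Line 21 characters are 7-bit values")
    ones = bin(value).count("1")
    # Seven data bits with an odd count of 1s need no parity bit; otherwise set it.
    return value if ones % 2 == 1 else value | 0x80


def encode_pair(text: str) -> tuple[int, int]:
    """Pack one or two characters into the byte pair one field carries per frame.

    Uses ord(), which matches the 608 basic character set for most (not all)
    printable ASCII characters. A lone character is padded with a null byte.
    """
    if not 1 <= len(text) <= 2:
        raise ValueError("one field carries one or two characters per frame")
    codes = [ord(c) for c in text] + [0x00] * (2 - len(text))
    return add_odd_parity(codes[0]), add_odd_parity(codes[1])


def max_chars_per_second() -> float:
    """Ceiling for one field: two characters every frame, no control codes."""
    return float(2 * NTSC_FRAME_RATE)
```

For example, `encode_pair("Hi")` returns `(0xC8, 0xE9)`, and `max_chars_per_second()` comes to about 59.94. Real captions fall short of that ceiling because control codes share the same byte slots.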
Captions are decoded by the viewer’s own equipment – nearly always a television set, most of which come with decoders built in as a result of a U.S. law that took effect in 1993. (There is no decoder requirement in Canada, but nearly all the same sets are sold here.) TVs with screens smaller than 13 inches diagonally don’t have to include caption decoders, but many do anyway, as do some VCRs. It’s still possible to buy set-top caption decoders, though they are now a niche item.
The captioning specification, known as CEA 608, provides for four streams of captions (CC1 through CC4), four streams of full- or half-screen text (Text1 through Text4), and additional metadata (XDS). On compliant TV sets, you the viewer may select from among those options, though most sets simply default to CC1 (and some provide no other choices). CC1–CC4 and Text1–Text4 are known as channels. Service providers (that is, captioners) can decide which, if any, of those channels to provide, but any and all channels can be present whenever Line 21 is present; you get all those services in one bundle.
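The channel structure can be pictured as a small lookup table. This sketch relies on the standard 608 arrangement, in which field 1 carries CC1, CC2, Text1, and Text2 and field 2 carries CC3, CC4, Text3, Text4, and the XDS metadata (omitted here). The names `Service`, `services_in_field`, and `select_channel` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Mode(Enum):
    CAPTION = "caption"
    TEXT = "text"


@dataclass(frozen=True)
class Service:
    field: int         # Line 21 field: 1 or 2
    data_channel: int  # data channel within that field: 1 or 2
    mode: Mode         # caption mode (CCn) or text mode (Textn)


# Each field carries two data channels; each data channel switches between
# caption mode and text mode. (XDS, also carried in field 2, is left out.)
SERVICES = {
    "CC1": Service(1, 1, Mode.CAPTION),
    "CC2": Service(1, 2, Mode.CAPTION),
    "CC3": Service(2, 1, Mode.CAPTION),
    "CC4": Service(2, 2, Mode.CAPTION),
    "Text1": Service(1, 1, Mode.TEXT),
    "Text2": Service(1, 2, Mode.TEXT),
    "Text3": Service(2, 1, Mode.TEXT),
    "Text4": Service(2, 2, Mode.TEXT),
}


def services_in_field(field: int) -> list[str]:
    """All channels delivered in the same Line 21 field."""
    return sorted(name for name, svc in SERVICES.items() if svc.field == field)


def select_channel(requested: Optional[str]) -> str:
    """Fall back to CC1, the default on most sets, if the request is unknown."""
    return requested if requested in SERVICES else "CC1"
```

Because every channel rides in the same two fields, a set that receives Line 21 receives all eight services at once; choosing among them is purely a decoder-side filter.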
Captioning for HDTV is, of course, different. HDTV captions are described by the CEA 708 specification. The same U.S. law that requires Line 21 decoders also requires that HDTV caption decoding be included in TV sets and also in tuners, set-top boxes, and some computer display cards.
708-format captions can be presented in any of eight fonts, with many more colours available. The character set is expanded, and numerous channels of captions can be transmitted, at far higher speed than 608 captions.
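To put a rough number on the speed difference, here is a back-of-envelope comparison. It assumes nominal figures only: Line 21 at two 8-bit bytes per field per frame, and the 9600 bit/s channel that CEA 708 allocates to captions. Transport overhead on both sides is ignored, so the ratio is an order-of-magnitude sketch.

```python
from fractions import Fraction

NTSC_FRAME_RATE = Fraction(30000, 1001)  # about 29.97 frames per second

# CEA 608: two 8-bit bytes (parity included) per field, per frame.
LINE21_BPS_PER_FIELD = 2 * 8 * NTSC_FRAME_RATE      # about 480 bit/s
LINE21_BPS_BOTH_FIELDS = 2 * LINE21_BPS_PER_FIELD   # about 960 bit/s

# CEA 708 (DTVCC) caption channel: nominal 9600 bit/s allocation.
DTVCC_BPS = 9600


def speedup(compare_to_both_fields: bool = True) -> float:
    """Ratio of nominal 708 caption bandwidth to Line 21 bandwidth."""
    base = LINE21_BPS_BOTH_FIELDS if compare_to_both_fields else LINE21_BPS_PER_FIELD
    return float(DTVCC_BPS / base)
```

On these assumptions, 708 offers roughly ten times the bandwidth of both Line 21 fields combined, or about twenty times that of a single field.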
At present, little or no native HDTV captioning is available; nearly all that’s presented to viewers is analogue 608 captioning translated to 708 format. (Even a program natively transmitted in HDTV may only have 608-format captions; the network or station must translate them to 708, which your television or set-top box then displays.) While Line 21 is a primitive and robust transmission technology (even a snowy, ghosted analogue TV picture can carry near-perfect captions), the task of ensuring that 708 captions actually arrive intact at a viewer’s TV set requires unusually extensive technical setup that has to be maintained without interruption through the entire broadcast chain. As with HDTV itself, the system tends to perfection or absence; either your captions are perfect or they aren’t there at all.
As an accessibility provision for viewers with certain disabilities, captions are often required by CRTC order or as the result of settlements in human-rights complaints. Moreover, accessibility for people with disabilities is required by the Charter, the Canadian Human Rights Act, and the Broadcasting Act.
Accordingly, pass-through of captioning is not optional. Whenever a television picture is transmitted, any captions that are present in that signal must also be transmitted. An iTV application must take care to preserve captions even in extraordinary viewing conditions.
We’ve identified four use cases for captioning in iTV and have developed candidate procedures to handle each one. In all cases, audio is not altered.
In the first use case, squeezeback, the main television image is shown as a picture-in-picture of no more than one-quarter size. Squeezing the picture removes the Line 21 data. (The resulting full-screen image still has a Line 21, but the original captions aren’t there anymore.) The viewer’s television set receives no captions, so our application has to do the decoding.
We have no access to the settings the viewer uses in his or her television set; we have no way of knowing in advance whether or not the viewer usually has captions turned on. In our iTV application, we have to give the viewer an option separate from his or her TV settings. We have a few choices for enabling the display of captions in this mode.
Whichever option we select must be localized into French and English (and any other languages the application supports).
We have two options for where to display the captions.
Superimposed display is the obvious choice, but offscreen display is probably better. Since no more than four lines of caption text can appear at once, and since nearly all captions are displayed in individual chunks (either single pop-on captions or two-, three-, or four-line rollups), it is not difficult to imagine a black rectangle attached to the bottom of the squeezed video image that is large enough to hold at least four lines of caption text. Captions can simply be displayed there.
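The offscreen layout reduces to simple geometry. In this minimal sketch, the `Rect` type, the pixel values, and the padding are hypothetical; the only figure taken from the text is the four-line maximum.

```python
from dataclasses import dataclass

MAX_CAPTION_ROWS = 4  # no more than four lines of caption text appear at once


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    width: int
    height: int


def caption_strip(video: Rect, line_height: int, padding: int = 4) -> Rect:
    """Black rectangle attached to the bottom edge of the squeezed video.

    Tall enough for four caption lines plus padding above and below.
    """
    height = MAX_CAPTION_ROWS * line_height + 2 * padding
    return Rect(video.x, video.y + video.height, video.width, height)


def fits_on_screen(strip: Rect, screen_height: int) -> bool:
    """True if the strip's bottom edge stays within the visible picture."""
    return strip.y + strip.height <= screen_height
```

With a squeezed picture at `Rect(40, 40, 360, 240)` and 20-pixel caption lines, the strip is `Rect(40, 280, 360, 88)`, which fits within a 480-line screen. Sizing the strip for the worst case (four lines) keeps it from jumping in height as captions change.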
Offscreen display isn’t hypothetical. It’s already in use in the Rear Window® captioning system in first-run theatres; some software DVD players (like DVD Player under Mac OS X) and TV tuners (many ATI video cards) allow offscreen display of 608 and 708 captions.
The advantage of offscreen display for iTV captions is that the captions can be much larger and easier to read. The minor disadvantage is that they will be unfamiliar to most viewers for the first few minutes.
Captions are always the same proportional size on all televisions. Characters are confined within a box 26 lines tall by 16 dots wide. Not all dots are used for character rendering; the perimeter is used for spacing, for example.
A one-quarter-size squeezeback image results in captions that are six or seven lines tall (depending on your choice of rounding when calculating 26 ÷ 4). Existing fonts will not be legible at that size; our application will require new bitmaps (or, less likely, outline fonts) that are custom-made by a qualified type designer and thoroughly user-tested.
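The rounding choice is easy to see in code. This sketch takes the 26-line by 16-dot character box from the text. Scaling the width by the same divisor as the height is our assumption, not something the text states.

```python
import math

CELL_HEIGHT = 26  # character box height in lines, as given in the text
CELL_WIDTH = 16   # character box width in dots


def squeezed_cell(divisor: int = 4) -> dict[str, tuple[int, int]]:
    """(height, width) of the shrunken character box under two rounding choices.

    Scaling the width by the same divisor as the height is an assumption.
    """
    height = CELL_HEIGHT / divisor
    width = CELL_WIDTH / divisor
    return {
        "round_down": (math.floor(height), math.floor(width)),
        "round_up": (math.ceil(height), math.ceil(width)),
    }
```

At one-quarter scale, `squeezed_cell()` gives a box of 6 by 4 when rounding down, or 7 by 4 when rounding up. Either way, that leaves only a handful of dots per character, including the spacing perimeter.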
The good news is that, as ever, captions are the last item to be displayed. Even if the squeezed video image is grainy, pixelated, or jumpy, our caption text will always be a pristine bitmap overlay. Caption text may be more legible than Chyrons on the original telecast.
With offscreen display, we have much wider latitude in caption sizing. We may wish to sacrifice character width for character height, resulting in a condensed typeface (for which there are many precedents). Here, too, existing fonts will not be legible enough, and we will have to commission and test our own custom fonts.
In the overlay scenario, the iTV application displays text or graphics that are much smaller than the full screen, typically a prompt or status display in the bottom quarter of the image.
This use case is identical to the existing scenario in broadcasting in which an emergency weather warning crawls across the screen. Competent broadcasters use a data bridge (a piece of rack-mountable hardware, or a software utility) that rewrites the preamble address code of incoming captions, repositioning them onscreen. (Usually captions move up three lines.)
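Here is a minimal sketch of the row rewrite a data bridge performs. It assumes the standard 608 preamble address code (PAC) row table for data channel 1, with channel 2 codes offset by 0x08 in the first byte, as we understand the spec; verify the table against CEA 608 before relying on it. Bytes are taken with parity already stripped. The sketch also ignores the convention of sending each control code twice. Function names are hypothetical.

```python
from typing import Optional

# PAC rows for data channel 1, keyed by (first byte, second byte >= 0x60).
# Row 1 is the top caption row, row 15 the bottom.
ROW_FOR_CODE = {
    (0x11, False): 1, (0x11, True): 2,
    (0x12, False): 3, (0x12, True): 4,
    (0x15, False): 5, (0x15, True): 6,
    (0x16, False): 7, (0x16, True): 8,
    (0x17, False): 9, (0x17, True): 10,
    (0x10, False): 11,
    (0x13, False): 12, (0x13, True): 13,
    (0x14, False): 14, (0x14, True): 15,
}
CODE_FOR_ROW = {row: code for code, row in ROW_FOR_CODE.items()}


def decode_pac(b1: int, b2: int) -> Optional[tuple[int, int, int]]:
    """Return (data_channel, row, style_bits), or None if the pair is not a PAC."""
    if not (0x10 <= b1 <= 0x1F and 0x40 <= b2 <= 0x7F):
        return None
    channel = 2 if b1 & 0x08 else 1                  # channel 2 codes add 0x08
    row = ROW_FOR_CODE.get((b1 & 0x17, b2 >= 0x60))  # clear the channel bit
    if row is None:
        return None
    return channel, row, b2 & 0x1F  # low 5 bits: colour/indent and underline


def encode_pac(channel: int, row: int, style_bits: int) -> tuple[int, int]:
    first, upper = CODE_FOR_ROW[row]
    b1 = first | (0x08 if channel == 2 else 0)
    b2 = (0x60 if upper else 0x40) | style_bits
    return b1, b2


def shift_pac(b1: int, b2: int, rows: int) -> tuple[int, int]:
    """Move a caption by `rows` (negative = up), clamped to rows 1-15.

    Byte pairs that are not PACs pass through unchanged.
    """
    decoded = decode_pac(b1, b2)
    if decoded is None:
        return b1, b2
    channel, row, style = decoded
    return encode_pac(channel, min(15, max(1, row + rows)), style)
```

For instance, `shift_pac(0x14, 0x60, -3)` moves a bottom-row caption (row 15) up three rows to row 12, giving `(0x13, 0x40)`. Indent and colour bits in the second byte are preserved.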
In the overlay case, our iTV application does precisely the same thing, displacing captions so they will not cover up our prompts. It will not suffice to assume that prompts will only ever occur at screen bottom; the application has to be smart enough to move captions up for bottom-positioned prompts and down for top-positioned prompts.
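Deciding which way to move captions can be sketched as a small function over row spans on the 15-row caption grid. The midpoint rule used here (a prompt below the centre row pushes captions up, otherwise down) is our hypothetical policy, as is the function name.

```python
TOP_ROW, BOTTOM_ROW = 1, 15
CENTER_ROW = 8


def rows_to_clear_prompt(prompt: tuple[int, int], captions: tuple[int, int]) -> int:
    """Signed row shift (negative = up) that moves captions off a prompt.

    Both arguments are inclusive (top_row, bottom_row) spans on the 15-row
    caption grid. Returns 0 when the two spans do not overlap.
    """
    p_top, p_bottom = prompt
    c_top, c_bottom = captions
    if c_bottom < p_top or p_bottom < c_top:
        return 0  # no overlap: leave captions where the captioner put them
    if (p_top + p_bottom) / 2 > CENTER_ROW:
        shift = (p_top - 1) - c_bottom  # prompt low on screen: move captions up
    else:
        shift = (p_bottom + 1) - c_top  # prompt high on screen: move captions down
    if c_top + shift < TOP_ROW or c_bottom + shift > BOTTOM_ROW:
        raise ValueError("prompt leaves no room for the captions")
    return shift
```

A prompt on rows 13 to 15 over a two-line caption on rows 14 and 15 yields a shift of -3, matching the customary three-line move. A prompt on rows 1 to 3 pushes an overlapping caption down instead.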
When the iTV application is running but has no visible manifestation, captions are passed through unaffected.
When the application takes over the entire screen, with no video displayed from the underlying program, there is no video to caption, so we block captions from reaching the viewer’s television. (The opposite approach – letting captions through – is already used by competing platforms and is known to obscure the onscreen application.)
Mosaic display (with multiple video feeds shown simultaneously as pictures-in-picture) is a more complex case that may not be handled in Version 1.0 of our captioning support. Initial findings suggest that only the network’s main feed will carry captions, and that only that feed should be decoded and displayed if there is ever a choice.
Published: April 2005 ¶ Posted: 2006.12.13