Response to CMP Captioning Key

The Captioned Media Program (CMP) is an arm of the National Association of the Deaf that uses U.S. government monies, and a few other minor funding sources, to caption audiovisual media for home-video, educational, and online use. CMP is, for some reason, located in Spartanburg, South Carolina, though it uses many accredited outside captioners. In fact, CMP’s process of accreditation is sometimes viewed as a model for other efforts in captioning standardization.

CMP has published its “style guide,” as such documents are somewhat derisively called, for a number of years. As far back as 1989, CMP refused to mail me a copy of the Captioning Key because I was outside the United States. Fortunately, CMP has come to its senses somewhat and now publishes the Captioning Key online – in an untagged, overlarge, ill-designed PDF with unnecessary prohibitions against even copying and pasting text from the document.

The Captioning Key is, additionally, often cited as a reasonable and research-based manual on how to caption. In fact, it has a wide range of problems and needs urgent revision.

Authorship

This review concerns the 2006 Captioning Key: Guidelines and Preferred Techniques by Jason Stark. I got the author’s name from the PDF metadata; it isn’t in the actual document.

Controversies surrounding CMP

The Captioned Media Program is said to have received a U.S. government grant for $3,235,514 in 2004. It’s part of a $17.25 million “cooperative agreement” from 2001–2007. Yet it captions barely anything – about 50 items a month, most of them short films, based on its new-releases page. A reasonably large captioning house puts out that many pieces in a couple of weeks at most, though many of those items, like commercials and music videos, will be even shorter than a typical CMP piece.
The Stark family seems to run the place.
Its Web site is poor even beyond the level of Web incompetence we associate with deaf groups; among many other things, it uses tables for layout, relies unnecessarily on JavaScript, and violates several accessibility guidelines despite having valid HTML on some pages. (A redesign by an actually competent standards-compliant shop is in order. All the obvious choices seem viable, including Happy Cog, Dan Cederholm, ClearLeft, Airbag Industries, inter alia.)

Comments and critiques

Misdefining captioning

Captioning is the process of converting the audio content of a television broadcast, Webcast, film, video, CD-ROM, DVD, live event, and other productions into text which is displayed on a screen or monitor.

“Converting”? What we’re doing is transcribing, not “converting.” (Additionally, it’s “or other productions,” not “and.” As written, we aren’t truly doing captioning unless we’re doing it to seven or more genres all at once.)

“Captions... display words as the text equivalent of spoken dialogue or narration”: No, they do not. A text equivalent is an alt text and applies to images and other “non-text content.” Captions display a transcription of speech. (Narration is “spoken dialogue.”)

Editing captions

“It is important that the captions be... equivalent and equal in content to that of the audio”: Apart from being as grammatically questionable as many other sentences in the Key, this declaration essentially precludes editing and mandates verbatim captioning. Yet, in the very next paragraph, we read:

The CMP captioning philosophy is that all media should incorporate as much of the original language as possible; words or phrases [that] may be unfamiliar to the audience should not be replaced with simple synonyms. Extreme rewriting of narration for captions develops problems of “watered-down” language and deleted concepts. Do not censor language unless previously done in the audio. Editing should only be done if required to meet the specified presentation rate.

Again, the writing is poor here (“rewriting of narration for captions develops problems”) and CMP is simply not being consistent. Either we edit or we don’t; if we do, it becomes a question of when and how, and this paragraph gives blanket permission to edit for reading rate.

(By the way, what does “unless previously done in the audio” mean? If one uttered “damn” was overdubbed with “darn” but a second “damn” was not, which word do we caption? If we’re trying to say “Do not attempt to alter the caption transcription by replacing swear words with other words,” then say that.)

Extent of editing

Later, we are told that “[m]any educational, special-interest, and theatrical videos are not scripted to allow the time necessary for the process of reading captions and often have extremely rapid narration/dialogue. Therefore some editing may be necessary.” Define “some.”

The next point states that “lower- to middle-level educational videos should be captioned at... 120–130 wpm.” Upper-level videos “may be captioned slightly above” that. “No caption should remain onscreen less than two seconds” in any case listed in the manual.

“Adult special-interest videos” – this is apparently CMP jargon; it isn’t defined – “require a presentation rate of 150–160 wpm.” But so do children’s movies, paradoxically. “Adult movies should be captioned at a near-verbatim rate, but no caption... should exceed 235 wpm.”
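To make those rates concrete, here is a minimal Python sketch that measures one caption’s presentation rate and checks it against a ceiling and the two-second minimum. The function names and the whitespace-based word count are my assumptions; the Key does not say how words are to be counted.

```python
def presentation_rate_wpm(text: str, seconds_onscreen: float) -> float:
    """Words per minute for one caption, counting whitespace-separated words."""
    if seconds_onscreen <= 0:
        raise ValueError("duration must be positive")
    return len(text.split()) * 60.0 / seconds_onscreen


def check_caption(text: str, seconds_onscreen: float, max_wpm: float) -> list[str]:
    """Return a list of problems; an empty list means the caption passes."""
    problems = []
    if seconds_onscreen < 2.0:  # the Key's stated minimum display time
        problems.append("onscreen less than two seconds")
    if presentation_rate_wpm(text, seconds_onscreen) > max_wpm:
        problems.append(f"exceeds {max_wpm:g} wpm")
    return problems
```

A six-word caption held for the two-second minimum already runs at 180 wpm, so at the 120–130 wpm level it would need to stay up for roughly three seconds.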

Use of scripts

“[C]aptioning agencies are expected to research spelling, capitalization, and punctuation. Company scripts are not always reliable.” What “company”?

CMP has onerous requirements to report the research carried out to verify spellings. Not only do you have to do the research and spell the word right, you have to reproduce the word on a list and cite your source. While I do not disagree with the approach, it is a lot of work.

Discussion of closed captions

The Key resorts to lazy ready-made phraseology when it states that “Closed Captions (CC)... are invisible without a special decoder.” Many kinds of captioning are closed, and nowhere is it stated that decoders ceased to be “special” in 1993, when the Television Decoder Circuitry Act came into effect.
The claimed “Closed-captioning sample” is an atrocity. It would be a 30-second job for someone to pull out a digicam and snap a picture of a real TV screen with real Line 21 captions. Instead, CMP falsifies reality and uses a still picture overlaid with Arial (!) type in a huge rectangle.

It always amazes me when people try to fake the appearance of closed captions. There always seems to be an attempt to make them look better than they actually do. CMP has a fair-use right under U.S. copyright law to reproduce one frame of a television production for this purpose.
The Key recites blandishments about HDTV captioning:
1. “As part of this transition [to digital], closed captions will have to support the digital closed-caption format, EIA-708, and will be much improved.” Actually, there is no native 708 captioning at present, and all captioning on HDTV is translated from analogue captioning.
2. “Examples of improvements include the ability... to replace the black box with a translucent (see-through) background.” I thought we weren’t allowed to substitute easy words for hard ones. Translucent means partially see-through. Some HDTV sets will indeed permit the complete removal of the caption bounding box.
But we’re not done yet. A section entitled “Subtitles for the Deaf and Hard of Hearing (SDH)” reiterates various half-truths. (Incidentally, Lee Jordan of Captions, Inc. claims to have coined that term.)

These are similar to subtitles used for foreign films

I don’t see how they are, unless the comparison is one of typography, which I’ll cover in the next section.

but also include information such as sound effects, speaker identification, and other essential “nonspeech” features. The CMP captions are this type. Sometimes they are called “open captions,” though this term most often refers to “closed captions” made permanently visible by duplicating copies of a closed-captioned video while [a] decoder is engaged.

Captions of many kinds can be “open” or “closed.” When discussing analogue TV, “closed captions” can only be Line 21. But other media permit closed captioning, and you can use open captions anywhere. Open captions are rare, but I can attest from experience that they are seldom, if ever, decoded closed captions. (The last time I saw such a thing was when I set up a project for CBC reusing Line 21 captions online. Those were decoded TV closed captions.)
The Key then goes on to explain that subtitles, “as well as some SDH captions, are displayed using the mediums menu option [sic] on DVDs and the Internet.” I see.
The illustration for SDH uses Arial Black in mixed case and is almost as fake as the “closed-captioning sample.” The whole section is bookended by claimed “definitions”:
1. Types vary according to how the captions appear, how they are accessed, and what information is provided.
2. Methods vary according to when the captions are created or displayed.
Aren’t “access[ing]” and “display” the same thing? (Do you not “access” captions to “display” them?) Are closed captions a type or a method?

Typography

Document typography and design

First, the typography of the document itself is dreadful even by the standards of Windows users, who never met a font they couldn’t misuse. (The Key is actually a Microsoft Word document in disguise.) Pretty much the whole document is typeset in Verdana, with full justification (always “classier”), too large a point size, and too little leading.

Typography of captions

More pressing are the Key’s requirements for typography of captions. It’s a complete mess and needs to be rewritten from scratch right away.

The CMP requires pop-on captions in upper- and lower-case letters with descenders. Characters must be Helvetica Medium or a font similar to it. These captions have good resolution and fit the requested 32 characters to a line.
1. Full credit for requiring mixed case. Anything else is an abomination.
2. The requirement for descenders shows that CMP has attempted to understand typography, but what concerns us is results, not intentions. It is all but impossible to find a typeface without descenders (Hobo is a rare example). If they’re trying to say that you can’t use Line 21 captions burned into the video, they aren’t succeeding.
3. It borders on ridiculous to require a proportionally-spaced font but also require a fixed number of characters per line. (Later: “The weight must support a 32-character line.” You already told us to use the medium weight.)
  - The number 32 comes from Line 21 captioning, which uses monospaced fonts.
  - Anyway, which 32 characters – i or W? Does this mean I can crank up the point size until I reach a size that can accommodate 32 i’s, then go right ahead and use that size?
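The i-versus-W problem can be sketched in a few lines of Python. The advance widths below are hypothetical round numbers chosen for illustration, not measurements of Helvetica Medium or any real font, and the function names are mine.

```python
# Hypothetical advance widths in thousandths of an em; illustrative values
# only, NOT measured from Helvetica or any real font.
ADVANCE_WIDTHS = {"i": 220, "W": 940}


def line_width_ems(text: str) -> float:
    """Width of a line in ems, summing the per-glyph advance widths."""
    return sum(ADVANCE_WIDTHS[ch] for ch in text) / 1000


def max_point_size(text: str, available_width_pt: float) -> float:
    """Largest point size at which `text` still fits the available width."""
    return available_width_pt / line_width_ems(text)


narrow = max_point_size("i" * 32, 600)  # a line of 32 narrow letters
wide = max_point_size("W" * 32, 600)    # a line of 32 wide letters
```

Under these assumed widths, a line of 32 i’s fits at more than four times the point size of a line of 32 Ws, which is why a character count says very little about a proportional font.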
In discussing roll-up captions, we are told that “[d]ouble chevrons are often used to indicate a change in speaker.”
- The character sequence >> is not a “double chevron.” A chevron is two tailless arrowheads nested together.
- Unicode lists several characters with that description (though only one, unrelated to this discussion, uses CHEVRON in its name). Those characters include guillemets « and », much-less-than ≪, and left double angle bracket 《.
- Each of those is a single character. (Yes, a single character can have disjointed parts. i, colon, semicolon, exclamation point, and question mark are other examples.) >> are two characters and are merely “double greater-than.”
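Anyone who doubts the character names can check them against the Unicode database that ships with Python. This is a small sketch using the standard unicodedata module; the variable name marker is mine.

```python
import unicodedata

# Single characters that look like nested arrowheads.
for ch in "«»≪《>":
    print(f"U+{ord(ch):04X}  {ch}  {unicodedata.name(ch)}")

# The roll-up speaker marker is a sequence of two code points, not one character.
marker = ">>"
print(len(marker), [unicodedata.name(ch) for ch in marker])
```

Each name comes straight from the Unicode Character Database, so the listing depends on the Unicode version bundled with the Python release in use.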

Use of Helvetica

CMP reveals its poor understanding of onscreen reading in its seemingly absolute requirement to use not merely Helvetica but Helvetica Medium.

The grotesk family of sansserif fonts is unsuited to captioning and subtitling. Of course second-rate subtitlers use it, or its bastardized cousin Arial, and will swear up and down that it’s just fine. It isn’t, and I’ve documented the reasons elsewhere. To sum up:
1. Confusable character shapes, including all the classic combinations (Il1|, S568, rn m, cl d), genuinely are confusable.
2. Reversible character shapes (bdqpg) genuinely are reversible, generating confusion for dyslexics and others with reading-related learning disabilities.
3. Default spacing is too tight, particularly for captions, which glow, hence blur into each other.
4. Geometric character shapes (OGQ are near-perfect circles by design) translate poorly into low-resolution media like TV.
It is not clear why Helvetica Medium is specified. I assume it went something like this: At one point, some captioning contractor used Helvetica Light, or a lousy knockoff of the book weight of Helvetica, in a captioning job; one person complained; and then CMP overextended its near-nonexistent typographic knowledge and issued a new rule.

Sometimes bolder type works better onscreen, sometimes it doesn’t. Readability of onscreen type is heavily influenced by spacing. Helvetica Medium is not guaranteed to be more legible than the book weight, and probably will be less legible because the default spacing will be the same and counters will be filled in. That certainly seems to be happening in one of CMP’s illustrations:

It’s the wrong solution to a problem that CMP cannot even articulate properly.
In any event, CMP blows its own requirement out of the water by permitting “a font similar to” Helvetica. Why not just be honest and give up any pretext of supervising captioners’ font usage? CMP must know perfectly well that allowing “a font similar to” Helvetica is a license for two-bit captioning shops to use Arial, a bastardized Helvetica knockoff seen as an atrocity by type experts. (The illustration above uses Arial Black.) While that exemption also permits, for example, Univers or Akzidenz Grotesk, in practice they won’t be used.

Additionally, explain what a font similar to Helvetica might be. Optima? Syntax? Balance? Flyer?
There are further redundancies: “Characters must be sansserif, have a drop or rim shadow, and be proportionally spaced.”
1. Two of those three you get for free with Helvetica.
2. A drop shadow is difficult for authoring tools to create and is not necessarily the best way to ensure foreground/background legibility.
3. I suppose a “rim shadow” is an outline.
4. In that case, the spec seems to be calling for Helvetica Outline; the only medium weight I can find is in my old Letraset catalogue. (Few text fonts are available in shadow or contour variants.)
The manual then goes on to say “If possible, translucent box is necessary.” Do you want drop (or “rim”) shadows, or a character mask, or both? Around the minimum outline of characters or around the full rectangle of all lines?
And yet further redundancies: “The font must include upper- and lower-case letters with descenders that drop below the baseline.” Descenders always do that. Few typefaces are all-caps.
The spec goes right on to blow a discussion of leading: “Pick a font and spacing technique that does not allow overlap with other characters, ascenders, or descenders.” Well, of course you aren’t going to do that. To avoid it, even the 1px or 2px extra linespacing used by default in typographically substandard captioning software will suffice.
“If possible, use accent marks, umlauts, and other indicators.”
1. When is it not possible?
2. When is an umlaut not an “accent mark”? (It’s an accent mark even when it’s a dieresis, the true generic term.)
3. What are those “other indicators”? (A list of ten examples would suffice.)

Alignment and positioning

CMP states, out of nowhere, that “[c]aptions that have two or more lines must be left-aligned.” Why? In a nature documentary with only a single narrator, why not use bottom-centre positioning? A case can be made for flush-left positioning, but it has not been made.
The rule also permits centred single-line captions (which “should be centred on line 8,” the bottommost line). But later we are instructed that, “[i]f speaker is offscreen, place captions to the far right or left, as close as possible onscreen to the offscreen speaker” (sic). So you can caption a show with centred single-line captions alternating with flush-left and “far-right” two-line captions and still pass muster.
The handbook almost urges captioners to edit captions down to two lines even if the third line could merely be a speaker ID. (“When a speaker cannot be identified by placement and his/her name is known, the speaker’s name should be in parentheses.” Well, there goes one line of caption text.) Later explanations contradict that requirement, e.g.: “If essential sound effects are used simultaneously with captioned dialogue, they must be placed at the top of the screen.” Suddenly the viewer must read top and bottom of screen simultaneously, and how are we going to manage this configuration while also hewing to two lines of captioning?
The Key continues to contradict itself: “For media with one offscreen narrator and no pre-existing graphics, captions should be left-aligned at centre screen on lines 7 and 8.” How does one left-align a caption at centre screen? Perhaps they’re trying to tell us to take the width of the longer line of the finished caption, centre that line, and make the other line flush left with that line’s left margin. If so, what this produces is a series of flush-left caption blocks at unpredictable and jumpy horizontal locations.
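Here is that reading of the rule as a minimal Python sketch. It assumes a 32-cell line of fixed-width cells, which is a simplification given that the Key demands a proportional font; the sample captions and function name are invented for illustration.

```python
SAFE_COLUMNS = 32  # assumed caption width in character cells


def block_left_margin(lines: list[str], columns: int = SAFE_COLUMNS) -> int:
    """Centre the longest line, then start every line at that left margin."""
    longest = max(len(line) for line in lines)
    return (columns - longest) // 2


# Made-up two-line captions of differing widths.
captions = [
    ["The river rises", "in the mountains."],
    ["It floods", "every spring."],
    ["Then the salmon arrive in force", "from the sea."],
]
margins = [block_left_margin(c) for c in captions]
```

With these made-up captions the left margins come out as 7, 9, and 0 cells, so the reader’s eye has to find a new starting point for every caption.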
CMP betrays its ignorance of psychology of reading by showing an example of acceptable and preferred captions:

Acceptable

I wish to seek your approval.

Preferred

I wish to seek your approval.

Six easy words like that should really be one line. Forcing them into two lines increases the number of fixations required and makes reading slower (in general, 40 milliseconds vs. 30 milliseconds, according to Taylor and Taylor’s Psychology of Reading, p. 123).

CHUM has this same nonsense in its own style guide. I’ve seen them turn five-word captions into two lines.
CMP replicates Captions, Inc.’s nutty and risible requirement to indent a line by two spaces if its words are the same as the preceding line’s. “However, if two caption lines begin with the same word – but are not identical sentences – the second line should not be indented.” Where is the evidence of psychology of reading to back up this nonsense? And why don’t we see this effect anywhere else in written English?

Italics

You are forbidden from mixing roman (misspelled as “Roman”) and italic type in the same caption “except in cases of word emphasis.” You have to italicize a word “the first time [it] is being defined.” You have to italicize a speaker ID if the dialogue is italicized, for some reason. (The speaker ID is not an utterance and is not coming from an onscreen or offscreen source.)

But the very best part? The very best part is this: “Excessive slanting of italics should be avoided.” If you use a real italic (or, in the case of Helvetica Medium “or a font similar to it,” a real oblique), the problem goes away.

Non-speech information

For sound effects, we are told to “[a]void use of discriminatory terms.” Whatever might those be?
Do we need two guidelines for the following?
1. Sound effects necessary to the understanding and/or enjoyment of the video should be captioned.
2. Caption background sound effects only when they’re essential to the plot.
Do you know what a “concrete” vs. an “abstract” term is? Which category would you assign to the words in the pair running/galloping or bird/robin? Yet the advice is “When possible, use concrete rather than abstract terms to describe sounds.” Clear as mud. (Or concrete.)
CMP commits the common mistake of captioners who are halfway there: “When people are seen talking but there is no audio, caption as [no audio].” You may do that only if there is absolute silence. If there’s any other sound, then there really is audio and [no audio] lies to the audience. Alternatives include [no voice], [no audible dialogue], [mouthing words], [mouths “Call me”].
And this advice is another example of being halfway there: “When a person is already identified and is not onscreen but has started speaking again” – are you still with us? – “caption as [voiceover].” What’s wrong with being much clearer, viz. [Vincent narrating] or [girl thinking]?
CMP also seems to be in love with onomatopoeia. In an illustration, a wolf is captioned thus: grrrrrrrrrrr (yes, 11 rs). They’re trying to use research from a Gallaudet study to back this up, but it really only works for kids. (An interesting example is NCI’s captioning of Sesame Street, in which sound effects are captioned by the standard onomatopoeia used in English, like RING RING or MEOW.)

Then of course there is the issue of disagreement in writing onomatopoeia. If you see [mortars exploding], do you automatically associate it with the onomatopoeia kerboom! (including the r)? Well, that’s the example they give.
Ongoing sound effects are mishandled. The following advice is correct but insufficient:

If description is used for offscreen sound effects, it is not necessary to repeat the source of the sound if it is making the same sound a few captions later. Examples:

First caption

[pig squealing]

Later caption

[squealing]

In fact, what the second caption should say is [squealing continues]. If the squealing keeps on going for a long time, perhaps interspersed with other sounds or dialogue, it does indeed become necessary to remind us of the source: [pig continues squealing]. The example, in any event, conjures unpleasant connotations and should be replaced.
The Key gets its grammar wrong. “When describing a sustained sound, use the present participle form of the verb.” It’s actually the progressive aspect (dog barking). “When describing an abrupt sound, use the third-person verb form.” That’s the indicative mood (dog barks).
Unsurprisingly, the manual counsels poor typography, suggesting speaker IDs like female #1 (omitted parentheses sic). Don’t use a number sign, and “female” may be more appropriate for captioning dialogue from aliens or animals than people.

Copy-editing

Authorities

The manual is weak at best about copy-editing and authorities to use for it. It recommends an online dictionary, i.e., a free Web-based dictionary rather than something actually authoritative like the Oxford American or an online Oxford English Dictionary subscription. The Key contradicts itself and states that “[c]aptioning agencies are expected to... [u]se a reputed dictionary.”

“Only as a last resort are proper nouns researched on the Internet,” which may be a bit of a limitation when captioning videos about the Internet.

The “English shore be hard” defence

Then we have these gems, which simply give up the ghost:

Written English rules on capitalization are difficult. First of all, there are a seemingly endless number of rules to master. Second, the authorities themselves don’t agree on the rules. Try to remember the basic purposes of capitalization: To load special significance into words and to give importance, emphasis, and distinction to words [not “special significance”?]. [...] It is not easy to determine the appropriate punctuation for written language. Spoken language sometimes appears improperly constructed when put into written form and can be even more difficult to punctuate.

Essentially, this advice boils down to “Jeez, is English hard or what?” We are paying you the big bucks to be competent at written English. The advice sounds like an apologia to a remedial reading class.

Punctuation

Transcription of [certain] speech constructions sometimes requires use of punctuation that is unique to the captioning process.

I know of very few examples (quotations extending beyond one caption, >> in live captioning, rendering of numbers, dashes in Line 21). But no examples are given.
CMP permits comic-book-style extended exclamation points. Actual example: aaaauuuggghhh!!! (Was that found in their online dictionary? And are they unaware that we tend to repeat letters at most twice, resulting in three, not four, letters?)
The concept of en and em dashes completely eludes these Windows users, as it so often does.
- Whether you use space–en dash–space or nospace–em dash–nospace is a matter of house style.
- Spaces with em dash do not work because of the linebreaks they trigger (at least in a system like captioning software that cannot break on either side of an em dash).
- The particular reading mechanism of captioning militates against nospace–em dash–nospace, which blurs into adjoining words that you have only a few seconds to properly read.
Nonetheless, under no circumstances save for discussions of such dashes themselves does one use hyphen-hyphen, as the CMP manual counsels and illustrates by example. (It says “double hyphens or a single long dash,” but “single long dash” is not defined, probably because they don’t know what it is and how it differs from other dashes. And only double hyphens are shown, without spaces on either side.)

Question for CMP: How many captioning jobs come through the door in which the double hyphens produce a linebreak between the hyphens? You may be further unaware that hyphens and nonbreaking hyphens are two different things, neither of which is an en or em dash.
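For shops stuck with scripts full of typed double hyphens, the cleanup is mechanical. This is a minimal Python sketch assuming a house style of space, en dash, space; the no-break space before the dash is my own choice to keep the dash from starting a line, and whether a given captioning package honours it is an assumption.

```python
import re

NBSP = "\u00a0"     # no-break space, keeps the dash attached to the preceding word
EN_DASH = "\u2013"

# Two or more hyphens with optional surrounding whitespace.
_DOUBLE_HYPHEN = re.compile(r"\s*-{2,}\s*")


def normalize_dashes(text: str) -> str:
    """Replace hyphen-hyphen with space, en dash, space (one possible house style)."""
    return _DOUBLE_HYPHEN.sub(f"{NBSP}{EN_DASH} ", text)
```

Single hyphens, as in compound words, are left alone because the pattern requires at least two in a row.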
The manual covers the issue of stuttering (“caption what it said” using hyphens), but CMP does not understand the problem well enough to differentiate between beginning- and middle-of-sentence examples. (Are the repeated letters capitalized at the beginning of a sentence or are they not?)
The rare example of fingerspelling is covered, which would only be captionable if an interpreter or the signer actually spoke the letters separately. The hugely more common example of spelling a word out loud is not covered. Nor are spelling bees, dictées, and quizzes, where spelling of words is the point.
The manual initially doesn’t tell us what to do with ellipses. Space afterward? (The example given has none. Later there’s a section forbidding the use of spaces on either side.)

It does very sensibly tell us not to “use ellipsis” – actually called suspension dots in this context – “to indicate that the sentence continues into the next caption.”
The Captioning Key actually permits neutral quotation marks! It also misapplies quotation marks to “titles of books, periodicals, plays, films, videos, short stories, and other titles of complete works,” all of which titles are italicized save for short stories. We are also incorrectly told to use quotation marks for “names of individual ships, trains, airplanes, and spacecrafts [sic].”
The following advice is incomprehensible:

Use quotation marks for onscreen readings from a poem, book, play, journal, or letter. However, use quotation marks and italics for offscreen readings or voiceovers. [...] Italics should be used to indicate: A voiceover reading of a poem, book, play, journal, letter, etc. (as this is also quoted material, quotation marks are also used).
The manual correctly instructs us on the use of quotation marks in passages quoted across captions (opening on all but the last, closing on none but the last).
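Taken literally, that rule is simple enough to automate. The sketch below is my own illustration in Python; the function name is invented, and giving a one-caption quotation both marks is my assumption, since the rule as summarized covers only the multi-caption case.

```python
OPEN_QUOTE = "\u201c"   # left double quotation mark
CLOSE_QUOTE = "\u201d"  # right double quotation mark


def quote_across_captions(captions: list[str]) -> list[str]:
    """Opening marks on all but the last caption, a closing mark on the last only."""
    if not captions:
        return []
    if len(captions) == 1:
        # Assumption: a quotation that fits in one caption gets both marks.
        return [f"{OPEN_QUOTE}{captions[0]}{CLOSE_QUOTE}"]
    quoted = [f"{OPEN_QUOTE}{c}" for c in captions[:-1]]
    quoted.append(f"{captions[-1]}{CLOSE_QUOTE}")
    return quoted
```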
The staffnote, ♪, is misnamed as “music icon.” It isn’t an icon, among other things. And we have the usual nonsense that we are to use two staffnotes at the end of the last line of a song. Why, exactly?

But we’re not done yet: “For background music, place a [staffnote] in the upper right corner of the screen.”
1. Why?
2. How does that indisputably indicate “background” music to a deaf person, let alone the type of such music?
3. What if someone is thinking at the same time? Captions have to be placed at top of screen in that case.
4. What if there are almost no moments in the production without background music? Does every period in which we would ordinarily place no captions onscreen have to be filled up with staffnotes at top right?

Speaker identification

In a rather bizarre practice I see replicated at CaptionMax and at a few mom-’n’-pops, speaker IDs are in all lower case “unless the character is being used as a proper name.” (It uses the infelicitous example of iguana/Iguana. Several examples from The Simpsons might be applicable here, or the several fictional characters named Horse or Cat.)

Caption breaks

Multi-caption utterances “should be broken at a logical point where speech normally pauses.” Speech does not normally pause; it is a continuous stream. (No, there are no pauses between words. Listen sometime.) But you can break a caption anywhere if any part of it might occupy 33 or more characters.

Multiple speakers

The Key completely blows it in giving advice for multiple speakers. This exceedingly important practice, one of the few that distinctly differentiates captions from subtitles, is glossed over in a single bullet point.

When people onscreen speak simultaneously, place the captions underneath the speakers. Do not use other speaker-identification techniques like hyphens.... If this is not possible due to length of caption or interference with onscreen graphics, caption each speaker at different time codes.
1. First of all, can you understand any of that as written?
2. If we have to caption one person’s speech, then clear the caption and present another person’s speech, how are we going to guarantee a reasonable reading time?
3. If we are nearly forbidden from using more than two lines of captioning, how are we to use “other speaker-identification techniques” like presenting two caption blocks at once?
4. The example given is absurd and shows the degree of care CMP exercises:
  1. “He has a cavity” comes before “Yes, we’ll need to fill that.” But in the illustration, they’re captioned backwards.
  2. A hyphen is not a dash.
  3. And the introduction to this illustration (“It’s confusing as to who is speaking”) is not necessarily true in the context of the video. This kind of absolute prohibition looks great to someone who doesn’t watch captioned video and works exclusively from Microsoft Word documents and fake screenshots, but it does not correspond to the demands of working captioners. You can, if necessary, use a form like this:
    
    — HYGIENIST: He has a cavity.
    — Yes, we’ll need to fill that.
Next we are forbidden to move captions to correspond to a speaker’s movements onscreen – “one placement... must be used. Confusion occurs when captions jump around the screen.” No, it doesn’t. Why not just require invariant bottom-centre positioning? If you can’t clearly see moving mouths or other proof of who’s talking, “confusion occurs” when a character moves but the caption doesn’t. What if the caption is now superimposed over somebody else? The speaker isn’t necessarily the only character moving.

This segment also fails to handle the case of a sequence of captions with onscreen and offscreen views of the same speaker. Do captions stay in the same relative place? Or do they attempt to magnetically follow the speaker around, moving to the edge of the screen closest to the character’s position? (NCI does it the former, or incorrect, way; WGBH tends to do it the latter, or correct, way. The manual slightly leans toward the magnetic method in its discussion of speaker IDs.)
“When a person is thinking, dreaming, or the like, list the description in brackets and place italicized captions above the head.” But the head is not the source of the voice. Helvetica Medium does not per se have an italic, and the instruction is not clear about whether or not the “description” (actually non-speech information) has to be italicized, too; in the accompanying photo, it is. (We are later told it must be italicized.)

“Consumer input” vs. research

Guidelines in this manual have evolved over the 47-year history of the CMP program [sic]. However, captioning research and technological developments continually dictate changes and improvements in the captioning process. The CMP staff, with a combined near-century of captioning experience, rely heavily on consumer input when incorporating these changes.

If we take the above at face value, then the critique you are now reading will be used to improve the next version of the manual. A half-century of experience means little if you’ve been making mistakes the whole time.

And “consumer input” is dangerous. I know for a fact that Captions, Inc. habitually typed a space before question marks and exclamation points (and inside inverted bangs and question marks in Spanish) because one person complained. This is not, however, how we actually write English or Spanish. If we relied on “consumer input,” we’d still be using all-upper-case captions and editing to 130 words a minute. Individual deaf viewers voicing complaints can be and are in the wrong sometimes.

“Consumer input” cannot be discounted. If somebody writes in to complain that a tape that was supposed to be captioned had no captions, you need to pay attention to that. But what should be emphasized in the Key is captioning research. The passage above almost, but not quite, promises to give “consumer input” a veto over research findings. This is no way to run a railroad, let alone a captioning operation. CMP needs to be clearer about what it is actually saying.

Miscellaneous

The document tries to give us a history lesson by stating: “In 1947, the first true ‘captioning’ occurred as captions were placed between film frames.” They’re called intertitles and the actual source is not cited. (What film, and who did it?)
A definition of real-time, or, as the manual calls them, on-line captions hyphenates that word (but not “onscreen”). It also gives a list of “live productions” that somehow manages to leave out news, which is what we invented real-time captioning for in the first place.
Pop-on captions are said to “appear onscreen all at once, stay there for a few seconds, and then are replaced by another caption.” Or a blank screen.
Paint-on captions are poorly defined: “Similar to roll-up captions [no, they aren’t], individual words are ‘painted-on’ from left to right, not ‘popped-on’ at once as an entire caption.” You really only see these in Line 21 captioning, but if you’re old enough, you’ll recall their use in the open captioning of The Scarlet Letter on PBS circa 1979. I admit that defining and explaining them is tricky, but here goes: Paint-on captions assemble characters one at a time from left to right and top to bottom, resulting in a caption block that stays in position like a pop-on caption until replaced. They are used in cases of tight timing (e.g., the first caption in a commercial with a lot of opening dialogue) and are believed to permit increased reading time.
Advice on numeral handling is incomplete. I’ll give only a few examples:
1. “Use numerals when referring to technical and athletic terms” provides the example 3 goals. Why use a numeral there? A more apt example is numerals used in iconic ways, as in “To speak to an operator, press 0” (not “zero,” which one cannot press).
2. There’s a lot of discussion of fractions, but no requirement to use real fractions (½ vs. 1/2). In fact, the document assumes fake fractions (“insert a space between a whole number and its fraction” could apply to 1 1/2 but never to 1½). Nor are we instructed on how to write measurements (three-eighths of an inch or 3/8″ or ⅜″?). Of course, that would lead to a discussion of real inch marks, which CMP could probably not differentiate from the neutral quotation marks it already authorizes.
3. “If a fraction is used with ‘million,’ ‘billion,’ ‘trillion,’ etc., spell out the fraction.” Oh? How about “three and three-quarters billion dollars”? Isn’t that written as $3.75 billion?
4. If a character looks at a clock and says it’s twenty to six, how do you render that? The section headlined “Time” discusses easy and obvious cases only.
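On the fractions point, swapping typed fractions for real ones is not hard. This is a minimal Python sketch, assuming the caption font actually contains the precomposed glyphs; the mapping covers only four fractions and the function name is mine.

```python
import re

# Only a few fractions that have a single precomposed Unicode character.
VULGAR = {"1/2": "\u00bd", "1/4": "\u00bc", "3/4": "\u00be", "3/8": "\u215c"}

# An optional whole number and space, then a fraction that is not part of a
# longer digit-and-slash run such as a date.
_FRACTION = re.compile(r"(?<![\d/])(?:(\d+) )?(\d+/\d+)(?![\d/])")


def real_fractions(text: str) -> str:
    """Swap typed fractions like 1 1/2 for real ones like 1½ where a glyph exists."""
    def swap(match: re.Match) -> str:
        whole, frac = match.group(1), match.group(2)
        if frac not in VULGAR:
            return match.group(0)  # leave anything unmapped untouched
        return f"{whole or ''}{VULGAR[frac]}"
    return _FRACTION.sub(swap, text)
```

Note that the space between whole number and fraction disappears in the output, which is exactly the case the Key’s “insert a space” rule cannot describe.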

Posted: 2006.06.06