This document provides Joe Clark’s comments on “Canadian English-Language Broadcasters’ Closed Captioning Standards and Protocol” (“the Manual”). The version I used carries the document ID CABDR5_MS04.doc and is dated September 09, 2002. Page numbers refer only to that version.
See also: Alphabetized version.
Every heading has an anchor and can be individually linked. A full table of contents that will make such links possible may be added this week after I alphabetize the contents.
I will continue to debug small copy errors over the week, and may reorder elements so they are alphabetized. All notable changes will be documented in this section. (An example of a change that would not be notable is correcting teh to the, or an HTML fix.)
These comments can be found, in various file formats, at joeclark.org/access/captioning/cab/.
Joe Clark • joeclark@joeclark.org • 416 461-6788
I was invited to provide these comments by Liz Chartrand, Sarah Crawford, Harvey Rogers, and Susan Wheeler, to whom I extend heartfelt thanks. By agreement with Susan and Harvey, my function is to identify mistakes. I have tried to emphasize research citations and rational argumentation rather than advancing unsubstantiated opinion.
But please understand the ramifications of my brief. I was charged with finding as many errors as possible in the Manual. Accordingly, these comments are all critical by definition. I am not, in fact, being “too negative”; “being negative” was the objective.
Note that, in technical documentation, critical readings like this are par for the course. In engineering documentation, for example, it’s possible for a single round of consultations to produce hundreds of action items for updating and correction. This document discusses some 70 topics – not atypical for a 50-page technical document.
As in technical documentation, the whole point is to flag every error so a decision can be made to either correct it or live with it. Readers are requested not to hold it against me that I was this thorough in doing what I agreed to do.
The exact goals of the Manual need to be more clearly stated. Don’t be afraid to take a stand!
If the Manual is to improve future captioning practice, contributors must accept that we have to outlaw what captioners are doing wrong now. We should not be a little bit pregnant. It’s nice to “strongly discourage” this habit or that, but certain practices must be banned outright. In other words, captioners could not claim adherence to the Manual if they engaged in such practices.
It should also be pointed out that, save for cases where the content itself allows a certain discretion, any style guide should not be treated as a shopping list. It’s all or nothing. We have as many varieties of captioning as there are captioners because captioners have picked and chosen from predecessor styles and also made things up as they went along.
In the future, Canadian offline programming should not, for example, carry captions from house X or shop Y. Programs should not carry certain kinds of captions; they should simply carry captions, all of which look and act the same. If we really believe that captioning is an integral part of a television program, we have no choice.
A special note on my frequent reference to the Canadian captioning monoculture: Having followed captioning for 20 years, I am not familiar with any country whose captioning industry is so dominated by women in their 20s with liberal-arts degrees. The “women” part is observably true and is mentioned for the sake of completeness but is not the problem, so please don’t call me sexist. The other parts of the description are the problem. Young captioners have not had enough time to read and write extensively and tend to lack wide general knowledge. Of course there are exceptions, and of course not everyone working in captioning in Canada is a young woman with a liberal-arts degree. But that is nonetheless the trend.
This lack of diversity in the captioning workforce – this monoculture – can explain why certain errors keep being made even across companies. Lacking the years under their belts and without a variety of educational and literary backgrounds, captioners simply do not know better.
I advocate better training for existing captioners, weeding out captioners who are not really suited to the job, and drastically expanding the diversity of applicants recruited into captioning. Our American friends have had good luck with a variety of approaches. I can suggest that the CAB start work on new ways to attract smart, seasoned, talented, and qualified people of all sorts to the field of captioning. (And audio description, for that matter.) As a question of human-resources development, this is the sort of thing the feds should also help out on, but we shouldn’t necessarily wait on that.
The Manual is not ready for distribution and implementation yet. I believe that will not come as a surprise to readers.
However, I can announce that I have a verbal partnership agreement to collaborate on the Open & Closed Project, a worldwide accessibility training and standardization program that I originated.
The goal of the Open & Closed project (expected to last three years) is summed up in its slogan: Uniting global knowledge of accessible media. Some of our objectives:
As part of the process, we will attempt to license every captioning style guide in existence. It may make more sense for the CAB and its members to fold the current project into Open & Closed.
Because Open & Closed is an independent body affiliated with a respected academic institution, a standardized English-language captioning style guide may be more widely implemented if it emanates from the project. A French-language manual could also be created so that it does not conflict with the English-language manual. Keep in mind that our entire project involves research, training, and standardization; we can and will work on topics like the CAB manual day in and day out, while Committee members all are busy putting out captioned programming or managing entire stations. Subcontracting the current Manual to the Open & Closed project may make sense administratively.
Quite apart from that possibility, we are eager to discuss funding arrangements to carry out our research, standardization, and training work in various fields, of which two, captioning and audio description, are of immediate regulatory interest to broadcasters.
[p. 14] ¶ The goal of the following guidelines is to provide Canadian off-line caption editors and on-line caption stenographers with options and tools to use with discretion when making their decisions, so that there is uniformity and consistency across the Canadian captioning industry. These guidelines are the result of carefully considered research, observation, and experience and should be applied by Canadian closed captioning providers to the greatest extent possible.
[p. 14] ¶ 1. Put content first. Verbatim transcription is always the goal. Communicating the meaning and intent of the program must always take precedence over stylistic or aesthetic considerations.
I would say “stylistic or æsthetic considerations” need to be explained. By this reasoning, we would eliminate pop-on captioning altogether because roll-up captioning can caption verbatim nearly all the time while pop-on often cannot.
2. Edit responsibly. Adhere to established guidelines for editing and bear in mind that the only justification for deviating from the verbatim is to ensure sufficient reading time.
3. Be consistent. Document your style decisions and technical methods and apply them consistently.
This principle openly authorizes the establishment of competing style manuals. (Where else would “style decisions” be “documented”?) The goal of this Manual is to eliminate that outcome. How do Canadian captioners “document” style decisions when they don’t have the training, education, or research citations to do so?
4. Keep descriptions simple. Caption scripts should not be cluttered with excessive or complicated descriptive captions.
I don’t think this is important enough to become one of the Four Commandments, as it were. In any event, actual audience research has clearly called for more notation on non-speech information.
A competing list: At a recent meeting of U.S. Department of Education–funded captioners and their consumer advisory panels (held 2002.09.14 in Washington), four competing goals of captioning were enumerated, and they are much more solid. (I had these dictated to me over the phone. When I get the actual transcript, I’ll update the page, with appropriate notations.)
This is a better point of departure than the Manual’s current list.
I submit that the Manual, and the entire Canadian captioning industry, must cease referring to captioners as caption editors. They are in fact caption writers (the Caption Center terminology). They write captions.
There’s already too much editing in Canadian captions. Let’s not provide implicit authorization through job titles.
“Captioner” is a word I use to mean person doing captions or firm doing captions, depending on context. It too is better than “caption editor.”
Note that caption “editors” may feel they are of higher prestige than caption “writers.” Having been both a writer and editor, I cannot support that position.
The Manual’s examples of edited captions [p. 29] are in fact textbook examples of how not to edit captions, but fairly represent the skill level in evidence at present.
“He bought a wrench and he bought a hammer and a screwdriver, and a drill” is most aptly edited down to something like:
He bought a wrench and a hammer
and a screwdriver and a drill.
(15 words down to 13.) The suggested eight-word edit –
He bought a wrench, hammer,
screwdriver and drill.
– is a sentence that would never be spontaneously uttered outside an ESL class. (At least add the word a before “drill.”) It is condescending and patronizing to subject a viewer to that kind of editing, though at least one captioner would demand that every such sentence be so edited.
This advice –
[p. 31] ¶ Divide captions into very short phrases and distribute them evenly throughout a scene to average out caption durations to the acceptable maximum of three seconds per line.
– turns a slow scene into a techno music video. Very slow speech comes up so seldom that it does not require this degree of reconstructive surgery. Could it be that slow speech is slow for a reason, and tends to have built-in pauses that can be accurately mirrored in captions?
Why the laboured discussion about one- and three-line captions? Use as many lines as are necessary to accommodate the flow of the dialogue and reading speed. Pulp Fiction requires a different approach than On the Road Again. I am not convinced that a set of short captions is really better than a single big one in all cases.
[p. 15] ¶ Therefore, all obvious speech must have captions. Research shows that captioning consumers watch for visual cues from the faces of television speakers to direct their eyes to the caption area of the screen. If there is no caption, it creates a false alarm and considerable frustration.[7][vii]
[p. 32] ¶ Improperly divided text stops the reader, so that he or she spends time re-reading instead of reading quickly and then scanning the picture.
No, there are two eye-gaze studies that confirm what should be obvious to you if you reflect on your own caption viewing: You spend most of your time watching the captions. (In scrollup captions, you spend nearly all your time doing so.)
See Carl J. Jensema et al., “Eye-movement patterns of captioned-television viewers,” American Annals of the Deaf, 2000. “[S]omeone accustomed to speechreading may spend more time looking at an actor’s lips, while someone with poor English skills may spend more time reading the captions.... [T]here is some preliminary evidence to suggest that higher captioning speed results in more time spent reading captions than on a video segment.”
For segments with no captions, eye movements tended to zip around the screen. But for segments with captions, eye gaze was concentrated overwhelmingly at the bottom of the screen. “The addition of captions apparently turns television viewing into a reading task, since the subjects spend most of their time looking at captions and much less time examining the picture.”
Rather interestingly, subjects were re-tested with the same videos a few days later. On the retest, subjects spent slightly more time looking at the picture than they had the first time. That was also true for the hearing subjects, who had initially had less experience watching captions (“only the deaf subjects watched it regularly”). “Viewers read the caption and then glance at the video action after they finish reading.”
In other words, add captions to a program and people spend most of their time reading them. The effect holds irrespective of speed: 80, 106, 122, 197, and 220 wpm were used, with no significant difference in results. Experienced captioning viewers – even if the experience comes from the first phase of this eye-tracking experiment and nothing else – spend less time focusing on captions.
This isn’t the only evidence. See Jensema et al., “Time spent viewing captions on television programs,” American Annals of the Deaf, 2000. “It was found that subjects gazed at the captions 84% of the time, at the video picture 14% of the time, and off the video 2% of the time. Age, sex, and educational level appeared to have little influence on time spent viewing captions. When caption speed increased from the slowest speed (100... wpm) to the fastest speed (180 wpm), mean percentage of time spent gazing at captions increased only from 82% to 86%.” Four silent custom videoclips, captioned at 100, 120, 140, 160, and 180 wpm, were used in the study. All 25 subjects were deaf. On average, subjects spent 84% of their time looking at captions. The range was 82–86%. Variation in caption speed has no serious impact on the time spent watching the rest of the screen.
It just is not true that we watch the screen and then occasionally look down at captions. Now, that is certainly the case when a long stream of uncaptioned video is interrupted by a new caption, but that is an unusual event.
From the Manual:
A viewer watching action will need a change in caption shape or positioning to detect that a new caption has occurred. The pattern seems to be that a viewer first detects a change of caption, then reads it, then scans the picture until there is another caption change.[8][viii] Therefore, if two sequential captions have the same shape and placement, a change of captions may not be detected. It is therefore important to vary the placement of sequential captions when their shapes are identical.
How does one vary the placement of sequential captions? There are right and wrong ways. What are they?
In any event, what is stated above is not, in fact, “the pattern” of eye-gaze in caption viewing, according to Jensema’s research.
It is not sufficient to argue that a change in caption shape or positioning will cue the viewer that “a new caption has occurred.” For the better part of ten years, several but not all Canadian captioners have engaged in protracted negligence in creating captions with a nonstandard blink rate, the number of blank frames between captions. (Some Canadian caption writers have even claimed not to understand the concept of blank frames between captions.)
The best number, used for 20 years by every professional captioner in North America until VoiceWriter spread itself into Canadian captioning, is two frames. Vitac has used a blink rate of one frame for a couple of years, initially for talkier programs like The Practice, and it’s OK in general; not coincidentally, Vitac was also the first captioner to commit to verbatim captioning whenever possible. A blink rate of zero (widely used in Canadian captioning) makes it too difficult to tell when captions have changed.
Higher blink rates make caption changes too conspicuous. A blink rate of four frames – the default in VoiceWriter and widely seen elsewhere – makes captions blink on and off like a turn signal on a dashboard. They are nonstandard captions. But worse yet, too-high blink rates force captioners to edit.
Each frame of video can carry up to two visible caption characters. (The number is not exactly two in all cases due to control codes and similar issues.) If a captioner uses a blink rate of four versus two, after three captions have appeared and disappeared, six additional frames have been lost in blink rate. That’s 12 characters – two five-letter words plus their trailing spaces. After every three captions, captioners have no choice but to remove two words from subsequent captions just to maintain presentation-speed parity with captions everyone else has used for 20 years.
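The arithmetic above can be checked with a short sketch. It assumes, as stated, a nominal two caption characters per frame and treats a word as five letters plus a trailing space; the function and constant names are mine, for illustration only.

```python
# Assumption from the text: each video frame carries up to two visible
# caption characters (control codes can reduce this in practice).
CHARS_PER_FRAME = 2
# Assumption from the text: a typical word is five letters plus a space.
CHARS_PER_WORD = 6


def characters_lost(blink_rate, baseline_blink_rate, caption_count):
    """Caption characters lost to extra blank frames, relative to a baseline."""
    extra_frames = (blink_rate - baseline_blink_rate) * caption_count
    return extra_frames * CHARS_PER_FRAME


lost = characters_lost(blink_rate=4, baseline_blink_rate=2, caption_count=3)
words_lost = lost // CHARS_PER_WORD
print(lost, words_lost)  # 12 characters, 2 words
```

The loss scales linearly with the number of captions, so over a long program the extra blank frames add up to a substantial amount of dialogue.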
Also, be aware that captions remaining on the screen too long are likely to be re-read by the viewer, causing confusion.[9][ix]
It’s pretty uncommon for a caption to stay visible for too long. I generally see that after an encoding error where the screen-blank command (a single pulse) is missing.
It’s good practice to sow screen-blank commands through extended periods of no captions (even five seconds is long enough to do so). Among other things, people channel-surf, causing their caption decoders to pick up caption characters that are then displayed with the next full caption. Sending out clearing pulses from time to time reduces or eliminates that problem.
Reading speed – that is, speechrate (write it as two words if you wish) or presentation rate – is a much less contentious issue than Canadian captioners seem to believe. (How contentious is it? I got yelled at for 80 minutes in a meeting on this very topic. Why? It was claimed that big-D deaf people cannot read faster than 150 words a minute, hence every program must be edited to that speed, even when using roll-up captions. Note well: My use of the term “yelled at” is not hyperbole.)
There’s solid evidence for the following:
See Jensema, “Viewer reaction to different television captioning speeds,” American Annals of the Deaf, 1998. “Participants used a five-point scale to assess each segment’s caption speed. The ‘OK’ speed, defined as the rate at which ‘caption speed is comfortable to me,’ was found to be about 145 words per minute (wpm), very close to the 141 wpm mean rate actually found in television programs.... Participants adapted well to increasing caption speeds. Most apparently had little trouble with the captions until the rate was at least 170 wpm. Hearing people wanted slightly slower captions. However, this apparently related to how often they watched captioned television. Frequent viewers were comfortable with faster captions.”
See Jensema et al., “Closed-captioned television presentation speed and vocabulary,” American Annals of the Deaf, 1996: “The average caption speed for all programs [surveyed] was 141 words per minute, with program extremes of 74 and 231 words per minute.... The percentage of script edited out ranged from 0% (in instances of verbatim captioning) to 19%.”
When it comes to editing captions as extensively as was done on The Captioned ABC News in the 1970s, “almost everyone now considers [that] overediting.… Deaf viewers wrote letters to caption companies indicating they wanted access to whatever was spoken on the audio and that captioners should not play the role of censors. According to conversations with captioning company officials, caption companies have tended to interpret this as meaning deaf people want straight verbatim captioning.”
The study documents the reality that caption speeds in excess of 150 wpm are common and that deaf viewers not only never asked captioners to alter the dialogue allegedly to suit them, they asked for the exact opposite.
See Captioned Films/Videos Program, Captioning Key: Preferred Styles and Standards, 1995: “CFV adult special-interest videos require a maximum presentation rate of 150–160 wpm. Special-interest videos include adult education, self help, and how-to-do materials.... CFV theatrical (movie) videos are captioned at near-verbatim rate. However, in general, no caption should exceed 225 wpm.”
See the Television Broadcasting Services (Digital Conversion) Act 1998 Draft Captioning Standards (Australia), 1999: “A caption should stay as close as possible to the original wording while allowing the viewer enough time to absorb the caption’s contents and still watch the action of the program. To meet the second criterion, programs for adult viewers are captioned at a reading rate of 180 words per minute.... Programs should be captioned at an appropriate reading speed for the intended audience. For adults this is 180 words per minute.”
Foreign-language subtitling is widely understood to use slower presentation rates than captioning – in many cases, maddeningly slower rates for hearing people used to captioning. But even the one and only book on subtitling in print advocates speechrates near 170 wpm. See Jan Ivarsson and Mary Carroll, Subtitling, 1998: “[Film distributors] have established a norm, which may be expressed as follows: 2 lines = 80 characters = 8 feet of film = 128 frames = 5½ seconds. This results in a reading speed of around 175 words per minute.”
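As a rough check on that figure, the conversion from characters and seconds to words per minute can be sketched as follows. The quoted passage does not say how long a “word” is; the five-characters-per-word default here is my assumption, chosen because it reproduces the quoted result, and the function name is illustrative.

```python
def words_per_minute(characters, seconds, chars_per_word=5.0):
    """Convert a character count shown for a given duration into wpm.

    chars_per_word is an assumed average; it is not given in the source.
    """
    words = characters / chars_per_word
    return words * 60.0 / seconds


# Subtitling norm quoted above: 80 characters in 5.5 seconds.
print(round(words_per_minute(80, 5.5)))  # 175

# The same norm with six characters per word (five letters plus a space).
print(round(words_per_minute(80, 5.5, chars_per_word=6.0)))  # 145
```

The result is sensitive to the assumed word length, which is worth keeping in mind when comparing wpm figures from different sources.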
The Manual’s advice to stay under 200 wpm is fine (though much of its advice is contradictory and misunderstands how captions work). But more explanation is necessary, because at least one Canadian captioner is willing to scream at you for an hour about the necessity of editing captions to a childish reading speed.
I would suggest a rewrite of this section. In particular, advice like “Leave two-line captions onscreen for three seconds” is too rigid and too vague all at once. Is that one line of dialogue plus a single-line speaker ID? Is that two 16-character lines? Is that two lines of edge-to-edge speech with italics and quotation marks and scientific vocabulary?
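To see why a fixed three-second rule is too vague, consider the presentation rate it implies for different two-line captions. The sketch below counts whitespace-separated words, which is a simplification; the function name is mine and the second caption is a made-up example.

```python
def implied_wpm(caption_lines, seconds_on_screen):
    """Presentation rate implied by showing a caption for a fixed duration."""
    word_count = sum(len(line.split()) for line in caption_lines)
    return word_count * 60.0 / seconds_on_screen


full_caption = ["He bought a wrench and a hammer",
                "and a screwdriver and a drill."]
short_caption = ["Yes.", "I know."]  # hypothetical short two-line caption

print(implied_wpm(full_caption, 3))   # 260.0
print(implied_wpm(short_caption, 3))  # 60.0
```

Both are “two-line captions,” yet at three seconds one runs well past 200 wpm and the other crawls.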
Further:
[p. 29] ¶ Off-line caption editors are reminded that they are likely to have a faster caption-reading speed than the average viewer. Therefore, they should not rely on their own reading skills to judge the pace of words.
In fact, Jensema’s evidence shows that experience reading captions makes it easier to read faster captions. Captioners merely have a lot of experience reading captions. (Actually, that is untrue. Not many captioners in Canada watch captioned TV at home. Squinting at a little MPEG window in Swift as a horribly distracting cavalcade of 30-point caption text streams by alongside does not constitute “reading captions” in my book.) But caption viewers have a lot of experience reading captions. The advice above is exactly wrong: Captioners’ own reading is a valid basis of comparison.
Moreover, no one seems to have acknowledged that most households now have VCRs. If some deaf subgroup has a hard time reading captions, well, there’s probably no speed they could reasonably keep up with, and maybe they should just start taping programs and rewatching them rather than expecting all captions to be edited to their level.
For this reason, if you’re captioning a program that you know will be released on home video first, you might be able to get away with faster presentation speeds if it is genuinely necessary to represent the program. The viewer can always rewind. If you’re given a TV series to reformat for DVD (heaven help us when a Canadian captioner is given that many tools to misuse), recaption nearer verbatim.
Another overlooked issue in evaluating all aspects of caption readability is the visual acuity of the audience.
Sondra Thorn, a deaf optometrist who used to work in captioning (I corresponded with her in the 1980s!), has written two papers on caption-reading abilities with impaired vision. In addition, the Royal National Institute for the Blind has surveyed the television viewing habits of visually-impaired people (though I don’t have a research reference).
Conclusions? People sit too far from their TVs to read captions; their vision is often not corrected properly (though some vision deficiencies make it hard to read text on a screen anyway); and, most importantly, even tiny amounts of blur make it impossible to read fast captions.
Thus, captioning viewers should be encouraged to sit closer and get proper glasses before insisting that captioners do things their way. Captioners, and contributors to this Manual, are advised to take the requests of individual viewers, irrespective of hearing status, with a grain of salt. In some other cases, I have heard of captioners rewriting their style books in response to a single complaint, and in a couple of those cases the rewrite violated standard English orthographic rules.
Reference: Thorn, F., & Thorn, S., “Television captions for hearing-impaired people: a study of key factors that affect reading performance,” Human Factors, 1996.
This section makes no sense at all:
[p. 15] ¶ Non-verbal utterances, repeated words and false starts to sentences are not generally included in real conversation. However, if they contribute to the understanding of a dialogue, dramatic effect, joke, or personality, they must be included.
[p. 15] ¶ Captions should not cover graphics or keys, characters’ eyes or lips, or areas of sports action.
Captions always have to cover something. Very large graphics will end up being covered. In certain unusual cases, captions will cover keys (a term not defined to this point); more commonly, captions cover keys because captioners don’t bother to move the captions.
The prohibition of covering characters’ lips is overrated. The common example of one character sitting at a desk and another standing behind that character forces captions to alternate between top of screen for sitting person and bottom of screen (at opposite side) for standing person – hardly worth the trouble, considering how small the sitting person’s lips are.
In closeups, where this rule makes at least some sense, your options may be to cover lips or cover eyes. (Or nose, I suppose.) Which do you choose?
I’m not sure we can afford to be all churchy about covering certain parts of the screen. Covering the screen is what captioners do. There are times when we have no choice but to cover things; it’s not always unrecoverable (we can move the caption a moment later, for example, in the case of a full-screen photo, map, or illustration) or even always important.
The Manual fundamentally misunderstands the role of justification in captioning. Justification refers to the placement of margins, and there are really only two kinds, flush left (left-justified, quad left) and flush right (right-justified, quad right). Centred captions – that is, captions without true centring but no flush margins, either – are not said to be justified. That may come as a surprise if, in your word processor, “centred” is an option in the same menu as flush left and flush right. But word processors cannot be relied upon to get the terminology right.
Due to a serious design error in the original Line 21 specification, the left edge of any caption line could be located only at any of eight tab stops spread across the 32-character line at four-character increments.
         1         2         3
12345678901234567890123456789012
T   T   T   T   T   T   T   T
The left side of the screen was actually a tab stop.
Overnight, we went from open captioning, which can and did use right justification, to a system of full left justification, simulated and crude centring using four-character increments, and no right justification at all save for the rare cases in which consecutive lines were exactly the same length.
What we were stuck with, then, are left-justified or centred captions displaced to the right, which do a miserable job of indicating rightmost or rightward position.
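The constraint is easy to see in a sketch. Assuming 1-based column numbers on a 32-character line with a tab stop every four characters starting at the left edge, the code below lists the stops and checks whether a line of a given length could be set flush against the right edge. The names are illustrative only.

```python
LINE_WIDTH = 32
# Tab stops at four-character increments, starting at the left edge.
TAB_STOPS = list(range(1, LINE_WIDTH + 1, 4))


def flush_right_start(length):
    """Column where a line must start for its last character to sit in column 32."""
    return LINE_WIDTH - length + 1


def can_flush_right(length):
    """True only if the required start column happens to be a tab stop."""
    return flush_right_start(length) in TAB_STOPS


print(TAB_STOPS)            # [1, 5, 9, 13, 17, 21, 25, 29]
print(can_flush_right(12))  # True: the line would start at column 21
print(can_flush_right(22))  # False: column 11 is not a tab stop
```

Under these assumptions, only line lengths that are a multiple of four can end exactly at column 32.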
The TeleCaption II decoder introduced what is known as transparent-space positioning, in which transparent space characters can be added to lines to pad them out and position their left edges at any character position. Because of (overstated) concerns about compatibility with the original TeleCaption decoder, transparent-space positioning was not really used until the early 1990s.
The EIA 608–compliant decoders built into television sets now all understand transparent-space positioning, and there are tens of millions more of those in use today than the total number of original TeleCaption decoders ever sold. Those decoders are too old to support anyway – and should not be supported.
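As an illustration of the idea, transparent-space positioning amounts to padding each line on the left until its right edge lands where you want it. The sketch below uses an ordinary space as a stand-in for the transparent space character and a hypothetical function name; it is not a representation of the actual Line 21 encoding.

```python
LINE_WIDTH = 32


def flush_right(lines, right_column=LINE_WIDTH):
    """Left-pad each line so every line ends at right_column.

    An ordinary space stands in for the decoder's transparent space.
    """
    for line in lines:
        if len(line) > right_column:
            raise ValueError(f"line too long for column {right_column}: {line!r}")
    return [line.rjust(right_column) for line in lines]


for row in flush_right(["He bought a wrench, hammer,",
                        "screwdriver and drill."]):
    print(repr(row))
```

Because the padding can be any number of characters, the left edge is no longer tied to the four-character tab stops, and lines of any length can share a right margin.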
The only bases on which to claim that right justification should never be used are:
I am not convinced that centring captions under a character, even if the block is slightly to the left or right of the screen centreline, is such a hot idea, though it is implicitly countenanced in this section. I am certainly convinced that using flush-right captions to indicate a speaker at the extreme right of the screen is a good idea. What else are we going to use?
Hence –
[p. 35] ¶ Do not use right justification in captions.
– is quite wrong.
This section needs to be expanded.
[p. 15] ¶ Roll-up captions may revert to a 16-character line with left justification, and may be positioned at the far left of the screen or at the centre of the screen.
[p. 15] ¶ If spoken words or lyrics are different from a textual graphic, for example, when there is talking over end credits, full captions must be included and moved so as to interfere as little as possible with the essential visual elements, using one of the above techniques.
Also:
[p. 16] ¶ Do not start or end a caption in mid-sentence assuming that a textual graphic will be read in the correct sequence to complete the captioned sentence. For example, the complete phrase, “Tonight’s program is brought to you by Sunnybrand Detergent” should appear in captions, even if the Sunnybrand Detergent logo appears as a full page graphic while the captions are displayed.
The topic of handling subtitled passages or programs is not addressed in the Manual and needs to be. I have never seen a Canadian captioner handle it correctly.
[p. 16] ¶ However, the reading rate for preschool children should be considerably slower than for adults: four to six seconds for each line of text. It is appropriate to edit text for preschool children using very short, concise sentences that do not fill a line. One-line captions are best.
These assertions are at best severely debatable and contradicted by practice. The only credible research basis for a decision of verbatim vs. edited captions is presently underway and will not be published for many months.
Children’s programming is found in different forms. Teletubbies is famous for having been created specifically for kids too young even for Sesame Street. An enormous range of children’s programming is not educational in nature and uses adult-level dialogue. (You may find the storylines clichéd and insufferable, but the characters talk like adults in a syntactic or pragmatic sense.)
Remembering that captioners have a responsibility to the program and to viewers, I don’t see how it is defensible to aggressively edit all children’s programming. In any case, that isn’t what is presently done. (I wouldn’t use Canadian-captioned programming on Canadian children’s channels as examples given the poor expertise at work. Look at U.S.-captioned shows.) Even Teletubbies is captioned verbatim!
The only real-world experience we have with heavily-edited children’s captions revolves around Sesame Street, which, to my knowledge, only NCI has captioned through its entire run. From day one, the show has been heavily edited, and uses a different approach to non-speech information (outright use of child-level onomatopoeia, like moo, cluck, meow). In recent years, captions are also closer to characters’ heads than ever before. But I am aware of no documentation whatsoever of this decade-long captioning practice. (It is typical of NCI not to document anything despite its nominal status as a captioning “institute.”)
The other example is Arthur, with near-verbatim grownup-style captions on CC1 and heavily-edited kids’ captions on CC2 (with a different approach to NSI). It’s an ongoing WGBH research project that will give us our first reliable evidence on this question. One outcome that WGBH confirmed is possible is as follows: Very young deaf kids cannot read and pretty much cannot follow whatever captions we give them. If true, then we have even stronger reason than ever to caption adult-style because adults are the only viewers who can read the captions.
One must also keep in mind that captioners are not trained grammarians. Canadian captioners are not really all that good at editing adult-level dialogue, let alone altering syntax to be more understandable by kids. I have a couple of very old papers providing guidelines on caption editing for kids, but to describe such papers as advocating “very short, concise sentences that do not fill a line” would be an absurd exaggeration.
It is also possible that roll-up captions might be easier for kids of certain ages to follow than pop-on captions, but that is supposition.
Absolutely the most significant error in the Manual concerns case of captions. Real-time captions are very difficult to create in upper and lower case, though Waite & Associates manages it and it’s the only way French and Spanish real-time captions are done. A case can be made that all-upper-case real-time captions are a necessary evil.
However, all offline captions must use correct upper and lower case just like the rest of the English language. I cannot overstate the importance of this requirement. For 20 years, we’ve enjoyed the full, glorious English language in print yet HOURS AND HOURS A DAY OF SCREAMING, SEMILITERATE ALL-CAPITALS CAPTIONING ON TELEVISION. It’s time to grow up and start writing captions that actually follow English orthography.
Let’s recap the facts:
If mixed case is really “worse” in captioning, then why has it been so heavily used?
The Manual itself countenances the use of mixed case for long passages and for NSI and speaker IDs. That amounts to yet another example of collapsing complex auditory phenomena onto relatively small and unexplained orthographic alterations.
But in the former case, what evidence can you provide that deaf and hard-of-hearing viewers immediately and transparently understand that a long passage of sudden mixed-case captions signifies, say, a literary reading? And in what other realm of English orthography would we write everything but literary readings in capitals?
Answer: We wouldn’t.
Now, please try to explain how the use of all-capitals setting for closed captions is still supportable. Save for real-time captioning, there is no evidence whatsoever supporting its use and an avalanche of evidence against it. It is an open-and-shut case, no matter how much it upsets Canadian captioners’ apple carts.
Upper-case captions can and should (indeed must) be used as a differentiator in speaker IDs. Compare:
The Caption Center was eventually persuaded to adopt the latter style, and now it’s working fine. All-mixed-case captions in which even speaker IDs use mixed case do not provide enough differentiation. However, mixed case can and should (indeed must) be used for non-speech information.
I’ve written two articles on this subject by now, one of them current.
By the way, I’m not quoting the entire section on this topic [p. 17] because it is horrendously wrong from start to finish.
Also, converting to mixed case obviates the entire discussion of acronyms [p. 20]. Just write them in capitals (no, not italic capitals), unless the context suggests the acronym would be misread as a word. That case does not come up often – I.R.A. is the only example I can think of. (Well, possibly A.D. for “audio description.”)
E-mail addresses do not have to be in mixed case. Domain names are case-insensitive by RFC specification. (Try it sometime: Mix case all you want in domain names in a browser and in E-mail. Any permutation will work.) UserIDs can theoretically be case-sensitive but never actually are, except on ancient X.400 mail systems, of which there are almost none in existence. But there’s no reason not to match the onscreen orthography in those cases.
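As a rough illustration of the point about case, here is a minimal Python sketch that compares two addresses while ignoring case in the domain only. The function names and the example.org addresses are invented for illustration and do not describe how any particular mail system is implemented.

```python
def same_mailbox(a: str, b: str) -> bool:
    """Strict comparison: the domain is case-insensitive, the userID is not."""
    local_a, _, domain_a = a.rpartition("@")
    local_b, _, domain_b = b.rpartition("@")
    return local_a == local_b and domain_a.lower() == domain_b.lower()


def same_mailbox_lenient(a: str, b: str) -> bool:
    """Lenient comparison: ignore case everywhere, as most real systems behave."""
    return a.lower() == b.lower()


# Hypothetical addresses, as they might appear onscreen and in a caption.
onscreen = "Viewer.Mail@Example.ORG"
caption = "Viewer.Mail@example.org"
print(same_mailbox(onscreen, caption))          # domain case differs only
print(same_mailbox_lenient(onscreen, caption))
```

Either way, matching the onscreen orthography remains the simplest choice, since it cannot be wrong under the strict or the lenient reading.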
[p. 17] ¶ To match on-screen case for proper names and titles, and use the spelling preferences of performers such as k.d. lang (not K.D. Lang) whenever possible.
Let’s not get ridiculous. Should marchFIRST, KISS, thirtysomething, sex, lies & videotape, and other orthographic contortions be indulged? Hardly. A weak case can be made that captions must match onscreen type; a fair case can be made that a TV commercial must follow the corporate orthography because it is an example of flat-out corporate salesmanship.
A stronger general case can be made that examples like those listed above do not have the same orthographic etymology as intercapped names like WordPerfect or even MacDougall and are merely affectations promoted by questionably-literate marketers. It is not our job to abet such marketers.
Also note that sentences that end in URLs or E-mail addresses take standard end punctuation.
[p. 17] ¶ Captions usually appear as white text on a black background because this is the best combination for visibility. Colour captions have been successfully used as a special effect in music videos (not music segments within a program), and as an effect with certain voices in dramatic stories, but they have generally tested poorly both as an indication of speaker identification and as an indication of emphasis.[10][x] Colour captions can never be used as the sole indicator of who is speaking. Proper placement and speaker identification are always required because colour can be difficult to discern against the video background. Use of colour captions is discouraged until such time as research is conducted to develop proper guidelines for their use.
Recall that the Manual must be written to outlaw current mistaken, unjustifiable, or egregious practices of Canadian captioners.
Chief among these is promiscuous misuse of italics, including use for some applications the Manual authorizes, like product names.
Use of italics must follow English orthography. Style manuals (you cannot get any better than Chicago) offer an exhaustive list of permitted, required, and prohibited uses.
This section mentions foreign phrases. Use italics unless the phrase is very well integrated into English, like lederhosen, jihad, dépanneur, or rendez-vous. I think the example given (deja vu) is sufficiently anglicized for roman setting (note the absent accents). Consult a recent Canadian dictionary and follow its advice.
There is no adequate coverage of use of italics for offscreen speakers (taken to absurd extremes by Captions, Inc.), thinking, inner voices, and narration.
[p. 5] ¶ The Manual asserts:
Television is recognized as the most popular source of information and entertainment in the world. By making television programs accessible with closed captions, Canadian broadcasters facilitate the involvement of Deaf, deaf, deafened, and hard of hearing people in popular culture. Closed captioned television programming provides these groups of people with accessible, screen-based, cultural, historical, and educational communications. Caption providers, therefore, bear an important responsibility to caption viewers. This is the most compelling factor in the creation of standards for closed captioning.
[p. 5] ¶ Canada’s broadcasters are committed to making television accessible to everyone and are therefore committed to treating closed captioning with the same responsibility and sensitivity to their audience as they treat the aural and visual elements of television. To achieve this objective, they have been instrumental in advancing technology so that captioning is of the highest possible quality.... Every year, the broadcasting industry invests significant resources in high quality program captioning to meet the needs of Deaf, deaf, deafened, and hard of hearing people.
We are attempting to provide baseline training for captioners. This section is demonstrably untrue, distracting, and irrelevant to the Manual’s mission.
[p. 6] ¶ It is necessary to explain that some deaf people (that is, the signing deaf) contend they constitute an identifiable culture, and that they believe “deaf” should be capitalized when referring to that group and no other.
Big-D deaf people exist. But the Manual is not a sociological treatise. Specifically related to captioning, if certain practices must be engaged or avoided because of demonstrable rather than asserted characteristics of, say, prelingually-deaf viewers who use sign language as a preferred method of communication outside the written medium, that’s fine. Similarly, if other practices are indicated or contraindicated for, say, hard-of-hearing or hearing viewers or any other group, that’s fine, too.
I merely object to the Manual’s complete acquiescence to the philosophy that big-D deaf people are in some way more important than any other group. If they weren’t deemed more important, why would they be able to reuse a word and capitalize it?
Referring to the typical viewer base of captioning as “deaf and hard-of-hearing viewers” is sufficient. The overlong and hypercorrect phrase “Deaf, deaf, deafened, and hard-of-hearing viewers” is superfluous and pandering (and uses three of the same syllable all in a row). When specific issues require explanation based on the linguistic or other characteristics of a subpopulation, by all means use relevant terminology and go into all necessary detail.
[p. 7] ¶ Through captions, viewers can read authentic language and see how it is used in meaningful situations.
Given how much and how badly Canadian offline captions are edited (one broadcaster edits everything without exception down to 150 words per minute, with severely limited skill and finesse), it is untrue to claim that Canadian captioning viewers “can read authentic language.” Edited Canadian captions read like nothing a human being would ever spontaneously utter (and like nothing a screenwriter would ever spontaneously write). The author of these remarks has watched captions for 20 years, has published nearly 400 articles and a book, is an experienced editor, writes clean copy, and has experience working with seasoned proofreaders with decades of experience, so please accept my credentials in leveling this criticism.
Nonetheless, please keep in mind that all I need to do is transcribe the actual audio of 15 minutes of one or two Canadian shows and reproduce the corresponding caption text to put the lie to the contention that Canadian offline captions always or even usually represent “authentic language.”
Also, the phrase “and see how it is used in meaningful situations” is better recast as “and understand its usage in TV programming.” I would not go so far as to say that every situation is meaningful, nor that TV programming represents actual English as used in conversation. There are too many levels or registers of English used on television for that to be true.
[p. 7] ¶ Now that closed caption decoders are built in to most television sets, the audience for captioning has grown to include those who can hear, but who choose to watch a program in silence. They may choose to read closed captions when others are sleeping, on the phone or studying, or they may prefer reading captions to listening.
This paragraph seems to knowingly render invisible those hearing captioning viewers who watch captioned TV with the sound on. I am not the only one, and it is unwelcome to be rendered officially nonexistent in a captioning manual.
In fact, irrespective of whether or not they keep the volume turned on, hearing viewers are probably the majority audience now.
[p. 8] ¶ They break up the transcript into succinct phrases, which will either roll up on or pop on the video screen.
I wouldn’t say succinct (“Characterized by clear, precise expression in few words; concise and terse”). This too implies an editorial function on the part of the captioner, whose presumptive job is to hack, delete, skim, lighten, and alter the source rather than represent it.
Captions can be quite lengthy. I have read innumerable three-line captions that are the right length, and none of them are particularly concise (since a three-line caption can contain up to 96 characters, or about 15 five-letter words once the spaces between them are counted). Captions need not be “succinct.”
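A quick check of that capacity, as a small Python sketch. It assumes a 32-character caption line, a single space between words, and no word split across lines; the names are illustrative only.

```python
LINE_WIDTH = 32  # characters per caption line
MAX_LINES = 3    # the three-line caption under discussion


def words_per_line(word_len: int, width: int = LINE_WIDTH) -> int:
    """How many words of a given length fit on one line.

    n words need n * word_len characters plus (n - 1) spaces,
    so n is the largest integer with n * (word_len + 1) - 1 <= width.
    """
    return (width + 1) // (word_len + 1)


total_chars = LINE_WIDTH * MAX_LINES
total_words = words_per_line(5) * MAX_LINES
print(total_chars, total_words)
```

Under these assumptions a full three-line caption carries far more text than anything one would call succinct.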
Roll-up captions are not divided at all, really. (One can imagine a few exceptions, like the aberrant individually-placed scrollup captions that kids are adding to programs these days. I don’t consider sentence-ending carriage returns “divisions.”)
Try: “In the case of pop-on captions, they break up the transcript into individual segments [or chunks]. In the case of roll-up captions, they transcribe the audio in full and add linebreaks and other typographic divisions.”
[p. 9] ¶ Preparation: Off-line caption editors must take the time to carefully research all names and unfamiliar words or phrases that occur.
Caption editors must strive for accuracy and are reminded that the ideal way is to have a second person screen for errors, and a third person do a final pass with the audio off.
[p. 12] ¶ Because captions usually cover the bottom three lines of the television screen, efforts should be made by broadcasters and producers not to put essential visual information in this area. Whenever possible, graphic information should be placed well above the safe title zone, so that there is room for both captions and graphics to display.
This isn’t accurate and isn’t very useful advice.
[p. 12] ¶ Because caption data pulses occur before captions appear, the first caption of each program segment must occur at least 15 frames (half a second) into the program or its time code cue may be lost in the transition from commercial to program. Also, caption data must be blanked at least 15 frames before the end of each program segment so that irrelevant captions do not bleed into commercial breaks or other programming.
This isn’t very clear, and is a bit too liberal.
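To make the quoted rule concrete, here is a minimal Python sketch of a timing check for one program segment. It assumes captions are given as frame-number pairs; the function name and data layout are hypothetical, and the guard is a parameter so that a stricter margin than the Manual's 15 frames can be substituted.

```python
GUARD_FRAMES = 15  # the Manual's minimum; raise it for a stricter rule


def timing_problems(segment_start, segment_end, captions, guard=GUARD_FRAMES):
    """Return human-readable problems for one program segment.

    `captions` is an ordered list of (in_frame, out_frame) pairs.
    """
    problems = []
    if not captions:
        return problems
    first_in = captions[0][0]
    last_out = captions[-1][1]
    # The first caption must not start inside the guard band after the cut.
    if first_in - segment_start < guard:
        problems.append("first caption starts too close to the segment start")
    # The last caption must be blanked before the guard band ahead of the cut.
    if segment_end - last_out < guard:
        problems.append("last caption is blanked too close to the segment end")
    return problems


print(timing_problems(0, 1000, [(10, 60), (900, 990)]))
```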
Ellipses must not be used between captions if no pause is present. In other words, don’t add an ellipsis to the end of any caption that does not naturally terminate in punctuation, as Captions, Inc. does.
Do not soft-hyphenate words. Do not attempt to add hyphens to words. The only permissible hyphens are hard hyphens, that is, hyphens that must be used or the word would be misspelled.
Why? This too may come as a sur-
prise to captioners, some of whom think a cap-
tion with nine words is ten full per-
centage points better than a cap-
tion with ten words, but it’s been es-
tablished for decades that adult read-
ers do not read word-by-word. Instead,
the eye bounces in sac-
cades across several words at a time, landing in fixa-
tions that vary according to famili-
arity with the text, reading condi-
tions, type size and linespac-
ing, and other factors. General-
ly, only unfamiliar or hard-to-read words are deci-
phered letter-by-letter, the way we
learned to read in grade school.
With a 32-character line and monospaced fonts (in capitals most of the time, no less), soft-hyphenated words impede saccadic motion of the eyes. (How easy was it to read the preceding paragraph?) Hyphenation in print typography takes years to get right and still is a matter subject to some dispute. Don’t do it at all in captioning, save for the surpassingly rare words that are too long to fit in 32 characters, like supercalifragilisticexpialidocious.
Here are some actual hyphenation atrocities perpetrated by a leading Canadian distributor:
To repeat: No soft hyphens.
However, contrary to the Manual’s advice, you can break a line after a hard hyphen. Why not? After hyphens is one place where the English language can break lines.
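The two rules (never insert a soft hyphen; a break after a hard hyphen is fine) can be sketched with Python's standard textwrap module. This is only an illustration under a 32-character line; the helper names are made up, and real captioning software will have its own linebreaking logic.

```python
import textwrap

LINE_WIDTH = 32  # characters per caption line

_wrapper = textwrap.TextWrapper(
    width=LINE_WIDTH,
    break_long_words=False,  # never split a word to make it fit
    break_on_hyphens=True,   # a hyphen already in the word is a legal break
)


def wrap_caption(text):
    """Break caption text into lines without inserting any hyphens."""
    return _wrapper.wrap(text)


def overlong_lines(lines):
    """Lines that exceed the width because a single word is too long."""
    return [line for line in lines if len(line) > LINE_WIDTH]


for line in wrap_caption("Captions serve deaf and hard-of-hearing viewers."):
    print(line)
```

The wrapper may end a line after the existing hyphens in hard-of-hearing but will never break inside hearing. A word longer than the line is left whole and reported by overlong_lines, so a human can decide how to handle the rare exception.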
Also, the Manual’s sections on “Dashes” [p. 21] are quite wrong. First, “parenthetical information” as found on TV is usually an appositive: “A construction in which a noun or noun phrase is placed with another as an explanatory equivalent, both having the same syntactic relation to the other elements in the sentence; for example, Copley and the painter in ‘The painter Copley was born in Boston.’ ”
Second, the construct listed in the Manual –
Please take him
--the guy in red--
is, to use the scientific terminology, deeply weird and is unprecedented in English orthography.
Please take him--
the guy in red--
is the standard orthography.
First, it seems to have been decided that NSI will be enclosed in parentheses without initial capitals unless proper nouns require them. A case can be made for that decision but has not been.
Even after 20 years, Canadian captioners still have not figured out the concept of aspect in writing non-speech information. It’s a grammatical term: “A category of the verb designating primarily the relation of the action to the passage of time, especially in reference to completion, duration, or repetition.”
(phone ringing) is in the progressive aspect (possibly the continuous aspect – it’s a fine line). (phone rings) is in the simple aspect.
The “Descriptive caption examples” section elides this issue entirely.
[p. 24] ¶ (engine revving)
(whispering)
(phone ringing)
(loud knocking)
(pager beeping)
(rapid gunfire)
(gunshots)
Isn’t there a difference among (phone rings), (phone rings twice), (phone rings then stops), and (phone ringing)?
How about (belches) vs. (belching)? (clears throat) vs. (clearing throat)? (How long can you belch or clear your throat continuously?)
Moreover, the Manual fails to explain how to handle continued or interrupted NSI. An example is captioning the word (ringing) when we can see the phone and (phone continues ringing) later on.
This section requires much more elaboration. Along with speaker identification and absence of translation, non-speech information is the key criterion differentiating captions from subtitles and merits much wider treatment.
In particular, the Manual needs to outlaw the use of strictly limited and preordained structures in writing NSI. One Canadian distributor captions everything in an [ Xing of Y ] format (complete with errant spaces inside brackets):
Have you ever met anyone who talked like that?
There is as yet no explanation why Canadian captioners constantly tell us (lyrics unclear) or (indiscernible). Trust me, lyrics are much less likely to be unclear than you think, and few, if any, conversations are indiscernible. Distant music or conversation is a different story. I’m talking about music or dialogue that I can not only understand but repeat out loud right then and there. In fact, that’s what I usually do, talking back at captions that lie to viewers like me by claiming the words could not be understood. I could understand them.
In fact, an ear for dialogue is a prerequisite for a successful captioner. It comes from deep-seated fluency in the English language and a wide vocabulary, aided by excellent audio quality. I’ve seen a lot of indistinct conversation captioned as such by Americans (because the director and sound engineer designed it that way; nobody could decipher the dialogue, and that is intentional), but I have seen one (count it: one) case of unplanned indecipherable foreground dialogue – a drill instructor screaming continuously at a recruit in a U.S. documentary. (\indecipherable\) was the caption, and it was true.
(The Caption Center would probably argue that disembodied overlapping voices in September Songs: The Music of Kurt Weill was another case, but such voices were meant to be indecipherable and fragmentary. I believe another counterexample could be given: An interview segment in which a helper pops into the room and mumbles something out of the coverage area of the microphone.)
I reiterate one of my shibboleths: If Canadian captioners weren’t nearly all young women in their 20s with little life experience and insufficient experience reading, writing, and transcribing, we wouldn’t have this problem. (One more time: Canadian captioners are a monoculture. Everyone has the same deficiencies and everyone blanks on the same issues. There is no safety net because there isn’t a wide enough range of experience and knowledge.) And if captioners were perhaps paid better, they’d care more.
The explanation “Do not guess at indiscernible speech” [p. 25] is something of a motherhood issue. Who would want to guess? The real problem is an overdiagnosis of speech as indiscernible, a problem that traces itself to poor skills in the Canadian captioning monoculture.
On the topic of transcription accuracy, I would note there is a tremendous reflex to Google everything. The assumption is that there is a page somewhere on the Web that can authoritatively answer any query on the correct way to render a word or phrase.
Well, that’s not true. I’m a stickler for accuracy and I have myself left incorrect spellings of proper nouns online for months at a time. (Sometimes the original source can be incorrect. One article actually got its own byline wrong and I simply copied it. How could I have known? Fortunately, I later found the correction.)
Even printed sources are not always reliable. I read a book on current French cinema that consistently misspelled the surname of one of the directors the book itself interviewed. Further, foreign names can be transliterated in several ways, a phenomenon found in some languages more often than others – Russian and Ukrainian more than, say, Japanese.
While no specific reference medium is bulletproof, I would argue that a reliance on the Web is a recipe for trouble. Many captioners are unaware that, predating the Web, fee-charging electronic databases proliferated. They still rake in good cash even today; that’s why, for example, Thomson has been selling off newspapers in favour of electronic databases. (Disclosure: Since I did not opt out of the court-defined class, I am party to a class-action lawsuit against Thomson Corp. over unauthorized reuse of copyrighted newspaper articles in fee-charging databases.)
The most famous American database service is Lexis-Nexis, carrying an astonishing wealth of information. But specific industries and periodicals are also online. In the case of the fashion industry, for example, periodicals like Women’s Wear Daily are online in full text; you just have to pay for access. Articles in trade papers like these are quite likely to spell the names of their subjects correctly.
If you’re on a deadline and need to know the spellings of eight designers who showed in Paris four days previously, a subscription to one of these databases is well worth the money. Ten dollars in search time answers your question right then and there.
A case could be made that Canadian captioners should band together to negotiate group rates for database access. In any event, the costs are not particularly onerous.
The Manual flagrantly authorizes the full range of Canadian misdeeds in music captioning. Note on notation: The staffnote character is difficult but not impossible to typeset. The vertical-bar character | is used as a substitute by everyone but this Manual, which uses an underscore, a bad idea for several reasons: We can actually underline in captioning (do you mean you want that space character underlined?), and underscores bleed into each other, making it hard to count how many instances you intended. (Similarly, backslash \ is used to indicate the italic toggle.)
The Manual must explicitly outlaw the single most egregious and appalling habit of Canadian captioning, namely a caption like [ ||| ] slapped on the screen, up to a dozen times per program, as some indifferent, casual, nonspecific indication of “music.”
Music is important. It is not unidentifiable, interchangeable, or inconsequential. It must be treated with much greater respect than Canadian captioners presently do. The [ ||| ] must never, ever be used again for any reason. Whenever you the captioner are tempted to slap it onto the screen, ask yourself:
Wouldn’t you say the following usage is a tad more sophisticated, literate, and understandable than robotically copping out from your responsibilities by slapping [ ||| ] onto the screen whenever anyone strikes a musical note?
Use your heads. Write it out!
See also: Punctuating music.
Please exercise caution in telling captioning viewers a segment has “no audio” [p. 26] (Cf. eye-gaze).
“No audio” means total silence. Now, what were the last five occasions in which you encountered dead silence on a videotape?
If mouths are moving but no voice is coming out, then that’s what’s happening: (no voice). (mouthing words) also sometimes works – though perhaps the caption (mouthing) is too unclear – when that is actually what’s happening. But (mouthing words) is a way longer caption than (no voice), and such moments are usually so brief that a shorter caption is better.
If you really do find a segment with (no audio), by all means use it.
First of all, “inflection” is perhaps not an apt term here. The Manual states [p. 27] that “[m]any people speak with inflections or accents, use liaisons between words or leave endings off words, etc.”
Everyone speaks with an accent; accents are relative. Words are not articulated separately; speech is a continuous flow of sound. (Untrained people tend to violently disagree with that statement; it’s like learning there is no such thing as Santa Claus. It remains true nonetheless: There are no breaks between words in continuous speech.)
I assume “leave endings off words” refers to the case of, say, looking vs. lookin’. A common misconception; nothing is being “left off.” Perhaps you refer to French speakers in English who leave off plural markers and verb tenses (“I walk back from the store after I pick up some tomato”).
No captioning style guide I’ve read has coherently addressed this topic. The result has been full-on miscaptioning of programming by every captioner I can name. Yes, everybody gets this wrong. One prominent case: It was never mentioned on Star Trek: The Next Generation that Jean-Luc Picard, while allegedly French, speaks with a British accent. I think it’s fair to say that hearing viewers took note. In a more diegetic example, Carrie on Sex and the City makes a new friend who is, in fact, Australian. It comes as a big surprise in captions when he starts mentioning Sydney since we hadn’t been told of his Australian accent.
This section requires expansion.
[p. 5] ¶ These standards are only now possible as a direct result of the extensive experience gained in English-language broadcasting in Canada.
It’s been possible to write standards since the early 1980s when captioning began. The “extensive experience” noted here is the cause of the problem. The Manual attempts to explain right and wrong ways to caption; in many respects it seeks to deprogram Canadian captioners of their worst existing habits. We are trying to establish new standards, which will involve unseating existing standards, if they can be called that.
[p. 6] ¶ Deafened and deaf (lower case ’d’) are terms that refer to individuals who have lost all hearing at some point in their lives.
[p. 49] ¶ deaf/deafened: Terms that refer to individuals who have lost all hearing at some point in their lives. These people use spoken language and rely on visual forms of communication such as speechreading, text, and occasionally sign language.
Surely it is inaccurate to say these groups have lost “all” hearing. “All usable” hearing may be more accurate, or simply “nearly all” hearing.
[p. 8] ¶ Trained off-line caption editors watch and listen to a videotaped program and create a transcript of the audio, including descriptions of sound and music.
Captioners render sound in text. Speech is one stream of the sound so rendered. It is tautological to state that captioners describe sound.
Music is not the only stream of non-speech sound that is rendered. The term is a bit strained and academic, but non-speech information, used in captioning research, better encapsulates the full range of sounds other than words that are rendered in text. NSI is the acronym.
The Manual means well in one section but makes a mistake:
[p. 23] ¶ The art of off-line captioning involves making creative and informed choices about what to include in a caption script, and descriptive captions should never be included at the expense of dialogue. Negotiating space and time limitations while simultaneously crafting the most accurate representation of the story possible is a constant challenge, and while descriptive captions can do a great deal to enhance a viewer’s understanding of a program, there are situations where their use is more appropriate than others.
Actually, if a cellphone starts ringing while Carrie and Samantha are chatting over brunch, and keeps ringing, and then finally, after 30 seconds, Carrie turns around and hollers “Are you going to answer that phone?” then you may well have to interrupt the dialogue while it is unfolding to indicate that the phone is ringing continuously.
Anyway, the rest of the paragraph is disingenuous given that pop-on captions can appear in multiple blocks. You can caption the dialogue and also the NSI. One has quite enough room in roll-up captions to do the same, presuming they are prewritten; you can easily fit in (phone ringing at other table) or equivalent.
[p. 8] ¶ The caption editor assigns a time code address to each caption as well as a position code.
“Timecode” is one word. Captioners demonstrably do not assign timecodes to each caption. Some captioners may be forced to do so by outdated software, but it is not necessary; proper software handles that drudgery for you. In a long monologue, for example, all the captioner needs to do is provide an in time for the first caption and an out time for the last and the software fills in the blanks.
Timecode is not even really manifest in the ultimate captioned submaster (not “sub-master”). Captions just appear and disappear at certain moments. Timecode really only exists up to the encoding stage. I’m not sure it’s that important.
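How software might fill in the blanks for a long monologue can be sketched as follows. This is a hypothetical illustration in Python, not a description of any real captioning package: it assumes times are plain frame numbers and that each caption gets screen time in proportion to its length in characters.

```python
def fill_times(captions, first_in, last_out):
    """Assign (in, out) frame numbers to each caption in a run.

    Only the in time of the first caption and the out time of the last
    are supplied; the rest are interpolated by share of total characters.
    """
    total_chars = sum(len(text) for text in captions)
    if total_chars == 0:
        raise ValueError("need at least one non-empty caption")
    span = last_out - first_in
    times = []
    start = first_in
    used = 0
    for text in captions:
        used += len(text)
        # Cumulative rounding keeps captions contiguous and ends on last_out.
        end = first_in + round(span * used / total_chars)
        times.append((start, end))
        start = end
    return times


print(fill_times(["abcd", "ab", "ab"], 0, 80))
```

The captioner supplies two numbers and reviews the result, rather than keying a timecode for every caption by hand.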
[p. 9] ¶ Pop-on captions are most commonly used for dramas and sitcoms, movies, documentaries, and music videos.
It should be stated that pop-on captions are suitable for any programming. After all these years, the only programs for which I can see an exemption are prerecorded shows with continuous breakneck speechrates (Iron Chef could be a worst-case example) where buildup time would require so much editing that few captions would appear verbatim and it would require an entire workweek to caption the show (as mentioned later on [p. 9] of the Manual).
It is important to outlaw the use of roll-up captions for fictional narrative, arts, and music programming and require the use of pop-on captions for same unless there is an incontrovertible reason to use the former, which comes up rather less often than Canadian broadcasters would like us to believe.
[p. 9] ¶ In the off-line situation, roll-up captions are normally reserved for programs that have a live flavour, such as entertainment, sports and news magazines, awards programs, and soap operas.
You’re deliberately mixing up current practice with preferred practice. What does “live flavour” mean? Not many shows air live these days. Entertainment Tonight and its ilk certainly do not. Soap operas are almost always pre-recorded, with extremely unusual live segments, and in any event are fictional narrative programs that should not be captioned using roll-up unless there is no alternative. (I distinctly remember when soap operas started to be captioned. Pop-on was used, as it should be. Later, a combination of too many soap operas having to be captioned all at once, apparently tighter turnaround times, and apparently stingier production budgets forced the use of roll-up captions, including the entirely improper use of real-time captioning.) By the Manual’s own logic, use of roll-up for soap operas should be “strongly discouraged.”
[p. 10] ¶ This method can also be used when the program itself is pre-recorded and the text is created ahead of time, but there is not enough time for encoding in off-line mode.
I am not convinced that encoding time (the actual length of the program plus time to set up tapes) is the constraint.
First, by the Manual’s own logic, you should be writing the word Teleprompter as TelePrompTer, the perverse corporate orthography. It is in any event not a generic word yet (even Xerox isn’t xerox yet), so it needs a big T.
Meanwhile:
[p. 11] ¶ This type of captioning is mostly applied to news shows and soap operas that are 100% scripted. It is only appropriate when a script has been prepared and is available for an entire broadcast, and when there is no ad-libbing or improvising. Any other application is strongly discouraged.
You can’t caption a news show with Teleprompter or ENR captioning and have it be legit under CRTC regulations. All sizes of television station are required “to caption... all local news programming, including live segments, using either real-time captioning or another technology capable of producing high-quality captioning for live programming.” Teleprompter or ENR captioning simply cannot meet that standard unless speakers never deviate from the script, nothing is ever ad-libbed, and speech is the only significant audio source. When are all those true at once?
Good real-time captioners (Canada does not have many real-time captioners who are not good, at least in the English language) can achieve over 90% accuracy when averaged over reasonable periods, like a one-hour show or a week of them. They also caption non-speech information. I challenge readers to point out a newscast where Teleprompter or ENR captioning can do the same. Its use for news should be outlawed in the Manual.
[p. 11] ¶ It is the caption stenographer’s responsibility to prepare his or her dictionary, entering names and vocabulary that he or she can anticipate encountering during the captioning of various programs.
It is the broadcaster’s responsibility to provide any available material such as guest lists, rundowns, key lists and so forth, which are necessary for the caption stenographer to prepare for a program.
Vocabulary that should be but generally is not included in steno dictionaries:
That’s a better list than before, but it isn’t sufficient quite yet.
[p. 13] ¶ In emergencies, call a stenocaptioner. Failing that, your station must have a plan with a chain of succession so that there is always someone whose job it is to type information in real time into an ENR or Teleprompter captioning system for display.
[p. 16] ¶ For fast-moving sports, play-by-play commentary is not captioned. Instead the captions are blanked, giving time to see the play, then captions continue after the play is complete.
That’s not true. That is merely one method of captioning continuous play-by-play. There is no testing of viewer preferences on this issue that I know of, but I don’t consider the practice harmful. Nor is the practice of captioning continuously harmful (in fact, the absence of continuous captions was part of the Vlug vs. CBC human-rights complaint, so writers of the Manual might proceed with caution). I challenge readers to prove one method is really vastly better than the other, though if our goal is to caption verbatim, I don’t see how we can support the idea of deleting entire categories of speech from captioning.
Remember, we are not captioning for robots. People can ignore the captions if they want! If the sports action is more interesting, they’ll watch the sports action.
During the Stanley Cup and the Winter Olympics, you can watch both methods yourself by comparing Canadian and U.S. hockey coverage airing at the same time, or Olympic figure skating.
There is no discussion of the use of paint-on captions as first or even second captions when buildup time is tight. The thinking is that the letter-by-letter display makes it somewhat easier to read the whole caption in time.
Adding paint-on captions when a pop-on caption sits stationary onscreen (as in call and response of a music video or musical) works adequately well in my experience, though it is rarely seen.
The example under “Building captions” [p. 36] is too hypothetical and seems unwise anyway.
There’s a reason why we don’t underline in English [p. 18], and it’s called “italics.” Captions should follow English orthography.
Moreover, to toggle underlining on and off in Line 21 (save for colour-plus-underline combinations) requires an inserted blank space. That means end punctuation will end up underlined:
was added to the cast of Survivor.
The discussion of numbers is inadequate.
The overriding rule, as everywhere in captioning, is to follow English orthographic rules unless an incontrovertible case can be made otherwise. In punctuation, I can think of only a few such cases:
Quotations spanning more than one caption, covered in their own section.
Glawson’s physical therapists
concentrated his training
on the quadriceps-- thighs--
and gastrocnemius-- calves--
to compensate for wasting
that occurred in hospital.
In Inuvik-- formerly
Frobisher Bay-- teen suicides
have decreased
a mere 5% since new measures
were introduced in 1999.
(Liberties were taken with linebreaks. Spaces after the dashes can be elided by a linebreak, which is actually an advantage.)
To sum up the alternatives:
Glawson’s physical therapists
concentrated his training
on the quadriceps - thighs
- and gastrocnemius - calves -
to compensate for wasting
that occurred in hospital.
Glawson’s physical therapists
concentrated his training
on the quadriceps -- thighs
-- and gastrocnemius -- calves--
to compensate for wasting
that occurred in hospital.
I believe these are the exceptional cases. Everywhere else, follow the rules. You are typesetting English in the captioning medium; you are not writing some new language with its own rules.
Follow the real rules of English as authoritatively documented in style guides like Chicago rather than any half-baked misconception of what the rules are. I have met Canadian captioners who believe that commas are required after every single adjective (e.g., “long, brown hair”), and everyone seems to think that spaces go inside brackets (they don’t). Some captioners seem to think we use American double quotation marks with British logic, which in fact we do not.
Similarly, it may come as a shock to captioners to learn that or, so, and, but, and then used at the beginning of a sentence [p. 20] rarely require a comma right after (usually only when an actual pause is intended). However, a comma cannot be omitted before the name of an addressee. (It’s an iron-clad rule. Don’t believe me? Compare the correct “Come on, Eileen” versus the rather incorrect “Come on Eileen.”)
I return here to the youth and inexperience of Canadian captioners, nearly all of whom are a monoculture of young women with liberal-arts degrees and not enough life experience as readers and writers. Many, possibly most, captioners are trainable, but such training requires learning the established rules and following them.
Accordingly, the following advice is wrong:
[p. 20] ¶ It is essential that caption editors and caption stenographers make use of a variety of Canadian dictionaries and style guides to reach sound decisions about punctuation. Document your decisions and use them consistently.
You need a range of reference materials, but you must not pick and choose among style guides, which amounts to shopping for a justification for your own errors or biases. Canadian dictionaries are necessary (Gage and Oxford are the most recent), but U.S. style guides like The Chicago Manual of Style are authoritative and indispensable. I am not convinced Canadian offline captioners are so good at rendering the spoken word in the written that they know better than Chicago.
Words spelled out should be hyphenated, as the Manual suggests [p. 21], but the “double-letter” construct deserves attention.
But do not use hyphens in numbers. To dial 911 is to dial 911 and not 9-1-1. Bearing 241 mark 73 is bearing 241 mark 73 and not bearing 2-4-1 mark 7-3. There are exceptions, like radio code: 10-4, 10-20. NYPD Blue–style station designations are hyphenated: I called McGuire down at the 2-7. (You could use Two-Seven, though it’s a bit poncy.)
The manual’s coverage is not quite right. Here, “extended quotation” means a quotation spanning more than one caption.
Extended quotations in captioning – and note this excludes roll-up captions by definition – use a quotation mark at the beginning of every caption save for the last, which has only an ending quotation mark. (Periods and commas go inside quotation marks without exception; this is Canadian English, not British. Question marks and bangs rarely are set inside quotation marks unless part of the quotation: The question they ask is "What would Jesus do?" is correct, while What do you mean by "extra-virgin?" is not.)
Nested quotations follow the very same rules and continue alternating between single quotation marks (the apostrophe, used for the first level of nesting) and double. It’s entirely possible to have two levels of ongoing extended quotations (on an interview with Maya Angelou, I witnessed three); it’s also possible to have discrete quotations nested inside extended quotations. Don’t use italics for nested quotations (an NCI error).
For comparison purposes, note that the rule in print is: Every paragraph gets an opening quote; only the last paragraph also gets a closing quote. We must alter this rule in pop-on captioning because pop-on captions are discrete units. No quotation marks? It isn’t a quotation. Quotation marks at both ends? It’s a complete, self-contained quotation. I don’t know of any Canadian captioner who gets this right.
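The pop-on rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name mark_extended_quotation is hypothetical, each caption is treated as one string, straight double quotation marks are used, and only a single level of quotation is handled (nested quotations would need alternating marks on top of this).

```python
def mark_extended_quotation(captions, open_mark='"', close_mark='"'):
    """Mark a run of pop-on captions that together form one quotation.

    Hypothetical helper: every caption except the last gets an opening
    mark; the last gets only a closing mark. A one-caption quotation is
    self-contained and gets both marks.
    """
    if not captions:
        return []
    if len(captions) == 1:
        return [f"{open_mark}{captions[0]}{close_mark}"]
    # Opening mark on every caption save the last
    marked = [f"{open_mark}{text}" for text in captions[:-1]]
    # Closing mark only on the final caption
    marked.append(f"{captions[-1]}{close_mark}")
    return marked


if __name__ == "__main__":
    for caption in mark_extended_quotation([
        "She helped with the financing.",
        "So Armistead wrote her coming-out speech after all.",
        "And she cut it!",
    ]):
        print(caption)
```

Compare the print rule, which would also put an opening mark on the final unit; the sketch differs only in that last step.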
Note that roll-up captions are virtually never discrete captions the way pop-ons are. Hence the following usage, which I saw in the week before writing these remarks, is wrong in English orthography (unprecedented, in fact) and wrong in captioning:
The self is "not a very
"skilled and intelligent self,
"but allows animals to develop
"into the intentional, volitional,
"and cognitively selective creatures
they are."
In pop-on captions (setting aside justification and even line lengths for the moment), a correct usage is as follows:
I sat in his office
and he picked up this big sheaf of papers.
He goes,
"Let me read you what it says
in\Selleck vs. Globe."\
And he went on this big
spiel.
I remember it like yesterday.
"'The article complained of was called
"'"Tom Selleck's Love Secrets by His Father,"
"'and contained a number of statements
attributed to Tom Selleck's father
"'which disparaged and downgraded
"the romantic character
and capability
of his son.'
Well, I guess that's in the eye of the disparager."
And that's the guy's
\defense\lawyer!
The self is "not a very
skilled
and intelligent self,
"but allows animals to develop
"into the intentional, volitional,
and cognitively selective creatures they are.
"Descartes's faith in the assertion
'I think, therefore I am'
"may be superseded by a more primitive affirmation
"that is part
of the genetic makeup
of all mammals:
'I feel, therefore I am.'"
He told me, "She helped with the financing.
"Armistead asked me,
'Who's going to read
the narration?'
"And they said
'Lily Tomlin.'"
Then he groaned and tells me,
"The guy perks right up and says
"'No, it's fine.
"'She's going to use
this opportunity
"to come out and be honest.'
"So Armistead wrote
her coming-out speech after all.
And she cut it!"
I mean, you can only laugh
at something like that.
The section on quotation marks avoids an important issue.
[p. 21] ¶ Where quoting a passage from a book, play or poem etc., it should be captioned verbatim.
If necessary, you can remove an entire sentence or, under battle conditions, an entire phrase. The same applies to sacred texts like Shakespeare (I just recently saw my first-ever edited Shakespeare captions: 25-year-old women earning $13 an hour know Shakespeare well enough to edit him?) or any film with cult status, like The Rocky Horror Picture Show.
Other examples include historical speeches, song lyrics, and anything else that a viewer could actually look up.
[p. 22] ¶ Do not leave more than one space in a caption. A single space may be used after a period, colon, or semicolon as necessary.
2. It is common to leave a single space before and after music notes and parentheses.
3. There should not be a space between parentheses and the text enclosed within them.
The case could be made that, since we’re stuck with monospaced fonts, the orthography used in typewriting is applicable, namely the use of two spaces between sentences. Naturally, we use only one space after other punctuation, like colons or semicolons; “as necessary” is meaningless and authorizes untrained captioners to write no space or a raft of spaces after such punctuation.
Sections 2 and 3 are correct but too vaguely written.
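Section 3 in particular is mechanical enough to check automatically. The following is a rough Python sketch, not a complete spacing checker: the function name strip_bracket_padding is hypothetical, and it assumes that only parentheses and square brackets are in play and that only spaces and tabs count as padding.

```python
import re


def strip_bracket_padding(text):
    """Remove spaces between brackets and the text they enclose.

    Hypothetical helper: handles ( ) and [ ] only, and leaves spacing
    outside the brackets untouched.
    """
    # Drop padding immediately after an opening bracket
    text = re.sub(r"([(\[])[ \t]+", r"\1", text)
    # Drop padding immediately before a closing bracket
    text = re.sub(r"[ \t]+([)\]])", r"\1", text)
    return text


if __name__ == "__main__":
    print(strip_bracket_padding("[ Singing ]"))
    print(strip_bracket_padding("MELINDA (whispering): Yes."))
```

Text that already follows the rule passes through unchanged, so the function can be run over an entire caption file without harm.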
The Manual is just a wee bit off here:
[p. 26] ¶ Punctuation in songs should be limited to occasional commas where phrasing requires it. Otherwise, there is no punctuation.
No. Use all necessary punctuation within song captions. (Why not? It’s English, right? But see below.) That includes quotation marks and extended quotations, which, believe me, do come up:
| He’d say
"I know what I want |
| "and I want it now |
| "I want you |
| ’cause I’m Mr. Vain" |
Commas, en dashes, colons and semicolons, you name it – if it’s necessary, use it.
What gets elided are caption-ending commas and periods. Some captioners (actually, only CaptionMax, to my knowledge) attempt to caption songs as though they were complete sentences, with full end punctuation. Sometimes they actually are full sentences, as in opera or musicals. Generally, though, song lyrics are fragmentary. It is not hard to find examples in print orthography of elided end commas and periods.
Question marks at ends of song captions must always be used. Exclamation points should be used with discretion; there’s so much EXCITEMENT! in music that one is tempted to end every caption with a bang.
The Manual is silent about utterances delivered somewhat in time to backing music without being entirely sung. Poetry set to music, you might call it. I can cite a reprehensible example that nonetheless fits: “Rico Suave” by Gerardo. Also nearly anything by King Missile or Meryn Cadell; the music video “Invalid Letter Dept.” by At the Drive-In is the best example yet seen.
Some kind of explanatory caption is desirable (I don’t have a ready-made example for you), then proceed to caption the doggerel as though it were music but without staffnotes.
One very old Canadian captioning centre (derived from an old development agency) goes to eye-crossing lengths in slapping a new caption onscreen after nearly every shot change. Yes, they’ll break a ten-word sentence into three captions. Yes, they’ll break a caption after the word the; they’ll break a caption anywhere at all. They’re still doing it nearly 20 years on, demonstrating thoroughgoing misunderstanding of captioning and outright contempt for audiences, who will find reading 300 fragmentary captions harder than 100 intact captions.
I mention this because the Manual does not emphasize the important distinction between scene and shot changes. There are few cases where a caption can persist across a scene change; there are quite a few cases where a caption can persist across a shot change. (Indeed, in commercials, music videos, and twitchily-edited TV shows, you have no choice.)
I would also point out that experience shows the cleanest breaks around any kind of visible edit leave the last frame before the edit and the first frame after the edit blank. (Remember, we’re using a two-frame blink rate, not zero or four.) Captions then look like an immutable feature of the program rather than something raggedly tagged on afterward, as Canadian captions tend to look.
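To make the two blank frames concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the function name clear_edit is hypothetical, captions are (first visible frame, last visible frame) pairs of integer frame numbers, and edit_frame is the first frame of the new shot. It does not handle a caption that is meant to persist across a shot change, nor one too short to survive the trim.

```python
def clear_edit(outgoing, incoming, edit_frame):
    """Trim two captions so the frames on either side of an edit are blank.

    Hypothetical helper. edit_frame is the first frame after the edit,
    so edit_frame - 1 is the last frame before it. Both are left empty.
    """
    out_start, out_end = outgoing
    in_start, in_end = incoming
    # Outgoing caption must vanish before the last frame of the old shot
    new_out_end = min(out_end, edit_frame - 2)
    # Incoming caption must wait until after the first frame of the new shot
    new_in_start = max(in_start, edit_frame + 1)
    return (out_start, new_out_end), (new_in_start, in_end)


if __name__ == "__main__":
    # Captions butted hard against an edit at frame 150
    print(clear_edit((100, 149), (150, 210), 150))
```

Captions that already sit clear of the edit are returned unchanged; only those butted against it are pulled back.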
The full topic here is timing of captions, not shot and scene changes. It’s too important to be discussed in a single page, in part because nearly everyone in Canada gets caption timing so terribly wrong. The section in the Manual requires expansion.
Setting multiple simultaneous caption blocks at different locations onscreen isn’t the only way of denoting multiple (near-)simultaneous speakers. The Captions, Inc. approach (derived from subtitling and only subtly mishandled) also works. In fact, it probably works better than discrete blocks.
- Are you both there?
- GEORGE: Yes.
- MELINDA (whispering): Yes.
Beats the hell out of what everyone here is using, doesn’t it?
Given that we are now captioning in mixed case with speaker IDs in upper case, combinations of speaker ID and NSI use mixed case:
There is too much variety in delivery styles and the sorts of information that must be notated to rely on a single style. Just keep everything on its own line whenever possible, though it sometimes is not, as in:
PRIME MINISTER CHRETIEN
(translated):
Hence, the following actual Canadian malapropisms are to be avoided: Too much punctuation, all of it nonstandard, the whole thing too difficult to read.
Shakira: [ Singing ]
[ "Objection" ]
O-Town Group: [ Singing ]
[ "We Fit Together" ]
Margaret: [ On radio ]
[ Stammering ]
Also, do not use slashes when more than one effect happens at once.
Police officers:
[ Grunting/
chuckling ]
Do you mean “and” or “or”? Are they grunting or are they chuckling?
Another topic never handled correctly by Canadians. Translators work in the written medium; interpreters work in signed and spoken languages. Captioners could never caption a “translator.”
Don’t use an ID like Voice of interpreter: because what we usually caption are voices.
Here are some useful correct examples:
The Manual ambiguously states:
[p. 38] ¶ If there are long music segments within a roll-up captioned program, music lyrics must be in pop-on style.
Lyrics can roll-up only when they are unusually fast, or when there are just a few lines of singing interspersed with conversation. Even if rolling up, lyrics must follow pop-on rules of division, and music notes must appear at the beginning and end of each lyric.
All sound and music descriptions should roll up, but follow pop-on rules of division.
The first edict is unnecessary and probably unworkable for nightly talk shows. More important is staffnote placement. I don’t see any reason not to place beginning and ending staffnotes at exactly the same places you’d insert them were you using pop-on captions. The other option (inconsistently applied by the Caption Center) seems to be to use an opening staffnote at the very beginning and a closing one at the very end.
Note that one still requires >> and speaker IDs in music captioning. Hence the following would be correct:
(band plays "The Good in Everyone")
>> PATRICK PENTLAND: | First off,
here's what you do to me |
| You get rough,
attack my self esteem |
| It's not much,
but it's the best I've got |
| And I thought you saw the good in everyone |
>> ALL: | Ooh, the good in everyone |
| You see the good in everyone |
| You see the good in everyone |
(instrumental break)
Don’t blank roll-up captions just to show a single pop-on caption. In other words, do not clear three lines of carefully-composed real-time captions to display a poorly-written NSI caption like ( Cheers/Applause ) or, worst of all, [ ||| ].
It’s debatable whether “Canadian English-Language Broadcasters’ Closed Captioning Standards and Protocol” really is possessive rather than descriptive. That is, I doubt we need the apostrophe.
Anyway, it’s a very long adjective chain. Rewrite it as “Closed Captioning Standards and Protocol for Canadian English-Language Broadcasters.” Among other things, it puts “closed captioning” first.
[p. 7] ¶ The policy also requires that all such licensees close caption at least 90% of all programming during the broadcast day by the end of individual licence terms.
A lay reader of the term “broadcast day” will incorrectly assume that all 24 hours are covered. The Manual should mention that the CRTC-defined broadcast day ends at midnight. In fact, I’m not even sure when it actually starts – 6:00 A.M.?
[p. 6] ¶ In its call for applications for licences for new digital, pay, and specialty television services, Public Notice CRTC 2000-22, the Commission stated that it expects applicants for new services to commit to close captioning at least 90% of their broadcast day by the end of their licence term.
However, the facts of the actual commitments are now well established. Any captioning or description commitment by broadcasters undertaking Category 1 or 2 digital specialty services was rubber-stamped, ranging from no commitment to 90%. Just as an example, here is FashionTelevision’s commitment: “In view of the still relatively sparse penetration of digital services and the corresponding likely high level of use of older recorded material, it is impossible to project the percentage of programming that will be captioned over the term of the license.” FashionTelevision was expected to caption 90%, but the CRTC endorsed the applicant’s reasoning.
The entire discussion of CRTC requirements is so incomplete as to be misleading. Nothing short of a “requirement” or “condition of license” has any enforceability with the Commission, which has never once fined or punished a broadcaster for failing to live up to captioning commitments. An “expectation” of 90% captioning or “encouragement” to achieve same has no practical relevance.
This entire discussion should perhaps be scaled back considerably or, if it’s intended to be substantive, it must be complete even if the result does not entirely gladhand or congratulate the broadcast industry.
[p. 8] ¶ Labour Intensiveness: It may take 18 hours or more to off-line caption a one-hour program depending on the complexity of the program, speaking rate, rate of scene change, and difficulty of topic.
An “hour” of television on commercial TV can occupy only 48 or fewer minutes – 20% shorter than a full clock hour.
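The arithmetic is worth spelling out. The short Python sketch below uses only the two figures quoted above (18 hours of work, 48 minutes of content); treating the Manual’s “one-hour program” as exactly 48 minutes of captionable content is my assumption, and the variable names are illustrative.

```python
OFFLINE_HOURS_PER_SHOW = 18   # the Manual's figure for a "one-hour program"
RUNTIME_MINUTES = 48          # assumed actual content in a commercial-TV "hour"
CLOCK_HOUR_MINUTES = 60

# How much shorter a TV "hour" is than a clock hour
shortfall = (CLOCK_HOUR_MINUTES - RUNTIME_MINUTES) / CLOCK_HOUR_MINUTES

# Minutes of captioning labour per minute of actual content
labour_per_content_minute = OFFLINE_HOURS_PER_SHOW * 60 / RUNTIME_MINUTES

print(f"{shortfall:.0%}")           # 20%
print(labour_per_content_minute)    # 22.5
```

On that assumption, the quoted 18 hours works out to 22.5 hours of labour per full clock hour of content, which is the figure a reader comparing programs of different lengths would actually need.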
[p. 10] ¶ A skilled caption stenographer listens to a program and writes the speakers’ words in what is called steno, a form of phonetic shorthand based on a special 24-key keyboard where common words are written in one stroke of the hands.
I am not sure why we can’t say that steno is a shortened form of the word stenography, and define it (“The art or process of writing in shorthand”). “Steno” seems to be used in a more sweeping sense to refer to the entirety of a stenographer’s theory, briefs, and lexicon.
Also, few words can be written in one stroke (pressing and releasing one or more keys and doing nothing else).
[p. 11] ¶ Spelling: Mistakes in real-time captioning often look like spelling errors. However, they are usually the result of a mistranslation by the computer, or what is referred to as an "untranslate," a phonetic rendering of a word that was not pre-programmed into the caption stenographer’s dictionary.
Not a very solid explanation. Mistakes in real-time captioning are often clear-cut operator error (like mishearing the word or not knowing which homonym to use). The intent here seems to be to explain incomprehensible real-time-captioning errors. I would expand this to more than a single paragraph or delete it.
The Manual is rife with copy errors. Hyphenation is particularly poor. I can edit a later version.
We have, in effect, the standard conundrum of typography: We’re using the medium to discuss itself. A manual on accuracy and completeness of text transcription has to be very accurate and complete in itself.
I can heartily recommend that the Committee hire Moveable Inc. to proof the final report. It will cost good money, but you get what you pay for. I write clean copy, but I persuaded my publisher to hire Moveable to proof my book on Web accessibility. Only a few outright errors were caught, but Moveable made 2,000 suggestions concerning the 430 manuscript pages, the majority of which were enacted.