Comments on U.K. guidelines on audio description

Background

The Independent Television Commission (ITC), a regulatory body overseeing certain television channels in the United Kingdom, has produced guidelines for captioning (the U.K. term is “subtitling”), sign language, and audio description on British television.

The guidelines are available on the Web. Also:

Codes & Guidance Notes is the gateway page.
Two other guideline documents are available, for captioning (read my comments on those guidelines) and sign language.

The Guidelines are extremely valuable: They are the only publicly-accessible set of instructions for audio describers. Moreover, they are not proprietary: Though called Guidelines, in effect they are requirements that any U.K. describer must follow (though there is no apparent enforcement mechanism for noncompliance). By comparison, training materials by the Descriptive Video Service or even the manifestly incompetent AudioVision Canada (see analysis) are not posted publicly. (It is, however, possible to order an AudioVision training kit.)

These comments point out the successes and failures of the U.K. Guidelines, and provide warnings for inexperienced describers who might be tempted to take the Guidelines’ advice too literally.

Pros

The Guidelines were conscientiously developed by intelligent and educated staff after extensive field testing and interviews across Europe in the so-called Audetel project, for which there is no definitive online resource. The Guidelines get a lot of things right, or nearly so.

In an extended passage, varying the structure of succeeding sentences can make the description more tolerable. Check the last sentence in this example from the Guidelines:

Karen picks up her clipboard and walks in front of the shattered dressing-table mirror. [Pause] She picks up a blood-stained lace mat. [Pause] She touches the corner of the red-spattered sheet on the bed. [Pause] She walks alongside the bed. [Pause] From the floor, she picks up a broken, silver-framed photograph of the missing woman smiling broadly with her arms around a teenage boy.

However, for greater variety, as the Guidelines themselves advise elsewhere, use the character’s name more often.
The Guidelines handle describing for children quite well. Having sat in on a DVS recording session for Aladdin, I can attest that children’s shows are not ruined by a punchier delivery. (“The bee is Genie!” the narrator said. “Can we have a bit more oomph, more of a sense of wonder?” Gerry Field said, and ran a re-take.) The example the Guidelines give works beautifully right on paper, and immediately conjures images of Disney elephants, which explains why the seemingly interpretative term “undeterred” works here. Can’t you imagine Mrs. Jumbo stopping for a second, frowning, setting her jaw, and trying again? The passage in the Guidelines reads:

Inside the wagon, Mrs. Jumbo’s eyes widen and her face broadens into a big smile as she sees Dumbo’s little trunk appear through the bars on the window. She lopes towards him, but the chains on her feet stop her from reaching him. Undeterred, she stretches her long grey trunk out of the window and searches for him like a hand in the dark. She finds Dumbo’s little trunk, strokes his face, and rubs his cheek. Dumbo smiles and wraps his trunk around hers. Two plump teardrops fall from his eyes... he wipes them with his mother’s trunk.

Nice, huh?
The document mentions an absolutely essential practice that’s so small it is often overlooked (viz., AudioVision):

Setting the scene is an essential part of audio description. Scenes change in a matter of half-seconds and without guidance the visually impaired viewer can quickly lose the thread of a story or narrative. There may only be time to say one word, but it gives the viewer a starting point. “Now...” can indicate a change of scene: “Now on the stairs,” “Now outside,” but it should not be overused. Any word that appears too frequently in a description becomes a distraction. [...] “Indoors,” “Upstairs,” “In the bedroom,” “That night,” “The next morning,” is more effective.

Note: Don’t use a phrase like “the next morning” unless you can actually prove it really is the next morning and not merely morning. Do not draw inferences and give them voice. Again, this is a lesson AudioVision fails to grasp.
Very smart on describing musicals, though the context is kids’ shows: “Describers should avoid speaking over songs where possible, but if vital information needs to be conveyed, it should be fitted in after the first verse or during repetitions in the song or during instrumental passages.”
The Guidelines glancingly mention an issue that absolutely will come up. “To many [viewers], expressions like in close-up, pan across, mid-shot, crane shot, etc., may not mean anything, but it is important to try to understand why a director has chosen to film a sequence in a particular way and to describe it in terms which will be understood by the majority, if there is room to do so.” Describers must be aware that they are interpreting the intentions of the director as expressed in visible (and, in rare cases, audible) cues.

However, as with interpreting facial expressions, directorial devices you describe must usually be completely obvious and unambiguous to any reasonable observer. (A reasonable keen observer, not a twit.) The example given by DVS involved a palisade in Henry V, which was described using exactly that term, followed by a short definition. “A fence of poles forming a defense barrier” had military importance in the play and was more than merely a fence of poles. The palisade was there for an obvious reason and had to be mentioned by name. Same with the name of the traditional Chinese dress, a cheongsam, in The Joy Luck Club.

You could come up with your own examples: The recurring movement of rings of light on Terence Stamp’s face in The Limey; the black robes of villains and the cream robes of heroes in the Star Wars movies; symmetrical compositions in Stanley Kubrick films; filters on images of the sky in Gattaca; repeated splitscreens in Run Lola Run. As the Guidelines tell us, not only is it possible to bring a taste of the director’s vision to a blind audience, it is required, where time permits.
The document recounts an experiment in which the film Monsieur Hulot’s Holiday was described almost extemporaneously by a comedian:

During the Audetel trials, the late comedian and broadcaster Willy Rushton was asked to describe the first few minutes of Jacques Tati’s classically whimsical film Monsieur Hulot’s Holiday, Rushton was given a script but asked to be quite free with his interpretation:

[Noise of a car backfiring]

“Monsieur Hulot, as you may have gathered, has arrived. He lopes in through the door conveniently marked L’Entrée de l’hôtel. Now a country road. A small veteran car chugs into view... attached to its right side, one of those butterfly nets, and a fishing rod standing upright like a sort of flagpole. The little car spews out smoke as it splutters along.”

The people who remembered seeing the film laughed a lot and thought the description excellent. They enjoyed Rushton’s opinionated asides and characteristic comments. Others, who had never seen the film, enjoyed listening to Willie Rushton but not having experienced the visual jokes for themselves, and no film dialogue to vary the rhythm found their interest waning after a while.

I am quite opposed to the use of celebrity narrators unless they have the chops to subsume their personality into the job of narration. If you are ever more than trivially aware that it’s Jack Nicholson (or, worse yet, James Cameron) talking into the mike, you are dealing with substandard description.

This sort of thing sounds like a tremendously fun and rewarding idea, though it might really work only in Britain and only with comedians who are well-known. It might work even though it defies any number of rules of thumb, like keeping your voice even, refraining from editorializing, and disappearing into the program. Sometimes exceptions prove the rule. (Kids’ shows are very often described in a way that just wouldn’t cut it with programs intended for adults. This may be another example.)

Cons

The Guidelines go very seriously awry in their advice for, and examples of, the grammatical tense to use in writing descriptions.
An audio description is a commentary [that] tells the viewer what is happening at a given moment, so it should be in the present tense, using the present continuous for ongoing activities. The opening of the film Dead Poets’ Society:

A wall painting of a class of adolescent boys, all with short haircuts, wearing ties and sports jackets. In front of the painting, a boy, aged about eight, in a red school cap, is having his tie adjusted. A teenage boy in a Scottish hat opens his bagpipe case, carefully fitting the pipes together. A master focuses a camera on the eight-year old, as an older boy in a boater puts his arm around the smaller boy. The bulb flashes. A white candle is lit. Another master is whispering instructions to an elderly former pupil.

The mixture of simple present and present participle gives the text a better narrative feel. If the simple present is used throughout, it can sound abrupt. Where there is the luxury of enough time, a description should read like a piece of writing that makes sense on its own. Situations can be put in context and the describer can sometimes refer back to an action, if there is time. From Close Encounters of the Third Kind:

The little boy has slipped out of his bed and is padding down the stairs in his Boston University T-shirt over his pyjama bottoms towards the open porch door and the bright light outside.... he turns his head towards another sound.... he toddles into the kitchen and stares wide-eyed at the mess on the floor. He raises his head towards the noise, his big round eyes fascinated. His mouth opens in calm surprise.

What the Guidelines call the present continuous tense (the progressive aspect) must be used with great caution. The describer talks about what’s happening in real time. Even if the action a character takes is a process, like unpacking the groceries or mixing a martini, it is extremely unwise to use the progressive aspect: “Mrs. Brady is unpacking the groceries.” “Justin is mixing a martini.”

Why? The progressive aspect harkens back to sighted friends telling you what’s happening as you both sit in a movie theatre. “What’s happening now?” “Mrs. Brady is unpacking the groceries.” “What’s happening now?” “Justin is mixing a martini.” Even though the question “What’s happening now?” is of course not part of the audio description, the use of the progressive aspect keeps resetting the clock: He’s doing this. Now she’s doing this. Now he’s doing that. You’re no longer watching a flowing program that unfolds moment by moment: He does this. Then she does this. Now he does that.

Also, this practice makes the editorial voice of the describer too apparent. “Well, I will begrudgingly tell you what is going on, if you insist. A boy, aged about eight, in a red school cap, is having his tie adjusted.”

It is very hard to come up with examples in which the progressive aspect is mandatory. A case where one character interrupts another might work, and even then a construct using “as” or “while” works better and is simpler:
- As Marc ties up a black garbage bag, Melissa pokes her head around the corner.
- Martha stifles a yawn while the clerk, gazing absently through the window, talks on the telephone.
Further, DVS hides recently-completed actions in the present tense (the simple present), and, while this may merely be a question of exposure, it works very well. To use the Close Encounters example: “The little boy slips out of his bed and pads down the stairs in his Boston University T-shirt over his pyjama bottoms towards the open porch door and the bright light outside.” (What an unwieldy sentence.) In this case, the fact that the boy slipped out of bed well before anything else in the description phrase took place is neither here nor there, frankly.

Another poor example given in the document:

Waving their arms, they run towards the platform. [Sound of a train pulling away] The train is pulling out of the station.

The train is pulling out of the station and... and...? Superman stops it? The train derails? Loretta hollers “Stop the train”? It runs right over Olive Oyl? Rather, the train pulls out of the station.

Another example is merely clumsy:

Outside in drizzle a small Mini is being driven very slowly. Reg is at the wheel.

Rather, “Outside in the drizzle, a small Mini passes slowly by.” I think we can assume this Mini is not the spawn of Killdozer and that some person is behind the wheel. But wait! We hear that right in the next sentence!

As you can see, each use of the progressive aspect must be individually justified, and is justified only if every other option has been attempted and clearly does not work.

The iffy parts

Since audio description relies on razor-sharp and highly precise writing skills, it was a bit of a surprise to run across so many ambiguities which, if followed to the letter by inexperienced describers, might well result in description even worse than what we’re stuck with in Canada.

The Guidelines state: “There are three golden rules to description: describe what is there, do not give a personal version of what is there and never talk over dialogue or commentary.”
- Actually, the first golden rule is best expressed as “Describe what you see.” More can be “there” in a scene than is visible. There may be tension in the air in a scene that, while perceptible, is not visible. It is too tempting for beginner describers to editorialize; they could fall back on the justification “But all the hints were there.” Note that even the dictum “Describe what you see” is inadequate: There are occasions when a sound must be described because its origin is ambiguous – for example, during a quarrel, is it John or Mary who sighs loudly? The point is that “Describe what is there” leaves too much wiggle room.
- Similarly, the overly broad disclaimer “The research revealed that there are many definitions of a successful audio description, not merely because describing styles differ, but because there are many fundamental differences in audience expectation, need and experience” will invariably be misconstrued and deployed by incompetent describers, as in the Canadian example, as a justification for their own limitations. You can just imagine the hauteur with which a company spokesperson tells a reporter “Well, I think we can all agree there are many styles of audio description, and we think ours is the best of the lot.” Trust me, I’ve gotten this before with captioning.
- “Do not give a personal version of what is there” should really say “Do not give personal opinions, project from what you actually see, or jump to conclusions.” That’s not an exhaustive list, but the fact remains that audio describers very often may provide “a personal version of what is there”: In many cases, a description of facial expressions is necessary and valid, and to do so requires more than a pseudo-objective cataloguing of onscreen facts.
- There are plenty of occasions when it is absolutely necessary to describe over dialogue. This so-called golden rule should be removed altogether.
The Guidelines use the example of The Silence of the Lambs to illustrate setting up tiny plot details that an alert sighted viewer might have noticed. But the Guidelines completely blow it. They state:

Hannibal Lecter steals a pen from his jailer, Dr. Chilton. At that moment, there is no opportunity to mention it, but it is crucial to know about it in the horrific “cage” scene. At a convenient moment, the description says: “Chilton fumbles for the pen he last had in Hannibal’s cell.” The audience now knows the pen is missing and its re-appearance in Hannibal’s hands makes sense.

Except this does not represent what actually happened in the film. Chilton misplaces the pen and Lecter stares at it for an extended time. Sighted viewers know full well that the pen can act as a weapon. The describer should have told us what was going on right when it happened even if that required describing over dialogue. (If the dialogue were an important plot element, describe in advance. Under absolute battle conditions, once the crucial dialogue ends and Chilton leaves Lecter’s cell, say “While in the cell, Chilton leaves his pen on a chair, which Lecter stares at, wide-eyed and unblinking.”)

The solution proposed by the Guidelines amounts to “Oh, yeah. We forgot to mention this before.” You can’t describe what happens in a film 45 minutes after it takes place.
Ironically, the quality of the writing, and particularly punctuation, in the Guidelines often makes it hard to understand the point being made. Quoted descriptions are particularly badly handled; they look like a hurried real-time captioning transcription, with one word after another and another until a period finally occurs to terminate the stream. This explanatory passage is quite unclear:

Work carried out within Audetel using a police drama and 62 elderly subjects revealed that comprehension of the plot and enjoyment of the programme were enhanced by the presence of an audio description. It was not obvious, however, that such a benefit would accrue. The study had to examine the possibilities that the description might have been a distraction, it might have become confused with the main programme dialogue and it might have overloaded the cognitive resources of the viewer.

What wasn’t obvious, and when? Going into the study, was there doubt that elderly subjects could keep up? If the study had to examine the chance of distraction, did it do so? If the descriptions “overloaded the cognitive resources of the viewer,” whatever that means, what is the proposed solution? Describing less often? Describing using simpler words? (We tried that with captioning in the 1980s. It didn’t work. You cannot talk down to an entire audience to avoid confusing a minority.)

Quote from the Guidelines:

Some films [with] more action than dialogue (and often... a continuous musical soundtrack) require almost continuous description and this can prove tiring to listen to. If the gaps between dialogue or commentary are too short, the audio description is more of a hindrance than a help. Where practicable, therefore, each programme or episode should be assessed for suitability by an experienced audio describer.

There seems to be an implication that certain programs should not be described because they require too much or too little description, and that certain episodes of a series should be individually excluded from description plans. The Guidelines go into great detail about certain genres that work poorly with description, like newscasts, and that’s easy to agree with. A very fast dramatic or comedic series, like The Simpsons, could and should be described. And the Guidelines contradict themselves: Long periods of description are ill-advised in certain genres of film, but are OK with Close Encounters of the Third Kind (which the Guidelines tell us required two weeks to describe) and nature documentaries.

Some shows require extended description and some don’t. Is accessibility more important than a rather twee, precious checklist of pretexts to exclude shows from description?
The Guidelines state:

Potentially the largest audience to benefit from audio description is simply those sighted people who do not always wish to direct their visual attention at the television screen.... The ability to record described programme and movie soundtracks on audiocassettes and to play them while driving or travelling on a train is also a major benefit for sighted viewers. Audetel examined the implications of these forms of entertainment on the move for broadcasters and rights owners, and discovered significant commercial potential for the sale of pre-recorded described audiocassettes of popular programmes.

I agree with the sentiment, but see no practical way to persuade any viewers but the most technically keen (that is, the under-35s) to switch on audio description. Even turning captions on and off requires hunting around in onscreen menus on nearly all televisions. Unless and until there are simple, dedicated, standardized buttons for descriptions on/off and captions on/off located on TV remote controls, the fact that the controls are buried in menus will render these useful access provisions all but invisible. I note this because the impression I get from the Guidelines passage is of an undifferentiated “general” viewer, i.e., middle-aged and not technically adept, who would ostensibly activate descriptions while looking away from the TV. Marketing audio-only recordings is a separate issue, and rather more plausible.
On the topic of “regional accents,” here’s what the Guidelines say: “Occasionally a slight regional accent may fit the bill, but each programme has to be assessed separately. [While] a regional accent for a regional programme might seem logical, in practice it is quite difficult to find the accent which suits everybody. For example, there are many Lancashire variations, whereas standard English is at least understood by most people.” The issue is complex in the U.K., with its hundreds of regional dialects and the dominance of Received Pronunciation – the type of English the Royal Family or prestige newsreaders speak. By “standard English,” the Guidelines clearly mean “newsreader accent,” or Received Pronunciation.

Canadian and American descriptions of British programming will always exhibit a contrast between the actors’ many accents and the describer’s. (Same with Australia, one would think.) But we live with this: As the Guidelines tell us, local vernacular must always win out.

Bilingual viewers in Scandinavia were asked if they would prefer an audio description in English or in their own local language when it accompanied an English-language import. The English audio description would have the advantage of matching the programme language and might be available to purchase with the programme but it would not reflect the culture of the target audience. A description in the local language could help to clarify any difficult English programme dialogue, and would avoid any miscomprehension of an English description which might arise. Opinion was unanimously in favour of a local language description.
On the topic of foreign-language films, the Guidelines are very unclear but, after much puzzling, I have figured out that they endorse the best practice: If the dialogue is provided without subtitles or dubbing, do nothing special. If subtitles appear, read them out loud. If the dialogue is dubbed, do nothing special, unless you have to wait a moment for consecutive dubbing. (You find this usually in news programs, where the guest speaks a foreign language for just a moment and an English-speaking voice then reads a translation. Dramatic films are usually dubbed in synchrony, a topic unto itself.) The Guidelines say “Translating the spoken lines might be interpreted as spoon-feeding and not what the programme producers intended.” Spoken dialogue is not visible and must be left alone, with rare exceptions (e.g., clarifying who said what if it is ambiguous from audio alone).
The Guidelines, moreover, appear to suggest that a foreign-language film would have to be dubbed if it were intended for audio description. In other words, we imagine a scenario where a British broadcaster commits to describing Friday-night films. Despite the fact that not a lot of Britons speak a language other than English, and even then they’d have to speak the language of the movie, the broadcaster lines up a foreign-language film with no dubbing or subtitling. Then, suddenly, to provide A.D. for this film, we have to go to the expense of dubbing it. First, this scenario would never happen in a million years. Second, the Guidelines elsewhere tell us that Europeans prefer A.D. in the local language even in foreign-language programming. What’s sauce for the goose is sauce for the anglophone. If you’re given an all-French movie to describe for an English-speaking audience, describe it in English. What’s the problem?
The document is very seriously mistaken in its advice to catch viewers up on what’s previously happened in a series and to explain dialogue.

[A] woman police officer... says: “We’re going to need CID and SOCO.” A moment later a car draws up outside and a woman gets out. She takes a metal attaché case from the boot of her car and goes into the house. CID is a well-known term but SOCO, though perhaps familiar to regular viewers of police series, may not be generally known. Explaining SOCO dispels any doubts that might creep into the audience’s mind.

An extremely dangerous suggestion. Is our goal to describe visual details, or to perform a kind of social work that gives blind viewers an advantage over sighted viewers?

“We’re going to need CID and SOCO” is a line of dialogue. It is audible and invisible. A blind viewer is at no disadvantage whatsoever in hearing the dialogue. This is not a case where the source of the dialogue is unclear from audio alone but unambiguous to sighted viewers.

If a sighted viewer has to sit there and wait to figure out what SOCO refers to, so should a blind viewer. And a deaf viewer, too: The line would be captioned and they’d have to wait to puzzle it out, too. Let’s not play favourites.

It gets worse: The Guidelines actually suggest providing a recap of previous events on a soap opera. Funny, sighted viewers don’t have that advantage. How does this practice fit with the golden rule “Describe what is there,” let alone “Describe what you see”?
“The two burly men are not mentioned by name in the film, but without a name they would have to be described as the fatter, shorter man or the taller, dark-haired man, which is cumbersome, if repeated.” If the men are not named in the film, do not name them. And the rationale given is unrealistic since the description of one of these men relies on exactly the kind of physical characteristics deemed cumbersome: “She is greeted by two burly men. Derek, the shorter and balder of the two, pays the cab. Gordon opens the door.”
Our British colleagues are unfamiliar with what really happens with live events and how this affects description.

Live ceremonial events such as the state opening of Parliament or the inauguration of the Olympic Games are normally narrated with an understanding that there will be visually impaired viewers watching. Usually no more description is necessary.

Really? Where is this true? I watched the Sydney Olympics on three stations in two languages and there was a thorough expectation that viewers could see what the commentators were talking about. In any event, while live description has only rarely been attempted, it is not impossible, and the British should give it a go now and then. If I am not mistaken, no program types are exempted from U.K. audio-description requirements; eventually, techniques for live description will have to be developed.

An entire analysis of the differences between live audio description and play-by-play is available. Authors of the Guidelines really ought to read it.

Conclusions

The Guidelines need significant updating, but are a tremendously valuable and worthy addition to the extremely limited scholarship and portfolio of practical advice on audio description.


See also: Comments on U.K. guidelines on “subtitling”

Updated 2003.01.17, 2007.11.26