Accessibility research roundup
Here’s an occasional Weblog feature giving you a quick overview of, and commentary on, some of the academic research concerning captioning and audio description.
Captioning
And in this section, we focus exclusively on Carl
Jensema of the Institute
for Disabilities Research and Training, the
international megastar when it comes to captioning research.
What’s he written lately?
Carl Jensema, “Viewer reaction to different
television captioning speeds”
American
Annals of the Deaf, 143(4):318–324 (1998)
- Abstract: Video segments captioned at
different speeds were shown to a group of 578 people that included
deaf, hard-of-hearing, and hearing viewers. Participants used a
five-point scale to assess each segment’s caption speed. The
“OK” speed, defined as the rate at which “caption
speed is comfortable to me,” was found to be about 145 words
per minute (WPM), very
close to the 141 WPM
mean rate actually found in television programs.... Participants
adapted well to increasing caption speeds. Most apparently had
little trouble with the captions until the rate was at least 170
WPM. Hearing people
wanted slightly slower captions. However, this apparently related
to how often they watched captioned television. Frequent viewers
were comfortable with faster captions. Age and sex were not related
to caption-speed preferences, nor was education, with the exception
that people who had attended graduate school showed evidence that
they might prefer slightly faster captions.
- Without a doubt, this study represents the most useful
hard data we’ve got about captioning. Simply put, this paper
tells us how fast captions should be.
- Interestingly, the “presentation rates” typically
used by experienced U.S. captioners are also the ones the subjects
of this experiment found the most comfortable. In other words, the
Americans are generally doing it right, at least when it comes to
caption speed (if not editing, placement, or correctness of
transcription).
- Subjects watched a set of custom-made 30-second videoclips with
caption speeds of 96, 110, 126, 140, 156, 170, 186, and 200
WPM. They marked
comfort levels on a five-point scale. The most comfortable reading
speed would be given a score of 3. The captions were pop-on, not
scrollup.
- Most subjects found slow captions a bit too slow and fast captions a bit too fast. “A mean score of 3 would be associated with a caption speed between 140 and 156 WPM. By means of simple interpolation, an estimated ‘OK’ speed of 145 WPM is derived.” (A rough sketch of that interpolation appears after these notes.)
- Fun little factoid: Hearing subjects, most of whom did not have
as much experience watching captioned TV as deaf or hard-of-hearing
viewers, were the ones who were hitting the panic button during the
very fastest segments. Among all subjects and all segments, the hearing participants’ scores for the 186- and 200-WPM segments were the highest. In fact, the average score for the 200-WPM segment was over 4 out of 5
on the difficulty scale – the only score that high in the
whole experiment. Jensema: “My basic conclusion is that the
more hearing [that] people had, the slower they wanted the captions
to be.”
- Jensema provides evidence, based on surveying his subjects,
that hearing people spend significantly less time watching captions
than deaf or hard-of-hearing people. It is not exactly rocket
science to conclude that experience makes you better at watching
captions.
- I would add a personal interpretation here. The entire
closed-captioning system was invented because
“surveys” done by NCI and PBS in the late
1970s showed that hearing people were quite opposed to open
captioning. I’ve read every paper I could get my hands on
since the mid-’80s on the topic of captioning, including the
very earliest ones, and I’ve never seen the results of those
earliest studies published.
- In other words, we invented an entire technology on the basis
of a single opinion poll.
- If you ask hearing people today if they would accept open
captions on their TV shows, they’d probably say no. If you
then ask “How much time do you spend watching captioned TV at
present?” I suspect you would find that those most resistant
to captioning had watched it the least. Most would have never
watched it at all.
- If, moreover, you stopped hearing people in an electronics
store, pointed to captions displayed on the bank of television
sets, and asked them if they would accept open captions,
the answer would again be a firm no. Who can blame them? Standing
too far away from too many screens and attempting to concentrate on
program dialogue and captions in a crowded, noisy retail store are
more than enough to predispose people against captioning.
- Yet if you asked a group of hearing people to watch all their
TV (and home videos) with captions for two solid weeks, it is my
submission that an unprecedented percentage of those respondents
would agree to either keep captions turned on or to accept a
certain amount of open-captioned programming.
- Closed-captioning is in fact the correct access method
in nearly all cases. But open-captioning can work, too. And
closed-captioning is not as unpopular with hearing people as is
assumed.
- In any event, hearing people are now the majority audience of
captioning, as I
have shown elsewhere.
- One must not interpret Jensema’s study as authorization
to edit down fast television dialogue to the magical 145 WPM speed. “As caption
speed increased, the respondents recognized this, but most seemed
able to adjust and did not appear to consider the captions
unacceptable,” even up to 170 WPM. “Only about 1% would
consider 141 WPM ‘too
fast.’ ” Irrespective of hearing status, caption
viewers can keep up with most captioning in the real world.
- It would be useful to conduct future research on the preferred
“presentation rate” of scrollup captioning. For
technical reasons, it’s possible to display scrollup captions
far faster than anyone could reasonably read them over the span of
an entire television program (300 WPM is easily attained). What
are viewer preferences for:
- live-display vs. stenocaptioning – in other words,
captions that scroll up as complete, ready-made lines or appear
word-by-word?
- captions at bottom, top, or moving between bottom and top (as
in many sportscasts)?
- all-capitals vs. upper-and-lower-case captions?
- short vs. long line lengths?
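The 145-WPM “OK” speed quoted above comes from plain linear interpolation between the two tested speeds whose mean comfort scores straddle 3. Here is a minimal sketch of that calculation; the bracketing scores are assumed placeholder values, not the paper’s actual figures, and the function name is mine:

```python
def interpolate_ok_speed(speed_lo, score_lo, speed_hi, score_hi, target=3.0):
    """Linearly interpolate the caption speed at which the mean
    comfort score would equal target (3 = "caption speed is OK")."""
    fraction = (target - score_lo) / (score_hi - score_lo)
    return speed_lo + fraction * (speed_hi - speed_lo)

# Hypothetical mean scores for the 140- and 156-WPM clips (placeholders;
# the paper reports roughly 145 WPM as the interpolated result).
print(round(interpolate_ok_speed(140, 2.9, 156, 3.2)))  # → 145
```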
Carl Jensema, Ralph McCann, and Scott Ramsey,
“Closed-captioned television presentation speed and
vocabulary”
American
Annals of the Deaf, 141(4):284–292 (1996)
- Abstract: [...] Caption data were recorded from
205 television programs. Both roll-up and pop-on captions were
analyzed. In the first part of the study, captions were edited to
remove commercials and then processed by computer to get
caption-speed data. Caption rates among program types varied
considerably. The average caption speed for all programs was 141
words per minute, with program extremes of 74 and 231 words per
minute. The second part of the study determined the amount of
editing being done to program scripts. Ten-minute segments from two
different shows in each of 13 program categories were analyzed by
comparing the caption script to the program audio. The percentage
of script edited out ranged from 0% (in instances of verbatim
captioning) to 19%. In the third part of the study, commonly-used
words in captioning and their frequency of appearance were
analyzed. All words from all the programs in the study were
combined into one large computer file. This file, which contained
834,726 words, was sorted and found to contain 16,102 unique
words.
- The study gives a quickie history lesson of U.S. captioning and
makes a perfectly accurate but unsubstantiated remark: When it
comes to editing captions as extensively as was done on The
Captioned ABC News in the 1970s, “almost everyone now
considers [that] overediting.”
- Deaf viewers have asked for full access to the spoken words of
a program. “Caption companies have tended to interpret this
as meaning deaf people want straight verbatim captioning.”
But how close to verbatim are captions in the real
world?
- For the study, researchers recorded 180 hours of television in
many categories and 22 music videos. (Home videos were also
included, but they were apparently matched against films shown on
TV during the period of the study.)
- Captions were downloaded into a computer file and matched
against time criteria like total program time and appearance and
disappearance of captions. Special consideration was given to
roll-up (“scroll-up,” “scrollup”) captions,
whose lifespans are harder to pin down than pop-on
(“pop-up,” “popup”) captions.
- The study looked at the words edited out of certain selected
programs; the full corpus of words used in all programs and the
list of unique words in that corpus; and caption speed.
- “We found that roll-up captions generally present more
words over a given period than pop-up captions (151 WPM vs. 138 WPM), and that roll-up captions
are used for a wider range of audio speeds, from very slow (74
WPM) to very fast (231
WPM [!]).” It is not clear just where the “evaluation of audio speeds” came from or how it was done. (A rough sketch of how such rates might be derived from caption timings appears after these notes.)
- Many genres of programming “tended to cluster around the
mean captioning speed of 141 WPM.”
- Length of words had no bearing on difficulty of reading. Shows
with very slow or very fast captions had essentially equal average
word lengths.
- Caption editing, if you look at Jensema’s numbers, is not
a huge problem. A subsection of all the recorded programs was
examined for degree of verbatim transcription; “the average
was 94% captioned,” with a low of 81% (explained as an
anomaly; the next one up was 87%) and a high of 100%.
Unfortunately, the qualitative aspects of edited captions could not
quite be considered.
- I remember, back in the dawn of closed-captioning, that the
ill-informed caption “editors” of NCI laboured to shave
individual words off captions. The assumption was that reducing
word length, even by a single word, made a caption easier to read.
- This, of course, is ridiculous: With rare, specific exceptions,
people do not read word-by-word. The eye pogos along the line in
so-called saccades, and words are recognized more by
shape and outline than letter-by-letter.
- NCI would take this mania to extremes, turning copula verbs
into contractions, even when attached to long noun phrases:
“The prime minister of the United Kingdom’s expected to
land in Washington within the hour.”
- I rarely see egregiously inappropriate editing in American
captions – those produced by the big names, at least. It is
entirely commonplace in Canadian captioning to witness an all-out,
indisputable butchering of the source text, particularly for
programs captioned by inept neophytes working in
broadcasters’ own in-house caption departments.
- Yet it is quite likely that even those butchered captions
provide more than 87% captioning, though the goal of equivalence of
access is not met.
- In analyzing the concordance or set of terms in the sample,
“[j]ust 250 words accounted for more than 2/3 of all the
words used in the captions.... [M]astery of fewer than 500 words
will help a viewer to understand most of the vocabulary in any
television program shown in the United States today.”
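Both the speed measurement and the vocabulary analysis described above reduce to straightforward computation once captions are in a machine-readable file. A minimal sketch under assumed conditions – the caption records, field layout, and sample text below are invented for illustration, not the researchers’ data or code:

```python
from collections import Counter

# Hypothetical caption records: (text, appearance time, disappearance time),
# in seconds, with commercials already removed.
captions = [
    ("The prime minister is expected to land", 0.0, 3.2),
    ("in Washington within the hour.", 3.2, 5.9),
    # ... one record per caption for the whole program
]

# Caption speed: total words over total program minutes.
total_words = sum(len(text.split()) for text, _, _ in captions)
minutes = (captions[-1][2] - captions[0][1]) / 60
print(f"caption speed: {total_words / minutes:.0f} WPM")

# Vocabulary analysis: rank words by frequency and count how many of the
# most common words it takes to cover two-thirds of all word tokens.
counts = Counter(word.lower().strip(".,!?") for text, _, _ in captions
                 for word in text.split())
total_tokens = sum(counts.values())
covered = 0
for rank, (_, freq) in enumerate(counts.most_common(), start=1):
    covered += freq
    if covered >= 2 / 3 * total_tokens:
        print(f"{rank} most common words cover two-thirds of all words")
        break
```

Run over the study’s 834,726-word corpus, the frequency loop is where the “just 250 words” figure would fall out.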
Carl J. Jensema, Sameh El Sharkawy, Ramalinga Sarma
Danturthi, Robert Burch, and David Hsu, “Eye-movement
patterns of captioned-television viewers”
American
Annals of the Deaf, 145(3):275–285 (2000)
- Abstract: Eye movement of six subjects was
recorded as they watched video segments with and without captions.
It was found that the addition of captions to a video resulted in
major changes in eye-movement patterns, with the viewing process
becoming primarily a reading process. Further, although people
viewing a specific video segment are likely to have similar
eye-movement patterns, there are also distinct individual differences present in these patterns. For example, someone
accustomed to speechreading may spend more time looking at an
actor’s lips, while someone with poor English skills may
spend more time reading the captions. Finally, there is some
preliminary evidence to suggest that higher captioning speed
results in more time spent reading captions than on a video
segment.
- Three deaf/hard-of-hearing and three hearing subjects sat in a
special apparatus and watched brief captioned segments on a
computer monitor. The apparatus followed the movements of the
subjects’ eyes, tracking exactly where they looked.
- Captioned and uncaptioned segments were watched without audio.
The image content was more or less comparable within matched pairs
of captioned and uncaptioned videoclips. Two additional segments,
custom-made for the experiment, contained precisely 80-WPM and 220-WPM captioning.
- For segments with no captions, eye movements tended to zip
around the screen, with the exception of a Peter Jennings newscast,
in which case eyes tended to focus on Jennings’ head (or
thereabouts).
- But for segments with captions, eye gaze was overwhelmingly concentrated at the bottom of the screen. “The addition
of captions apparently turns television viewing into a reading
task, since the subjects spend most of their time looking at
captions and much less time examining the picture.”
- Rather interestingly, subjects were re-tested with the same videos a few days later. On the second viewing, subjects spent slightly more time looking at the picture, and slightly less reading captions, than they had the first time. That was also true for the
hearing subjects, who had initially had less experience watching
captions (“only the deaf subjects watched it
regularly”).
- For the two special segments, the slow-captioned video gave
viewers more of a chance to watch the video, while the
fast-captioned video forced nearly all the subjects’
attention to the captions.
- “Viewers read the caption and then glance at the video
action after they finish reading.” Testify!
- Captions are alleged to be “distracting” to hearing
viewers.
- We have evidence here that viewers of all stripes tend to spend
most of their time watching captions. As we see from this study,
captions technically are distracting.
- However, the purpose of captions is to be read. Undistracting
captions are failed captions.
- Moreover, we do not have any evidence of understanding of the
program, or retention. Caption viewers will all tell you that,
after you get the hang of reading captions, you don’t really
miss the rest of the action. But you aren’t looking directly
at it very often. Do we infer that peripheral vision comes into
play? Based on my experience, that is clearly true. It’s just
that we don’t have any experimental evidence. It is still
possible for captioning detractors to claim that captions are
“distracting” and force you to ignore the all-important
main video.
Carl J. Jensema, Ramalinga Sarma Danturthi, and Robert
Burch, “Time spent viewing captions on television
programs”
American
Annals of the Deaf, 145(5):464–468 (2000)
- Abstract: The eye movements of 23 deaf
subjects, ages 14 to 61 years, were recorded 30 times per second
while the subjects watched four 2.5-minute captioned television
programs. The eye-movement data were analyzed to determine the
percentage of time each subject actually looked at the captions on
the screen. It was found that subjects gazed at the captions 84% of
the time, at the video picture 14% of the time, and off the video
2% of the time. Age, sex, and educational level appeared to have
little influence on time spent viewing captions. When caption speed
increased from the slowest speed (100... WPM) to the fastest speed (180
WPM), mean percentage
of time spent gazing at captions increased only from 82% to 86%. A
distinctive characteristic of the data was the considerable
variation from subject to subject and also within subjects (from
video to video) in regard to percentage of time spent gazing at
captions.
- Captioning “is a much more complicated process than it
may seem and requires many decisions concerning timing and screen
placement.”
- Four silent custom videoclips, captioned at 100, 120, 140, 160,
and 180 WPM, were used
in the study.
- All 23 subjects were deaf. On average, subjects spent 84% of
their time looking at captions. The range was 82–86% –
near-uniformity, in other words.
- Along with the previous study, we have ample evidence that the
viewing of captioned programming involves spending most of your
time looking at captions.
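The percentages in these two eye-tracking studies amount to classifying each gaze sample (recorded 30 times per second) by screen region and tallying time per region. A minimal sketch, with an assumed caption band at the bottom of the screen and invented sample data rather than anything from the papers:

```python
from collections import Counter

# Hypothetical gaze samples: normalized (x, y) screen coordinates, one per
# 1/30 s, with None where the viewer looked away from the screen.
samples = [(0.48, 0.88), (0.52, 0.91), (0.50, 0.45), None, (0.47, 0.86)]

def classify(sample, caption_band_top=0.8):
    """Assign a gaze sample to a region. Captions are assumed to occupy
    the bottom fifth of the screen (y runs from 0 at the top to 1)."""
    if sample is None:
        return "off-video"
    _, y = sample
    return "captions" if y >= caption_band_top else "picture"

tally = Counter(classify(s) for s in samples)
for region, n in tally.items():
    print(f"{region}: {100 * n / len(samples):.0f}% of viewing time")
```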
Audio description
James Turner, “Some
characteristics of audio description and the corresponding moving
image”
Information Access in the Global Information Economy:
Proceedings of the 61st ASIS [American Society for Information
Science] Annual Meeting, 35:108–117 (1998)
- Abstract: Just as closed-captioning adds
visual information for the benefit of hearing-impaired television
viewers, audio description is a technique which adds an audio track
describing the images for the benefit of the visually-impaired.
This research is concerned with reusing the texts produced for use
by audio describers as a source for automatically deriving
shot-level indexing for film and video products. A first step in
studying the question of recycling audio-description text for this
purpose is to identify the characteristics of it in described
productions.... This paper proposes to flesh out those results by
analyzing the characteristics of a few different kinds of described
television productions, drawing some conclusions about the
usefulness of the technique for purposes of automatically deriving
shot-level indexing for moving-image materials.
- Turner, whose interest lies in indexing motion pictures,
examined three 27-minute segments of DVS-described programming (a
Nova episode, Poirot, and Jurassic
Park).
- Using his own techniques, Turner indexed the contents of the
descriptions and when they appeared.
- The number of “episodes” (really, “instances”) of audio description in the three 27-minute segments was 53, 107, and 197 respectively. Multiplied by four, that equates to 212, 428, and 788 descriptions in an equivalent two-hour program – up to nearly five times as many as the example of live audio description I examined (165 instances). The low end of the scale, however, is comparable (212 vs. 165), and the extrapolated figures may in any case be excessive.
- The proportion of shots accompanied by audio descriptions varied from 36% to 56%. But “in only 25.8% of cases is the audio description text spoken entirely during the shot it describes.” Many descriptions overlap scenes or anticipate scenes (sometimes two scenes away). (A rough sketch of that sort of timing analysis appears after these notes.)
- The research provides an extremely useful list of “types
of information transmitted by the audio-description text”:
- Physical description of characters
- Facial and corporal expressions
- Clothing
- Occupation, roles of characters
- Information about the attitudes of characters
- Spatial relationships between characters
- Movement of characters
- Setting
- Temporal indicators
- Indicators of proportion
- Décor
- Lighting
- Action
- Appearance of titles
- Textual information included in the image
- This study appears to be the first academic research into the
contents and deployment of audio description on television.
(Actually, Turner does have an
earlier study on his site. Indeed, several papers are
available there.)
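Turner’s timing analysis is essentially interval arithmetic: each description and each shot has a start and end time, and a description is “spoken entirely during the shot it describes” only when its interval fits inside the shot’s. A minimal sketch with invented timings – the data layout and numbers are illustrative assumptions, not Turner’s data:

```python
# Hypothetical timings in seconds: shot boundaries, and descriptions
# tagged with the index of the shot they describe.
shots = [(0.0, 4.5), (4.5, 9.0), (9.0, 15.0)]
descriptions = [
    {"text": "She opens the letter.", "start": 1.0, "end": 3.5, "shot": 0},
    {"text": "He waits by the car.",  "start": 4.0, "end": 6.0, "shot": 1},
    {"text": "The train pulls away.", "start": 8.5, "end": 11.0, "shot": 2},
]

def spoken_entirely_within(desc, shot):
    """True if the description's audio fits inside the shot's interval."""
    shot_start, shot_end = shot
    return shot_start <= desc["start"] and desc["end"] <= shot_end

within = sum(spoken_entirely_within(d, shots[d["shot"]]) for d in descriptions)
print(f"{100 * within / len(descriptions):.1f}% of descriptions are spoken "
      "entirely within the shot they describe")
```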