Why I Hate Online Captioning
(Because it sucks compared to real captioning!)

Notes from a presentation delivered 2007.10.04 at An Event Apart San Francisco.

A few odds ’n’ ends I mentioned:

Best practices in online captioning, an old research project
The Open & Closed Project, my accessibility research project still in search of funding
Article about how online versions of TV shows don’t have captioning

And a serious correction about captioning at NBC.com: I really should have double-checked NBC.com while in American domainspace before I got up onstage and said they were the only ones doing anything in online captioning. They are, but it sucks too. Not only is only one show captioned, they have yet again found a new and shocking way to completely screw it up. (Continuous scrolling text in a frame to the right of the image, with upcoming text clearly visible and the current text scrolled upward into a reverse-type field. And! All capitals! 1979 called; it wants its captioning back.)

Notes

Yes, I retired from Web accessibility. But this presentation isn’t about Web accessibility. It’s more about online accessibility.
Captioning vs. subtitling
- Differences
  - Captioning is a transcription of dialogue and meaningful sound effects for deaf viewers.
  - Subtitling is a translation of some dialogue and onscreen type for hearing viewers who do not understand the original language.
  - Subtitlers are the most arrogant and ideological people you’ve ever met. They make Christian fundamentalists look like hippies.
  - Subtitles are usually shown at bottom centre, and some people insist on that. Subtitlers never want subtitles to be more than two lines long. Some people insist that one line be longer than the other.
  - Subtitles edit. Subtitlers believe, without serious evidence, that people cannot read more than 130 to 150 words a minute.
  - Subtitles do not indicate sound effects, they don’t indicate who’s speaking, and they specifically and intentionally leave things out, like words or phrases you are expected to understand. After all, you are a worldly hearing person who knows what oui or non or merci means, and you don’t need those translated. And they usually don’t translate singing.
  - Captioning isn’t like that. Captions move to indicate who’s talking, or they write out the name of the speaker, or both. In principle captioning is verbatim or nearly so, because we know that people can handle captioning well above 200 words per minute for long periods. We notate all the important sound effects. We caption songs.
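Here are rough numbers on why subtitles have to edit. If dialogue runs faster than the reading rate a subtitler assumes, the extra words have nowhere to go. The Python sketch below works through that arithmetic. The helper name is hypothetical, and the 200-words-per-minute speaking rate in the example is an illustrative assumption, not a measurement.

```python
def words_that_fit(words_spoken: int, speech_wpm: float, reading_cap_wpm: float) -> int:
    """Return how many spoken words can survive if the text shown
    may not exceed reading_cap_wpm over the same stretch of time.

    Hypothetical helper for illustration only.
    """
    if speech_wpm <= 0 or reading_cap_wpm <= 0:
        raise ValueError("rates must be positive")
    minutes = words_spoken / speech_wpm        # how long the dialogue actually lasts
    budget = int(minutes * reading_cap_wpm)    # words the reading cap allows in that time
    return min(words_spoken, budget)


if __name__ == "__main__":
    # Assumed example: one minute of dialogue spoken at 200 words per minute.
    spoken = 200
    for cap in (130, 150, 200):
        kept = words_that_fit(spoken, 200, cap)
        print(f"cap {cap} wpm: keep {kept}, cut {spoken - kept}")
```

Under those assumptions, a 130-to-150-wpm ceiling forces the subtitler to drop between 25 and 35 per cent of that minute’s words. Captioning at 200 wpm or more keeps them all.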
Kinds
- Let’s look at captioning on television in the U.S. and Canada
  - We’ll start with the concept of the vertical blanking interval (VBI), the part of the television signal that isn’t displayed as picture. We can hide information there, on the 21st line of the television signal (Line 21).
    - You mostly know captioning from newscasts, which are done using real-time captioning.
    - The method of display there is called scrollup, for obvious reasons. (If you have trouble keeping these separate, scrolling is vertical and crawling is horizontal. You can’t make captioning crawl; it only ever scrolls.)
    - The other kind, the kind you see on prerecorded programs, is called pop-on captioning, because one caption appears as a block and is replaced by another caption or a blank screen.
    - You can mix and match those presentation styles in the same program. And there’s a third one that’s rarely used (paint-on).
  - HDTV: You’re all busy buying plasma screens so you can play Wii, and they also can show high-definition television:
    - HDTV does not have a VBI, but carries extra information like captioning in defined data structures.
    - Ostensibly we’ve got better fonts, but it’s all about backward compatibility with Line 21. HDTV doesn’t even use Unicode.
- Decoders
  - They’re built into TVs now and have been since 1993, but back in the day you needed external decoders.
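The difference between scrollup and pop-on display is easy to model. This Python sketch is a toy, not a Line 21 decoder. The function names are hypothetical, and the three-row scrollup window is an assumption, since roll-up captions can also use two or four rows. Paint-on is left out.

```python
from collections import deque
from typing import Iterable, List, Optional


def scrollup(lines: Iterable[str], rows: int = 3) -> List[List[str]]:
    """Return what is on screen after each new line arrives in scrollup mode.

    Each line enters at the bottom and pushes the others up; once the
    window holds `rows` lines, the top line scrolls off.
    """
    window: deque = deque(maxlen=rows)  # deque drops the oldest line automatically
    screens = []
    for line in lines:
        window.append(line)
        screens.append(list(window))
    return screens


def pop_on(blocks: Iterable[Optional[List[str]]]) -> List[List[str]]:
    """Return what is on screen after each pop-on caption.

    Each block replaces the previous caption all at once; None stands for
    clearing the screen between captions.
    """
    return [[] if block is None else list(block) for block in blocks]


if __name__ == "__main__":
    news = ["Good evening.", "Our top story tonight", "is about captioning.", "More after this."]
    for screen in scrollup(news):
        print(screen)
    for screen in pop_on([["Where are you going?"], None, ["Out.", "Don’t wait up."]]):
        print(screen)
```

In scrollup, a line stays visible until enough new lines arrive to push it off the top, so each screen mixes old and new text. A pop-on block arrives and leaves as a unit.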
And then we come to the Web
- Nobody ever really seriously intended to put captioning on the Web.
  - Every single enterprise is a test project or relies on crowdsourcing or something like that.
    - There’s free captioning software you can download, like Magpie, but it hasn’t really been updated or replaced after all these years.
    - There’s the site called Dotsub, which hasn’t really worked out. It’s supposed to be a central repository of homemade captions and subtitles for other people’s work, which in most parts of the civilized world is illegal right there.
    - Anyway, the results are usually terrible.
  - Nobody seriously intends to make captioning as common online as it is on TV or in home video.
  - There are too many file formats.
    - You can save captions in at least five formats just for QuickTime, RealPlayer, and Windows Media Player.
    - There’s a claimed standard format, called SMIL, from the W3C. QuickTime and RealPlayer understand it, though they don’t understand it the same way.
    - QuickTime and RealPlayer have their own text formats.
    - And Microsoft has its own format, called SAMI. It does less, nothing else uses it, and it isn’t even fully documented on the Microsoft Web site.
    - In theory there is another W3C-standard file format, called Timed Text and developed in a quasi-secret process. I have my suspicions that the chair of that working group is a defence contractor in some way.
  - Doing this is virtually impossible with open-source software. Almost the only person who has even bothered to try is Mark Pilgrim.
    - The exception here is fansubbing of TV shows and movies, mostly Japanese anime shows. A lot of those fansubbers use Linux and are open-source zealots. They’re also the only people who have that much time to waste dicking around with helper applications and running shell scripts. Everybody else just wants to watch TV. And anyway, that’s subtitling, not captioning.
  - Unlike on television, online we cannot hide caption data in the video signal.
    - You always have to have some kind of external file.
    - Or you could send out a SMIL file that refers to the original video and to the captioning track.
    - Your computer might not know what to do with that file, since lots of different applications can open it, including GoLive. That’s what happened on my machine for a long time: if I double-clicked on a SMIL file that was supposed to start up a video with captioning, GoLive opened it instead.
  - Converting captions is difficult
    - There are no really reliable ways to translate existing captions to Web formats.
      - You can look around and you’ll find some claimed solutions, but nothing works perfectly.
      - One of the many problems is character encoding. There are three different character encodings just for Line 21 captions, and there’s no way to tell them apart up front.
      - Character encoding is a problem even with online-native captions.
      - And if you try to convert scrollup captions to pop-on captions, the results can be unreadable – several scrolling lines replaced with single pop-on lines.
      - And remember when I said that HDTV doesn’t even use Unicode?
  - Everyone makes the same mistake of insisting on closed captioning.
    - When people online make the mistake of insisting on closed captioning, they’re replicating the broadcasting model: On TV, exactly one signal has to work for everybody. So you have to embed accessibility into the signal.
    - But online, we have as many signals, or channels, as we want. You can have separate open-captioned and uncaptioned video.
    - If you wanted a captioned version, you’d just hit that link. If you didn’t, you wouldn’t. It’s technically much simpler and it’s more native to the online “space,” as it were.
  - No one ever bothers with open captioning. That could be a cheap method for programming that has already been shown on TV, or at least for programming with pop-on captions.
    - Here you just decode the TV closed captions and record the resulting image.
    - All it takes is a broadcast-quality decoder, which costs about 600 bucks, and a willingness to create two video files rather than one.
    - In 2002, we ran an experiment at CBC.CA. We digitized many news reports with captions and without, and saved them with different URLs and links.
  - It is perfectly normal to have the same program captioned and recaptioned and rerecaptioned and rererecaptioned. I have seen five captioned versions of the movie Alien, and that’s without even looking at British or Australian versions. Here’s another version, dumbly rerererecaptioned in scrollup.
  - It’s not uncommon for a TV show to lose its captions on the way to DVD, or to be completely recaptioned.
  - So: Preserving captions across formats is difficult in practice.
- Captions look bad
  - Captions are all displayed in the wrong place, especially in Flash.
    - Most of the time, captions are displayed out of frame:
      - There was always a need for that, especially for really low-vision people, and you can even buy external displays for your captioning. If you watch a letterboxed movie on a fullscreen display, parts of your captions or subtitles will live in the black box under the image, so that’s another precedent.
      - But nobody ever asked for captions to be moved offscreen into a separate box.
      - This has practical implications. A lot of beginners, and some really bad or cheap “professional” captioners, make the mistake of typesetting captions strictly at bottom centre, as though they were subtitles.
        
        Maybe centred text is “classy” in some way they can’t define.
        
        But it’s also totally wrong for captions, which must move from left to right, and occasionally up and down, to correspond with the location of whoever is speaking.
        
        You can move left and right with captions that are displayed out of frame, but they’re so far away from the speakers that you might as well not bother.
        And that only works in some formats.
      - Caption text is way too big compared to the frame
        You can easily find online captions that are one-third or one-half the size of the video frame.
        
        And, since the frame is so small to begin with, just the distance from the edge of the frame to the captions is enough to make it nearly impossible to follow the picture with your peripheral vision.
        Yes, that really is how captioning viewers watch TV, and I have the research papers to back that up.
        That’s why if you’re trying to learn about captioning, you have to watch nothing but captioning nonstop for about two weeks just so you can retrain your visual processing.
        
        If you find this topic discussed at all, it’s claimed to be an advantage over TV captioning. Finally the captions aren’t “blocking” the screen! They’re less “distracting”! But you simply cannot compare a 13-inch TV picture with a 2½-inch online video picture. You aren’t watching them the same way.
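The character-encoding trouble raised under “Converting captions is difficult” is easy to reproduce with online-native caption text. This Python sketch uses three common Web encodings (UTF-8, Windows-1252, ISO 8859-1) as stand-ins. It does not model the three Line 21 character sets, and the helper name is hypothetical. It shows that bytes which decode without an error tell you nothing about whether you guessed the right encoding.

```python
from typing import Dict

# Stand-in candidates: common Web encodings, not the Line 21 character sets.
CANDIDATES = ("utf-8", "cp1252", "latin-1")


def possible_readings(raw: bytes) -> Dict[str, str]:
    """Decode raw caption bytes with each candidate encoding and keep
    every result that decodes without error (hypothetical helper)."""
    readings = {}
    for encoding in CANDIDATES:
        try:
            readings[encoding] = raw.decode(encoding)
        except UnicodeDecodeError:
            pass  # these bytes cannot be valid in this encoding
    return readings


if __name__ == "__main__":
    caption = "Café au lait, s’il vous plaît"
    for label, raw in (("saved as UTF-8", caption.encode("utf-8")),
                       ("saved as Windows-1252", caption.encode("cp1252"))):
        print(label)
        for encoding, text in possible_readings(raw).items():
            print(f"  read as {encoding}: {text!r}")
```

The UTF-8 bytes “decode” cleanly all three ways. Read as Windows-1252, the UTF-8 bytes for Café come out as CafÃ©, and nothing in the file itself says which reading is right.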
Aren’t the fonts too small?
- But sometimes you get the opposite complaint. People who never intended to give you online captioning anyway often complain that the captions will be too small to read. False!
- Most of the time, they’re actually way too big for the frame.
- If you’re using captions decoded from TV, they’re exactly the same relative size as on TV, and you’re watching the video from a shorter distance away.
- The problem is really one of screenfonts. And I have a Screenfont project just for that, located, logically enough, at Screenfont.CA.
  - By far the biggest problem is letterspacing that’s too tight. But that’s the same problem everywhere in captioning.
  - And of course some people use lousy fonts, like Comic Sans.
- Handhelds
  
  In other words, how can I get captioning on my iPhone?
  - Deaf people have iPods, or, more accurately, hard-of-hearing people do. My friend’s iPod, for instance, is connected to a transceiver that wirelessly beams music into his hearing aids.
  - Allegedly you can watch videos you buy from iTunes with closed captioning, if it exists on the video.
    - I don’t know how they’re managing that technically, since you always need an external file with MPEG or any other kind of computer video file. So are you downloading one file or two?
    - And I haven’t heard of anyone who actually has downloaded and watched an iTunes video with captioning. The whole thing is vapourware at this point.
  - I don’t know how they’re going to make that work. I certainly know they’re doing it wrong. They shouldn’t be bothering with closed captioning. There should simply be separate open- and closed-captioned versions.
Video scales to the Web. Captioning doesn’t
- In the old days, there was one video producer and many broadcasters. Now there are many producers and only a few broadcasters. Today the broadcasters are mostly YouTube or iTunes. We’ve inverted the pyramid, but it’s the same pyramid.
  - But one consequence is we’re going from lots of captioning to the inverse of that, almost no captioning.
- For amateur videos:
  - Nobody has the time or the expertise to do it.
  - I’m not wild about civilians captioning their own material. For all my complaints about shitty captioning, even shitty “professional” captioning will be better than amateur captioning.
  - We do hire professional contractors and architects to make buildings accessible; I don’t see why every single part of the medium of online video has to become amateur hour.
- For professional videos:
  - People are acting like it’s 1978 and there is no technical way to caption. They’re just pretending it’s impossible.
  - The fact that there’s no government regulation is another reason for the lack of captioning. And there probably isn’t going to be regulation of online video, at least in constitutional democracies.
- When the day comes that everything is online, next to nothing will be captioned.
Captioning is in steady and possibly irreversible decline. (I could go into a song-and-dance about that for these notes, and I may do so later, although elsewhere.) That’s why I started my research project, which is unpopular.
But there is a bigger ethical issue. Fundamentally, deaf people are not more or less important than blind people or anyone else.
- I am not clear on why we’re always talking about captioning but never talking about audio description.
- The kinds of programming you find online are not super-visual. You can get by somewhat adequately from the soundtrack alone. That’s why so many TV shows have audio-only podcasts that work just fine.
- But there are still a lot of programs that you can’t understand just from the soundtrack.
  - Certainly every fictional narrative program, like a drama or a comedy from television.
  - Even some current-affairs programs don’t really work just from listening to the soundtrack.
  - So if you don’t own a TV, and you’re really righteous about that, but you download Battlestar Galactica and The Daily Show and watch those, you need to know that blind people really cannot figure out those shows without audio description.
- And it is difficult or sometimes impossible just to manipulate the onscreen player controls using keyboard-only or a screen reader.
- So in fact, while deaf people get all the attention (enough to have me standing up here talking about them for an hour), blind people may have the worse accessibility problem, even though it comes up less often.
Conclusion
- I hate online captioning because it sucks compared to real captioning. There isn’t enough of it, it’s technically too complicated, closed captioning is the wrong idea in the first place, and every aspect of the execution is lousy.
- This is your Internet on captioning. Any questions?

Audience reactions

I have the results of the comment cards from my performance at An Event Apart. I haven’t read them, because I was specifically warned in advance that I scored low. “If only you gave us news we could use,” I was essentially told.

Everybody knew up front what my topic was, and it was approved from start to finish in the planning process. Audience members know what kind of trouble they’re in for if they ever have to deal with captioned video. What other news, exactly, could you use?

Posted: 2007.10.08 ¶ Updated: 2007.12.04, 2009.09.10
