SGML and audio description: Some possible uses

SGML can be applied to the practice of audio-describing film and video. First, go read my discussion of possible uses of SGML in captioning. (Or skip to the new example I’ve worked up.)

One way of employing SGML for audio description (A.D.) would encompass the dialogue as delivered, the text of whatever A.D. track(s) we’re putting together, and any onscreen text.

Someone would laboriously transcribe the original audio. If the tape were also being captioned, we would simply import the verbatim transcript. If it’s already been captioned, we could download the caption text and check it against delivery to create a verbatim transcript.

We would then optionally add structural encodings of the sort I described on the captioning page. (It’s optional if the ultimate goal is an A.D. file of some sort; it should be mandatory if you don’t know what the ultimate use will be.) So we would mark up all the emphases, names of movies, speaker IDs, and so on, as well as (crucially) when the dialogue starts and stops. We need to know this anyway because typically we provide A.D. during dialogue pauses. We would also encode when music and meaningful sound effects come and go; sometimes we narrate over these and sometimes we don’t.
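As a rough sketch of why the dialogue start and stop times matter, the following Python finds the gaps between dialogue segments that are long enough to hold a description. The tuple layout, the `find_pauses` name, and the two-second minimum are all assumptions made for illustration; nothing here comes from an existing A.D. tool.

```python
def find_pauses(dialogue, min_gap=2.0):
    """Return (start, end) gaps between dialogue segments.

    dialogue: list of (start_seconds, end_seconds) tuples (hypothetical layout).
    min_gap:  shortest gap, in seconds, worth offering to the describer
              (an assumed threshold, not an industry figure).
    """
    pauses = []
    last_end = 0.0
    for start, end in sorted(dialogue):
        if start - last_end >= min_gap:
            pauses.append((last_end, start))
        # Overlapping speakers: keep the latest end time seen so far.
        last_end = max(last_end, end)
    return pauses


# Invented timings: two lines close together, then a five-second pause.
dialogue = [(1.0, 4.5), (5.0, 9.0), (14.0, 20.0)]
pauses = find_pauses(dialogue)
print(pauses)
```

The same function could be run over the music and sound-effect timings, leaving the describer to decide whether to narrate over them.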

We’d add notations of onscreen titles and when they appeared and disappeared.

Now we start the A.D. process and write our script using the normal literary techniques. These bursts of text would also carry start and finish (or on and off, or in and out) times. Also, we would alter the A.D. version of the onscreen text accordingly (if you listen to and watch a DVS Home Video, the way the narrators read titles is quite different from the way they are displayed) and mark that up with on and off times.

If we’re using more than one narrator, each narrator would be separately coded. This is pretty simple:

<narrator 1>
<narrator 2>
<narrator all>
<narrator any>
<narrator male>

Our next step depends on the sophistication of the editing suite. We have two options:

Case 1: The way we do things now, with a large analogue recording studio operated by a skilled technician. We do make use of digital timecode here, but usually only for human interpretation and not as a means of setting processes in motion without human intervention.

In this case, we would do our standard editing thing, making sure to alter the A.D. script in the computer to match what was actually said. (Scripts are rarely adhered to perfectly during actual recording sessions.) We then send the tape out for production and ordinary people buy it and play it.

In the meantime, we use the fact that all the text in the movie-- original dialogue, original titles, sound effects, plus audio descriptions-- is timecoded to create a master text file, in any format you could think of, that comprises the whole script and the whole A.D. track. Voilà! You now have a complete text-only analogue of a motion picture. We have never been able to produce such a thing in the history of cinema. Even screenplays, very much including post-facto screenplays published after a movie makes it big (look at Oliver Stone’s huge and ugly annotated screenplay for Nixon), do not contain this level of detail. (Obviously we could also accommodate the kind of annotations found in published screenplays. Publishers, are you listening?)
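Here is a minimal sketch of how timecoded streams might be merged into one master text file. The `Segment` class, its field names, the output format, and the sample lines are all invented for illustration; a real system would read these values out of the SGML markup.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds from the start of the programme
    end: float
    kind: str     # "dialogue", "title", "sound", or "description"
    text: str


def master_text(*streams):
    """Merge any number of segment lists into one time-ordered transcript."""
    merged = sorted(
        (seg for stream in streams for seg in stream),
        key=lambda s: (s.start, s.end),
    )
    return "\n".join(
        f"[{s.start:.1f}-{s.end:.1f}] {s.kind.upper()}: {s.text}" for s in merged
    )


# Invented sample content, one list per stream.
titles = [Segment(0.0, 4.0, "title", "Toronto, 1996")]
dialogue = [Segment(5.0, 8.0, "dialogue", "Where were you?")]
descriptions = [Segment(8.5, 12.0, "description", "She closes the door.")]

transcript = master_text(dialogue, titles, descriptions)
print(transcript)
```

Because every segment carries its own times, the order in which the streams are passed in does not matter.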

Assuming appropriate copyright issues were worked out, this text-only analogue could be uploaded somewhere, transmitted, searched, summarized, or whatever else we wanted to do with it.

We could also use this file to create A.D. for duplicate prints with different frame rates (such as a motion picture at 24 frames per second, overseas TV at 25 or 24 fps, or a QuickTime video at 10 fps). The software could make a first pass at condensing the A.D. text to fit, or at least it could flag the passage so that a human operator decides how to shorten the text. Then that file could be uploaded, transmitted, searched, and so on. We could also create a file of contracted Braille for deaf-blind people, who otherwise have no real access to audio-visual media.
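The flagging step could be sketched as follows, assuming that in and out points are stored as frame numbers and that the same frames are simply played back at a different rate. The function names, the reading rate of 2.5 words per second, and the sample sentence are assumptions made for illustration, not measured values.

```python
def available_seconds(frame_in, frame_out, playback_fps):
    """Time a span of frames occupies when played at playback_fps."""
    return (frame_out - frame_in) / playback_fps


def needs_shortening(text, frame_in, frame_out, playback_fps,
                     words_per_second=2.5):
    """True if the description cannot be read aloud in the time available.

    words_per_second is an assumed narration speed; tune it per narrator.
    """
    needed = len(text.split()) / words_per_second
    return needed > available_seconds(frame_in, frame_out, playback_fps)


# Invented example: a ten-word description over a 98-frame pause.
text = "She crosses the empty street and unlocks the red door."
fits_at_24 = not needs_shortening(text, 240, 338, playback_fps=24)
flag_at_25 = needs_shortening(text, 240, 338, playback_fps=25)
print(fits_at_24, flag_at_25)
```

In this invented case the description fits at 24 fps but is flagged at 25 fps, because the same 98 frames go by faster.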

It could also be translated into other languages. In fact, the file we have created will be worth its weight in electrons for subtitlers and dubbers. They could buy our file and simply output only the main dialogue and titles, then translate those. Or someone who was clued in could do a translation of the A.D. as well. (Has this been tried yet? We may need to train dubbers for this task.)

Case 2: In the near future, digital editing software would have provisions for A.D. and we would simply create the A.D. version on a computer. The system would inhale the SGML text and automatically stop and start the video intelligently to allow for recording of description snippets. If we miss a take, in a matter of instants we could be back at the original edit for a retry; currently we have to rewind and try again. And erasing a digital recording is an instantaneous process; we wouldn’t have to record over an analogue recording.

Then we could do all the same things as in case 1, but even more easily because all the data are sitting on the same machine.

Also, SGML would make it easy to do two kinds of A.D. for the same film: What I call conventional or interlude description (what we have now, with narration almost exclusively limited to pauses in dialogue) and continuous description (describe absolutely everything from start to finish). Imagine that we are producing one of those DVD things that everyone is telling us will be fabulous. DVDs can carry several audio tracks (the DVD-Video format allows up to eight). Assume we have three at our disposal for main dialogue + other information. We could use one track for main audio without descriptions, another for main + interlude descriptions, a third for main + continuous. You would then have the option of watching the film as many times as necessary to understand all the action. The current use of videotape as a medium of distribution à la DVS Home Video has severe limitations here.

In this example, SGML would let us fine-tune the level of description in the ultimate computer file we send out to the world. Maybe for average Internet users reading the text of a film online, interlude description would be sufficient. For cinephiles obsessed by frame-by-frame study, continuous description would be de rigueur.
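Choosing a level of description at output time might look like the sketch below. It assumes each description segment is tagged as either interlude or continuous; the dictionary keys, the `select_track` name, and the sample lines are hypothetical.

```python
def select_track(segments, level):
    """Keep everything that is not description, plus one level of description.

    level: "none", "interlude", or "continuous" (assumed labels).
    """
    kept = []
    for seg in segments:
        if seg["kind"] != "description":
            kept.append(seg)   # dialogue, titles, sound effects always stay
        elif seg["level"] == level:
            kept.append(seg)   # only the requested kind of description
    return kept


# Invented sample: one line of dialogue, described at two levels of detail.
segments = [
    {"kind": "dialogue", "text": "Where were you?"},
    {"kind": "description", "level": "interlude", "text": "She closes the door."},
    {"kind": "description", "level": "continuous",
     "text": "She closes the door slowly, eyes on the floor."},
]

main_only = select_track(segments, "none")
with_interlude = select_track(segments, "interlude")
with_continuous = select_track(segments, "continuous")
print(len(main_only), len(with_interlude), len(with_continuous))
```

The three results correspond to the three audio tracks imagined above: main audio alone, main plus interlude description, and main plus continuous description.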
