The TILE project is concerned with the creation of accessible learning objects, which may be stored in a repository. Video can be a learning object, and the way to make video accessible to deaf people is to caption it. (Sign language is another option in some cases.)
TILE’s Learning Object Repository aims to make it possible to reuse and modify learning objects according to instructor and learner needs, which may include accessibility.
Online video with captions could be useful in a number of ways, some of them hypothetical at the moment.
What’s standing in the way of the easy reuse of captioned video as a learning object is the incompatibility of data formats.
Original caption files take a number of forms, a fact that is unlikely to change in the foreseeable future.
Even if every set of captions used exactly the same file format, the form and content of different sets might still be incompatible.
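To make the incompatibility concrete, here is a single invented caption cue expressed in two real formats – Microsoft’s SAMI and RealText, the caption format that SMIL presentations for RealPlayer typically point to. Both fragments are illustrative sketches, not complete files:

    <!-- SAMI: one cue, starting at 1,000 milliseconds -->
    <SYNC Start=1000>
      <P Class=ENUSCC>Welcome back to the show.</P>
    </SYNC>

    <!-- RealText: the same cue, starting at one second -->
    <window type="generic" duration="0:30">
    <time begin="0:01">Welcome back to the show.
    </window>

Even at this scale the two disagree on timing syntax and structure, and neither is well-formed XML that a generic parser could ingest.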
It appears that reuse and reformatting of captioned video may not be straightforward for the Repository.
The forms into which online captions are translated will add further constraints all by themselves.
Plain text is one such form. (The format=flowed specification is a kind of semistructured plain text.)

By far the highest-profile transformation is turning captions into full-text transcripts, since transcripts are permitted as an accessibility measure under WCAG 1.0. Many sites with captioned video also provide transcripts, and some sites provide transcripts instead of captioning.
We draw a distinction between two kinds of transcripts.
A human-usable transcript will usually be machine-readable, unless it’s presented in a format machines cannot understand (e.g., a PDF made from scanned pages or faxes).
Most machine-readable transcripts can also be read by people, but that doesn’t mean it’s a pleasant experience – and for learning-disabled persons, screenfuls of plain text are quite inaccessible. Simple text dumps of Line 21 real-time caption files are difficult to read, since they’re usually set in all-capitals with certain artifacts of closed captioning intact (as, for example, the use of >> to denote a speaker change). These would not be considered native Web documents, given their perfunctory use of HTML and poor typographic presentation.
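A fabricated fragment shows the problem:

    >> WELCOME BACK TO THE SHOW.
    >> THANKS FOR HAVING ME.
    >> LET'S PICK UP WHERE WE LEFT OFF.

Everything is set in capitals, speaker changes survive only as >>, and nothing records who is actually speaking – all of it work that a transcript-producing transformation would have to undo or repair.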
Alternatives to inaccessible Web content – e.g., transcripts of uncaptioned video – also have to meet accessibility standards. Web Content Accessibility Guidelines Priority 2 requires valid, semantic markup. That applies equally to transcripts created from captions.
Semantic markup is a term that has grown in currency among standards-compliant Web developers, who are now quite numerous. (Wayne Burkett provides a good introduction and links.) Paul Prescod defines “semantic markup” thus:
It is markup which captures sufficient amounts of the structure of the document to allow a reasonably broad set of automated processes to do their jobs.
For Web sites, semantic markup requires authors to use the actual and most correct element or attribute for the content they’re marking up. Not every chunk of text is a paragraph, for example; some text is an address, a heading, a citation, a quotation, or a list item. Since HTML has a limited range of elements and attributes, sometimes an author will have to make a best guess based on the definitions in the HTML specification.
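For instance, one line of a transcript could be marked up generically or semantically (the speaker and dialogue are invented):

    <!-- Generic markup: everything is a paragraph -->
    <p>Host: Welcome back to the show.</p>

    <!-- Semantic markup: a definition list pairs speaker with utterance -->
    <dl class="transcript">
      <dt>Host</dt>
      <dd>Welcome back to the show.</dd>
    </dl>

Both render legibly, but only the second tells an automated process which text is a speaker identification and which is an utterance.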
For XHTML documents, things are more complex. XHTML is HTML and XML at the same time. XML, by definition, is eXtensible. In theory, you can define your own elements, though in practice few do. (One counterexample is custom-hacking a DTD to make the embed element legal.)
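As a sketch of that kind of hack: an internal DTD subset can declare embed so that a validating parser at least recognizes the element. (The attribute list here is abbreviated, and a complete hack would also have to splice embed into the content models of its intended parents – in XHTML 1.0, by redefining the DTD’s %inline; parameter entity.)

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" [
      <!-- Declare the nonstandard embed element and a few of its attributes -->
      <!ELEMENT embed EMPTY>
      <!ATTLIST embed
        src    CDATA #IMPLIED
        width  CDATA #IMPLIED
        height CDATA #IMPLIED>
    ]>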
Also in theory, different flavours of XML should be translatable. In online captioning, a device should in principle be able to translate SMIL into XHTML or into XHTML+SMIL, since all of these are XML. (At least one such converter has been published, though at the time of writing the actual utility was unavailable on the Web.) Worse yet, the World Wide Web Consortium hasn’t even published a DTD for XHTML+SMIL!
But these converters are unlikely to understand the semantics of the underlying caption file. The XHTML they produce may be minimal or semantically incorrect (or faux-generic, as with the use of p or pre elements throughout). The converters will use XHTML elements, but won’t necessarily use the correct elements. Still, it is possible to list some requirements for a future conversion system.
A system that converts from a SMIL caption file into XHTML should meet these requirements:
- Speaker identifications should be marked up as dt and utterances as dd, forming a definition list.
- Alternatively, speaker identifications can be marked up as th scope="row", utterances as td, and description text as multicolumn td colspan="2", forming a table. This verbose option, which can be tedious to handle in screen readers, should be used when no other format fits the document structure.

Turning captions into transcripts has been presented as an easy and beneficial consequence of doing captioning in the first place. It’s certainly beneficial, but doing it right is not easy.
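As a sketch of what such a system might do – assuming, purely for illustration, a simple XML caption format with captions, cue, speaker, and text elements, since real SMIL presentations keep their caption text in separate and often non-XML files – an XSLT stylesheet could emit the definition-list structure described above:

    <?xml version="1.0" encoding="utf-8"?>
    <!-- Illustrative sketch only: convert a hypothetical XML caption
         file into an XHTML definition-list transcript. The input
         element names (captions, cue, speaker, text) are invented. -->
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns="http://www.w3.org/1999/xhtml">
      <xsl:output method="xml" indent="yes"/>

      <!-- The document of cues becomes one definition list -->
      <xsl:template match="/captions">
        <dl class="transcript">
          <xsl:apply-templates select="cue"/>
        </dl>
      </xsl:template>

      <!-- Each cue becomes a speaker/utterance pair -->
      <xsl:template match="cue">
        <dt><xsl:value-of select="speaker"/></dt>
        <dd><xsl:value-of select="text"/></dd>
      </xsl:template>
    </xsl:stylesheet>

The table variant would substitute th scope="row" and td for dt and dd. Either way, the hard part is not the mechanical transformation but recovering speakers, utterances, and descriptions from caption files that never marked them explicitly.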
It may be too much to expect amateur captioners to produce valid SMIL and valid, semantic XHTML transcripts. To make the process easy for authors, a write-once/use-many approach is what we need. We could imagine a transformation utility like Markdown that lets people transcribe audio in a lightly modified plain text that could be published as-is and also autoconverted to SMIL and XHTML. This is, however, merely hypothetical.
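To illustrate – and the syntax here is entirely invented, since no such utility exists – an author might type:

    Host: Welcome back to the show.
    Guest: Thanks for having me.
    [applause]

The utility could publish that text untouched and also generate, among other outputs, XHTML such as:

    <dl class="transcript">
      <dt>Host</dt>
      <dd>Welcome back to the show.</dd>
      <dt>Guest</dt>
      <dd>Thanks for having me.</dd>
    </dl>
    <p class="description">[applause]</p>

with a SMIL rendering produced from the same source file.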
The Learning Object Repository should always attempt to retain an original file format and any helpful transformations. In this case, original caption files in any format (SMIL, SAMI, or whatever else) should always be retained, with due care to maintain binary files.
But when it comes to transformations, a repository of caption files can use any format that machines can understand. If “machines” means search engines in this case, and “search engines” means Google, nearly any common format could be stored – plain text, Word or PDF files, XHTML. Nonetheless, for human readability and longevity of the saved transcripts, the Repository should prefer valid, semantic XHTML caption transcripts even if they are presently difficult to create.
It may seem as though captions displayed using players’ own functions are more desirable to the Learning Object Repository than burned-in open captions. That’s because the players’ files are composed of actual characters rather than pictures of characters; real characters are easier to transform.
But as we have shown, what’s actually important is valid, semantic XHTML transcripts, or, under battle conditions, plain text or some other form. Burned-in open captions also begin their lives as computer files and are not necessarily any harder to convert to desirable forms. There is no reason related to longevity and reuse to prefer player closed captions over burned-in open captions.