Joe Clark: joeclark.org


SGML for access: Yes/No

for readers who do know something about access technologies but do not know much about SGML

So let's talk about SGML. The acronym stands for Standard Generalized Markup Language. It's a means of using ordinary ASCII text – what you get when you type away on your computer without adding font specifications or changing the point size or underlining or italicizing anything – to "mark up" other text according to its structure.

The issue of structure (or function) is critical to SGML. We'll define this by example. Let's think of all the ways we can use italics in ordinary English text (which, in the following examples, all make use of the HTML <i> tag for simplicity):

You could think of other uses for italics, I'm sure. Now, the point here is that one overt form – italic type – is used to denote a wide range of structures or functions. In SGML, we would typically try to keep those different structures distinct by using tags, or markups. Tags generally take the form of a word written between angle brackets, like <emphasis> or <title>. To continue with the example of italics, we could use SGML to define various structures that correspond to our uses of italics. (The beauty of SGML is that, while a great many structures are already defined and ready to use, we can define our own structures at will.) We could create tags like:

<emphasis> </emphasis>
<terminology> </terminology>
<title> </title>
<foreign> </foreign>
<ship> </ship>
<annotation> </annotation>

(We use a slash / to turn a tag off. Not all tags need to be turned off, but in this case they all do.)

We could then rewrite our example sentences as follows:

Why would we want to go to all this trouble? Well, having marked up or tagged our text according to structure, we could let the interpreting program decide how to format it. This concept will be familiar to you from using Web browsers, which interpret HTML, a bastardized application of SGML that underlies all Web pages. If you've ever looked at HTML text directly, you'll know that it's a complicated mass of <s and >s and formatting codes. But when that mishmash is read by a program like Lynx or Netscape, it's turned, in ideal cases, into a visually appealing and semantically meaningful design, with typographic attributes, spacing, graphics, and the like.

In the examples above, if whatever program you were using understood SGML, you could command your program to use italic characters for emphasis, foreign words, and titles, but not for annotations or ship names. Or if you were more concerned with spotting all the foreign words in a document (let's say you were trying to simplify it), you could set up the <foreign> tag to be displayed in a large font.
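
To make that idea concrete, here is a minimal Python sketch of an "interpreting program" that decides how each structure is displayed. It is only an illustration: the STYLES table, the render function, and the sample sentence are all made up for this example, and the regular expression handles only simple tags without attributes, which is far less than a real SGML parser does.

```python
import re

# Hypothetical style table: structural tag name -> (opening, closing) HTML form.
# The tag names are the ones listed earlier; the choice of display is an assumption.
STYLES = {
    "emphasis": ("<i>", "</i>"),
    "foreign": ("<i>", "</i>"),
    "title": ("<i>", "</i>"),
    "ship": ("", ""),        # no italics for ship names
    "annotation": ("", ""),  # no italics for annotations
}

# Matches simple tags such as <ship> or </ship> (no attributes).
TAG = re.compile(r"<(/?)([a-z]+)>")

def render(text, styles=STYLES):
    """Replace each known structural tag with its presentational form."""
    def swap(match):
        closing, name = match.group(1), match.group(2)
        if name not in styles:
            return match.group(0)  # leave unknown tags untouched
        opening_form, closing_form = styles[name]
        return closing_form if closing else opening_form
    return TAG.sub(swap, text)

# Illustrative sentence, not taken from the original examples.
sample = "The <ship>Titanic</ship> was <emphasis>not</emphasis> unsinkable."
print(render(sample))  # prints: The Titanic was <i>not</i> unsinkable.
```

To spot foreign words instead, you would change only the "foreign" entry of the table (say, to a pair of <big> tags) and leave the tagged text itself alone.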

Also, you can search for and replace these tags, or otherwise manipulate them. And this is where we begin to see the relevance to access issues. Let's continue with our example of italics. In captioning in North America (sorry, in quality captioning), captioners use italics for all the traditional applications in print typography, but also to denote interior monologue (thinking), narration, and offscreen speech. But not all captioners use exactly the same overt form. Sometimes, as in a nature documentary where the only human speech is from a narrator, italicizing every caption is seen as unnecessary, and an explicit annotation is used instead:
Narrator:
[NARRATOR]
[ Narrator ]

If our caption file were saved in SGML format with, say, a <narration> tag and a full suite of other tags (this is not a piecemeal approach I'm talking about), we could do various things:
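
As one illustration of the kind of thing this makes possible, here is a minimal Python sketch that takes a caption tagged with <narration> and renders it in whichever overt form a given captioner prefers. The HOUSE_STYLES table, the render_caption function, and the sample caption are hypothetical, invented for this example; the three annotation forms are the ones shown above.

```python
import re

# Matches a <narration> element and captures the text inside it.
NARRATION = re.compile(r"<narration>(.*?)</narration>", re.DOTALL)

# Hypothetical house styles: each turns narrated text into one overt form.
HOUSE_STYLES = {
    "italic": lambda text: f"<i>{text}</i>",
    "colon": lambda text: f"Narrator: {text}",
    "caps": lambda text: f"[NARRATOR] {text}",
    "spaced": lambda text: f"[ Narrator ] {text}",
}

def render_caption(caption, style):
    """Render every <narration> element in the caption using the named style."""
    formatter = HOUSE_STYLES[style]
    return NARRATION.sub(lambda match: formatter(match.group(1)), caption)

# Illustrative caption, not taken from a real caption file.
caption = "<narration>The herd moves north.</narration>"
for style in HOUSE_STYLES:
    print(render_caption(caption, style))
```

The caption file is written once, with the structure marked, and the choice among italics and the various annotations is made only when the captions are displayed.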

Now that you have some understanding of SGML, you may wish to look at other online resources on the subject, like sil.org. Also try SoftQuad and the newsgroup comp.text.sgml.

You should now look at the Common Issues homepage.