Text and links

AUTHOR’S NOTE – You’re reading the HTML version of a chapter from the book Building Accessible Websites (ISBN 0-7357-1150-X). Copyright © Joe Clark, 2002. All rights reserved.

Text is the most accessible data format there is. The preponderance of “content” in Websites worldwide is, in fact, text.

Any browser can read plain HTML text, and can at least muddle through even the most convoluted HTML text – e.g., text interspersed with inline images; text in table cells; text marked up with unwise formatting codes like <u></u> and <blink></blink>; text structured with the relatively advanced accessibility codes described in this chapter. Modern browsers even take a stab at rendering characters you may not understand (like Japanese, or even the many symbols – “pi” characters, to use the old typographic terminology – available in HTML).

You would have to search high and wide to find any kind of adaptive technology that doesn’t understand and manipulate text. It stands to reason: HTML and its accessibility codes are expressed in text.

There are, however, many subtle wrinkles involved in accessible text design – and one big myth. First, though, a definition.


What is text?

In building Websites, we consider text to be any series of recognized characters expressed in a form readable by computers.

Note that comprehension is not part of the definition. Neither, for that matter, is a human being at the receiving end. The issue is computer manipulability, a necessary prerequisite for accessibility.

This definition excludes “text” that is rendered as bitmapped graphics (as a picture of text) that the human eye can interpret as letters, characters, symbols, or words but with no underlying representation. If I press the e key on my computer, I produce a code for that letter (an underlying structure) and, in nearly all cases, a visual representation of it. If I fire up Photoshop and type an e in a certain font, size, and colour and save that combination as a GIF file, all I have produced is a visual representation with no underlying structure.

And remember, adaptive technology relies on structure. A picture of text is essentially invisible or nonexistent to technologies like screen readers and Braille displays. It also disappears entirely when graphics loading is turned off in a graphical browser. For the purpose of our discussion, a picture of text is more picture than text.

Now, as we saw in Chapter 6, “The image problem,” it is easy to produce multiple text equivalents for any kind of graphic, so pictures of text are not necessarily inaccessible. This chapter explains how to make text as defined here accessible.

Basic facts

Note: If you’re already familiar with HTML basics, you can skip this section.

As you likely know, Websites written in the Latin-alphabet languages typically found online are coded in the rudimentary character set known as US-ASCII, the latter term being a quaint acronym for American Standard Code for Information Interchange. (“Quaint” because, like EPS, VHS, and both meanings of ABS, the expansion of the acronym is either forgotten or irrelevant or both.)

The issue of character sets is convoluted. For our purposes, all you need to keep in mind is that the HTML files that cause Websites to come into being in most Western languages are typed with a rudimentary set of characters. Most relevantly, that set contains no accented letters and relatively few punctuation marks.

So how do we refer to accented letters, exotic punctuation, and anything else beyond US-ASCII? By taking advantage of the fact that computers and computer scientists love to number things. Tens of thousands of characters of every description have already been numbered, in some cases several different times in several different encoding schemes, all of which may someday be subsumed under the big tent of Unicode, which seeks to categorize nearly every differentiable character in human writing systems.

These numbering systems are sometimes incompatible, meaning that the use of a character number from one system may not work on a certain browser or platform. HTML may permit any and all representations from various numbering schemes. This complication, however, can sometimes be avoided: Many characters in HTML also have names made up of letters.

In HTML, you can refer to and invoke characters by their number or name. Both the numeric and alphabetic character representations are called entities, with the latter specifically known as literal or named entities. Of course, this quickly threatens to become a circular argument: We may dress them up in the buzzword of “entities,” but we’re still using characters to refer to other characters, so where do you draw the line?

To tell the browser or other device that we are referring to another character rather than using those numbers or names for their own sake, we encode or escape these characters by surrounding numeric entities with ampersand-octothorpe and semicolon, or &#_____;. Literal or named entities use the appropriate name surrounded by ampersand-semicolon, or &_____;.

Example: The symbol for pounds sterling, £, has the name pound and the number 163. To use that character in an HTML document, type either &#163; or &pound;.

(If you ever need an ampersand for its traditional uses, type the ampersand’s HTML name: &amp;. Don’t worry about semicolons used in their traditional way; they’re always mirrored by unencoded ampersands in character entities and are not misinterpreted by browsers or devices.)
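As a minimal sketch of the notations side by side, the fragment below uses the numeric form, the named form, and the escaped ampersand. The sentences are invented for illustration; the entities themselves are standard HTML.

```html
<!-- Illustrative wording only; both pound signs should render identically. -->
<p>Tickets cost &#163;12 at the door or &pound;10 in advance.</p>
<p>Fish &amp; chips are extra; so is the caf&eacute; au lait.</p>
```

Note that the ordinary semicolon after “extra” needs no special treatment.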

A cottage industry has developed online to document the thousands of available characters and the many available encodings. Visibone.com sells a printed card that’s quite useful.

Character encodings

Ideally, your server, if correctly configured, will transmit the character encoding used in a document inside an HTTP header. It is nonetheless considered good authoring practice for you to declare character encoding yourself within a Web page.

Browsers and devices behave unpredictably if your Web pages do not declare whatever character encoding they use. All you need to do is add the following declaration inside the head element of your page:

<meta http-equiv="content-type" content="text/html; charset=iso-8859-1" />

Don’t create any pages without a declared character encoding. Add it to your page templates, or just copy and paste or drag and drop.
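Here is a rough sketch of a page template with the declaration in place. Only the meta line comes from the text above; the title and body content are placeholders, and the doctype is omitted for brevity.

```html
<!-- Hypothetical skeleton: title and paragraph are placeholders. -->
<html>
<head>
<!-- Declared before the title so the encoding is known before any text appears. -->
<meta http-equiv="content-type" content="text/html; charset=iso-8859-1" />
<title>Page title goes here</title>
</head>
<body>
<p>Page content goes here.</p>
</body>
</html>
```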

That declaration is consistent with the advice of this chapter; the ISO 8859-1 character set encompasses US-ASCII. Experts in multilingual Web design will voice the objection that authors can actually use any of numerous character encodings in their Web pages, making it possible to type, for example, accented characters directly into HTML code as long as you declare the correct character encoding. (An attribute like content="text/html; charset=iso-8859-2" lets you write directly in Polish, for example.)

I have assumed that typical readers will find the US-ASCII character set and 8859-1 declaration sufficient. For readers who already know how to use alternate character encodings, the bulk of this chapter’s advice is unaffected.

Headers and tabbing

What’s the best way to keep your text accessible? Use proper markup.

Yes, I know, how tedious and unsexy. You didn’t buy this book to be told to use <p></p> to mark up your paragraphs. Well, the truth hurts: Strict by-the-book HTML is all but universally readable.

Web designers will gnash their teeth at the prospect of a lifetime of header-tag monogamy. Can you imagine using <h1></h1> through <h6></h6> to describe heading text for the rest of your life?

You may have to imagine it. Screen readers and Braille displays churn through a Web page serially or sequentially, not randomly.

How do sighted people read Web pages? In general, they ignore what isn’t interesting and focus on what is. The decision on what is and is not of interest is made quite literally in the blink of an eye.

Skipping entire regions of a Web page is so automatic for sighted people that it has contributed to the economic decline of the commercial Web. The phenomenon of banner blindness is well documented: People with even a modicum of online experience do not so much as bother looking at banner ads, and indeed they sometimes fail to notice anything that merely resembles a banner ad but is in fact a navigation or “content” element. Among other reasons, banner blindness explains why almost no one selects and follows a banner ad.

Banner blindness has an upside: It creates an instant visual sophistication that enables a sighted visitor to locate the useful components of a page, like the search box, headlines, and body copy, and focus on them right away. True enough, this facility is not foolproof, and substandard information architecture can cause even sophisticated Web users to sit there scanning a page for seconds or minutes trying to figure out what’s what, but the fact remains that random access to Web pages is the norm for sighted people.

A screen-reader user has no such luxury. A screen reader provides a form of serial access to a Web page: You hear or read items one after another in sequence. You cannot instantaneously jump from one part of the screen to another, as you can with the naked eye. Now, there are ways to speed navigation in a page, but the fundamental process is one of sequential travel from one item to the next.

What constitutes an item? HTML structural constructs occupy the highest level of abstraction, including headings formatted in hx elements; unordered, ordered, and definition lists; images; tables and frames; and of course paragraphs and links.

At the rudimentary level, the Tab key is what skips the cursor from item to item. Nondisabled visitors can already do that on nearly any Web page with certain browsers, including Internet Explorer on both Mac and Windows; the concept is taken to the max with screen readers, which provide a panoply of keyboard commands for page navigation. The Tab key barely scratches the surface.

Through accessible HTML (the tabindex attribute, discussed in Chapter 8, “Navigation”), you can provide pit stops along the way as one tabs through a document, but even if you do not take that step, navigating through a Web page more or less means tabbing from item to item.

Styled headings

Screen readers and the like let you select the order in which you’d like to tab through elements, but the fundaments lie in HTML. The Web Content Accessibility Guidelines tell us to use heading elements in strict numerical order – <h1></h1>, then, if necessary, <h2></h2> through <h6></h6> in that sequence. That dictum suits androids and Vulcans quite well, but here in the real world you can skip intervening levels and you don’t have to start at <h1></h1>. I am telling you that you can defy the WCAG in this limited way. You must not, however, use heading elements in anything but ascending order.

If you design a page whose headings are marked up with presentational attributes – even legal or tolerated HTML like the <big></big> element or an align="center" attribute on a paragraph element – it may be apparent to sighted visitors that the text is meant as a heading, but there is no reliable way for a screen reader to select that header in sequence. Adaptive technology relies on structure, remember. h means heading. <big></big> and <center></center> mean big and centred.
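To make the contrast concrete, here is a small sketch of the same heading text marked up both ways. The heading wording is invented.

```html
<!-- Presentational: looks like a heading, but carries no heading structure. -->
<p align="center"><big>Delivery options</big></p>

<!-- Structural: adaptive technology can identify this as a second-level heading. -->
<h2>Delivery options</h2>
```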

Plain heading elements have a murky reputation in large part due to clumsy handling by browsers. There is no credible reason why an unadorned <h1></h1> should be displayed in flush-left 28-point bold type by default in a graphical browser, but that’s the sort of typographic atrocity we’ve put up with since the earliest days of Mosaic, which I specifically remember.

Such type is just too big. In print typography, size increments are small: “Display” or headline type typically runs in steps of 14, 16, 18, 20, 22 or 24, and 24 or 28 point, and there is no necessity for boldface.

Moreover, graphical browsers overlook the structure implied by indention. (Not “indentation.” The word is indention.) Heading elements are either all flush-left or all centred. Lynx, the text-only browser, centres <h1></h1> text, runs <h2></h2> text at the left margin, and indents remaining successive hx levels.

Accordingly, visual sophisticates have been insulted over the years by the default appearance of heading elements. Hurt feelings can be assuaged with stylesheets.
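A stylesheet along the following lines is one way to tame the defaults. The sizes, weights, and indent are illustrative assumptions, not recommendations from the text; adjust them to suit your design.

```html
<style type="text/css">
/* Assumed values: smaller steps between levels, no forced boldface. */
h1 { font-size: 150%; font-weight: normal; }
h2 { font-size: 125%; font-weight: normal; }
/* A modest indent hints at structure for lower-level headings. */
h3 { font-size: 110%; font-weight: normal; margin-left: 1em; }
</style>
```

The heading elements stay in the markup, so the structure that adaptive technology relies on is untouched.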

Generic text tags

HTML comes equipped with a surprising range of “phrase elements” or tags to mark up text – way more than <strong></strong> or <i></i> or whatever else we keep reading about over and over again in intro-to-HTML courses.

The problem? Practically nobody uses them. There may be good reason for this: Too many of the available tags are suited to computer scientists and pretty much no one else.

Let’s run through the list. As usual, naughty presentational tags are excluded.

  1. <abbr></abbr> and <acronym></acronym>, for abbreviations and acronyms. Important enough to warrant their own section, “Accessibility-specific elements,” coming up shortly.
  2. <em></em> for emphasis of any sort other than the special case of citations. How this differs from <strong></strong> has never been entirely clear. Here’s the entirety of the W3C’s advice in this regard: “em: Indicates emphasis. strong: Indicates stronger emphasis.” Thanks for clearing that up. In differentiating the two, I often permit myself the following forbidden presentationalist thought: Use <strong></strong> for anything whose rendering in a bold font you could put up with, since that’s what graphical browsers have used since time immemorial. Use <em></em> for everything else, except...
  3. <cite></cite> for “citations,” meaning titles (of books, films, plays, television programs, court cases, possibly even ships) and words and phrases quoted for themselves. It must be reiterated that citations are not interchangeable with <em></em> for general emphasis.
  4. The oddball computer-science quintet:
    1. <code></code>, for computer code; <var></var>, for variables, like the numbers 1 through 6 in the hx elements; and <kbd></kbd>, to represent text typed at a keyboard. Such elements are used extensively in a book like this, but rarely at an E-commerce site or pretty much anywhere else on the Web, really.
    2. <dfn></dfn> for definitions. Note that this element applies to the term being defined and not the text of the definition.
    3. <samp></samp> for “samples” – of output from a computer program, for example. The idea of small, discrete samples of output that need to be marked up as such harkens back to 1980-era Radio Shack TRS-80 computers programmed in BASIC. Hats off to you if you can figure out an appropriate real-world use for this one.

Of this list, the elements you will use in day-to-day life are <em></em>, <strong></strong>, and <cite></cite>. If you find yourself needing the Albert Einstein set of elements for definitions, samples, and the like, you may wish to set up stylesheets so they won’t all end up rendered in too-small Courier type, which does seem to be the default in every graphical browser you could name. You will cause no lessening of accessibility by using stylesheets and will improve the appearance of your sites.
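The following sketch shows the everyday elements in use, along with a small stylesheet for the computer-science set. The sentences and the stylesheet values are assumptions for illustration, and rendering of the monospaced elements may still vary across browsers.

```html
<style type="text/css">
/* Assumed values: keep the monospaced elements at the surrounding text size. */
code, kbd, samp { font-size: 100%; }
</style>

<p>I <em>told</em> you to read <cite>Bleak House</cite>.
<strong>Do not</strong> skip the preface.</p>

<!-- dfn wraps the term being defined, not the definition itself. -->
<p>A <dfn>literal entity</dfn> refers to a character by name.
Type <kbd>&amp;pound;</kbd> to produce a pound sign.</p>
```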

For completeness, I’ll mention two more text elements, though they’re unrelated to accessibility: <ins></ins> for inserted text and <del></del> for deleted text. They’re reasonably well-supported, and indeed you find <del></del> in common use at E-commerce sites to strike out old prices in favour of the new, low! prices.
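A minimal sketch of the price-slashing case, with an invented product and prices:

```html
<!-- del marks the old price as deleted; ins marks the new one as inserted. -->
<p>Deluxe widget: <del>$49.95</del> <ins>$39.95</ins></p>
```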

Accessibility-specific elements

HTML 4.0 introduced a pair of sometimes-troublesome conjoined twins, <abbr></abbr> and <acronym></acronym>, for abbreviations and acronyms. You may wonder why anyone bothered: Aren’t abbreviations and acronyms already discernible? Abbreviations tend to end in a period; acronyms tend to be written in capitals.

Screen readers, however, are too stupid to figure out those nuances. Sentences can end in periods; if a screen reader encounters an abbreviation also terminating in a period, does it indicate the end of the sentence? Capital letters are used for many purposes, and all-caps acronyms can be found in all-caps text (“VAT INCREASE PROPOSED”).

Also, let’s not be presumptuous about language diversity. Many languages (and some national English variants) do not use periods in abbreviations. Some acronyms (even in English) use upper and lower case. Some languages lack a concept of case entirely but nonetheless use rules for creating abbreviations.

At first blush, the use of <abbr></abbr> and <acronym></acronym> is straightforward. Just surround the abbreviation or acronym with the element, and type the full expansion inside the title="" attribute:

  1. <acronym title="not in my backyard">NIMBY</acronym>
  2. <abbr title="continued">cont’d</abbr>

If you’re switching languages for some reason – if, for example, you are writing a section in a book on foreign-language acronyms (and you know how often that task comes up) – you can add lang="" and xml:lang="" attributes. Don’t bother if the language of the text (as declared in the document’s <head></head>) and of the acronym or abbreviation are the same.

  1. <abbr lang="de" xml:lang="de" title="Hauptbahnhof">Hbf</abbr>
  2. <acronym title="società per azioni" lang="it" xml:lang="it">SpA</acronym>

Do you really have to use these tags for every abbreviation or acronym, no matter how well-known? Technically, yes: The Web Accessibility Initiative Web Content Accessibility Guidelines tell us to “[s]pecify the expansion of each abbreviation or acronym in a document where it first occurs.”

But in reality, sometimes the expansion, as in the case of EPS and both meanings of ABS, is irrelevant. Does anyone really care anymore that a fax is a facsimile?

Legal company designations, like <acronym title="società per azioni" lang="it">SpA</acronym> cited above, are a good example. They tend to be read as irreducible units whether or not every letter is articulated. Rules vary: In English, “Inc.” is read aloud as “incorporated” or “ink,” but either way everyone knows it is a legal company designation. Whether or not people know how “Inc.” varies from “Ltd.,” “PLC,” or any other designation is irrelevant from an accessibility standpoint.

Many abbreviations are so well-known, and so unlikely to trip up a screen reader, that using the “required” tags seems like gilding the lily. An example given in a WAI resource document actually marks up the acronym WWW, rather proving my point.

The unwritten intent of the Web Content Accessibility Guidelines is to mark up unfamiliar or ambiguous acronyms and abbreviations. And yes, you only have to do it the first time it appears in a document. If you’re doing a search-and-replace, however, having decided it is easier to bang out the plain text of an article about EPS graphics and bolt on the access tags later, you might as well replace all occurrences at once. What with search engines, browser Find functions (letting you locate any text on a page), and your own possible use of anchor elements for navigating within a page, you cannot be sure that people will read from the very beginning. And what if the first instance of an abbreviation or acronym comes up at word 300 of a thousand-word document? Should we have to go hunting for the first instance merely because we started reading at word 700 and you’ve used the access element only once?

The <abbr></abbr> and <acronym></acronym> elements are, moreover, an imperfect guarantee of accurate voicing by screen readers. It’s not really their fault. There is no guarantee that a screen reader will pronounce an abbreviation or acronym more accurately merely because it is marked up as such, if only because pronunciations are unpredictable. You should use them anyway; while they are imperfect, they are generally quite functional.

Some exceptions come to mind. Don’t use the elements for initialed proper names: Composer J.S. Bach is J.S. Bach and not <acronym title="Johann Sebastian">J.S.</acronym> Bach. (Writing “J.S. Bach” is more than sufficient to differentiate him from C.P.E., J.C., or W.F. Bach.) Don’t use the elements even if you write initialed proper names without periods and they form possible words: Architect IM Pei is IM Pei and not <acronym title="Ieoh Ming">IM</acronym> Pei.

Units of measure should be exempted. Even imperial units like pounds, with the Latinate abbreviation lbs., are well enough understood to be left alone. (Screen readers ship with pronunciation exception dictionaries. A common abbreviation like lbs. will likely be pronounced correctly. If not, you the user can add it to your custom dictionary.)

Metric symbols are a trickier case, since they are in fact symbols and not abbreviations. While this is something of a legalistic distinction in cases like kg (kilograms) or kWh (kilowatt-hours) that tend to be pronounced letter-by-letter or in their full expansion, we nevertheless find single-letter symbols like m (metres) or a (years: a for annum) whose pronunciation is ambiguous. Rare combinations, like am for attometres, pose the same problem.

But it is a problem that the brute-force remedies of <abbr></abbr> and <acronym></acronym> elements cannot solve. There is no element in HTML for ambiguously-pronounced units of measure. It must also be pointed out that nondisabled people reading printed text may have to stop and think for a moment to decode an ambiguous metric symbol, too. In this case, a screen-reader user may not be significantly worse off than a nondisabled person.

Merely on the face of it, the definitions of the <abbr></abbr> and <acronym></acronym> elements preclude using them for symbols of any kind. Just use the symbol by itself, properly escaped as a numeric or literal entity. Don’t write <abbr title="pounds sterling">£</abbr> or <abbr title="plus or minus">±</abbr>.

(So what do you do if a symbol combines a character outside of the US-ASCII set alongside one that is from that set, as in °C? Best leave it alone. If <abbr></abbr> and <acronym></acronym> are too hamfisted to handle attometres, how can we expect them to handle degrees Celsius?)
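In practice, leaving symbols alone looks something like the sketch below. The sentence is invented; the entities are standard HTML.

```html
<!-- No abbr or acronym wrappers: the escaped symbols stand on their own. -->
<p>Admission is &pound;8, and the forecast high is 21&nbsp;&deg;C (&plusmn;2).</p>
```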

Browser support

Support for <abbr></abbr> and <acronym></acronym> elements in real-world browsers is not fantastic. A case could be made that graphical browsers shouldn’t have to support <abbr></abbr> and <acronym></acronym> at all given that they’re intended for adaptive technology, but they’re useful enough to nondisabled people that some kind of interface to their underlying title="" attribute is in order. Besides, some browsers already provide visual support.

  1. Some versions of Internet Explorer on Mac and Windows pop up a tooltip when you hover the mouse over an item coded with <abbr></abbr> or <acronym></acronym>.
  2. Macintosh IE 5 attempts to render all acronyms in small capitals (indistinguishable from all-caps rendering for all-caps acronyms, but not all acronyms are written that way); Mac IE 5 does nothing special to indicate abbreviations.
  3. Some Windows versions render <acronym></acronym> but not <abbr></abbr>. Windows IE 4 truncated the <acronym></acronym> text down to the first word.
  4. Netscape 4 provides no interface for the title="" attribute on <abbr></abbr> and <acronym></acronym>. Netscape 6 and later and Mozilla show a tooltip for both. Those browsers also mark both elements with dotted underlines – by far the most sophisticated approach extant.
  5. iCab on Macintosh – well-behaved as ever with advanced HTML – underlines <abbr></abbr> and <acronym></acronym> text and reveals the title="" attribute in the status line, and actually changes the hover cursor to remind you to look there.
  6. Lynx does nothing special for <abbr></abbr> and <acronym></acronym>.

Stylesheet issues

If you accept the premise that <abbr></abbr> and <acronym></acronym> provide accessibility in a catholic sense, to anyone who cares to learn the expansion of an abbreviation or acronym with or without the use of adaptive technology, you might as well make users of graphical browsers aware that <abbr></abbr> and <acronym></acronym> are actually available.

A quick and dirty way to do so is to set up a stylesheet that underlines <abbr></abbr> and <acronym></acronym> text. Are you aghast? Aghast that I would counsel the use of the typographic abomination known as underlining? Especially since we also use underlines for links, and a certain Danish-American usability “super-expert” has declared that designers should never make links confusable with anything else?

Well, bollocks. Selecting links in graphical browsers is a multi-stage process with feedback all the way.

Let’s envisage the worst-case scenario: Underlining <abbr></abbr> and <acronym></acronym> text causes everyone to mistake the text for links. When you hover the cursor over the text, there is no change to a link cursor. The cursor may otherwise change (as in iCab’s case), but not to a link cursor. If you’re using keyboard access, you’ll skip the <abbr></abbr> or <acronym></acronym> because it isn’t a link.

What’s the problem?

Short answer: There isn’t one. Indeed, there are advantages. Underlining abbreviations and acronyms makes them noticeable and invites visitors to poke at them, causing the desired tooltip to appear. The prime disadvantage is æsthetic, but we’re already underlining links hither and yon, so the waters are already muddied.

How do you do it? Add this to your stylesheet:

abbr, acronym { text-decoration: underline }

What we really need is a different kind of underlining, like dotted underlines, to denote metadata of this sort. Mozilla and Netscape 6 and later do just that, and it’s quite attractive. (If you use a stylesheet like the one just mentioned, those browsers will render a dotted underline and a continuous one, double-underlining your abbreviation or acronym.)

And you can actually do that in a stylesheet:

abbr, acronym { border-bottom: 1px dotted gray }

Looks very nice.
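Put together with markup, the dotted-underline approach might look like the sketch below. The sentence is invented, and the cursor: help declaration is an extra of mine rather than part of the advice above; drop it if you prefer.

```html
<style type="text/css">
/* Dotted underline from the rule above; cursor: help is an optional addition. */
abbr, acronym { border-bottom: 1px dotted gray; cursor: help; }
</style>

<p>The rezoning plan drew the usual
<acronym title="not in my backyard">NIMBY</acronym> objections.</p>
```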

Even rarer cases

It is incumbent on me to point out that we do find the occasional term written in capitals or otherwise resembling an acronym that actually is not. Is KISS an acronym or a clownish, superannuated rock embarrassment? The Macintosh file format PICT is not an acronym – and not much of an abbreviation, either. The spy game is replete with oddball acronym–abbreviation amalgams, none of which I particularly understand or will bother to define but which make for a fun list: ACINT, ACOUSTINT, COMINT, ELECTRO-OPTINT, FISINT, HUMINT, IMINT, IRINT, LASINT, MASINT, NUCINT, OSINT, RADINT, RF/EMPINT, RINT, SIGINT, TELINT. None of the examples in this paragraph require HTML accessibility tags as far as I'm concerned.

Also, the Web itself has spawned a rash of abbreviations taking alphanumeric form – e.g., the unholy conjoined twins favoured by marketing poseurs, B2B / B2C (business-to-business and -consumer); P2P (peer-to-peer); i18n (internationalization) and l10n (localization). Certainly those last two deserve proper <abbr></abbr> markup in any document a layperson is likely to read.
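For those last two, the markup is a sketch away. The sentence is invented for illustration:

```html
<p>Budget time for <abbr title="internationalization">i18n</abbr> and
<abbr title="localization">l10n</abbr> before launch.</p>
```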

Small capitals deserve their own discussion. Some typographic style guides advise typesetting upper-case acronyms and any other sequence of capitals in small caps. Ostensibly, full-height capital letters are too conspicuous. I have never subscribed to this style, which, in my opinion, only ever works for acronyms that are pronounceable words, like NORAD.

Out in the real world of print typography, it is still too difficult to use true small-caps typefaces to set such passages; ignorant designers, or those with no other option, merely use full capitals a point or two smaller in size, which end up looking too light and spindly. Two methods are in use: Typing text and applying a Small Caps character attribute, or typing text and manually resizing it.

In any event, when such documents are “repurposed” for the Web (surely the most disagreeable word ever coined in the English language), invariably the small-cap styling is lost. A chief miscreant here is HTML export from Quark Xpress, a decidedly antitypographical program that practically begs a designer to use fake small caps.

The problem? If, somewhere along the copy chain, the acronym had been typed in lower case or mixed case and merely styled as small caps, unless extraordinary measures are taken that word will be exported to HTML in the original lower case or mixed case.

In the following example – from Playbackmag.com, quite real, and a textbook example of a worst-case scenario – the nonacronym CODCO is written in full capitals (it’s the KISS-like name of a defunct comedy group). MOW, meaning movie of the week, is rendered both correctly and incorrectly, most egregiously in the plural (“mows” – as in “Dad mows the lawn every couple of weeks”?). YTV and CBC (both TV networks) and U.S. are incorrectly written in lower case, while CBS comes through all right. The phrase “delayed cbs mow” is particularly rich. Note also the lack of heading markup on “MOW Power to them” and the absence of anything resembling italics (not <cite></cite>, not <em></em>, not even <i></i>) throughout. And if anyone’s keeping score, Kirstie Alley spells her name thus.

Their CODCO days are over, but the talented Greg Malone and Tommy Sexton are back at the cbc, this time to do the musical MOW for the Mother Corp., Adult Children of Alcoholics. Malone and Sexton are writing and starring in the movie. They have enlisted Paul Brown (I Love a Man in Uniform) to produce. The project was scheduled to shoot this month but has been delayed.

MOW Power to them

Power Pictures is cranking out the mows in association with Hearst Entertainment. And Then There Was One is a movie starring Amy Madigan and Dennis Boutsikaris for Lifetime in the u.s. Angela Bromstad is producing and David Jones is directing. Executive producer is Freyda Rothstein and supervising producer is Julian Marks.

CBS is getting David’s Mother, an mow starring Kirsty Alley that’s being serviced here. Director is Robert Allan Ackerman and producer is Bob Randall. Jennifer Allward is executive producing and supervising producer is Julian Marks. Production on David’s Mother starts Sept. 15.

The delayed cbs mow Ultimate Betrayal is back on track with director and executive producer Donald Wrye. The shoot is scheduled to begin Sept. 27. Producer is Julian Marks. Ultimate Betrayal stars Marlo Thomas, Ally Sheedy and Mel Harris.

As the outdated, gender-specific saying goes, “Men, don’t let this happen to you.” In cases like these, it is arguably less important to ensure that abbreviations and acronyms are marked up in their respective HTML tags than to ensure that the damned things come through in their proper case.

If you wish to retain the small-caps appearance, a stylesheet like the following will work:

abbr, acronym { font-size: smaller; letter-spacing: .1em; text-transform: uppercase }

which you may combine with my previous advice to yield:

abbr, acronym { font-size: smaller; letter-spacing: .1em; text-transform: uppercase; text-decoration: underline }

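In this example, font-size: smaller shrinks the capitals relative to the surrounding text, letter-spacing: .1em opens up the spacing between letters slightly, and text-transform: uppercase forces the text into capitals even if it was typed in lower or mixed case.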

In CSS1, there does exist a small-caps attribute, which sometimes works:

abbr, acronym { font-variant: small-caps }

Quotations and making editing obvious

Back in our halcyon youth, when we ran red lights in our Camaros (or Minis, or Renault 5s, as culturally appropriate), we unabashedly deployed an element called BLOCKQUOTE to indent snippets of text.

We are now all grown up and have various mortgages and certificates of equivalent-to-spouse status weighing us down, and we are sober enough to recognize that the <blockquote></blockquote> element, as it is now known in the minusculist orthography of XHTML, is reserved for quoting blocks of text. We feel shame for having pressed it into unnatural service in the early days of our dalliance with the Web.

We know, moreover, that indenting a section of text is as simple as adding style="margin-left: 2em" or something along those lines to whatever element we’re using, like <p></p>.

But how many of us know that blockquotes have a new younger sibling, and have added a piercing in a place not immediately obvious to outsiders?

Start with the new feature added to <blockquote></blockquote>. In XHTML, you are now encouraged to cite the URL of the document from which you are quoting. This of course presumes you’re quoting from a document that’s actually online.

The syntax is pretty simple: <blockquote cite=""></blockquote>. Just dump in whatever URL is appropriate between the quotation marks. Typical examples will look like this:
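A minimal sketch of the markup, assuming a placeholder address (http://www.example.com/article.html is hypothetical and stands in for whatever document you are quoting):

```html
<!-- The cite attribute carries the URL of the quoted document; this address is a placeholder -->
<blockquote cite="http://www.example.com/article.html">
  <p>Various quoted words, running to a paragraph or more.</p>
</blockquote>
```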

Browser support for this feature is terrible. Only iCab, Mozilla, and Netscape 6 and later give you access to the citation.

Now for the little sibling. XHTML provides structured markup for quotations used within paragraphs or in any similar block-level structure. Just wrap the quotation in <q> and </q>. Simple? Yes. If for some reason you wish to cite another Web page as the source of your quotation, you can add cite="", like so:

<q cite="http://www.newsbytes.com/pubNews/00/156859.html">Various quoted words</q>

The <q></q> element allegedly makes it easier for adaptive technology to isolate quotations from the surrounding text. Since the element has distinct opening and closing tags, then yes, software can work out where a quotation starts and ends more easily. Yet it is spectacularly rare to find a case in which the start or finish of a quotation marked up with simple quotation marks is ambiguous.

Using <q></q> has certain heavily-qualified advantages in multilingual text: You can drop a French quotation into an English paragraph and, if you declare the quotation’s language using the lang="" attribute, a device might use proper French quotation marks rather than English. The sentence –

He was laying on the charm, telling me I had “a certain <q lang="fr">je ne sais quoi</q>”

– might be rendered thus:

He was laying on the charm, telling me I had “a certain «je ne sais quoi»”

But of course this would be improper typography in any context other than language teaching: The dominant language is the language of the paragraph, not the quotation, and its rules for quoting text prevail.

Moreover, current graphical browsers render <q></q> with double neutral quotation marks, if they render it at all; several, like Netscape 4, do not. A failure to indicate quotations, through lack of support of a “correct” HTML element, is arguably worse than simply using neutral quotation marks. And not all English-speaking countries use double quotation marks as their first choice.

Proper typographer’s quotation marks, as preferred in the English language, are easily used:
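As a sketch, a sentence with a nested quotation might be encoded as follows. The sentence is invented for illustration; the entities are the standard ones for English curly quotation marks.

```html
<!-- Numeric entities: &#8220; and &#8221; are the double opening and closing marks,
     &#8216; and &#8217; the single opening and closing marks (&#8217; doubles as the apostrophe) -->
<p>She said, &#8220;Don&#8217;t call it &#8216;content.&#8217;&#8221;</p>

<!-- The same sentence using the named entities &ldquo; &rdquo; &lsquo; &rsquo; -->
<p>She said, &ldquo;Don&rsquo;t call it &lsquo;content.&rsquo;&rdquo;</p>
```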

If you’re writing in Latin-alphabet languages other than English (I suppose this applies to Greek and Cyrillic-alphabet languages as well), you will need language-specific quotation marks. Working on the assumption you know which character to use in the respective language but do not know how to encode it, here are nearly all the available options:
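A few of the more common marks can be sketched like this. The sample phrases are illustrative assumptions, and which marks a given language actually prefers is something to confirm against a style guide for that language.

```html
<!-- Guillemets: &#171; (&laquo;) opens, &#187; (&raquo;) closes; &#160; is a no-break space -->
<p lang="fr">&#171;&#160;Bonjour&#160;&#187;</p>

<!-- Low-high double marks: &#8222; (&bdquo;) opens, &#8220; (&ldquo;) closes -->
<p lang="de">&#8222;Guten Tag&#8220;</p>

<!-- Also available: single guillemets &#8249; (&lsaquo;) and &#8250; (&rsaquo;),
     and the low single mark &#8218; (&sbquo;) -->
```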

The numeric entities, at least for English-language characters, are very well-supported, even in Netscape 4, and are unique, hence entirely unambiguous, save for the double application of &#8217; or &rsquo; as apostrophe or closing single quotation mark. But that ambiguity is built into English. True, <q></q> can remedy that extremely modest and rarely-encountered confusion, but that’s hardly a persuasive advantage.

In Cascading Stylesheets Level 2, there is a way to control the quotation-mark characters used to surround <q></q> text. Use this declaration:

q {quotes: '\201C' '\201D' '\2018' '\2019'}

Real-world testing shows one must use the Unicode hexadecimal escaped character sequences, which are so complicated and extraneous I have avoided talking about them so far.

In the declaration, you must list, in order, the high-level opening, high-level closing, low-level opening, and low-level closing quotation marks. Hardly worth the bother. You can vary the declaration by language, should you use multilingual text, using q:lang(languagecode), such as q:lang(en) for English or q:lang(fr) for French. I don’t see how this is really worth the bother, either.
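A per-language version of the declaration might look like the sketch below. The English rule repeats the values given above; the French nested pair is an assumption chosen for illustration, not a recommendation.

```html
<style type="text/css">
  /* English: double curly marks outside, single curly marks for nested quotations */
  q:lang(en) { quotes: '\201C' '\201D' '\2018' '\2019' }

  /* French: guillemets outside; the single guillemets used for nesting are an assumption */
  q:lang(fr) { quotes: '\AB' '\BB' '\2039' '\203A' }
</style>
```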

Use <q></q> if you want. It won’t hurt – most of the time. But it’s not significantly better than neutral quotation marks and is generally inferior to typographer’s quotation marks.


I’m splitting the discussion of accessibility of links into two parts. In Chapter 8, “Navigation,” you’ll learn about the sexy accesskey and tabindex attributes. But for now, there are two very simple guidelines.

If you want to be thorough, don’t run two consecutive text links together without a printable character between them. Some early screen readers will enunciate the pair of links as one link, and for users of graphical browsers it is next to impossible to tell that adjacent words are different links. Webloggers love the effect produced by four or five separate consecutive links related to the same concept, but it’s inaccessible.

On the other hand, the case can be made that users of old screen readers really should upgrade; even consecutive links are actually perfectly discernible in the underlying HTML: The sequence <a></a> <a></a> <a></a> is quite self-evidently a set of three self-contained hyperlinks.

In any event, which printed character should you use to separate links? In a navigation bar or some other region of text whose entire raison d’être is a set of links, a vertical bar (|) will do. (The character is sometimes hard to find on foreign-language keyboards. You can always use &#124; or &verbar;.)

For a better typographic appearance, an en dash (&#8211; or &ndash;) is very nice and is supported in every browser I’ve ever seen (using the numeric entity, at least – Netscape 4 chokes on the named entity, to no one’s surprise). The pilcrow or paragraph symbol (&#182; or &para;) has its merits. Note that some screen readers have not yet been upgraded to understand every HTML character and may skip over these escaped entities. The goal of separating adjacent links will nonetheless be met.
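Here is a sketch of both approaches in a small navigation bar. The file names and link text are placeholders.

```html
<!-- Vertical bars between adjacent links -->
<p><a href="home.html">Home</a> | <a href="archive.html">Archive</a> | <a href="contact.html">Contact</a></p>

<!-- The same links separated by en dashes, using the numeric entity -->
<p><a href="home.html">Home</a> &#8211; <a href="archive.html">Archive</a> &#8211; <a href="contact.html">Contact</a></p>
```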

The big myth

At the outset of this chapter, I warned of one big myth in the accessibility of text. It’s this: The most accessible sites are text-only.

It is not true. Maybe in 1996, sure, but adaptive technology and HTML have advanced a bit since then. Everything you find on the Web that isn’t text – images, multimedia, Flash – can be made at least partially accessible, and in the most common case, GIF and JPEG images, something broadly approaching full accessibility is possible. As I continue to emphasize in this book, only a tiny few access techniques require an overt visual form at all, and in the majority of cases you can provide a fully-accessible page with no changes to your layout whatsoever. Remember, we want beauty and accessibility.

Text-only parallel sites are to be discouraged. They’re too easily neglected. They assume you’re more disabled than you may actually be, and that you are alone: Providing text-only pages as the accessible form of your site assumes that disabled visitors have no use for visual appearance or structure, require something “separate,” and will never look at your pages accompanied by nondisabled people. What, you’ve never been to a business meeting? Blind people don’t have sighted family members? You’ve never had friends over and noodled around online?

A case can actually be made that custom-generated, media-dependent pages enhance accessibility. Stylesheets make provision for this in the aural, braille, and tty media types, among others. In Chapter 15, “Future dreams,” I discuss the options for automatic transformation of Web pages for optimized use with a screen reader. Yet these are not the same as stripping out all the pictures and typographic formatting from your site and calling what you did accessibility.
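As a rough sketch, media-dependent stylesheets can be attached to a single page like this. The file names are placeholders, and whether a given device actually honours these media types is an assumption you would need to test.

```html
<!-- One stylesheet per medium, all attached to the same HTML page -->
<link rel="stylesheet" type="text/css" media="screen" href="screen.css" />
<link rel="stylesheet" type="text/css" media="aural" href="aural.css" />
<link rel="stylesheet" type="text/css" media="braille" href="braille.css" />
```

The markup and content stay the same in every case; only the presentation varies by medium.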

Sure enough, the separation of style and content espoused by advocates of cascading stylesheets shares many goals with text-only pages. Or seemingly so. But you can’t run stylesheets without HTML markup. And that’s really what people imagine when visualizing a text-only page: Stripping out not only the markup for images, layout, and typography, but every other fragment of markup, too. Goodbye, <h1>! Goodbye, everything, in fact – the entire World Wide Web turned into a mass of Readme files. But that sort of thing is de trop. It throws out the baby with the bathwater.

Moreover, masses of uninterrupted, unadorned text are confusing to learning-disabled people. Your “accessible” text-only page may be less accessible than your illustrated real page to this hard-to-accommodate group.

Text is the most accessible data format there is. But nobody authorized you to give us nothing but text in the name of accessibility. And after reading this book, you will know better than to try.

Bottom-Line Accessibility Advice

Basic, Intermediate, and Advanced accessibility
