Dubliners

If monopolies are undesirable – Microsoft was convicted of monopolistic (“anticompetitive”) behaviour – why has no one objected to the fact that the world now has but one viable search engine, Google?

Back in the old days, search engines “spidered” the full text of documents (Google still does) and also paid attention to the meta keywords element (Google still does). Unintended consequence of the Google monopoly: Its page-ranking system, in which pages linked to more often are presented higher in search results, has obscured the potential for self-categorization. Page rankings prevail; they are the only metadata with enough star power these days to be gossiped about on Page Six.

Now, is there another way to do things?

For some applications, yes. The Open Directory Project, which is pretty much dead in the water these days, gave us a hint of an idyll that would remain unattainable. The ODP’s function of categorizing the Web has legs; sometimes you want to read everything there is on a topic. (Did you just get diagnosed with a disease? Are you suddenly interested in, say, Islam or anthrax?)

The Open Directory Project unwittingly concedes its fatal flaw right up front: It is “the largest, most comprehensive human-edited directory of the Web.” You can’t rely on human beings to scour the Web. Everyone burns out, and anyway, how are they going to do it? By searching Google, which itself uses ODP data, and thereby triggering a dizzying feedback loop?

What we need is for Web pages to categorize themselves, in a form that computers could then read and collect. It’s already possible, but it ain’t happening. Dublin Core metadata can be added to any Web page and let you categorize its subject matter on a range of criteria, including Library of Congress and other bibliographic classifications (much more applicable to the Web than is generally known), author name, free-form subject text, and more.
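A page’s self-categorization only pays off if software can read it back. Here is a minimal Python sketch of that reading step, using nothing but the standard library’s `html.parser`. It assumes the common convention of naming Dublin Core elements with a `DC.` prefix in the `name` attribute (`DC.subject`, `DC.creator` and so on); `extract_dublin_core` is a hypothetical helper for illustration, not part of any existing tool.

```python
from html.parser import HTMLParser


class DublinCoreExtractor(HTMLParser):
    """Collect <meta name="DC.*" content="..."> values from an HTML page.

    Assumption: Dublin Core element names carry a "DC." prefix; the prefix
    and element name are matched case-insensitively.
    """

    def __init__(self):
        super().__init__()
        self.elements = {}  # element name -> list of values, in page order

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").strip()
        content = attributes.get("content")
        if name.lower().startswith("dc.") and content is not None:
            element = name[3:].lower()  # "DC.Subject" -> "subject"
            self.elements.setdefault(element, []).append(content.strip())

    # The inherited handle_startendtag calls handle_starttag, so
    # XHTML-style "<meta ... />" tags are picked up as well.


def extract_dublin_core(html_text):
    parser = DublinCoreExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.elements


page = """<html><head>
<meta name="DC.title" content="Dubliners">
<meta name="DC.subject" content="Metadata">
<meta name="DC.subject" content="Search engines" />
<meta name="keywords" content="ignored">
</head><body></body></html>"""
print(extract_dublin_core(page))
# {'title': ['Dubliners'], 'subject': ['Metadata', 'Search engines']}
```

Ordinary `keywords` meta elements are skipped on purpose; only the Dublin Core set is collected.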

Pretty much any idiot can create Dublin Core metadata using the sexy (it is Swedish) Dublin Core Metadata Template. Just fill in the blanks and out pops a set of meta tags ready for inclusion in a page. (They’re HTML 4–compliant; for XHTML, you need to convert the tags to lowercase and close each one with a space and a slash: />.)
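The HTML 4-to-XHTML conversion in that parenthetical is mechanical enough to script. This Python sketch lowercases the element and attribute names of `meta` and `link` tags, leaves attribute values such as `DC.Title` untouched, and closes each tag with ` />`. It assumes well-formed, double-quoted attributes with no `>` inside any value, which is the shape generated markup usually takes; `convert_meta_tags` is a hypothetical name, and a real converter would want a proper parser.

```python
import re

# One <META ...> or <LINK ...> tag, already self-closed or not.
# Assumption: attribute values are double-quoted and contain no ">".
TAG_RE = re.compile(r"<(meta|link)\b([^>]*?)\s*/?>", re.IGNORECASE)
ATTR_RE = re.compile(r'([A-Za-z][\w.:-]*)\s*=\s*"([^"]*)"')


def to_xhtml(tag_match):
    name = tag_match.group(1).lower()
    # Lowercase attribute names only; values keep their original case.
    attrs = " ".join(
        f'{attr.lower()}="{value}"'
        for attr, value in ATTR_RE.findall(tag_match.group(2))
    )
    return f"<{name} {attrs} />" if attrs else f"<{name} />"


def convert_meta_tags(html4):
    """Rewrite every meta/link tag in html4 into XHTML form."""
    return TAG_RE.sub(to_xhtml, html4)


print(convert_meta_tags('<META NAME="DC.Subject" CONTENT="Metadata">'))
# <meta name="DC.Subject" content="Metadata" />
```

Other markup passes through unchanged, so the function can be run over a whole template.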

This process does not remove the human element; it merely uses the human element where it already exists. You’re already creating your page or template; just add this data during the creation process. It isn’t onerous, and even a multi-topic Web page, like the homepage of a Weblog, can nonetheless be categorized.

Now, if everybody used Dublin Core, the Open Directory Project (or its monopoly instantiation, Google Directory) could auto-link related sites instantaneously. The “Similar pages” links on Google would actually mean something at that point. (We can never get them to work. The “similarity” criteria are never what we’re really interested in.)
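How might a directory auto-link related sites from self-declared metadata? One simple approach, sketched here in Python under stated assumptions, is an inverted index: map each normalized `DC.subject` value to the pages that declare it, then treat pages sharing any subject as related. The input dictionary and `related_pages` are hypothetical. Matching is exact after trimming and lowercasing, which is why a shared scheme such as the Library of Congress classifications matters: free-form subjects only match when they are spelled alike.

```python
from collections import defaultdict


def related_pages(pages):
    """Map each URL to the other URLs sharing at least one DC.subject.

    pages: hypothetical input mapping URL -> list of DC.subject values,
    e.g. as gathered by a crawler. Subjects are compared after trimming
    whitespace and lowercasing.
    """
    by_subject = defaultdict(set)  # normalized subject -> URLs declaring it
    for url, subjects in pages.items():
        for subject in subjects:
            by_subject[subject.strip().lower()].add(url)

    related = {}
    for url, subjects in pages.items():
        neighbours = set()
        for subject in subjects:
            neighbours |= by_subject[subject.strip().lower()]
        neighbours.discard(url)  # a page is not "related" to itself
        related[url] = sorted(neighbours)
    return related


pages = {
    "http://a.example/": ["Anthrax", "Bioterrorism"],
    "http://b.example/": ["anthrax"],
    "http://c.example/": ["Islam"],
}
print(related_pages(pages)["http://a.example/"])
# ['http://b.example/']
```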

Would there really be a need for data-visualization experiments like Langreiter’s or indeed the TouchGraph GoogleBrowser in this alternate universe?

Wouldn’t the Web categorize itself?

Yes, probably – if more than a few zealots online had ever bothered to use Dublin Core. (We use them, and we are sometimes accused of zeal.) At this point it’s a character in search of an author: an easy, systematic, virtually foolproof way to find everything there is on a specific topic, one whose very disuse leaves the Web harder to use.

Dublin Core metadata represent an online standard – not one ratified by the World Wide Web Consortium, but the W3C isn’t the only game in town. Web authors committed to standards compliance should be committed to Dublin Core usage, too. Be the first on your server.

Posted on 2002-07-07