Sometimes we figure we owe half the Internet to the Finns, the isolated, preternaturally shy Nordic people who invented a whole raft of things we are presently too maudlin and shitfaced on chocolate to look up. Certainly shoephones that play “La Marseillaise” as a ring tone while you’re out feeling maudlin and trying to get shitfaced on sushi immediately come to mind, but we forgive the Finns their trespasses.

An unheralded star in the Finnish firmament is Jukka Korpela, whose massive contributions to understanding basic HTML development and, more crucially, multilingual Web authoring are staggering in their thoroughness. They are about as digestible and comprehensible as these topics, which are quickly becoming the life’s work of thousands worldwide, could possibly hope to be.

If you have ever scratched your head attempting to figure out why a simple word in Spanish won’t display on your Web site, or have suddenly been faced with typesetting an omega online with no idea where to begin, Korpela’s your man.

He’s been bounced around a bit. He’s underappreciated, even in the homeland. But we appreciate him.

In fact, we appreciate Jukka Korpela so much we are engaging in an ongoing E-interview with him, rather after the manner of previous interviews (alpha, beta, gamma).

We are eager to rendez-vous with Korpela so as to get him maudlin and shitfaced on sushi and chocolate. And if his damn shoephone should go off mid-meal, we shall grit our teeth in forbearance and think of Sibelius.

Throughout the following E-interview, we have resisted the temptation to address Jukka Korpela by his allegedly endearing anglicized diminutive, Yucca. You should, too.

NUblog: Where do you think the knowledge gap lies when it comes to multilingual Web development, or even Web development in a language other than English? Is it character encoding, as your pages so thoroughly document, or something different?

Korpela: Let’s start from the simpler case: authoring in a language other than English. There are big differences in the difficulties. After all, there are thousands of languages, with different writing systems, and different people. For Western European languages, the difficulties are relatively small, though people often don’t know the simplest ways of doing things. For other languages that can be written using an 8-bit encoding, like Russian, there’s the difficulty of finding specific information and tools, but then it’s mostly smooth sailing.
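[NUblog aside: to make the 8-bit point concrete, here is a minimal Python sketch of our own. The sample words and the choice of ISO 8859-1 and KOI8-R are our assumptions, not anything Korpela specified. It shows that Spanish and Russian each fit in one byte per character, but in different code tables, and that a numeric character reference sidesteps the question for the occasional stray letter.]

```python
# Illustrative sample words; these are our own, not taken from the interview.
spanish = "año"
russian = "привет"

# Western European text fits ISO 8859-1 (Latin-1): one byte per character.
spanish_bytes = spanish.encode("latin-1")

# Russian also fits an 8-bit encoding, but a different one, e.g. KOI8-R.
russian_bytes = russian.encode("koi8_r")

# Reading those bytes with the wrong 8-bit table "works" but yields garbage,
# which is the classic symptom of a missing or wrong charset declaration.
garbled = russian_bytes.decode("latin-1")


def fits(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# An encoding-independent fallback for HTML: numeric character references.
spanish_ascii = spanish.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(len(spanish_bytes), len(russian_bytes))
print(fits(russian, "latin-1"), fits(russian, "koi8_r"))
print(spanish_ascii)
```

[The wrong-table decode never raises an error, which is part of why such pages fail silently: the bytes are valid in both encodings, and only the declared character set tells a browser which reading is meant.]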

For languages like Japanese or Chinese, the character encoding issues are certainly much bigger, but generally solvable under suitable guidance. But things get very difficult if someone needs to deal with authoring in an “exotic” language without actually knowing that language and its writing system. Such problems do arise, due to the division of labor that implies, at worst, that we have a computer specialist who is illiterate in a language, talking to a person who knows that language but is computer illiterate.

There’s much to be said about the way that different writing systems and encodings are handled on the WWW. Quite often, a page in an “exotic” language works in a particular cultural environment where that language is widely spoken but fails in the WWW context. But this is mostly not a problem of a knowledge gap among authors; rather, the problems are in servers, browsers, and authoring software.

At a completely different level, there’s the problem that authors are mentally oriented towards authoring in English. But before going into that, I’d like to say that multilingual authoring is really very limited at present on the Web. Superficially, you might find sites that look multilingual. But quite often they contain just English summaries for some key pages, or things like that.

Real multilingualism on the Web requires adequate tools for that, and we mostly haven’t got them. It’s not so much a problem of producing pages as of maintaining them. You can always pay some money to someone to translate your pages. Then what? Tomorrow you need to change a factual statement somewhere. How do you make sure it gets correctly changed in all versions, in 42 languages? Dealing with just two languages can be real hard, even if you have people who know the languages. It’s a matter of organizing, coordinating, and supervising the page maintenance process; that’s more or less just “yes, we should...” at present.

I’ll take an example, which is verbose, and probably the point is hard to see, but at present I haven’t got enough time to write a short answer, so you’ll have to deal with a longish one.

Today I visited a local zoo, which has some internationally interesting activities too, and might be interesting worldwide due to its specialities. They actually have fairly good general-purpose Web pages; not too much nonsense, some factual content, what else can you ask for? One thing they (i.e., their potential visitors) would need, a thing that would exceed their current level of expertise by a few orders of magnitude, is a multilingual collection of basic information about the species they have got. (I couldn’t help noticing that there were people speaking Spanish, Russian, and other languages, often without having a clue about the texts, which are only in Finnish, Swedish, English, and German.) Not that difficult to produce, really, if you can afford to pay professional translators, as they should.

But consider the situation where changes take place; in a zoo, that’s daily – e.g., some animals need to be taken out of public view due to sickness, repairs of buildings, or whatever. It is clearly impossible to do the changes “by hand.” Some system would be needed that lets a person (who maybe knows his native language only) just enter the information “the Asian lions have been taken to vets, they’ll hopefully be back b ...” somewhere, in a simple manner. What IT specialists should have designed is a system that propagates this change to all the different language versions.
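[NUblog aside: one way such a system might look, as a rough Python sketch of our own and not a description of anything the zoo or Korpela has built. The keeper enters a structured notice once, and every language version is rendered from pre-translated templates. The `Notice` class, the `render_all` function, the three languages, the template wording, and the return date are all hypothetical placeholders; real wording would come from professional translators.]

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical, pre-translated species names keyed by an internal identifier.
SPECIES = {
    "asian_lion": {
        "en": "Asian lions",
        "de": "Asiatische Löwen",
        "es": "Leones asiáticos",
    },
}

# Hypothetical status templates. The "name: status" shape avoids having to
# make the species name agree grammatically with the rest of the sentence.
STATUS = {
    "at_vet": {
        "en": "{species}: at the vet, expected back {back}.",
        "de": "{species}: beim Tierarzt, voraussichtlich zurück am {back}.",
        "es": "{species}: en el veterinario, regreso previsto el {back}.",
    },
}


@dataclass(frozen=True)
class Notice:
    """What the keeper enters once, with no free text to translate."""
    species: str
    status: str
    back: date


def render_all(notice: Notice) -> dict:
    """Render one notice into every language that has a template."""
    pages = {}
    for lang, template in STATUS[notice.status].items():
        pages[lang] = template.format(
            species=SPECIES[notice.species][lang],
            back=notice.back.isoformat(),  # ISO dates dodge locale formats
        )
    return pages


# The date below is an invented example value.
notice = Notice("asian_lion", "at_vet", date(2001, 7, 20))
pages = render_all(notice)
for lang, text in pages.items():
    print(lang, text)
```

[The design choice is to translate a small, closed vocabulary in advance, so that daily changes never need a translator at all. It only works for changes that fit the templates.]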

And with more thorough changes, like the addition of substantially new information – say, just a paragraph – there’s the even more difficult task of propagation. Some people need to do the translations, they need to be merged into the different language versions, and someone needs to supervise that this really happens.
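[NUblog aside: the supervision step can at least be made mechanical. In this minimal Python sketch, which is our own assumption about how one might do it, each paragraph carries a revision number, each translation remembers which revision it was made from, and a single call lists the languages that have fallen behind. The `Paragraph` class and its methods are hypothetical names, and the sample texts are invented.]

```python
from dataclasses import dataclass, field


@dataclass
class Paragraph:
    """A source-language paragraph plus the translations derived from it."""
    source_text: str
    revision: int = 1
    # Maps language code -> (source revision translated from, translated text)
    translations: dict = field(default_factory=dict)

    def edit(self, new_text: str) -> None:
        """Change the source text; every existing translation becomes stale."""
        self.source_text = new_text
        self.revision += 1

    def add_translation(self, lang: str, text: str) -> None:
        """Record a translation made from the current source revision."""
        self.translations[lang] = (self.revision, text)

    def stale(self, languages: list) -> list:
        """List languages whose translation is missing or out of date."""
        return sorted(
            lang
            for lang in languages
            if self.translations.get(lang, (0, ""))[0] < self.revision
        )


site_languages = ["de", "es"]
para = Paragraph("The zoo opens at 10.")
para.add_translation("de", "Der Zoo öffnet um 10 Uhr.")
before_edit = para.stale(site_languages)   # Spanish was never translated

para.edit("The zoo opens at 9.")
after_edit = para.stale(site_languages)    # now German is behind as well

para.add_translation("es", "El zoo abre a las 9.")
after_fix = para.stale(site_languages)
print(before_edit, after_edit, after_fix)
```

[This does none of the translating, of course. It only answers the supervisor's question of which versions no longer match the source.]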

Maybe this is why the few genuinely multilingual sites that there are on the Web are generally so static.

NUblog: What, in your view, has been the impact of the English-language (actually American) domination of the Web from the standpoint of international Web development?

Korpela: I think you mean the overall dominance of English, and the effect that people just write Web pages in English due to that.

Anyway, it is a fact that in several situations an organization (or a person) needs to produce a document in English. Anything else, including a version in one’s native language, might be just an optional extra. I’m thinking about scientific research reports but also various less scientific but technology-oriented things, as well as business-to-business pages by any company that has serious plans for international operations.

But the dominance goes far beyond that. Young people in Finland write their hobby pages (you know, those “me, my computer, and my links” pages) in English, without ever thinking about the question “who might possibly want to read these pages.” English is cool. English has prestige. Never mind that you can’t even spell “English” (but write “english,” for example); you still feel you need to create pages in English. If the Web had been invented in the Middle Ages, who would have even dreamt about authoring in a language other than Latin?

Perhaps Finland is not that typical. After all, we are probably more Americanized culturally than many other European countries, partly because that was a way of saying to ourselves and to others that we are culturally in the West, despite being under heavy USSR pressure politically. Moreover, Finns generally know the English language relatively well, and Finns don’t really expect to be internationally understood in their own language. So it might be different in, say, Germany, France, etc. But the trend seems to be to favor authoring in English even in cases where there is no rational reason for that.

Now get out the milk and cookies and read Part II.

Posted on 2001-07-11