A hyperlink knows no depth

If you believe, as nearly all sentient beings not sitting on the bench in Denmark do, that hyperlinks on the Web are mere statements of fact and do not constitute theft or infringement of the information they link to, are you really prepared for the ramifications of those claims?

Are links really “public facts”? (Cory Doctorow: “The recitation of public facts – this document exists at this location – is never an infringement.”) But the true fact being recited is the URL. A hyperlink hides the URL. Some browsers or devices will display the URL under certain conditions, but by definition and by default, a hyperlink (an anchor) displays the link text and not the link URL. The HTML 4.01 specification for the a(nchor) element reads: “href... This attribute specifies the location of a Web resource, thus defining a link between the current element (the source anchor) and the destination anchor defined by this attribute.” The recitation of the fact that this document exists at this location can be dependably found only in the HTML source code where the href attribute is visible, not in a browser rendering.
1. It’s easy to suppress display of a URL in typical graphical browsers. Simple JavaScript handles that task, abetted by a title="" attribute to suppress tooltips. Under these conditions, which are not hard to duplicate on real-world sites, you the user have no idea where you are being sent until you go there.
2. To truly recite the fact that a document exists at a certain location, its URL would have to be manifest in plain text that any browser or device would invariably display according to HTML specifications. (In practice, that would also require a stylesheet declaration other than display:none and the use of discernible colours in graphical browsers, among other display characteristics.) Indeed, in the MPAA v. 2600.com case, such was one of the ramifications:
  
  Judge Kaplan was looking at the impact of the injunction on the hyperlink ban and determined that a hyperlink is functional more than expressive because the point of the href code that creates the link is meant to provide instructions to the browsers and servers to deliver a specific page located on another server.
  
  Both the trial court and the appellate court made a huge issue that hyperlinks make “the materials ... available for instantaneous worldwide distribution” and that “the linked Web site is just one click away.” What would happen if the Web site merely provided the URL without a hyperlink? In reality, the court might find the two identical in purpose. But the functionality has been taken out and perhaps the URL is more like protected speech. Some news organizations have already begun to refer to the URL without hyperlinks.
  
  2600 was enjoined from linking to “prohibited” information (the DeCSS DVD copying mechanism). Once the <a href="URL"></a> hyperlinks were replaced with plain text, that is, once the URL was published in a directly human-readable format, the function of linking vanished.
  
  The converse must also be true: using a hyperlink hides the URL. It may be disclosed for convenience in the status line of some browsers some of the time, but the only place it resides permanently is in the HTML source code.
3. URLs are, moreover, even more thoroughly hidden in quotations (q) and blockquotations (blockquote), both of which can take a cite="" attribute listing the source URL, if it exists and is relevant. Few browsers and devices give access to the URL in those cases. (iCab, Netscape 6 and later, and Mozilla do.) It’s not widely understood that a hyperlink is not the only way to cite an online source. A blockquote element is by definition not a hyperlink. Deep-linking objectors seem unconcerned about this form of citation, equally discernible in HTML source code.
4. A case could easily be made that publishing a URL in HTML source code actually is reciting a fact about the location of a document. In theory, anyone can look at the HTML source code. And furthermore, this argument strengthens the position of those opposing limitations on linking:
  1. The only place one is actually publishing a URL is in source code, which most people will ignore.
  2. Anyone making the claim that hyperlinks harm the destination site would have to recognize that the only place one may reliably find the URL is in the source code, which nearly everyone will ignore.
  3. The overt manifestation of the URL is actually a functional hyperlink, taking the issue into a rather different realm of freedom of expression (the freedom to create a tool rather than to publish written language). A portion of the 2600 court decision was summarized thus:
    
    A hyperlink is a cross-reference (in a distinctive font or color) appearing on one Web page that, when activated by the point-and-click of a mouse, brings onto the computer screen another Web page. The hyperlink can appear on a screen (window) as text, such as the Internet address (“URL”) of the Web page being called up or a word or phrase that identifies the Web page to be called up... Or the hyperlink can appear as an image, for example, an icon depicting a person sitting at a computer watching a DVD movie and text stating “click here to access DeCSS and see DVD movies for free!” The code for the Web page containing the hyperlink contains a computer instruction that associates the link with the URL of the Web page to be accessed, such that clicking on the hyperlink instructs the computer to enter the URL of the desired Web page and thereby access that page. With a hyperlink on a Web page, the linked Web site is just one click away.
    
    In applying the DMCA to linking (via hyperlinks), Judge Kaplan recognized, as he had with DeCSS code, that a hyperlink has both a speech and a nonspeech component. It conveys information, the Internet address of the linked Web page, and has the functional capacity to bring the content of the linked Web page to the user’s computer screen (or, as Judge Kaplan put it, to “take one almost instantaneously to the desired destination”)... The linking prohibition applies whether or not the hyperlink contains any information, comprehensible to a human being, as to the Internet address of the Web page being accessed. The linking prohibition is justified solely by the functional capability of the hyperlink.
    
    Hyperlinks are not purely expressive; they are also functional.
5. Using an ordinary hyperlink – <a></a> tags surrounding text or an image or both – is not a pure recitation of the fact that a document exists at a location. It is a declaration that another document will load when the hyperlink is activated. The identity of that document is declared conclusively in source code and only optionally in other ways.
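The gap between what a reader sees and what the source code declares can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the markup, the example.org URLs, and the LinkAudit class are hypothetical, and the "rendering" is a crude stand-in that keeps text content and nothing else. It collects href attributes from anchors and cite attributes from quotations, then confirms that neither URL appears in the visible text.

```python
from html.parser import HTMLParser


class LinkAudit(HTMLParser):
    """Separate what a reader would see from what the source declares."""

    def __init__(self):
        super().__init__()
        self.visible_text = []   # text content, roughly what a browser renders
        self.declared_urls = []  # (element, attribute, URL) present only in source

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.declared_urls.append((tag, "href", attrs["href"]))
        elif tag in ("q", "blockquote") and attrs.get("cite"):
            self.declared_urls.append((tag, "cite", attrs["cite"]))

    def handle_data(self, data):
        if data.strip():
            self.visible_text.append(data.strip())


# Hypothetical markup; the example.org URLs are placeholders.
SAMPLE = (
    '<p>Read <a href="http://example.org/archive/2002/07/report.html" title="">'
    'the report</a>.</p>'
    '<blockquote cite="http://example.org/speech.html">'
    '<p>Quoted words.</p></blockquote>'
)

audit = LinkAudit()
audit.feed(SAMPLE)
rendered = " ".join(audit.visible_text)

print("Visible:", rendered)
for element, attribute, url in audit.declared_urls:
    print(f"Source only: <{element} {attribute}> -> {url}")
```

Both URLs turn up in the source-only list, and the blockquote citation is found without any hyperlink being involved.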
Are all hyperlinks equal? Is there really such a thing as a “deep” link?
1. The length or scheme of a uniform resource locator or identifier (“URI”) is irrelevant according to the HTML specification and RFC 2396. A short URL like http://a.dk (11 characters, and, curiously, in Denmark) or a gigantic one like http://directory.google.co.uk/alpha/Top/Business/Industries/Transportation/Forwarding,_NVO_and_Customs/Air_and_Ocean_Freight_Shipping/Freight_Forwarders/Non_Vessel_Operating_Common_Carrier_NVOCC/ (195 characters) have equal validity.
2. URL scheme and data type are irrelevant. There is no difference, as far as the structure and protocols of the Web are concerned, between an http: and an ftp: link, or among .html, .pdf, and indeed .xls documents, among an unlimited number of other data types.
3. Degree of URL specificity is irrelevant. Fragment identifiers (using id="" and/or name="" attributes) can constitute full citation. Even some “proprietary” formats can be identified as a sum of parts – e.g., specific pages in PDF files can be linked to directly, and within a PDF, the equivalent of fragment identifiers (using the same link syntax) is readily available.
4. Hostnames introduce variability.
  1. Very-well-configured Web sites function with a hostname of w., ww., www., wwww., or nothing at all in front of the domain name. (Reasonably-well-configured sites, like contenu.nu, can handle either nothing or www. at the front.)
  2. If, according to opponents of deep linking (see page of deep-link coverage), only the homepage may be linked, is the homepage www.example.com, example.com, w.example.com, ww.example.com, or wwww.example.com?
  3. If a hostname redirects to a directory structure – Newsworld.CBC.ca redirecting to http://www.cbc.ca/newsworld/ – hasn’t the site owner enacted its own deep link? (Apart from the obvious changes, http:// is prefixed and / is suffixed.) If deep links of this sort can be computer-generated, how are they different from deep links derived from search engines (also computer-generated)?
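A generic URL parser makes the same point: it applies one set of rules whatever the length, scheme, file type, or specificity of the address. The sketch below uses Python's standard urllib.parse on the two URLs quoted above (the long one written without spaces); the describe helper and the example.org addresses are hypothetical additions for illustration.

```python
from urllib.parse import urlsplit

SHORT = "http://a.dk"
LONG = (
    "http://directory.google.co.uk/alpha/Top/Business/Industries/"
    "Transportation/Forwarding,_NVO_and_Customs/"
    "Air_and_Ocean_Freight_Shipping/Freight_Forwarders/"
    "Non_Vessel_Operating_Common_Carrier_NVOCC/"
)


def describe(url):
    """Break a URL into the same generic components, whatever its 'depth'."""
    parts = urlsplit(url)
    return {
        "length": len(url),
        "scheme": parts.scheme,
        "host": parts.hostname,
        "segments": [s for s in parts.path.split("/") if s],
        "fragment": parts.fragment,
    }


# Hypothetical addresses covering another scheme, another data type, a fragment.
EXAMPLES = [
    SHORT,
    LONG,
    "ftp://example.org/pub/report.pdf",
    "http://example.org/page.html#section-2",
]

for url in EXAMPLES:
    info = describe(url)
    print(info["length"], info["scheme"], info["host"],
          len(info["segments"]), repr(info["fragment"]))
```

Nothing in the parsed result marks one URL as "deep" and another as not; the number of path segments is just a number.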
It is claimed that linking directly to a graphic on a page is bad practice. (Or the same claim is made about linking to a specific frame – often much more convenient – or iframe.) There’s no difference between the admonition “Do not link directly to graphics on my site” and the admonition “Do not link anywhere but the homepage of my site.” In both cases, the party issuing the admonition attempts to control deep links. But there is no such thing as a deep link; just as a virus knows no morals, a hyperlink knows no depth.
There is no distinction between linking to a page and linking to a graphic on a page. If the intent is to siphon your bandwidth rather than someone else’s, then hyperlinks are no longer the issue and theft of bandwidth is. If a “respectable” editorial purpose is served by the direct link to the image – e.g., if you wish to publish a review of or commentary on an image and not the page surrounding it, as permitted by copyright law – you may be forced to link directly to the image file because there is no other way to specify the image.
Webmasters who find this happening to them should add an id="" attribute to their img tag and give the URL of the page plus that id to the person originally linking to the image. It then becomes possible to uniquely identify the image while loading the whole page. Compliant browsers will scroll directly to the anchored image.
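As a rough sketch of that advice, the link handed back to the other party is simply the page URL with the image's id as its fragment identifier. The page address, the img markup, the id value "chart", and the link_to_image_in_page helper are all hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical page and image markup; the webmaster has added id="chart".
PAGE = "http://example.org/gallery/page.html"
IMG_TAG = '<img src="/images/chart.gif" id="chart" alt="Sales chart">'


def link_to_image_in_page(page_url, image_id):
    """Return the page URL with a fragment identifier naming the image's id."""
    parts = urlsplit(page_url)
    return urlunsplit(parts._replace(fragment=image_id))


link = link_to_image_in_page(PAGE, "chart")
print(link)
```

The resulting link loads the whole page and still identifies the one image.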
Is misrepresentation an issue?
1. “Framing” Web content is allegedly wrong.
  
  Framing is the process of allowing a user to view the contents of one website while it is framed by information from another site, similar to the “picture-in-picture” feature offered on some televisions. For example, a user of a search engine may view the contents of an online store that is framed by the search engine’s text and logos.
2. But there are, in fact, two kinds of frames: frames and iframes. Banner ads are routinely served via iframes, and back in the olden days frames were commonly used to isolate banner ads in permanent view of the hapless site visitor (uncommon today). Banner ads are usually served from other sites. So framing other people’s content is not, in fact, always wrong.
3. Permission then becomes an issue. Framing other people’s content without permission is often unintentional, as with some blog-hosting services (Pitas.com, this means you).
4. Objections to framing without permission begin to hold water if and only if the framing is done under false pretenses, in which case deep linking is irrelevant. Commercial misrepresentation then becomes the issue.
Is medium an issue?
1. To include a “deep” link in a printed document, as is commonly done in research papers, one must print it in full.
2. Here the URL becomes purely expressive; its functional character relies on the actions of the reader, which must be carried out somewhere other than the document bearing the citation. (Online, you still have to act to visit the URL, but you are acting on the document that cites it. In the print medium, you schlep the citation with you to the library and dig up the source, in whatever medium.)
3. URLs can be and are publicized in other media. Radio and television are obvious, but URLs appear in film, too. One case: Studio bumpers, which can then trigger re-citation (“recitation”) in another medium, audio description. (“A Web address appears: www.universalstudios.com.”) Audio description can itself introduce a URL not found in the source material. (“For more information on motion-picture access, visit our Web site, www.MoPix.org.”)
4. Opponents of deep linking might find it tricky to object to online publication of URLs in either plain text or as hyperlinks when that is the only way to cite them in print and other media.
Is longevity an issue?
1. The construct “The recitation of public facts – this document exists at this location – is never an infringement” unduly overlooks the fact that items identifiable by URLs can have short lifespans. URLs can die. Easiest example: mailto: URLs, which expire frequently. Or discussion forums, where server-space restrictions or owner policies restrict the lifespan of archives.
2. If it’s OK to cite the URL as reference to a document that exists, it is also OK to do so for documents that no longer exist. We suppose this does not come up very often in the print medium (someone jots down a citation to the only copy of a source, which then is destroyed?), but it certainly comes up in broadcasting. Live newscasts, for example, are essentially never repeated. It is possible to cite a broadcast for which no record exists. The citation can still be true and accurate: This statement was made on this broadcast on this station at this day and time.
3. Accordingly, one may deep-link to Web or online resources that no longer exist. One example: A parody site, or a site publishing “rumours,” that is taken “offline” through threat of legal action; one may publish a screenshot of the site or recapitulate facts from or about it, complete with a link to the item that no longer exists.
4. The URL then becomes almost purely expressive; the tool function of a hyperlink disappears (except inasmuch as it might prompt a 404 message). Another example: Citing, for purposes of review and commentary, a discussion-group posting that has been deleted.
5. What is generally misunderstood is that anything posted anywhere online can be cited or discussed. The general advice is “If you don’t want to get blogged, don’t post.”

Consequences

The consequences here are much larger than is widely accepted.

Everything that was ever published online in any format comprehended by URL syntax can be cited or linked to.
There is no such thing as a deep link.
Hyperlinks are not the only form of online citation.
A URL is not an uncomplicated recitation of a fact.
Hyperlinks are expressive and functional.

Posted on 2002-07-19