In these uses the sound follows the text or presentation (i.e. the visual part of the presentation is primary, the sound secondary). However, there is a set of applications in which sound occupies the central role. We can call this "hyperspeech," by analogy with "hypertext." In this case the continuous speech stream (or soundtrack) is annotated or punctuated by links to other entities. Perhaps the most common use of hyperspeech is time-aligned text, described (in part) by the <timeline> and <when> elements in section 11.3.2 of the TEI P3 standard. This standard allows absolute or relative timepoints to be specified within a text, thereby permitting the text to be coindexed with the sound.
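To make the mechanism concrete, here is a rough sketch of the sort of encoding this permits (element and attribute names are per my reading of the Guidelines; check the details against the standard before relying on them):

    <timeline origin="T0" unit="s">
      <when id="T0" absolute="00:00:00">        <!-- start of the recording -->
      <when id="T1" interval="2.5" since="T0">  <!-- 2.5 seconds after T0 -->
      <when id="T2" interval="3.1" since="T1">
    </timeline>

    <u who="A" start="T0" end="T1">did you hear that</u>
    <u who="B" start="T1" end="T2">I certainly did</u>

A playback tool could then highlight whichever utterance's start and end timepoints bracket the current position in the sound, or, conversely, jump the sound to the timepoint anchoring a selected stretch of text.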
There is little software that supports such an application directly (correct me if I'm wrong!). That is, there is no off-the-shelf software that implements the TEI <timeline> standard. Many technical issues surrounding the representation of the sound and the indexing still need to be addressed. In addition, sounds, like graphics, require high bandwidth to transmit and to manipulate, creating new technical issues for WWW-based applications. (For an interesting approach to solving one dimension of this problem, see http://www.voyagerco.com/cdlink/.)
There is a wide variety of applications for such hyperspeech technology; I mention below a few that I happen to be aware of.
[I am working in Paris on a project to archive recordings of speech (made by linguists in the course of field work) along with their (synchronized) text transcriptions. Analogue recordings will be digitized and stored (probably on CD) with time-aligned phonetic transcriptions, interlinear glossing, and translations.]
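Concretely, a single aligned record might look something like the following (a sketch only: <timeline> and <when> are TEI, and the TEI linking tagset supplies a synch attribute for exactly this kind of alignment, but the particular layering via <u> and <seg> is my own improvisation):

    <timeline origin="P0" unit="s">
      <when id="P0" absolute="00:00:00">
      <when id="P1" interval="1.8" since="P0">
    </timeline>

    <!-- parallel annotation layers, all coindexed to the same timepoints -->
    <u   start="P0" end="P1">il fait beau</u>
    <seg type="phonetic"    synch="P0 P1">il fE bo</seg>
    <seg type="gloss"       synch="P0 P1">it make.3SG beautiful</seg>
    <seg type="translation" synch="P0 P1">the weather is fine</seg>

The point is that the digitized sound, the phonetic transcription, the interlinear glossing, and the translation would all hang off the same set of timepoints, so any one layer can be used to navigate the others.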