Audio in Digital Humanities Authorship: A Roadmap (version 0.5)

by Jonathan Sterne on July 24, 2011

Background:

0.1. Existing digital humanities work has largely reproduced visualist biases in the humanities: work with images and audiovisual texts has thus far been assumed to be more primary than work with sound as a text. And yet, sound is one of the major areas where huge gains could be made in digital publication. As with audiovisual texts (film, TV, etc.), “born digital” publication allows commentary to be situated alongside or directly within media that unfold over time. Moreover, sound scholarship stands to gain the most from born digital publication. Film and television scholars have long been able to summarize the developments in a scene (whether formal, textual or narrative) in prose, because visual language offers commonplace abstractions for doing so. This road has been much more difficult in sonic fields, whether we are talking about music, speech, architecture, art, geography or other areas.

0.2. With flash and html5, there is a mess of audio standards online. There is no obvious single direction. The purpose of this document is to elaborate some possible principles that might be of use.

0.3. This document makes specific reference to my resources and needs, but those will eventually be edited out. It makes reference to Scalar because that’s the environment I’m working in.

0.4. This is a work in progress. Comments are welcome and will be credited and incorporated.

Assumptions:

0.5. The purpose of digital humanities scholarship is analysis and criticism of (or in) forms other than print on a page. This may include “production” but it doesn’t have to – it could be as simple as written commentary accompanying sound files on a page. For our purposes we will just deal with the “in” and not worry about the “of” (which is more about the design of research tools).

0.6. With the right ethic and an activist infrastructure, digital humanities platforms have the potential to offer authors unprecedented opportunities to exercise their fair use rights, specifically in the close analysis, criticism and transformation of audio recordings for the purpose of advancing knowledge.

0.7. Such a project requires that the author have control over content when necessary but otherwise technical concerns should not get the author out of the creative flow of writing/producing (the model here is blogging engines like WordPress).

0.8. Similarly, for readers or listeners, audio should be as seamless a part of the experience as possible. This is a little more complicated than images or video in humanistic scholarship. Although the web is an audiovisual medium, the imaginary seamlessness of image and sound in, e.g., radio and television or the subservience of visual to auditory process in, e.g., telephony does not exist. This is true for computers as sound media in general: people tend to turn off the audio cues in programs where they even exist (most famously in Microsoft Word). If a website makes a noise when you arrive it is often considered rude. I would guess that the most common reaction to unexpected sound online is to turn it off. Websites more often function as containers for sound files that happen at the will of the user. This is how sites like Soundcloud and Bandcamp work. Violation of the rule is only acceptable in context, as in the old myspace page where it was expected that music might begin to play upon arrival.

Therefore, a website with sound ought to announce itself as such ahead of time. Old flash-based sites for bands did this very well: if they had a soundtrack that came on automatically, there was usually a visible “off” or “stop sound” button.

Basic Functions:

1.1. Ideally, an audio player for a digital humanities publication should be audible across platforms, whether we are talking about computers and operating systems or mobile devices.

1.2. Because of competing web standards (especially the flash/html5 mess), a single format won’t be enough. Although mp3 is the most common audio format in the world, it won’t work without flash in Firefox. Similarly, Safari won’t play ogg files, and I suspect that Internet Explorer won’t either. In some cases a lossless file may be needed, and right now I don’t think there is a widely used player for .flac or alac files, which means that .wav (and hoping the end user has a fast connection) is the best answer here.
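As a minimal sketch of what "more than one format" implies for a player, the selection step could look like the following. The function name, the preference order and the example capability sets are all my own assumptions for illustration. In a real player the set of playable formats would come from asking the listener's browser, not from a hard-coded table.

```python
from typing import Iterable, Optional

# Assumed default order: mp3 first, since the text treats it as the realistic default.
DEFAULT_PREFERENCE = ("mp3", "ogg", "wav")


def pick_source(available: Iterable[str],
                playable: Iterable[str],
                preference: Iterable[str] = DEFAULT_PREFERENCE) -> Optional[str]:
    """Return the first preferred format that exists on the server and that
    the listener's browser can play, or None if nothing matches."""
    available_set = {fmt.lower() for fmt in available}
    playable_set = {fmt.lower() for fmt in playable}
    for fmt in preference:
        if fmt in available_set and fmt in playable_set:
            return fmt
    return None


# Hypothetical capability sets that mirror the cases described above.
browser_without_mp3 = {"ogg", "wav"}
browser_without_ogg = {"mp3", "wav"}
encodings_on_server = {"mp3", "ogg", "wav"}

print(pick_source(encodings_on_server, browser_without_mp3))  # ogg
print(pick_source(encodings_on_server, browser_without_ogg))  # mp3
```

An author who needs lossless playback could pass a different preference order, for example one that puts wav first.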

1.3. The player should only translate formats when necessary and with minimal damage.

For example: Bandcamp does this well, requiring the musician to upload a lossless or .wav format file which it then encodes through its own algorithms into a wide range of possible formats.

This is great because it allows end users to either go with an acceptable sounding default format (realistically for now, this would be mp3 though perhaps in time this should be reviewed as standards change), or to choose another more to their liking.

This is probably the ideal for Scalar as well, since (e.g.) transcoding an mp3 into an ogg file is going to introduce some new artifacts that weren’t there before. Doing it repeatedly will make them more noticeable.

1.4. Listeners should be able to start, stop and move around in the file at will.

Admittedly, this may be more of an issue in terms of how browsers load and cache an audio file.

1.5. Whenever possible, audio should begin its travels to digital humanities publication as a .wav or lossless file. This will allow for maximum possibilities for editing and re-coding later on.

But lots of audio begins its life as mp3 these days (or m4a or ogg), in which case the original is already in a compressed format. The best approach here from a “preserve sound quality” standpoint (notice scarequotes) is a wrapper that allows the original format to play in the end user’s browser.

1.6. On the author’s side, the same sort of defaults ought to be in place, but they should also be defeatable. For instance, if I would like the listener to compare a wav file and an mp3 file because I’m discussing the format, the whole thing would fail if the player automatically converts the wav file to mp3 without giving me a choice.
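Points 1.3, 1.5 and 1.6 can be read together as one decision rule: leave a playable original alone, encode only when needed, and let the author switch the defaults off. Here is a rough sketch of that rule. Everything named here (Plan, plan_delivery, author_locks_format, the fallback order) is hypothetical and does not describe how Scalar or Bandcamp actually work.

```python
from dataclasses import dataclass

LOSSLESS = {"wav", "flac", "alac"}  # treated here as safe masters for encoding


@dataclass
class Plan:
    action: str          # "pass_through", "transcode" or "unplayable"
    output_format: str
    note: str = ""


def plan_delivery(original, playable, default="mp3",
                  fallbacks=("mp3", "ogg", "wav"), author_locks_format=False):
    """Decide how one uploaded file should reach one listener's browser."""
    original = original.lower()
    playable = {fmt.lower() for fmt in playable}

    # 1.6: the author has switched the defaults off, so never convert.
    if author_locks_format:
        if original in playable:
            return Plan("pass_through", original, "author override")
        return Plan("unplayable", original, "author override, format not playable")

    # 1.5: a lossy original that already plays is left alone.
    if original not in LOSSLESS and original in playable:
        return Plan("pass_through", original)

    # 1.3: otherwise encode, trying the default format first.
    for target in (default, *fallbacks):
        if target in playable and target != original:
            if original in LOSSLESS or target in LOSSLESS:
                note = "no lossy-to-lossy step"
            else:
                note = "lossy-to-lossy: expect new artifacts"
            return Plan("transcode", target, note)

    if original in playable:
        return Plan("pass_through", original)
    return Plan("unplayable", original, "no playable format")


print(plan_delivery("wav", {"mp3", "wav"}).action)                            # transcode
print(plan_delivery("wav", {"mp3", "wav"}, author_locks_format=True).action)  # pass_through
```

The second call is the wav-versus-mp3 comparison case: with the override set, the wav file reaches the listener as a wav file.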

1.7. The author should be able to loop audio, as well as set in and out points, at least as defaults that could be defeated by a user.
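To make "defaults that could be defeated" concrete, here is a small sketch of author settings that a listener can override, plus the arithmetic for where playback sits inside a looped region. ClipSettings, effective_settings and position_at are illustrative names of my own, not part of any existing player.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ClipSettings:
    in_point: float = 0.0              # seconds from the start of the file
    out_point: Optional[float] = None  # None means "play to the end"
    loop: bool = False


def effective_settings(author: ClipSettings, **listener_overrides) -> ClipSettings:
    """Start from the author's defaults and apply whatever the listener changed."""
    return replace(author, **listener_overrides)


def position_at(elapsed: float, duration: float, s: ClipSettings) -> Optional[float]:
    """Position in the file after `elapsed` seconds of playback,
    or None once a non-looping clip has finished."""
    out_point = duration if s.out_point is None else min(s.out_point, duration)
    span = out_point - s.in_point
    if span <= 0:
        raise ValueError("out point must come after in point")
    if s.loop:
        return s.in_point + (elapsed % span)  # wrap around inside the region
    if elapsed >= span:
        return None
    return s.in_point + elapsed


author = ClipSettings(in_point=10.0, out_point=15.0, loop=True)
print(position_at(12.0, 60.0, author))            # 12.0 (third pass through the loop)

listener = effective_settings(author, loop=False)
print(position_at(12.0, 60.0, listener))          # None (the clip ended after 5 s)
```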

1.8. An audio player ought to be able to display a still image or caption with the audio, alongside the time-based tagging that is already in Scalar.

Advanced Functions (none of these are essential but eventually someone might want them)

Author Functions:

2.1. It would be nice to have alternate tagging approaches. Soundcloud shows a waveform (a rough measure of amplitude) and allows comments as the waveform moves along. Soundcloud pitches this in terms of social media, as it is normally for listeners, not the musician, but it could easily be made useful as an authoring approach as well.

2.2. Even better would be a range of options for visualizing the audio and annotating it, such as spectral display, rhythm, etc. But this is probably not immediately necessary (I can easily drop a sound spectrogram into the text if I want to).

2.3. One could imagine a scenario where images could come and go or even videos over the course of an audio file, but at this point we are probably better off treating it as video.

2.4. The ability to set the relative volume of different clips.
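A relative volume control usually comes down to a gain expressed in decibels. As a rough sketch, assuming samples are floats between -1.0 and 1.0 (my assumption, and the function names are illustrative only):

```python
def db_to_gain(db: float) -> float:
    """Convert a level change in decibels to a linear amplitude multiplier."""
    return 10 ** (db / 20)


def apply_gain(samples, db):
    """Scale samples by `db` decibels, hard-clipping anything outside [-1.0, 1.0]."""
    gain = db_to_gain(db)
    return [max(-1.0, min(1.0, s * gain)) for s in samples]


print(round(db_to_gain(-6.0), 3))  # 0.501, roughly half the amplitude
```

Hard clipping is the crudest possible safeguard. A real tool would more likely warn the author or limit how far a clip can be boosted.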

2.5. The ability to choose between mono, stereo and multichannel audio. Though I don’t think we can assume listeners will have access to 5.1 sound with their computers or mobile devices, so then you’d need folddown techniques and a mess of other stuff.
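For a sense of what a folddown involves, here is a sketch of one common 5.1-to-stereo downmix: centre and surround channels are mixed in at about -3 dB and the LFE channel is dropped. The channel order (front left, front right, centre, LFE, surround left, surround right) and the peak rescaling at the end are my own assumptions. Other coefficient choices are in use.

```python
import math

MIX = 1 / math.sqrt(2)  # about 0.707, i.e. roughly -3 dB


def fold_down_frame(fl, fr, c, lfe, sl, sr):
    """Fold one 5.1 sample frame down to a (left, right) pair.
    The LFE channel is deliberately discarded in this sketch."""
    left = fl + MIX * c + MIX * sl
    right = fr + MIX * c + MIX * sr
    return left, right


def fold_down(frames):
    """Fold down a list of 5.1 frames, then scale the whole result
    if the summed channels would exceed full scale (1.0)."""
    stereo = [fold_down_frame(*frame) for frame in frames]
    peak = max((max(abs(l), abs(r)) for l, r in stereo), default=0.0)
    scale = 1.0 / peak if peak > 1.0 else 1.0
    return [(l * scale, r * scale) for l, r in stereo]


# A sound only in the centre channel ends up equally in both stereo channels.
print(fold_down_frame(0.0, 0.0, 0.5, 0.0, 0.0, 0.0))
```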

Listener Functions:

2.6. Close listening: It would be outstanding if a user were able to manipulate the audio in some basic ways that mimic the nonlinearities inherent in “close reading” practices:

Not only skipping around, but
a) speeding up, slowing down,
b) scrubbing,
c) marking in and out points,
d) and freezing.

All of these functions except freezing are basic random access functions; the only audio freezes I’ve heard have involved either spectral processing or granular sampling, which is probably impossible to do online right now. But it would be incredibly cool.
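One wrinkle with speeding up and slowing down: the simplest approach, playing the same samples faster or slower, shifts pitch along with duration, much like changing the speed of a tape. The sketch below shows only that relationship. The function name is my own, and keeping pitch constant would need a time-stretching technique that this does not attempt.

```python
import math


def varispeed(duration_s: float, rate: float):
    """For plain speed change by `rate` (2.0 = twice as fast), return the
    new duration in seconds and the resulting pitch shift in semitones."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    new_duration = duration_s / rate
    semitones = 12 * math.log2(rate)
    return new_duration, semitones


print(varispeed(10.0, 2.0))  # (5.0, 12.0): half as long, one octave up
print(varispeed(10.0, 0.5))  # (20.0, -12.0): twice as long, one octave down
```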

(As an author, I can do this for my audience but it’s a different thing if they can do it themselves.)

2.7. If the audio were available in different channel formats, it would be nice if the listener had a choice to compare.

(All of this sounds esoteric until someone wants to actually do an historical analysis of stereo, and I know there’s at least one group of scholars working on this topic.)
