As noted over on the DOM Scripting site, the audio from the presentation I gave with Aaron at South by Southwest is now available for download.
It turned out quite well. The audio quality is good and neither Aaron nor I do too much uhm-ing and ah-ing.
But there is an inherent problem with publishing audio files on the Web. That problem is succinctly summarised in this comment accompanying an entry for an audio file over at Vitamin:
Is there any way to get a transcription of this? I am deaf so an audio mp3 is not going to help me a bit.
There is another problem, which is that right now audio files can’t be indexed and searched. That problem is secondary to the accessibility issue but, as with so many accessibility solutions, a fix will benefit everybody.
I started looking into podcast transcription services. The most intriguing one I found was a site called Casting Words. It uses Amazon’s Mechanical Turk API to farm out the task of transcribing the audio content.
This is a textbook illustration of the kind of problem Mechanical Turk sets out to solve: one that requires human beings rather than computers. Speech recognition, like language translation, is a service that is not going to be replaced by machines any time soon.
For the developers at Casting Words, the Mechanical Turk API works like any other Web Service. They send a request with the parameters for the task they want solved. Later, Mechanical Turk sends back a response. What sets it apart from other Web Services is the fact that the response is sent via wetware, rather than hardware. The response isn’t retrieved from a database or algorithm; it’s retrieved from a human brain.
Casting Words put together a simple front-end for all of this. Jeff Barr has written up the process of submitting a podcast for transcription. You choose which audio files you want to have transcribed, you supply any other useful information that might be relevant to the transcriber, and you’re sent off to PayPal.
They charge 42 cents per minute of audio. For the SXSW presentation, which is an hour long, that works out at $25.20. That seems like a reasonable price to me.
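For anyone budgeting their own recordings, the arithmetic is simple enough to sketch. This is a minimal illustration, assuming a flat rate of 42 cents per minute (the rate quoted above) and no minimum charge, rounding, or extra fees, none of which I've checked. The function names are hypothetical and aren't part of any Casting Words service. `Decimal` is used so that money amounts come out to exact cents.

```python
from decimal import Decimal, ROUND_HALF_UP

# Assumed flat rate in US dollars per minute of audio (the rate quoted in the post).
RATE_PER_MINUTE = Decimal("0.42")
CENT = Decimal("0.01")


def transcription_cost(minutes: int, rate: Decimal = RATE_PER_MINUTE) -> Decimal:
    """Hypothetical helper: total cost for `minutes` of audio, rounded to the cent."""
    return (rate * minutes).quantize(CENT, rounding=ROUND_HALF_UP)


def cost_per_speaker(total: Decimal, speakers: int) -> Decimal:
    """Hypothetical helper: an even share of `total` among `speakers`, rounded to the cent."""
    if speakers < 1:
        raise ValueError("need at least one speaker")
    return (total / speakers).quantize(CENT, rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    hour = transcription_cost(60)
    print(hour)                       # 25.20
    print(cost_per_speaker(hour, 4))  # 6.30
```

An hour-long talk comes to $25.20. Split among a panel of four, that's $6.30 each.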
I submitted the presentation audio, sat back and waited. A few days later I got an email with the finished transcription. It came in three different formats: Rich Text, HTML, and plain text. You can view the results for yourself.
Overall, it’s very impressive. There were a couple of glitches, but let’s face it, the subject matter was particularly technical. Elvis Costello once said that talking about music was like dancing about architecture. Listening to a presentation about code — and attempting to transcribe it — is an equally quixotic endeavour.
Still, when you combine either the audio or the transcription with the presentation slides, you can follow along pretty well.
I spent about an hour going through the transcription and tweaking the occasional misheard phrase. I’ve posted the final transcript in the articles section.
For a one-off recording like this, getting a transcription was an easy, inexpensive option. If you gave a presentation at SXSW, I encourage you to do the same: if you were on a panel, you could even split the cost four or five ways.
But could this scale to cover regularly scheduled podcast episodes? I think so. It does cost money, but then so does bandwidth. Bandwidth is often covered by sponsorship or a PayPal tip jar, so why not transcriptions?
You could argue that, if anyone wants a transcription, they could commission one themselves. But then the time, effort and cost are repeated for every person who asks. If you provide a transcription yourself, there's just a one-off payment.
By providing a transcription, you'll also be providing a spiderable resource that can be easily scanned, quoted, cut and pasted. And you'll get lots of whuffie.