Transcription is boring, tedious, frustrating, and exhausting. As with chopping firewood or teaching kindergarten, frequent breaks are necessary, and it gets harder to drag oneself back to the grind after each one.
Producing professional-level transcription takes an enormous amount of patience and effort. Depending on some of the factors we'll soon examine, an hour of content could take anywhere from three to ten hours to transcribe. We're going to look at one element that affects the transcription process: the quality of the source.
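Four fundamental factors determine how difficult content is to transcribe:
The audio quality
Whether the speaker is reading from a script
How difficult the content and its language are
How skilled and clear the speaker is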
Consider the following examples. The first is an excerpt from the documentary "I Voted?", in which the narrator introduces himself at the beginning of the film:
This is the rare situation where all four factors are optimal:
The audio quality is pristine
The speaker is clearly reading from a script
The content is easy to understand, with no difficult language
The speaker is an experienced narrator with a naturally pleasing timbre to his voice
In fact, if you were to plug this example into an automated transcription engine, you’d probably get almost 100% accurate results.
Fire up your transcription software and try giving this one a whirl:
This is the less rare situation in which all four factors are at their worst:
The audio quality is very poor; the speaker may be wearing a lavalier mic or there may just be a room mic
The speaker is not reading from a script and his speech tends to wander
The subject matter is unfamiliar to most people and is full of obscure acronyms and references
The presenter is not an experienced public speaker and does not enunciate many of his words
If you were to plug this example into an automated transcription engine, the results would be gibberish. To transcribe something like this, the transcriber needs to take certain steps:
The same transcriber should transcribe the whole series; over time, the transcriber will grow accustomed to the presenter’s elocution
The transcriber must research the terminology used and become familiar with the jargon
High-quality headphones must be used, and an equalizer may be employed to improve the clarity of the speaker's voice
The latter example is an excerpt from a government training video on cybersecurity. The whole series is over six hours long…
Our next blog post will discuss other challenges of producing high-quality transcription and closed captioning. If you have any examples of exceptionally difficult transcription, send them our way, and we may feature them in our next post. Send them to firstname.lastname@example.org
PS: In case you were wondering, here's what our cybersecurity friend was saying:
And the IAVA, this calls for corrective action and directs acknowledgment and gives you suspense for the first report, and it tells you compliance is done by the commander, next commander in the line. But I'll tell you right now, if you get a CCRI, if you're going to have an IAVB fixed, they're going to give you a <indistinct> for that. So, fix them. You don't have to fix them quite as fast, but make sure you don't let them hang out there for a long time.