NNSquad - Network Neutrality Squad


[ NNSquad ] An Experiment with YouTube's New Auto-Captioning



               An Experiment with YouTube's New Auto-Captioning

                 http://lauren.vortex.com/archive/000686.html
                

Greetings.  In a move of potentially enormous positive importance to
hearing-impaired Internet users, Google's YouTube today announced the
deployment of its "auto-captioning" capability across the entire
universe of YouTube videos ( http://bit.ly/dBbPyZ ).

The rapid growth of uncaptioned video on the Internet has increasingly
put hearing-impaired users at a disadvantage, but the decidedly
nontrivial work required to caption videos, especially for producers
with limited resources, has until now greatly limited the number of
YouTube vids available with full or even partial captioning.

I've captioned some of my own videos in the past, both with the help
of rudimentary tools and completely manually, and I can attest that
the process can be quite tedious indeed.

So given today's YouTube announcement (and the discovery that some of
my YouTube videos were already enabled for auto-captioning), I decided
to run a quickie experiment using one of my previously uncaptioned
videos.

Automatic speech recognition is a very difficult task, especially in
the presence of music or noise, and YouTube notes that auto-captioning
should be expected to be imperfect.  I was interested in seeing how
much it would speed up the process of hand-tuning a completely
accurate caption transcription.

Executive summary: It helps a great deal, to say the least!

The video I used for this experiment was my "Is Net Neutrality a
Communist Plot?" satire ( http://bit.ly/3eevIl ).

This video has several characteristics that make it particularly
useful for this test.  While most of the voiceover is not accompanied
by backing music, some sections of narration are layered on music
beds.  Also, the audio for almost the entire length of the production
is mixed with a purpose-built "noise" track to simulate a rotting old
reel of film, and I had manually introduced various other audio timing
artifacts as well.

You can inspect the results by watching the video and
enabling captioning ( http://bit.ly/3eevIl ).

At the lower-right of the YouTube playback window is an upward-pointing
arrow.  Hover over it (or click it) and you'll see a "CC" option that
you can click to enable captions (it will then turn red).  Also, if you
hover over the small left-pointing arrow on the left side of the CC
option, you can choose between two captioning tracks.

"Hand-Tuned" is the default and is my final captioning track after I
corrected and tuned YouTube's automatically transcribed captions.
"Machine Transcription" is the original, unedited captioning track
that YouTube generated on its own.

The automatic captioning track obviously contains many errors (and is
rather humorous in places).  But we must keep in mind (a) that music
and noise naturally degrade the machine-transcription process, and
especially (b) that this automatic track, even with its errors, is an
enormous time-saver when it comes to creating a polished, hand-tuned
final captioning track.

I can't emphasize this latter point enough.  Using the automatically
generated track as a foundation allowed me to create a finished
captioning track in a fraction of the time that starting from scratch
with only a script would have required.

Results will obviously vary from video to video, but I expect that
many videos, particularly ones with quiet backgrounds, will yield
rather spectacular results with minimal hand tuning, and in many cases
the raw transcriptions will be highly useful to users without any
tuning at all.

And of course, we can expect that over time the quality of the
auto-captioning transcriptions will only improve.

This is a big day for Internet video accessibility in general, and for
YouTube in particular.  Kudos to the YouTube teams!

--Lauren--
Lauren Weinstein
lauren@vortex.com
Tel: +1 (818) 225-2800
http://www.pfir.org/lauren
Co-Founder, PFIR
   - People For Internet Responsibility - http://www.pfir.org
Co-Founder, NNSquad
   - Network Neutrality Squad - http://www.nnsquad.org
Founder, GCTIP - Global Coalition 
   for Transparent Internet Performance - http://www.gctip.org
Founder, PRIVACY Forum - http://www.vortex.com
Member, ACM Committee on Computers and Public Policy
Lauren's Blog: http://lauren.vortex.com
Twitter: https://twitter.com/laurenweinstein