Improving Video Accessibility with WebVTT

July 17th, 2019 admin

“The power of the Web is in its universality. Access by everyone regardless of disability is an essential aspect.”
– Tim Berners-Lee

Accessibility is an important element of web development, and with the ever-growing prevalence of video content, the need for captioned content is growing as well. WebVTT helps here: it is a subtitle format that integrates easily with already-existing web APIs.

That’s what we’re going to look at here in this article. Sure, WebVTT is captioning at its most basic, but there are ways to implement it to make videos (and the captioned content itself) more accessible for users.

Hi, meet the WebVTT format

First and foremost: WebVTT is a type of file that starts with the text “WEBVTT” and contains lines of captions with timestamps. Here’s an example:

WEBVTT

00:00:00.000 --> 00:00:03.000
- [Birds chirping]
- It's a beautiful day!

00:00:04.000 --> 00:00:07.000
- [Creek trickling]
- It is indeed!

00:00:08.000 --> 00:00:10.000
- Hello there!

A little weird, but makes pretty good sense, right? As you can see, the first line is “WEBVTT” and it is followed by a time range (in this case, 0 to 3 seconds) on Line 3. The time range is required. Otherwise, the WEBVTT file will not work at all and it won’t even display or log errors to let you know. Finally, each line below a time range represents captions contained in the range.

Note that you can have multiple captions in a single time range. Hyphens may be used to indicate the start of a line, though it’s not required and more stylistic than anything else.

The time range can be in one of two formats: hh:mm:ss.ttt or mm:ss.ttt. Each part follows certain rules:

Hours (hh): Minimum of two digits
Minutes (mm): Between 00 and 59, inclusive
Seconds (ss): Between 00 and 59, inclusive
Milliseconds (ttt): Between 000 and 999, inclusive
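
If you ever want to check timestamps in a script rather than by eye, the rules above fit in one regular expression. Here is a minimal Python sketch. The name parse_timestamp is made up for this example and is not part of any WebVTT library.

```python
import re

# Optional hours (two or more digits), then mm:ss.ttt.
# Minutes and seconds are limited to 00-59, milliseconds to three digits.
TIMESTAMP_RE = re.compile(r"(?:([0-9]{2,}):)?([0-5][0-9]):([0-5][0-9])\.([0-9]{3})")


def parse_timestamp(text: str) -> float:
    """Return the timestamp in seconds, or raise ValueError if it is malformed."""
    match = TIMESTAMP_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid WebVTT timestamp: {text!r}")
    hours, minutes, seconds, millis = match.groups()
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


print(parse_timestamp("00:00:03.000"))  # 3.0
print(parse_timestamp("01:15.250"))     # 75.25
```

A value such as 00:60:00.000 raises ValueError, because the minutes part is out of range.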

This may seem rather daunting at first. You’re probably wondering how anyone can be expected to type and tweak this all by hand. Luckily, there are tools to make this easier. For example, YouTube can automatically caption videos for you with speech recognition and lets you download the captions as a VTT file. It works the other way around, too: you can upload your own VTT file to a YouTube video.
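
If your captions already live somewhere else (a spreadsheet, a transcript with timings), a few lines of script can write the file for you. This is only a sketch: format_timestamp and build_vtt are hypothetical helper names, and the cues are assumed to arrive as (start seconds, end seconds, text) tuples.

```python
def format_timestamp(seconds: float) -> str:
    """Format a number of seconds as hh:mm:ss.ttt."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def build_vtt(cues) -> str:
    """Build the text of a WebVTT file from (start, end, text) tuples."""
    blocks = ["WEBVTT"]
    for start, end, text in cues:
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{timing}\n{text}")
    # Blocks are separated by a blank line, as in the example file above
    return "\n\n".join(blocks) + "\n"


print(build_vtt([
    (0, 3, "- [Birds chirping]\n- It's a beautiful day!"),
    (4, 7, "- [Creek trickling]\n- It is indeed!"),
]))
```

The sketch does not escape or validate the caption text, so it is best suited to plain captions.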

Once we have this file created, we can then embed it into an HTML5 video element.

<!DOCTYPE HTML>
<html>
  <body>
    <video controls autoplay>
      <source src="your_video.mp4" type="video/mp4"/>
      <track default kind="captions" srclang="en" label="English" src="your_caption_file.vtt"/>
    </video>
  </body>
</html>

The <track> tag is sort of like a script that “plays” along with the video. We can use multiple tracks in the same video element. The default attribute indicates that the track will be enabled automatically.

Let’s run down all the attributes while we’re at it:

srclang indicates what language the track is in.
kind represents the type of track it is and there are five kinds:
- subtitles are usually translations and descriptions of different parts of a video.
- descriptions help blind and low-vision users understand what is happening in a video.
- captions provide deaf and hard-of-hearing users an alternative to audio.
- metadata is a track that is used by scripts and cannot be seen by users.
- chapters assist in navigating video content.
label is a title for the text track that appears in the player’s list of available tracks.
src is the source file for the track. It cannot come from a cross-origin source unless crossorigin is specified.

While WebVTT is designed specifically for video, you can still use it with audio by placing an audio file within a <video> element.

Digging into the structure of a WebVTT file

MDN has great documentation and outlines the body structure of a WebVTT file, which consists of up to six components. Here’s how MDN breaks it down:

An optional byte order mark (BOM)
The string “WEBVTT”
An optional text header to the right of WEBVTT.
- There must be at least one space after WEBVTT.
- You could use this to add a description to the file.
- You may use anything in the text header except newlines or the string “-->”.
A blank line, which is equivalent to two consecutive newlines.
Zero or more cues or comments.
Zero or more blank lines.

Note: a BOM is a Unicode character that indicates the Unicode encoding of the text file.
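
Since a broken file fails silently, it can be handy to check the first line before shipping it. The Python sketch below covers only the header rules from the list above (BOM, the WEBVTT string, the optional text header). The function name check_header is invented for this example, and the rest of the file is not inspected.

```python
def check_header(text: str) -> bool:
    """Return True if the first line looks like a valid WebVTT header."""
    if text.startswith("\ufeff"):  # optional byte order mark
        text = text[1:]
    first_line = text.split("\n", 1)[0].rstrip("\r")
    if first_line == "WEBVTT":
        return True
    # An optional text header needs a space after WEBVTT...
    if not first_line.startswith("WEBVTT "):
        return False
    # ...and may contain anything except the string "-->"
    return "-->" not in first_line


print(check_header("WEBVTT\n\n00:00:00.000 --> 00:00:03.000\nHi\n"))  # True
print(check_header("WEBVTT captions for the demo video\n\n"))          # True
print(check_header("webvtt\n\n"))                                      # False
```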

Bold, italic, and underline — oh my!

We can absolutely use some inline HTML formatting in WebVTT files! These are the ones that everyone is familiar with: <b>, <i>, and <u>. You use them exactly as you would in HTML.

WEBVTT

00:00:00.000 --> 00:00:03.000 align:start
This is <b>bold text</b>

00:00:03.000 --> 00:00:06.000 align:middle
This is <i>italic text</i>

00:00:06.000 --> 00:00:09.000 vertical:rl align:middle
This is <u>underlined text</u>

Cue settings

Cue settings are optional strings of text used to control the position of a caption. It’s sort of like positioning elements in CSS, like being able to place captions on the video.

Cue settings are written to the right of the cue timing, on the same line. With them, we can control whether a caption is displayed horizontally or vertically, and define both the alignment and the position of the caption.
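
To make that layout concrete, here is a small Python sketch that splits a timing line into its start time, end time, and settings. The function name parse_cue_line is an assumption for this example. It only separates name:value pairs and does not check that the names or values are valid.

```python
def parse_cue_line(line: str):
    """Split a cue timing line into (start, end, settings)."""
    start, arrow, rest = line.partition("-->")
    if not arrow:
        raise ValueError("missing '-->' separator")
    parts = rest.split()
    if not parts:
        raise ValueError("missing end timestamp")
    end, raw_settings = parts[0], parts[1:]
    settings = {}
    for item in raw_settings:
        # Each setting is written as name:value with no spaces inside
        name, colon, value = item.partition(":")
        if not (name and colon and value):
            raise ValueError(f"malformed cue setting: {item!r}")
        settings[name] = value
    return start.strip(), end, settings


print(parse_cue_line("00:00:06.000 --> 00:00:09.000 vertical:rl align:middle"))
# ('00:00:06.000', '00:00:09.000', {'vertical': 'rl', 'align': 'middle'})
```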

Here are the settings that are available to us.

Setting 1: Line

line controls the positioning of the caption on the y-axis. If vertical is specified (which we’ll look at next), then line will instead indicate where the caption will be displayed on the x-axis.

When specifying the line value, integers and percentages are perfectly acceptable units. In the case of using an integer, the distance per line will be equal to the height (from a horizontal perspective) of the first line. So, for example, let’s say the height of the first line of the caption is equal to 50px, the line value specified is 2, and the caption’s direction is horizontal. That means the caption will be positioned 100px (50px times 2) down from the top, and no further than the boundaries of the video. If we use a negative integer, it will move upward from the bottom as the value decreases (or, in the case of vertical:lr being specified, from right to left, and vice versa for vertical:rl). Be careful here: it’s possible to position captions off-screen, and positioning is inconsistent across browsers. With great power comes great responsibility!

In the case of a percentage, the value must be between 0-100%, inclusive (sorry, no 200% mega values here). Higher values will move the caption from top-to-bottom, unless vertical:lr or vertical:rl is specified, in which case the caption will move along the x-axis accordingly.

As the value increases, the caption will appear further down the video boundaries. As the value decreases (including into the negatives), the caption will appear further up.

Tough to picture this without examples, right? Here’s how this translates into code:

00:00:00.000 --> 00:00:03.000 line:50%
This caption should be positioned horizontally in the approximate center of the screen.

00:00:03.000 --> 00:00:06.000 vertical:lr line:50%
This caption should be positioned vertically in the approximate center of the screen.

00:00:06.000 --> 00:00:09.000 vertical:rl line:-1
This caption should be positioned vertically along the left side of the video.

00:00:09.000 --> 00:00:12.000 line:0
The caption should be positioned horizontally at the top of the screen.
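
The two kinds of line values described above (a possibly negative integer, or a percentage from 0 to 100) are easy to tell apart in a script. Here is a rough Python sketch. The name parse_line_setting is made up for this example, and accepting only whole-number percentages is a simplification on our part.

```python
import re


def parse_line_setting(value: str):
    """Classify a line value as ("percent", n) or ("lines", n)."""
    if value.endswith("%"):
        number = value[:-1]
        # Percentages must stay between 0 and 100, inclusive
        if not re.fullmatch(r"[0-9]+", number) or int(number) > 100:
            raise ValueError(f"percentage out of range: {value!r}")
        return ("percent", int(number))
    if re.fullmatch(r"-?[0-9]+", value):
        # Negative integers count lines from the opposite edge
        return ("lines", int(value))
    raise ValueError(f"not a valid line value: {value!r}")


print(parse_line_setting("50%"))  # ('percent', 50)
print(parse_line_setting("-1"))   # ('lines', -1)
```

A value like 200% raises ValueError, matching the "no mega values" rule.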

Setting 2: Vertical

vertical indicates the caption will be displayed vertically and move in the direction specified by the line setting. Some languages are not displayed left-to-right and instead need a top-to-bottom display.

00:00:00.000 --> 00:00:03.000 vertical:rl
This caption should be vertical.

00:00:00.000 --> 00:00:03.000 vertical:lr
This caption should be vertical.

Setting 3: Position

position specifies where the caption will be displayed along the x-axis. If vertical is specified, the position will instead specify where the caption will be displayed on the y-axis. It must be an integer value between 0% and 100%, inclusive.

00:00:00.000 --> 00:00:03.000 vertical:rl position:100%
This caption will be vertical and toward the bottom.

00:00:03.000 --> 00:00:06.000 vertical:rl position:0%
This caption will be vertical and toward the top.

At this point, you may notice that line and position are similar to the CSS flexbox properties for align-items and justify-content, and that vertical behaves a lot like flex-direction. A trick for remembering WebVTT directions is that line specifies a position perpendicular to the flow of the text, whereas position specifies the position parallel to the flow of the text. That’s why line suddenly moves along the horizontal axis, and position moves along the vertical axis if we specify vertical.

Setting 4: Size

size specifies the width of the caption. If vertical is specified, then it will set the height of the caption instead. Like other settings, it must be an integer between 0% and 100%, inclusive.

00:00:00.000 --> 00:00:03.000 vertical:rl size:50%
This caption will fill half the screen vertically.

00:00:03.000 --> 00:00:06.000 size:100%
This caption will fill the entire screen horizontally.

Setting 5: Align

align specifies where the text will appear horizontally. If vertical is specified, then it will control the vertical alignment instead.

The values we’ve got are: start, middle, end, left and right. (The current WebVTT specification names the middle value center, so check which spelling your target browsers accept.) Without vertical specified, the alignments are exactly what they sound like. If vertical is specified, they effectively become top, middle (vertically), and bottom. Using start and end as opposed to left and right, respectively, is a more flexible way of allowing the alignment to be based on the unicode-bidi CSS property’s plaintext value.

Note that align is unaffected by whether vertical:lr or vertical:rl is used.

WEBVTT

00:00:00.000 --> 00:00:03.000 align:start
This caption will be on the left side of the screen.

00:00:03.000 --> 00:00:06.000 align:middle
This caption will be horizontally in the middle of the screen.

00:00:06.000 --> 00:00:09.000 vertical:rl align:middle
This caption will be vertically in the middle of the screen.

00:00:09.000 --> 00:00:12.000 vertical:rl align:end
This caption will be vertically at the bottom right of the screen regardless of vertical:lr or vertical:rl orientation.

00:00:12.000 --> 00:00:15.000 vertical:lr align:end
This caption will be vertically at the bottom of the screen, regardless of the vertical:lr or vertical:rl orientation.

00:00:15.000 --> 00:00:18.000 align:left
This caption will appear on the left side of the screen.

00:00:18.000 --> 00:00:21.000 align:right
This caption will appear on the right side of the screen.

WebVTT Comments

WebVTT comments are strings of text that are only visible when reading the source text of the file, the same way we think of comments in HTML, CSS, JavaScript and any other language. Comments may contain a new line, but not a blank line (which is essentially two new lines).

WEBVTT

00:00:00.000 --> 00:00:03.000
- [Birds chirping]
- It's a beautiful day!

NOTE This is a comment. It will not be visible to anyone viewing the caption.

00:00:04.000 --> 00:00:07.000
- [Creek trickling]
- It is indeed!

00:00:08.000 --> 00:00:10.000
- Hello there!

When the caption file is parsed and rendered, the NOTE line above will be completely hidden from users. Comments can be multi-line as well.

There are three very important characters/strings to take note of that may not be used in comments: <, &, and -->. As an alternative, you can use escaped characters instead.

Not Allowed            Alternative

NOTE PB&J              NOTE PB&amp;J

NOTE 5 < 7             NOTE 5 &lt; 7

NOTE puppy --> dog     NOTE puppy --&gt; dog
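
If comments are generated by a script, the substitutions in the table can be applied automatically. A minimal Python sketch follows. The names escape_note_text and make_note are hypothetical, and the input is assumed to be raw, not-yet-escaped text.

```python
def escape_note_text(text: str) -> str:
    """Replace the three disallowed strings with their escaped forms."""
    # Escape & first so the entities added below are not escaped a second time
    text = text.replace("&", "&amp;")
    text = text.replace("<", "&lt;")
    return text.replace("-->", "--&gt;")


def make_note(text: str) -> str:
    """Build a single-line WebVTT comment from raw text."""
    return "NOTE " + escape_note_text(text)


print(make_note("PB&J"))           # NOTE PB&amp;J
print(make_note("5 < 7"))          # NOTE 5 &lt; 7
print(make_note("puppy --> dog"))  # NOTE puppy --&gt; dog
```

Running it twice on the same text would escape the ampersands again, so apply it once, at the point where the comment is written.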

A few other interesting WebVTT features

We’re going to take a quick look at some really neat ways we can customize and control captions, but that are lacking consistent browser support, at least at the time of this writing.

Yes, we can style captions!

WebVTT captions can, in fact, be styled. For example, to style the background of a caption to be red, set the background property on the ::cue pseudo-element:

video::cue { background: red; }

Remember how we can use some inline HTML formatting in the WebVTT file? Well, we can select those as well. For example, to select an italic (<i>) element:

video::cue(i) { color: yellow; }

Turns out WebVTT files support a style block, a lot like the way HTML files do:

WEBVTT

STYLE
::cue {
  color: blue;
  font-family: "Source Sans Pro", sans-serif;
}

Elements can also be accessed via their cue identifiers. Note that cue identifiers use the same escaping mechanism as HTML, and that spaces in an identifier must be escaped with a backslash inside the selector, as in any CSS selector.

WEBVTT

STYLE
::cue(#middle\ cue\ identifier) {
  text-decoration: underline;
}
::cue(#cue\ identifier\ 33) {
  font-weight: bold;
  color: red;
}

first cue identifier
00:00:00.000 --> 00:00:02.000
Hello, world!

middle cue identifier
00:00:02.000 --> 00:00:04.000
This cue identifier will have an underline!

cue identifier 3
00:00:04.000 --> 00:00:06.000
This one won't be affected, just like the first one!

Different types of tags

Many tags can be used to format captions, with one caveat: these tags cannot be used in a <track> element whose kind attribute is chapters. Here are some formatting tags you can use.

The class tag

We can define classes in the WebVTT markup using a class tag that can be selected with CSS. Let’s say we have a class, .yellowish, that makes text yellow. We can use the <c.yellowish> tag in a caption. Because cue text lives outside the page’s DOM, the class has to be targeted through the ::cue() pseudo-element. We can control lots of styling this way, like the font, the font color, and background color.

/* Our CSS file */
video::cue(.yellowish) {
  color: yellow;
}
video::cue(.redcolor) {
  color: red;
}

WEBVTT

00:00:00.000 --> 00:00:03.000
<c.yellowish>This text should be yellow.</c> This text will be the default color.

00:00:03.000 --> 00:00:06.000
<c.redcolor>This text should be red.</c> This text will be the default color.

The timestamp tag

If you want to make captions appear at specific times, then you will want to use timestamp tags. They’re like fine-tuning captions to exact moments in time. The tag’s time must be within the given time range of the caption, and each timestamp tag must be later than the previous.

WEBVTT

00:00:00.000 --> 00:00:07.000
This <00:00:01.000>text <00:00:02.000>will <00:00:03.000>appear <00:00:04.000>over <00:00:05.000>6 <00:00:06.000>seconds.
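
Both rules (inside the cue’s time range, and each tag later than the one before) can be checked with a short script. This Python sketch reads “within” as strictly between the start and end times, which is our assumption. The names to_ms and timestamp_tags_valid are invented for the example.

```python
import re

# Matches timestamp tags such as <00:00:01.000> or <01:15.250>
TAG_RE = re.compile(r"<((?:[0-9]{2,}:)?[0-5][0-9]:[0-5][0-9]\.[0-9]{3})>")


def to_ms(timestamp: str) -> int:
    """Convert hh:mm:ss.ttt or mm:ss.ttt to milliseconds."""
    numbers = [int(part) for part in timestamp.replace(".", ":").split(":")]
    if len(numbers) == 3:  # no hours part
        numbers.insert(0, 0)
    hours, minutes, seconds, millis = numbers
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def timestamp_tags_valid(start: str, end: str, text: str) -> bool:
    """Check that every timestamp tag is inside the cue range and increasing."""
    previous, limit = to_ms(start), to_ms(end)
    for tag in TAG_RE.findall(text):
        current = to_ms(tag)
        if current <= previous or current >= limit:
            return False
        previous = current
    return True


cue = "This <00:00:01.000>text <00:00:02.000>will <00:00:03.000>appear"
print(timestamp_tags_valid("00:00:00.000", "00:00:07.000", cue))  # True
```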

The voice tag

Voice tags are neat in that they help identify who is speaking.

WEBVTT

00:00:00.000 --> 00:00:03.000
<v Alice>How was your day, Bob?

00:00:03.000 --> 00:00:06.000
<v Bob>Great, yours?
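
Because the speaker’s name sits inside the tag, it is easy to pull out with a script, for example to build a list of who talks in a video. Here is a small Python sketch. The function name speakers is made up, and the pattern assumes the simple <v Name> form shown above (optionally with classes such as <v.loud Name>).

```python
import re

# <v Name> or <v.class Name>; the name runs up to the closing ">"
VOICE_RE = re.compile(r"<v(?:\.[^\s>]+)?[ \t]+([^>]+)>")


def speakers(cue_text: str):
    """Return the speaker names found in voice tags, in order."""
    return [name.strip() for name in VOICE_RE.findall(cue_text)]


print(speakers("<v Alice>How was your day, Bob?"))  # ['Alice']
print(speakers("<v Bob>Great, yours?"))             # ['Bob']
```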

The ruby tag

The ruby tag is a way to display small, annotative characters above the caption.

WEBVTT

00:00:00.000 --> 00:00:05.000
<ruby>This caption will have text above it<rt>This text will appear above the caption.</rt></ruby>

Conclusion

And that about wraps it up for WebVTT! It’s an extremely useful technology and presents an opportunity to improve your site’s accessibility a great deal, particularly if you are working with video. Try writing some captions of your own to get a better feel for it!
