VTT


WebVTT (Web Video Text Tracks), usually called VTT after its .vtt file extension, is the standard subtitle format for HTML5 video. It offers more advanced features than SRT, such as styling and positioning.

Format

  • The file must start with the header WEBVTT.
  • Subtitle blocks (cues) are separated by blank lines.
  • Each cue consists of:
    1. Cue Identifier (Optional): A string identifying the cue (e.g., 1, intro).
    2. Timecode: The start and end time of the subtitle, separated by -->.
      • Standard format: mm:ss.ttt or hh:mm:ss.ttt, where ttt is the three-digit millisecond value.
      • Leniency: LingoHub accepts either a period (.) or a comma (,) as the decimal separator and 1 to 3 digits for the milliseconds.
      • Cue Settings: Optional settings (e.g., align:start size:50%) can follow the end timestamp on the same line.
    3. Payload: The subtitle text content.
  • Comments: Use NOTE blocks to add comments.
  • Styling: STYLE blocks are preserved.
  • Key Mapping: In LingoHub, the timecode line (including any settings) is used as the key for the segment (e.g., 00:00:00.000 --> 00:00:02.500 align:start).
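
The timecode rules above can be sketched in Python. This is only an illustration, not LingoHub's implementation: the names TIMESTAMP, CUE_LINE, to_ms, and parse_timecode_line are hypothetical, and the sketch assumes that a shortened millisecond value is read as a decimal fraction of a second (so "5" means 500 ms), which the text above does not specify.

```python
import re

# Hypothetical lenient pattern: optional hours, then mm:ss, then "." or ","
# followed by 1 to 3 millisecond digits.
TIMESTAMP = r"(?:(\d{2,}):)?(\d{2}):(\d{2})[.,](\d{1,3})"
CUE_LINE = re.compile(
    r"^\s*" + TIMESTAMP + r"\s+-->\s+" + TIMESTAMP + r"(?:\s+(.*\S))?\s*$"
)


def to_ms(hours, minutes, seconds, fraction):
    """Convert timestamp parts (strings, hours may be None) to milliseconds."""
    # Assumption: short fractions are right-padded, so "7" is read as 700 ms.
    millis = int(fraction.ljust(3, "0"))
    total_seconds = (int(hours or 0) * 60 + int(minutes)) * 60 + int(seconds)
    return total_seconds * 1000 + millis


def parse_timecode_line(line):
    """Return start, end, and cue settings, or None if this is not a timecode line."""
    match = CUE_LINE.match(line)
    if match is None:
        return None
    groups = match.groups()
    # Cue settings are whitespace-separated name:value pairs after the end timestamp.
    settings = dict(
        item.split(":", 1) for item in (groups[8] or "").split() if ":" in item
    )
    return {
        "start_ms": to_ms(*groups[0:4]),
        "end_ms": to_ms(*groups[4:8]),
        "settings": settings,
    }


print(parse_timecode_line("00:00:00.000 --> 00:00:02.500 align:start"))
print(parse_timecode_line("00:10,7 --> 00:47,6"))
```

The first call uses the strict form with a cue setting; the second uses the lenient form with a comma separator and a single millisecond digit. A line that is not a timecode, such as a payload line, returns None.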

Used by

  • HTML5 <track> element
  • Modern web video players (Video.js, Plyr)
  • Most streaming platforms

Examples

The following valid VTT file uses a NOTE comment and text cue identifiers:

WEBVTT

NOTE
This is from a talk Peter gave about WebVTT.

Slide 1
00:00:00.000 --> 00:00:10.700
Title Slide

Slide 2
00:00:10.700 --> 00:00:47.600
Introduction by LingoHub GmbH

Slide 3
00:00:47.600 --> 00:01:50.100
Impact of Captions on the Web
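
To illustrate how a file like this maps to keys and segments, here is a minimal Python sketch that splits the text into blocks on blank lines and uses each timecode line as the key for its payload. The function name read_cues and the SAMPLE constant are hypothetical, and the block handling is simplified; it is not LingoHub's actual parser.

```python
import re

# Shortened copy of the example file, used only for this illustration.
SAMPLE = """WEBVTT

NOTE
This is from a talk Peter gave about WebVTT.

Slide 1
00:00:00.000 --> 00:00:10.700
Title Slide

Slide 2
00:00:10.700 --> 00:00:47.600
Introduction by LingoHub GmbH
"""


def read_cues(text):
    """Map each timecode line (the key) to the payload text of its cue."""
    cues = {}
    # Blocks are separated by one or more blank lines.
    blocks = re.split(r"\n{2,}", text.replace("\r\n", "\n").strip())
    for block in blocks:
        lines = block.split("\n")
        words = lines[0].split()
        # Skip the header as well as NOTE and STYLE blocks.
        if not words or words[0] in ("WEBVTT", "NOTE", "STYLE"):
            continue
        # The first line containing "-->" is the timecode line; a line
        # before it, if present, is the optional cue identifier.
        for index, line in enumerate(lines):
            if "-->" in line:
                cues[line.strip()] = "\n".join(lines[index + 1:])
                break
    return cues


for key, payload in read_cues(SAMPLE).items():
    print(key, "=>", payload)
```

The cue identifiers (Slide 1, Slide 2) are not part of the key in this sketch, because the key is the timecode line including any cue settings.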

References