Staying up to date on all the tech talk in streaming video is no easy feat. From new video codecs to platform-specific lingo, it can leave you feeling lost.
That’s why we’ve put together this glossary of video engineering terms to help you make sense of some of the more technical (and a few non-technical) video acronyms and words you may come across in discussions around streaming.
AAC, or Advanced Audio Coding, is the follow-up to MP3 — or rather, think of it as the MP4 of audio :). It is more efficient than MP3, so you get better-sounding audio at the same bitrate (usually 128 or 192kbps). It is very widely supported on almost all devices.
Adaptive bitrate streaming (ABR) is a method of streaming video over HTTP. The source content is encoded at multiple bitrates (renditions), and each rendition is broken into small segments or chunks of the overall video content. Segment size can vary anywhere from 2-10 seconds. The streaming client learns which renditions are available from a manifest file: an .m3u8 playlist for HLS, or an MPD (Media Presentation Description) for DASH. The client requests segments from this rendition ladder based on the bandwidth available to the end-user device, typically starting with the lowest bitrate. As bandwidth increases, a higher-bitrate segment will be requested, and conversely, as bandwidth decreases, a lower-bitrate segment will be requested. To the end user, content plays back smoothly without interruption, and bitrate changes may even go unnoticed.
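The rendition ladder is advertised to the client in the manifest. A trimmed HLS master playlist might look like this (the filenames, bandwidths, and resolutions here are purely illustrative):

```
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
360p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=3000000,RESOLUTION=1280x720
720p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=6000000,RESOLUTION=1920x1080
1080p.m3u8
```

The player picks one of the variant playlists based on measured throughput, then switches between them as conditions change.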
All of the terms here are different containers. A container "contains" the codec and carries instructions on how to play the video file: for example, the container says where the metadata lives, where the actual video components are, and where the audio components (stereo, 5.1, etc.) are. Some of these formats are intended for studio use only and are made for editing very large files, while others are consumer-facing and play on your iDevice every day.
Bitrate is defined as "the number of bits per second that can be transmitted along a digital network," but what it really means in video is the amount of data that can be sent to you reliably for you to watch the video. An HD video takes a higher bitrate than SD, but a lower bitrate than 4K, because of the amount of information being sent to you. To put it another way, a 1080p stream will have a higher bitrate than a 720p stream because each frame carries much more information. In general, you might see a 3Mbps bitrate on the 720p stream but a 6Mbps bitrate on the 1080p one.
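Bitrate also translates directly into file size: bits per second times duration, divided by 8 to get bytes. A quick back-of-the-envelope check for a hypothetical 6Mbps stream (the numbers are just an example):

```shell
# File size in megabytes for a 6 Mbps stream lasting 60 seconds:
# 6,000,000 bits/s * 60 s = 360,000,000 bits; /8 = bytes; /1,000,000 = MB
echo $(( 6000000 * 60 / 8 / 1000000 ))   # prints 45
```

So one minute of 1080p at 6Mbps costs roughly 45 MB of transfer, while the same minute at 3Mbps would be about half that.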
A CDN (Content Delivery Network) is a distributed platform of servers that helps minimize delays in loading web page content by reducing the physical distance between the server and the user. A Video CDN is a specialized server network that delivers live and on-demand, high-quality videos to connected devices through caching: temporarily storing content on multiple servers so that when a viewer request is made, the nearest server that has cached the content can deliver the video.
DRM (Digital Rights Management) is a way for companies to ensure that their video is not being pirated. Before DRM, if a video leaked, it could be uploaded and played by anyone. With DRM, content owners can decide who/when/how/what gets played. If they only want you to be able to watch something for a week, they can do that. Not watchable offline? No problem. Can't watch outside the US? Done. DRM can also be applied down to the individual user, so the same file can work for one login and not for another. It's very powerful but expensive, so mostly the valuable content (movies, TV shows, etc.) gets DRMed.
Encoding is the process of taking a video signal (usually from a camera or mixing board) and "encoding" it into a stream of data. Think of it like taking the camera signal and putting it through some very impressive mathematical formulas so we can digitize it. Another goal of encoding is to reduce the amount of data in the signal while still having it look great. For example, we might take a camera signal that's 1 gigabit per second of data and compress it to only 10 megabits per second. The encoder is the device or software doing that conversion from raw signal to compressed digital stream.
FFmpeg is a piece of free software that does the actual encoding. You type all of the settings you want into a command-line command, and it runs and produces the finished video. It is very powerful but has a steep learning curve. The training-wheels version is HandBrake, which wraps similar functionality in a GUI.
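A typical FFmpeg invocation looks something like this (the filenames, bitrates, and codec choices here are just an example, not a recommendation):

```
ffmpeg -i input.mov -c:v libx264 -b:v 3M -c:a aac -b:a 128k output.mp4
```

Reading left to right: take `input.mov`, encode the video with the x264 H.264 encoder at about 3Mbps, encode the audio as AAC at 128kbps, and write the result to `output.mp4`.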
FPS (frames per second) is a measure of how many video frames are shown each second. At a rate of 24fps, each second of video shows 24 distinct images. 24fps and 30fps are the most common rates. 60fps is often used to make fast motion (sports, gaming) look smoother; slow motion is produced by shooting at a high frame rate and playing the footage back at a lower one.
H.264 is a video codec, and the most widely used one today. Almost every device (TVs, mobile phones, iPads, etc.) can play back H.264 video files efficiently. It has a more efficient successor called H.265 or HEVC, but HEVC has been hampered by patent issues. If you need to encode a video to stream online, H.264 is your best bet to reach as many devices and users as possible.
H.265, or HEVC (High Efficiency Video Coding), is the follow-up to H.264 and the current "new thing" in codecs. Apple has added support for it, which should help its adoption and usage, but it has a very complicated patent/license setup that makes it expensive and scary to convert your whole video library to. Technically, H.265 can produce a noticeably better picture at the same bitrate, or an equivalent picture at roughly half the bitrate, compared to H.264. The trade-off is encoding complexity: it takes much more CPU to encode and decode H.265 than H.264.
HTML5 allows you to embed videos directly onto a webpage with specific player adjustments, such as autoplay or automatic mute. Prior to HTML5, a plugin like Adobe Flash was required to play videos in a browser. HTML5 video supports three container formats: MP4, WebM, and Ogg.
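A minimal sketch of an HTML5 embed (the source URLs are placeholders; the browser plays the first format it supports):

```
<video controls autoplay muted width="640">
  <source src="movie.mp4" type="video/mp4">
  <source src="movie.webm" type="video/webm">
  Your browser does not support the video tag.
</video>
```

Note that most browsers only honor `autoplay` when the video is also `muted`, which is why the two attributes usually travel together.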
Adaptive streaming means the same video is produced in different quality levels that the player can switch between based on the user's bandwidth at the moment. If you're alone in a coffee shop on the wifi, you will probably get the highest-quality stream. If a group of gamers comes in and starts using up all of the bandwidth, your quality level might drop and the video won't look as good. This way we don't have to pick a one-size-fits-all approach: you can watch a video on your phone while on the bus and then, when you get home, finish watching it on a big TV, and both will look good.
HLS, or HTTP Live Streaming, is an HTTP-based media streaming communications protocol implemented by Apple Inc. as part of their QuickTime, Safari, OS X, and iOS software. It works by breaking the overall stream into a sequence of small HTTP-based file downloads, each download loading one short chunk of an overall, potentially unbounded transport stream. As the stream is played, the client may select from a number of alternate streams containing the same material encoded at a variety of data rates, allowing the streaming session to adapt to the available bandwidth. At the start of the streaming session, the client downloads an extended M3U (m3u8) playlist containing the metadata for the various sub-streams that are available.
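Each sub-stream's playlist then lists the individual chunks. A shortened, illustrative example (segment names and durations are made up):

```
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:6
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:6.0,
segment0.ts
#EXTINF:6.0,
segment1.ts
#EXT-X-ENDLIST
```

For a live stream the `#EXT-X-ENDLIST` tag is absent and the client re-fetches the playlist periodically to discover new segments.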
Latency in live streaming refers to the delay between the time a video stream is requested versus when it is delivered, or in other words, the delay between the time the video is captured and the time it appears on your screen. Standard HTTP streaming often runs tens of seconds behind live; "low latency" usually means a delay of a few seconds or less, and only at sub-second, near-real-time delays does the lag become effectively unnoticeable to viewers.
A linear livestream is a series of videos stitched together into a single stream, and it essentially provides the same experience as a traditional TV channel, but delivered over the internet. When you tune into a linear livestream, you're watching whatever is live at that moment (as if you had turned on your TV). You cannot fast-forward or skip through.
Metadata is the information about a video, such as title, year made, genre, etc. It can be made very rich and can help a lot with searching for videos based on keywords.
DASH (Dynamic Adaptive Streaming over HTTP) is a streaming protocol like HLS; the chunks it delivers contain a codec, usually H.264 or H.265. The benefit of DASH over HLS lies in the flexibility of the manifest (the MPD) that is delivered with the video chunks. DASH made some great improvements over HLS, but it is more complicated and therefore hasn't taken off as much as expected. Apple has also kept improving HLS, so HLS has stayed dominant. Still, many of the largest companies use DASH, so a substantial share of internet video traffic is in the format.
An MRSS feed is a media feed of data used to syndicate videos from one company to many others. So if MTV wants to send out their videos for other people to watch on their sites, they would publish an MRSS feed, and that's how people would know where to get the video and all the other metadata (name, year, etc.).
Ogg is an open-source, royalty-free alternative to MP3 and AAC (strictly speaking, Ogg is the container and Vorbis is the usual audio codec inside it), so you don't need to pay any licenses to use it. Its device support is narrower than MP3 or AAC, so it should be an option, and not the only option, for audio on your project.
OTT, or Over The Top video, is the idea of not using the cable box to send video into the home. It might seem weird in today’s world, but it used to be that if you wanted to watch things at home without a VHS/DVD player, you needed to watch it through your cable box. When the internet got fast enough, we started to see video delivered “over the top”, meaning not through the cable wire going into your home, but through the internet wire instead. This was a big problem for the cable companies, who wanted to be your only link to the video in your home.
RTMP is a streaming protocol that is primarily used to deliver content from an encoder to a video host. In order to deliver streams smoothly and transmit as much information as possible, it splits streams into fragments whose size is negotiated dynamically between the client and server. It began as a proprietary protocol owned by Adobe, but has since been opened up for public use.
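A common way to push a stream to an RTMP ingest point is with FFmpeg; a sketch of the command (the server URL and stream key are placeholders):

```
ffmpeg -re -i input.mp4 -c:v libx264 -c:a aac -f flv rtmp://live.example.com/app/STREAM_KEY
```

The `-re` flag reads the input at its native frame rate to simulate a live source, and `-f flv` wraps the output in the FLV container that RTMP expects.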
SCTE-35 is a standard created by ANSI and the Society of Cable Telecommunications Engineers that lays out guidelines for inline cue tone insertion in streams. Also referred to as markers, these signals indicate where content distributors could insert or splice content into a stream. For example, if a local news station received a stream from a national news broadcaster, the stream might be marked in areas where the station could replace national ads with others that were more relevant to the community.
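When a packager encounters a SCTE-35 splice point, it often surfaces it in the HLS playlist as cue tags that downstream ad-insertion systems understand. One common convention (vendor-dependent, not part of the core HLS spec; segment names and the 30-second duration are illustrative) looks like:

```
#EXTINF:6.0,
segment41.ts
#EXT-X-CUE-OUT:DURATION=30
#EXTINF:6.0,
segment42.ts
#EXT-X-CUE-IN
```

Everything between the CUE-OUT and CUE-IN tags is fair game for replacement with a locally relevant ad.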
SSAI (server-side ad insertion), also referred to as DAI (dynamic ad insertion), is a means of dynamically stitching ads into video content on the server, so the ads arrive at the player as part of the stream itself rather than being loaded separately by the client.
Transcoding is the practice of converting a video from one format to another. For example, you have to create a certain file format to use on Apple devices, so a transcoder might take something created for XBOX and convert, or transcode, it for use on Apple devices. You can think of encoding as getting a signal from analog to digital, and transcoding from digital to digital.
VAST (Video Ad Serving Template) is a way for ad servers and video players to read and write from the same playbook. It defines a standard for how ads are delivered, so ad companies know the video player has all the information it needs and there is no finger-pointing. Before VAST, everything was a custom implementation and things were much slower to get done.
A Video Content Management System is a tool that media companies use to tag and manage their video so they can find things easily and approve or deny things for air in the same system. These systems are very powerful; at the enterprise level, they can be set up to grant or deny access for a cable company, or even an individual user, based on date and last payment.
Video API is an umbrella term for any API that relates directly to streaming and facilitating video content. The API a company chooses to integrate with its video player or application is a key factor in the quality of the end user's viewing experience. There are many different types of video APIs with different purposes: video platform APIs, content rule APIs, player APIs, authentication APIs, and video analytics APIs, to name a few.
A Video Origin is where a media company keeps its video files. Some companies keep them in a data center that they own and control, and some keep them in the public cloud; either way, it's the place the CDNs go to fetch the content from the content owner.
Encoding codecs are the different formats that the analog-to-digital conversion can end up in. Examples are H.264/H.265, VP9, ProRes, etc. All of these have pros and cons that are a little out of scope here, but generally there is a trade-off between how much CPU it takes to encode/decode and how small the resulting file is. For example, a codec might take lots of CPU to encode and decode but produce a very small file. The problem is your phone only has so much battery, so while the picture looks great, decoding it is eating up all of your juice. The goal is to find a midpoint between CPU cycles and file size.
vMVPDs (virtual multichannel video programming distributors) aggregate live and on-demand video content and deliver it in the form of linear channels over the internet. The viewer experience resembles traditional cable TV: you can browse a TV guide or flip through channels to see what is on. A few examples of vMVPDs are Pluto TV, YouTube TV, and Sling TV.
VP9 is broadly similar to H.264 in purpose, but it is a royalty-free codec developed by Google (which acquired the technology's originator, On2) as the successor to VP8. YouTube uses it on many videos because it delivers similar quality at lower bitrates than H.264, which saves bandwidth at YouTube's scale. Its planned successor, VP10, was folded into the AV1 effort.
AV1 builds on VP10 and is the new codec du jour in the industry. It is royalty-free and very efficient, although very CPU-intensive to encode. Expect to see it in products from Google, Facebook, Amazon, etc.