This YouTube UGC dataset is a sample of thousands of User Generated Content (UGC) videos uploaded to YouTube and distributed under the Creative Commons license. The dataset was created to advance research on video compression and quality assessment for UGC videos.
What is a UGC video clip?
UGC videos are uploaded by users and creators. These videos are not always professionally curated and often suffer from perceptual artifacts. For this dataset, we selected original videos with specific, and sometimes substantial, perceptual quality issues, such as blockiness, blur, banding, noise, and jerkiness.
Challenges in UGC
A common assumption in much video quality and compression research is that the original video is pristine (as in the top frame), and that any operation on the original (processing, compression, etc.) makes it worse. Most research measures how good the resulting video is by comparing it to the original. However, this assumption breaks down in practice, as most uploads are not pristine (as in the bottom frame).
- Around 1500 video clips with a duration of 20 seconds each.
- Categories: Animation, Cover Song, Gaming, HDR, How-To, Lecture, Live Music, Lyric Video, Music Video, News Clip, Sports, Television Clip, Vertical Video, Vlog, and VR
- Resolutions: 360P, 480P, 720P, and 1080P for all categories (except HDR and VR)
- 4K for the HDR, Gaming, Sports, Vertical Video, Vlog, and VR categories
Subjective Quality Scores
- Mean Opinion Scores (MOS) are available for all video clips.
- MOS for entire video clips:
  - All video clips were rated by 100+ subjects via crowdsourcing.
  - The MOS range is [1, 5], where 1 means bad quality and 5 means excellent quality.
- MOS for chunks:
  - Additional MOS for three overlapping 10-second chunks (the first frame of each chunk starts at 0, 5, and 10 seconds) are also provided, to investigate the influence of scene changes.
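The chunking scheme above can be sketched in a few lines. This is a minimal illustration, not part of the dataset's tooling; the constants come straight from the description (20-second clips, 10-second chunks starting at 0, 5, and 10 seconds), while the function name is ours:

```python
# Illustrative sketch of the overlapping MOS chunk layout described above.
# Constants are taken from the dataset description; chunk_intervals is a
# hypothetical helper, not an official API.

CLIP_LENGTH = 20          # seconds, duration of each clip
CHUNK_LENGTH = 10         # seconds, duration of each MOS chunk
CHUNK_STARTS = (0, 5, 10) # seconds, start time of each chunk

def chunk_intervals(starts=CHUNK_STARTS, length=CHUNK_LENGTH):
    """Return (start, end) times in seconds for each MOS chunk."""
    return [(s, s + length) for s in starts]

print(chunk_intervals())  # [(0, 10), (5, 15), (10, 20)]
```

Note that the last chunk ends exactly at the 20-second clip boundary, and each consecutive pair of chunks overlaps by 5 seconds.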
A measure of spatially correlated noise in the videos. Such noise is typically observed in low-light scenarios. The metric is defined in "A perceptual quality metric for videos distorted by spatially correlated noise", by Chao Chen et al., published in ACM Multimedia, 2016.
A measure of luminance or color quantization in a video, often observed as perceptual bands. It is detailed in "A perceptual visibility metric for banding artifacts", by Yilin Wang et al., published in IEEE ICIP, 2016.
An objective measure based on the natural scene statistics of a video. SLEEQ stands for Self-reference based LEarning-free Evaluator of Quality. It is defined in "A no-reference video quality predictor for compression and scaling artifacts", by Deepti Ghadiyaram et al., published in IEEE ICIP, 2017.