Cost of storing videos as “good enough” frames

I am going to do something terrible — store videos in a SQL database!

I have some requirements that make this an acceptable plan. I am building a “stream store” S. Once videos are stored in S, users should be able to fetch a video segment (in color or grayscale) between two given timestamps at a given FPS. One should also be able to annotate frames later, e.g., “this frame has a face in it”. I’ll read the incoming video stream and save the data as frames in a database (in addition to storing the raw videos on a backup server).
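To make the "fetch a segment between two timestamps at a given FPS" requirement concrete, here is a minimal sketch in Rust. It assumes every stored frame is keyed by a millisecond timestamp and that the keys are sorted; `select_frames` and its parameters are hypothetical names for illustration, not the actual interface of S.

```rust
/// Pick stored frame timestamps (in ms) for the segment [start_ms, end_ms]
/// at the requested FPS.
///
/// Hypothetical helper: assumes `stored_ms` is sorted in ascending order.
fn select_frames(stored_ms: &[i64], start_ms: i64, end_ms: i64, fps: u32) -> Vec<i64> {
    assert!(fps > 0, "fps must be positive");
    let mut picked: Vec<i64> = Vec::new();
    let mut k: i64 = 0;
    loop {
        // Ideal timestamp of the k-th output frame.
        let target = start_ms + k * 1000 / i64::from(fps);
        if target > end_ms {
            break;
        }
        // Index of the first stored frame at or after the ideal timestamp.
        let idx = stored_ms.partition_point(|&t| t < target);
        match stored_ms.get(idx) {
            Some(&t) if t <= end_ms => {
                // Do not return the same stored frame twice when the
                // requested FPS is higher than the stored FPS.
                if picked.last() != Some(&t) {
                    picked.push(t);
                }
            }
            _ => break,
        }
        k += 1;
    }
    picked
}
```

Asking for a lower FPS than what is stored simply skips frames; asking for a higher FPS returns every stored frame once. A real implementation would run the equivalent query against the database rather than an in-memory slice.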

The primary job of S is to provide frames for CV analysis, so serving a pixel-perfect video stream is not a requirement.

  • The stored frames should be “good enough” that our CV algorithms/pipeline work without a performance drop.
  • The cost of storing frames should be as low as possible as long as the above requirement is met.

I am going to extract frames from the video as JPEG, a lossy compression format. But what should the JPEG quality be?

Experiments

I downloaded a sample mkv file (1280x720 at about 24 FPS, per the details below) to do a simple analysis.

https://filesamples.com/samples/video/mkv/sample_1280x720.mkv

FPS 23.976
Duration 28.237 s
Size on disk 16.63 MB

I wrote a script that extracts frames from the mkv using ffmpeg. The argument -qscale:v sets the quality of the frames: 2 is the best and 31 is the worst.

  • ffmpeg -i ../sample_1280x720.mkv '%04d.png' generates a folder of PNGs that is 1.3 GB, almost 78x the original size. PNG is a lossless format. This is as bad as it gets.
  • ffmpeg -i ../sample_1280x720.mkv '%04d.jpg' generates JPEGs with the default quality picked by ffmpeg. It generates 33 MB of data, almost 1.94x the original size. Great!
  • ffmpeg -i ../sample_1280x720.mkv -qscale:v 2 '%04d.jpg' generates JPEGs with the best possible quality. The generated size is 188 MB (about 11x the original!).
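The ratios above are plain divisions, but it is also handy to turn them into a per-frame cost. The sketch below does that arithmetic in Rust using the rounded numbers quoted in this post, so results can be slightly off from the exact on-disk values. The function names are mine, and it assumes 1 GB = 1000 MB and 1 MB = 1000 KB.

```rust
/// How many times larger the extracted frames are than the original video.
fn blowup(frames_mb: f64, original_mb: f64) -> f64 {
    frames_mb / original_mb
}

/// Approximate number of frames in a clip (duration times frame rate).
fn frame_count(duration_s: f64, fps: f64) -> u64 {
    (duration_s * fps).round() as u64
}

/// Average size of one extracted frame in KB (assumes 1 MB = 1000 KB).
fn kb_per_frame(total_mb: f64, frames: u64) -> f64 {
    total_mb * 1000.0 / frames as f64
}
```

For the sample file, `frame_count(28.237, 23.976)` gives roughly 677 frames, so the default-quality JPEGs cost a bit under 50 KB per frame, while the `-qscale:v 2` JPEGs cost a bit under 280 KB per frame.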

I did the same on a different file recorded at 60 FPS. The data is below.

[Figure: Size of frames (various FPS)]

A few things to note:

  • A video recorded at 60 FPS takes more than twice the size of the 30 FPS case. Note that the videos use the H264 codec (MPEG-4 AVC (part 10) (avc1)), which AFAIK compresses pretty well when consecutive frames don’t differ much. So this is expected.
  • Our recordings have the same feature: interesting events happen rarely. We might drop many frames from the database after a quick analysis to figure out whether they contain something interesting. So in the end, we don’t even have to store that many frames.

I think qscale:v=20 is a good default for my use case. Also, I don’t have to extract frames at the same rate as they were recorded. I am interested in events at the timescale of ~100ms, and anything faster than 20 FPS is overkill. I can just extract at 30 FPS.
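The 20 FPS figure can be derived with a tiny calculation. This is a sketch under my own assumption that an event should be covered by at least two frames; the function names are hypothetical.

```rust
/// Minimum frame rate such that an event lasting `event_ms` milliseconds is
/// covered by at least `samples_per_event` frames.
/// Assumption: two frames per event is "enough"; tune this for your pipeline.
fn min_fps(event_ms: f64, samples_per_event: f64) -> f64 {
    samples_per_event * 1000.0 / event_ms
}

/// Time between two consecutive frames, in milliseconds.
fn frame_interval_ms(fps: f64) -> f64 {
    1000.0 / fps
}
```

With 100 ms events and two frames per event, `min_fps(100.0, 2.0)` comes out at 20 FPS. Extracting at 30 FPS gives a frame roughly every 33 ms, so about three frames per event, which leaves some headroom.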

ffmpeg has a handy CLI option, -filter:v "fps=30", to fix the extraction FPS to 30. Here is some bonus Rust code that does this. Don’t copy-paste blindly; it may not work.

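// Imports needed by this excerpt (they belong at module level).
use std::path::Path;
use std::time::Instant;

/// Explode the given video into JPEG frames.
///
/// - *path*: Path of the video file.
/// - *fps*: Extract this many frames per second. The video may contain
///   more or fewer frames per second.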
pub fn extract_jpegs<P: AsRef<Path> + std::fmt::Debug>(
    &self,
    path: P,
    fps: u16,
    recording_start_timestamp_ms: i64,
) -> anyhow::Result<usize> {
    anyhow::ensure!(path.as_ref().exists(), "{:?} does not exist", path);
    tracing::debug!(
        "Extracting frames from {:?} for fps={fps} and jpg qscale {:?}.",
        path.as_ref(),
        self.qscale
    );

    let inst = Instant::now();
    let mut cmd = std::process::Command::new(&self.ffmpeg_bin_path);
    cmd.arg("-i").arg(path.as_ref());

    // Extract at a given fps. Thanks <https://askubuntu.com/a/1019417/39035>.
    cmd.arg("-filter:v").arg(format!("fps={fps}"));

    if let Some(qscale) = self.qscale {
        tracing::debug!("Setting qscale to {qscale}");
        // JPEG quality is set with -qscale:v (lower is better).
        cmd.arg("-qscale:v");
        cmd.arg(qscale.to_string());
    }

    cmd.arg(self.frame_directory.join("%05d.jpg"));
    let output = cmd.output()?;
    anyhow::ensure!(
        output.status.success(),
        "Command failed.\n{}\n{}",
        String::from_utf8_lossy(&output.stdout),
        String::from_utf8_lossy(&output.stderr)
    );

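    tracing::debug!(
        "Extraction to {:?} is complete, took {:?}.",
        &self.frame_directory,
        inst.elapsed()
    );

    // Assumption: the intended return value is the number of JPEG frames
    // found in the frame directory after extraction.
    let num_frames = std::fs::read_dir(&self.frame_directory)?
        .filter_map(Result::ok)
        .filter(|entry| entry.path().extension().map_or(false, |ext| ext == "jpg"))
        .count();
    Ok(num_frames)
}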
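
The function takes recording_start_timestamp_ms but the snippet never uses it. One plausible use, and this is only my guess, is to map each extracted file back to a wall-clock timestamp before inserting it into the database. Below is a minimal sketch assuming a constant output frame rate and ffmpeg's default image numbering, which starts at 1. Both helper names are hypothetical.

```rust
/// Parse the frame number out of a file name such as "00031.jpg".
fn frame_number_from_name(name: &str) -> Option<u64> {
    name.strip_suffix(".jpg")?.parse().ok()
}

/// Timestamp (ms) of an extracted frame, assuming a constant output rate.
/// `frame_number` is 1-based, matching the "%05d.jpg" file names.
fn frame_timestamp_ms(recording_start_timestamp_ms: i64, frame_number: u64, fps: u16) -> i64 {
    assert!(frame_number >= 1 && fps > 0, "frame numbers start at 1 and fps must be positive");
    recording_start_timestamp_ms + (frame_number as i64 - 1) * 1000 / i64::from(fps)
}
```

At 30 FPS, frame 00031.jpg lands exactly one second after the start of the recording. Integer division truncates sub-millisecond remainders, which should be fine at these timescales.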
