Welcome to VideoDataset’s documentation

A GPU-accelerated library that enables random frame access and efficient video decoding for data loading.

[!WARNING] VideoDataset is in the Alpha phase. Frequent changes and instability should be anticipated. Any feedback, comments, suggestions and contributions are welcome!

Overview

VideoDataset is a high-performance video decoding library that supports multiple frameworks. It aims to provide framework-integrated solutions for video decoding tasks.

Key Features:

  • GPU-accelerated video decoding using the NvCodec library

  • Support for common video formats (H.264, H.265, etc.)

  • Easy integration with multiple frameworks and formats

Installation

Prerequisites

  • NVIDIA GPU with CUDA support and CUDA Toolkit 12.0+ installed

  • Python 3.10 or later
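Before installing, it can help to confirm that the machine roughly meets these prerequisites. The following is a minimal sketch using only the Python standard library. It is not part of VideoDataset, and the helper names (`python_ok`, `find_tool`, `check_prerequisites`) are hypothetical. It only checks that the NVIDIA tools are on the PATH, on the assumption that `nvidia-smi` ships with the driver and `nvcc` with the CUDA Toolkit. It does not verify the CUDA Toolkit version.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # minimum version taken from the prerequisites list


def python_ok(version_info=sys.version_info) -> bool:
    """Return True if the interpreter meets the minimum Python version."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def find_tool(name: str):
    """Return the full path of an executable on PATH, or None if it is missing."""
    return shutil.which(name)


def check_prerequisites() -> dict:
    """Build a coarse pass/fail report (assumption: tools on PATH imply an install)."""
    return {
        "python>=3.10": python_ok(),
        "nvidia-smi on PATH": find_tool("nvidia-smi") is not None,
        "nvcc on PATH": find_tool("nvcc") is not None,
    }


if __name__ == "__main__":
    for item, ok in check_prerequisites().items():
        print(f"{item}: {'ok' if ok else 'missing'}")
```

If `nvcc` is found, running `nvcc --version` by hand shows the installed CUDA Toolkit release, which should be 12.0 or later.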

Install from PyPI

pip install agibot-videodataset

Building from Source

pip install git+https://github.com/AgiBot-World/VideoDataset.git

Quick Start

The complete example can be found in the quickstart documentation.

Documentation

Please refer to the full documentation here.

Sphinx-based documentation can also be generated locally by running the following command:

make dev-doc doc

This generates the documentation in the docs/_build/html directory and serves it at http://localhost:8000.

Performance

VideoDataset is optimized for high-throughput video processing. Benchmark results show:

  • GPU Decoding: A decoding throughput of 20,000 FPS is achieved in a multiprocessing scenario.

  • Random Access: Minimal overhead for non-sequential frame access.

  • GPU Decoder Utilization: Over 90% GPU decoder utilization is achieved in a multiprocessing scenario.

See the benchmark documentation for detailed performance analysis.
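To reproduce this kind of measurement on your own data, a generic throughput helper is often enough. The sketch below is not the project's benchmark harness and does not use the VideoDataset API. `measure_fps` is a hypothetical helper, and `fake_decode` is a stand-in for whatever frame-access call you want to time. It times the same set of frame indices in sequential and shuffled order, which is one simple way to estimate random-access overhead.

```python
import random
import time
from typing import Callable, Iterable


def measure_fps(decode: Callable[[int], object], indices: Iterable[int]) -> float:
    """Call decode(i) for every index and return the achieved frames per second."""
    count = 0
    start = time.perf_counter()
    for i in indices:
        decode(i)
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")


def fake_decode(index: int) -> bytes:
    """Placeholder decoder (assumption): swap in a real frame-access call here."""
    return bytes(16)


if __name__ == "__main__":
    sequential = list(range(10_000))
    shuffled = sequential.copy()
    random.Random(0).shuffle(shuffled)  # fixed seed so runs are comparable

    print(f"sequential: {measure_fps(fake_decode, sequential):.0f} FPS")
    print(f"random:     {measure_fps(fake_decode, shuffled):.0f} FPS")
```

With a real decoder, the ratio of the two figures indicates how much non-sequential access costs. A single-process loop like this will not reach the multiprocessing numbers quoted above.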

Comparison with other CPU decoding solutions

In addition, we benchmarked VideoDataset against mainstream CPU software decoding solutions, including OpenCV, Torchvision (PyAV), Torchvision (VideoReader), and TorchCodec (CPU). The results show that VideoDataset achieves 3 to 4 times higher decoding throughput.

CPU Throughput

VideoDataset also substantially reduces CPU utilization compared with these solutions.

CPU Utilization

Development Status

  • [X] GPU acceleration via NvCodec

  • [X] Random frame access

  • [X] PyTorch integration

  • [ ] Compatibility with LeRobot

  • [ ] Asynchronous pipeline optimization

License

MIT License. For more details, see the LICENSE file.

Content

API docs
