# Benchmark

## 1. Overview

This document presents a comprehensive performance benchmark for `VideoDataset`, a high-efficiency video decoding backend. `VideoDataset` is designed to be used by creating a custom dataset class that inherits from `BaseVideoDataset`, enabling efficient video data decoding. The benchmark analyzes this approach across multiple metrics to quantify its performance characteristics.

## 2. Prerequisites

### 2.1 Benchmark Environment

To ensure reproducible and fair results, all tests were conducted in the following fixed environment:

| Component | Specification |
| :--- | :--- |
| **Hardware** | - **CPU:** Intel(R) Xeon(R) Platinum 8468 |
| | - **GPU:** NVIDIA H100 SXM5 80GB |
| | - **GPU Count:** 8 |
| **Software** | - **OS:** Ubuntu 24.04.3 LTS |
| | - **Python:** 3.12.3 |
| | - **PyTorch:** 2.7.0a0+79aa17489c.nv25.4 |
| | - **CUDA:** 12.9 |
| | - **Driver Version:** 560.35.03 |

> Note: The Docker image used for running the benchmark will be released later.

### 2.2 Video Transcoding Preparation

Since the H100 GPU cannot decode AV1 videos, all test videos were pre-transcoded to H.265 (HEVC) format using the following command:

```bash
ffmpeg -i input.mp4 -r 30 -c:v libx265 -crf 24 -g 8 -keyint_min 8 -sc_threshold 0 -vf "setpts=N/(30*TB)" -bf 0 -c:a copy output.mp4
```

> Note: The test data required to run the benchmark has been uploaded to Hugging Face: [AgiBotWorldAdmin/videodataset-benchmark](https://huggingface.co/datasets/AgiBotWorldAdmin/videodataset-benchmark/tree/main)

## 3. Benchmark

### 3.1 Metrics

- **Video Decoding Throughput:**
This metric measures the decoding capability of `VideoDecoder`, expressed in frames per second (FPS), representing the maximum theoretical throughput achievable by the hardware when isolated from dataset operations.

- **Single-GPU Random Access Dataset Throughput:**
This metric evaluates the random access throughput of `BaseVideoDataset` under multi-process loading on a single GPU. It tests how efficiently the dataset infrastructure can serve random samples.

- **DataLoader Throughput:**
This measures the efficiency of PyTorch's `DataLoader` with `BaseVideoDataset` across different `num_workers` configurations on a single GPU. It helps identify the optimal worker count for maximizing data loading performance and reveals bottlenecks in the data loading pipeline.

- **Multi-GPU Data Loading Throughput:**
This metric evaluates how data loading performance scales across multiple GPUs. It's essential for understanding multi-GPU training efficiency and identifying potential scaling limitations.

> Note: Since the video encoding uses a GOP size of 8, the decoder's expected actual decoding workload for each output video frame is equivalent to 4 frames. Therefore, when calculating throughput, the count of effectively decoded frames is multiplied by 4.

### 3.2 Execution

#### 3.2.1 Video Decoding Throughput

You can measure the `Video Decoding Throughput` metric by running the `benchmarks/decoder_benchmark.py` file.

```bash
python benchmarks/decoder_benchmark.py --video-path AgiBotWorldAdmin/videodataset-benchmark/videos/observation.images.top_head/chunk-000/file-000.mp4 --num-processes 4
```

#### **Parameters**

| Parameter | Value | Description |
| :--- | :--- | :--- |
| **`--video-path`** | `AgiBotWorldAdmin/videodataset-benchmark/videos/observation.images.top_head/chunk-000/file-000.mp4` | Video file path |
| **`--max-steps`** | `1000` | Maximum iteration steps |
| **`--warmup-steps`** | `10` | Number of warmup steps before timing |
| **`--num-processes`** | `4` | Number of processes |

#### 3.2.2 Single-GPU Random Access Dataset Throughput

You can measure this metric by running the `benchmarks/dataset_benchmark.py` file.
```bash
python benchmarks/dataset_benchmark.py --repo-id AgiBotWorldAdmin/videodataset-benchmark --num-processes 8
```

#### **Parameters**

| Parameter | Value | Description |
| :--- | :--- | :--- |
| **`--repo-id`** | `AgiBotWorldAdmin/videodataset-benchmark` | Repo of the dataset |
| **`--local-dir`** | `./AgiBotWorldAdmin/videodataset-benchmark` | Local dataset path |
| **`--warmup-steps`** | `10` | Number of warmup steps before timing |
| **`--max-steps`** | `1000` | Maximum iteration steps |
| **`--num-processes`** | `4` | Number of processes |

#### 3.2.3 DataLoader Throughput

You can measure this metric by running the `benchmarks/base_video_dataset.py` file.

```bash
python benchmarks/base_video_dataset.py --repo-id AgiBotWorldAdmin/videodataset-benchmark --num-workers 8 16 32
```

#### **Parameters**

| Parameter | Value | Description |
| :--- | :--- | :--- |
| **`--repo-id`** | `AgiBotWorldAdmin/videodataset-benchmark` | Repo of the dataset |
| **`--local-dir`** | `./AgiBotWorldAdmin/videodataset-benchmark` | Local dataset path |
| **`--num-workers`** | `8` | Number of data loading workers |
| **`--batch-size`** | `16` | Batch size for data loading |
| **`--warmup-steps`** | `10` | Number of warmup steps before timing |
| **`--max-steps`** | `1000` | Maximum iteration steps |
| **`--world-size`** | `1` | Total number of processes in distributed training |

#### 3.2.4 Multi-GPU Data Loading Throughput

You can measure this metric by running the `benchmarks/base_video_dataset.py` file.
```bash
python benchmarks/base_video_dataset.py --repo-id AgiBotWorldAdmin/videodataset-benchmark --num-workers 8 --world-size 2
```

#### **Parameters**

| Parameter | Value | Description |
| :--- | :--- | :--- |
| **`--repo-id`** | `AgiBotWorldAdmin/videodataset-benchmark` | Repo of the dataset |
| **`--local-dir`** | `./AgiBotWorldAdmin/videodataset-benchmark` | Local dataset path |
| **`--num-workers`** | `8` | Number of data loading workers |
| **`--batch-size`** | `16` | Batch size for data loading |
| **`--warmup-steps`** | `10` | Number of warmup steps before timing |
| **`--max-steps`** | `1000` | Maximum iteration steps |
| **`--world-size`** | `1` | Total number of processes in distributed training |

### 3.3 Results

> Note: All the following results were obtained with `MPS` enabled. Ensure `MPS` is enabled before executing the benchmark.

#### 3.3.1 Video Decoding Throughput

We ran the benchmark with the following parameters, varying `--num-processes` across runs:

```bash
python benchmarks/decoder_benchmark.py \
    --video-path AgiBotWorldAdmin/videodataset-benchmark/videos/observation.images.top_head/chunk-000/file-000.mp4 \
    --num-processes 8 \
    --warmup-steps 10 \
    --max-steps 1000
```

The following table shows the results:

| num-processes | throughput (FPS) | GPU Video Decoder Utilization |
| ---------: | ----------: | ----------: |
| 8 | 8249.6676 | >=30% |
| 16 | 15285.96 | >=60% |
| 32 | 22070.7748 | >=90% |

#### 3.3.2 Single-GPU Random Access Dataset Throughput

We ran the benchmark with the following parameters, varying `--num-processes` across runs:

```bash
python benchmarks/dataset_benchmark.py \
    --repo-id AgiBotWorldAdmin/videodataset-benchmark \
    --num-processes 8 \
    --warmup-steps 10 \
    --max-steps 1000
```

The following table shows the results:

| num-processes | throughput (FPS) | GPU Video Decoder Utilization |
| ---------: | ----------: | ----------: |
| 8 | 8286.304 | >=30% |
| 16 | 14999.516 | >=60% |
| 32 | 22010.9956 | >=85% |

#### 3.3.3 DataLoader Throughput

We ran the benchmark with the following parameters, varying `--num-workers` across runs:

```bash
python benchmarks/base_video_dataset.py \
    --repo-id AgiBotWorldAdmin/videodataset-benchmark \
    --num-workers 8 \
    --batch-size 16 \
    --warmup-steps 10 \
    --max-steps 1000 \
    --world-size 1
```

The following table shows the results:

| num_workers | throughput (FPS) | GPU Video Decoder Utilization |
| ---------: | ----------: | ----------: |
| 8 | 8011.246 | >=30% |
| 16 | 14798.5004 | >=60% |
| 32 | 18447.408 | >=80% |

#### 3.3.4 Multi-GPU Data Loading Throughput

We ran the benchmark with the following parameters, varying `--world-size` across runs:

```bash
python benchmarks/base_video_dataset.py \
    --repo-id AgiBotWorldAdmin/videodataset-benchmark \
    --num-workers 8 \
    --batch-size 16 \
    --warmup-steps 10 \
    --max-steps 1000 \
    --world-size 1
```

The following table shows the results:

| world-size | Total throughput (FPS) | Single-GPU throughput (FPS) |
| ---------: | ----------: | ----------: |
| 1 | 8004.196 | 8004.196 |
| 2 | 14232.9596 | 7116.4796 |
| 4 | 25621.792 | 6405.448 |
| 8 | 42172.896 | 5271.612 |
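As a quick sanity check on the multi-GPU table above, the per-GPU throughput and scaling-efficiency figures it implies can be recomputed directly from the "Total throughput" column. The `scaling_efficiency` helper below is illustrative only and not part of the benchmark scripts:

```python
# Scaling efficiency: how close each configuration comes to perfect
# linear scaling from the 1-GPU baseline (1.0 = perfect).
def scaling_efficiency(total_fps: float, world_size: int, baseline_fps: float) -> float:
    return total_fps / (baseline_fps * world_size)

# Total throughput (FPS) per world-size, taken from the results table.
results = {1: 8004.196, 2: 14232.9596, 4: 25621.792, 8: 42172.896}
baseline = results[1]

for world_size, total_fps in results.items():
    per_gpu = total_fps / world_size  # matches the "Single-GPU throughput" column
    eff = scaling_efficiency(total_fps, world_size, baseline)
    print(f"world-size {world_size}: {per_gpu:8.1f} FPS/GPU, {eff:.1%} of linear")
```

At 8 GPUs the pipeline still delivers roughly two-thirds of ideal linear scaling, consistent with the declining "Single-GPU throughput" column in the table.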
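The GOP-based accounting described in the note under Section 3.1 (each output frame costs an expected 4 decoded frames at GOP size 8, so effective counts are multiplied by 4) can be sketched as follows. The `effective_fps` helper is hypothetical, not a function of the benchmark scripts:

```python
# From the note in Section 3.1: with a GOP size of 8, each randomly
# accessed output frame is assumed to cost ~4 decoded frames, so the
# effective decoded-frame throughput multiplies output frames by 4.
GOP_COST_FACTOR = 4

def effective_fps(output_frames: int, elapsed_s: float,
                  factor: int = GOP_COST_FACTOR) -> float:
    """Effective decoded-frame throughput in FPS."""
    return output_frames * factor / elapsed_s

# e.g. 1000 output frames delivered in 0.5 s
print(effective_fps(1000, 0.5))
```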
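The results note requires `MPS` (NVIDIA Multi-Process Service) to be enabled. As a rough sketch for a typical Linux setup, MPS is toggled via the standard control daemon; exact steps depend on your driver version and permissions, so consult the NVIDIA MPS documentation for your system:

```shell
# Start the NVIDIA MPS control daemon (typically requires root).
nvidia-cuda-mps-control -d

# ... run the benchmarks with MPS active ...

# Shut the daemon down when finished.
echo quit | nvidia-cuda-mps-control
```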