# Quickstart

## Prerequisites

- NVIDIA GPU with CUDA support and CUDA Toolkit 12.0+ installed
- FFmpeg installed
- Python 3.10 or later

## Installation

### Install from PyPI

```bash
pip install agibot-videodataset
```

### Install from Source

```bash
pip install git+https://github.com/AgiBot-World/VideoDataset.git
```

> Note: If GitHub is not reachable from your network, set the `GITHUB_PROXY` environment variable to a proxy mirror.

## Data Preparation

There are no specific requirements on how the data is organized; it can follow the LeRobotDataset format or any other custom structure.

### Video Transcoding

To achieve high-performance decoding and precise frame seeking, the videos must be transcoded. Here is an example of video transcoding with FFmpeg:

```bash
ffmpeg -i input.mp4 -r 30 -c:v libx265 -crf 24 -g 8 -keyint_min 8 -sc_threshold 0 -vf "setpts=N/(30*TB)" -bf 0 -c:a copy output.mp4
```

#### Key parameter explanation

- `-g 8`: Sets the keyframe (I-frame) interval to 8 frames
- `-sc_threshold 0`: Disables automatic keyframe insertion at scene changes
- `-vf "setpts=N/(30*TB)"`: Synchronizes the video to a 30 fps timeline
- `-bf 0`: Sets the number of bidirectional frames (B-frames) to 0
- `-c:v libx265`: Selects the H.265/HEVC video encoder (libx265) for compression

> Note: Since `BaseVideoDataset` uses the NVIDIA Video Codec SDK for decoding, make sure the selected video codec is supported by the GPU on your machine. For details, refer to the official NVIDIA documentation: [Video Encode and Decode Support Matrix](https://developer.nvidia.com/video-encode-decode-support-matrix)

## Quickstart with VideoDataset

### Creating a Custom Dataset with BaseVideoDataset

To get started quickly, first install the package. Then you can use the `BaseVideoDataset` mixin class alongside `torch.utils.data.Dataset` to handle your custom data, as long as the `__getitem__` method can correctly determine which videos and which frames to decode.
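Concretely, each `__getitem__` call boils down to two `BaseVideoDataset` methods: `get_decoder`, which initializes a decoder for a given stream and codec, and `decode_video_frame`, which decodes a single frame by index. Here is a minimal sketch of that pattern; the decoder key and video path are illustrative placeholders, not fixed names from the library:

```python
from pathlib import Path

# Minimal sketch of the decode pattern used inside __getitem__.
# "observation.images.cam" and the video path below are placeholders;
# adapt them to your own data layout.
def __getitem__(self, idx: int) -> dict:
    # One decoder per video stream; HEVC matches the transcoding example above.
    decoder = self.get_decoder(decoder_key="observation.images.cam", codec="hevc")
    frame = self.decode_video_frame(
        decoder=decoder,
        video_path=Path("videos/observation.images.cam/chunk-000/file-000.mp4"),
        frame_idx=idx,
    )
    return {"observation.images.cam": frame}
```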
The following complete example demonstrates this pattern using the LeRobotDataset format:

```python
import argparse
import json
import logging
from pathlib import Path

from huggingface_hub import snapshot_download
from torch.utils.data import DataLoader, Dataset

from videodataset.dataset import BaseVideoDataset

logging.basicConfig(
    level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s"
)
logger = logging.getLogger(__name__)


class MyDataset(Dataset, BaseVideoDataset):
    def __init__(
        self,
        root: Path,
    ):
        Dataset.__init__(self)
        BaseVideoDataset.__init__(self)
        self.root = Path(root)

        meta_file = self.root / "meta" / "info.json"
        with meta_file.open() as f:
            self.meta = json.load(f)

        self.total_frames = self.meta.get("total_frames", 0)
        features = self.meta.get("features", {}).keys()
        self.video_keys = [
            key for key in features if key.startswith("observation.images")
        ]

    def __len__(self):
        return self.total_frames

    def __getitem__(self, idx) -> dict:
        data = {}
        for video_key in self.video_keys:
            # Key Point 1: Initialize the decoder, specifying an efficient video codec (e.g., HEVC)
            decoder = self.get_decoder(decoder_key=video_key, codec="hevc")
            video_path = self.root / "videos" / video_key / "chunk-000" / "file-000.mp4"
            # Key Point 2: Decode the specified frame
            frame = self.decode_video_frame(
                decoder=decoder, video_path=video_path, frame_idx=idx
            )
            data[video_key] = frame
        return data


def download_dataset(repo_id: str, local_dir: Path):
    snapshot_download(
        repo_id,
        repo_type="dataset",
        local_dir=local_dir,
    )


def main(repo_id: str, local_dir: Path, batch_size: int, num_workers: int):
    if repo_id:
        download_dataset(repo_id, local_dir)

    dataset = MyDataset(root=local_dir)
    # Key Point 3: Use multiprocessing_context="spawn" when num_workers > 0,
    # so each worker initializes its own CUDA decoder rather than inheriting
    # state from a forked process
    dataloader = DataLoader(
        dataset,
        batch_size=batch_size,
        num_workers=num_workers,
        multiprocessing_context="spawn" if num_workers > 0 else None,
    )

    for epoch in range(2):
        for batch_idx, batch_data in enumerate(dataloader):
            logger.info(f"Epoch {epoch} Batch {batch_idx}: {batch_data}")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="BaseVideoDataset Example")
    parser.add_argument("--repo-id", type=str, default="AgiBotWorldAdmin/videodataset-benchmark", help="Hugging Face repo ID of the dataset")
    parser.add_argument("--local-dir", type=str, default="./AgiBotWorldAdmin/videodataset-benchmark", help="Local path to the dataset")
    parser.add_argument("--batch-size", type=int, default=4, help="Batch size for data loading")
    parser.add_argument("--num-workers", type=int, default=4, help="Number of data loading workers")
    args = parser.parse_args()
    main(**vars(args))
```

For more examples, see the [tests directory](https://github.com/AgiBot-World/VideoDataset/tree/main/tests).
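As a quick check that GPU decoding works on your machine, you can also index the dataset directly; this runs the decoder in the main process and sidesteps multiprocessing entirely. The sketch below assumes the dataset has already been downloaded to the default `--local-dir` path used in the example above:

```python
from pathlib import Path

# Quick sanity check: decode a single sample without a DataLoader.
# The path is the default --local-dir from the example above; change it
# if your dataset lives elsewhere.
dataset = MyDataset(root=Path("./AgiBotWorldAdmin/videodataset-benchmark"))
sample = dataset[0]
for key, frame in sample.items():
    print(key, type(frame))
```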