The Exceptional Trajectories dataset (E.T.)
E.T. key properties
Cinematic Content: Realistic and cinematic camera trajectories extracted from real-world movies, offering a diverse range of visual styles.
Scale: Comprises 115K samples, 11M frames, and 120 hours of footage across 16,210 scenes, making it one of the largest datasets of its kind.
Controllability: Includes camera and character trajectories, as well as camera-only and camera-character captions, providing users with flexibility and personalized search capabilities.
Dataset creation pipeline
Data Extraction and Pre-processing: SLAHMR extracts 3D camera and character poses from each shot. The extracted trajectories are then pre-processed through alignment, filtering, smoothing, and cropping.
Motion Tagging: Each trajectory is partitioned into segments of pure camera motion, such as static, lateral, vertical, and depth movements. The type of motion in each segment is identified by thresholding the camera velocity, which yields coarse motion tags.
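The velocity thresholding described in the Motion Tagging step can be illustrated with a minimal sketch. This is not the dataset's actual implementation. The axis convention (x = lateral, y = vertical, z = depth), the frame rate, the threshold value, the combined tags for multi-axis motion, and the function names tag_frames and segment are all assumptions made for illustration.

```python
import numpy as np

# Assumed axis convention (hypothetical): x = lateral, y = vertical, z = depth.
AXIS_NAMES = ("lateral", "vertical", "depth")


def tag_frames(positions, fps=25.0, threshold=0.05):
    """Return one coarse motion tag per velocity sample.

    positions: (T, 3) array of camera positions.
    fps, threshold: illustrative values, not taken from the dataset.
    """
    positions = np.asarray(positions, dtype=float)
    # Finite-difference velocity, shape (T - 1, 3), in units per second.
    velocity = np.diff(positions, axis=0) * fps
    # An axis counts as moving when its speed exceeds the threshold.
    active = np.abs(velocity) > threshold
    tags = []
    for row in active:
        names = [name for name, moving in zip(AXIS_NAMES, row) if moving]
        # Motion on several axes gets a combined tag (an assumption).
        tags.append("+".join(names) if names else "static")
    return tags


def segment(tags):
    """Group consecutive identical tags into (tag, start, end), end exclusive."""
    segments = []
    start = 0
    for i in range(1, len(tags) + 1):
        if i == len(tags) or tags[i] != tags[start]:
            segments.append((tags[start], start, i))
            start = i
    return segments
```

Calling segment(tag_frames(positions)) on a camera path returns the run-length segments together with their coarse tags. Velocities measured on real footage are noisy, so in practice the threshold would need tuning and the velocity would likely need smoothing first.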
Caption Generation: Rich textual descriptions of camera trajectories are generated by prompting an LLM to describe the camera motion with the main character's trajectory as an anchor. The prompt consists of four parts: context, instruction, constraint, and examples.
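The four-part prompt structure named in the Caption Generation step can be sketched as follows. The wording of each part, the function name build_caption_prompt, and its parameters are hypothetical placeholders and do not reproduce the prompt used to build the dataset. The sketch only assembles the text and makes no LLM call.

```python
def build_caption_prompt(camera_tags, character_tags, examples):
    """Assemble a prompt with context, instruction, constraint, and examples.

    All wording below is a placeholder, not the dataset's actual prompt.
    """
    context = (
        "Context: you describe camera motion in a film shot. "
        f"Camera motion tags: {', '.join(camera_tags)}. "
        f"Main character motion tags: {', '.join(character_tags)}."
    )
    instruction = (
        "Instruction: describe the camera trajectory, using the main "
        "character's trajectory as the reference."
    )
    constraint = "Constraint: answer in one sentence and do not invent motions."
    example_block = "Examples:\n" + "\n".join(f"- {e}" for e in examples)
    # Blank lines separate the four parts so each is easy to spot.
    return "\n\n".join([context, instruction, constraint, example_block])
```

The returned string would be sent to whichever LLM client is in use. Keeping the four parts as separate variables makes it easy to vary one part, for example the constraint, without touching the others.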