🏍️ AKiRa: Augmentation Kit on Rays for optical video generation

1LIX, Ecole Polytechnique, IP Paris, 2Inria, IRISA, CNRS, Univ. Rennes

Abstract

Recent advances in text-conditioned video diffusion have greatly improved video quality. However, these methods offer users limited or no control over camera aspects, including dynamic camera motion, zoom, lens distortion, and focus shifts. These motion and optical aspects are crucial for adding controllability and cinematic elements to generation frameworks, ultimately producing visual content that draws focus, enhances mood, and guides emotions according to the filmmaker's intent. In this paper, we aim to close the gap between controllable video generation and camera optics. To achieve this, we propose AKiRa (Augmentation Kit on Rays), a novel augmentation framework that builds and trains a camera adapter with a complex camera model over an existing video generation backbone. It enables fine-grained control over camera motion as well as complex optical parameters (focal length, distortion, aperture) to achieve cinematic effects such as zoom, fisheye, and bokeh. Extensive experiments demonstrate AKiRa's effectiveness in combining and composing camera optics while outperforming all state-of-the-art methods. This work sets a new landmark in controlled and optically enhanced video generation, paving the way for future camera diffusion methods.

Demo Video

(contains voice-over)

AKiRa

Key Contributions

Optical Video Generation Framework: A framework enabling joint control over camera motion and camera optics, allowing the generation of videos that combine complex optical effects such as zoom, fisheye, and focus shifts.

Camera Model Representation: A novel representation that expresses optical parameters through a Plücker map, extended with an aperture map to model in- and out-of-focus effects.

Augmentation Kit (AKiRa): A joint camera-frame augmentation tool that models optical effects, facilitating the training of more controllable video generation models.
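To make the camera representation above concrete, the sketch below builds a standard per-pixel Plücker ray map from intrinsics and extrinsics, plus hypothetical aperture/focus channels of the kind the aperture map describes. The conventions (world-to-camera extrinsics, channel layout, and the `aperture_map` function itself) are illustrative assumptions, not AKiRa's exact implementation.

```python
import numpy as np

def plucker_map(K, R, t, H, W):
    """Per-pixel Plucker ray embedding (d, m) in world coordinates.

    Assumes world-to-camera extrinsics x_cam = R @ x_world + t, so the
    camera center is o = -R.T @ t. Standard construction; AKiRa's exact
    conventions may differ.
    """
    o = -R.T @ t                                     # camera center, shape (3,)
    u, v = np.meshgrid(np.arange(W) + 0.5, np.arange(H) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)  # homogeneous pixels, (H, W, 3)
    d = pix @ np.linalg.inv(K).T                      # back-project to camera-frame rays
    d = d @ R                                         # rotate to world frame (R.T @ d per pixel)
    d /= np.linalg.norm(d, axis=-1, keepdims=True)    # unit ray directions
    m = np.cross(o, d)                                # Plucker moments m = o x d
    return np.concatenate([d, m], axis=-1)            # (H, W, 6)

def aperture_map(H, W, aperture, focus_dist):
    """Hypothetical per-pixel aperture/focus channels appended to the
    Plucker map to condition on defocus (names are illustrative)."""
    a = np.full((H, W, 1), float(aperture))
    f = np.full((H, W, 1), float(focus_dist))
    return np.concatenate([a, f], axis=-1)            # (H, W, 2)
```

By construction, each pixel's moment is perpendicular to its direction (d · m = 0), the defining Plücker constraint.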

Illustration of the augmentation



AKiRa Augmentation Kit: Visualization of the optical augmentations proposed in our system (zoom, distortion, and bokeh) and their impact on both the camera parameters (top row) and the visual output (bottom row).

Various augmentation effects proposed in AKiRa, applied to the RealEstate10K dataset.

Video Results Gallery

Camera Motion

We show that AKiRa achieves camera motion control closely following the reference video, using AnimateDiff as the backbone.

We show that AKiRa achieves camera motion control closely following the reference video, using SVD as the backbone.

Focal Length - Zoom

We show that AKiRa achieves smooth and consistent focal-length (zoom) control with both the AnimateDiff and SVD backbones.

Lens Distortion

We show that AKiRa achieves camera lens distortion control with both the AnimateDiff and SVD backbones.
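Lens distortion such as the fisheye effect is commonly modeled by a radial polynomial warp of the image coordinates. The sketch below implements the standard Brown-Conrady radial term as an illustration of this kind of warp; AKiRa's actual distortion parameterization may differ.

```python
import numpy as np

def radial_distort(xy, k1, k2=0.0):
    """Brown-Conrady radial distortion of normalized image coordinates.

    xy: array of shape (..., 2), coordinates relative to the principal point.
    The distorted radius is r * (1 + k1*r^2 + k2*r^4); the sign of k1
    selects barrel vs. pincushion distortion. Textbook model, not
    necessarily AKiRa's exact parameterization.
    """
    r2 = (xy ** 2).sum(axis=-1, keepdims=True)    # squared radius per point
    return xy * (1.0 + k1 * r2 + k2 * r2 ** 2)    # scale points radially
```

The principal point (origin) is a fixed point of the warp, and points farther from it are displaced more, which is what produces the characteristic fisheye curvature of straight lines.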

Bokeh - Aperture

We show that AKiRa is able to adjust the aperture and focus point to modify the depth of field.
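The link between aperture, focus distance, and blur can be sketched with the thin-lens circle-of-confusion formula: points away from the focus plane project to a disc whose diameter grows with the aperture. This is the textbook defocus model behind bokeh rendering, given here for intuition; it is not claimed to be AKiRa's exact formulation.

```python
import numpy as np

def coc_diameter(depth, focus_dist, focal_len, f_number):
    """Thin-lens circle-of-confusion diameter on the sensor.

    depth, focus_dist, focal_len in the same units (e.g. meters);
    f_number is the aperture f-stop, so the aperture diameter is
    A = focal_len / f_number. c = A * f * |z - z_f| / (z * (z_f - f)).
    """
    A = focal_len / f_number                       # aperture diameter
    return A * focal_len * np.abs(depth - focus_dist) / (depth * (focus_dist - focal_len))
```

Two properties follow directly: the blur vanishes on the focus plane (depth == focus_dist), and for a fixed scene a smaller f-number (larger aperture) yields a larger blur disc, i.e. a shallower depth of field.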

Comparison small aperture vs. large aperture

Comparison of videos generated with small and large apertures.

Gradually change the aperture value

We show that AKiRa can gradually adjust the aperture within one video (the proposed aperture/bokeh condition map is visualized on the right-hand side).

Shift the focus point

We show that AKiRa can gradually shift the focus point within one video, focusing on content at different depths (the proposed aperture/bokeh condition map is visualized on the right-hand side).

Dolly Zoom

Dolly zoom is an iconic cinematic effect that produces a dramatic perspective shift, achieved by translating the camera toward or away from the subject while zooming in the opposite direction. We show that AKiRa successfully reproduces the dolly zoom effect thanks to its disentanglement of key camera parameters, in particular focal length and camera translation.

BibTeX


@article{wang2024akira,
  title={AKiRa: Augmentation Kit on Rays for optical video generation},
  author={Wang, Xi and Courant, Robin and Christie, Marc and Kalogeiton, Vicky},
  journal={arXiv preprint},
  year={2024}
}