DOLLAR: Few-Step Video Generation via Distillation and Latent Reward Optimization

Ablation Study

Sampling Steps

We investigate the impact of the number of sampling steps used during distillation, testing 1, 2, and 4 steps with the variational score distillation (VSD) method and comparing against 50-step DDIM sampling from the teacher model. The sampled videos are shown below. Models with more inference steps tend to perform better.

A slow cinematic push in on an ostrich standing in a 1980s kitchen.

An astronaut running through an alley in Rio de Janeiro.

FPV moving through a forest to an abandoned house to ocean waves.

An older man playing piano, lit from the side.

A middle-aged sad bald man becomes happy as a wig of curly hair and sunglasses fall suddenly on his head.
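
To make the step counts in this ablation concrete, the sketch below lists which diffusion timesteps a sampler would visit for 1, 2, 4, and 50 steps. It is only an illustration under stated assumptions: the function name `sampling_timesteps`, the 1000-step training schedule, and the evenly spaced "trailing" selection are hypothetical choices and are not taken from the paper.

```python
def sampling_timesteps(num_steps, num_train_timesteps=1000):
    """Return evenly spaced timesteps in descending order.

    Assumptions (not from the source): a 1000-step training schedule and
    "trailing" spacing that always starts from the noisiest timestep.
    """
    if not 1 <= num_steps <= num_train_timesteps:
        raise ValueError("num_steps must be between 1 and num_train_timesteps")
    stride = num_train_timesteps // num_steps
    # Start at the last (noisiest) timestep and walk down by a fixed stride.
    return [num_train_timesteps - 1 - i * stride for i in range(num_steps)]


# Step counts compared in the ablation: few-step students vs. the 50-step teacher.
schedules = {n: sampling_timesteps(n) for n in (1, 2, 4, 50)}

for n, steps in schedules.items():
    preview = steps if n <= 4 else steps[:3] + ["..."] + steps[-1:]
    print(f"{n:>2} step(s): {preview}")
```

Under these assumptions, a 4-step student evaluates the network at only four noise levels, while the 50-step teacher visits fifty, so each student step has to cover a much larger denoising jump.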