
Intermediate Frame Estimation

Filming “true” slow-motion video is difficult and expensive, but it offers a unique new perspective on the world. Because high-speed cameras are so inaccessible, the technique of slowing down footage shot at a normal frame rate is very appealing. By filming at a normal 30 or 60 fps and then applying smart algorithms to the footage, a very convincing effect can be achieved. The effect works by inserting additional frames between the actual frames in the video; these new frames do not exist in the footage, so the algorithm must estimate each one from the two real frames surrounding it in order to achieve smooth motion. One of NVIDIA’s research teams achieved this effect while maintaining the high quality of the original footage, as outlined in their 2018 research paper, “Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation” [8]. Their technique uses deep learning to train a model that can generate many intermediate frames between the real ones.
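To make the idea concrete, here is a minimal sketch of the naive approach: simply cross-fading between the two reference frames. This is not NVIDIA’s method; a crossfade ghosts moving objects rather than moving them, which is exactly the shortcoming that motion-aware interpolation addresses.

```python
import numpy as np

def naive_intermediate_frames(frame0, frame1, n):
    """Generate n intermediate frames between two reference frames by
    simple cross-fading. NOT Super SloMo: a crossfade ghosts moving
    objects instead of moving them, motivating motion-aware methods."""
    f0 = frame0.astype(np.float32)
    f1 = frame1.astype(np.float32)
    for i in range(1, n + 1):
        t = i / (n + 1)  # fractional time of the new frame, 0 < t < 1
        yield ((1 - t) * f0 + t * f1).astype(np.uint8)

# Inserting 3 frames between each real pair lets 30 fps footage play
# back 4x slower while still presenting a new frame every 1/30 s.
```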

NVIDIA’s software takes advantage of optical flow, a technique for estimating an object’s apparent motion from the relationship between the observer and the scene, a concept first introduced by James Gibson in the 1940s [11]. Optical flow works well for simple videos with little motion, but for fast-moving objects or complex scenes it can produce visible artifacts. This is where the deep learning aspect of NVIDIA’s research comes into play: it reduces artifacting and allows many more intermediate frames to be created. NVIDIA had this to say about their technique for minimizing artifacts:
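As an illustration of the classical approach (not NVIDIA’s learned pipeline), the sketch below uses OpenCV’s Farneback dense optical flow to estimate motion between two frames, then warps the first frame halfway along that flow to approximate the frame at t = 0.5. The half-flow approximation used here is exactly the kind of rough estimate that breaks down around fast motion and occlusions.

```python
import cv2
import numpy as np

def flow_warp_midpoint(frame0, frame1):
    """Classical optical-flow interpolation sketch (not NVIDIA's network).
    Estimates dense Farneback flow from frame0 to frame1, then backward-
    warps frame0 by half the flow to approximate the frame at t = 0.5."""
    g0 = cv2.cvtColor(frame0, cv2.COLOR_BGR2GRAY)
    g1 = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(g0, g1, None, 0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = flow.shape[:2]
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    # Approximate the flow from t=0.5 back to t=0 as -0.5 * flow(0->1),
    # then sample frame0 at the displaced coordinates (backward warping).
    map_x = (grid_x - 0.5 * flow[..., 0]).astype(np.float32)
    map_y = (grid_y - 0.5 * flow[..., 1]).astype(np.float32)
    return cv2.remap(frame0, map_x, map_y, cv2.INTER_LINEAR)
```

Around occlusions, pixels visible in only one frame get warped to the wrong place by a method like this, producing the distracting artifacts described above.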

“To address this shortcoming, we employ another UNet to refine the approximated flow and also predict soft visibility maps. Finally, the two input images are warped and linearly fused to form each intermediate frame. By applying the visibility maps to the warped images before fusion, we exclude the contribution of occluded pixels to the interpolated intermediate frame to avoid artifacts. Since none of our learned network parameters are time-dependent, our approach is able to produce as many intermediate frames as needed” [8].
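The fusion step the quote describes can be sketched in a few lines. Assuming the network has already produced the two warped input images and their soft visibility maps (the array names below are hypothetical), the intermediate frame is a visibility-weighted linear blend:

```python
import numpy as np

def fuse_intermediate(warped0, warped1, vis0, vis1, t):
    """Linearly fuse two warped inputs into the frame at time t (0 < t < 1),
    weighting each by its soft visibility map so that pixels occluded in
    one source contribute nothing -- the artifact fix the quote describes.
    warped0, warped1: HxWx3 float images already warped to time t.
    vis0, vis1: HxWx1 soft visibility maps with values in [0, 1]."""
    w0 = (1.0 - t) * vis0          # closer in time to frame 0 => more weight
    w1 = t * vis1
    z = w0 + w1 + 1e-8             # normalizer; epsilon avoids divide-by-zero
    return (w0 * warped0 + w1 * warped1) / z
```

Because t is just a scalar input to this blend, the same fusion applies at any number of time steps, which is why the approach can “produce as many intermediate frames as needed.”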


This technique is a major advantage over plain optical flow, since artifacting can be extremely distracting and ruin a shot. It is not yet clear how computationally intensive the algorithm is, as the network must be trained before it is useful. The technology could benefit industries such as live sports, where broadcasters could slow down instant replays without relying on large, expensive high-speed cameras and the enormous volumes of high-frame-rate video they produce. Integrated into live broadcasting software, it could offer a very user-friendly experience: in theory, footage could be slowed to any desired speed regardless of the frame rate it was shot at, assuming the processing can be done on the fly.

The algorithm could also be useful in consumer electronics, where it could augment phones’ slow-motion camera modes. Rather than adding a “super slow motion” mode such as the 960 fps capture offered by some phones, this feature could take footage filmed at a more manageable 120 or 240 fps and slow it down significantly, as the arithmetic below shows. This would require much less specialized hardware in the phone and could save precious storage space. This feature was suggested by another classmate after reviewing our project demo video, and we think it is a fantastic idea to include in our report.
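The required interpolation factor is simple arithmetic; a hypothetical helper, using the frame rates mentioned above:

```python
def frames_to_insert(shot_fps: int, target_fps: int) -> int:
    """Number of intermediate frames to synthesize between each pair of
    real frames so footage shot at shot_fps matches target_fps capture."""
    assert target_fps % shot_fps == 0, "target must be a multiple of shot fps"
    return target_fps // shot_fps - 1

# 240 fps footage needs 3 synthesized frames per real pair to match a
# 960 fps capture: 240 * (3 + 1) = 960. At 120 fps it would need 7.
print(frames_to_insert(240, 960))  # -> 3
print(frames_to_insert(120, 960))  # -> 7
```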

