Apple researchers develop NeuMan: a new computer vision framework that can generate a neural human radiance field from a single video

Neural Radiance Fields (NeRF) greatly improved the quality of novel view synthesis. NeRF was first proposed as a way to reconstruct a static scene from a set of posed photographs, but it was quickly extended to dynamic and uncalibrated scenes. With the help of large controlled datasets, recent work has focused on animating human radiance field models, extending radiance-field-based modeling toward augmented reality experiences. In this study, the researchers focus on the case where only a single video is given. They aim to reconstruct both the human and the static scene, and to render the person in novel poses, without expensive multi-camera setups or manual annotations.

Neural Actor can render novel human poses, but it requires multiple videos. Even with the most recent improvements in NeRF techniques, working from a single video is far from a simple task: NeRF models typically must be trained with many cameras, constant lighting and exposure, clean backgrounds, and precise human geometry. According to the table below, HyperNeRF models a dynamic scene from a single video but cannot be controlled by human poses. ST-NeRF uses multiple cameras to reconstruct each person with a time-dependent NeRF model, but editing is limited to modifying the bounding box. HumanNeRF builds a human model from a single video with carefully annotated masks; however, it does not demonstrate generalization to novel poses.

Source: https://arxiv.org/pdf/2203.12575v1.pdf

With a model trained on a single video, Vid2Actor can produce novel human poses, but it cannot model the environment. The researchers address these problems with NeuMan, a framework that reconstructs both the person and the scene from a single in-the-wild video and can render novel human poses and novel viewpoints. NeuMan trains separate human and scene NeRF models, which makes the high-quality pose-driven rendering of Figure 1 possible. From a video captured by a moving camera, they first estimate camera poses, a sparse scene model, depth maps, human pose, human shape, and human masks.

Two NeRF models are then trained, one for the human and one for the scene, both guided by segmentation masks computed by Mask-RCNN. They also use depth estimates from multi-view reconstruction and monocular depth regression to regularize the scene NeRF model. The human NeRF model is trained in a pose-independent canonical volume defined with a statistical human shape and pose model (SMPL). To better serve training, they refine the SMPL estimates produced by ROMP. These refined estimates are still not perfect, so they jointly optimize the SMPL estimates and the human NeRF model end to end.
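To make the idea of a pose-independent canonical volume more concrete, here is a minimal 2D toy sketch in Python. It is not the authors' implementation. It assumes that a point observed in the posed frame is mapped back to canonical space by undoing the rigid transform of the nearest posed body vertex, which stands in for the SMPL model's real skinning. The names `rigid_inverse`, `warp_to_canonical`, `posed_vertices`, and `vertex_transforms` are hypothetical.

```python
import math

def rigid_inverse(point, angle, translation):
    """Undo a 2D rigid transform x_obs = R(angle) * x_can + translation."""
    px = point[0] - translation[0]
    py = point[1] - translation[1]
    c, s = math.cos(angle), math.sin(angle)
    # Apply the transpose (inverse) of the rotation matrix [[c, -s], [s, c]].
    return (c * px + s * py, -s * px + c * py)

def warp_to_canonical(point, posed_vertices, vertex_transforms):
    """Map an observed point to canonical space (toy assumption:
    reuse the rigid transform of the nearest posed vertex)."""
    nearest = min(
        range(len(posed_vertices)),
        key=lambda i: math.dist(point, posed_vertices[i]),
    )
    angle, translation = vertex_transforms[nearest]
    return rigid_inverse(point, angle, translation)

# Hypothetical toy body: two vertices, the first rotated by 90 degrees.
posed_vertices = [(0.0, 1.0), (-1.0, 0.0)]
vertex_transforms = [(math.pi / 2, (0.0, 0.0)), (0.0, (0.0, 0.0))]
canonical_point = warp_to_canonical((0.1, 1.0), posed_vertices, vertex_transforms)
```

Because every frame is warped into the same canonical space, a single static human model can be queried no matter what pose the person takes in a given frame. This is also why inaccurate pose estimates hurt training: a wrong transform sends the query to the wrong canonical location.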

Additionally, because their static canonical human NeRF cannot represent dynamics that are not captured by the SMPL model, they build an error-correction network to compensate. During training, the SMPL estimates and the error-correction network are optimized jointly. In summary, the researchers propose a framework for neural rendering of a human and a scene from a single video without any additional devices or annotations; they demonstrate that their method enables high-quality rendering of the human in novel poses and from novel viewpoints together with the scene; they introduce end-to-end SMPL optimization and an error-correction network to enable training with inaccurate estimates of human geometry; and finally, their method allows the composition of the human and the scene.
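One way to picture the composition of the human and the scene is the following minimal Python sketch. It is a sketch under stated assumptions, not the paper's code. It assumes each model returns samples along a camera ray as (depth, density, color) tuples, merges the two sets by depth, and applies the standard NeRF volume rendering rule. The function name `composite_ray` and the sample values are hypothetical.

```python
import math

def composite_ray(samples):
    """Volume-render one ray from (depth, density, rgb) samples.

    The samples may come from two different radiance fields
    (here, hypothetically, a human model and a scene model).
    """
    ordered = sorted(samples, key=lambda s: s[0])
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed
    for i, (t, sigma, rgb) in enumerate(ordered):
        # Interval to the next sample; the last sample gets a very large one.
        delta = ordered[i + 1][0] - t if i + 1 < len(ordered) else 1e10
        alpha = 1.0 - math.exp(-sigma * delta)
        weight = transmittance * alpha
        for k in range(3):
            color[k] += weight * rgb[k]
        transmittance *= 1.0 - alpha
    return color, transmittance

# Hypothetical samples: an opaque red person in front of an opaque blue wall.
human_samples = [(1.0, 1e6, (1.0, 0.0, 0.0))]
scene_samples = [(3.0, 1e6, (0.0, 0.0, 1.0))]
merged_color, _ = composite_ray(human_samples + scene_samples)
scene_only_color, _ = composite_ray(scene_samples)
```

In this toy setup the merged ray is dominated by the nearer human sample, while the scene-only ray shows the wall. Sorting by depth is what lets either model occlude the other.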

The code implementation of this research paper is freely available on Apple’s GitHub.

This article is a research summary written by Marktechpost Staff based on the research paper 'NeuMan: Neural Human Radiance Field from a Single Video'. All credit for this research goes to the researchers on this project. Check out the paper and GitHub link.




