Zhifeng Xie, Jiaheng Zheng, Ji Wang, Jiajia Liang, Lizhuang Ma. Speech-driven Facial Reenactment based on Implicit Neural Representations with Structured Latent Codes[J]. Journal of Computer-Aided Design & Computer Graphics.

Speech-driven Facial Reenactment based on Implicit Neural Representations with Structured Latent Codes

  • Speech-driven facial reenactment aims to generate high-fidelity facial animation that matches the input speech content. However, existing methods struggle to achieve high-quality facial reenactment because of the gap between the audio and video modalities. To address the low fidelity and poor lip synchronization of existing methods, we propose a speech-driven facial reenactment method based on implicit neural representations with structured latent codes. The method takes the point cloud sequence of the human face as an intermediate representation and decomposes speech-driven facial reenactment into two tasks: cross-modal mapping and neural radiance field rendering. First, we predict the facial expression coefficients through cross-modal mapping and obtain the facial identity coefficients by 3D face reconstruction; then, we synthesize the face point cloud sequence based on 3DMM; next, we use the vertex positions to construct the structured implicit neural representation and regress density and color for each sampling point; finally, we render RGB frames of the face through volume rendering and composite them back into the original images. Experimental results on multiple individual speech videos of 3 to 5 minutes each, including visual comparison, quantitative evaluation, and subjective assessment, demonstrate that our method outperforms state-of-the-art methods such as AD-NeRF in lip-sync accuracy and image generation precision, achieving high-fidelity speech-driven facial reenactment.
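
The final rendering step can be illustrated with the standard discrete volume rendering rule used by neural radiance fields. The sketch below is a minimal illustration under assumptions, not the authors' implementation: the function name `composite_ray` and its arguments are hypothetical, and the densities and colors are supplied by hand, whereas the method in the abstract regresses them with a network. It composites per-sample density and color along a single camera ray into one RGB value.

```python
import math


def composite_ray(densities, colors, deltas):
    """Composite samples along one ray (hypothetical helper, not from the paper).

    densities: non-negative volume density at each sample
    colors:    (r, g, b) tuple at each sample
    deltas:    distance between each sample and the next
    Returns (rgb, accumulated_opacity).
    """
    transmittance = 1.0  # fraction of light that still reaches the current sample
    rgb = [0.0, 0.0, 0.0]
    opacity = 0.0
    for sigma, color, delta in zip(densities, colors, deltas):
        # Opacity contributed by this segment of the ray.
        alpha = 1.0 - math.exp(-sigma * delta)
        weight = transmittance * alpha
        for k in range(3):
            rgb[k] += weight * color[k]
        opacity += weight
        # Light remaining after passing through this segment.
        transmittance *= 1.0 - alpha
    return tuple(rgb), opacity
```

The accumulated opacity could serve as a blending weight when the rendered face is placed back into the original frame; the abstract does not specify how that assembly is performed, so this is only one plausible use.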
