SynGauss: Real-Time 3D Gaussian Splatting for Audio-Driven Talking Head Synthesis

Citation Author(s):
Zhanyi Zhou
Submitted by:
Zhanyi Zhou
Last updated:
Wed, 01/15/2025 - 08:54
DOI:
10.21227/ap5w-4059

Abstract 

We used a mixed dataset \cite{ye2023geneface} in our experiments: part of the data comes from the publicly available dataset provided by TalkingGaussian \cite{li2025talkinggaussian}, and the rest was collected by ourselves. Specifically, we selected four high-definition talking video clips from the publicly available dataset, including two male portraits, "Macron" and "Obama", and one female portrait, "May". These clips are centered on the subject, with an average length of 6500 frames at a frame rate of 25 FPS. The videos for "May" and "Macron" were cropped and resized to $512\times512$ resolution, while the video for "Obama" was resized to $450\times450$ resolution, for compatibility with the model's input requirements.
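As a quick sanity check on the figures above, the average clip length in frames can be converted to a duration. The sketch below is only illustrative arithmetic on the stated numbers (6500 frames, 25 FPS); the function names are hypothetical and not part of any released code.

```python
def clip_duration_seconds(num_frames: int, fps: float) -> float:
    """Return the playback duration of a clip in seconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return num_frames / fps


def format_mm_ss(seconds: float) -> str:
    """Format a duration in seconds as M:SS, rounded to the nearest second."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}:{secs:02d}"


# Average public clip as stated above: 6500 frames at 25 FPS.
average_seconds = clip_duration_seconds(6500, 25)
print(average_seconds)                # 260.0
print(format_mm_ss(average_seconds))  # 4:20
```

So the public clips average roughly four minutes and twenty seconds each.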

In addition, we collected two high-definition video clips, featuring one male portrait, "Kanghui", and one female portrait, "Lizimeng". These clips were recorded at 25 FPS with a duration of 5 minutes and were cropped and resized to $512\times512$ resolution. The self-collected data increases the diversity of the experiments, covering different genders and speaking styles. Combining publicly available and self-collected data both expands the scale of the experimental data and makes the model evaluation more comprehensive.
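The crop-and-resize step described above could be reproduced along the following lines. This is a minimal sketch, not the authors' pipeline: the description does not say which tool was used, so the use of ffmpeg, the file names, and the crop size and offsets are all assumptions chosen for illustration. The code only builds the command; it does not run it.

```python
from typing import List


def build_crop_resize_command(
    src: str,
    dst: str,
    crop_size: int,
    crop_x: int,
    crop_y: int,
    out_size: int = 512,
    fps: int = 25,
) -> List[str]:
    """Build an ffmpeg command that crops a square region and rescales it.

    crop_size, crop_x and crop_y are hypothetical values; in practice they
    would be chosen per video so that the subject stays centered.
    """
    video_filter = (
        f"crop={crop_size}:{crop_size}:{crop_x}:{crop_y},"
        f"scale={out_size}:{out_size}"
    )
    # -vf applies the filter chain; -r sets the output frame rate.
    return ["ffmpeg", "-i", src, "-vf", video_filter, "-r", str(fps), dst]


def expected_frame_count(duration_seconds: float, fps: float) -> int:
    """Number of frames in a clip of the given duration and frame rate."""
    return round(duration_seconds * fps)


# Hypothetical input: a 1280x720 recording, cropped to its central square.
command = build_crop_resize_command(
    "raw_clip.mp4", "clip_512.mp4", crop_size=720, crop_x=280, crop_y=0
)
print(" ".join(command))

# A 5-minute clip at 25 FPS, assuming the stated duration applies per clip.
print(expected_frame_count(5 * 60, 25))  # 7500
```

The resulting list can be passed to `subprocess.run(command, check=True)` once the crop parameters have been set for a real video.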

Dataset Files

    Files have not been uploaded for this dataset