AGORA
Adversarial Generation Of Real-time Animatable 3D Gaussian Head Avatars

¹MBZUAI, Abu Dhabi, UAE; ²Polynome AI, Dubai, UAE; ³MTS AI, Moscow, Russia

Abstract

The generation of high-fidelity, animatable 3D human avatars remains a core challenge in computer graphics and vision, with applications in VR, telepresence, and entertainment. Existing approaches based on implicit representations such as NeRFs suffer from slow rendering and dynamic inconsistencies, while 3D Gaussian Splatting (3DGS) methods are typically limited to static head generation and lack dynamic control.

We bridge this gap by introducing AGORA, a novel framework that extends 3DGS within a generative adversarial network to produce animatable avatars. Our key contribution is a lightweight, FLAME-conditioned deformation branch that predicts per-Gaussian residuals, enabling identity-preserving, fine-grained expression control while allowing real-time inference. Expression fidelity is enforced via a dual-discriminator training scheme leveraging synthetic renderings of the parametric mesh.

AGORA generates avatars that are not only visually realistic but also precisely controllable. Quantitatively, we outperform state-of-the-art NeRF-based methods on expression accuracy while rendering at 250+ FPS on a single GPU and, notably, at ~9 FPS with CPU-only inference, which is, to our knowledge, the first demonstration of practical CPU-only animatable 3DGS avatar synthesis. These results mark a significant step toward practical, high-performance digital humans.

Method Overview

AGORA adopts a 3D GAN framework to learn an animatable 3D head generation model from 2D image datasets. The architecture consists of two main components: a dual-branch generator and a dual-discrimination scheme.

Dual-Branch Architecture

  • Identity-specific branch: A StyleGAN-based generator G produces canonical 3DGS attributes (position, scale, rotation, color, opacity) in UV space, conditioned on the FLAME shape code β. This encodes the static identity of the avatar.
  • Expression-specific branch: A lightweight deformation branch Gd takes low-resolution feature maps from the main generator and is conditioned on the FLAME expression ψ and jaw pose θ via style modulation. It predicts attribute residuals that are composed with the canonical attributes (a code sketch follows this list).
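
A minimal PyTorch-style sketch of this residual pathway is given below. It is an illustration under assumed interfaces, not the paper's implementation: the module name DeformationBranch, the channel counts, the 100-dim expression code, the 3-dim jaw pose, and the simplified per-channel style modulation are all hypothetical choices. The 14 attribute channels stand for position (3), scale (3), rotation quaternion (4), color (3), and opacity (1).

    import torch
    import torch.nn as nn

    class DeformationBranch(nn.Module):
        """Illustrative stand-in for the expression-specific branch Gd:
        maps low-resolution generator features, modulated by the FLAME
        condition (expression psi, jaw pose theta), to per-Gaussian
        attribute residuals laid out in UV space."""

        def __init__(self, feat_ch=64, expr_dim=100, jaw_dim=3, attr_ch=14):
            super().__init__()
            # Simplified style modulation: one linear layer produces a
            # per-channel scale/shift from the FLAME condition (a stand-in
            # for StyleGAN-style modulated convolutions).
            self.to_style = nn.Linear(expr_dim + jaw_dim, 2 * feat_ch)
            self.head = nn.Conv2d(feat_ch, attr_ch, kernel_size=3, padding=1)

        def forward(self, feats, expression, jaw_pose):
            style = self.to_style(torch.cat([expression, jaw_pose], dim=-1))
            scale, shift = style.chunk(2, dim=-1)
            feats = feats * (1 + scale[..., None, None]) + shift[..., None, None]
            return self.head(feats)  # (B, attr_ch, H, W) residual maps

    # Residuals are composed with (here: added to) the canonical attributes
    # predicted by the identity-specific branch before rasterization.
    branch = DeformationBranch()
    residual = branch(torch.randn(2, 64, 128, 128),   # generator features
                      torch.randn(2, 100),            # FLAME expression psi
                      torch.randn(2, 3))              # jaw pose theta
    canonical = torch.randn(2, 14, 128, 128)          # identity branch output
    animated = canonical + residual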

The final 3D position of each Gaussian is obtained via 3D lifting: we interpolate a base position from the articulated FLAME mesh at UV coordinates and add the predicted offset. This design anchors the generated 3DGS to the underlying parametric mesh, providing a structured basis for animation.
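
A short sketch of this lifting step, assuming a precomputed UV-to-mesh mapping in which each UV sample is assigned a FLAME face index and barycentric coordinates; the function name and argument layout are hypothetical:

    import torch

    def lift_gaussians(flame_verts, faces, uv_face_idx, uv_bary, offsets):
        """Interpolate a base position on the articulated FLAME mesh for
        each UV sample and add the generator-predicted offset.

        flame_verts: (B, V, 3) posed FLAME vertices
        faces:       (F, 3)    triangle vertex indices
        uv_face_idx: (N,)      face index assigned to each UV sample
        uv_bary:     (N, 3)    barycentric coords of each sample in its face
        offsets:     (B, N, 3) predicted per-Gaussian position residuals
        """
        tri = flame_verts[:, faces[uv_face_idx]]             # (B, N, 3, 3)
        base = (tri * uv_bary[None, :, :, None]).sum(dim=2)  # (B, N, 3)
        return base + offsets

Because the base position is taken from the posed mesh, the Gaussians inherit FLAME's articulation directly, and the network only needs to model residual detail on top of it.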

Dual-Discrimination

To enforce expression consistency, we adopt a dual-discrimination scheme. The discriminator is conditioned on the target expression by concatenating the rendered image with a synthetic rendering of the FLAME mesh, where vertices are color-coded by their expression-isolated vertex displacement from the neutral pose. This allows the discriminator to penalize fine-grained deviations in expression.
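
A sketch of how such a condition map could be built, assuming a FLAME layer and a mesh renderer with the (hypothetical) interfaces named below:

    import torch

    def expression_condition(flame_model, shape, expression, jaw_pose, renderer, cam):
        """Color-code FLAME vertices by the displacement that the expression
        and jaw pose alone induce relative to the neutral pose, then render
        the mesh. `flame_model(shape, expression, jaw_pose) -> (B, V, 3)` and
        `renderer(verts, vertex_colors, cam) -> (B, 3, H, W)` are assumed
        interfaces, not the paper's code."""
        neutral = flame_model(shape, torch.zeros_like(expression),
                              torch.zeros_like(jaw_pose))
        posed = flame_model(shape, expression, jaw_pose)
        disp = posed - neutral                    # expression-isolated displacement
        # Map per-axis displacement into [0, 1] for use as vertex colors.
        colors = disp / (disp.abs().amax(dim=(1, 2), keepdim=True) + 1e-8) * 0.5 + 0.5
        return renderer(posed, colors, cam)

    # The discriminator then sees a 6-channel input:
    # torch.cat([generated_rgb, expression_condition(...)], dim=1)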

Generated Avatars

Generated avatars (seeds 0-32) reenacted by the driving video on the left.

Single Image Avatars

Avatars generated from single images, driven by the video on the left.

BibTeX

@article{fazylov2025agora,
    author = {Fazylov, Ramazan and Zagoruyko, Sergey and Parkin, Aleksandr and Lefkimmiatis, Stamatis and Laptev, Ivan},
    title = {{AGORA: Adversarial Generation Of Real-time Animatable 3D Gaussian Head Avatars}},
    journal = {arXiv preprint (arXiv ID pending)},
    year = {2025}
}