AGORA
Adversarial Generation Of Real-time Animatable 3D Gaussian Head Avatars

¹MBZUAI, Abu Dhabi, UAE; ²Polynome AI, Dubai, UAE; ³MTS AI, Moscow, Russia

Abstract

The generation of high-fidelity, animatable 3D human avatars remains a core challenge in computer graphics and vision, with applications in VR, telepresence, and entertainment. Existing approaches based on implicit representations such as NeRFs suffer from slow rendering and dynamic inconsistencies, while 3D Gaussian Splatting (3DGS) methods are typically limited to static head generation and lack dynamic control.

We bridge this gap by introducing AGORA, a novel framework that extends 3DGS within a generative adversarial network to produce animatable avatars. Our key contribution is a lightweight, FLAME-conditioned deformation branch that predicts per-Gaussian residuals, enabling identity-preserving, fine-grained expression control while allowing real-time inference. Expression fidelity is enforced via a dual-discriminator training scheme leveraging synthetic renderings of the parametric mesh.

AGORA generates avatars that are not only visually realistic but also precisely controllable. Quantitatively, we outperform state-of-the-art NeRF-based methods on expression accuracy while rendering at 250+ FPS on a single GPU and, notably, at ~9 FPS with CPU-only inference, which is, to our knowledge, the first demonstration of practical CPU-only animatable 3DGS avatar synthesis. These results mark a significant step toward practical, high-performance digital humans.

Method Overview

AGORA adopts a 3D GAN framework to learn an animatable 3D head generation model from 2D image datasets. The architecture consists of two main components: a dual-branch generator and a dual-discrimination scheme.

Dual-Branch Architecture

  • Identity-specific branch: A StyleGAN-based generator G produces canonical 3DGS attributes (position, scale, rotation, color, opacity) in UV space, conditioned on the FLAME shape code β. This encodes the static identity of the avatar.
  • Expression-specific branch: A lightweight deformation branch Gd takes low-resolution feature maps from the main generator and is conditioned on the FLAME expression ψ and jaw pose θ via style modulation. It predicts attribute residuals that are composed with the canonical attributes (a code sketch follows this list).
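
A minimal PyTorch-style sketch of this residual pathway is given below. It is an illustration under assumed interfaces, not the paper's implementation: the module name DeformationBranch, the channel counts, the 100-dim expression code, the 3-dim jaw pose, and the simplified per-channel style modulation are all hypothetical choices. The 14 attribute channels stand for position (3), scale (3), rotation quaternion (4), color (3), and opacity (1).

    import torch
    import torch.nn as nn

    class DeformationBranch(nn.Module):
        """Illustrative stand-in for the expression-specific branch Gd:
        maps low-resolution generator features, modulated by the FLAME
        condition (expression psi, jaw pose theta), to per-Gaussian
        attribute residuals laid out in UV space."""

        def __init__(self, feat_ch=64, expr_dim=100, jaw_dim=3, attr_ch=14):
            super().__init__()
            # Simplified style modulation: one linear layer produces a
            # per-channel scale/shift from the FLAME condition (a stand-in
            # for StyleGAN-style modulated convolutions).
            self.to_style = nn.Linear(expr_dim + jaw_dim, 2 * feat_ch)
            self.head = nn.Conv2d(feat_ch, attr_ch, kernel_size=3, padding=1)

        def forward(self, feats, expression, jaw_pose):
            style = self.to_style(torch.cat([expression, jaw_pose], dim=-1))
            scale, shift = style.chunk(2, dim=-1)
            feats = feats * (1 + scale[..., None, None]) + shift[..., None, None]
            return self.head(feats)  # (B, attr_ch, H, W) residual maps

    # Residuals are composed with (here: added to) the canonical attributes
    # predicted by the identity-specific branch before rasterization.
    branch = DeformationBranch()
    residual = branch(torch.randn(2, 64, 128, 128),   # generator features
                      torch.randn(2, 100),            # FLAME expression psi
                      torch.randn(2, 3))              # jaw pose theta
    canonical = torch.randn(2, 14, 128, 128)          # identity branch output
    animated = canonical + residual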

The final 3D position of each Gaussian is obtained via 3D lifting: we interpolate a base position from the articulated FLAME mesh at UV coordinates and add the predicted offset. This design anchors the generated 3DGS to the underlying parametric mesh, providing a structured basis for animation.
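
A short sketch of this lifting step, assuming a precomputed UV-to-mesh mapping in which each UV sample is assigned a FLAME face index and barycentric coordinates; the function name and argument layout are hypothetical:

    import torch

    def lift_gaussians(flame_verts, faces, uv_face_idx, uv_bary, offsets):
        """Interpolate a base position on the articulated FLAME mesh for
        each UV sample and add the generator-predicted offset.

        flame_verts: (B, V, 3) posed FLAME vertices
        faces:       (F, 3)    triangle vertex indices
        uv_face_idx: (N,)      face index assigned to each UV sample
        uv_bary:     (N, 3)    barycentric coords of each sample in its face
        offsets:     (B, N, 3) predicted per-Gaussian position residuals
        """
        tri = flame_verts[:, faces[uv_face_idx]]             # (B, N, 3, 3)
        base = (tri * uv_bary[None, :, :, None]).sum(dim=2)  # (B, N, 3)
        return base + offsets

Because the base position is taken from the posed mesh, the Gaussians inherit FLAME's articulation directly, and the network only needs to model residual detail on top of it.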

Dual-Discrimination

To enforce expression consistency, we adopt a dual-discrimination scheme. The discriminator is conditioned on the target expression by concatenating the rendered image with a synthetic rendering of the FLAME mesh, where vertices are color-coded by their expression-isolated vertex displacement from the neutral pose. This allows the discriminator to penalize fine-grained deviations in expression.
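
A sketch of how such a condition map could be built, assuming a FLAME layer and a mesh renderer with the (hypothetical) interfaces named below:

    import torch

    def expression_condition(flame_model, shape, expression, jaw_pose, renderer, cam):
        """Color-code FLAME vertices by the displacement that the expression
        and jaw pose alone induce relative to the neutral pose, then render
        the mesh. `flame_model(shape, expression, jaw_pose) -> (B, V, 3)` and
        `renderer(verts, vertex_colors, cam) -> (B, 3, H, W)` are assumed
        interfaces, not the paper's code."""
        neutral = flame_model(shape, torch.zeros_like(expression),
                              torch.zeros_like(jaw_pose))
        posed = flame_model(shape, expression, jaw_pose)
        disp = posed - neutral                    # expression-isolated displacement
        # Map per-axis displacement into [0, 1] for use as vertex colors.
        colors = disp / (disp.abs().amax(dim=(1, 2), keepdim=True) + 1e-8) * 0.5 + 0.5
        return renderer(posed, colors, cam)

    # The discriminator then sees a 6-channel input:
    # torch.cat([generated_rgb, expression_condition(...)], dim=1)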

Generated Avatars

Generated avatars (seeds 0-32) reenacted by the driving video on the left.

Single Image Avatars

Avatars generated from single images, driven by the video on the left.

BibTeX

@article{fazylov2025agora,
    author = {Fazylov, Ramazan and Zagoruyko, Sergey and Parkin, Aleksandr and Lefkimmiatis, Stamatis and Laptev, Ivan},
    title = {{AGORA: Adversarial Generation Of Real-time Animatable 3D Gaussian Head Avatars}},
    journal = {arXiv preprint (arXiv ID pending)},
    year = {2025}
}