Tencent’s HunyuanWorld-Voyager can generate a spatially consistent 3D scene from a single photo, without relying on traditional 3D modeling pipelines. The system combines RGB and depth data with a memory-efficient “world cache” to produce video sequences that reflect user-defined camera movement.
With Voyager, users upload a photo and specify a camera path through the scene. Voyager then generates a continuous video simulating the camera’s motion, aiming to simplify the creation of virtual 3D environments without extensive modeling or technical setup.
At its core is joint RGB and depth (RGB-D) video generation: every generated frame comes with an aligned depth map. The depth channel lets Voyager estimate distances in the scene and avoid common errors when objects are viewed from unusual angles.
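To make the role of depth concrete: per-pixel metric depth lets any frame be lifted into 3D and reprojected into a new viewpoint, so geometry stays anchored as the camera moves. The NumPy sketch below illustrates only the underlying pinhole math; the intrinsics and pose are generic placeholders, not Voyager's actual pipeline.

```python
import numpy as np

def unproject(depth, K):
    """Lift a depth map (H, W) into 3D camera-space points (H, W, 3)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - K[0, 2]) * depth / K[0, 0]   # pinhole model: X = (u - cx) * Z / fx
    y = (v - K[1, 2]) * depth / K[1, 1]
    return np.stack([x, y, depth], axis=-1)

def reproject(points, K, R, t):
    """Project 3D points into a second camera given rotation R and translation t."""
    cam = points.reshape(-1, 3) @ R.T + t  # move points into the new camera frame
    uv = cam @ K.T                         # apply intrinsics
    return uv[:, :2] / uv[:, 2:3]          # perspective divide -> pixel coordinates
```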
Memory for 3D worlds
Voyager’s “world cache” stores previously seen and newly generated regions of the scene, updating as the camera moves. When hidden parts of the environment come back into view, the system restores them from the cache instead of regenerating them from scratch. Redundant data is pruned to keep memory use in check, which keeps long camera paths stable and geometrically consistent.
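Tencent doesn't spell out the cache internals here, but the described behavior maps onto a familiar pattern: a spatial store that merges new observations, drops near-duplicates, and can be queried to restore regions that re-enter view. A simplified Python sketch under those assumptions (all names hypothetical, not Voyager's implementation):

```python
import numpy as np

class WorldCache:
    """Toy world cache: stores 3D points with colors, prunes near-duplicates.
    A simplified illustration, not Voyager's actual data structure."""

    def __init__(self, voxel_size=0.05):
        self.voxel_size = voxel_size      # grid cell size for duplicate detection
        self.points = {}                  # voxel index -> (xyz, rgb)

    def _key(self, p):
        return tuple(np.floor(p / self.voxel_size).astype(int))

    def insert(self, xyz, rgb):
        """Merge new observations; points falling in occupied voxels are dropped."""
        for p, c in zip(xyz, rgb):
            k = self._key(p)
            if k not in self.points:      # keep first observation per voxel
                self.points[k] = (p, c)

    def query(self, center, radius):
        """Return cached points near the camera, e.g. to restore occluded regions."""
        return [(p, c) for p, c in self.points.values()
                if np.linalg.norm(p - center) < radius]
```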
Tencent trained Voyager on a large dataset of real videos and Unreal Engine scenes, each labeled with estimated camera poses and metric depth. This approach helped the model learn how cameras move through real spaces and how objects look from different angles.
Benchmark performance and direct 3D output
Tencent says Voyager scored well across multiple categories on the WorldScore benchmark, including camera control and spatial consistency. A practical benefit of generating RGB and depth video together is that the system can output 3D reconstructions directly, such as point clouds or 3D Gaussian splats, with less need for post-processing.
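Since every generated frame carries aligned depth, and the camera path is user-specified and therefore known, assembling a point cloud is mostly per-frame back-projection plus accumulation. A minimal sketch, assuming known intrinsics K and camera-to-world poses (both illustrative assumptions, not Voyager's published interface):

```python
import numpy as np

def fuse_point_cloud(frames, K):
    """Accumulate per-frame RGB-D back-projections into one world-space cloud.

    frames: list of (rgb (H,W,3), depth (H,W), c2w (4,4) camera-to-world pose).
    Voyager predicts depth jointly with RGB, so no separate depth-estimation
    pass is needed before this step.
    """
    all_xyz, all_rgb = [], []
    for rgb, depth, c2w in frames:
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        valid = depth > 0                     # skip pixels without depth
        z = depth[valid]
        x = (u[valid] - K[0, 2]) * z / K[0, 0]
        y = (v[valid] - K[1, 2]) * z / K[1, 1]
        cam = np.stack([x, y, z, np.ones_like(z)], axis=-1)
        world = (cam @ c2w.T)[:, :3]          # camera frame -> world frame
        all_xyz.append(world)
        all_rgb.append(rgb[valid])
    return np.concatenate(all_xyz), np.concatenate(all_rgb)
```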
Tencent reports that Voyager can also reconstruct 3D objects from single images, estimate depth in video, and transfer styles while preserving geometric structure. The code and inference weights are publicly available; Tencent lists 60 GB of GPU memory as the minimum for 540p output.
Building on HunyuanWorld 1.0
Voyager is designed to complement HunyuanWorld 1.0. HunyuanWorld 1.0 focused on semantic, layered 3D mesh representations with mesh export and interactivity, but struggled with exploration range and occluded areas. Voyager addresses both issues with RGB-depth coupling and the world cache, making longer, more consistent camera paths possible. The two systems are meant to work together: HunyuanWorld 1.0 is the better fit for exporting meshes, while Voyager focuses on stable video and 3D scene generation. HunyuanWorld 1.0 has been available in a “Lite” version since August; Voyager is now being released.
Competing systems target different use cases
Other systems take different approaches. Google’s Genie 3 targets interactive worlds where users trigger “world events” via text. Google says scene consistency lasts a few minutes, but access is currently limited to a research preview.
Mirage 2 from Dynamics Lab also offers browser-based interactive demos with keyboard and text input. While these systems focus on live gameplay, interactivity, and robot training, Voyager is aimed at video production and 3D content pipelines.