Diffusion for World Modeling: Visual Details Matter in Atari

Alonso, Eloi; Jelley, Adam; Micheli, Vincent; Kanervisto, Anssi; Storkey, Amos; Pearce, Tim; Fleuret, François

arXiv:2405.12399 (cs)

[Submitted on 20 May 2024 (v1), last revised 30 Oct 2024 (this version, v2)]

Abstract: World models constitute a promising approach for training reinforcement learning agents in a safe and sample-efficient manner. Recent world models predominantly operate on sequences of discrete latent variables to model environment dynamics. However, this compression into a compact discrete representation may ignore visual details that are important for reinforcement learning. Concurrently, diffusion models have become a dominant approach for image generation, challenging well-established methods modeling discrete latents. Motivated by this paradigm shift, we introduce DIAMOND (DIffusion As a Model Of eNvironment Dreams), a reinforcement learning agent trained in a diffusion world model. We analyze the key design choices that are required to make diffusion suitable for world modeling, and demonstrate how improved visual details can lead to improved agent performance. DIAMOND achieves a mean human normalized score of 1.46 on the competitive Atari 100k benchmark, a new best for agents trained entirely within a world model. We further demonstrate that DIAMOND's diffusion world model can stand alone as an interactive neural game engine by training on static Counter-Strike: Global Offensive gameplay. To foster future research on diffusion for world modeling, we release our code, agents, videos and playable world models at this https URL.
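For readers unfamiliar with the metric quoted in the abstract, the human normalized score for a single game is conventionally computed as (agent score - random score) / (human score - random score), and the reported figure is the mean over all games in the benchmark. The sketch below illustrates that arithmetic only. The function name, the game names, and every score in it are hypothetical placeholders chosen to make the calculation easy to follow; they are not results from the paper.

```python
from statistics import mean


def human_normalized_score(agent: float, random: float, human: float) -> float:
    """Return the human normalized score for one game.

    0.0 corresponds to random play and 1.0 to the human reference score.
    """
    return (agent - random) / (human - random)


# Hypothetical per-game raw scores (illustrative only, not from the paper).
scores = {
    "game_a": {"agent": 30.0, "random": 10.0, "human": 20.0},
    "game_b": {"agent": 55.0, "random": 5.0, "human": 105.0},
}

# Normalize each game, then average across games.
per_game = {name: human_normalized_score(**s) for name, s in scores.items()}
mean_hns = mean(per_game.values())

print(per_game)
print(mean_hns)
```

Under this convention, a mean above 1 indicates that the agent exceeds the human reference on average, although a few games with very high normalized scores can dominate the mean.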

Comments:	NeurIPS 2024 (Spotlight)
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2405.12399 [cs.LG]
	(or arXiv:2405.12399v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2405.12399

Submission history

From: Adam Jelley
[v1] Mon, 20 May 2024 22:51:05 UTC (2,822 KB)
[v2] Wed, 30 Oct 2024 14:34:49 UTC (3,054 KB)
