Magma: A foundation model for multimodal AI agents
Episode 5

This talk introduces Magma, a new multimodal agentic foundation model designed for UI navigation in digital environments and robotic manipulation in physical settings. It covers two new techniques for action grounding and planning, Set-of-Mark and Trace-of-Mark, and details the unified pretraining pipeline through which the model learns its agentic capabilities.
Speakers (1)
Session Code: E5E