Magma: A foundation model for multimodal AI agents

Episode 5

Talk

Details

This talk introduces Magma, a new multimodal agentic foundation model designed for UI navigation in digital environments and robotic manipulation in physical settings. It covers two new techniques for action grounding and planning, Set-of-Mark and Trace-of-Mark, and details the unified pretraining pipeline through which the model learns agentic capabilities.

Speakers (1)
Jianwei Yang
Principal Researcher, Microsoft Research Redmond

Session Code: E5E