Google DeepMind

Research Scientist, World Models

London, UK; Toronto, Canada | Full Time

Snapshot

Join an ambitious team building generative world models. We believe World Models will power numerous domains, such as visual reasoning, simulation, planning for embodied agents, and real-time interactive entertainment. The team will contribute directly to building the next generation of generative world models, working with the Genie team and collaborating closely with others, such as Robotics.

About Us

Artificial Intelligence could be one of humanity’s most useful inventions. At Google DeepMind, we’re a team of scientists, engineers, machine learning experts and more, working together to advance the state of the art in artificial intelligence. We use our technologies for widespread public benefit and scientific discovery, and collaborate with others on critical challenges, ensuring safety and ethics are the highest priority.

The Role

Implement core infrastructure and conduct research to build generative models of the physical world. Solve essential problems to train world simulators at massive scale, develop metrics and scaling laws for physical intelligence, curate and annotate training data, enable real-time interactive generation, and explore new possibilities for impact with the next generation of models. Embrace the bitter lesson and seek simple methods that survive the test of scale, with emphasis on strong systems and infrastructure.

Areas of focus:

  • Infrastructure for large-scale video data pipelines and annotation.
  • Inference optimization and distillation for real-time generation.
  • Scaling law science for video pretraining.
  • Next generation forms of interactivity.
  • Methods for long-term memory in world models.
  • Model research that unlocks additional scaling and improved capabilities.

About You

To set you up for success as a Research Scientist at Google DeepMind, we look for the following skills and experience:

  • Experience with large-scale transformer models and/or large-scale data pipelines.
  • PhD in computer science or machine learning, or equivalent industry experience.
  • Track record of releases, publications, and/or open source projects relating to video generation, world models, multimodal language models, or transformer architectures.
  • Strong systems and engineering skills in deep learning frameworks like JAX or PyTorch.

In addition, the following would be an advantage:

  • Experience building training codebases for large-scale video or multimodal transformers.
  • Expertise optimizing efficiency of distributed training systems and/or inference systems.
  • Experience with distillation of diffusion models.

Competitive salary applies.