MIT Mechanistic Interpretability Conference 2023

Schedule

Saturday May 6

0800: Breakfast
0900: Overview, LLM MI

Max Tegmark (MIT): Welcome (video)
Dylan Hadfield-Menell: interpretability overview (video)
Chris Olah: The SOTA of LLM Mechanistic Interpretability (video)

Coffee break

Mor Geva (Google DeepMind): Interpreting LLMs in embedding space (video)
David Bau (Northeastern): how LLMs remember facts (video)

Panel with Chris Olah, David Bau & Mor Geva, moderated by Neel Nanda: “What do & don’t we understand about LLMs?”
Lightning intros
1200: Lunch
1300: More LLM MI

Jacob Andreas/Evan Hernandez: How LLMs model people’s beliefs (video)
Ekin Akyürek: How LLMs can do linear regression at runtime (video)
Eric Michaud (MIT): Understanding LLM scaling in terms of computational quanta (video)
János Kramar: compiling any algorithm into a transformer (video)

1420: Group photo
1430: Poster Session
1530: MI beyond LLM

Tony Wang (MIT): how a human beat AlphaGo (video)
Ellie Pavlick (Brown): Neural network subroutines (video)
Ziming Liu (MIT): MI of knowledge representations, symmetry & modularity (video)
Sharon Li (Wisconsin): How unique are knowledge representations? (video)
Buck Schlegeris (Redwood): Formalism for thinking about MI (video)
Martin Wattenberg: Learned world models and what they’re good for (video)

Panel with Ila Fiete (MIT), Tommy Poggio (MIT), Gabriel Kreiman (Harvard): MI inspiration from neuroscience, physics & math (video)
1800-2100: Dinner Cruise, scintillating conversation

Sunday May 7

0800: Breakfast
0900: Morning session: MI for AI safety

Panel with Viktoriya Krakovna (Google DeepMind/FLI), Connor Leahy (Conjecture), Sharon Li (Wisconsin), Anthony Aguirre (FLI): AGI Safety (video)
Neel Nanda (Google DeepMind): How MI can help AI safety (video)
Connor Leahy (Conjecture): MI for AGI safety (video)

Coffee break

Steve Omohundro: Provably safe AGI (video)
Silviu Marian Udrescu (MIT): Symbolic regression (video)
Marin Soljacic (MIT): Symbolic regression & applications

1200: Lunch
1300: Lightning talks (video)
1400-1800: Project incubation

Neel Nanda II: Whirlwind Tour of MI open problems (video)
Panel with Neel Nanda (Google DeepMind), Steve Omohundro & Martin Wattenberg (MIT), moderated by Chris Olah (Anthropic): promising MI research directions (video)
Group leaders looking for collaborators stand up and introduce themselves, lightning-talk style

Coffee break
1515: Project incubator unconference, block I: Break out across tables in the atrium, with one MI research direction per table. In parallel, Wes Gurnee (MIT) & Neel Nanda run an MI tutorial hackathon in Singleton Auditorium for anyone who wants to get their feet wet.
1615: Project incubator unconference, block II
1715: Report-back from breakouts, closing remarks (video)
1800: Conference dinner, mingling, scintillating conversation

Photos

Panel shot of Chris Olah, Mor Geva, etc.