I'm a PhD candidate at UCL, advised by Ricardo Silva. I care about AI safety because I find myself simultaneously exhilarated and alarmed by the development of broadly capable AI systems. My previous research has spanned mechanistic interpretability and adversarial robustness; I am now interested in the safety of agentic LLM applications, such as computer use.

Reach me at aenguslynch at gmail dot com and @aengus_lynch1.

Research

Best-of-N Jailbreaking (2024)

John Hughes*, Sara Price*, Aengus Lynch*, Rylan Schaeffer, Fazl Barez, Sanmi Koyejo, Henry Sleight, Erik Jones, Ethan Perez, Mrinank Sharma

Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs (2024)

Abhay Sheshadri*, Aidan Ewart*, Phillip Guo*, Aengus Lynch*, Cindy Wu*, Vivek Hebbar*, Henry Sleight, Asa Cooper Stickland, Ethan Perez, Dylan Hadfield-Menell, Stephen Casper

Analyzing the generalization and reliability of steering vectors (2024)

Daniel Tan, David Chanin, Aengus Lynch, Adrià Garriga-Alonso, Brooks Paige, Dimitrios Kanoulas, Robert Kirk

Eight methods to evaluate robust unlearning in LLMs (2024)

Aengus Lynch*, Phillip Guo*, Aidan Ewart*, Stephen Casper, Dylan Hadfield-Menell

Towards automated circuit discovery for mechanistic interpretability (2023)

Arthur Conmy, Augustine N. Mavor-Parker, Aengus Lynch, Stefan Heimersheim, Adrià Garriga-Alonso

Spotlight at NeurIPS 2023

Spawrious: A benchmark for fine control of spurious correlation biases (2023)

Aengus Lynch*, Gbètondji J-S Dovonon*, Jean Kaddour*, Ricardo Silva

Causal machine learning: A survey and open problems (2022)

Jean Kaddour*, Aengus Lynch*, Qi Liu, Matt J. Kusner, Ricardo Silva