
I'm a PhD candidate at UCL, advised by Ricardo Silva. I care about AI safety because I find myself simultaneously exhilarated and alarmed by the development of broadly capable AI systems. My previous research has ranged from mechanistic interpretability to adversarial robustness. Now, I am interested in the safety of agentic LLM applications, such as computer use.
Reach me at aenguslynch at gmail dot com and @aengus_lynch1.
Research
(* denotes equal contribution)
Best-of-N Jailbreaking (2024)
John Hughes*, Sara Price*, Aengus Lynch*, Rylan Schaeffer, Fazl Barez, Sanmi Koyejo, Henry Sleight, Erik Jones, Ethan Perez, Mrinank Sharma
Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs (2024)
Abhay Sheshadri*, Aidan Ewart*, Phillip Guo*, Aengus Lynch*, Cindy Wu*, Vivek Hebbar*, Henry Sleight, Asa Cooper Stickland, Ethan Perez, Dylan Hadfield-Menell, Stephen Casper
Analyzing the Generalization and Reliability of Steering Vectors (2024)
Daniel Tan, David Chanin, Aengus Lynch, Adrià Garriga-Alonso, Brooks Paige, Dimitrios Kanoulas, Robert Kirk
Eight Methods to Evaluate Robust Unlearning in LLMs (2024)
Aengus Lynch*, Phillip Guo*, Aidan Ewart*, Stephen Casper, Dylan Hadfield-Menell
Towards Automated Circuit Discovery for Mechanistic Interpretability (2023)
Arthur Conmy, Augustine N. Mavor-Parker, Aengus Lynch, Stefan Heimersheim, Adrià Garriga-Alonso
Spotlight at NeurIPS 2023
Spawrious: A Benchmark for Fine Control of Spurious Correlation Biases (2023)
Aengus Lynch*, Gbètondji J-S Dovonon*, Jean Kaddour*, Ricardo Silva
Causal Machine Learning: A Survey and Open Problems (2022)
Jean Kaddour*, Aengus Lynch*, Qi Liu, Matt J. Kusner, Ricardo Silva