Peter Anderson

News

NEW! (August 2025) We are hiring a Senior Research Scientist in my team at BAM.

Publications

Greenback Bears and Fiscal Hawks: Finance is a Jungle and Text Embeddings Must Adapt

Peter Anderson, Mano Vikash Janardhanan, Jason He, Wei Cheng, Charlie Flanagan

In Conference on Empirical Methods in Natural Language Processing: Industry Track (EMNLP), 2024.

PDF

Prompt expansion for adaptive text-to-image generation

Siddhartha Datta, Alexander Ku, Deepak Ramachandran, Peter Anderson

In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL), 2024.

PDF

Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-Image Generation

Jaemin Cho, Yushi Hu, Roopal Garg, Peter Anderson, Ranjay Krishna, Jason Baldridge, Mohit Bansal, Jordi Pont-Tuset, Su Wang

In International Conference on Learning Representations (ICLR), 2024.

PDF Project Code

Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting

Su Wang*, Chitwan Saharia*, Ceslee Montgomery*, Jordi Pont-Tuset, Shai Noy, Stefano Pellegrini, Yasumasa Onoe, Sarah Laszlo, David J. Fleet, Radu Soricut, Jason Baldridge, Mohammad Norouzi†, Peter Anderson†, William Chan†

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023. Highlight Presentation [Top 2.5%]

PDF Project EditBench (379MB)

A New Path: Scaling Vision-and-Language Navigation with Synthetic Instructions and Imitation Learning

Aishwarya Kamath*, Peter Anderson*, Su Wang, Jing Yu Koh, Alexander Ku, Austin Waters, Yinfei Yang, Jason Baldridge, Zarana Parekh

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023.

PDF Video Dataset

Iterative Vision-and-Language Navigation

Jacob Krantz*, Shurjo Banerjee*, Wang Zhu, Jason Corso, Peter Anderson, Stefan Lee, Jesse Thomason

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023.

PDF Project

Simple and Effective Synthesis of Indoor 3D Scenes

Jing Yu Koh*, Harsh Agrawal*, Dhruv Batra, Richard Tucker, Austin Waters, Honglak Lee, Yinfei Yang, Jason Baldridge, Peter Anderson

In AAAI Conference on Artificial Intelligence (AAAI), 2023.

PDF Video Code

Less is More: Generating Grounded Navigation Instructions from Landmarks

Su Wang, Ceslee Montgomery, Jordi Orbay, Vighnesh Birodkar, Aleksandra Faust, Izzeddin Gur, Natasha Jaques, Austin Waters, Jason Baldridge, Peter Anderson

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022.

PDF Dataset

Pathdreamer: A World Model for Indoor Navigation

Jing Yu Koh, Honglak Lee, Yinfei Yang, Jason Baldridge, Peter Anderson

In International Conference on Computer Vision (ICCV), 2021.

Blog Project PDF Code Demo Video

PanGEA: The Panoramic Graph Environment Annotation Toolkit

Alex Ku, Peter Anderson, Jordi Pont-Tuset, Jason Baldridge

In 2nd Workshop on Advances in Language and Vision Research (ALVR) at NAACL, 2021.

PDF Code

On the Evaluation of Vision-and-Language Navigation Instructions

Ming Zhao, Peter Anderson, Vihan Jain, Su Wang, Alex Ku, Jason Baldridge, Eugene Ie

In Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2021.

PDF Poster Slides

Sim-to-Real Transfer for Vision-and-Language Navigation

Peter Anderson, Ayush Shrivastava, Joanne Truong, Arjun Majumdar, Devi Parikh, Dhruv Batra, Stefan Lee

In Conference on Robot Learning (CoRL), 2020.

PDF Code Video

Room-Across-Room: Multilingual Vision-and-Language Navigation with Dense Spatiotemporal Grounding

Alex Ku*, Peter Anderson*, Roma Patel, Eugene Ie, Jason Baldridge

In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020.

Blog PDF Dataset Leaderboard PanGEA Code

Where Are You? Localization from Embodied Dialog

Meera Hahn, Jacob Krantz, Dhruv Batra, Devi Parikh, James Rehg, Stefan Lee, Peter Anderson

In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020.

Project PDF Code Video

Spatially Aware Multimodal Transformers for TextVQA

Yash Kant, Dhruv Batra, Peter Anderson, Alex Schwing, Devi Parikh, Jiasen Lu, Harsh Agrawal

In Proceedings of the European Conference on Computer Vision (ECCV), 2020.

Project PDF Code Video Slides

Improving Vision-and-Language Navigation with Image-Text Pairs from the Web

Arjun Majumdar, Ayush Shrivastava, Stefan Lee, Peter Anderson, Devi Parikh, Dhruv Batra

In Proceedings of the European Conference on Computer Vision (ECCV), 2020. Spotlight Presentation [Top 5%]

PDF Video

REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments

Yuankai Qi, Qi Wu, Peter Anderson, Xin Wang, William Yang Wang, Chunhua Shen, Anton van den Hengel

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020. Oral Presentation [Top 5.7%]

PDF Code Leaderboard

Chasing Ghosts: Instruction Following as Bayesian State Tracking

Peter Anderson*, Ayush Shrivastava*, Devi Parikh, Dhruv Batra, Stefan Lee

In Advances in Neural Information Processing Systems (NeurIPS), 2019.

PDF Video Poster Code

nocaps: novel object captioning at scale

Harsh Agrawal*, Karan Desai*, Yufei Wang, Xinlei Chen, Rishabh Jain, Mark Johnson, Dhruv Batra, Devi Parikh, Stefan Lee, Peter Anderson

In International Conference on Computer Vision (ICCV), 2019.

Project PDF Code

Audio-Visual Scene-Aware Dialog

Huda Alamri, Vincent Cartillier, Abhishek Das, Jue Wang, Anoop Cherian, Irfan Essa, Dhruv Batra, Tim K. Marks, Chiori Hori, Peter Anderson, Stefan Lee, Devi Parikh

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.

Project PDF

Visual Landmark Selection for Generating Grounded and Interpretable Navigation Instructions

Sanyam Agarwal, Devi Parikh, Dhruv Batra, Peter Anderson, Stefan Lee

In CVPR Workshop on Deep Learning for Semantic Visual Navigation, 2019.

PDF

Trainable Decoding of Sets of Sequences for Neural Sequence Models

Ashwin Kalyan, Peter Anderson, Stefan Lee, Dhruv Batra

In International Conference on Machine Learning (ICML), 2019.

PDF Code

Partially-Supervised Image Captioning

Peter Anderson, Stephen Gould, Mark Johnson

In Advances in Neural Information Processing Systems (NeurIPS), 2018.

PDF Poster

Disfluency Detection using Auto-Correlational Neural Networks

Paria Jamshid Lou, Peter Anderson, Mark Johnson

In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2018.

PDF

On Evaluation of Embodied Navigation Agents

Peter Anderson, Angel Chang, Devendra Singh Chaplot, Alexey Dosovitskiy, Saurabh Gupta, Vladlen Koltun, Jana Kosecka, Jitendra Malik, Roozbeh Mottaghi, Manolis Savva, Amir R. Zamir

arXiv preprint 1807.06757, 2018.

PDF

Predicting accuracy on large datasets from smaller pilot data

Mark Johnson, Peter Anderson, Mark Dras, Mark Steedman

In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL), 2018. Oral Presentation [Top 4.6%]

PDF

Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments

Peter Anderson, Qi Wu, Damien Teney, Jake Bruce, Mark Johnson, Niko Sünderhauf, Ian Reid, Stephen Gould, Anton van den Hengel

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018. Spotlight Presentation [Top 8.9%]

Project PDF Code Poster Leaderboard

Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

Peter Anderson, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen Gould, Lei Zhang

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018. Oral Presentation [Top 2.1%]

Project PDF Features Code Captioning Code Poster Slides

Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge

Damien Teney, Peter Anderson, Xiaodong He, Anton van den Hengel

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.

PDF Slides

Guided Open Vocabulary Image Captioning with Constrained Beam Search

Peter Anderson, Basura Fernando, Mark Johnson and Stephen Gould

In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2017.

PDF

SPICE: Semantic Propositional Image Caption Evaluation

Peter Anderson, Basura Fernando, Mark Johnson and Stephen Gould

In Proceedings of the European Conference on Computer Vision (ECCV), 2016.

Project PDF Code Poster Slides

Discriminative Hierarchical Rank Pooling for Activity Recognition

Basura Fernando, Peter Anderson, Marcus Hutter and Stephen Gould

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

PDF Code

PhD Thesis

Vision and Language Learning: From Image Captioning and Visual Question Answering towards Embodied Agents

Peter Anderson

PhD Thesis, Australian National University, 2018.

Archive PDF

Undergrad

An ICP Inspired Inverse Sensor Model with Unknown Data Association

Peter Anderson, Youssef Hunter and Bernhard Hengst

In IEEE International Conference on Robotics and Automation (ICRA), 2013.

PDF

Fast Monocular Visual Compass for a Computationally Limited Robot

Peter Anderson and Bernhard Hengst

In Proceedings of the RoboCup International Symposium (RoboCup), 2013. Oral Presentation

PDF

RoboCup Standard Platform League - rUNSWift 2012 Innovations

Sean Harris, Peter Anderson, Belinda Teh, Youssef Hunter, Roger Liu, Bernhard Hengst, Ritwik Roy, Sam Li, Carl Chatfield

In Australasian Conference on Robotics and Automation (ACRA), 2012.

PDF

Robot Localisation Using Natural Landmarks

Peter Anderson, Yongki Yusmanthia, Bernhard Hengst and Arcot Sowmya

In Proceedings of the RoboCup International Symposium (RoboCup), 2012. Oral Presentation, Best Paper Finalist

PDF

Selected Talks

Synthetic Data for Language-Guided Agents. AI2 Embodied AI Lecture Series invited talk.

Massive Datasets for Language-Guided Navigation Agents and Where to Find Them. CVPR 2021 Embodied AI workshop invited talk.

Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments. CVPR 2018 spotlight oral.

Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering. CVPR 2018 oral presentation.

Vision and language: attention, navigation, and making it work ‘in the wild’. Presented at the VQA Challenge and Visual Dialog Workshop at CVPR 2018.

Bio

I am an AI researcher specializing in multimodal machine learning. Most of my research has been at the intersection of natural language processing and computer vision, developing large-scale generative models that produce language, images, and/or actions. During my career I have moved between the tech industry (Google, Microsoft) and the finance industry (Balyasny, Credit Suisse, Goldman Sachs). My current role, developing AI models for investing, combines both of these aspects of my career. I completed my PhD in Computer Science at the Australian National University in 2018, where I was advised by Stephen Gould. Before that I was a sell-side securities analyst with Credit Suisse in Sydney. I have the (fairly rare) distinction of winning two university medals, in Finance (from the University of Sydney) and Computer Engineering (from the University of New South Wales).

2024 -

Head of Research, Applied AI at Balyasny Asset Management

Developing transformative AI models for investing.

2020 - 2023

Senior Research Scientist, Google

Research into multimodal generative AI, from generating synthetic instructions for training robots to text-to-image generation and editing, e.g. Imagen Editor, Pathdreamer.

2018 - 2020

Research Scientist, Georgia Tech

Machine learning research including image captioning, visual question answering (VQA) and language-guided localization and navigation for embodied agents. Collaborating with Dhruv Batra and Devi Parikh.

2015 - 2018

PhD (Computer Science), Australian National University

Machine learning combining computer vision and NLP. Proposed bottom-up and top-down attention for vision-and-language models, the SPICE metric, and Vision-and-Language Navigation (VLN). Affiliated with the Australian Centre for Robotic Vision.

2014 - 2015

Engineer, Sabre Autonomous Solutions

Advancing an autonomous grit-blasting robot from university prototype to commercial product.

2013 - 2014

Flounder, FrameFish

Developing and commercializing virtual try-on technology for glasses and sunglasses.

2009 - 2012

BEng (Computer), University of NSW

1st class honours and university medal.

2005 - 2009

Securities Analyst, Credit Suisse

Sell-side analyst covering Australian financials including commercial banks and investment banks. Top 3 rated team.

2000 - 2004

BComm (Finance & Economics), University of Sydney

1st class honours and university medal.

Other Projects

FrameFish

FrameFish was an eyewear virtual try-on system I developed for ecommerce websites. At the time it was much faster than competing systems, taking around 1 second to generate a virtual try-on image of a person wearing a selected pair of glasses or sunglasses (versus ~10 seconds for other web-based systems in 2013). FrameFish received an Innovate NSW grant and was featured on Sky Business News and in the UNSW Young Entrepreneurs series.
