News

NEW! (March 2024) Super excited to be starting a new role developing AI for investing at Balyasny Asset Management!

NEW! (October 2023) Congrats to Jaemin on releasing an exciting new T2I evaluation!

NEW! (August 2023) After an amazing journey at Google, I am leaving to take a creative pause.

NEW! (March 2023) Our Imagen Editor/EditBench paper was selected as a CVPR highlight!

NEW! (February 2023) Three papers accepted to CVPR 2023.

NEW! (February 2023) Serving as Area Chair for ICML 2023.


Publications [Google Scholar]

Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-Image Generation

Jaemin Cho, Yushi Hu, Roopal Garg, Peter Anderson, Ranjay Krishna, Jason Baldridge, Mohit Bansal, Jordi Pont-Tuset, Su Wang

In International Conference on Learning Representations (ICLR), 2024

PDF Project Code

Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting

Su Wang*, Chitwan Saharia*, Ceslee Montgomery*, Jordi Pont-Tuset, Shai Noy, Stefano Pellegrini, Yasumasa Onoe, Sarah Laszlo, David J. Fleet, Radu Soricut, Jason Baldridge, Mohammad Norouzi†, Peter Anderson†, William Chan†

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023. Highlight Presentation [Top 2.5%]

PDF Project EditBench (379MB)

A New Path: Scaling Vision-and-Language Navigation with Synthetic Instructions and Imitation Learning

Aishwarya Kamath*, Peter Anderson*, Su Wang, Jing Yu Koh, Alexander Ku, Austin Waters, Yinfei Yang, Jason Baldridge, Zarana Parekh

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023.

PDF Video Dataset

Iterative Vision-and-Language Navigation

Jacob Krantz*, Shurjo Banerjee*, Wang Zhu, Jason Corso, Peter Anderson, Stefan Lee, Jesse Thomason

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023.

PDF Project

Simple and Effective Synthesis of Indoor 3D Scenes

Jing Yu Koh*, Harsh Agrawal*, Dhruv Batra, Richard Tucker, Austin Waters, Honglak Lee, Yinfei Yang, Jason Baldridge, Peter Anderson

In AAAI Conference on Artificial Intelligence (AAAI), 2023.

PDF Video Code

Less is More: Generating Grounded Navigation Instructions from Landmarks

Su Wang, Ceslee Montgomery, Jordi Orbay, Vighnesh Birodkar, Aleksandra Faust, Izzeddin Gur, Natasha Jaques, Austin Waters, Jason Baldridge, Peter Anderson

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022.

PDF Dataset

Pathdreamer: A World Model for Indoor Navigation

Jing Yu Koh, Honglak Lee, Yinfei Yang, Jason Baldridge, Peter Anderson

In International Conference on Computer Vision (ICCV), 2021.

Blog Project PDF Code Demo Video

PanGEA: The Panoramic Graph Environment Annotation Toolkit

Alex Ku, Peter Anderson, Jordi Pont-Tuset, Jason Baldridge

In 2nd Workshop on Advances in Language and Vision Research (ALVR) at NAACL, 2021.

PDF Code

On the Evaluation of Vision-and-Language Navigation Instructions

Ming Zhao, Peter Anderson, Vihan Jain, Su Wang, Alex Ku, Jason Baldridge, Eugene Ie

In Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2021.

PDF Poster Slides

Sim-to-Real Transfer for Vision-and-Language Navigation

Peter Anderson, Ayush Shrivastava, Joanne Truong, Arjun Majumdar, Devi Parikh, Dhruv Batra, Stefan Lee

In Conference on Robot Learning (CoRL), 2020.

PDF Code Video

Room-Across-Room: Multilingual Vision-and-Language Navigation with Dense Spatiotemporal Grounding

Alex Ku*, Peter Anderson*, Roma Patel, Eugene Ie, Jason Baldridge

In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020.

Blog PDF Dataset Leaderboard PanGEA Code

Where Are You? Localization from Embodied Dialog

Meera Hahn, Jacob Krantz, Dhruv Batra, Devi Parikh, James Rehg, Stefan Lee, Peter Anderson

In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020.

Project PDF Code Video

Spatially Aware Multimodal Transformers for TextVQA

Yash Kant, Dhruv Batra, Peter Anderson, Alex Schwing, Devi Parikh, Jiasen Lu, Harsh Agrawal

In Proceedings of the European Conference on Computer Vision (ECCV), 2020.

Project PDF Code Video Slides

Improving Vision-and-Language Navigation with Image-Text Pairs from the Web

Arjun Majumdar, Ayush Shrivastava, Stefan Lee, Peter Anderson, Devi Parikh, Dhruv Batra

In Proceedings of the European Conference on Computer Vision (ECCV), 2020. Spotlight Presentation [Top 5%]

PDF Video

REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments

Yuankai Qi, Qi Wu, Peter Anderson, Xin Wang, William Yang Wang, Chunhua Shen, Anton van den Hengel

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020. Oral Presentation [Top 5.7%]

PDF Code Leaderboard

Chasing Ghosts: Instruction Following as Bayesian State Tracking

Peter Anderson*, Ayush Shrivastava*, Devi Parikh, Dhruv Batra, Stefan Lee

In Advances in Neural Information Processing Systems (NeurIPS), 2019.

PDF Video Poster Code

nocaps: novel object captioning at scale

Harsh Agrawal*, Karan Desai*, Yufei Wang, Xinlei Chen, Rishabh Jain, Mark Johnson, Dhruv Batra, Devi Parikh, Stefan Lee, Peter Anderson

In International Conference on Computer Vision (ICCV), 2019.

Project PDF Code

Audio-Visual Scene-Aware Dialog

Huda Alamri, Vincent Cartillier, Abhishek Das, Jue Wang, Anoop Cherian, Irfan Essa, Dhruv Batra, Tim K. Marks, Chiori Hori, Peter Anderson, Stefan Lee, Devi Parikh

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.

Project PDF

Visual Landmark Selection for Generating Grounded and Interpretable Navigation Instructions

Sanyam Agarwal, Devi Parikh, Dhruv Batra, Peter Anderson, Stefan Lee

In CVPR Workshop on Deep Learning for Semantic Visual Navigation, 2019.

PDF

Trainable Decoding of Sets of Sequences for Neural Sequence Models

Ashwin Kalyan, Peter Anderson, Stefan Lee, Dhruv Batra

In International Conference on Machine Learning (ICML), 2019.

PDF Code

Partially-Supervised Image Captioning

Peter Anderson, Stephen Gould, Mark Johnson

In Advances in Neural Information Processing Systems (NeurIPS), 2018.

PDF Poster

Disfluency Detection using Auto-Correlational Neural Networks

Paria Jamshid Lou, Peter Anderson, Mark Johnson

In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2018.

PDF

On Evaluation of Embodied Navigation Agents

Peter Anderson, Angel Chang, Devendra Singh Chaplot, Alexey Dosovitskiy, Saurabh Gupta, Vladlen Koltun, Jana Kosecka, Jitendra Malik, Roozbeh Mottaghi, Manolis Savva, Amir R. Zamir

arXiv preprint arXiv:1807.06757, 2018.

PDF

Predicting accuracy on large datasets from smaller pilot data

Mark Johnson, Peter Anderson, Mark Dras, Mark Steedman

In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL), 2018. Oral Presentation [Top 4.6%]

PDF

Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments

Peter Anderson, Qi Wu, Damien Teney, Jake Bruce, Mark Johnson, Niko Sünderhauf, Ian Reid, Stephen Gould, Anton van den Hengel

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018. Spotlight Presentation [Top 8.9%]

Project PDF Code Poster Leaderboard

Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

Peter Anderson, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen Gould, Lei Zhang

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018. Oral Presentation [Top 2.1%]

Project PDF Features Code Captioning Code Poster Slides

Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge

Damien Teney, Peter Anderson, Xiaodong He, Anton van den Hengel

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.

PDF Slides

Guided Open Vocabulary Image Captioning with Constrained Beam Search

Peter Anderson, Basura Fernando, Mark Johnson, Stephen Gould

In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2017.

PDF

SPICE: Semantic Propositional Image Caption Evaluation

Peter Anderson, Basura Fernando, Mark Johnson, Stephen Gould

In Proceedings of the European Conference on Computer Vision (ECCV), 2016.

Project PDF Code Poster Slides

Discriminative Hierarchical Rank Pooling for Activity Recognition

Basura Fernando, Peter Anderson, Marcus Hutter, Stephen Gould

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

PDF Code

PhD Thesis

Vision and Language Learning: From Image Captioning and Visual Question Answering towards Embodied Agents

Peter Anderson

PhD Thesis, Australian National University, 2018.

Archive PDF

Undergrad

An ICP Inspired Inverse Sensor Model with Unknown Data Association

Peter Anderson, Youssef Hunter, Bernhard Hengst

In IEEE International Conference on Robotics and Automation (ICRA), 2013.

PDF

Fast Monocular Visual Compass for a Computationally Limited Robot

Peter Anderson and Bernhard Hengst

In Proceedings of the RoboCup International Symposium (RoboCup), 2013. Oral Presentation

PDF

RoboCup Standard Platform League - rUNSWift 2012 Innovations

Sean Harris, Peter Anderson, Belinda Teh, Youssef Hunter, Roger Liu, Bernhard Hengst, Ritwik Roy, Sam Li, Carl Chatfield

In Australasian Conference on Robotics and Automation (ACRA), 2012.

PDF

Robot Localisation Using Natural Landmarks

Peter Anderson, Yongki Yusmanthia, Bernhard Hengst, Arcot Sowmya

In Proceedings of the RoboCup International Symposium (RoboCup), 2012. Oral Presentation, Best Paper Finalist

PDF

Selected Talks

Vision and language: attention, navigation, and making it work ‘in the wild’. Presented at the VQA Challenge and Visual Dialog Workshop at CVPR 2018.

Bio

I am a Senior AI Researcher specializing in multimodal machine learning. Most of my research has been at the intersection of Natural Language Processing and Computer Vision, developing large-scale generative models that produce language, images, and/or actions. During my career I have moved between the tech (Google, Microsoft) and finance (Balyasny, Credit Suisse, Goldman Sachs) industries, and my current role developing AI models for investing combines both sides of that experience. I completed my PhD in Computer Science at the Australian National University in 2018, where I was advised by Stephen Gould. Earlier in my career, I was a sell-side securities analyst with Credit Suisse in Sydney. I have the (fairly rare) distinction of winning two university medals, in Finance (from the University of Sydney) and Computer Engineering (from the University of New South Wales).

2024 -

Senior AI Researcher, Balyasny Asset Management

Developing transformative AI models for investing.
2020 - 2023

Senior Research Scientist, Google

Research into multimodal generative AI, from generating synthetic instructions for training robots to text-to-image generation and editing, e.g. Imagen Editor, Pathdreamer.
2018 - 2020

Research Scientist, Georgia Tech

Machine learning research including image captioning, visual question answering (VQA), and language-guided localization and navigation for embodied agents, in collaboration with Dhruv Batra and Devi Parikh.
2015 - 2018

PhD (Computer Science), Australian National University

Machine learning combining computer vision and NLP. Proposed bottom-up and top-down attention for vision-and-language models, the SPICE metric, and Vision-and-Language Navigation (VLN). Affiliated with the Australian Centre for Robotic Vision.
2014 - 2015

Engineer, Sabre Autonomous Solutions

Advancing an autonomous grit-blasting robot from university prototype to commercial product.
2013 - 2014

Founder, FrameFish

Developing and commercializing virtual try-on technology for glasses and sunglasses.
2009 - 2012

BEng (Computer), University of NSW

1st class honours and university medal.
2005 - 2009

Securities Analyst, Credit Suisse

Sell-side analyst covering Australian financials including commercial banks and investment banks. Top 3 rated team.
2000 - 2004

BComm (Finance & Economics), University of Sydney

1st class honours and university medal.

Other Projects

FrameFish

FrameFish was an eyewear virtual try-on system I developed for e-commerce websites. At the time it was much faster than competing systems, taking around 1 second to generate a virtual try-on image of a person wearing a selected pair of glasses or sunglasses, versus ~10 seconds for other web-based systems in 2013. FrameFish received an Innovate NSW grant and was featured on Sky Business News and in the UNSW Young Entrepreneurs series.