NEW! (February 2020) Delighted to be co-organizing workshops on Embodied Vision, Actions and Language (EVAL) at ECCV 2020 and Advances in Language and Vision Research (ALVR) at ACL 2020.

NEW! (October 2019) Excited to announce: from January, I’ll be a Research Scientist at Google in Austin! Looking forward to collaborating with Jason Baldridge, Radu Soricut and others!

NEW! (September 2019) Our paper Chasing Ghosts: Instruction Following as Bayesian State Tracking is accepted to NeurIPS 2019.

NEW! (July 2019) Our nocaps paper is accepted to ICCV 2019.

NEW! (May 2019) Recognized as a CVPR 2019 outstanding reviewer.

NEW! (May 2019) We have released the nocaps benchmark for novel object captioning at scale.


Publications [Google Scholar]


Chasing Ghosts: Instruction Following as Bayesian State Tracking

Peter Anderson*, Ayush Shrivastava*, Devi Parikh, Dhruv Batra, Stefan Lee

In Advances in Neural Information Processing Systems (NeurIPS), 2019.

PDF Video Poster Code

RERERE: Remote Embodied Referring Expressions in Real indoor Environments

Yuankai Qi, Qi Wu, Peter Anderson, Marco Liu, Chunhua Shen, Anton van den Hengel

arXiv preprint 1904.10151, 2019.


nocaps: novel object captioning at scale

Harsh Agrawal*, Karan Desai*, Yufei Wang, Xinlei Chen, Rishabh Jain, Mark Johnson, Dhruv Batra, Devi Parikh, Stefan Lee, Peter Anderson

In International Conference on Computer Vision (ICCV), 2019.

Project PDF Code

Audio-Visual Scene-Aware Dialog

Huda Alamri, Vincent Cartillier, Abhishek Das, Jue Wang, Anoop Cherian, Irfan Essa, Dhruv Batra, Tim K. Marks, Chiori Hori, Peter Anderson, Stefan Lee, Devi Parikh

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.

Project PDF

Visual Landmark Selection for Generating Grounded and Interpretable Navigation Instructions

Sanyam Agarwal, Devi Parikh, Dhruv Batra, Peter Anderson, Stefan Lee

In CVPR Workshop on Deep Learning for Semantic Visual Navigation, 2019.


Trainable Decoding of Sets of Sequences for Neural Sequence Models

Ashwin Kalyan, Peter Anderson, Stefan Lee, Dhruv Batra

In International Conference on Machine Learning (ICML), 2019. Oral Presentation

PDF Code


Partially-Supervised Image Captioning

Peter Anderson, Stephen Gould, Mark Johnson

In Advances in Neural Information Processing Systems (NeurIPS), 2018.

PDF Poster

Disfluency Detection using Auto-Correlational Neural Networks

Paria Jamshid Lou, Peter Anderson, Mark Johnson

In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2018.


On Evaluation of Embodied Navigation Agents

Peter Anderson, Angel Chang, Devendra Singh Chaplot, Alexey Dosovitskiy, Saurabh Gupta, Vladlen Koltun, Jana Kosecka, Jitendra Malik, Roozbeh Mottaghi, Manolis Savva, Amir R. Zamir

arXiv preprint 1807.06757, 2018.


Predicting accuracy on large datasets from smaller pilot data

Mark Johnson, Peter Anderson, Mark Dras, Mark Steedman

In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL), 2018. Oral Presentation [Top 4.6%]


Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments

Peter Anderson, Qi Wu, Damien Teney, Jake Bruce, Mark Johnson, Niko Sünderhauf, Ian Reid, Stephen Gould, Anton van den Hengel

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018. Spotlight Presentation [Top 8.9%]

Project PDF Code Poster

Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

Peter Anderson, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen Gould, Lei Zhang

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018. Oral Presentation [Top 2.1%]

Project PDF Features Code Captioning Code Poster Slides

Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge

Damien Teney, Peter Anderson, Xiaodong He, Anton van den Hengel

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.

PDF Slides


Guided Open Vocabulary Image Captioning with Constrained Beam Search

Peter Anderson, Basura Fernando, Mark Johnson and Stephen Gould

In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2017.


SPICE: Semantic Propositional Image Caption Evaluation

Peter Anderson, Basura Fernando, Mark Johnson and Stephen Gould

In Proceedings of the European Conference on Computer Vision (ECCV), 2016.

Project PDF Code Poster Slides

Discriminative Hierarchical Rank Pooling for Activity Recognition

Basura Fernando, Peter Anderson, Marcus Hutter and Stephen Gould

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

PDF Code

PhD Thesis

Vision and Language Learning: From Image Captioning and Visual Question Answering towards Embodied Agents

Peter Anderson

PhD Thesis, Australian National University, 2018.

Archive PDF


An ICP Inspired Inverse Sensor Model with Unknown Data Association

Peter Anderson, Youssef Hunter and Bernhard Hengst

In IEEE International Conference on Robotics and Automation (ICRA), 2013.


Fast Monocular Visual Compass for a Computationally Limited Robot

Peter Anderson and Bernhard Hengst

In Proceedings of the RoboCup International Symposium (RoboCup), 2013. Oral Presentation


RoboCup Standard Platform League - rUNSWift 2012 Innovations

Sean Harris, Peter Anderson, Belinda Teh, Youssef Hunter, Roger Liu, Bernhard Hengst, Ritwik Roy, Sam Li, Carl Chatfield

In Australasian Conference on Robotics and Automation (ACRA), 2012.


Robot Localisation Using Natural Landmarks

Peter Anderson, Yongki Yusmanthia, Bernhard Hengst and Arcot Sowmya

In Proceedings of the RoboCup International Symposium (RoboCup), 2012. Oral Presentation, Best Paper Finalist


Selected Talks

Vision and language: attention, navigation, and making it work ‘in the wild’. Presented at the VQA Challenge and Visual Dialog Workshop at CVPR 2018.

Three Minute Thesis (3MT) Talk - The Language of Sight (non-technical). ANU 3MT Finalist, September 2017.

A Practical Introduction to Deep Learning with Caffe. Presented at the Deep Learning Workshop at AI 2015 / ACRA 2015, December 2015.


I am a Research Scientist in the Language team at Google Research. My research interests include computer vision, natural language processing and AI in general, and problems at the intersection of computer vision and natural language processing in particular. My recent work has focused on grounded language learning, particularly in large-scale visually-realistic 3D environments.

I completed my PhD in Computer Science at the Australian National University in 2018, where I was advised by Stephen Gould. I was also fortunate to work with Mark Johnson from Macquarie University and Anton van den Hengel from the University of Adelaide.

In a previous life I was a sell-side securities analyst with Credit Suisse in Sydney. I have the (fairly rare) distinction of winning two university medals, in Finance (from the University of Sydney) and Computer Engineering (from the University of New South Wales).

2020 -

Research Scientist, Google

Research into large-scale grounded language learning.
2018 - 2020

Research Scientist, Georgia Tech

Collaborating with Dhruv Batra and Devi Parikh on vision-and-language research, including image captioning, visual question answering (VQA), and vision-and-language navigation (VLN).
2015 - 2018

PhD (Computer Science), Australian National University

Machine learning combining visual and linguistic understanding. Affiliated with the Australian Centre for Robotic Vision, advised by Stephen Gould.
2014 - 2015

Engineer, Sabre Autonomous Solutions

Advancing an autonomous grit-blasting robot from university prototype to commercial product.
2013 - 2014

Founder, FrameFish

Developing and commercializing virtual try-on technology for glasses and sunglasses.
2009 - 2012

BEng (Computer), University of NSW

1st class honours and university medal.
2005 - 2009

Securities Analyst, Credit Suisse

Sell-side analyst covering Australian financials including commercial banks and investment banks.
2000 - 2004

BComm (Finance & Economics), University of Sydney

1st class honours and university medal.

Other Projects


FrameFish was an eyewear virtual try-on system I developed for e-commerce websites. At the time, it was much faster than competing systems, taking around 1 second to generate a virtual try-on image of a person wearing a selected pair of glasses or sunglasses (versus ~10 seconds for other web-based systems in 2013). FrameFish received an Innovate NSW grant and was featured on Sky Business News and in the UNSW Young Entrepreneurs series.