NEW! (July 2022) Serving as an Area Chair for NeurIPS 2022.
NEW! (March 2022) Our paper on generating navigation instructions from landmarks is accepted to CVPR 2022. We have released data from the paper - 1M grounded landmarks and 1M generated navigation instructions.
NEW! (March 2022) Serving as a Senior Area Chair for NAACL 2022.
NEW! (July 2021) One paper accepted to ICCV 2021.
NEW! (June 2021) Presenting an invited talk at the CVPR Embodied AI workshop - ‘Massive Datasets for Language-Guided Navigation Agents and Where to Find Them’.
NEW! (June 2021) Co-organizing a CVPR tutorial - From VQA to VLN: Recent Advances in Vision-and-Language Research.
NEW! (February 2021) RxR-Habitat Challenge launched for multilingual instruction-following in continuous environments!
NEW! (January 2021) Our paper on the evaluation of grounded navigation instructions is accepted to EACL 2021.
NEW! (January 2021) Our RxR challenge and leaderboard for multilingual vision-and-language navigation are now open for submissions! We’ve also open-sourced PanGEA, the annotation toolkit we developed to collect RxR.
NEW! (December 2020) Recognized as an EMNLP 2020 outstanding reviewer.
NEW! (October 2020) Excited to release our new RxR dataset for multilingual Vision-and-Language Navigation (VLN).
NEW! (October 2020) Our VLN sim-to-real paper is accepted to CoRL 2020.
NEW! (September 2020) Two papers accepted to EMNLP 2020.
NEW! (July 2020) Two papers accepted to ECCV 2020.
NEW! (February 2020) Our REVERIE paper on remote embodied referring expressions is accepted to CVPR as an oral presentation.
NEW! (September 2019) Our paper Chasing Ghosts: Instruction Following as Bayesian State Tracking is accepted to NeurIPS 2019.
NEW! (May 2019) Recognized as a CVPR 2019 outstanding reviewer.
NEW! (May 2019) We have released the nocaps benchmark for novel object captioning at scale.
NEW! (April 2019) We have a paper accepted to ICML 2019. Congratulations Ashwin.
NEW! (February 2019) We have a paper accepted to CVPR 2019. Congratulations to Huda and Vincent.
NEW! (February 2019) I am serving as an Area Chair for NeurIPS 2019.
NEW! (December 2018) Co-organizing the Visually Grounded Interaction and Language Workshop at NeurIPS.
NEW! (September 2018) Recognized as a NeurIPS 2018 outstanding reviewer.
NEW! (September 2018) Our paper is accepted to NeurIPS 2018.
NEW! (August 2018) We have one paper accepted to EMNLP 2018.
NEW! (June 2018) Presenting an invited talk at the VQA Challenge and Visual Dialog workshop at CVPR.
NEW! (May 2018) Our Vision and Language Navigation (VLN) challenge and leaderboard are now live on EvalAI!
NEW! (May 2018) Very excited to be an organizer of the ECCV 2018 workshop on Visual Learning and Embodied Agents in Simulation Environments.
NEW! (April 2018) We have a paper accepted to ACL 2018.
NEW! (February 2018) We have published code for our image captioning model, which was recently state of the art.
NEW! (Sept 2017) We have been selected to receive a Facebook ParlAI research award.
NEW! (26 July 2017) We are 1st in the 2017 Visual Question Answering (VQA) Challenge at CVPR! We are also 1st on the MSCOCO image captioning leaderboard. Details and code are on the project page.
NEW! (July 2017) Our paper on out-of-domain image captioning is accepted to EMNLP 2017.
NEW! (July 2016) We have released a new image caption evaluation metric (SPICE) that improves on CIDEr and METEOR. The paper will appear at ECCV 2016.
Publications [Google Scholar]
Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting
Su Wang*, Chitwan Saharia*, Ceslee Montgomery*, Jordi Pont-Tuset, Shai Noy, Stefano Pellegrini, Yasumasa Onoe, Sarah Laszlo, David J. Fleet, Radu Soricut, Jason Baldridge, Mohammad Norouzi†, Peter Anderson†, William Chan†
arXiv preprint 2212.06909, 2022.
A New Path: Scaling Vision-and-Language Navigation with Synthetic Instructions and Imitation Learning
Aishwarya Kamath*, Peter Anderson*, Su Wang, Jing Yu Koh, Alexander Ku, Austin Waters, Yinfei Yang, Jason Baldridge, Zarana Parekh
arXiv preprint 2210.03112, 2022.
Iterative Vision-and-Language Navigation
Jacob Krantz*, Shurjo Banerjee*, Wang Zhu, Jason Corso, Peter Anderson, Stefan Lee, Jesse Thomason
arXiv preprint 2210.03087, 2022.
Simple and Effective Synthesis of Indoor 3D Scenes
Jing Yu Koh*, Harsh Agrawal*, Dhruv Batra, Richard Tucker, Austin Waters, Honglak Lee, Yinfei Yang, Jason Baldridge, Peter Anderson
In AAAI Conference on Artificial Intelligence (AAAI), 2023.
Less is More: Generating Grounded Navigation Instructions from Landmarks
Su Wang, Ceslee Montgomery, Jordi Orbay, Vighnesh Birodkar, Aleksandra Faust, Izzeddin Gur, Natasha Jaques, Austin Waters, Jason Baldridge, Peter Anderson
In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments
Yuankai Qi, Qi Wu, Peter Anderson, Xin Wang, William Yang Wang, Chunhua Shen, Anton van den Hengel
In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020. Oral Presentation [Top 5.7%]
Visual Landmark Selection for Generating Grounded and Interpretable Navigation Instructions
Sanyam Agarwal, Devi Parikh, Dhruv Batra, Peter Anderson, Stefan Lee
In CVPR Workshop on Deep Learning for Semantic Visual Navigation, 2019.
Disfluency Detection using Auto-Correlational Neural Networks
Paria Jamshid Lou, Peter Anderson, Mark Johnson
In Conference on Empirical Methods for Natural Language Processing (EMNLP), 2018.
On Evaluation of Embodied Navigation Agents
Peter Anderson, Angel Chang, Devendra Singh Chaplot, Alexey Dosovitskiy, Saurabh Gupta, Vladlen Koltun, Jana Kosecka, Jitendra Malik, Roozbeh Mottaghi, Manolis Savva, Amir R. Zamir
arXiv preprint 1807.06757, 2018.
Predicting accuracy on large datasets from smaller pilot data
Mark Johnson, Peter Anderson, Mark Dras, Mark Steedman
In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL), 2018. Oral Presentation [Top 4.6%]
Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments
Peter Anderson, Qi Wu, Damien Teney, Jake Bruce, Mark Johnson, Niko Sünderhauf, Ian Reid, Stephen Gould, Anton van den Hengel
In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018. Spotlight Presentation [Top 8.9%]
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
Peter Anderson, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen Gould, Lei Zhang
In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018. Oral Presentation [Top 2.1%]
Guided Open Vocabulary Image Captioning with Constrained Beam Search
Peter Anderson, Basura Fernando, Mark Johnson and Stephen Gould
In Conference on Empirical Methods for Natural Language Processing (EMNLP), 2017.
An ICP Inspired Inverse Sensor Model with Unknown Data Association
Peter Anderson, Youssef Hunter and Bernhard Hengst
In IEEE International Conference on Robotics and Automation (ICRA), 2013.
Fast Monocular Visual Compass for a Computationally Limited Robot
Peter Anderson and Bernhard Hengst
In Proceedings of the RoboCup International Symposium (RoboCup), 2013. Oral Presentation
Robocup Standard Platform League - rUNSWift 2012 Innovations
Sean Harris, Peter Anderson, Belinda Teh, Youssef Hunter, Roger Liu, Bernhard Hengst, Ritwik Roy, Sam Li, Carl Chatfield
In Australasian Conference on Robotics and Automation (ACRA), 2012.
Robot Localisation Using Natural Landmarks
Peter Anderson, Yongki Yusmanthia, Bernhard Hengst and Arcot Sowmya
In Proceedings of the RoboCup International Symposium (RoboCup), 2012. Oral Presentation, Best Paper Finalist
I am a Research Scientist in the Language team at Google Research. My research interests include computer vision, natural language processing and AI in general, and problems at the intersection of computer vision and natural language processing in particular. My recent work has focused on grounded language learning, particularly in large-scale visually-realistic 3D environments. I completed my PhD in Computer Science at the Australian National University in 2018, where I was advised by Stephen Gould. In my previous life I was a sell-side securities analyst with Credit Suisse in Sydney. I have the (fairly rare) distinction of winning two university medals, in Finance (from the University of Sydney) and Computer Engineering (from the University of New South Wales).
Senior Research Scientist, Google
Research Scientist, Georgia Tech
PhD (Computer Science), Australian National University
Engineer, Sabre Autonomous Solutions
BEng (Computer), University of NSW
Securities Analyst, Credit Suisse
BComm (Finance & Economics), University of Sydney
FrameFish was an eyewear virtual try-on system I developed for ecommerce websites. At the time it was much faster than competing systems, taking around 1 second to generate a virtual try-on image of a person wearing a selected pair of glasses or sunglasses (versus ~10 seconds for other web-based systems in 2013). FrameFish received an Innovate NSW grant and was featured on Sky Business News and in the UNSW Young Entrepreneurs series.