NEW! (July 2020) Two papers accepted to ECCV 2020.
NEW! (February 2020) Our REVERIE paper on remote embodied referring expressions is accepted to CVPR as an oral presentation.
NEW! (September 2019) Our paper Chasing Ghosts: Instruction Following as Bayesian State Tracking is accepted to NeurIPS 2019.
NEW! (May 2019) Recognized as a CVPR 2019 outstanding reviewer.
NEW! (May 2019) We have released the nocaps benchmark for novel object captioning at scale.
NEW! (April 2019) We have a paper accepted to ICML 2019. Congratulations Ashwin.
NEW! (February 2019) We have a paper accepted to CVPR 2019. Congratulations to Huda and Vincent.
NEW! (February 2019) I am serving as an Area Chair for NeurIPS 2019.
NEW! (December 2018) Co-organizing the Visually Grounded Interaction and Language Workshop at NeurIPS.
NEW! (September 2018) Recognized as a NeurIPS 2018 outstanding reviewer.
NEW! (September 2018) Our paper is accepted to NeurIPS 2018.
NEW! (August 2018) We have one paper accepted to EMNLP 2018.
NEW! (June 2018) Presenting an invited talk at the VQA Challenge and Visual Dialog workshop at CVPR.
NEW! (May 2018) Our Vision and Language Navigation (VLN) challenge and leaderboard are now live on EvalAI!
NEW! (May 2018) Very excited to be an organizer of the ECCV 2018 workshop on Visual Learning and Embodied Agents in Simulation Environments.
NEW! (April 2018) We have a paper accepted to ACL 2018.
NEW! (February 2018) We have published code for our state-of-the-art image captioning model.
NEW! (Sept 2017) We have been selected to receive a Facebook ParlAI research award.
NEW! (26 July 2017) We are 1st in the 2017 Visual Question Answering (VQA) Challenge at CVPR! We are also 1st on the MSCOCO image captioning leaderboard. Details and code are on the project page.
NEW! (July 2017) Our paper on out-of-domain image captioning is accepted to EMNLP 2017.
NEW! (July 2016) We have released a new image caption evaluation metric (SPICE) that improves on CIDEr and METEOR. The paper will appear at ECCV 2016.
Publications [Google Scholar]
Spatially Aware Multimodal Transformers for TextVQA
Yash Kant, Dhruv Batra, Peter Anderson, Alex Schwing, Devi Parikh, Jiasen Lu, Harsh Agrawal
In Proceedings of the European Conference on Computer Vision (ECCV), 2020.
Improving Vision-and-Language Navigation with Image-Text Pairs from the Web
Arjun Majumdar, Ayush Shrivastava, Stefan Lee, Peter Anderson, Devi Parikh, Dhruv Batra
In Proceedings of the European Conference on Computer Vision (ECCV), 2020. Spotlight Presentation [Top 5%]
REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments
Yuankai Qi, Qi Wu, Peter Anderson, Xin Wang, William Yang Wang, Chunhua Shen, Anton van den Hengel
In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020. Oral Presentation [Top 5.7%]
Visual Landmark Selection for Generating Grounded and Interpretable Navigation Instructions
Sanyam Agarwal, Devi Parikh, Dhruv Batra, Peter Anderson, Stefan Lee
In CVPR Workshop on Deep Learning for Semantic Visual Navigation, 2019.
Disfluency Detection using Auto-Correlational Neural Networks
Paria Jamshid Lou, Peter Anderson, Mark Johnson
In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2018.
On Evaluation of Embodied Navigation Agents
Peter Anderson, Angel Chang, Devendra Singh Chaplot, Alexey Dosovitskiy, Saurabh Gupta, Vladlen Koltun, Jana Kosecka, Jitendra Malik, Roozbeh Mottaghi, Manolis Savva, Amir R. Zamir
arXiv preprint arXiv:1807.06757, 2018.
Predicting accuracy on large datasets from smaller pilot data
Mark Johnson, Peter Anderson, Mark Dras, Mark Steedman
In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL), 2018. Oral Presentation [Top 4.6%]
Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments
Peter Anderson, Qi Wu, Damien Teney, Jake Bruce, Mark Johnson, Niko Sünderhauf, Ian Reid, Stephen Gould, Anton van den Hengel
In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018. Spotlight Presentation [Top 8.9%]
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
Peter Anderson, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen Gould, Lei Zhang
In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018. Oral Presentation [Top 2.1%]
Guided Open Vocabulary Image Captioning with Constrained Beam Search
Peter Anderson, Basura Fernando, Mark Johnson and Stephen Gould
In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2017.
An ICP Inspired Inverse Sensor Model with Unknown Data Association
Peter Anderson, Youssef Hunter and Bernhard Hengst
In IEEE International Conference on Robotics and Automation (ICRA), 2013.
Fast Monocular Visual Compass for a Computationally Limited Robot
Peter Anderson and Bernhard Hengst
In Proceedings of the RoboCup International Symposium (RoboCup), 2013. Oral Presentation
RoboCup Standard Platform League - rUNSWift 2012 Innovations
Sean Harris, Peter Anderson, Belinda Teh, Youssef Hunter, Roger Liu, Bernhard Hengst, Ritwik Roy, Sam Li, Carl Chatfield
In Australasian Conference on Robotics and Automation (ACRA), 2012.
Robot Localisation Using Natural Landmarks
Peter Anderson, Yongki Yusmanthia, Bernhard Hengst and Arcot Sowmya
In Proceedings of the RoboCup International Symposium (RoboCup), 2012. Oral Presentation, Best Paper Finalist
Three Minute Thesis (3MT) Talk - The Language of Sight (non-technical). ANU 3MT Finalist, September 2017.
A Practical Introduction to Deep Learning with Caffe. Presented at the Deep Learning Workshop at AI 2015 / ACRA 2015 in December 2015.
I am a Research Scientist in the Language team at Google Research. My research interests include computer vision, natural language processing, and AI in general, and problems at the intersection of computer vision and natural language processing in particular. My recent work has focused on grounded language learning, particularly in large-scale, visually realistic 3D environments. I completed my PhD in Computer Science at the Australian National University in 2018, where I was advised by Stephen Gould. I was also fortunate to work with Mark Johnson from Macquarie University and Anton van den Hengel from the University of Adelaide. In my previous life I was a sell-side securities analyst with Credit Suisse in Sydney. I have the (fairly rare) distinction of winning two university medals, in Finance (from the University of Sydney) and Computer Engineering (from the University of New South Wales).
Research Scientist, Google
Research Scientist, Georgia Tech
PhD (Computer Science), Australian National University
Engineer, Sabre Autonomous Solutions
BEng (Computer), University of NSW
Securities Analyst, Credit Suisse
BComm (Finance & Economics), University of Sydney
FrameFish was an eyewear virtual try-on system I developed for ecommerce websites. At the time it was much faster than competing systems, taking around 1 second to generate a virtual try-on image of a person wearing a selected pair of glasses or sunglasses (versus ~10 seconds for other web-based systems in 2013). FrameFish received an Innovate NSW grant and was featured on Sky Business News and in the UNSW Young Entrepreneurs series.