Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
Focusing attention at the level of objects and other salient image regions.
What’s missing from deep learning?
Evaluating image captions using scene graph tuples.
Picking the low-hanging deep fruit.
If an iPhone can threaten a Grandmaster, why can’t I get a robot to unpack the dishwasher?