Focusing attention at the level of objects and other salient image regions.
What’s missing from deep learning?
Evaluating image captions using scene graph tuples.
Picking the low-hanging deep fruit.
If an iPhone can threaten a Grandmaster, why can’t I get a robot to unpack the dishwasher?