Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
Focusing attention at the level of objects and other salient image regions.
The world of chess is in crisis. A top player was caught cheating with an iPhone at a major international tournament. The player in question, the reigning champion of Georgia, had hidden the phone in the bathroom, wrapped in toilet paper. Apparently, the cheat was running to the bathroom between moves to analyse the game in a chess app.
The episode is an interesting punctuation mark in the history of AI. It’s been almost 20 years since a purpose-built computer first beat the reigning human world champion in a game. That happened in 1996. Back then, Prince Charles and Diana, Princess of Wales, were just getting divorced, Google didn’t exist, and the Macarena was topping the music charts. Now, phones play at an international standard. I guess that’s why chess is no longer considered AI.
The surprising thing for many people is that, even with the huge advances that have taken place, we still can’t buy a robot that unpacks the dishwasher or a washing machine that folds clothes. Several high-profile researchers first observed this paradox in the late 1980s, when they noticed that high-level reasoning was much easier to compute than low-level perception and motor skills. Hans Moravec wrote:
…it is comparatively easy to make computers exhibit adult level performance on intelligence tests or playing checkers, and difficult or impossible to give them the skills of a one-year-old when it comes to perception and mobility.
As a society, we’ve always associated ‘intelligence’ with things that educated humans find difficult, like playing chess. However, if ‘intelligence’ has anything to do with the capacity to perceive, retain and apply information, then we are looking at it the wrong way. Abstract thought is easy; it’s perception and motor skills that are hard. Humans just happen to be very good at perception and motor skills. As Moravec described it in 1988:
Encoded in the large, highly evolved sensory and motor portions of the human brain is a billion years of experience about the nature of the world and how to survive in it… We are all prodigious olympians in perceptual and motor areas, so good that we make the difficult look easy. Abstract thought, though, is a new trick, perhaps less than 100 thousand years old. We have not yet mastered it. It is not all that intrinsically difficult; it just seems so when we do it.
Sorry, chess nerds, but in a very real sense, folding clothes or unpacking the dishwasher is much harder than playing chess. There’s no better demonstration of this than seeing a Grandmaster threatened by a phone.