Tell Me Dave is a large, vaguely humanoid bot that can cook simple meals according to spoken instructions. But programming Tell Me Dave to understand even one kind of order is tricky: humans have an annoying tendency to ask for the same thing in a variety of different ways, or to combine several discrete steps into one short command.
So Saxena and his colleagues at Cornell University in Ithaca, New York, turned to crowdsourcing to help Tell Me Dave understand these complex requests. They developed a computer game in which human players are placed in a virtual kitchen and asked to follow a set of sample instructions, much like the robot would. These games are used to train the algorithm that guides Tell Me Dave, so when it's later faced with commands like "boil some water" or "cook the ramen" in a real kitchen, it can come up with the appropriate actions. So far, the robot gets it right about two-thirds of the time.
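The crowdsourced games pair varied phrasings with sequences of primitive kitchen actions. A toy illustration of the kind of mapping being learned follows; the commands, action names and exact-lookup logic are all invented stand-ins, not Tell Me Dave's actual algorithm, which generalises to phrasings it has never seen.

```python
# Toy illustration of grounding spoken commands to action sequences.
# The real Tell Me Dave system learns this mapping from crowdsourced
# game play; everything below is a hand-written stand-in.

# Crowdsourced examples: varied phrasings paired with primitive actions.
demonstrations = {
    "boil some water": ["grasp(pot)", "fill(pot, tap)",
                        "place(pot, stove)", "turn_on(stove)"],
    "boil water":      ["grasp(pot)", "fill(pot, tap)",
                        "place(pot, stove)", "turn_on(stove)"],
    "cook the ramen":  ["grasp(pot)", "fill(pot, tap)",
                        "place(pot, stove)", "turn_on(stove)",
                        "add(ramen, pot)", "wait(180)"],
}

def plan(command: str) -> list[str]:
    """Return the action sequence for a known command.

    Here this is just exact lookup; the learned model instead scores
    candidate action sequences against the words in the command."""
    return demonstrations.get(command.lower().strip(), [])

print(plan("Boil some water"))
```

Note that "cook the ramen" bundles several discrete steps into one short command, which is exactly the kind of compression that makes the learning problem hard.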
When humans communicate, they can use nonverbal cues like eye-gaze and pointing to help the other person understand what they mean. Or, if the other person is about to make an error, they can quickly step in and fix it ("No, I meant that book, over there"). Robots rarely have the skills or the opportunity to do either.
Tell Me Dave – which will be exhibited next month at the 2014 Robotics: Science and Systems conference in Berkeley, California – tries to dodge much of the confusion that prevents machines from understanding language. The focus of the research is on helping robots connect words to objects and actions in the real world, a skill that computer scientists call grounding.
"Grounding is a complex, hard problem, but this is a pretty good step in terms of improving it," says Bilge Mutlu at the University of Wisconsin-Madison.
Please, no surprises
One hitch with the Tell Me Dave approach is that it cannot handle the unexpected, says Matthias Scheutz of Tufts University in Medford, Massachusetts. If someone uses an unusual word or asks the robot to perform a new type of task, it will not know what to do.
Comprehension is just one piece of the puzzle; getting machines to communicate clearly back to us is much more difficult. At the Massachusetts Institute of Technology, engineer Ross Knepper and his colleagues are developing robots that can help a volunteer assemble IKEA furniture. But when their robot got stuck – because a necessary part was missing, say – it was unable to explain the specifics of what was needed, and could only ask for generalised "help". While the robot's human helper tried to figure out what the machine wanted, the building process quickly ground to a halt.
To tackle this issue, the team introduced a new approach called inverse semantics, by which the robot tries to choose the right words by looking at its environment. Like Tell Me Dave, the inverse semantics algorithm is informed by real humans.
The researchers asked users on Amazon's Mechanical Turk crowdsourcing site to generate possible help messages for different scenarios, which were then used to train a new algorithm. For the algorithm, it is not just a question of finding words that describe the problem, but of finding the best ones: humans tend to prefer short messages that pinpoint the robot's problem in as few words as possible without being ambiguous.
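The "shortest unambiguous message" idea can be sketched in a few lines. This is a minimal stand-in, not the team's inverse semantics algorithm: the scene, the objects and their attributes are all hypothetical, and the real system scores full natural-language sentences rather than attribute sets.

```python
# Sketch of picking the fewest attributes that single out one object
# in a scene - the preference for short, unambiguous descriptions.
from itertools import combinations

# A hypothetical scene: three table legs the robot might ask for.
scene = [
    {"type": "leg", "colour": "white", "location": "table"},
    {"type": "leg", "colour": "black", "location": "table"},
    {"type": "leg", "colour": "black", "location": "floor"},
]

def describe(target, objects):
    """Return the smallest set of attributes that matches only `target`."""
    attrs = list(target)
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            matches = [o for o in objects
                       if all(o[a] == target[a] for a in combo)]
            if matches == [target]:
                return {a: target[a] for a in combo}
    return dict(target)  # fall back to the full description

# "the white leg" is already unambiguous; no need to mention its location.
print(describe(scene[0], scene))
```

Saying "hand me the white leg" beats both the ambiguous "hand me the leg" and the needlessly wordy "hand me the white leg on the table", which is the trade-off the crowdsourced messages were used to capture.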
Dialogue is key
A well-built IKEA table is a modest goal, but Knepper says he could envision similar algorithms one day being used in autonomous cars. If the car arrives in a neighbourhood it doesn't know or a traffic situation it cannot handle, it could rely on inverse semantics to ask the driver for help.
As it is, Knepper says that volunteers who work with the robot often find themselves talking to it, though it cannot yet understand them. In the long term, that kind of back and forth is the goal for robot-human communication. "Inverse semantics is a building block," he says. "Where there really would be a win is in dialogue."
Perhaps human languages alone simply aren't up to the task. Researchers at the Eindhoven University of Technology in the Netherlands are working on the Robot Interaction Language (ROILA), a kind of Esperanto for robots. ROILA is an attempt to optimise language for "efficient recognition" by robots, relying on common phonemes and a rigid, regular grammatical structure. Those interested in ROILA – or just impatient for robot-human communication to take off – can enrol in a free course online.
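A language built for machine ears can enforce regularities that natural languages lack. The sketch below checks words against a strict consonant-vowel syllable pattern; the phoneme set and the pattern itself are hypothetical illustrations of the idea, not ROILA's actual specification.

```python
# Toy check of the kind of rigid word structure a robot-friendly
# language might use. The syllable pattern and phoneme set here are
# hypothetical, not taken from the ROILA specification.
import re

CONSONANTS = "bfjklmnpstw"   # an assumed set of easy-to-recognise sounds
VOWELS = "aeiou"

# Words as strict consonant-vowel syllable chains, e.g. "bama", "koloke".
WORD = re.compile(f"(?:[{CONSONANTS}][{VOWELS}])+")

def is_valid_word(word: str) -> bool:
    """True if the word is a chain of consonant-vowel syllables."""
    return WORD.fullmatch(word.lower()) is not None

print(is_valid_word("bama"))    # ba-ma: a legal chain
print(is_valid_word("xyzzy"))   # consonant clusters: rejected
```

Restricting every word to the same syllable shape shrinks the space of sounds a speech recogniser has to tell apart, which is the sense in which such a language is "optimised for efficient recognition".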