Tell Me Dave
Making Robots Follow Human Commands


In order for robots to perform tasks in the real world, they need to understand our natural language commands. Although much past research has gone into language parsing, existing approaches often require instructions to be spelled out in full detail, which makes them difficult to use in real-world situations. Our goal is to enable a robot to take even an ill-specified instruction as generic as “Make a cup of coffee” and figure out how to carry it out, for example whether to fill a cup with milk or use one that already has milk, depending on how the environment looks.

You can find the details of our research, publications and videos in the research & video section. A demo of our robot working on the VEIL-200 dataset can be found here. We look forward to your support in producing more data by playing with our simulator, which will help our robots become more accurate.

Tell Me Dave in popular press:

Teach Robot

Help improve our model by playing with our virtual robot and giving it commands! Below you can see a video of a person controlling our virtual robot from a first-person perspective to complete the task of making ramen.




Our algorithm takes a natural language command from the user, together with the environment in which to execute it, and outputs a sequence of instructions that the robot can execute. It does so using a latent-CRF model trained on data from users playing an online robotic simulator.

Takes the environment and a sentence as input.
Segments the sentence into clauses.
Our latent-CRF model infers a sequence of instructions.
The model is trained on data from people playing the game.
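
As a rough illustration of the clausal segmentation step only, here is a minimal sketch in Python. It is not our actual segmenter: it assumes a naive rule that splits at sentence punctuation and at the word "then", and the function name segment_clauses is hypothetical. The latent-CRF inference that follows this step is not shown.

```python
import re

# Assumed, simplistic clause boundaries: sentence punctuation, or an
# optional comma followed by "then" / "and then". This is an illustrative
# rule, not the segmentation method used in the paper.
_BOUNDARY = re.compile(r"[.;!?]|,?\s*\b(?:and then|then)\b")


def segment_clauses(command: str) -> list[str]:
    """Split a natural language command into clause-like chunks."""
    parts = _BOUNDARY.split(command)
    # Drop empty pieces and trim stray spaces and commas.
    return [p.strip(" ,") for p in parts if p.strip(" ,")]


if __name__ == "__main__":
    command = (
        "Take some coffee in a cup. Add ice cream of your choice. "
        "Finally add raspberry syrup to the mixture."
    )
    for clause in segment_clauses(command):
        print(clause)
```

Each resulting clause, paired with the environment, would then be handed to the model that infers the instruction sequence.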

Our model tackles challenges such as handling missing instructions and grounding language in an appropriate instruction sequence for each new environment. For example, when making coffee, we might have a microwave or a stove, in different configurations, for boiling, and there could be different ways of adding sugar, milk, etc. Finally, our model is trained on data given by people playing a virtual game online. We have tested our algorithm on a large variety of tasks (see paper below).
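
To make the idea of environment-dependent grounding concrete, the following toy sketch grounds a "boil water" request differently depending on what the environment contains. It uses hand-written rules in place of the learned latent-CRF model, and everything in it is an assumption made for illustration: the environment dictionary layout, the instruction names (moveto, fill, place, turn_on), and the pairing of appliances with containers.

```python
# Hypothetical pairing of heating appliance -> container used with it.
HEATERS = {"microwave": "cup", "stove": "kettle"}


def ground_boil_water(env: dict[str, dict]) -> list[tuple[str, ...]]:
    """Return an illustrative instruction sequence for boiling water in env.

    env maps object names to a dict of their states, e.g.
    {"cup": {"has_water": True}, "microwave": {}}.
    """
    # Pick the first appliance whose matching container is also present.
    for appliance, container in HEATERS.items():
        if appliance in env and container in env:
            break
    else:
        raise ValueError("no usable appliance and container in environment")

    steps: list[tuple[str, ...]] = []
    # Fill in a step the command left out: get water if there is none yet.
    if not env[container].get("has_water", False):
        steps += [("moveto", "sink"), ("fill", container, "water")]
    steps += [
        ("moveto", appliance),
        ("place", container, appliance),
        ("turn_on", appliance),
    ]
    return steps


if __name__ == "__main__":
    print(ground_boil_water({"microwave": {}, "cup": {"has_water": True}}))
    print(ground_boil_water({"stove": {}, "kettle": {}, "sink": {}}))
```

The same request yields a three-step sequence in the first environment and a five-step sequence in the second, because the second requires fetching water first. A learned model replaces these fixed rules with scores inferred from user data.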

Tell Me Dave: Context-Sensitive Grounding of Natural Language to Mobile Manipulation Instructions, Dipendra K Misra, Jaeyong Sung, Kevin Lee, Ashutosh Saxena. In Robotics: Science and Systems (RSS), 2014. [PDF], [Data coming soon]

Synthesizing Manipulation Sequences for Under-Specified Tasks using Unrolled Markov Random Fields. Jaeyong Sung, Bart Selman, Ashutosh Saxena. In International Conference on Intelligent Robots and Systems (IROS), 2014. [PDF]


We also give an end-to-end implementation [above] of our robot making an affogato as verbally commanded by a person:

“Take some coffee in a cup. Add ice cream of your choice. Finally add raspberry syrup to the mixture.”

We see that this sentence is fairly ambiguous: it specifies neither which ice cream to take [which depends upon what is available] nor details such as whether to take a cup that already has coffee [if one exists] or to make coffee first and, if so, how.

In another recent work from our lab, [Sung et al] look at the problem of coming up with a sequence of actions in an unstructured environment to accomplish a task. In the video above, you see the PR2 robot serving sweet tea.


Department of Computer Science,
Cornell University

We thank Kevin K. Lee, Kejia Tao and Aditya Jami for their valuable contributions.