How to Convey Human Intent with Just a Drawing: A Translation Technique That Tells Robots What You Want
AnyUser: Translating Sketched User Intent into Domestic Robots
HCI Today summarized the key points
- This article introduces AnyUser, a system that makes it easy to instruct home robots by combining a sketch drawn over a photo with spoken language.
- Users draw lines or arrows on a photo of their home to set the robot’s route or cleaning area and, if needed, add a brief spoken explanation.
- The system jointly interprets the photo, the sketch, and the spoken words, consolidates them into a single command, and lets the robot act immediately without relying on a pre-made map (see the sketch after this list).
- In experiments, it achieved high performance both in simulation and on a real robot, and, importantly, was easy to use for older adults and people unfamiliar with technology.
- In short, AnyUser is research that improves the practicality of everyday care robots by letting anyone assign household tasks to a robot without complex operation.
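To make the pipeline above concrete, here is a minimal sketch in Python of how a photo, sketch strokes, and an utterance might be fused into one structured command. This does not reproduce the paper’s actual architecture; `Stroke`, `RobotCommand`, and the `vlm.query` interface are hypothetical stand-ins for whatever multimodal model performs the grounding.

```python
from dataclasses import dataclass

@dataclass
class Stroke:
    # Pixel coordinates of one freehand stroke drawn over the photo.
    points: list[tuple[int, int]]

@dataclass
class RobotCommand:
    # Structured intent the robot can execute without a pre-built map:
    # an action label, a target region in image space, and a confidence.
    action: str
    region: list[tuple[int, int]]
    confidence: float

def interpret(photo_path: str, strokes: list[Stroke], utterance: str,
              vlm) -> RobotCommand:
    """Fuse photo, sketch, and speech into one structured command.

    `vlm` stands in for a multimodal model; the prompt and response
    schema here are illustrative, not AnyUser's.
    """
    prompt = (
        "The user drew strokes over this photo and said: "
        f"'{utterance}'. Return the intended action and target region."
    )
    result = vlm.query(image=photo_path, overlay=strokes, text=prompt)
    return RobotCommand(result["action"], result["region"],
                        result["confidence"])
```

The design point worth noting is the output type: the sketch and speech are not executed directly but compiled into a single structured command, which is what lets the robot act without a pre-made map.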
This summary was generated by an AI editor based on HCI expert perspectives.
Why Read This from an HCI Perspective
This article is especially meaningful because it views robots not as ‘smart models’ in isolation, but through how they actually work with people in real use. Sketching over a photo can convey spatial intent more directly than natural language, which is often ambiguous about locations and boundaries. For HCI/UX practitioners and researchers, it is a strong case study in designing an input method that non-experts can use, an intervention path for when things go wrong, and safe execution.
CIT's Commentary
What’s particularly interesting is that the sketch is treated not as a simple picture but as the ‘skeleton of an action’ that the robot must understand. The key point is not raw accuracy so much as how the user can add intent, deciding where and how the robot should act, and, if needed, stop the process or redraw. For home robots, where safety is critical, these intervention paths need to be part of the interface. In real products, what may matter is less a single high success rate than whether the system honestly reveals its uncertainty when the state is ambiguous. From a research perspective, care is also needed so that, even as UX measurement tools come to rely on LLMs or multimodal models, the criteria for human evaluation do not become blurred. In the Korean context, connected to mobile-first input experiences like those from Naver and Kakao, the photo-and-sketch approach could become a surprisingly natural onboarding path.
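As a thought experiment on the intervention paths discussed above, here is a minimal sketch of confidence-gated execution: the command runs only when the interpreter is confident enough, and the user keeps a stop path during execution. The threshold value, the `robot` and `ui` handles, and their methods are all assumptions for illustration, not AnyUser’s API.

```python
CONFIDENCE_THRESHOLD = 0.8  # assumed cutoff; would be tuned per deployment

def execute_with_intervention(command, robot, ui):
    """Run a command only when interpretation is confident enough,
    keeping a stop/redraw path open throughout.

    `command` follows the RobotCommand sketch above; `robot` and `ui`
    are hypothetical handles, since the paper does not prescribe this loop.
    """
    if command.confidence < CONFIDENCE_THRESHOLD:
        # Surface the uncertainty instead of guessing: show what was
        # understood and ask the user to confirm or redraw the sketch.
        ui.show_interpretation(command)
        if not ui.ask_confirm("Is this what you meant?"):
            return ui.request_redraw()
    for waypoint in command.region:
        if ui.stop_requested():  # the user can interrupt at any time
            robot.halt()
            return
        robot.move_to(waypoint)
```

The point of the gate is the one made in the commentary: when the state is ambiguous, the system’s first move is to reveal its own interpretation and hand control back to the user, rather than to act on a low-confidence guess.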
Questions to Consider While Reading
- Q. When a sketch is drawn ambiguously, how does the system communicate uncertainty to the user and prompt re-entry?
- Q. In real-world home environments where the photo and the actual space differ, how can the system recover from failure safely without user intervention?
- Q. If this interface were applied to domestic mobile products, how could tasks be categorized into those where sketch input is more suitable than natural-language input and those where it is not?
This commentary was generated by an AI editor based on HCI expert perspectives.
Please refer to the original for accurate details.