Clicking Isn’t Enough: ‘GUI Agents’ That Help by Transforming the Screen in Real Time

Beyond Chat and Clicks: GUI Agents for In-Situ Assistance via Live Interface Transformation

arXiv | Apr 16, 2026 | Pan Hao, Rishi Selvakumaran, Jacob Sun, Qianwen Wang

HCI Today summarized the key points

Background

• This article introduces DOMSteer, a GUI agent tool that provides help directly within web interfaces.

Main Points

• Conventional chat-based help explains things away from the interface, so users struggle to follow along, and it is costly to build separately for each app.
• DOMSteer helps users directly on the screen by making small changes to a web page’s DOM (Document Object Model), including changes to explanations, emphasis, and layout.
• The research team analyzed users’ difficulties along six dimensions, including what the problem is, where it occurs, and how users handle it.
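
The in-place changes described above (explanations, emphasis, layout) can be pictured as small, reversible patches. The TypeScript sketch below is illustrative only and is not DOMSteer’s implementation: the `UiNode` model, the `Patch` type, and the `el`, `applyPatch`, and `undoAll` helpers are hypothetical names, and a plain in-memory tree stands in for the real DOM so the example runs outside a browser.

```typescript
// Illustrative sketch only: NOT DOMSteer's actual implementation.
// UiNode is a hypothetical in-memory stand-in for a DOM element.
interface UiNode {
  id: string;
  text: string;
  classes: Set<string>;
  children: UiNode[];
}

// Three hypothetical patch kinds, mirroring explanation, emphasis, and layout changes.
type Patch =
  | { kind: "explain"; targetId: string; note: string }
  | { kind: "emphasize"; targetId: string }
  | { kind: "reorder"; targetId: string; toIndex: number };

type Undo = () => void;

function el(id: string, text: string, children: UiNode[] = []): UiNode {
  return { id, text, classes: new Set<string>(), children };
}

// Depth-first search that also reports the parent, which "reorder" needs.
function findWithParent(
  root: UiNode,
  id: string,
  parent: UiNode | null = null
): { node: UiNode; parent: UiNode | null } | null {
  if (root.id === id) return { node: root, parent };
  for (const child of root.children) {
    const hit = findWithParent(child, id, root);
    if (hit) return hit;
  }
  return null;
}

// Applies one patch in place and returns a function that reverts it.
function applyPatch(root: UiNode, patch: Patch): Undo {
  const hit = findWithParent(root, patch.targetId);
  if (!hit) throw new Error("No element with id " + patch.targetId);
  const node = hit.node;
  switch (patch.kind) {
    case "explain": {
      const note = el(node.id + "-note", patch.note);
      note.classes.add("assist-note");
      node.children.push(note);
      return () => {
        node.children = node.children.filter((c) => c !== note);
      };
    }
    case "emphasize": {
      const hadClass = node.classes.has("assist-highlight");
      node.classes.add("assist-highlight");
      return () => {
        if (!hadClass) node.classes.delete("assist-highlight");
      };
    }
    case "reorder": {
      const parent = hit.parent;
      if (parent === null) throw new Error("Cannot reorder the root element");
      const from = parent.children.indexOf(node);
      parent.children.splice(from, 1);
      parent.children.splice(patch.toIndex, 0, node);
      return () => {
        parent.children.splice(parent.children.indexOf(node), 1);
        parent.children.splice(from, 0, node);
      };
    }
  }
}

// Reverts patches in reverse order so later changes are unwound first.
function undoAll(undos: Undo[]): void {
  while (undos.length > 0) {
    const undo = undos.pop();
    if (undo) undo();
  }
}

const page = el("toolbar", "", [el("import", "Import"), el("export", "Export")]);
const undos: Undo[] = [
  applyPatch(page, { kind: "emphasize", targetId: "export" }),
  applyPatch(page, { kind: "explain", targetId: "export", note: "Click to download your data." }),
  applyPatch(page, { kind: "reorder", targetId: "export", toIndex: 0 }),
];
console.log(page.children.map((c) => c.id)); // "export" now comes first
undoAll(undos);
console.log(page.children.map((c) => c.id)); // original order is restored
```

Returning an undo function with each patch is one way to keep every intervention reversible. A real agent would also have to cope with elements that disappear or change between patching and undo, which this sketch does not attempt.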

Conclusion

• In experiments, DOMSteer was faster and more accurate than a chat-based assistant, demonstrating that it can make complex web tools easier to use.

This summary was generated by an AI editor based on HCI expert perspectives.

Why Read This from an HCI Perspective

This article treats AI not as a ‘smart tool that tells you the right answer,’ but as an interaction challenge: how to help users navigate and operate within the interface itself. In particular, it shows why guidance delivered directly on the actual screen is faster and less confusing than relying on a separate chat window, which makes its implications highly relevant to HCI and UX practitioners. The ideas apply to complex web services, workplace tools, and AI agent design.

CIT's Commentary

What’s especially interesting is that the problem is reframed from ‘Is the model smart?’ to ‘How well can the help be seen on the screen, how trustworthy is it, and how easily can it be undone?’ Even when chat-based help explains things well, users still have to go back to the screen, interpret the instructions, and locate the relevant elements again. This work closes that gap through DOM manipulation. In real products, however, the same immediacy that makes the approach convenient also means the interface itself takes the blame when something goes wrong, for example when the wrong element is emphasized or the layout changes unexpectedly. That’s why transparent status indicators, clear undo paths, and explicit exposure of failure modes are crucial. For large-scale services like Naver and Kakao, or Korean B2B tools, where screen structures change frequently and experimentation is common, defining a stable intervention scope and providing operational guardrails may matter more than maximizing generality.

Questions to Consider While Reading

Q. When this approach is deployed in real services, how clearly should users be made aware that the AI is changing the screen?
Q. If you mix chat-based help with in-situ help, in what situations should each be prioritized?
Q. In Korean service environments where the DOM structure is complex or changes frequently, how far can this approach realistically be maintained?

This commentary was generated by an AI editor based on HCI expert perspectives.
Please refer to the original for accurate details.
