AgentLens: Smarter Ways for People and AI Agents to Communicate on Mobile Screens
AgentLens: Adaptive Visual Modalities for Human-Agent Interaction in Mobile GUI Agents
Key Points Summarized by HCI Today
- This article studies how an AI agent that operates smartphone apps on a user's behalf should communicate with the user through the screen.
- Existing approaches fall into two categories: foreground execution, which shows the task on screen as it happens, and background execution, which performs it silently out of view. Each has its own limitations, making the two difficult to combine directly.
- Through multiple user studies, the research team found that a hybrid approach, one that surfaces the screen only when needed, is preferable, but that how the screen is shown should vary with the situation.
- Based on this, the team built AgentLens, which adapts its display across three modes: Full UI (the entire app screen), Partial UI (only the relevant region), and GenUI (a newly generated interface). A brief illustrative sketch of such a policy follows this summary.
- In the experiments, AgentLens was selected by 85.7% of participants and received the highest ratings for ease of use and willingness to use it again.
This summary was generated by an AI editor based on HCI expert perspectives.
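The paper describes this mode switching at a design level, and no implementation is reproduced here. Purely as a rough illustration, the Kotlin sketch below shows one way such a policy could be structured: the mode names mirror the paper's Full UI / Partial UI / GenUI terminology, but `TaskContext`, its fields, and the switching rules are hypothetical assumptions, not the authors' actual criteria.

```kotlin
// Hypothetical sketch only: mode names follow the paper's terminology,
// but the context signals and rules below are invented for illustration.
enum class DisplayMode { FULL_UI, PARTIAL_UI, GEN_UI }

// Assumed signals an agent might consult before deciding what to render.
data class TaskContext(
    val needsUserConfirmation: Boolean, // e.g., a payment or deletion step
    val userIsMultitasking: Boolean,    // user is active in another app
    val relevantRegionKnown: Boolean    // the acted-on UI region can be localized
)

fun selectDisplayMode(ctx: TaskContext): DisplayMode = when {
    // Sensitive steps: show the whole app screen so the user can verify.
    ctx.needsUserConfirmation -> DisplayMode.FULL_UI
    // User busy elsewhere: report progress in a generated overlay
    // rather than taking over the foreground.
    ctx.userIsMultitasking -> DisplayMode.GEN_UI
    // Otherwise, show only the region the agent is operating on.
    ctx.relevantRegionKnown -> DisplayMode.PARTIAL_UI
    else -> DisplayMode.FULL_UI // default to full visibility
}

fun main() {
    val ctx = TaskContext(
        needsUserConfirmation = false,
        userIsMultitasking = true,
        relevantRegionKnown = true
    )
    println(selectDisplayMode(ctx)) // prints GEN_UI
}
```

In a real agent, these signals would presumably come from the task planner and the OS (step sensitivity, app visibility, and so on), and the switching criteria would need tuning per task type, which is exactly the fine-grained design question the commentary below raises.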
Why Read This from an HCI Perspective
This article focuses on how mobile AI agents ‘work together with users’ rather than simply what they can do. In other words, beyond evaluating execution performance alone, the key is how well the system communicates its ongoing state and when the user can or should step in. For HCI/UX practitioners and researchers, it provides concrete evidence of why transparency, trust, and intervention design matter in multitasking scenarios.
CIT's Commentary
What’s interesting about AgentLens is that it treats AI not as a standalone automation tool, but as a collaborator that works between the person and the screen. In particular, the idea of switching among Full UI, Partial UI, and GenUI depending on the situation correctly challenges the notion that the answer is simply ‘always show more.’ Background execution may be convenient, but users can feel uneasy; foreground execution may feel reassuring, but it can prevent them from doing other things. This paper experimentally searches for the right compromise between the two. However, in real products, the criteria for switching modes may become more complex, so more fine-grained design is likely needed to determine which visualizations reduce cognitive burden for which kinds of tasks. In high-notification environments like many domestic mobile services, this kind of ‘timely visualization’ could deliver even greater value.
Questions to Consider While Reading
- Q. How can we explain the criteria for switching between Full UI, Partial UI, and GenUI to users in a way that increases both trust and predictability?
- Q. What measurement approach would best distinguish whether a visual overlay during background execution reduces cognitive load or instead becomes distracting?
- Q. In environments like domestic mobile services, where notifications and screen transitions are frequent, which failure modes should such hybrid agents prepare for first?
This commentary was generated by an AI editor based on HCI expert perspectives.
Please refer to the original paper for accurate details.