SwEYEpinch: Exploring Intuitive, Efficient Text Entry for Extended Reality via Eye and Hand Tracking
Key Points, Summarized by HCI Today
- This article introduces a way to enter text in XR (extended reality) more quickly and easily by combining eye tracking with hand input.
- The research team developed SwEYEpinch, in which users trace the path of a word with their eyes and mark the start and end with a finger pinch gesture.
- The baseline version was faster than methods that rely on finger taps or on selecting one character at a time with eye gaze, but it showed candidate words too late, leading to slightly more mistakes.
- The improved version displays word candidates during input and lets users cancel midstream if the selection is wrong, improving both speed and accuracy.
- Some participants who practiced for a week reached speeds close to those of a standard keyboard, suggesting that this could become a practical text-entry method for XR.
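The interaction described above can be read as a small state machine: a pinch-down opens a swipe, gaze samples accumulate a key path while intermediate candidates are shown, and a pinch-up (or a cancel) closes it. The sketch below is a minimal, hypothetical model of that loop; all class, method, and lexicon names are illustrative and not taken from the paper, and the decoder is a toy placeholder rather than the authors' actual word-prediction model.

```python
from enum import Enum, auto

class State(Enum):
    IDLE = auto()     # waiting for a pinch to begin a word
    SWIPING = auto()  # pinch held, eyes tracing the word's path

class EyeSwipeInput:
    """Toy model of a pinch-delimited eye-swipe word entry loop."""

    def __init__(self):
        self.state = State.IDLE
        self.gaze_path = []  # sequence of keys the gaze has crossed

    def on_pinch_down(self):
        # Pinch marks the START of a word swipe.
        if self.state is State.IDLE:
            self.state = State.SWIPING
            self.gaze_path = []

    def on_gaze_sample(self, key):
        # Record each new key the gaze crosses; return intermediate
        # candidates so they can be shown DURING input, not only at the end.
        if self.state is State.SWIPING:
            if not self.gaze_path or self.gaze_path[-1] != key:
                self.gaze_path.append(key)
            return self.predict()
        return []

    def on_pinch_up(self):
        # Releasing the pinch marks the END of the swipe; commit candidates.
        if self.state is State.SWIPING:
            self.state = State.IDLE
            return self.predict()
        return []

    def on_cancel(self):
        # Midstream cancellation path from the improved version:
        # abandon the current swipe without committing anything.
        self.state = State.IDLE
        self.gaze_path = []

    def predict(self):
        # Placeholder decoder: filter a toy lexicon by the traced keys.
        # A real system would decode the gaze path probabilistically.
        lexicon = ["hello", "help", "held"]
        return [w for w in lexicon if all(c in w for c in self.gaze_path)]
```

The point of the sketch is the division of labor the article highlights: the eyes carry the continuous tracking task, while the hand contributes only two discrete, low-effort events (start and end), plus an explicit escape hatch via `on_cancel`.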
This summary was generated by an AI editor based on HCI expert perspectives.
Why Read This from an HCI Perspective
This article shows how text input in XR can be made faster and less tiring. The key is not just how smart the model is, but how redesigning the interaction structure—such as where the user looks and when the user confirms with their hand—can improve performance. For HCI practitioners, it offers hints for input and feedback design; for researchers, it provides a reference for how to validate the trade-off among speed, accuracy, and fatigue.
CIT's Commentary
The core of this piece is treating AI not as a ‘prediction engine,’ but as an ‘interface component that changes user behavior.’ In particular, the separation between tracking a target with the eyes and confirming with a small hand pinch resembles the common ‘division of cognition and execution’ seen in safety-critical systems. The design includes intermediate predictions, cancellation paths, and even delete previews—making it far more practical than an interface that is only fast. In real products, what matters even more than model scores is how quickly users can intervene when the system fails. The long-term learning results from 30 sessions also clearly illustrate the difference between a ‘demo that works well the first time’ and a ‘product that becomes familiar with daily use.’ In Korea’s XR/AI product environment, shorter sentences, mixed input, and more conservative typo-tolerance policies may be more important than in English-speaking contexts, so this approach needs to be revalidated for local language and privacy conditions.
Questions to Consider While Reading
- Q. As intermediate predictions get faster, how can we address the risk that users may overtrust the system or confirm too early?
- Q. In languages with rich morphology, such as Korean, how should an eye-hand separated swipe be adapted?
- Q. If we evaluate this kind of text-entry interface with an LLM-based UX measurement tool, what failure modes and fatigue signals could we automatically detect beyond speed?
This commentary was generated by an AI editor based on HCI expert perspectives.
Please refer to the original for accurate details.