HiFiGaze: Improving Eye Tracking Accuracy Using Screen Content Knowledge
HCI Today summarized the key points
- HiFiGaze proposes a way to improve eye-tracking accuracy on standard devices by leveraging a high-resolution front camera and knowledge of what is displayed on the screen.
- Conventional gaze tracking requires specialized equipment, but this research suggests an approach that can work with the cameras already on smartphones and laptops.
- The core idea is to separate and interpret the reflections of the screen that appear in the user's eyes, together with the screen content itself, to estimate gaze targets reliably (a toy illustration follows this list).
- In a user study, a model that adds reflection vectors to eye-region video achieved a mean error of 1.64 cm, about 18% lower than the 2.00 cm average error of the baseline.
- Notably, when the camera is positioned at the bottom of the device, accuracy improves by a further 10–20%, bringing gaze input close to practical levels without any separate hardware.
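To make the core idea concrete, here is a minimal, hypothetical sketch; it is not HiFiGaze's actual pipeline, whose model and preprocessing are not reproduced here. It assumes the screen reflection has already been cropped from the eye region and unwarped, then simply cross-correlates that patch against the known screen frame to locate the reflected region as a coarse gaze target. The function `estimate_gaze_from_reflection` and its parameters are illustrative, not from the paper.

```python
# Hypothetical sketch: treat the corneal reflection as a warped,
# low-resolution copy of the screen, and cross-correlate it against the
# known screen frame to find which screen region is being reflected.
import numpy as np
import cv2


def estimate_gaze_from_reflection(screen_frame: np.ndarray,
                                  reflection_patch: np.ndarray) -> tuple[int, int]:
    """Return an (x, y) pixel estimate on the screen.

    screen_frame: the image currently shown on the display (H x W, grayscale).
    reflection_patch: the screen reflection cropped from the eye region,
        already unwarped and grayscale (h x w, much smaller than the screen).
    """
    # Normalized cross-correlation is robust to the brightness loss and
    # contrast reduction in a corneal reflection.
    response = cv2.matchTemplate(screen_frame, reflection_patch,
                                 cv2.TM_CCOEFF_NORMED)
    _, _, _, max_loc = cv2.minMaxLoc(response)
    # The center of the best-matching window approximates the reflected
    # region, which serves as a coarse gaze target before refinement.
    h, w = reflection_patch.shape
    return (max_loc[0] + w // 2, max_loc[1] + h // 2)


if __name__ == "__main__":
    # Synthetic demo: a random "screen" and a patch cut from it standing in
    # for an idealized corneal reflection.
    rng = np.random.default_rng(0)
    screen = rng.integers(0, 256, size=(1080, 1920), dtype=np.uint8)
    gaze = (900, 400)  # ground-truth screen location for the demo
    patch = screen[gaze[1] - 32:gaze[1] + 32, gaze[0] - 32:gaze[0] + 32]
    print(estimate_gaze_from_reflection(screen, patch))  # ~ (900, 400)
```

The actual system, per the summary above, feeds reflection vectors together with eye-region video into a learned model; this template-matching step only gestures at why knowing the screen content adds signal that appearance-only methods lack.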
This summary was generated by an AI editor based on HCI expert perspectives.
Why Read This from an HCI Perspective
This is worth reading for HCI/UX practitioners and researchers because it demonstrates a new design possibility: high-precision gaze estimation without additional hardware. It is not just a paper that improves model performance; the key idea is that system-specific contextual information, namely the content shown on the screen, is converted into an interaction signal. From the perspectives of mobile, accessibility, and ubiquitous interaction, it has substantial practical implications.
CIT's Commentary
From a CIT perspective, what's interesting about HiFiGaze is that it redefines gaze estimation not as an isolated vision task but as a contextual sensing problem shared across the device, screen, and user. In particular, using screen content knowledge to segment the reflection region taps an information source that appearance-based approaches have historically overlooked. However, the performance gains depend strongly on conditions such as a 4K-class front camera, specific viewing distances and postures, and users who do not wear glasses. Rather than viewing it as an immediately universal solution, it is more appropriate to treat it as a benchmark showing how practical peripheral-free gaze input can become in next-generation mobile UX. Going forward, validation should also cover privacy, the sensitivity of gaze data, and robustness to changing screen content.
Questions to Consider While Reading
- Q. When using screen content knowledge, can we quantitatively break down which types of UI elements or color contrasts contribute most to gaze estimation performance?
- Q. If we include not only users without glasses but also eyeglass wearers, reflective lens coatings, and a variety of lighting conditions, how much does performance degrade, and what design choices could compensate for it?
- Q. For this approach to become genuinely useful in mobile UX, which tasks require centimeter-level accuracy, and in which interactions would it already be sufficiently practical?
This commentary was generated by an AI editor based on HCI expert perspectives.
Please refer to the original paper for accurate details.
Subscribe to Newsletter
Get the weekly HCI highlights delivered to your inbox every Friday.