Behavioral Engagement in VR-Based Sign Language Learning: Visual Attention as a Predictor of Performance and Temporal Dynamics
HCI Today summarized the key points
- This study examines whether behavioral engagement metrics in a VR-based sign language learning environment can predict learning performance.
- 117 college students trained with 12 sign language learning videos on SONAR, and their performance was then evaluated with a memory retention test.
- Visual Attention (VA) and Post-Playback Viewing Time (PPVT) showed strong positive correlations with performance, while Video Replay Frequency (VRF) did not.
- In a Generalized Linear Model (GLM), VA and PPVT were likewise confirmed as significant predictors, substantially explaining learning success (a minimal modeling sketch follows this summary).
- Sustaining visual attention and looking strategically at what matters are core to VR sign language learning, and behavioral logs are a useful basis for designing adaptive learning.
This summary was generated by an AI editor based on HCI expert perspectives.
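To make the modeling step in the summary above concrete, here is a minimal sketch of how such a GLM could be fit from per-learner interaction logs. The column names (va, ppvt, vrf, retention_score), the log file, and the Gaussian family are illustrative assumptions; the paper's exact specification is not reproduced here.

```python
# A minimal sketch, assuming hypothetical per-learner log columns; the
# paper's exact GLM family and link function are not specified here.
import pandas as pd
import statsmodels.api as sm

logs = pd.read_csv("session_metrics.csv")  # hypothetical log export

X = sm.add_constant(logs[["va", "ppvt", "vrf"]])  # behavioral predictors
y = logs["retention_score"]                       # memory-test outcome

# Gaussian family with identity link as a baseline; swap the family if the
# retention score is a count or a proportion.
model = sm.GLM(y, X, family=sm.families.Gaussian()).fit()
print(model.summary())  # coefficients and p-values for VA, PPVT, VRF
```

Under this framing, the paper's result corresponds to significant positive coefficients on va and ppvt and a non-significant coefficient on vrf.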
Why Read This from an HCI Perspective
This article is highly meaningful for HCI/UX researchers because it quantitatively demonstrates not just ‘what works’ in VR-based learning, but ‘how learning happens.’ In particular, by linking gaze surrogate metrics based on head pose, replay frequency, and time spent after video playback to learning outcomes, it shows that engagement and performance can be estimated from interaction logs alone. Practically, it provides a foundation for designing adaptive feedback and learning analytics.
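As a concrete illustration of the head-pose gaze surrogate, one common construction treats a frame as "attending" when the head's forward vector falls within a fixed cone around the video panel. The cone angle below is an assumption for illustration, not the paper's reported parameter.

```python
# Sketch of a head-pose gaze surrogate: count a frame as "attending" when the
# head's forward vector points within THRESHOLD_DEG of the video panel.
# The threshold is an illustrative assumption, not the paper's parameter.
import numpy as np

THRESHOLD_DEG = 20.0  # assumed attention cone half-angle

def visual_attention_ratio(head_forward, head_pos, panel_pos):
    """Fraction of frames whose head-forward vector falls inside the cone.

    head_forward: (n_frames, 3) unit forward vectors from the HMD pose.
    head_pos:     (n_frames, 3) head positions in world space.
    panel_pos:    (3,) world position of the sign language video panel.
    """
    to_panel = panel_pos - head_pos
    to_panel /= np.linalg.norm(to_panel, axis=1, keepdims=True)
    cos_angle = np.sum(head_forward * to_panel, axis=1)
    angles = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    return float(np.mean(angles < THRESHOLD_DEG))
```

Aggregating the per-frame flag over a clip yields a VA ratio in [0, 1] that can feed directly into a GLM like the sketch above.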
CIT's Commentary
From a CIT perspective, the key point is that this study treats VR learning experiences not merely as immersive content but as learning systems that can interpret traces of interaction. The finding that VA strongly predicts performance supports the interpretation that 'appropriate allocation of attention' matters more than simply 'replaying more,' which clearly illustrates the HCI-relevant issues of attention and task relevance. However, since head pose is used as a proxy for gaze, further validation is needed to quantify its error relative to actual visual attention and to determine whether the results generalize into reusable design guidelines. In particular, confirming that the same patterns hold for deaf and hard-of-hearing (DHH) learners and across groups with different proficiency levels would strengthen the connection to design principles for adaptive VR tutoring.
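One way to carry out the validation CIT asks for is a pilot comparison on hardware with a built-in eye tracker: log both signals and summarize the angular discrepancy. The function below is a proposed check under that assumption; the original study did not report such a comparison.

```python
# Proposed validation sketch: angular discrepancy between head-pose direction
# and eye-tracker gaze direction, assumed to be paired (n_frames, 3) unit
# vectors in the same coordinate frame.
import numpy as np

def gaze_surrogate_error(head_dirs: np.ndarray, eye_dirs: np.ndarray) -> dict:
    """Summarize the angle (degrees) between head-pose and eye-gaze vectors."""
    cos_sim = np.clip(np.sum(head_dirs * eye_dirs, axis=1), -1.0, 1.0)
    err = np.degrees(np.arccos(cos_sim))
    return {"mean_deg": float(err.mean()),
            "p95_deg": float(np.percentile(err, 95))}
```

Recomputing VA from the eye-gaze vectors and refitting the GLM would then show whether the surrogate error materially shifts the reported coefficients.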
Questions to Consider While Reading
- Q. How valid is head-pose-based VA compared with actual eye-tracking gaze, and how does that measurement error affect the prediction of learning performance?
- Q. The result that VRF was not significant supports the interpretation that the quality of attention matters more than mere repetition; in what situations might rewatching actually be beneficial?
- Q. If adaptive feedback intervenes only when a learner's focus scatters in real time, how could we set a threshold that supports learning without excessive disruption? (a hedged sketch follows this commentary)
This commentary was generated by an AI editor based on HCI expert perspectives.
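For the third question above, one starting point is a rolling attention window with a low watermark and a cooldown, so prompts fire only after sustained off-target behavior and cannot stack. All constants below are illustrative and would need tuning against exactly the disruption concern the question raises.

```python
# Hedged sketch of a threshold-based intervention: trigger a prompt only when
# attention stays below LOW_WATERMARK over a sustained window, with a cooldown
# so prompts cannot fire back-to-back. All constants are illustrative.
from collections import deque

WINDOW_FRAMES = 90      # ~1 s at an assumed 90 Hz HMD frame rate
LOW_WATERMARK = 0.4     # fraction of recent frames on-target before intervening
COOLDOWN_FRAMES = 900   # ~10 s minimum spacing between prompts

class AttentionMonitor:
    def __init__(self):
        self.window = deque(maxlen=WINDOW_FRAMES)
        self.cooldown = 0

    def update(self, on_target: bool) -> bool:
        """Feed one frame's attention flag; return True to show a prompt."""
        self.window.append(on_target)
        if self.cooldown > 0:
            self.cooldown -= 1
            return False
        if len(self.window) == WINDOW_FRAMES:
            ratio = sum(self.window) / WINDOW_FRAMES
            if ratio < LOW_WATERMARK:
                self.cooldown = COOLDOWN_FRAMES
                return True
        return False
```

The on_target flag here could come from the same per-frame attention cone used for VA, which keeps the intervention logic consistent with the metric being predicted.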
Please refer to the original for accurate details.