How Multimodal and Conversational AI Change Study Performance and Learning Experience
HCI Today summarizes the key points
- This article reports a study comparing how conversational AI tools affect learning outcomes and experience when studying textbook content.
- The research team randomly assigned 124 participants to one of three approaches (MuDoC, TexDoC, or DocSearch) for learning biology.
- MuDoC, which presents text and images conversationally, produced the highest test scores and the most positive "feeling of learning."
- TexDoC, which shows only text, felt easy and enjoyable to converse with, but it produced the lowest actual scores, so perceived understanding and grades did not match.
- The study suggests that while conversational AI can support learning, pairing it with visual materials such as images enables deeper understanding and better outcomes.
This summary was generated by an AI editor based on HCI expert perspectives.
Why Read This from an HCI Perspective
This article does not simply ask whether AI helps learning. Instead, it shows how conversational interfaces and combinations of images and text can change learning experiences and outcomes in different ways. In particular, it highlights that users may feel comfortable and perceive the experience as good, yet actual performance can still be lower. For HCI and UX practitioners, it serves as a warning that a seemingly good-looking experience and real help may diverge. For researchers, it provides grounds to examine cognitive load and interaction design together.
CIT's Commentary
The most interesting finding of this study is that a text-only conversational AI can give users the impression that learning is easier and more fun, yet actual learning performance turns out to be the lowest. This suggests that making the AI friendlier is not enough; the system needs mechanisms that help users actually understand concepts. A conversational interface can act as a guide, and showing images together with text provides a stepping stone that helps users hold concepts in their minds. However, bringing this kind of design into a product may require trade-offs in response length, latency, and screen complexity. In other words, the key issue is not a smarter model, but designing what the user sees at which moments, where they can check their understanding again, and when they can intervene. The same principle applies not only to educational AI, but also to safety-critical systems: if users can move on too easily, their understanding may remain shallow, so appropriate friction and intervention pathways must be designed in as well.
Questions to Consider While Reading
- Q. In a text-only conversational AI, why did users feel learning was easier even though outcomes were lower?
- Q. For a format that shows images alongside text to help learning, what information should be visualized and what should remain as text?
- Q. When applying these results to real products, how should the balance between usability and learning effectiveness be managed?
This commentary was generated by an AI editor based on HCI expert perspectives.
Please refer to the original for accurate details.