How humans co-create with generative AI: ultimately, it comes down to incentives
HCI Today summarizes the key points:
- This article studies how rewards affect the diversity of outcomes when people write with generative AI.
- The research team had 200 participants write short stories, varying whether they used AI and how originality was rewarded.
- Although the AI drafts were similar to one another, participants revised heavily and asked follow-up questions, greatly increasing the differences between the final texts.
- Notably, participants offered higher rewards for originality did not reuse AI sentences as-is; they selected them more carefully.
- Ultimately, how much AI homogenizes writing depends not only on the technology, but also on how it is used and on the reward rules.
This summary was generated by an AI editor based on HCI expert perspectives.
Why Read This from an HCI Perspective
This article makes clear that generative AI is not just a tool for producing "good answers." The outcome depends on the interaction: how people use it, how far they trust it, and where they choose to intervene. Even with the same AI, the diversity of creative output can change with the reward structure and usage strategy, a signal that UX design and evaluation criteria should be considered together. For practitioners, the lesson is that behavior design matters more than features. For researchers, it raises more realistic questions about how to measure human-AI collaboration.
CIT's Commentary
What is interesting is that AI's "homogenization" is not necessarily a fixed property of the model itself; it depends on what goals users have and how selectively they accept what the system offers. This resembles a pattern often seen in safety-critical systems: as automation becomes smarter, interfaces should make users' intervention paths and failure modes more explicit, not less. In real products, however, "desired diversity" and "fast completion" often conflict, so the reward structure, the way recommendations are presented, and feedback pathways that make revision easy must be designed together. The logs collected here can also drive research questions beyond simple performance evaluation, such as using LLMs to read user strategies and UX metrics more rigorously.
Questions to Consider While Reading
- Q: When heavy AI users get more uniform results, how can we better separate the effect of model quality from the effect of the interface's default settings?
- Q: In real products, "originality" and "polish" often matter at the same time. How should the incentive structure be designed so that user behavior does not over-concentrate on one side?
- Q: When using generative AI usage logs to measure diversity, degree of intervention, and trust, which metrics are most useful, and which might distort users' real experiences?
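The last question asks which diversity metrics are useful. The article does not specify how diversity was measured, but one common candidate is mean pairwise dissimilarity across the final texts. The sketch below is a hypothetical illustration of that idea using Python's standard-library `difflib` for character-level similarity; real studies would more likely use embedding- or n-gram-based distances.

```python
import itertools
from difflib import SequenceMatcher

def mean_pairwise_diversity(texts):
    """Mean pairwise dissimilarity (1 - similarity ratio) over all text pairs.

    0.0 means every text is identical; values near 1.0 mean texts share
    almost no common subsequences. Illustrative metric, not the paper's.
    """
    pairs = list(itertools.combinations(texts, 2))
    if not pairs:
        return 0.0
    total = sum(1 - SequenceMatcher(None, a, b).ratio() for a, b in pairs)
    return total / len(pairs)

# Identical drafts score 0; more varied drafts score higher.
drafts = ["The fox slept.", "The fox slept.", "A storm rolled over the harbor."]
print(mean_pairwise_diversity(drafts))
```

A metric like this makes the trade-off in the third question concrete: character-level overlap is cheap to compute from logs but can "distort" the experience, since two texts with the same idea in different words would score as highly diverse.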
This commentary was generated by an AI editor based on HCI expert perspectives.
Please refer to the original for accurate details.