I tried to make Gemini a UX researcher that audits websites, and the results were mixed
HCI Today summarized the key points:
- The article introduces a tool that audits websites by simulating user scenarios with Gemini, Playwright, and Chromium.
- The author finds this combination highly effective at getting an agent to explore a website.
- However, the author also notes that the agent can easily veer onto the wrong path because of graphics rendering failures, an abundance of hover-and-click elements, and unclear next actions.
- The author further explains that even if the latest model looks smarter, a single API call has limits on the tool calls and memory capacity needed for complex usability audits.
- In conclusion, this free tool is broadly useful for UX feedback, but because some hallucinations are mixed in, its output should be treated with care.
This summary was generated by an AI editor based on HCI expert perspectives.
Why Read This from an HCI Perspective
This article is meaningful for both HCI practitioners and researchers because it demonstrates a workflow where an AI agent directly explores the web to perform a UX audit. In particular, it shows how well the Gemini + Playwright + Chromium combination can recreate real interaction contexts—and, conversely, under what conditions hallucinations or incorrect inferences occur. You can read it as a balanced look at both the promise and the limitations of automated evaluation.
CIT's Commentary
From a CIT perspective, the key question in this article isn’t whether AI can replace UX evaluation, but how far it can assist it. Agent-based exploration is useful for quickly scanning screen elements, exploration depth, and repeated paths. However, it easily loses context in flows with reduced visibility, hover-dependent interactions, or complex branching structures. In other words, the roughly 70% of feedback that is actionable is valuable for initial diagnosis, while the remaining 30% of hallucinated findings can introduce false positives into research design. CIT therefore considers it more appropriate to treat such tools as a ‘heuristic screening + human verification’ pipeline rather than as ‘evaluation automation.’ In particular, areas like accessibility, state transitions, and exception paths still require human evaluators’ involvement.
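The ‘heuristic screening + human verification’ pipeline described above could be sketched roughly as follows. This is a minimal illustration, not the article's implementation: the `Finding` schema, the category names, the confidence field, and the routing threshold are all assumptions made for the example.

```python
from dataclasses import dataclass

# Categories the commentary flags as still needing human evaluators
# (accessibility, state transitions, exception paths).
HUMAN_REQUIRED = {"accessibility", "state_transition", "exception_path"}

@dataclass
class Finding:
    """A single issue reported by the AI agent (hypothetical schema)."""
    category: str      # e.g. "layout", "accessibility", "navigation"
    description: str
    confidence: float  # agent's self-reported confidence, 0.0-1.0

def route_finding(finding: Finding, threshold: float = 0.8) -> str:
    """Heuristic screening: auto-accept only high-confidence findings in
    categories the agent handles well; route everything else to a human."""
    if finding.category in HUMAN_REQUIRED:
        return "human_review"
    if finding.confidence < threshold:
        return "human_review"
    return "auto_accept"

findings = [
    Finding("layout", "CTA button overlaps footer on narrow viewport", 0.92),
    Finding("accessibility", "Carousel lacks a keyboard focus order", 0.95),
    Finding("navigation", "Hover menu item unreachable by click", 0.55),
]
for f in findings:
    print(f"{f.category}: {route_finding(f)}")
```

The design choice here mirrors the commentary: even a highly confident accessibility finding goes to human review, because the hallucination risk in that stratum is judged by category, not by the agent's own confidence.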
Questions to Consider While Reading
- Q. When people validate AI-based UX audit results, what sampling strategy is most efficient?
- Q. How can you improve the exploration reliability of an AI agent in interfaces with many hover interactions, asynchronous loading, and visual rendering failures?
- Q. If you combine these tools with heuristic evaluation or accessibility checks, how should you restructure the workflow in practice?
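On the first question, one common answer is stratified sampling: have humans verify every finding in high-risk strata, and only a fixed-rate random sample of the rest. The sketch below illustrates the idea; the category names, the 30% rate, and the dict-based finding shape are illustrative assumptions, not from the article.

```python
import random

def sample_for_verification(findings, full_review_categories,
                            rate=0.3, seed=42):
    """Stratified sampling of AI audit findings for human validation:
    verify all findings in high-risk categories, plus a random sample
    (at the given rate) of the remainder."""
    must_check = [f for f in findings
                  if f["category"] in full_review_categories]
    rest = [f for f in findings
            if f["category"] not in full_review_categories]
    rng = random.Random(seed)  # seeded for a reproducible audit plan
    k = max(1, round(len(rest) * rate)) if rest else 0
    return must_check + rng.sample(rest, k)

findings = (
    [{"category": "accessibility", "id": i} for i in range(3)]
    + [{"category": "layout", "id": i} for i in range(10)]
)
picked = sample_for_verification(findings, {"accessibility"})
print(len(picked))  # 3 must-check + 3 sampled (30% of 10)
```

If the sampled stratum surfaces more hallucinations than expected, the rate can be raised, which is cheaper than re-verifying everything from scratch.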
This commentary was generated by an AI editor based on HCI expert perspectives.
Please refer to the original for accurate details.