
Gamification and Social Dynamics in AI-Driven Applications

The Case of a Football Prediction App

Goal: Increase user time spent within the app
Research Method: Qualitative - Interview (semi-structured)
Participants: 20 football fans
My Role: Researcher
Tags: Qualitative Research · User Interviews · Gamification

Summary

To explore the impact of social and gamification mechanics on user engagement, a qualitative study was conducted on an AI-driven football prediction application. The primary goal was to increase user time spent within the app. Semi-structured interviews were conducted with 20 highly engaged football fans to understand their habits, social interactions, and expectations of a football match app.

Key Findings

  • Social Interaction as a Core Motivator: The research identified a strong desire among users for social interaction, particularly the kind of confrontational but playful communication known as "trash talk" with friends and rival fans.
  • Desire for Collectible Items: Participants expressed a keen interest in collecting "footballish" items, which they wanted to display as part of a personal collection.

This research provided clear, actionable insights into how to increase user stickiness. The findings highlighted that user engagement could be driven by leveraging a deeper understanding of fan culture and social dynamics, rather than focusing solely on the predictive accuracy of the AI.

Design Impact

Based on these findings, two key features were developed and integrated into the application. The implementation of these features resulted in a 20% increase in the average time users spent in the application.

  • A chat function was added at the end of each game, allowing users to engage in direct "trash talk" with other fans.
  • A new "collection page" was implemented where users could showcase virtual football-related items. This feature included two types of items: those available for purchase and exclusive items earned by achieving specific milestones, such as making five correct predictions.
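
A minimal sketch of how the collection-page logic could be modeled is shown below. The class and item names are hypothetical and are not taken from the production app; the sketch only illustrates the distinction between purchasable items and milestone-earned items, such as the five-correct-predictions reward described above.

```python
# Hypothetical data model illustrating the two collectible item types and the
# milestone-unlock rule; names and values are illustrative only.
from dataclasses import dataclass, field
from enum import Enum


class AcquisitionType(Enum):
    PURCHASE = "purchase"    # bought directly by the user
    MILESTONE = "milestone"  # earned by reaching a prediction milestone


@dataclass
class CollectibleItem:
    item_id: str
    name: str
    acquisition: AcquisitionType
    required_correct_predictions: int = 0  # only relevant for MILESTONE items


@dataclass
class UserCollection:
    user_id: str
    correct_predictions: int = 0
    items: list[CollectibleItem] = field(default_factory=list)

    def can_unlock(self, item: CollectibleItem) -> bool:
        """Return True when a milestone item becomes available to this user."""
        return (
            item.acquisition is AcquisitionType.MILESTONE
            and self.correct_predictions >= item.required_correct_predictions
        )


# Example: an exclusive badge unlocked after five correct predictions.
golden_boot = CollectibleItem("badge-01", "Golden Boot", AcquisitionType.MILESTONE, 5)
fan = UserCollection(user_id="u-123", correct_predictions=5)
print(fan.can_unlock(golden_boot))  # True
```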

Core Research Domains: A Synthesis of Findings

Student Attitudes Toward LLMs in Education

Goal: Examine student attitudes toward LLMs in education
Research Method: Quantitative - Survey
Participants: 900 undergraduate students
My Role: Lead Researcher
Tags: Quantitative Research · Survey Design · AI Literacy

Summary

Designing an effective learning environment requires understanding the audience and how they perceive the technology. This section examines student attitudes toward LLMs in education, synthesizing the findings from a survey of over 650 undergraduates. The findings provide a critical perspective on the "human factor": how students perceive the benefits, the risks, and their own abilities in an AI-assisted learning environment.

Key Findings

  • Perceived Benefits vs. Actual Reliance: A significant majority of students (67-72%) perceive LLMs as beneficial for comprehending complex material and completing coursework. However, a much smaller percentage (24-30%) expressed high reliance on LLMs for academic success.
  • Confidence in Accuracy Assessment: A large majority of students (76-77%) felt confident in their ability to assess the accuracy of LLM-generated information.
  • Correlations with Individual Traits: The study found significant correlations between personality traits, LLM usage frequency, and perceived benefits. It also examined the role of gender, education level, and subscription status, although the specific nature of those relationships is not detailed here.

Design Impact

Students' confidence in their own ability to detect inaccuracies poses a potential risk: overconfident students may accept inaccurate output without verification. A design strategy for learning environments must therefore move beyond student self-regulation and actively foster "AI literacy." To achieve this, educational design should integrate assignments that specifically focus on developing AI literacy, a set of competencies that enables individuals to critically evaluate AI technologies and communicate with them effectively.

For example, a "Critical AI Analysis Assignment" could require students to use an AI to generate a response and then critique its accuracy, biases, and limitations while providing their own legitimate sources. This process encourages students to compare AI-generated arguments with their own, leading to a deeper understanding of the subject matter.

Furthermore, tools can be designed to directly address and correct overconfidence. One example is an experimental web application that presents students with summaries of historical figures, some of which are intentionally misleading. The tool asks students to judge whether the summary is "True" or "False" and, regardless of their answer, shows a pop-up that highlights the misleading information and presents a corrected paragraph. This method gives concrete, objective feedback that corrects misconceptions before they are memorized. Such tools can also be evaluated against specific metrics, including a "hallucination index" to identify fabricated information and "relevance" to ensure the output is appropriate for the query.
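
The sketch below outlines how the core feedback loop of such a tool could be structured. The data model, function names, and the example summary are assumptions for illustration rather than the actual implementation, and metric tracking (such as a hallucination index) would be layered on separately.

```python
# Illustrative sketch of the corrective feedback loop described above: show a
# summary, collect a True/False judgment, and always reveal the misleading
# passages together with a corrected paragraph. All names and data are
# hypothetical.
from dataclasses import dataclass


@dataclass
class SummaryItem:
    figure: str
    summary: str
    is_accurate: bool
    misleading_spans: list[str]   # phrases that are intentionally wrong
    corrected_paragraph: str


def run_trial(item: SummaryItem, student_answer: bool) -> dict:
    """Score one trial and assemble the content of the feedback pop-up."""
    return {
        "correct": student_answer == item.is_accurate,
        "highlighted_spans": item.misleading_spans,   # shown regardless of answer
        "corrected_paragraph": item.corrected_paragraph,
    }


# Example trial with a deliberately misleading claim.
item = SummaryItem(
    figure="Ada Lovelace",
    summary="Ada Lovelace built the first electronic computer in 1945.",
    is_accurate=False,
    misleading_spans=["built the first electronic computer in 1945"],
    corrected_paragraph=(
        "Ada Lovelace published in 1843 what is widely regarded as the first "
        "algorithm intended for a machine, Babbage's Analytical Engine."
    ),
)
print(run_trial(item, student_answer=True))  # marked incorrect, correction shown
```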

The AI at the Center: Learning Style Classification and Bias Mitigation

Examining LLMs' Ability to Classify Learning Styles from Conversational Text

Goal: Examine LLMs' ability to classify learning styles from conversational text
Research Method: Quantitative - Experiment Design
Participants: 270 undergraduate students
My Role: Lead Researcher
Tags: Quantitative Research · Experiment Design · Bias Mitigation

Summary

This section synthesizes the findings from the study on LLMs' ability to classify learning styles from conversational text. This capability is foundational for any AI-powered personalized learning platform, as it underpins the ability to dynamically adapt content to individual student needs. It is critically important to ensure that such personalization is fair and equitable, providing a high-quality, customized education for all students, without inheriting or amplifying biases from the underlying technology.

Key Findings

  • LLM Performance by Dimension: GPT-4o and Gemini 2.0 Flash demonstrated their strongest performance in the Sensing-Intuitive (SEN-INT) dimension (70% accuracy for GPT-4o, 73% for Gemini), followed by Sequential-Global (SEQ-GLO) (67% for GPT-4o, 64% for Gemini). However, both models struggled significantly with the Active-Reflective (ACT-REF) dimension, with accuracy scores hovering near chance (58% for GPT-4o, 53% for Gemini), and exhibited the weakest performance in the Visual-Verbal (VIS-VER) dimension (16% for GPT-4o, 17% for Gemini).
  • Prompting Strategy Trade-offs: The research reveals a crucial nuance in prompting strategies. Few-shot prompting increased performance metrics in the Visual-Verbal and Sensing-Intuitive dimensions but simultaneously amplified existing majority-class biases towards Visual and Sensing labels. Conversely, for the Sequential-Global dimension, zero-shot prompting outperformed few-shot, suggesting that providing additional examples can sometimes intensify biases rather than mitigate them (see the prompt sketch after this list).
  • Persistent Bias: The study's most critical finding is the persistence of systematic biases toward dominant learning styles (Sensing, Sequential, Visual), which aligns with the observed demographic skew of the participant pool.
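
To make the zero-shot versus few-shot distinction concrete, the sketch below shows how the two prompt formats could be assembled for a single dimension. The templates and labeled examples are hypothetical stand-ins, not the prompts used in the study.

```python
# Hypothetical zero-shot vs. few-shot prompt construction for the
# Sensing-Intuitive dimension; templates and examples are illustrative only.

ZERO_SHOT_TEMPLATE = (
    "Classify the student's learning style on the Sensing-Intuitive dimension "
    "based on the conversation below. Answer with exactly one label: "
    "SENSING or INTUITIVE.\n\nConversation:\n{conversation}\n\nLabel:"
)

# A small demonstration set; an unbalanced set like this one can reinforce the
# majority-class (Sensing) bias observed in the study.
FEW_SHOT_EXAMPLES = [
    ("Can you give me a worked example with real numbers I can practice on?", "SENSING"),
    ("Concrete lab exercises help me far more than abstract theory.", "SENSING"),
    ("I like discovering the underlying principle before seeing any example.", "INTUITIVE"),
]


def build_prompt(conversation: str, few_shot: bool = False) -> str:
    """Assemble either a zero-shot or a few-shot classification prompt."""
    if not few_shot:
        return ZERO_SHOT_TEMPLATE.format(conversation=conversation)
    demos = "\n\n".join(
        f"Conversation:\n{text}\nLabel: {label}" for text, label in FEW_SHOT_EXAMPLES
    )
    return (
        "Classify the student's learning style on the Sensing-Intuitive dimension. "
        "Answer with exactly one label: SENSING or INTUITIVE.\n\n"
        f"{demos}\n\nConversation:\n{conversation}\nLabel:"
    )


print(build_prompt("I prefer to start from the big picture and intuition.", few_shot=True))
```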

Design Impact

The effectiveness of LLM-based learner modeling is not a uniform or guaranteed outcome; it is highly dependent on the specific learning style dimension being classified and the prompting strategy employed. A one-size-fits-all approach to LLM prompting, such as a "more examples are better" method (few-shot), is shown to be an ineffective strategy in at least one dimension (Sequential-Global) and can actively worsen bias in others.

A product relying on this technology would provide great personalization for some users, while completely failing others who fall into the underperforming categories. This is an ethical and functional failure. The path to a better-performing model requires more than just careful prompting; it involves taking a pre-trained LLM and continuing its training on a targeted, labeled dataset of prompt-response pairs to refine its capabilities for a specific task or domain, like education.

This fine-tuning process, which requires defining the task and selecting an appropriate pre-trained model, allows for the adjustment of model weights to better fit specific problem requirements, potentially leading to improved accuracy and relevance. However, fine-tuning an advanced model is a complex process that can carry risks, including the "destructive overwriting" of existing knowledge and the amplification of biases present in the new training data. It is a delicate process that, if done correctly with high-quality, representative data and careful hyperparameter tuning, can result in a model that is better adapted to the nuanced demands of personalized education.
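
As a rough illustration of that workflow (not the study's actual pipeline), the sketch below fine-tunes a small open model on a handful of hypothetical labeled conversation snippets using the Hugging Face transformers library; the model choice, data, and hyperparameters are assumptions for the example.

```python
# Illustrative supervised fine-tuning sketch for one learning-style dimension.
# Model, data, and hyperparameters are placeholders, not the study's setup.
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

LABELS = ["SENSING", "INTUITIVE"]  # one learning-style dimension, for illustration

# Hypothetical labeled conversation snippets; a real dataset would need to be
# large, high-quality, and demographically representative to avoid amplifying bias.
train_data = Dataset.from_dict({
    "text": [
        "Can you show me a worked example with real numbers?",
        "I'd rather derive the general rule myself before seeing examples.",
    ],
    "label": [0, 1],
})

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=len(LABELS)
)


def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)


train_data = train_data.map(tokenize, batched=True)

# Conservative hyperparameters (small learning rate, few epochs) reduce the risk
# of destructively overwriting the pre-trained weights mentioned above.
args = TrainingArguments(
    output_dir="learning-style-classifier",
    learning_rate=2e-5,
    num_train_epochs=3,
    per_device_train_batch_size=8,
)

Trainer(model=model, args=args, train_dataset=train_data).train()
```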