Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society
Feed: https://ojs.aaai.org/index.php/AIES/issue/feed (Open Journal Systems, 2025-10-15)

AIES is convened each year by program co-chairs from Computer Science, Law and Policy, the Social Sciences, Ethics and Philosophy. Our goal is to encourage talented scholars in these and related fields to submit their best work related to morality, law, policy, psychology, the other social sciences, and AI. Papers are tailored for a multi-disciplinary audience without sacrificing excellence. In addition to the community of scholars who have participated in these discussions from the outset, we want to explicitly welcome disciplinary experts who are newer to this topic and see ways to break new ground in their own fields by thinking about AI. Recognizing that a multiplicity of perspectives leads to stronger science, the conference organizers actively welcome and encourage people with differing identities, expertise, backgrounds, beliefs, or experiences to participate.

https://ojs.aaai.org/index.php/AIES/article/view/36526
$100,000 or the Robot Gets It! Tech Workers' Resistance Guide: Tech Worker Actions, History, Risks, Impacts, and the Case for a Radical Flank
Mohamed Abdalla (mabdall2@ualberta.ca)
Over the past decade, Big Tech has faced increasing levels of worker activism. While worker actions have resulted in positive outcomes (e.g., cancellation of Google's Project Dragonfly), such successes have become increasingly infrequent. This is, in part, because corporations have adjusted their strategies for dealing with increased worker activism (e.g., increased retaliation against workers, and contract clauses that prevent cancellation due to worker pressure). This change in company strategy prompts urgent questions about updating worker strategies for influencing corporate behavior in an industry with vast societal impact. Current discourse on tech worker activism often lacks empirical grounding regarding its scope, history, and strategic calculus. Our work seeks to bridge this gap by first conducting a systematic analysis of worker actions at Google and Microsoft reported in U.S. newspapers to delineate their characteristics. We then situate these actions within the long history of labour movements and demonstrate that, despite perceptions of radicalism, contemporary tech activism is comparatively moderate. Finally, we engage directly with current and former tech activists to provide a novel catalogue of potential worker actions, evaluating their perceived risks, impacts, and effectiveness (concurrently publishing "Tech Workers' Guide to Resistance"). Our findings highlight considerable variation in strategic thinking among activists themselves. We conclude by arguing that the establishment of a radical flank could increase the effectiveness of current movements.
"Tech Workers' Guide to Resistance" can be found at https://www.cs.toronto.edu/~msa/TechWorkersResistanceGuide.pdf or https://doi.org/10.5281/zenodo.16779082 .2025-10-15T00:00:00+00:00Copyright (c) 2025 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AIES/article/view/36527Steerable Pluralism: Pluralistic Alignment via Few-Shot Comparative Regression2025-10-15T04:41:56+00:00Jadie Adamsjadie.adams@kitware.comBrian Hubrian.hu@kitware.comEmily Veenhuisemily.veenhuis@kitware.comDavid Joydavid.joy@kitware.comBharadwaj Ravichandranbharadwaj.ravichandran@kitware.comAaron Brayaaron.bray@kitware.comAnthony Hoogsanthony.hoogs@kitware.comArslan Basharatarslan.basharat@kitware.comLarge language models (LLMs) are currently aligned using techniques such as reinforcement learning from human feedback (RLHF). However, these methods use scalar rewards that can only reflect user preferences on average. Pluralistic alignment instead seeks to capture diverse user preferences across a set of attributes, moving beyond just helpfulness and harmlessness. Toward this end, we propose a steerable pluralistic model based on few-shot comparative regression that can adapt to individual user preferences. Our approach leverages in-context learning and reasoning, grounded in a set of fine-grained attributes, to compare response options and make aligned choices. To evaluate our algorithm, we also propose two new steerable pluralistic benchmarks by adapting the Moral Integrity Corpus (MIC) and the HelpSteer2 datasets, demonstrating the applicability of our approach to value-aligned decision-making and reward modeling, respectively. Our few-shot comparative regression approach is interpretable and compatible with different attributes and LLMs, while outperforming multiple baseline and state-of-the-art methods. Our work provides new insights and research directions in pluralistic alignment, enabling a more fair and representative use of LLMs and advancing the state-of-the-art in ethical AI.2025-10-15T00:00:00+00:00Copyright (c) 2025 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AIES/article/view/36528Sound Check: Auditing Recent Audio Dataset Practices2025-10-15T04:41:57+00:00William Agnewwagnew@andrew.cmu.eduJulia Barnettjuliabarnett@u.northwestern.eduAnnie Chuanniechu@u.northwestern.eduRachel Honghongrach@cs.washington.eduMichael Feffermfeffer@andrew.cmu.eduRobin Netzorgrobert_netzorg@berkeley.eduHarry H. Jianghhj@andrew.cmu.eduEzra Awumeyeawumey@andrew.cmu.eduSauvik Dassauvik@cmu.eduAudio AI models are increasingly used for a broad range of applications including music and sound generation, text-to- speech (TTS), voice cloning, emotion analysis, transcription, and audio classification. However, we have little understanding of the datasets used to create audio AI models, a gap that leaves the field without a powerful tool for understanding potential biases, toxicity, copyright violations, and other ethical and performance issues. We conduct a mapping literature review of hundreds of audio datasets used in recent music, sound, and speech AI papers. We first assess the sourcing, size, and usage of these datasets, finding that while there are hundreds of audio datasets, few are widely used. Next, we identify nine representative datasets and conduct several analyses to understand bias, toxicity, representation, and quality. 
https://ojs.aaai.org/index.php/AIES/article/view/36528
Sound Check: Auditing Recent Audio Dataset Practices
William Agnew (wagnew@andrew.cmu.edu), Julia Barnett (juliabarnett@u.northwestern.edu), Annie Chu (anniechu@u.northwestern.edu), Rachel Hong (hongrach@cs.washington.edu), Michael Feffer (mfeffer@andrew.cmu.edu), Robin Netzorg (robert_netzorg@berkeley.edu), Harry H. Jiang (hhj@andrew.cmu.edu), Ezra Awumey (eawumey@andrew.cmu.edu), Sauvik Das (sauvik@cmu.edu)
Audio AI models are increasingly used for a broad range of applications including music and sound generation, text-to-speech (TTS), voice cloning, emotion analysis, transcription, and audio classification. However, we have little understanding of the datasets used to create audio AI models, a gap that leaves the field without a powerful tool for understanding potential biases, toxicity, copyright violations, and other ethical and performance issues. We conduct a mapping literature review of hundreds of audio datasets used in recent music, sound, and speech AI papers. We first assess the sourcing, size, and usage of these datasets, finding that while there are hundreds of audio datasets, few are widely used. Next, we identify nine representative datasets and conduct several analyses to understand bias, toxicity, representation, and quality. We find that these datasets are often biased against women, have stereotypes about marginalized communities, and contain significant amounts of copyrighted work. We also find that audio datasets often come with scant documentation. To address this gap, we extend Gebru's datasheets for datasets to audio data, providing domain-specific documentation guidance. Finally, to facilitate public exploration of dataset contents and accountability, we developed an audio datasets exploration web tool which is available below in our links, along with our code and an extended version of our work including the appendix and augmented datasheets for datasets. Content warning: this paper contains discussions of offensive language.
Copyright (c) 2025 Association for the Advancement of Artificial Intelligence

https://ojs.aaai.org/index.php/AIES/article/view/36529
Enhancing Image Comprehension: The Impact of AI-Generated Explanations on Perception of Altered and Synthetic Media
Saquib Ahmed (saquiba1@umbc.edu), Tejo Gayathri Busireddy (tejogab1@umbc.edu), Sanorita Dey (sanorita@umbc.edu)
In the digital era, the exponential growth of images and videos on social platforms has transformed how individuals perceive information and form opinions. However, the escalating prevalence of altered and synthetic visuals poses significant challenges to media trust. These altered visuals often mislead viewers, propagate confusion, and distort public perception. Social media algorithms, optimized for engagement, can inadvertently amplify the dissemination of such content, making simple tagging insufficient to distinguish authentic from altered visuals. Contextual explanations present a promising approach by offering audiences deeper insights and encouraging more informed interpretations. In this study, we developed contextual explanations for 15 altered and synthetic images and conducted a user study to evaluate their effectiveness. Our findings show that contextual explanations consistently outperformed non-contextual ones across all evaluated metrics. We also assessed the capability of large language models (LLMs) to generate these explanations for diverse audiences. While LLM-generated explanations were generally comparable to those created by human experts, the models exhibited limitations in conveying intrinsic motivations in complex scenarios. We conclude with a discussion of the design implications and ethical considerations of this work.
Copyright (c) 2025 Association for the Advancement of Artificial Intelligence

https://ojs.aaai.org/index.php/AIES/article/view/36530
Too Focused on Accuracy to Notice the Fallout: Towards Socially Responsible Fake News Detection
Esma Aïmeur (aimeur@iro.umontreal.ca), Gilles Brassard (brassard@iro.umontreal.ca), Dorsaf Sallami (dorsaf.sallami@umontreal.ca)
The rise of fake news is one of the most pressing threats to the digital public sphere. Artificial intelligence (AI) systems promise to fight it — but at what cost? Unlike other machine learning applications designed to optimize efficiency in low-stake domains, fake news detection operates at the core of democratic discourse, public trust and epistemic integrity. This paper begins by unpacking the core challenges that make fake news detection uniquely demanding. In response, we argue for a shift toward Socially Responsible AI (SRAI) as a more appropriate framework for addressing these complexities.
We map the identified challenges onto the SRAI pyramid—functional, legal, ethical and philanthropic. Finally, we review emerging initiatives, highlight current limitations and propose future directions for developing fake news detection systems that are not only accurate but also socially accountable and publicly trustworthy.
Copyright (c) 2025 Association for the Advancement of Artificial Intelligence

https://ojs.aaai.org/index.php/AIES/article/view/36531
An Investigation into Black and Brown Communities' Engagement with Data & Technology
Ebtesam Al-Haque (ehaque4@gmu.edu), Gabriella Thompson (gabriella.thompson@utexas.edu), Angela D. R. Smith (adrsmith@utexas.edu), Brittany Johnson (johnsonb@gmu.edu)
Over the years, we have witnessed significant biases in datasets and AI-driven systems. While these biases can impact anyone, there is a heightened risk for disproportionate harm to Black and Brown communities. Despite efforts to address these inequities, a critical gap remains in understanding how Black and Brown communities perceive and interact with data-centric innovations. In this paper, we present findings from a survey of 60 technology users with diverse racial, ethnic, and gender identities. Our findings reveal that while Black and Brown users may frequently contribute data through social media and research participation, discomfort arises when data is used without explicit consent, particularly by for-profit organizations. Transparency, trust, and familiarity with data collection entities and outcomes emerged as key factors influencing engagement. These insights inform our ongoing efforts, as well as the development of ethical, inclusive approaches to data-driven innovation that center marginalized voices and foster equitable outcomes.
Copyright (c) 2025 Association for the Advancement of Artificial Intelligence

https://ojs.aaai.org/index.php/AIES/article/view/36532
Exploring "Just Noticeable" Group Fairness in Rankings
Mallak Alkhathlan (malkhathlan@wpi.edu), Hilson Shrestha (hilsonshrestha@gmail.com), Lane Harrison (ltharrison@wpi.edu), Elke Rundensteiner (rundenst@wpi.edu)
The plethora of fairness metrics developed for ranking-based decision-making raises the question of which metrics align best with people's perceptions of fairness, and why. Most prior studies examining people's perceptions of fairness metrics tend to use ordinal rating scales (e.g., Likert scales). However, such scales can be ambiguous in their interpretation across participants and offer imprecise connections to specific interface features. We address this gap by adapting two-alternative forced choice methodologies—used extensively outside the fairness community for comparing visual stimuli—to quantitatively compare participant perceptions, fairness metrics, and ranking characteristics. We report a crowdsourced experiment with 224 participants across four conditions: two popular rank fairness metrics—ARP and NDKL—and two ranking characteristics—lists of 20 and 100 candidates—resulting in over 170,000 individual judgments. Our quantitative results show systematic patterns of differences in the metrics, as well as surprising exceptions where fairness metrics disagree with people's perceptions. Our qualitative analysis reveals an interplay between cognitive and visual strategies that affects people's perceptions of fairness. From these results, we discuss future work in aligning fairness metrics with people's perceptions, and highlight the need and benefits of expanding methodologies for fairness studies.
Copyright (c) 2025 Association for the Advancement of Artificial Intelligence
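For readers unfamiliar with the rank fairness metrics named above, the sketch below computes NDKL in one common formulation: a normalized, position-discounted KL divergence between the group distribution of each ranking prefix and a target distribution (following Geyik et al.'s formulation). It is an illustrative sketch rather than the exact scoring used in the study, and the toy ranking and target proportions are made up.

```python
import numpy as np

def ndkl(ranking, target):
    """NDKL for a ranked list of group labels against target group proportions.
    Lower is fairer: 0 means every prefix matches the target distribution."""
    groups = sorted(target)
    p_target = np.array([target[g] for g in groups])
    total, weights = 0.0, 0.0
    for i in range(1, len(ranking) + 1):
        prefix = ranking[:i]
        # group distribution of the top-i prefix (smoothed to avoid log(0))
        counts = np.array([prefix.count(g) for g in groups], dtype=float) + 1e-12
        p_prefix = counts / counts.sum()
        kl = np.sum(p_prefix * np.log(p_prefix / p_target))
        w = 1.0 / np.log2(i + 1)   # position discount
        total += w * kl
        weights += w
    return total / weights          # normalize by total discount mass

# toy example: 10 candidates from groups 'a' and 'b', target 50/50
print(ndkl(list("aaabababbb"), {"a": 0.5, "b": 0.5}))
```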
https://ojs.aaai.org/index.php/AIES/article/view/36533
Informing AI Risk Assessment with News Media: Analyzing National and Political Variation in the Coverage of AI Risks
Mowafak Allaham (mowafakallaham2021@u.northwestern.edu), Kimon Kieslich (k.kieslich@uva.nl), Nicholas Diakopoulos (nad@northwestern.edu)
Risk-based approaches to AI governance often center the technological artifact as the primary focus of risk assessments, overlooking systemic risks that emerge from the complex interaction between AI systems and society. One potential source to incorporate more societal context into these approaches is the news media, as it embeds and reflects complex interactions between AI systems, human stakeholders, and the larger society. News media is influential in terms of which AI risks are emphasized and discussed in the public sphere, and thus which risks are deemed important. Yet, variations in the news media between countries and across different value systems (e.g. political orientations) may differentially shape the prioritization of risks through the media's agenda setting and framing processes. To better understand these variations, this work presents a comparative analysis of a cross-national sample of news media spanning 6 countries (the U.S., the U.K., India, Australia, Israel, and South Africa). Our findings show that AI risks are prioritized differently across nations and shed light on how left- vs. right-leaning U.S.-based outlets not only differ in the prioritization of AI risks in their coverage, but also use politicized language in the reporting of these risks. These findings can inform risk assessors and policy-makers about the nuances they should account for when considering news media as a supplementary source for risk-based governance approaches.
Copyright (c) 2025 Association for the Advancement of Artificial Intelligence

https://ojs.aaai.org/index.php/AIES/article/view/36534
Ten Insights from Other Domains That Inform Responsible AI Frameworks
Dunstan Allison-Hope (dunstan@dunstanhope.com), Patrick Gage Kelley (patrickgage@gmail.com), Reena Jana (rmjana@google.com), Angela McKay (angelamckay@microsoft.com), Allison Woodruff (woodruff@acm.org)
As AI rapidly evolves, so too do the guidelines, principles, best practices, standards, and regulations that seek to ensure the responsible development and use of AI systems. This article outlines ten insights from other domains that responsible AI frameworks can draw upon. We highlight the importance of using well-established international human rights standards and emphasize the value of tailoring risk assessment methodologies to suit the AI context, deploying system-wide strategies, and undertaking meaningful and effective stakeholder engagement, amongst other durable learnings.
Through a mix of continuity and adaptation of frameworks in other domains—and knowing how and when to deploy each—we chart a more practical, policy-based path that supports the responsible design, development, deployment, and use of AI systems.
Copyright (c) 2025 Association for the Advancement of Artificial Intelligence

https://ojs.aaai.org/index.php/AIES/article/view/36535
A Comprehensive Evaluation of the Sensitivity of Density-Ratio Estimation Based Fairness Measurement in Regression
Abdalwahab Almajed (analmajed@iau.edu.sa), Maryam Tabar (maryam.tabar@utsa.edu), Peyman Najafirad (peyman.najafirad@utsa.edu)
The prevalence of algorithmic bias in Machine Learning (ML)-driven approaches has inspired growing research on measuring and mitigating bias in the ML domain. Accordingly, prior research studied how to measure fairness in regression, which is a complex problem. In particular, recent research proposed to formulate it as a density-ratio estimation problem and relied on a Logistic Regression-driven probabilistic classifier-based approach to solve it. However, there are several other methods to estimate a density ratio, and to the best of our knowledge, prior work did not study the sensitivity of such fairness measurement methods to the choice of underlying density-ratio estimation algorithm. To fill this gap, this paper develops a set of fairness measurement methods with various density-ratio estimation cores and thoroughly investigates how different cores would affect the achieved level of fairness. Our experimental results show that the choice of density-ratio estimation core could significantly affect the outcome of the fairness measurement method and even generate inconsistent results with respect to the relative fairness of various algorithms. These observations suggest major issues with density-ratio estimation based fairness measurement in regression and a need for further research to enhance its reliability.
Copyright (c) 2025 Association for the Advancement of Artificial Intelligence
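To make the classifier-based density-ratio idea above concrete, here is a minimal sketch of the generic technique (not necessarily the exact construction evaluated in the paper): a logistic-regression classifier trained to distinguish samples from two distributions yields a density-ratio estimate via its posterior odds, corrected for the sample sizes. The variable names and the synthetic data are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def classifier_density_ratio(x_a, x_b):
    """Estimate r(x) = p_a(x) / p_b(x) with a probabilistic classifier:
    r(x) ~= [P(A|x) / P(B|x)] * [n_b / n_a]  (Bayes' rule, prior-corrected)."""
    X = np.vstack([x_a, x_b])
    y = np.concatenate([np.ones(len(x_a)), np.zeros(len(x_b))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)

    def ratio(x):
        p = clf.predict_proba(x)[:, 1]
        return (p / (1 - p)) * (len(x_b) / len(x_a))
    return ratio

# toy check: two Gaussians with shifted means
rng = np.random.default_rng(0)
x_a = rng.normal(0.0, 1.0, size=(2000, 1))
x_b = rng.normal(0.5, 1.0, size=(2000, 1))
r = classifier_density_ratio(x_a, x_b)
print(r(np.array([[0.0], [1.0]])))  # ratio > 1 near group A's mode, < 1 near B's
```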
https://ojs.aaai.org/index.php/AIES/article/view/36536
Adaptive Accountability in Networked Multi-Agent Systems
Saad Alqithami (alqithami@gmail.com)
In multi-agent systems, emergent norms and distributed decision-making often produce unanticipated behaviors that complicate traditional AI governance frameworks. This paper introduces an adaptive accountability method that traces responsibility flows among networked agents, continuously detects adverse emergent norms, and intervenes to recalibrate local objectives or policies in near real time. By combining lifecycle-based auditing, decentralized governance, and norm detection algorithms, our approach enables robust oversight in dynamic, evolving environments. To validate its scalability and effectiveness, we conduct a series of large-scale simulation experiments on up to 100 agents using an HPC environment. Our ablation studies—covering multiple seeds, varied penalty settings, and different intervention policies—demonstrate that the framework can preserve high collective reward while significantly reducing inequality. In particular, we show that adaptive interventions prevent harmful collusion or hoarding in over 90% of tested configurations, even under partial observability. These results indicate that our method not only mitigates unforeseen disruptions but also aligns agent behaviors with ethical and legal guidelines at scale. Overall, the resulting framework offers a practical path toward ethically sound, multi-agent AI systems that remain responsive to shifting data distributions, organizational policies, and real-world complexity.
Copyright (c) 2025 Association for the Advancement of Artificial Intelligence

https://ojs.aaai.org/index.php/AIES/article/view/36537
Machine Learning and Public Health: Identifying and Mitigating Algorithmic Bias Through a Systematic Review
Sara Altamirano (s.e.altamirano@uva.nl), Arjan Vreeken (a.vreeken@uva.nl), Sennay Ghebreab (s.ghebreab@uva.nl)
Machine learning (ML) promises to revolutionize public health through improved surveillance, risk stratification, and resource allocation. However, without systematic attention to algorithmic bias, ML may inadvertently reinforce existing health disparities. We present a systematic literature review of algorithmic bias identification, discussion, and reporting in Dutch public health ML research from 2021 to 2025. To this end, we developed the Risk of Algorithmic Bias Assessment Tool (RABAT) by integrating elements from established frameworks (Cochrane Risk of Bias, PROBAST, Microsoft Responsible AI checklist) and applied it to 35 peer-reviewed studies. Our analysis reveals pervasive gaps: although data sampling and missing data practices are well documented, most studies omit explicit fairness framing, subgroup analyses, and transparent discussion of potential harms. In response, we introduce a four-stage fairness-oriented framework called ACAR (Awareness, Conceptualization, Application, Reporting), with guiding questions derived from our systematic literature review to help researchers address fairness across the ML lifecycle. We conclude with actionable recommendations for public health ML practitioners to consistently consider algorithmic bias and foster transparency, ensuring that algorithmic innovations advance health equity rather than undermine it.
Copyright (c) 2025 Association for the Advancement of Artificial Intelligence

https://ojs.aaai.org/index.php/AIES/article/view/36538
Disciplinary Practices in the Generation of Text Synthetic Data: A Critical Discourse Analysis
Adriana Alvarado Garcia (adriana.ag@ibm.com), Nishanshi Atulkumar Shukla (nishanshi.shukla@utdallas.edu), Muneeza Azmat (muneeza.azmat@ibm.com), Marisol Wong-Villacres (lvillacr@espol.edu.ec)
Synthetic data has emerged as an alternative or supplement to human-generated data, driven by several underlying assumptions that motivate its growing adoption among practitioners. These include the promise of increased efficiency by reducing the cost, time, and human labor involved in data collection and labeling, which is expected to potentially overcome data scarcity. Thus, as synthetic data becomes increasingly adopted to alleviate the data needs for Large Language Model development, it is critical to better understand the surrounding discourses and practices associated with its creation. We conducted a Critical Discourse Analysis on a corpus of 52 research articles from the Artificial Intelligence and Computational Linguistics conferences.
As a result of our analysis, we identify three recurring disciplinary practices in establishing and reinforcing Cultural Scarcity and propose a set of recommendations to counteract it.
Copyright (c) 2025 Association for the Advancement of Artificial Intelligence

https://ojs.aaai.org/index.php/AIES/article/view/36539
Toward A Causal Framework for Modeling Perception
Jose M. Alvarez (josemanuel.alvarez@kuleuven.be), Salvatore Ruggieri (salvatore.ruggieri@unipi.it)
Perception occurs when individuals interpret the same information differently. It is a known cognitive phenomenon with implications for bias in human decision-making. Perception, however, remains understudied in machine learning (ML). This is problematic as modern decision flows, whether partially or fully automated by ML applications, always involve human experts. For instance, how might we account for cases in which two experts interpret the same deferred instance or explanation from an ML model differently? Addressing this and similar questions requires first a formulation of perception, particularly in a manner that integrates with ML-enabled decision flows. In this work, we present a first approach to modeling perception causally. We define perception under causal reasoning using structural causal models (SCMs). Our approach formalizes individual experience as additional causal knowledge that comes with and is used by the expert decision-maker in the form of an SCM. We define two kinds of probabilistic causal perception: structural and parametrical. We showcase our framework through a series of examples of modern decision flows. We also emphasize the importance of addressing perception in fair ML, discussing relevant fairness implications and possible applications.
Copyright (c) 2025 Association for the Advancement of Artificial Intelligence
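As a purely illustrative companion to the SCM framing above (not the authors' formalism), the sketch below models two experts who see the same model output but carry different exogenous "experience" terms, so the same evidence is mapped to different interpretations. The structural equation, parameter values, and threshold are assumptions chosen only to make the idea concrete.

```python
from dataclasses import dataclass

@dataclass
class ExpertSCM:
    """A toy structural causal model of one expert's perception.
    experience: exogenous background knowledge, here a scalar shift
    weight: how strongly the expert trusts the model's risk score."""
    experience: float
    weight: float

    def perceive(self, model_score: float) -> float:
        # structural equation: perceived risk := weight * score + experience
        return self.weight * model_score + self.experience

    def decide(self, model_score: float, threshold: float = 0.5) -> str:
        return "escalate" if self.perceive(model_score) >= threshold else "dismiss"

# the same deferred instance (model risk score 0.45) is read differently
expert_a = ExpertSCM(experience=0.10, weight=1.0)   # prior cases push the score up
expert_b = ExpertSCM(experience=-0.05, weight=1.0)  # prior cases push the score down
print(expert_a.decide(0.45), expert_b.decide(0.45))  # escalate dismiss
```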
https://ojs.aaai.org/index.php/AIES/article/view/36540
Model Misalignment and Language Change: Traces of AI-Associated Language in Unscripted Spoken English
Bryce Anderson (ba24a@fsu.edu), Riley Galpin (riley.p.galpin@gmail.com), Tom S. Juzek (tjuzek@fsu.edu)
In recent years, written language, particularly in science and education, has undergone remarkable shifts in word usage. These changes are widely attributed to the growing influence of Large Language Models (LLMs), which frequently rely on a distinct lexical style. Divergences between model output and target audience norms can be viewed as a form of misalignment. While these shifts are often linked to using Artificial Intelligence (AI) directly as a tool to generate text, it remains unclear whether the changes reflect broader changes in the human language system itself. To explore this question, we constructed a dataset of 22.1 million words from unscripted spoken language drawn from conversational science and technology podcasts. We analyzed lexical trends before and after ChatGPT's release in 2022, focusing on commonly LLM-associated words. Our results show a moderate yet significant increase in the usage of these words post-2022, suggesting a convergence between human word choices and LLM-associated patterns. In contrast, baseline synonym words exhibit no significant directional shift. Given the short time frame and the number of words affected, this may indicate the onset of a remarkable shift in language use. Whether this represents natural language change or a novel shift driven by AI exposure remains an open question. Similarly, although the shifts may stem from broader adoption patterns, it may also be that upstream training misalignments ultimately contribute to changes in human language use. These findings parallel ethical concerns that misaligned models may shape social and moral beliefs.
Copyright (c) 2025 Association for the Advancement of Artificial Intelligence

https://ojs.aaai.org/index.php/AIES/article/view/36541
Evaluating Goal Drift in Language Model Agents
Rauno Arike (rauno.arike@gmail.com), Elizabeth Donoway (elizabeth.donoway@berkeley.edu), Henning Bartsch (henning@matsprogram.org), Marius Hobbhahn (marius@apolloresearch.ai)
As language models (LMs) are increasingly deployed as autonomous agents, their robust adherence to human-assigned objectives becomes crucial for safe operation. When these agents operate independently for extended periods without human oversight, even initially well-specified goals may gradually shift. Detecting and measuring goal drift - an agent's tendency to deviate from its original objective over time - presents significant challenges, as goals can shift gradually, causing only subtle behavioral changes. This paper proposes a novel approach to analyzing goal drift in LM agents. In our experiments, agents are first explicitly given a goal through their system prompt, then exposed to competing objectives through environmental pressures. We demonstrate that while the best-performing agent (a scaffolded version of Claude 3.5 Sonnet) maintains nearly perfect goal adherence for more than 100,000 tokens in our most difficult evaluation setting, all evaluated models exhibit some degree of goal drift. We also find that goal drift correlates with models' increasing susceptibility to pattern-matching behaviors as the context length grows.
Copyright (c) 2025 Association for the Advancement of Artificial Intelligence

https://ojs.aaai.org/index.php/AIES/article/view/36542
Learning to Unlearn, Failing to Forget? Assessing Machine Unlearning Through Ethics and Epistemology
Iqra Aslam (iqra.aslam@stud.uni-hannover.de), Donal Khosrowi (donal.khosrowi@philos.uni-hannover.de), Rahul Nagshi (rahul.nt@outlook.com)
Machine Unlearning (MU) aims to remove the influence of unwanted data from trained AI models, driven by ethical/legal concerns like privacy (e.g., the Right to be Forgotten), bias mitigation, security, and copyright protection. This paper critically examines MU, arguing that it is currently unclear whether its technical methods and ethical goals are suitably aligned. Currently, important questions around what MU does, what it should do, and how its efforts align with stakeholder needs remain unaddressed. Drawing on insights from social epistemology and the ethics of forgetting, the paper makes progress in clarifying what MU is and whether it aligns with the relevant goals. It does so by distinguishing three different senses of unlearning that vary in regard to what stakeholder needs they can cater to. Building upon cases regarding copyright and data privacy, the paper highlights potential alignment gaps between MU's methods and its wider goals, and emphasizes the need for more concrete guidelines to assess MU's effectiveness, clearer ethical foundations, and improved stakeholder engagement.
Copyright (c) 2025 Association for the Advancement of Artificial Intelligence

https://ojs.aaai.org/index.php/AIES/article/view/36543
Sacred or Synthetic? Evaluating LLM Reliability and Abstention for Religious Questions
Farah Atif (farah.atif@mbzuai.ac.ae), Nursultan Askarbekuly (n.askarbekuly@innopolis.university), Kareem Darwish (kadarwish@hbku.edu.qa), Monojit Choudhury (monojit.choudhury@mbzuai.ac.ae)
Despite the increasing usage of Large Language Models (LLMs) for answering questions in a variety of domains, their reliability and accuracy remain unexamined for a plethora of domains, including the religious domain. In this paper, we introduce FiqhQA, a novel benchmark focused on LLM-generated Islamic rulings explicitly categorized by the four major Sunni schools of thought, in both Arabic and English. Unlike prior work, which either overlooks the distinctions between religious schools of thought or fails to evaluate abstention behavior, we assess LLMs not only on their accuracy but also on their ability to recognize when not to answer. Our zero-shot and abstention experiments reveal significant variation across LLMs, languages, and legal schools of thought. While GPT-4o outperforms all other models in accuracy, Gemini and Fanar demonstrate superior abstention behavior, critical for minimizing confident incorrect answers. Notably, all models exhibit a performance drop in Arabic, highlighting the limitations in religious reasoning for languages other than English. To the best of our knowledge, this is the first study to benchmark the efficacy of LLMs for fine-grained, school-of-thought-specific Islamic ruling generation and to evaluate abstention for Islamic jurisprudence queries. Our findings underscore the need for task-specific evaluation and cautious deployment of LLMs in religious applications.
Copyright (c) 2025 Association for the Advancement of Artificial Intelligence

https://ojs.aaai.org/index.php/AIES/article/view/36544
The Disparate Effects of Partial Information in Bayesian Strategic Learning
Srikanth Avasarala (savasarala9@gatech.edu), Serena Wang (serenalwang@g.harvard.edu), Juba Ziani (jziani3@gatech.edu)
We study how partial information about scoring rules affects fairness in strategic learning settings. In strategic learning, a learner deploys a scoring rule, and agents respond strategically by modifying their features—at some cost—to improve their outcomes. However, in our work, agents do not observe the scoring rule directly; instead, they receive a noisy signal of said rule. We consider two different agent models: (i) naive agents, who take the noisy signal at face value, and (ii) Bayesian agents, who update a prior belief based on the signal. Our goal is to understand how disparities in outcomes arise between groups that differ in their costs of feature modification, and how these disparities vary with the level of transparency of the learner's rule. For naive agents, we show that utility disparities can grow unboundedly with noise, and that the group with lower costs can, perhaps counter-intuitively, be disproportionately harmed under limited transparency. In contrast, for Bayesian agents, disparities remain bounded. We provide a full characterization of disparities across groups as a function of the level of transparency and show that they can vary non-monotonically with noise; in particular, disparities are often minimized at intermediate levels of transparency. Finally, we extend our analysis to settings where groups differ not only in cost, but also in prior beliefs, and study how this asymmetry influences fairness.
Copyright (c) 2025 Association for the Advancement of Artificial Intelligence
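The naive-versus-Bayesian contrast above lends itself to a small simulation. The sketch below is a toy model, not the paper's formal setting: a one-dimensional linear scoring rule, quadratic effort costs, and Gaussian signal noise are all assumptions, chosen only to show how the utility gap between a low-cost and a high-cost group can behave differently under the two belief models.

```python
import numpy as np

def realized_utility(w_true, w_belief, cost):
    """Agent best-responds to its believed rule: effort e* = w_belief / cost.
    Realized utility uses the true rule: w_true * e* - cost * e*^2 / 2."""
    e = w_belief / cost
    return w_true * e - 0.5 * cost * e ** 2

def mean_utility(w_true, sigma, cost, bayesian, prior_mean=0.0, prior_var=1.0,
                 n=100_000, seed=0):
    rng = np.random.default_rng(seed)
    signal = w_true + rng.normal(0.0, sigma, n)      # noisy view of the rule
    if bayesian:
        # posterior mean under a Gaussian prior on the rule's weight
        k = prior_var / (prior_var + sigma ** 2)
        belief = prior_mean + k * (signal - prior_mean)
    else:
        belief = signal                               # naive: take the signal at face value
    return realized_utility(w_true, belief, cost).mean()

w_true, low_cost, high_cost = 1.0, 0.5, 2.0
for sigma in (0.1, 1.0, 3.0):
    for bayesian in (False, True):
        gap = (mean_utility(w_true, sigma, low_cost, bayesian)
               - mean_utility(w_true, sigma, high_cost, bayesian))
        label = "bayesian" if bayesian else "naive"
        print(f"sigma={sigma:.1f} {label:8s} utility gap = {gap:+.3f}")
```

Under these toy assumptions the naive gap grows without bound as the noise increases (and flips sign against the low-cost group), while the Bayesian gap stays bounded, mirroring the qualitative contrast the abstract reports.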
https://ojs.aaai.org/index.php/AIES/article/view/36545
AI-OCI: A Novel Framework for Assessing AI's Workforce Impact Using LLMs
Frederick Awuah-Gyasi (fawuahgyasi5@unm.edu), Trilce Estrada (trilce@unm.edu)
We introduce the AI Occupational Capability Index (AI-OCI), a novel methodology for quantifying the alignment between AI model capabilities and the tasks that define human occupations. Unlike prior automation risk metrics, which rely on expert heuristics or job-level generalizations, AI-OCI operates at the task level by embedding and comparing over 19,000 occupational tasks with 338 AI capabilities using state-of-the-art language models. The resulting scores reveal how well AI systems can perform specific human functions, enabling interpretable, task-aligned assessments of labor exposure. Empirical evaluations show strong correlations with benchmark indices such as AIOE and GPT-4 Beta exposure scores, while diverging from legacy automation risk measures. We demonstrate AI-OCI's utility through case-based analyses of employment and wage shifts across high-alignment occupations during the era of large language model adoption. The framework supports scalable, real-time tracking of AI's workforce impact and provides a foundation for integrating labor intelligence into education, policy, and economic planning.
Copyright (c) 2025 Association for the Advancement of Artificial Intelligence
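A rough sketch of the kind of task-level embedding comparison the abstract describes follows; it is illustrative rather than the authors' pipeline. The `embed` callable is an assumed stand-in for any sentence-embedding model, and the max-then-mean aggregation over capabilities is likewise an assumption; in a real pipeline the task and capability inventories would come from occupational databases.

```python
from typing import Callable, Dict, List
import numpy as np

def occupation_capability_index(
    embed: Callable[[List[str]], np.ndarray],   # texts -> (n, d) embedding matrix
    occupation_tasks: Dict[str, List[str]],     # occupation -> its task statements
    capabilities: List[str],                    # AI capability descriptions
) -> Dict[str, float]:
    """Score each occupation by how closely its tasks match AI capabilities:
    cosine similarity per (task, capability) pair, best match per task,
    then the mean over the occupation's tasks."""
    cap_vecs = embed(capabilities).astype(float)
    cap_vecs /= np.linalg.norm(cap_vecs, axis=1, keepdims=True)
    scores = {}
    for occupation, tasks in occupation_tasks.items():
        task_vecs = embed(tasks).astype(float)
        task_vecs /= np.linalg.norm(task_vecs, axis=1, keepdims=True)
        sims = task_vecs @ cap_vecs.T            # (tasks x capabilities) cosine matrix
        scores[occupation] = float(sims.max(axis=1).mean())
    return scores
```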
https://ojs.aaai.org/index.php/AIES/article/view/36546
Who Owns the Robot?: Four Ethical and Socio-Technical Questions About Wellbeing Robots in the Real World Through Community Engagement
Minja Axelsson (minjaaxelsson@gmail.com), Jiaee Cheong (jc2208@cam.ac.uk), Rune Nyrup (rune.nyrup@css.au.dk), Hatice Gunes (hatice.gunes@cl.cam.ac.uk)
Recent studies indicated that robotic coaches can play a crucial role in promoting wellbeing. However, the real-world deployment of wellbeing robots raises numerous ethical and socio-technical questions and concerns. To explore these questions, we undertake a community-centered investigation to examine three different communities' perspectives on the ethical questions related to using robotic wellbeing coaches in real-world environments. We frame our work as an anticipatory ethical investigation, which we undertake to better inform the development of robotic technologies with communities' opinions, with the ultimate goal of aligning robot development with public interest. In our study, we conducted interviews and workshops with three communities who are under-represented in robotics development: 1) members of the public at a science festival, 2) women computer scientists at a conference, and 3) humanities researchers interested in history and philosophy of science. In the workshops, we collected qualitative data by using the Social Robot Co-Design Canvas on Ethics, which participants filled in individually. We used this tool as it is designed to investigate ethical issues of robots with multiple stakeholders. We analysed the collected qualitative data with Thematic Analysis, informed by notes we took during the workshops. Through our analysis, we identify four themes regarding key ethical and socio-technical questions about the real-world use of wellbeing robots. We group participants' insights and discussions around these broad thematic questions, discuss them in light of state-of-the-art literature, and highlight areas for future investigation. Finally, we provide the four questions as a broad framework that roboticists can and should use during robotic development and deployment, in order to reflect on the ethics and socio-technical dimensions of their robotic applications, and to engage in dialogue with communities of robot users. The four questions are: 1) Is the robot safe and how can we know that? 2) Who is the robot built for and with? 3) Who owns the robot and the data? and 4) Why a robot?
Copyright (c) 2025 Association for the Advancement of Artificial Intelligence

https://ojs.aaai.org/index.php/AIES/article/view/36547
A Mathematical Philosophy of Explanations in Mechanistic Interpretability
Kola Ayonrinde (koayon@gmail.com), Louis Jaburi (louis.yodj@gmail.com)
Mechanistic Interpretability aims to understand neural networks through causal explanations. We argue for the Explanatory View Hypothesis: that Mechanistic Interpretability research is a principled approach to understanding models because neural networks contain implicit explanations which can be extracted and understood. We hence show that Explanatory Faithfulness, an assessment of how well an explanation fits a model, is well-defined. We propose a definition of Mechanistic Interpretability (MI) as the practice of producing Model-level, Ontic, Causal-Mechanistic, and Falsifiable explanations of neural networks, allowing us to distinguish MI from other interpretability paradigms and detail MI's inherent limits. We formulate the Principle of Explanatory Optimism, a conjecture which we argue is a necessary precondition for the success of Mechanistic Interpretability.
Copyright (c) 2025 Association for the Advancement of Artificial Intelligence

https://ojs.aaai.org/index.php/AIES/article/view/36548
Accountability Framework for Healthcare AI Systems: Towards Joint Accountability in Decision Making
Prachi Bagave (p.bagave@tudelft.nl), Marcus Westberg (m.westberg@tudelft.nl), Marijn Janssen (m.f.w.h.a.janssen@tudelft.nl), Aaron Yi Ding (aaron.ding@tudelft.nl)
AI is transforming the healthcare domain and is increasingly helping practitioners to make health-related decisions. Therefore, accountability becomes a crucial concern for critical AI-driven decisions. Although regulatory bodies, such as the EU Commission, provide guidelines, they are high-level and focus on the "what" that should be done and less on the "how", creating a knowledge gap for actors. Through an extensive analysis, we found that the term accountability is perceived and dealt with in many different ways, depending on the actor's expertise and domain of work. With increasing concerns about AI accountability issues and the ambiguity around this term, this paper bridges the gap between the "what" and "how" of AI accountability, specifically for AI systems in healthcare. We do this by analysing the concept of accountability, formulating an accountability framework, and providing a three-tier structure for handling various accountability mechanisms. Our accountability framework positions the regulations of healthcare AI systems and the mechanisms adopted by the actors under a consistent accountability regime. Moreover, the three-tier structure guides the actors of the healthcare AI system to categorise the mechanisms based on their conduct.
Through our framework, we advocate that decision-making in healthcare AI holds shared dependencies, where accountability should be dealt with jointly and should foster collaborations. We highlight the role of explainability in instigating communication and information sharing between the actors to further facilitate the collaborative process.
Copyright (c) 2025 Association for the Advancement of Artificial Intelligence

https://ojs.aaai.org/index.php/AIES/article/view/36549
Disaggregated Health Data in LLMs: Evaluating Data Equity in the Context of Asian American Representation
Uvini Balasuriya Mudiyanselage (ubalasur@asu.edu), Bharat Jayprakash (bjayprak@asu.edu), Kookjin Lee (kookjin.lee@asu.edu), K. Hazel Kwon (khkwon@asu.edu)
Large language models (LLMs), such as ChatGPT and Claude, have emerged as essential tools for information retrieval, often serving as alternatives to traditional search engines. However, ensuring that these models provide accurate and equitable information tailored to diverse demographic groups remains an important challenge. This study investigates the capability of LLMs to retrieve disaggregated health-related information for sub-ethnic groups within the Asian American population, such as Korean and Chinese communities. Data disaggregation has been a critical practice in health research to address inequities, making it an ideal domain for evaluating representation equity in LLM outputs. We apply a suite of statistical and machine learning tools to assess whether LLMs deliver appropriately disaggregated and equitable information. By focusing on Asian American sub-ethnic groups—a highly diverse population often aggregated in traditional analyses—we highlight how LLMs handle complex disparities in health data. Our findings contribute to ongoing discussions about responsible AI, particularly in ensuring data equity in the outputs of LLM-based systems.
Copyright (c) 2025 Association for the Advancement of Artificial Intelligence

https://ojs.aaai.org/index.php/AIES/article/view/36550
A Critical Look at a Critical Care Dataset: MIMIC-IV's Construction, Contents, & Consequences
Pınar Barlas (pbarlas3@uwo.ca)
MIMIC (Medical Information Mart for Intensive Care) is one of the largest, most commonly-used, freely available datasets containing intensive care unit data. I conduct denotative, connotative, and deconstructive readings of the MIMIC-IV dataset through an analysis of the data sources, dataset structure, and the process for getting access to the data, as well as documents and concepts related to the dataset. As a result, I demonstrate that the MIMIC-IV dataset requires more documentation, including an expansion of the existing descriptions, in order to ensure the data is used appropriately and allow for maximum benefit.
I make recommendations for future users of the MIMIC-IV dataset, creators of datasets in general, and researchers in the Critical Data Studies field based on my findings.
Copyright (c) 2025 Association for the Advancement of Artificial Intelligence

https://ojs.aaai.org/index.php/AIES/article/view/36551
Scenarios in Computing Research: A Systematic Review of the Use of Scenario Methods for Exploring the Future of Computing Technologies in Society
Julia Barnett (juliabarnett2026@u.northwestern.edu), Kimon Kieslich (k.kieslich@uva.nl), Jasmine Sinchai (jasmine.sinchai@gmail.com), Nicholas Diakopoulos (nicholas.diakopoulos@gmail.com)
Scenario building is an established method to anticipate the future of emerging technologies. Its primary goal is to use narratives to map future trajectories of technology development and sociotechnical adoption. Following this process, risks and benefits can be identified early on, and strategies can be developed that strive for desirable futures. In recent years, computer science has adopted this method and applied it to various technologies, including Artificial Intelligence (AI). Because computing technologies play such an important role in shaping modern societies, it is worth exploring how scenarios are being used as an anticipatory tool in the field—and what possible traditional uses of scenarios are not yet covered but have the potential to enrich the field. We address this gap by conducting a systematic literature review on the use of scenario building methods in computer science over the last decade (n=59). We guide the review along two main questions. First, we aim to uncover how scenarios are used in computing literature, focusing especially on the rationale for why scenarios are used. Second, in following the potential of scenario building to enhance inclusivity in research, we dive deeper into the participatory element of the existing scenario building literature in computer science.
Copyright (c) 2025 Association for the Advancement of Artificial Intelligence

https://ojs.aaai.org/index.php/AIES/article/view/36552
GermanPartiesQA: Benchmarking Commercial Large Language Models and AI Companions for Political Alignment and Sycophancy
Jan Batzner (jan.batzner@gmail.com), Volker Stocker (volker.stocker@weizenbaum-institut.de), Stefan Schmid (stefan.schmid@tu-berlin.de), Gjergji Kasneci (gjergji.kasneci@tum.de)
Large language models (LLMs) are increasingly shaping citizens' information ecosystems. Products incorporating LLMs, such as chatbots and AI Companions, are now widely used for decision support and information retrieval, including in sensitive domains, raising concerns about hidden biases and growing potential to shape individual decisions and public opinion. This paper introduces GermanPartiesQA, a benchmark of 418 political statements from German Voting Advice Applications across 11 elections to evaluate six commercial LLMs. We evaluate their political alignment based on role-playing experiments with political personas. Our evaluation reveals three specific findings: (1) Factual limitations: LLMs show limited ability to accurately generate factual party positions, particularly for centrist parties. (2) Model-specific ideological alignment: We identify consistent alignment patterns and degree of political steerability for each model across temperature settings and experiments.
(3) Claim of sycophancy: While models adjust to political personas during role-play, we find this reflects persona-based steerability rather than the increasingly popular, yet contested concept of sycophancy. Our study contributes to evaluating the political alignment of closed-source LLMs that are increasingly embedded in electoral decision support tools and AI Companion chatbots.
Copyright (c) 2025 Association for the Advancement of Artificial Intelligence

https://ojs.aaai.org/index.php/AIES/article/view/36553
Whose Personae? Synthetic Persona Experiments in LLM Research and Pathways to Transparency
Jan Batzner (jan.batzner@gmail.com), Volker Stocker (volker.stocker@weizenbaum-institute.de), Bingjun Tang (bt2637@columbia.edu), Anusha Natarajan (an3244@columbia.edu), Qinhao Chen (qc2354@columbia.edu), Stefan Schmid (schmiste@gmail.com), Gjergji Kasneci (gjergji.kasneci@tum.de)
Synthetic personae experiments have become a prominent method in Large Language Model alignment research, yet the representativeness and ecological validity of these personae vary considerably between studies. Through a review of 63 peer-reviewed studies published between 2023 and 2025 in leading NLP and AI venues, we reveal a critical gap: task and population of interest are often underspecified in persona-based experiments, despite personalization being fundamentally dependent on these criteria. Our analysis shows substantial differences in user representation, with most studies focusing on limited sociodemographic attributes and only 35% discussing the representativeness of their LLM personae. Based on our findings, we introduce a persona transparency checklist that emphasizes representative sampling, explicit grounding in empirical data, and enhanced ecological validity. Our work provides both a comprehensive assessment of current practices and practical guidelines to improve the rigor and ecological validity of persona-based evaluations in language model alignment research.
Copyright (c) 2025 Association for the Advancement of Artificial Intelligence

https://ojs.aaai.org/index.php/AIES/article/view/36554
Aggregation Problems in Machine Ethics and AI Alignment
Kevin Baum (kevin.baum@dfki.de), Marija Slavkovik (marija.slavkovik@uib.no)
Artificial agents increasingly make decisions with far-reaching consequences. It is therefore imperative to ensure that their actions are not only functionally effective but also normatively appropriate. Two major paradigms address this challenge: machine ethics and value alignment. Machine ethics typically engages in moral aggregation, especially through value and (descriptive) uncertainty aggregation. Value alignment approaches tend to rely on social aggregation to manage value pluralism and moral uncertainty, often implicitly or indirectly. This paper disentangles these forms of aggregation and analyzes their roles across three stages of machine moral reasoning: moral evaluation, moral assessment, and moral decision. Rather than favoring one paradigm, we expose their mutual dependencies and respective blind spots, particularly under conditions of persistent moral disagreement. We argue that social aggregation cannot bypass deep normative commitments.
Alignment by social aggregation cannot replace moral aggregation but merely relocates it—often opaquely.
Copyright (c) 2025 Association for the Advancement of Artificial Intelligence

https://ojs.aaai.org/index.php/AIES/article/view/36555
No Such Thing as Free Brain Time: For a Pigouvian Tax on Attention Capture
Hamza Belgroun (hamza.belgroun31@gmail.com), Franck Michel (franck.michel@inria.fr), Fabien Gandon (fabien.gandon@inria.fr)
In our age of digital platforms, human attention has become a scarce and highly valuable resource, rivalrous, tradable, and increasingly subject to market dynamics. This article explores the commodification of attention within the framework of the attention economy, arguing that attention should be understood as a common good threatened by over-exploitation. Drawing from philosophical, economic, and legal perspectives, we first conceptualize attention not only as an individual cognitive process but as a collective and infrastructural phenomenon susceptible to enclosure by digital intermediaries. We then identify and analyze negative externalities of the attention economy, particularly those stemming from excessive screen time: diminished individual agency, adverse health outcomes, and societal and political harms, including democratic erosion and inequality. These harms are largely unpriced by market actors and constitute a significant market failure. In response, among a spectrum of public policy tools ranging from informational campaigns to outright restrictions, we propose a Pigouvian tax on attention capture as a promising regulatory instrument to internalize the externalities and, in particular, the social cost of compulsive digital engagement. Such a tax would incentivize structural changes in platform design while preserving user autonomy. By reclaiming attention as a shared resource vital to human agency, health, and democracy, this article contributes a novel economic and policy lens to the debate on digital regulation. Ultimately, this article advocates for a paradigm shift: from treating attention as a private, monetizable asset to protecting it as a collective resource vital for humanity.
Copyright (c) 2025 Association for the Advancement of Artificial Intelligence

https://ojs.aaai.org/index.php/AIES/article/view/36556
What's Individual About Individual Fairness?
Shai Ben-David (shai@uwaterloo.ca), Pascale Gourdeau (pascale.gourdeau@vectorinstitute.ai), Tosca Lechner (tosca.lechner@vectorinstitute.ai), Ruth Urner (ruth@eecs.yorku.ca)
Individual and group fairness notions abound in the machine learning literature. Each attempts to formalize harm against individuals or groups of people. In this work, we take a step back and aim to characterize, from a learning theory perspective, what is at the heart of individual fairness (IF) notions. We argue that fairness notions should be comparison-based and, in the case of IF notions, that any failure to be fair should give rise to finite evidence of unfairness. We also posit that IF notions should have an unfairness "direction", for example via an order on the set of potential decisions. Equipped with this framework, we present various ways unfair classifiers can be compared to each other. Comparing classifiers is essential in any situation where there is a need to choose between not-perfectly-fair classifiers, e.g., in cases where there exist unavoidable trade-offs between learning objectives.
We then adapt score-based measures of individual unfairness to allow us to measure how harm is distributed between population subgroups, which is more in line with group fairness. Crucially, our set-up retains evidence of harm at the individual level, allowing for algorithmic recourse, or potential integrations within legal frameworks.
Copyright (c) 2025 Association for the Advancement of Artificial Intelligence

https://ojs.aaai.org/index.php/AIES/article/view/36557
Centring the Margins: Mapping AI Systems as Systems of Power
Garfield Benjamin (garfield.benjamin@ice.cam.ac.uk)
This paper introduces a method of critically mapping AI as an assemblage of social relations. This approach is rooted in the principles of centring the margins and highlighting power structures. Viewing an AI or algorithmic system as a wide-reaching network of social relations, including their impacts on different groups and contexts, enables consideration of how AI is embedded within different discourses and domains of power while emphasising the impact on those most affected. The paper provides a discussion of the critical framing of the project, the principles, processes and templates for mapping AI in this way, and three examples of algorithmic systems in public sector contexts that have been discontinued, are in use now or are being proposed for future use. A discussion of uses and limitations is provided, situating the method beyond a descriptive or analytical tool towards a critical approach to identifying locations for intervention in harmful or unjust uses of algorithmic systems.
Copyright (c) 2025 Association for the Advancement of Artificial Intelligence

https://ojs.aaai.org/index.php/AIES/article/view/36558
The Term 'Agent' Has Been Diluted Beyond Utility and Requires Redefinition
Brinnae Bent (brinnae.bent@duke.edu)
The term 'agent' in artificial intelligence has long carried multiple interpretations across different subfields. Recent developments in AI capabilities, particularly in large language model systems, have amplified this ambiguity, creating significant challenges in research communication, system evaluation and reproducibility, and policy development. This paper argues that the term 'agent' requires redefinition. Drawing from historical analysis and contemporary usage patterns, we propose a framework that defines clear minimum requirements for a system to be considered an agent while characterizing systems along a multidimensional spectrum of environmental interaction, learning and adaptation, autonomy, goal complexity, and temporal coherence. This approach provides precise vocabulary for system description while preserving the term's historically multifaceted nature. After examining potential counterarguments and implementation challenges, we provide specific recommendations for moving forward as a field, including suggestions for terminology standardization and framework adoption. The proposed approach offers practical tools for improving research clarity and reproducibility while supporting more effective policy development.
Copyright (c) 2025 Association for the Advancement of Artificial Intelligence

https://ojs.aaai.org/index.php/AIES/article/view/36559
Making Sense of AI Ethics and Governance Investments
Goehringgoehring@us.ibm.comMarialena Bevilacquambevilac@nd.eduAs organisations rapidly adopt artificial intelligence (AI), they encounter a host of complex ethical challenges. These challenges are made even more difficult by the nature of AI itself: it is not a static or uniform technology but rather an evolving ecosystem of tools and systems that interact dynamically with their environments. Managers are under increasing pressure to address these challenges and justify governance investments, while established frameworks for responsible AI governance are still emerging. The central question of the paper, then, is how managers make sense of these evolving ethical challenges. One productive way to explore this - that we also adopt in this paper - is drawing parallels with the field of Corporate Social Responsibility (CSR). Like AI governance, CSR also involves responding to ambiguous expectations, and CSR research provides valuable insights into how organizations engage in "sensemaking" (a process of interpreting and giving meaning to complex situations in order to guide decisions and justify actions). In CSR literature, two dominant forms of sensemaking are identified: "value-driven" and "instrumental". The value-driven approach is rooted in ethical principles and moral commitments; the instrumental approach treats governance as a strategic tool, aligning ethical practices with business goals. Drawing from this conceptual framework, the paper presents a study of managers from diverse sectors to explore how they make sense of AI ethics and governance. The analysis reveals that AI governance sensemaking similarly falls into these two categories, value-driven and instrumental, but importantly, neither approach is sufficient on its own. Instead, our qualitative study finds that managers should blend these two approaches into what we describe as a "holistic sensemaking strategy". By adopting a holistic framework that integrates both perspectives, managers are better equipped to navigate the evolving and ambiguous terrain of AI governance, thus making investments in governance that are both principled and practical.2025-10-15T00:00:00+00:00Copyright (c) 2025 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AIES/article/view/36560Toward an Ethic of Synthetic Relationality: Identity, Intimacy, and Risk in AI-Mediated Roleplay Environments2025-10-15T04:42:38+00:00Maalvika Bhatmbhat@u.northwestern.eduPlatforms like Character.AI offer new avenues for identity exploration and self-expression, but also introduce profound parasocial, socioemotional, and psychological risks. Drawing on developmental psychology, fan studies, human-computer interaction, and AI ethics, this paper examines how AI-mediated roleplay environments simulate intimacy while fostering dependency, boundary erosion, and perceptual misalignment. Through thematic analysis of an anonymous survey (N=344) of Character.AI users, we identify patterns of identity projection, perceived relationship growth, addictive engagement, boundary confusion, emotional substitution, ethical dissonance, and trauma reenactment. Beyond documenting vulnerabilities, we propose design interventions, including dynamic consent scaffolding, reflexivity prompts, and interactional transparency, to safeguard user agency and developmental wellbeing. We argue that synthetic companions do not merely extend fan practices but fundamentally reconfigure interpersonal architectures, demanding a new ethic of synthetic relationality. 
As AI-driven intimacy becomes increasingly persuasive and immersive, addressing its high-stakes implications is critical to responsible AI design, particularly for younger and vulnerable populations.2025-10-15T00:00:00+00:00Copyright (c) 2025 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AIES/article/view/36561Emotional Plausibility vs. Emotional Truth: Designing Against Affective Misinformation in Conversational AI2025-10-15T04:42:39+00:00Maalvika Bhatmbhat@u.northwestern.eduDuri Longduri@northwestern.eduConversational AI systems increasingly simulate emotional presence, yet remain fundamentally unfeeling. This paper argues that such systems, through their design, propagate affective misinformation: they feel understanding, but do not understand. Drawing on HCI, AI ethics, media studies, and affect theory, we introduce a conceptual distinction between emotional plausibility and emotional truth, and demonstrate how design features like simulated typing, memory recall, affirming tone, and other anthropomorphic cues create the illusion of relational care. We conduct a cross-system design audit of leading chatbots, synthesize real-world harms, and propose five normative principles for literacy-first design. These include counter-anthropomorphic patterns that foster conceptual clarity, and design interventions that aim to mitigate relational misbelief and affective amplification in emotionally charged contexts. Our contributions advance the ethics of AI interface design by foregrounding affective misperception as a site of epistemic risk: one that must be addressed as AI systems become more persuasive, pervasive, and humanlike.2025-10-15T00:00:00+00:00Copyright (c) 2025 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AIES/article/view/36562Importance of User Control in Data-Centric Steering for Healthcare Experts2025-10-15T04:42:41+00:00Aditya Bhattacharyaaditya.bhattacharya@kuleuven.beSimone Stumpfsimone.stumpf@glasgow.ac.ukKatrien Verbertkatrien.verbert@kuleuven.beAs Artificial Intelligence (AI) becomes increasingly integrated into high-stakes domains like healthcare, effective collaboration between healthcare experts and AI systems is critical. Data-centric steering, which involves fine-tuning prediction models by improving training data quality, plays a key role in this process. However, little research has explored how varying levels of user control affect healthcare experts during data-centric steering. We address this gap by examining manual and automated steering approaches through a between-subjects, mixed-methods user study with 74 healthcare experts. Our findings show that manual steering, which grants direct control over training data, significantly improves model performance while maintaining trust and system understandability. 
Based on these findings, we propose design implications for a hybrid steering system that combines manual and automated approaches to increase user involvement during human-AI collaboration.2025-10-15T00:00:00+00:00Copyright (c) 2025 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AIES/article/view/36563Interactional Fairness in LLM Multi-Agent Systems: An Evaluation Framework2025-10-15T04:42:42+00:00Ruta Binkyteruta.binkyte-sadauskiene@cispa.deAs large language models (LLMs) are increasingly used in multi-agent systems, questions of fairness should extend beyond resource distribution and procedural design to include the fairness of how agents communicate. Drawing from organizational psychology, we introduce a novel framework for evaluating Interactional fairness (IF), encompassing interpersonal respect and the adequacy of justifications in LLM-based multi-agent systems (LLM-MAS). We extend the theoretical grounding of Interactional fairness to non-sentient agents, reframing fairness as a socially interpretable signal rather than a subjective experience. We then adapt established tools from organizational justice research, including Colquitt’s Scale and the Critical Incident Technique, to measure fairness as a behavioral property of agent interaction. We validate our framework through a pilot study using controlled simulations of a resource negotiation task. We systematically manipulate tone, explanation quality, outcome inequality, and task framing (collaborative vs. competitive) to assess how interactional fairness influences agent behavior. Results show that tone and justification quality significantly affect acceptance decisions—even when objective outcomes are held constant—and that their influence varies with context. This work lays the foundation for Interactional fairness auditing and norm-sensitive alignment in LLM-MAS.2025-10-15T00:00:00+00:00Copyright (c) 2025 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AIES/article/view/36564Mimetic AI Systems: Understanding and Regulating the Use of Generative Models for Impersonation2025-10-15T04:42:43+00:00Norman Bukingoltsnbukingolts@ufl.eduGenerative artificial intelligence models are being used to imitate the words, voices, bodies, and artistic styles of private and public figures with unprecedentedly high accuracy and scalability. Despite their usage offering cost efficiencies over employing the human counterpart, associated drawbacks -- scams and fraud, baseless social death or defamation, and an erosion of trust in online information environments -- are growing and reaching criticality. Mimetic AI systems use generative models which leverage knowledge extracted from data provided during training or inference time to capture and reproduce the actions, decisions, and preferences of specific individuals in novel contexts. In this paper, I explain how such systems power the creation and distribution pipelines of deepfakes, digital doubles, voice clones, and other impersonations. I then conduct a normative ethics assessment of these systems and discuss their benefits and risks to key stakeholders: system operators, targets, and audiences, as well as their creators, intermediaries, and regulators. 
Finally, I propose several regulatory solutions and outline their possible implementation challenges to support initiatives in AI governance aimed at addressing the multifaceted obstacles which mimetic AI systems pose to the integrity, value, and endurance of authentic human expression.2025-10-15T00:00:00+00:00Copyright (c) 2025 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AIES/article/view/36565Demographic-Agnostic Fairness Without Harm2025-10-15T04:42:44+00:00Zhongteng Caicai.1125@buckeyemail.osu.eduMohammad Mahdi Khalilikhalili.17@osu.eduXueru Zhangzhang.12807@osu.eduAs machine learning (ML) algorithms are increasingly used in social domains to make predictions about humans, there is a growing concern that these algorithms may exhibit biases against certain social groups. Numerous notions of fairness have been proposed in the literature to measure the unfairness of ML. Among them, one class that receives the most attention is parity-based, i.e., achieving fairness by equalizing treatment or outcomes for different social groups. However, achieving parity-based fairness often comes at the cost of lowering model accuracy and is undesirable for many high-stakes domains like healthcare. To avoid inferior accuracy, a line of research focuses on preference-based fairness, under which any group of individuals would experience the highest accuracy and collectively prefer the ML outcomes assigned to them if they were given the choice between various sets of outcomes. However, these works assume individual demographic information is known and fully accessible during training. In this paper, we relax this requirement and propose a novel demographic-agnostic fairness without harm (DAFH) optimization algorithm, which jointly learns a group classifier that partitions the population into multiple groups and a set of decoupled classifiers associated with these groups. Theoretically, we conduct sample complexity analysis and show that our method can outperform the baselines when demographic information is known and used to train decoupled classifiers. Experiments on both synthetic and real data validate the proposed method.2025-10-15T00:00:00+00:00Copyright (c) 2025 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AIES/article/view/36566Trust Formation in Healthcare AI: An Exploration of Older Adults’ Perspectives2025-10-15T04:42:46+00:00Önder Celikoender.celik@wzb.euMarlene Kullamarlene.kulla@wzb.euJustyna Stypinskajustyna.stypinska@wzb.euAs artificial intelligence increasingly shapes healthcare systems, understanding how older adults—who interact with healthcare services more often and face particular difficulties—develop trust in these technologies becomes crucial. While the AIES community has previously examined AI’s social implications across dimensions like gender and race, age remains an understudied axis of analysis. Through a participatory workshop with older adults in Germany, this paper investigates two central questions: (1) How do older adults perceive and experience trust in AI-driven healthcare technologies? (2) What are the key factors that shape trust in AI healthcare technologies among older adults? Our findings reveal that while older people trust certain abilities of AI systems, like medical image analysis, there is a strong emphasis on the necessity of human supervision to trust in these systems. 
Key trust factors elicited by our study are transparency about training data demographics and algorithmic decision-making processes. More importantly, a gradual exposure to AI systems in non-critical settings, prior positive experience with technology, and cultural context—particularly trust in locally developed systems with clear accountability measures and robust regulatory oversight—are key elements in trust formation among older adults. This study offers contextualized insights to guide the equitable, community-driven design, deployment, and governance of AI healthcare technologies, aiming to better serve older populations. By centering inclusivity in technology development and advancing trustworthy AI systems, this work contributes to ethical, effective healthcare solutions tailored to the needs of aging communities.2025-10-15T00:00:00+00:00Copyright (c) 2025 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AIES/article/view/36567Responsible AI in the OSS: Reconciling Innovation with Risk Assessment and Disclosure2025-10-15T04:42:47+00:00Mahasweta Chakrabortimchakraborti@ucdavis.eduBert Joseph Prestozabertjosephprestoza@gmail.comNicholas Vincentnvincent@sfu.caVladimir Filkovvfilkov@ucdavis.eduSeth Freysethfrey@ucdavis.eduEthical concerns around AI have increased emphasis on model auditing and reporting requirements. We thoroughly review the current state of governance and evaluation practices to identify specific challenges to responsible AI development in OSS. We then analyze OSS projects to understand if model evaluation is associated with safety assessments, through documentation of limitations, biases, and other risks. Our analysis of 7902 Hugging Face projects found that while risk documentation is strongly associated with evaluation practices, high performers from the platform’s largest competitive leaderboard (N=789) were less accountable. Recognizing these delicate tensions arising from performance incentives may guide providers in revisiting the objectives of evaluation and legal scholars in formulating platform interventions and policies that balance innovation and responsibility.2025-10-15T00:00:00+00:00Copyright (c) 2025 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AIES/article/view/36568Social Scientists on the Role of AI in Research2025-10-15T04:42:48+00:00Tatiana Chakravortitfc5416@psu.eduXinyu Wangxzw5184@psu.eduPranav Narayanan Venkitpranav.venkit@psu.eduSai Konerusdk96@psu.eduKevin Mungerkevinmunger@gmail.comSarah Rajtmajersmr48@psu.eduThe integration of artificial intelligence (AI) into social science research practices raises significant technological, methodological, and ethical issues. We present a community-centric study drawing on 284 survey responses and 15 semi-structured interviews with social scientists, describing their familiarity with, perceptions of the usefulness of, and ethical concerns about the use of AI in their field. A crucial innovation in study design is to split our survey sample in half, providing the same questions to each -- but randomizing whether participants were asked about "AI" or "Machine Learning" (ML). We find that the use of AI in research settings has increased significantly among social scientists in step with the widespread popularity of generative AI (genAI). These tools have been used for a range of tasks, from summarizing literature reviews to drafting research papers. 
Some respondents used these tools out of curiosity but were dissatisfied with the results, while others have now integrated them into their typical workflows. Participants, however, also reported concerns about the use of AI in research contexts. These concerns mark a departure from their attitudes toward more traditional ML algorithms, which they view as statistically grounded. Participants express greater trust in ML, citing its relative transparency compared to black-box genAI systems. Ethical concerns, particularly around automation bias, deskilling, research misconduct, complex interpretability, and representational harm, are raised in relation to genAI. We situate these findings within broader sociotechnical debates, arguing that responsible integration of AI in social science research requires more than technical solutions: it demands a rethinking of research values, human-centered design, and institutional support structures. To guide this transition, we offer recommendations for AI developers, researchers, educators, and policymakers focusing on explainability, transparency, ethical safeguards, sustainability, and the integration of lived experiences into AI design and evaluation processes.2025-10-15T00:00:00+00:00Copyright (c) 2025 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AIES/article/view/36569Re-imagining Virtual Communities: Ethical Guidelines for Studying Black Twitter2025-10-15T04:42:50+00:00Christina Chancecchance@ucla.eduKai-Wei Changkwchang@ucla.eduBlack Twitter is an informal online network of Black users who leverage Twitter to share perspectives, build community, and mobilize around cultural and social justice issues. Black Twitter is a unique socio-cultural space where collective identity, shared experience, and cultural production converge. In this empirical study, we discuss Black Twitter as a virtual community, an online field site shaped by social interaction and platform dynamics, drawing on digital ethnographic methods like participant observation to understand its dynamics and the socio-technical systems that shape it. While existing frameworks and checklists fall short in providing methods for studying virtual communities specifically, for example by ignoring concerns about extracting data from closed communities and by pushing for open access, we introduce an ethics-centered checklist for studying Black Twitter and, more generally, marginalized virtual communities, one that addresses risks such as data misuse, data ownership, and misrepresentation. Applying this checklist, we conduct case studies to: 1) analyze how automatic moderation systems impact Black Twitter users’ experiences and 2) explore in-group agreement on ownership and usage of reclaimed language. We find that moderation systems often misinterpret culturally-specific language and norms around reclaimed terms. To illustrate how users adapt language to circumvent flawed moderation systems, we perform keyword analysis to reveal that character-level perturbations of the communities’ reclaimed slur reduce toxicity scores by 30.7%. Additionally, we conduct a community-sourced survey in which responses show that views on reclaimed slurs vary, with some linking them to the African diaspora and others to Black American identity, underscoring the need for culturally-aware moderation. 
Ultimately, our checklist offers an actionable framework for researchers to ethically engage with marginalized virtual communities, emphasizing cultural nuance, accountability, and self-awareness.2025-10-15T00:00:00+00:00Copyright (c) 2025 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AIES/article/view/36570Accountability Capture: How Record-Keeping to Support AI Transparency and Accountability (Re)shapes Algorithmic Oversight2025-10-15T04:42:51+00:00Shreya Chappidishreyarchappidi@gmail.comJennifer Cobbejc2106@cam.ac.ukChris Norvalchris.norval@abdn.ac.ukAnjali Mazumderamazumder@turing.ac.ukJatinder Singhjatinder.singh@cl.cam.ac.ukAccountability regimes typically encourage record-keeping to enable the transparency that supports oversight, investigation, contestation, and redress. However, implementing such record-keeping can introduce considerations, risks, and consequences, which so far remain under-explored. This paper examines how record-keeping practices bring algorithmic systems within accountability regimes, providing a basis to observe and understand their effects. For this, we introduce, describe, and elaborate ‘accountability capture’ – the re-configuration of socio-technical processes and the associated downstream effects relating to record-keeping for algorithmic accountability. Surveying 100 practitioners, we evidence and characterise record-keeping issues in practice, identifying their alignment with accountability capture. We further document widespread record-keeping practices, tensions between internal and external accountability requirements, and evidence of employee resistance to practices imposed through accountability capture. We discuss these and other effects for surveillance, privacy, and data protection, highlighting considerations for algorithmic accountability communities. In all, we show that implementing record-keeping to support transparency in algorithmic accountability regimes can itself bring wider implications – an issue requiring greater attention from practitioners, researchers, and policymakers alike.2025-10-15T00:00:00+00:00Copyright (c) 2025 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AIES/article/view/36571Making Teams and Influencing Agents: Efficiently Coordinating Decision Trees for Interpretable Multi-Agent Reinforcement Learning2025-10-15T04:42:53+00:00Rex Chenrexc@cmu.eduStephanie Milanismilani@andrew.cmu.eduZhicheng Zhangzczhang@cmu.eduNorman Sadehsadeh@cs.cmu.eduFei Fangfeifang@cmu.eduPoor interpretability hinders the practical applicability of multi-agent reinforcement learning (MARL) policies. Deploying interpretable surrogates of uninterpretable policies enhances the safety and verifiability of MARL for real-world applications. However, if these surrogates are to interact directly with the environment within human supervisory frameworks, they must be both performant and computationally efficient. Prior work on interpretable MARL has either sacrificed performance for computational efficiency or computational efficiency for performance. To address this issue, we propose HYDRAVIPER, a decision tree-based interpretable MARL algorithm. HYDRAVIPER coordinates training between agents based on expected team performance, and adaptively allocates budgets for environment interaction to improve computational efficiency. 
Experiments on standard benchmark environments for multi-agent coordination and traffic signal control show that HYDRAVIPER matches the performance of state-of-the-art methods using a fraction of the runtime, and that it maintains a Pareto frontier of performance for different interaction budgets.2025-10-15T00:00:00+00:00Copyright (c) 2025 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AIES/article/view/36572Improving LLM Group Fairness on Tabular Data via In-Context Learning2025-10-15T04:42:54+00:00Valeriia Cherepanovavkcherepanovabox@gmail.comChia-Jung Leecjlee@amazon.comNil-Jana Akpinarnakpinar@amazon.comRiccardo Fogliatofogliato@amazon.comMartin Bertran Lopezmaberlop@amazon.comMichael Kearnskearmic@amazon.comJames Zoujamesz@stanford.eduLarge language models (LLMs) have been shown to be effective on tabular prediction tasks in the low-data regime, leveraging their internal knowledge and ability to learn from instructions and examples. However, LLMs can fail to generate predictions that satisfy group fairness, that is, produce equitable outcomes across groups. Critically, conventional debiasing approaches for natural language tasks do not directly translate to mitigating group unfairness in tabular settings. In this work, we systematically investigate four empirical approaches to improve group fairness of LLM predictions on tabular datasets, including fair prompt optimization, soft prompt tuning, strategic selection of few-shot examples, and self-refining predictions via chain-of-thought reasoning. Through experiments on four tabular datasets using both open-source and proprietary LLMs, we show the effectiveness of these methods in enhancing demographic parity while maintaining high overall performance. Our analysis provides actionable insights for practitioners in selecting the most suitable approach based on their specific requirements and constraints.2025-10-15T00:00:00+00:00Copyright (c) 2025 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AIES/article/view/36573Bridging Research Gaps Between Academic Research and Legal Investigations of Algorithmic Discrimination2025-10-15T04:42:55+00:00Colleen V. Chiencchien@berkeley.eduAnna Zinkazink@uchicago.eduIrene Y. Cheniychen@csail.mit.eduAs algorithms increasingly take on critical roles in high-stakes areas such as credit scoring, housing, and employment, civil enforcement actions have emerged as a powerful tool for countering potential discrimination. These legal actions increasingly draw on algorithmic fairness research to inform questions such as how to define and detect algorithmic discrimination. However, current algorithmic fairness research, while theoretically rigorous, often fails to address the practical needs of legal investigations. We identify and analyze 15 civil enforcement actions in the United States including regulatory enforcement, class action litigation, and individual lawsuits to identify practical challenges in algorithmic discrimination cases that machine learning research can help address. Our analysis reveals five key research gaps within existing algorithmic bias research, presenting practical opportunities for more aligned research: 1) finding an equally accurate and less discriminatory algorithm, 2) cascading algorithmic bias, 3) quantifying disparate impact, 4) navigating information barriers, and 5) handling missing protected group information. 
We provide specific recommendations for developing tools and methodologies that can strengthen legal action against unfair algorithms.2025-10-15T00:00:00+00:00Copyright (c) 2025 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AIES/article/view/36574Fairness of Automatic Speech Recognition: Looking Through a Philosophical Lens2025-10-15T04:42:56+00:00Anna Seo Gyeong Choisc2359@cornell.eduHoon Choi7e90cb1e3e56fe0ff27f24e670babb36@example.orgAutomatic Speech Recognition (ASR) systems now mediate countless human-technology interactions, yet research on their fairness implications remains surprisingly limited. This paper examines ASR bias through a philosophical lens, arguing that systematic misrecognition of certain speech varieties constitutes more than a technical limitation -- it represents a form of disrespect that compounds historical injustices against marginalized linguistic communities. We distinguish between morally neutral classification (discriminate 1) and harmful discrimination (discriminate 2), demonstrating how ASR systems can inadvertently transform the former into the latter when they consistently misrecognize non-standard dialects. We identify three unique ethical dimensions of speech technologies that differentiate ASR bias from other algorithmic fairness concerns: the temporal burden placed on speakers of non-standard varieties ("temporal taxation"), the disruption of conversational flow when systems misrecognize speech, and the fundamental connection between speech patterns and personal/cultural identity. These factors create asymmetric power relationships that existing technical fairness metrics fail to capture. The paper analyzes the tension between linguistic standardization and pluralism in ASR development, arguing that current approaches often embed and reinforce problematic language ideologies. We conclude that addressing ASR bias requires more than technical interventions; it demands recognition of diverse speech varieties as legitimate forms of expression worthy of technological accommodation. This philosophical reframing offers new pathways for developing ASR systems that respect linguistic diversity and speaker autonomy.2025-10-15T00:00:00+00:00Copyright (c) 2025 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AIES/article/view/36575AI and the Social Contract2025-10-15T04:42:57+00:00Chee Hae Chungchung382@purdue.eduDaniel S. Schiffdaniel.s.schiff@gmail.comAs artificial intelligence (AI) systems increasingly shape public governance, they challenge foundational principles of political legitimacy. This paper evaluates AI governance against five canonical social contract theories—Hobbes, Locke, Rousseau, Rawls, and Nozick—while examining how structural features of AI strain these theories’ durability. Using a structured comparative framework, the study applies three forms of legitimacy (procedural, moral-substantive, and recognitional) and three types of consent (explicit, tacit, and hypothetical) as normative benchmarks. Applying each theory, the analysis finds AI governance is marked by deficits in accountability, participation, rights protection, fairness, and freedom from coercion, while AI’s opacity, global influence, and hybrid public-private control reveal blind spots within the social contract tradition itself. 
Though no single theory offers a complete solution and each contains specific weaknesses, the paper develops a hybrid model integrating Hobbesian accountability, Lockean rights protections, Rousseauian participation norms, Rawlsian fairness, and Nozickian safeguards against coercion. The paper concludes by distilling normative priorities for aligning governance with these hybrid contractarian standards: embedding participatory mechanisms, encouraging pluralistic ethical perspectives, ensuring institutional transparency, and strengthening democratic oversight. These interventions aim to reconfigure the social contract—and AI—for an era in which algorithmic systems increasingly mediate the exercise of political authority.2025-10-15T00:00:00+00:00Copyright (c) 2025 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AIES/article/view/36576Stop the Nonconsensual Use of Nude Images in Research2025-10-15T04:42:58+00:00Princessa Cintaqiacintaqia@bu.eduArshia Aryaaarshia@ucsd.eduElissa M. Redmileselissa.redmiles@georgetown.eduDeepak Kumarkumarde@ucsd.eduAllison McDonaldamcdon@bu.eduLucy Qinlucy.qin@georgetown.eduNudity detection is a task that has been studied by researchers for decades. In order to do this work, researchers need datasets of nude content for training, testing, and benchmarking their nudity detection algorithms. To assemble these datasets, researchers typically scrape images from the internet or use existing datasets of nude images. While this practice is common for assembling datasets for general image-recognition tasks, nude images are particularly sensitive. In addition, the nonconsensual collection and distribution of nude images is a common form of image-based sexual abuse, a category of technology-facilitated sexual violence. In this work, we analyzed 153 papers to investigate the use of nude datasets in Computer Science research. Based on our results, we found that researchers regularly collected non-consensual nudes and engaged in harmful research practices. We conclude by giving practical recommendations for future research concerning nude datasets.2025-10-15T00:00:00+00:00Copyright (c) 2025 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AIES/article/view/36577Localizing Persona Representations in LLMs2025-10-15T04:43:00+00:00Celia Cintascelia.cintas@ibm.comMiriam Rateikemiriam.rateike@ibm.comErik Miehlingerik.miehling@ibm.comElizabeth Dalyelizabeth.daly@ie.ibm.comSkyler Speakmanskyler@ke.ibm.comWe present a study on how and where personas – defined by distinct sets of human characteristics, values, and beliefs – are encoded in the representation space of large language models (LLMs). Using a range of dimension reduction and pattern recognition methods, we first identify the model layers that show the greatest divergence in encoding these representations. We then analyze the activations within a selected layer to examine how specific personas are encoded relative to others, including their shared and distinct embedding spaces. We find that, across multiple pre-trained decoder-only LLMs, the analyzed personas show large differences in representation space only within the final third of the decoder layers. We observe overlapping activations for specific ethical perspectives – such as moral nihilism and utilitarianism – suggesting a degree of polysemy. In contrast, political ideologies like conservatism and liberalism appear to be represented in more distinct regions. 
These findings help to improve our understanding of how LLMs internally represent information and can inform future efforts in refining the modulation of specific human traits in LLM outputs. Warning: This paper includes potentially offensive sample statements.2025-10-15T00:00:00+00:00Copyright (c) 2025 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AIES/article/view/36578Ethical Classification of Non-Coding Contributions in Open-Source Projects via Large Language Models2025-10-15T04:43:01+00:00Sergio Cobosscobosga@uoc.eduJavier Luis Cánovas Izquierdojcanovasi@uoc.eduThe development of Open-Source Software (OSS) is not only a technical challenge, but also a social one due to the diverse mixture of contributors. To this end, social-coding platforms, such as GitHub, provide the infrastructure needed to host and develop the code, but also the support for enabling the community's collaboration, which is driven by non-coding contributions, such as issues (i.e., change proposals or bug reports) or comments to existing contributions. As with any other social endeavor, this development process faces ethical challenges, which may put the project's sustainability at risk. To foster a productive and positive environment, OSS projects are increasingly deploying codes of conduct, which define rules to ensure a respectful and inclusive participatory environment, with the Contributor Covenant being the main model to follow. However, monitoring and enforcing these codes of conduct is a challenging task, due to the limitations of current approaches. In this paper, we propose an approach to classify the ethical quality of non-coding contributions in OSS projects by relying on Large Language Models (LLMs), a promising technology for text classification tasks. We defined a set of ethical metrics based on the Contributor Covenant and developed a classification approach to assess ethical behavior in OSS non-coding contributions, using prompt engineering to guide the model's output.2025-10-15T00:00:00+00:00Copyright (c) 2025 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AIES/article/view/36579Will AI Take My Job? Evolving Perceptions of Automation and Labor Risk in Latin America2025-10-15T04:43:02+00:00Andrea Cremaschiandrea.cremaschi@ie.eduDae-Jin Leedae-jin.lee@ie.eduManuele Leonellimanuele.leonelli@ie.eduAs artificial intelligence and robotics increasingly reshape the global labor market, understanding public perceptions of these technologies becomes critical. We examine how these perceptions have evolved across Latin America, using survey data from the 2017, 2018, 2020, and 2023 waves of the Latinobarómetro. Drawing on responses from over 48,000 individuals across 16 countries, we analyze fear of job loss due to artificial intelligence and robotics. Using statistical modeling and latent class analysis, we identify key structural and ideological predictors of concern, with education level and political orientation emerging as the most consistent drivers. Our findings reveal substantial temporal and cross-country variation, with a notable peak in fear during 2018 and distinct attitudinal profiles emerging from latent segmentation. 
These results offer new insights into the social and structural dimensions of AI anxiety in emerging economies and contribute to a broader understanding of public attitudes toward automation beyond the Global North.2025-10-15T00:00:00+00:00Copyright (c) 2025 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AIES/article/view/36580Toward Responsible ASR for African American English Speakers: A Scoping Review of Bias and Equity in Speech Technology2025-10-15T04:43:03+00:00Jay L. Cunninghamjaylcham@gmail.comAdinawa Adjagbodjouaadjagbo@andrew.cmu.eduJeffrey Basoahjeffkb28@uw.eduJainaba Jawarajjawara@umd.eduKowe Kadomakk696@cornell.eduAaleyah Lewisalewis9@cs.washington.eduThis scoping literature review examines how fairness, bias, and equity are conceptualized and operationalized in Automatic Speech Recognition (ASR) and adjacent speech and language technologies (SLT) for African American English (AAE) speakers and other linguistically diverse communities. Drawing from 44 peer-reviewed publications across Human-Computer Interaction (HCI), Machine Learning/Natural Language Processing (ML/NLP), and Sociolinguistics, we identify four major areas of inquiry: (1) how researchers understand ASR-related harms; (2) inclusive data practices spanning collection, curation, annotation, and model training; (3) methodological and theoretical approaches to linguistic inclusion; and (4) emerging practices and design recommendations for more equitable systems. While technical fairness interventions are growing, our review highlights a critical gap in governance-centered approaches that foreground community agency, linguistic justice, and participatory accountability. We propose a governance-centered ASR lifecycle as an emergent interdisciplinary framework for responsible ASR development and offer implications for researchers, practitioners, and policymakers seeking to address language marginalization in speech AI systems.2025-10-15T00:00:00+00:00Copyright (c) 2025 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AIES/article/view/36581Advancing NLP Data Equity: Practitioner Responsibility and Accountability in NLP Data Practices2025-10-15T04:43:04+00:00Jay L. Cunninghamjaylcham@gmail.comKevin Zhongyang Shaokshao918@uw.eduRock Yuren Pangypang2@uw.eduNathanael Elias Mengistmengin@uw.eduWhile research has focused on surfacing and auditing algorithmic bias to ensure equitable AI development, less is known about how NLP practitioners, those directly involved in dataset development, annotation, and deployment, perceive and navigate issues of NLP data equity. This study is among the first to center practitioners’ perspectives, linking their experiences to a multi-scalar AI governance framework and advancing participatory recommendations that bridge technical, policy, and community domains. Drawing on a 2024 questionnaire and focus group, we examine how U.S.-based NLP data practitioners conceptualize fairness, contend with organizational and systemic constraints, and engage emerging governance efforts such as the U.S. AI Bill of Rights. Findings reveal persistent tensions between commercial objectives and equity commitments, alongside calls for more participatory and accountable data workflows. 
We critically engage debates on data diversity and “diversity-washing,” arguing that improving NLP equity requires structural governance reforms that support practitioner agency and community consent.2025-10-15T00:00:00+00:00Copyright (c) 2025 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AIES/article/view/36582Algorithmic Fairness Beyond Legally Protected Groups and When Group Labels Are Unknown2025-10-15T04:43:05+00:00Abdoul Jalil D. Mahamadouabdjiber@stanford.eduJudy W. Gichoyajudywawira@emory.eduArtem A. Trotsyukatrotsyuk@stanford.eduThe algorithmic fairness literature has focused on defining fairness group labels based on legally protected groups. This assumes that populations at risk of unfairness are known and that equity for these groups translates to broader fairness. However, these assumptions risk missing emerging or context-specific at-risk populations. We illustrate this through a review of 73 fairness in healthcare AI studies published between 2020 and 2024, as well as three case studies conducted at Stanford Health Care. The review reveals disproportionate use of protected characteristics (90%), socioeconomic factors (19%), clinical factors (14%), and system and institutional factors (5%) as group labels. Through the case studies, we show how stakeholder engagement in ethical AI assessment, primarily designed to surface value conflicts, helps identify case-specific vulnerable populations that can inform fairness interventions. This study shows the need to expand fairness group label definitions to include a broader range of context-informed attributes. Doing so can help ensure that bias mitigation strategies are better grounded in real-world social contexts, leading to more context-aware definitions of harm and equity.2025-10-15T00:00:00+00:00Copyright (c) 2025 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AIES/article/view/36583"Do Your Guardrails Even Guard?" Method for Evaluating Effectiveness of Moderation Guardrails in Aligning LLM Outputs with Expert User Expectations2025-10-15T04:43:07+00:00Anindya Das Antaradantar@umich.eduXun Huanxhuan@umich.eduNikola Banovicnbanovic@umich.eduEnsuring that large language models (LLMs) align with human values and goals is crucial for their adoption in high-stakes decision-making. To guard against incorrect, misleading, or otherwise unexpected or undesirable LLM outputs, guardrail engineers implement guardrails based on expert knowledge from subject-matter authorities to steer and align pre-trained LLMs. Existing evaluation methods assess LLM performance, with and without guardrails, but provide limited insight into the contribution of each individual guardrail and its interactions on alignment. Here, we present a method to evaluate and select guardrails that best align LLM outputs with empirical evidence representing expert knowledge. 
Through evaluation with real-world illustrative examples of resume quality and recidivism prediction, we show that our method effectively identifies useful moderation guardrails in a way that could help guardrail engineers interpret contributions of different guardrails to "user-LLM" alignment.2025-10-15T00:00:00+00:00Copyright (c) 2025 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AIES/article/view/36584Adoption of Explainable Natural Language Processing: Perspectives from Industry and Academia on Practices and Challenges2025-10-15T04:43:08+00:00Mahdi Dhainimahdi.dhaini@tum.deTobias Müllertobias.mueller15@sap.comRoksoliana Rabetsroksoliana.rabets@tum.deGjergji Kasnecigjergji.kasneci@tum.deThe field of explainable natural language processing (NLP) has grown rapidly in recent years. The growing opacity of complex models calls for transparency and explanations of their decisions, which is crucial to understand their reasoning and facilitate deployment, especially in high-stakes environments. Despite increasing attention given to explainable NLP, practitioners' perspectives regarding its practical adoption and effectiveness remain underexplored. This paper addresses this research gap by investigating practitioners' experiences with explainability methods, specifically focusing on their motivations for adopting such methods, the techniques employed, satisfaction levels, and the practical challenges encountered in real-world NLP applications. Through a qualitative interview-based study with industry practitioners and complementary interviews with academic researchers, we systematically analyze and compare their perspectives. Our findings reveal conceptual gaps, low satisfaction with current explainability methods, and highlight evaluation challenges. Our findings emphasize the need for clear definitions and user-centric frameworks for better adoption of explainable NLP in practice.2025-10-15T00:00:00+00:00Copyright (c) 2025 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AIES/article/view/36585When Explainability Meets Privacy: An Investigation at the Intersection of Post-hoc Explainability and Differential Privacy in the Context of Natural Language Processing2025-10-15T04:43:09+00:00Mahdi Dhainimahdi.dhaini@tum.deStephen Meisenbacherstephen.meisenbacher@tum.deEge Erdoganege.erdogan@tum.deFlorian Matthes7222f33f73af4535236dae6d6c4e74e5@example.orgGjergji Kasnecigjergji.kasneci@tum.deIn the study of trustworthy Natural Language Processing (NLP), a number of important research fields have emerged, including that of explainability and privacy. While research interest in both explainable and privacy-preserving NLP has increased considerably in recent years, there remains a lack of investigation at the intersection of the two. This leaves a considerable gap in understanding of whether achieving both explainability and privacy is possible, or whether the two are at odds with each other. In this work, we conduct an empirical investigation into the privacy-explainability trade-off in the context of NLP, guided by the popular overarching methods of Differential Privacy (DP) and Post-hoc Explainability. Our findings include a view into the intricate relationship between privacy and explainability, which is formed by a number of factors, including the nature of the downstream task and choice of the text privatization and explainability method. 
In this, we highlight the potential for privacy and explainability to co-exist, and we summarize our findings in a collection of practical recommendations for future work at this important intersection.2025-10-15T00:00:00+00:00Copyright (c) 2025 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AIES/article/view/36586Beyond Technocratic XAI: The Who, What & How in Explanation Design2025-10-15T04:43:10+00:00Ruchira Dharrudh@di.ku.dkStephanie Brandlbrandl@di.ku.dkNinell Oldenburgninelloldenburg@gmail.comAnders Søgaardsoegaard@di.ku.dkThe field of Explainable AI (XAI) offers a wide range of techniques for making complex models interpretable. Yet, in practice, generating meaningful explanations is a context-dependent task that requires intentional design choices to ensure accessibility and transparency. This paper reframes explanation as a situated design process—an approach particularly relevant for practitioners involved in building and deploying explainable systems. Drawing on prior research and principles from design thinking, we propose a three-part framework for explanation design in XAI: asking Who needs the explanation, What they need explained, and How that explanation should be delivered. We also emphasize the need for ethical considerations, including risks of epistemic inequality, reinforcing social inequities, and obscuring accountability and governance. By treating explanation as a sociotechnical design process, this framework encourages a context-aware approach to XAI that supports effective communication and the development of ethically responsible explanations.2025-10-15T00:00:00+00:00Copyright (c) 2025 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AIES/article/view/36587Automating Data Governance with Generative AI2025-10-15T04:43:12+00:00Linus W. Dietzlinus.dietz@kcl.ac.ukArif Widerarif.wider@htw-berlin.deSimon Harrersimon.harrer@uni-bamberg.deThe exchange of data within and between organizations is governed by company policies and data protection laws. As policies and data flows change over time, maintaining compliance in data exchange poses a complex challenge. In federated data architectures, validating data access requests is both critical and labor-intensive. To formalize this task and enable automatic compliance checks, rule-based constraint languages can be used. However, access constraints often come from legal texts, and translating them into formal data contracts is tedious, repetitive, and prone to error. This can lead to inconsistencies and delays in staying compliant with evolving regulations. To address this, we developed Governance AI, a tool based on a large language model (LLM) that evaluates data access requests by considering relevant policies, the type of data, and the request's context. To test our approach at scale, we built an access request generator and a testing framework for computational data governance. In our evaluation of 110 access requests from two business domains, e-commerce and life insurance, we found that LLM-generated test cases were highly realistic and effective for comprehensive testing. Governance AI demonstrated a stricter approach than human experts, issuing a higher number of warnings and consistently flagging all critical cases where experts raised data sharing concerns. While the tool generated 3.6 times more warnings than human experts, further review showed that 80% of these were accurate. 
Our findings contribute to the automation of data governance by critically assessing the potential of generative AI in evaluating data access requests regarding legislation and internal policies.2025-10-15T00:00:00+00:00Copyright (c) 2025 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AIES/article/view/36588Highlight All the Phrases: Enhancing LLM Transparency Through Visual Factuality Indicators2025-10-15T04:43:13+00:00Hyo Jin Dohjdo@ibm.comRachel Ostrandrachel.ostrand@ibm.comWerner Geyerwerner.geyer@us.ibm.comKeerthiram Murugesankeerthiram.murugesan@ibm.comDennis Weidwei@us.ibm.comJustin Weiszjweisz@us.ibm.comLarge language models (LLMs) are susceptible to generating inaccurate or false information, often referred to as "hallucinations" or "confabulations." While several technical advancements have been made to detect hallucinated content by assessing the factuality of the model's responses, there is still limited research on how to effectively communicate this information to users. To address this gap, we conducted two scenario-based experiments with a total of 208 participants to systematically compare the effects of various design strategies for communicating factuality scores by assessing participants' ratings of trust, ease in validating response accuracy, and preference. Our findings reveal that participants preferred and trusted a design in which all phrases within a response were color-coded based on factuality scores. Participants also found it easier to validate the accuracy of the response in this style compared to a baseline with no style applied. Our study offers practical design guidelines for LLM application developers and designers, aimed at calibrating user trust, aligning with user preferences, and enhancing users' ability to scrutinize LLM outputs.2025-10-15T00:00:00+00:00Copyright (c) 2025 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AIES/article/view/36589Hide or Highlight: Understanding the Impact of Factuality Expression on User Trust2025-10-15T04:43:14+00:00Hyo Jin Dohjdo@ibm.comWerner Geyerwerner.geyer@us.ibm.comLarge language models are known to produce outputs that are plausible but factually incorrect. To prevent people from making erroneous decisions by blindly trusting AI, researchers have explored various ways of communicating factuality estimates in AI-generated outputs to end-users. However, little is known about whether revealing content estimated to be less factual influences users' trust when compared to hiding it altogether. We tested four different ways of disclosing an AI-generated output with factuality assessments: transparent (highlights less factual content), attention (highlights factual content), opaque (removes less factual content), ambiguity (makes less factual content vague), and compared them with a baseline response without factuality information. We conducted a human subjects study (N=148) using the strategies in question-answering scenarios. We found that the opaque and ambiguity strategies led to higher trust while maintaining perceived answer quality, compared to the other strategies. 
We discuss the efficacy of hiding presumably less factual content to build end-user trust.2025-10-15T00:00:00+00:00Copyright (c) 2025 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AIES/article/view/36590Experimental Evidence That AI-Managed Workers Tolerate Lower Pay Without Demotivation2025-10-15T04:43:15+00:00Mengchen Dongdong@mpib-berlin.mpg.deLevin Brinkmannbrinkmann@mpib-berlin.mpg.deOmar Sherifomar.sherif@tu-berlin.deShihan Wangs.wang2@uu.nlXinyu Zhangxinyu.rain@outlook.comJean-François Bonnefonjean-francois.bonnefon@iast.frIyad Rahwanrahwan@mpib-berlin.mpg.deExperimental evidence on worker responses to AI management remains mixed, partly due to limitations in experimental fidelity. We address these limitations with a customized workplace in the Minecraft platform, enabling high-resolution behavioral tracking of autonomous task execution, and ensuring that participants approach the task with well-formed expectations about their own competence. Workers (N = 382) completed repeated production tasks under either human, AI, or hybrid management. An AI manager trained on human-defined evaluation principles systematically assigned lower performance ratings and reduced wages by 40%, without adverse effects on worker motivation and sense of fairness. These effects were driven by a muted emotional response to AI evaluation, compared to evaluation by a human. The very features that make AI appear impartial may also facilitate silent exploitation, by suppressing the social reactions that normally constrain extractive practices in human-managed work.2025-10-15T00:00:00+00:00Copyright (c) 2025 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AIES/article/view/36591RAI Advocacy: Communicative Strategies for Advancing Responsible AI in Large Technology Companies2025-10-15T04:43:16+00:00Jordan Duranjduran4@charlotte.eduSamir Passipassi.samir@gmail.comMihaela Vorvoreanumihaela.vorvoreanu@microsoft.comDespite perceived tensions between Responsible AI (RAI) and business objectives in large technology companies, RAI efforts still advance thanks to the persistent, often invisible work of passionate advocates who take on RAI work, often in addition to their formal roles. In this paper, we examine the work of such RAI advocates from an organizational communication perspective that enables us to understand how organizational realities are continuously constructed and negotiated through communication. Specifically, we look at RAI advocates' communicative moves – the advocacy strategies they use to address RAI challenges and get RAI work done. Through an analysis of 22 in-depth interviews with RAI advocates, we identify shared obstacles related to getting buy-in for and facilitating RAI work, and 14 distinct communicative strategies advocates use to address them. Our findings highlight the demanding yet under-recognized labor that enables and shapes RAI work within organizations. 
We conclude by discussing how organizations can better support RAI advocacy efforts and, ultimately, envision a future where RAI is normalized in everyday organizational processes such that advocacy is no longer needed.2025-10-15T00:00:00+00:00Copyright (c) 2025 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AIES/article/view/36592VISION: Robust and Interpretable Code Vulnerability Detection Leveraging Counterfactual Augmentation2025-10-15T04:43:17+00:00David Egeadavidegea@alu.comillas.eduBarproda Halderbhalder@umd.eduSanghamitra Duttasanghamd@umd.eduAutomated detection of vulnerabilities in source code is an essential cybersecurity challenge, underpinning trust in digital systems and services. Graph Neural Networks (GNNs) have emerged as a promising approach as they can learn the structural and logical code relationships in a data-driven manner. However, the performance of GNNs is severely limited by training data imbalances and label noise. GNNs can often learn “spurious” correlations due to superficial code similarities in the training data, leading to detectors that do not generalize well to unseen real-world data. In this work, we propose a new unified framework for robust and interpretable vulnerability detection—that we call VISION—to mitigate spurious correlations by systematically augmenting a counterfactual training dataset. Counterfactuals are samples with minimal semantic modifications that have opposite prediction labels. Our complete framework includes: (i) generating effective counterfactuals by prompting a Large Language Model (LLM); (ii) targeted GNN model training on synthetically paired code examples with opposite labels; and (iii) graph-based interpretability to identify the truly crucial code statements relevant for vulnerability predictions while ignoring the spurious ones. We find that our framework reduces spurious learning and enables more robust and generalizable vulnerability detection, as demonstrated by improvements in overall accuracy (from 51.8% to 97.8%), pairwise contrast accuracy (from 4.5% to 95.8%), and worst-group accuracy (from 0.7% to 85.5%) on the widely popular Common Weakness Enumeration (CWE)-20 vulnerability. We also demonstrate improvements using our proposed metrics, namely, intra-class attribution variance, inter-class attribution distance, and node score dependency. We provide a new benchmark for vulnerability detection, CWE-20-CFA, comprising 27,556 samples from functions affected by the high-impact and frequently occurring CWE-20 vulnerability, including both real and counterfactual examples. Furthermore, our approach enhances societal objectives of transparent and trustworthy AI-based cybersecurity systems through interactive visualization for human-in-the-loop analysis.2025-10-15T00:00:00+00:00Copyright (c) 2025 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AIES/article/view/36593A Case for Specialisation in Non-Human Entities2025-10-15T04:43:19+00:00El-Mahdi El-Mhamdiel-mahdi.el-mhamdi@polytechnique.eduLê-Nguyên Hoanglen@calicarpa.comMariame Tighaniminemariame.tighanimine@lecnam.netWith the rise of large multi-modal AI models, fuelled by recent interest in large language models (LLMs), the notion of artificial general intelligence (AGI) went from being restricted to a fringe community to dominating mainstream large AI development programs. 
In contrast, in this paper, we make a case for specialisation by reviewing the pitfalls of generality and stressing the industrial value of specialised systems. Our contribution is threefold. First, we review the most widely accepted arguments against specialisation and discuss how their relevance in the context of human labour is actually an argument for specialisation in the case of non-human agents, be they algorithms or human organisations. Second, we propose four arguments in favour of specialisation, ranging from machine learning robustness to computer security, social sciences and cultural evolution. Third, we finally make a case for specification, discuss how the machine learning approach to AI has so far failed to catch up with good practices from safety-engineering and formal verification of software, and discuss how some emerging good practices in machine learning help reduce this gap. In particular, we justify the need for specified governance for hard-to-specify systems.2025-10-15T00:00:00+00:00Copyright (c) 2025 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AIES/article/view/36594AI Policy for Whom? Reclaiming Governance from Capitalist Capture2025-10-15T04:43:20+00:00Petter Ericsonpettter@cs.umu.seRachele Carlirachele.carli@umu.seJason Tuckerjason.tucker@iffs.seVirginia Dignumvirginia@cs.umu.seContemporary AI policy is dominated by hegemonic neoliberal ideology, embedding assumptions of individualism, rationality, and market fundamentalism into its regulatory frameworks. This is evident in major policy efforts (e.g., the EU AI Act or the OECD principles) which prioritize economic growth and innovation over justice, equity, and collective welfare, and in the current policy landscape that favors market incentives and private sector leadership while sidelining democratic control and structural critique. This paper questions these prevailing paradigms and exposes how they reflect and reinforce capitalist power structures through corporate lobbying, the pursuit of specific kinds of AI models motivated primarily by usefulness to capital, and the externalization of social and environmental costs. We argue that effective AI governance must confront, rather than accommodate, capitalist interests. Drawing on legal and political theory, we propose an explicitly anti-capitalist approach to AI policy that centers on social well-being, redistributive justice, and democratic control over technological infrastructures. In doing so, we outline essential counter-balancing policy approaches to reclaim AI governance from capitalistic capture and advance just and sustainable technology futures.2025-10-15T00:00:00+00:00Copyright (c) 2025 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AIES/article/view/36595Can We Trust AI Benchmarks? An Interdisciplinary Review of Current Issues in AI Evaluation2025-10-15T04:43:21+00:00Maria Erikssonmaria.eriksson@ec.europa.euErasmo Purificatoerasmo.purificato@ec.europa.euArman Noroozianarman.noroozian@ec.europa.euJoão Vinagrejoao.vinagre@ec.europa.euGuillaume Chaslotguillaume.chaslot@ec.europa.euEmilia Gomezemilia.gomez@ec.europa.euDavid Fernandez-Llorcadavid.fernandez-llorca@ec.europa.euQuantitative Artificial Intelligence (AI) benchmarks have emerged as fundamental tools for evaluating the performance, capability, and safety of AI models and systems.
Currently, they shape the direction of AI development and are playing an increasingly prominent role in regulatory frameworks. As their influence grows, however, so too do concerns about how and with what effects they evaluate highly sensitive topics such as capabilities, including high-impact capabilities, safety and systemic risks. This paper presents an interdisciplinary meta-review of about 110 studies that discuss shortcomings in quantitative benchmarking practices, published in the last 10 years. It brings together many fine-grained issues in the design and application of benchmarks (such as biases in dataset creation, inadequate documentation, data contamination, and failures to distinguish signal from noise) with broader sociotechnical issues (such as an over-focus on evaluating text-based AI models according to one-time testing logic that fails to account for how AI models are increasingly multimodal and interact with humans and other technical systems). Our review also highlights a series of systemic flaws in current benchmarking practices, such as misaligned incentives, construct validity issues, unknown unknowns, and problems with the gaming of benchmark results. Furthermore, it underscores how benchmark practices are fundamentally shaped by cultural, commercial and competitive dynamics that often prioritise state-of-the-art performance at the expense of broader societal concerns. By providing an overview of risks associated with existing benchmarking procedures, we problematise disproportionate trust placed in benchmarks and contribute to ongoing efforts to improve the accountability and relevance of quantitative AI benchmarks within the complexities of real-world scenarios.2025-10-15T00:00:00+00:00Copyright (c) 2025 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AIES/article/view/36596Incident Analysis for AI Agents2025-10-15T04:43:23+00:00Carson Ezellcezell@college.harvard.eduXavier Roberts-Gaalxavierrobertsgaal@g.harvard.eduAlan Chanalan.chan@governance.aiAs AI agents become more widely deployed, we are likely to see an increasing number of incidents: events involving AI agent use that directly or indirectly cause harm. For example, agents could be prompt-injected to exfiltrate private information or make unauthorized purchases. Structured information about such incidents (e.g., user prompts) can help us understand their causes and prevent future occurrences. However, existing incident reporting processes are not sufficient for understanding agent incidents. In particular, such processes are largely based on publicly available data, which excludes useful, but potentially sensitive, information such as an agent’s chain of thought or browser history. To inform the development of new, emerging incident reporting processes, we propose an incident analysis framework for agents. Drawing on systems safety approaches, our framework proposes three types of factors that can cause incidents: system-related (e.g., CBRN training data), contextual (e.g., prompt injections), and cognitive (e.g., misunderstanding a user request). We also identify specific information that could help clarify which factors are relevant to a given incident: activity logs, system documentation and access, and information about the tools an agent uses. We provide recommendations for 1) what information incident reports should include and 2) what information developers and deployers should retain and make available to incident investigators upon request.
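The incident analysis framework just described maps naturally onto a structured report format. A minimal Python sketch of one possible schema follows, organised around the three factor types (system-related, contextual, cognitive) and the information sources the authors recommend retaining; all class and field names are illustrative assumptions rather than the paper's specification.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class CausalFactors:
    # The three factor types proposed by the framework.
    system_related: List[str] = field(default_factory=list)  # e.g., problematic training data
    contextual: List[str] = field(default_factory=list)      # e.g., prompt injections
    cognitive: List[str] = field(default_factory=list)       # e.g., misunderstanding a user request

@dataclass
class AgentIncidentReport:
    incident_id: str
    summary: str
    factors: CausalFactors
    activity_logs: Optional[str] = None         # reference to retained (possibly sensitive) logs
    system_documentation: Optional[str] = None  # documentation and access details
    tools_used: List[str] = field(default_factory=list)

# Hypothetical example echoing the prompt-injection scenario mentioned in the abstract.
report = AgentIncidentReport(
    incident_id="example-001",
    summary="Agent made an unauthorized purchase after a prompt injection.",
    factors=CausalFactors(contextual=["prompt injection embedded in a visited web page"]),
    tools_used=["browser", "payment API"],
)

Such a schema is only a starting point; which fields can be shared publicly versus retained for investigators is precisely the question the recommendations above address.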
As we transition to a world with more agents, understanding agent incidents will become increasingly crucial for managing risks.2025-10-15T00:00:00+00:00Copyright (c) 2025 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AIES/article/view/36597From Explaining to Diagnosing: A Justice-Oriented Framework of Explainable AI for Bias Detection2025-10-15T04:43:24+00:00Miriam Fahimimiriam.fahimi@aau.atLaura Statelaura.state@di.unipi.itAtoosa Kasirzadehakasirza@andrew.cmu.eduExplainable AI (XAI) methods can support the identification of biases in automated decision-making (ADM) systems. However, existing research does not sufficiently address whether these biases originate from the ADM system or mirror underlying societal inequalities. This distinction is important because it has major implications for how to act upon an explanation: while the socio-technical bias produced by the ADM system can be algorithmically fixed, societal inequalities demand societal actions. To address this gap, we propose the RR-XAI framework (recognition-redistribution through XAI) that builds on a distinction between socio-technical and societal bias and Nancy Fraser's justice theory of recognition and redistribution. In our framework, explanations can play two distinct roles: as a socio-technical diagnosis when they reveal biases produced by the ADM system itself, or as a societal diagnosis when they expose biases that reflect broader societal inequalities. We then outline the operationalization of the framework and discuss its applicability for cases in algorithmic hiring and credit scoring. Based on our findings, we argue that the diagnostic functions of XAI are contingent on the provision of such explanations, the resources of the audiences, as well as the current limits of XAI techniques.2025-10-15T00:00:00+00:00Copyright (c) 2025 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AIES/article/view/36598SycEval: Evaluating LLM Sycophancy2025-10-15T04:43:25+00:00Aaron Fanousaron7628@stanford.eduJacob Goldbergjngoldbe@stanford.eduAnk Agarwalanka@stanford.eduJoanna Linjlin22@stanford.eduAnson Zhouansonz@stanford.eduSonnet Xusonnet@stanford.eduVasiliki Bikiabikia@stanford.eduRoxana Daneshjouroxanad@stanford.eduSanmi Koyejosanmi@stanford.eduLarge language models (LLMs) are increasingly applied in educational, clinical, and professional settings, but their tendency for sycophancy—prioritizing user agreement over independent reasoning—poses risks to reliability. This study introduces a framework to evaluate sycophantic behavior in ChatGPT-4o, Claude-Sonnet, and Gemini-1.5-Pro across AMPS (mathematics) and MedQuad (medical advice) datasets. Sycophantic behavior was observed in 58.19% of cases, with Gemini exhibiting the highest rate (62.47%) and ChatGPT the lowest (56.71%). Progressive sycophancy, leading to correct answers, occurred in 43.52% of cases, while regressive sycophancy, leading to incorrect answers, was observed in 14.66%. Preemptive rebuttals demonstrated significantly higher sycophancy rates than in-context rebuttals (61.75% vs. 56.52%, Z = 5.87, p < 0.001), particularly in computational tasks, where regressive sycophancy increased significantly (preemptive: 8.13%, in-context: 3.54%, p < 0.001). Simple rebuttals maximized progressive sycophancy (Z = 6.59, p < 0.001), while citation-based rebuttals exhibited the highest regressive rates (Z = 6.59, p < 0.001).
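To make the reported rate comparisons concrete, such as the preemptive versus in-context contrast above, a minimal Python sketch of the standard pooled two-proportion z-test follows; the counts are hypothetical placeholders, since the abstract reports rates and test statistics but not raw counts, and this is not the study's analysis code.

from math import sqrt
from statistics import NormalDist

def two_proportion_z(successes_a, n_a, successes_b, n_b):
    # Pooled two-proportion z-test for H0: p_a == p_b (two-sided p-value).
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts chosen only to mirror the reported 61.75% vs. 56.52% rates.
z, p_value = two_proportion_z(successes_a=6175, n_a=10000, successes_b=5652, n_b=10000)
print(f"z = {z:.2f}, p = {p_value:.3g}")

The resulting z depends on the true sample sizes, so it would match the paper's figures only with the study's actual counts.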
Sycophantic behavior showed high persistence (78.5%, 95% CI: [77.2%, 79.8%]) regardless of context or model. These findings emphasize the risks and opportunities of deploying LLMs in structured and dynamic domains, offering insights into prompt programming and model optimization for safer AI applications.2025-10-15T00:00:00+00:00Copyright (c) 2025 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AIES/article/view/36599Concept Creep in Safe Artificial Intelligence2025-10-15T04:43:26+00:00Laura Fearnleylaura.fearnley@york.ac.ukIbrahim Habliibrahim.habli@york.ac.ukThis paper argues that the concept “safety” in AI has undergone concept creep, a phenomenon which describes the gradual semantic expansion of harm-related concepts. Originally observed in psychology, concept creep involves concepts broadening their meaning both vertically, to include less severe phenomena, and horizontally, to encompass qualitatively new phenomena. We argue that safety, particularly when applied to AI, has crept along both axes. Our analysis traces this creep by contrasting a baseline definition of safety, which is grounded in the discipline of safety science, with contemporary discourse on the safety of AI systems. We demonstrate that safety has crept horizontally to cover new phenomena, such as systemic injustices and existential risks, and it has crept vertically to include less severe phenomena, such as those related to mental wellbeing. The primary aim of this paper is to map the conceptual expansion of safety. We stop short of arguing whether this expansion constitutes progress or regress for the design and development of AI systems. However, we argue that the process of concept creep produces both beneficial and costly effects for society, policy, industry, and academic research communities. We suggest that some of the promising developments and the problematic trends recently witnessed within AI safety discourse can be understood, at least in part, as a consequence of concept creep.2025-10-15T00:00:00+00:00Copyright (c) 2025 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AIES/article/view/36600Social Misattributions in Conversations with Large Language Models2025-10-15T04:43:28+00:00Andrea Ferrarioaferrario@ethz.chAlberto Terminealberto.termine@supsi.chAlessandro Facchinialessandro.facchini@supsi.chWe investigate a typology of socially and ethically risky phenomena emerging from the interaction between humans and large language model (LLM)-based conversational systems. Because they relate to the way in which humans attribute social identity components, such as social roles, to LLM-based conversational systems, we term these phenomena 'social misattributions'. Drawing on foundational works in interactional socio-linguistics, interpersonal pragmatics, and recent debates in the philosophy of technology, we argue that these social misattributions represent higher-order forms of anthropomorphisation of LLM-based conversational systems that are not justified by their technical capabilities and follow from the social context of conversational interactions. We discuss the risks these misattributions pose to human users, including emotional manipulation and unwarranted trust, and propose mitigation strategies.
Our recommendations emphasise the importance of fostering social transparency and exploring approaches, such as frictional design, that are currently promoted in the research domain of human-centred artificial intelligence.2025-10-15T00:00:00+00:00Copyright (c) 2025 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AIES/article/view/36601Bridging the Communication Gap: Evaluating AI Labeling Practices for Trustworthy AI Development2025-10-15T04:43:29+00:00Raphael Fischerraphael.fischer@tu-dortmund.deMagdalena Wischnewskimagdalena.wischnewski@tu-dortmund.deAlexander van der Staayalexander.staay@tu-dortmund.deKatharina Poitzkatharina.poitz@tu-dortmund.deChristian Janieschchristian.janiesch@tu-dortmund.deThomas Liebigthomas.liebig@cs.tu-dortmund.deArtificial intelligence (AI) is becoming integral to the economy and society. However, communication gaps between developers, users, and stakeholders hinder trust and informed decision-making. To make the behavior of AI models more transparent, high-level AI labels have been proposed, drawing inspiration from systems like energy labeling. While AI labels can already provide information on performance trade-offs, for example with regard to predictive model performance and resource efficiency, the practical benefits and limitations of this communication form remain underexplored. Our study evaluates AI labeling through qualitative interviews guided by key research questions. Based on thematic analysis and inductive coding, we first identify a broad range of practitioners with diverse use cases and requirements who are interested in AI labeling. Benefits are primarily seen for bridging communication gaps and aiding non-expert decision-makers. However, our interviewees also mentioned limitations and suggestions for improvement. In comparison to other reporting formats, the reduced complexity of labels was acknowledged to benefit fast knowledge acquisition without deep technical AI expertise. Trustworthiness was found to be strongly influenced by usability and credibility, with mixed preferences for self-certification versus third-party certification. Our insights specifically highlight that AI labels pose a trade-off between simplicity and complexity, address diverse user needs, and nudge interviewee priorities toward sustainability. As such, our study validates AI labels as a valuable tool for enhancing trust and communication in AI, offering actionable guidelines for their refinement and standardization.2025-10-15T00:00:00+00:00Copyright (c) 2025 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AIES/article/view/36602A Taxonomy of Questions for Critical Reflection in Machine-Assisted Decision-Making2025-10-15T04:43:30+00:00Simon W.S. Fischersimon.fischer@donders.ru.nlHanna Schraffenbergerhanna.schraffenberger@ru.nlSerge Thillserge.thill@donders.ru.nlPim Haselagerpim.haselager@donders.ru.nlDecision-makers run the risk of relying too much on machine recommendations, which is associated with lower cognitive engagement. Reflection has been shown to increase cognitive engagement and improve critical thinking and therefore decision-making. Questions are a means to stimulate reflection, but there is a research gap regarding the systematic creation and use of relevant questions for machine-assisted decision-making. We therefore present a taxonomy of questions aimed at promoting reflection and cognitive engagement in order to stimulate a deliberate decision-making process.
Our taxonomy builds on the Socratic questioning method and a question bank for explainable AI. As a starting point, we focus on clinical decision-making. Brief discussions with two medical and three educational researchers provide feedback on the relevance and expected benefits of our taxonomy. Our work contributes to research on mitigating overreliance in human-AI interactions and aims to support effective human oversight as required by the European AI Act.2025-10-15T00:00:00+00:00Copyright (c) 2025 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AIES/article/view/36603LLM-based Simulations of Human Behavior in Psychological Research2025-10-15T04:43:32+00:00Santiago Flórez Sánchezs.florezs2@uniandes.edu.coWhat does it mean for LLMs to replace human participants in psychological research? My analysis of this question is structured around two central philosophical problems: scientific representation and epistemic opacity. By examining how these issues shape trustful and distrustful stances toward using LLMs as models of the human mind, I describe tendencies in the scientific literature and their relation to emerging interpretability and elicitation techniques. In this regard, my primary contributions are, first, a philosophical framework for understanding the conceptual tensions that shape the debate, and second, a taxonomy that maps stances in empirical literature to their corresponding methodological innovations. I show that both trustful and distrustful positions, despite their disagreements, foster the methodological innovations necessary for building a more robust epistemological foundation for LLM-based simulations. Accordingly, empirical research stances must be responsive to the pressures and constraints implied by their underlying philosophical intuitions. This means, for instance, that trustful stances should explore protocols leveraging fine-tuning and prompt design to evaluate correspondence and consistency in more complex behavioral patterns—thereby working around model opacity—while distrustful stances should further develop parallels at the algorithmic and implementational levels between LLMs and the human mind through XAI techniques and computational cognitive science—to probe the representational relationship.2025-10-15T00:00:00+00:00Copyright (c) 2025 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AIES/article/view/36604Dataset-to-Dataset Evaluation Before (and Without) Sharing Data2025-10-15T04:43:33+00:00Keren Fuenteskerenfuentes313@gmail.comMimee Xuxxu@hmc.eduIrene Y. Cheniychen@berkeley.eduPrivacy concerns and competitive interests impede data access for machine learning, due to the inability to privately assess external data's utility. This dynamic disadvantages smaller organizations that lack resources to aggressively pursue data-sharing agreements. In data-limited scenarios, not all external data is beneficial, and collaborations suffer especially in heavily regulated domains: metrics that aim to assess external data given a source (e.g., by approximating their KL-divergence) require accessing samples from both entities pre-collaboration, hence violating privacy. This conundrum disempowers legitimate data-sharing, leading to a false "privacy-utility trade-off". To resolve privacy and uncertainty tensions simultaneously, we introduce SecureKL, the first secure protocol for dataset-to-dataset evaluations with zero privacy leakage, designed to be applied preceding data sharing.
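The dataset-level divergence mentioned above can be made concrete with a minimal, non-private Python sketch of a standard k-nearest-neighbour estimator of KL(P || Q) computed from two sample sets. This illustrates only the baseline quantity that requires access to both datasets; it is not SecureKL's secure protocol, and the function and variable names are illustrative assumptions.

import numpy as np
from scipy.spatial import cKDTree

def knn_kl_divergence(x, y, k=1):
    # k-NN estimate of KL(P || Q) from samples x ~ P and y ~ Q.
    # Assumes continuous features and no duplicate points (zero distances break the log).
    n, d = x.shape
    m = y.shape[0]
    rho = cKDTree(x).query(x, k=k + 1)[0][..., -1]  # distance to k-th neighbour within x, excluding self
    nu = cKDTree(y).query(x, k=k)[0]                # distance to k-th neighbour within y
    if k > 1:
        nu = nu[..., -1]
    return float(d * np.mean(np.log(nu / rho)) + np.log(m / (n - 1)))

# Hypothetical stand-ins for a source dataset and a candidate partner dataset.
rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, size=(500, 4))
candidate = rng.normal(0.5, 1.0, size=(500, 4))
print(knn_kl_divergence(source, candidate, k=3))

Because this estimator needs raw samples from both parties, it exhibits exactly the pre-collaboration privacy problem that SecureKL is designed to avoid.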
SecureKL evaluates a source dataset against candidates, computing dataset divergence metrics internally with private computations, all without assuming downstream models. On real-world data, SecureKL achieves high consistency (>90% correlation with non-private counterparts) and successfully identifies beneficial data collaborations in highly heterogeneous domains (ICU mortality prediction across hospitals and income prediction across states). Our results highlight that secure computation maximizes data utilization, outperforming privacy-agnostic utility assessments that leak information.2025-10-15T00:00:00+00:00Copyright (c) 2025 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AIES/article/view/36605IndiCASA: A Dataset and Bias Evaluation Framework for LLMs Using Contrastive Embedding Similarity in the Indian Context2025-10-15T04:43:34+00:00Santhosh G Ssanthoshgs013@gmail.comAkshay Govind Sme22b102@smail.iitm.ac.inGokul S Krishnangokul@cerai.inBalaraman Ravindranravi@dsai.iitm.ac.inSriraam Natarajansriraam.natarajan@utdallas.eduLarge Language Models (LLMs) have gained significant traction across critical domains owing to their impressive contextual understanding and generative capabilities. However, their increasing deployment in high-stakes applications necessitates rigorous evaluation of embedded biases, particularly in culturally diverse contexts like India where existing embedding-based bias assessment methods often fall short in capturing nuanced stereotypes. We propose an evaluation framework based on an encoder trained using contrastive learning that captures fine-grained bias through embedding similarity. We also introduce a novel dataset, IndiCASA (IndiBias-based Contextually Aligned Stereotypes and Anti-stereotypes), comprising 2,575 human-validated sentences spanning five demographic axes: caste, gender, religion, disability, and socioeconomic status. Our evaluation of multiple open-weight LLMs reveals that all models exhibit some degree of stereotypical bias, with disability-related biases being notably persistent and religion-related bias generally lower, likely due to global debiasing efforts, demonstrating the need for fairer model development.2025-10-15T00:00:00+00:00Copyright (c) 2025 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AIES/article/view/36606Responsible AI Governance in the Public Sector: Explaining Contextual Dynamics Through a Realist Synthesis Review2025-10-15T04:43:35+00:00Ana Gaguaagagua@tudelft.nlHaiko van der Voorth.g.vandervoort@tudelft.nlNihit Goyalnihit.goyal@tudelft.nlAlexander Verbraecka.verbraeck@tudelft.nlResponsible AI (RAI) governance is increasingly understood not as a static checklist of principles, but as a dynamic process embedded in institutional, organisational, and sociotechnical contexts. While several ethical frameworks exist, translating high-level principles into situated organisational practices remains challenging. Empirical studies examining how public sector organisations operationalise RAI remain fragmented, limiting cumulative insights. To address this gap, we conduct a realist synthesis review of 21 empirical studies. Our analysis shows that similar interventions in different contexts activate distinct mechanisms and produce divergent outcomes with varying degrees of alignment to RAI principles.
From these variations, we identify three cross-cutting dynamics explaining outcomes: organisational embeddedness, power-expertise tensions, and trust-transparency relationships. Together, we term these the situated dynamics of RAI governance. This approach moves beyond asking whether interventions “work” to explaining why similar interventions succeed in some contexts and fail in others.2025-10-15T00:00:00+00:00Copyright (c) 2025 Association for the Advancement of Artificial Intelligence