Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society

AIES 2024 Frontmatter

Sanmay Das — 2024-10-17

The mission of AIES is to engage a multidisciplinary group of scholars to think deeplyabout the impact of AI systems on human societies. We are thrilled that the conference is growing while still maintaining very high quality standards and the transdisciplinary nature that is so core to its identity. We are at a critical moment in time, as AI becomes increasingly pervasive. It is our hope that the conversations at AIES continue to drive the work we need to do to ensure that the path forward is a good one.

The frontmatter includes:

A message from the chairs
The winners of the best paper awards
The conference committee
The conference sponsors

PoliTune: Analyzing the Impact of Data Selection and Fine-Tuning on Economic and Political Biases in Large Language Models

Ahmed Agiza — 2024-10-16

In an era where language models are increasingly integrated into decision-making and communication, understanding the biases within Large Language Models (LLMs) becomes imperative, especially when these models are applied in the economic and political domains. This work investigates the impact of fine-tuning and data selection on economic and political biases in LLMs. In this context, we introduce PoliTune, a fine-tuning methodology to explore the systematic aspects of aligning LLMs with specific ideologies, mindful of the biases that arise from their extensive training on diverse datasets. Distinct from earlier efforts that either focus on smaller models or entail resource-intensive pre-training, PoliTune employs Parameter-Efficient Fine-Tuning (PEFT) techniques, which allow for the alignment of LLMs with targeted ideologies by modifying a small subset of parameters. We introduce a systematic method for using the open-source LLM Llama3-70B for dataset selection, annotation, and synthesizing a preferences dataset for Direct Preference Optimization (DPO) to align the model with a given political ideology. We assess the effectiveness of PoliTune through both quantitative and qualitative evaluations of aligning open-source LLMs (Llama3-8B and Mistral-7B) to different ideologies. Our work analyzes the potential of embedding specific biases into LLMs and contributes to the dialogue on the ethical application of AI, highlighting the importance of deploying AI in a manner that aligns with societal values.

All Too Human? Mapping and Mitigating the Risk from Anthropomorphic AI

Canfer Akbulut — 2024-10-16

The development of highly-capable conversational agents, underwritten by large language models, has the potential to shape user interaction with this technology in profound ways, particularly when the technology is anthropomorphic, or appears human-like. Although the effects of anthropomorphic AI are often benign, anthropomorphic design features also create new kinds of risk. For example, users may form emotional connections to human-like AI, creating the risk of infringing on user privacy and autonomy through over-reliance. To better understand the possible pitfalls of anthropomorphic AI systems, we make two contributions: first, we explore anthropomorphic features that have been embedded in interactive systems in the past, and leverage this precedent to highlight the current implications of anthropomorphic design. Second, we propose research directions for informing the ethical design of anthropomorphic AI. In advancing the responsible development of AI, we promote approaches to the ethical foresight, evaluation, and mitigation of harms arising from user interactions with anthropomorphic AI.

Estimating Weights of Reasons Using Metaheuristics: A Hybrid Approach to Machine Ethics

Benoît Alcaraz — 2024-10-16

We present a new approach to representation and acquisition of normative information for machine ethics. It combines an influential philosophical account of the fundamental structure of morality with argumentation theory and machine learning. According to the philosophical account, the deontic status of an action -- whether it is required, forbidden, or permissible -- is determined through the interaction of "normative reasons" of varying strengths or weights. We first provide a formal characterization of this account, by modeling it in (weighted) argumentation graphs. We then use it to model ethical learning: the basic idea is to use a set of cases for which deontic statuses are known to estimate the weights of normative reasons in operation in these cases, and to use these weight estimates to determine the deontic statuses of actions in new cases. The result is an approach that has the advantages of both bottom-up and top-down approaches to machine ethics: normative information is acquired through the interaction with training data, and its meaning is clear. We also report the results of some initial experiments with the model.

Introducing the AI Governance and Regulatory Archive (AGORA): An Analytic Infrastructure for Navigating the Emerging AI Governance Landscape

Zachary Arnold — 2024-10-16

AI-related laws, standards, and norms are emerging rapidly. However, a lack of shared descriptive concepts and monitoring infrastructure undermine efforts to track, understand, and improve AI governance. We introduce AGORA (the AI Governance and Regulatory Archive), a rigorously compiled and enriched dataset of AI-focused laws and policies encompassing diverse jurisdictions, institutions, and contexts related to AI. AGORA is oriented around an original taxonomy describing risks, potential harms, governance strategies, incentives for compliance, and application domains addressed in AI regulatory documents. At launch, AGORA included data on over 330 instruments, with new entries being added continuously. We describe the manual and automated processes through which these data are systematically compiled, screened, annotated, and validated, enabling deep, efficient, and reliable analysis of the emerging AI governance landscape. The dataset, supporting information, and analyses are available through a public web interface (https://agora.eto.tech) and bulk dataset.

Understanding Intrinsic Socioeconomic Biases in Large Language Models

Mina Arzaghi — 2024-10-16

Large Language Models (LLMs) are increasingly integrated into critical decision-making processes, such as loan approvals and visa applications, where inherent biases can lead to discriminatory outcomes. In this paper, we examine the nuanced relationship between demographic attributes and socioeconomic biases in LLMs, a crucial yet understudied area of fairness in LLMs. We introduce a novel dataset of one million English sentences to systematically quantify socioeconomic biases across various demographic groups. Our findings reveal pervasive socioeconomic biases in both established models such as GPT-2 and state-of-the-art models like Llama 2 and Falcon. We demonstrate that these biases are significantly amplified when considering intersectionality, with LLMs exhibiting a remarkable capacity to extract multiple demographic attributes from names and then correlate them with specific socioeconomic biases. This research highlights the urgent necessity for proactive and robust bias mitigation techniques to safeguard against discriminatory outcomes when deploying these powerful models in critical real-world applications. Warning: This paper discusses and contains content that can be offensive or upsetting.

Nothing Comes Without Its World – Practical Challenges of Aligning LLMs to Situated Human Values through RLHF

Anne Arzberger — 2024-10-16

Work on value alignment aims to ensure that human values are respected by AI systems. However, existing approaches tend to rely on universal framings of human values that obscure the question of which values the systems should capture and align with, given the variety of operational situations. This often results in AI systems that privilege only a selected few while perpetuating problematic norms grounded on biases, ultimately causing equity and justice issues. In this perspective paper, we unpack the limitations of predominant alignment practices of reinforcement learning from human feedback (RLHF) for LLMs through the lens of situated values. We build on feminist epistemology to argue that at the design-time, RLHF has problems with representation in the subjects providing feedback and implicitness in the conceptualization of values and situations of real-world users while lacking system adaptation to real user situations at the use time. To address these shortcomings, we propose three research directions: 1) situated annotation to capture information about the crowdworker’s and user’s values and judgments in relation to specific situations at both the design and use-time, 2) expressive instruction to encode plural values for instructing LLMs systems at design-time, and 3) reflexive adaptation to leverage situational knowledge for system adaption at use-time. We conclude by reflecting on the practical challenges of pursuing these research directions and situated value alignment of AI more broadly.

Kid-Whisper: Towards Bridging the Performance Gap in Automatic Speech Recognition for Children VS. Adults

Ahmed Adel Attia — 2024-10-16

Recent advancements in Automatic Speech Recognition (ASR) systems, exemplified by Whisper, have demonstrated the potential of these systems to approach human-level performance given sufficient data. However, this progress doesn’t readily extend to ASR for children due to the lim- ited availability of suitable child-specific databases and the distinct characteristics of children’s speech. A recent study investigated leveraging the My Science Tutor (MyST) chil- dren’s speech corpus to enhance Whisper’s performance in recognizing children’s speech. They were able to demon- strate some improvement on a limited testset. This paper builds on these findings by enhancing the utility of the MyST dataset through more efficient data preprocessing. We reduce the Word Error Rate (WER) on the MyST testset 13.93% to 9.11% with Whisper-Small and from 13.23% to 8.61% with Whisper-Medium and show that this improvement can be generalized to unseen datasets. We also highlight important challenges towards improving children’s ASR performance and the effect of fine-tuning in improving the transcription of disfluent speech.

Public Attitudes on Performance for Algorithmic and Human Decision-Makers (Extended Abstract)

Kirk Bansak — 2024-10-16

This study explores public preferences between algorithmic and human decision-makers (DMs) in high-stakes contexts, how these preferences are impacted by performance metrics, and whether the public's evaluation of performance differs when considering algorithmic versus human DMs. Leveraging a conjoint experimental design, respondents (n = 9,000) chose between pairs of DM profiles in two scenarios: pre-trial release decisions and bank loan decisions. DM profiles varied on the DM’s type (human vs. algorithm) and on three metrics—defendant crime rate/loan default rate, false positive rate (FPR) among white defendants/applicants, and FPR among minority defendants/applicants—as well as an implicit (un)fairness metric defined by the absolute difference between the two FPRs. Controlling for performance, we observe a general tendency to favor human DMs, though this is driven by a subset of respondents who expect human DMs to perform better in the real world, and there is an analogous group with the opposite preference for algorithmic DMs. We also find that the relative importance of the four performance metrics remains consistent across DM type, suggesting that the public's preferences related to DM performance do not vary fundamentally between algorithmic and human DMs. Taken together, the results collectively suggest that people have very different beliefs about what type of DM (human or algorithmic) will deliver better performance and should be preferred, but they have similar desires in terms of what they want that performance to be regardless of DM type.

Simulating Policy Impacts: Developing a Generative Scenario Writing Method to Evaluate the Perceived Effects of Regulation

Julia Barnett — 2024-10-16

The rapid advancement of AI technologies yields numerous future impacts on individuals and society. Policymakers are tasked to react quickly and establish policies that mitigate those impacts. However, anticipating the effectiveness of policies is a difficult task, as some impacts might only be observable in the future and respective policies might not be applicable to the future development of AI. In this work we develop a method for using large language models (LLMs) to evaluate the efficacy of a given piece of policy at mitigating specified negative impacts. We do so by using GPT-4 to generate scenarios both pre- and post-introduction of policy and translating these vivid stories into metrics based on human perceptions of impacts. We leverage an already established taxonomy of impacts of generative AI in the media environment to generate a set of scenario pairs both mitigated and non-mitigated by the transparency policy in Article 50 of the EU AI Act. We then run a user study (n=234) to evaluate these scenarios across four risk-assessment dimensions: severity, plausibility, magnitude, and specificity to vulnerable populations. We find that this transparency legislation is perceived to be effective at mitigating harms in areas such as labor and well-being, but largely ineffective in areas such as social cohesion and security. Through this case study we demonstrate the efficacy of our method as a tool to iterate on the effectiveness of policy for mitigating various negative impacts. We expect this method to be useful to researchers or other stakeholders who want to brainstorm the potential utility of different pieces of policy or other mitigation strategies.

The Origin and Opportunities of Developers’ Perceived Code Accountability in Open Source AI Software Development

Sebastian Clemens Bartsch — 2024-10-16

Open source (OS) software projects in artificial intelligence (AI), such as TensorFlow and scikit-learn, depend on developers' continuous, voluntary code contributions. However, recent security incidents highlighted substantial risks in such software, requiring examinations of factors motivating developers to continuously contribute high-quality code (i.e., providing secure and reliable code fulfilling its functions). Prior research suggests code accountability (i.e., requirements to explain and justify contributed code) to improve code quality, enforced through external accountability mechanisms such as sanctions and rewards. However, the OS domain often lacks such mechanisms, questioning whether and how code accountability arises in this domain and how it affects code contributions. To address these questions, we conducted 26 semi-structured interviews with developers contributing to OS AI software projects. Our findings reveal that despite the absence of external accountability mechanisms, system-, project-, and individual-related factors evoke developers' perceived code accountability. Notably, we discovered a trade-off as high perceived code accountability is associated with higher code quality but discourages developers from participating in OS AI software projects. Overall, this study contributes to understanding the nuanced roles of perceived code accountability in continuously contributing high-quality code without external accountability mechanisms and highlights the complex trade-offs developers face in OS AI software projects.

Gender in Pixels: Pathways to Non-binary Representation in Computer Vision

Elena Beretta — 2024-10-16

In the field of Computer Vision (CV), the study of bias, including gender bias, has received a significant area of attention in recent years. However, these studies predominantly operate within a binary, cisnormative framework, often neglecting the complexities of non-binary gender identities. To date, there is no comprehensive analysis of how CV is addressing the mitigation of bias for non-binary individuals or how it seeks solutions that transcend a binary view of gender. This systematic scoping review aims to fill this gap by analyzing over 60 papers that delve into gender biases in CV, with a particular emphasis on non-binary perspectives. Our findings indicate that despite the increasing recognition of gender as a multifaceted and complex construct, practical applications of this understanding in CV remain limited and fragmented. The review critically examines the foundational research critiquing the binarism in CV and explores emerging approaches that challenge and move beyond this limited perspective. We highlight innovative solutions, including algorithmic adaptations and the creation of more inclusive and diverse datasets. Furthermore, the study emphasizes the importance of integrating gender theory into CV practices to develop more accurate and representative models. Our recommendations advocate for interdisciplinary collaboration, particularly with Gender Studies, to foster a more nuanced understanding of gender in CV. This study serves as a pivotal step towards redefining gender representation in CV, encouraging researchers and practitioners to embrace and incorporate a broader spectrum of gender identities in their work.

Legal Minds, Algorithmic Decisions: How LLMs Apply Constitutional Principles in Complex Scenarios

Camilla Bignotti — 2024-10-16

In this paper, we conduct an empirical analysis of how large language models (LLMs), specifically GPT-4, interpret constitutional principles in complex decision-making scenarios. We examine rulings from the Italian Constitutional Court on bioethics issues that involve trade-offs between competing values and compare GPT’s legal arguments on these issues to those presented by the State, the Court, and the applicants. Our results indicate that GPT consistently aligns more closely with progressive interpretations of the Constitution, often overlooking competing values and mirroring the applicants’ views rather than the more conservative perspectives of the State or the Court’s moderate positions. Our findings raise important questions about the value alignment of LLMs in scenarios where societal values are in conflict, as our experiment demonstrates GPT’s tendency to align with progressive legal interpretations. We thus underscore the importance of testing alignment in real-world scenarios and considering the implications of deploying LLMs in decision-making processes.

A Formal Account of Trustworthiness: Connecting Intrinsic and Perceived Trustworthiness

Piercosma Bisconti — 2024-10-16

This paper proposes a formal account of AI trustworthiness, connecting both intrinsic and perceived trustworthiness in an operational schematization. We argue that trustworthiness extends beyond the inherent capabilities of an AI system to include significant influences from observers' perceptions, such as perceived transparency, agency locus, and human oversight. While the concept of perceived trustworthiness is discussed in the literature, few attempts have been made to connect it with the intrinsic trustworthiness of AI systems. Our analysis introduces a novel schematization to quantify trustworthiness by assessing the discrepancies between expected and observed behaviors and how these affect perceived uncertainty and trust. The paper provides a formalization for measuring trustworthiness, taking into account both perceived and intrinsic characteristics. By detailing the factors that influence trust, this study aims to foster more ethical and widely accepted AI technologies, ensuring they meet both functional and ethical criteria.

Unsocial Intelligence: An Investigation of the Assumptions of AGI Discourse

Borhane Blili-Hamelin — 2024-10-16

Dreams of machines rivaling human intelligence have shaped the field of AI since its inception. Yet, the very meaning of human-level AI or artificial general intelligence (AGI) remains elusive and contested. Definitions of AGI embrace a diverse range of incompatible values and assumptions. Contending with the fractured worldviews of AGI discourse is vital for critiques that pursue different values and futures. To that end, we provide a taxonomy of AGI definitions, laying the ground for examining the key social, political, and ethical assumptions they make. We highlight instances in which these definitions frame AGI or human-level AI as a technical topic and expose the value-laden choices being implicitly made. Drawing on feminist, STS, and social science scholarship on the political and social character of intelligence in both humans and machines, we propose contextual, democratic, and participatory paths to imagining future forms of machine intelligence. The development of future forms of AI must involve explicit attention to the values it encodes, the people it includes or excludes, and a commitment to epistemic justice.

On The Stability of Moral Preferences: A Problem with Computational Elicitation Methods

Kyle Boerstler — 2024-10-16

Preference elicitation frameworks feature heavily in the research on participatory ethical AI tools and provide a viable mechanism to enquire and incorporate the moral values of various stakeholders. As part of the elicitation process, surveys about moral preferences, opinions, and judgments are typically administered only once to each participant. This methodological practice is reasonable if participants’ responses are stable over time such that, all other things being held constant, their responses today will be the same as their responses to the same questions at a later time. However, we do not know how often that is the case. It is possible that participants’ true moral preferences change, are subject to temporary moods or whims, or are influenced by environmental factors we don’t track. If participants’ moral responses are unstable in such ways, it would raise important methodological and theoretical issues for how participants’ true moral preferences, opinions, and judgments can be ascertained. We address this possibility here by asking the same survey participants the same moral questions about which patient should receive a kidney when only one is available ten times in ten different sessions over two weeks, varying only presentation order across sessions. We measured how often participants gave different responses to simple (Study One) and more complicated (Study Two) controversial and uncontroversial repeated scenarios. On average, the fraction of times participants changed their responses to controversial scenarios (i.e., were unstable) was around 10-18% (±14-15%) across studies, and this instability is observed to have positive associations with response time and decision-making difficulty. We discuss the implications of these results for the efficacy of common moral preference elicitation methods, highlighting the role of response instability in potentially causing value misalignment between the stakeholders and AI tools trained on their moral judgments.

Co-designing an AI Impact Assessment Report Template with AI Practitioners and AI Compliance Experts

Edyta Bogucka — 2024-10-16

In the evolving landscape of AI regulation, it is crucial for companies to conduct impact assessments and document their compliance through comprehensive reports. However, current reports lack grounding in regulations and often focus on specific aspects like privacy in relation to AI systems, without addressing the real-world uses of these systems. Moreover, there is no systematic effort to design and evaluate these reports with both AI practitioners and AI compliance experts. To address this gap, we conducted an iterative co-design process with 14 AI practitioners and 6 AI compliance experts and proposed a template for impact assessment reports grounded in the EU AI Act, NIST's AI Risk Management Framework, and ISO 42001 AI Management System. We evaluated the template by producing an impact assessment report for an AI-based meeting companion at a major tech company. A user study with 8 AI practitioners from the same company and 5 AI compliance experts from industry and academia revealed that our template effectively provides necessary information for impact assessments and documents the broad impacts of AI systems. Participants envisioned using the template not only at the pre-deployment stage for compliance but also as a tool to guide the design stage of AI uses.

Foundation Model Transparency Reports

Rishi Bommasani — 2024-10-16

Foundation models are critical digital technologies with sweeping societal impact that necessitates transparency. To codify how foundation model developers should provide transparency about the development and deployment of their models, we propose Foundation Model Transparency Reports, drawing upon the transparency reporting practices in social media. While external documentation of societal harms prompted social media transparency reports, our objective is to institutionalize transparency reporting for foundation models while the industry is still nascent. To design our reports, we identify 6 design principles given the successes and shortcomings of social media transparency reporting. To further schematize our reports, we draw upon the 100 transparency indicators from the Foundation Model Transparency Index. Given these indicators, we measure the extent to which they overlap with the transparency requirements included in six prominent government policies (e.g. the EU AI Act, the US Executive Order on Safe, Secure, and Trustworthy AI). Well-designed transparency reports could reduce compliance costs, in part due to overlapping regulatory requirements across different jurisdictions. We encourage foundation model developers to regularly publish transparency reports, building upon recommendations from the G7 and the White House.

Ecosystem Graphs: Documenting the Foundation Model Supply Chain

Rishi Bommasani — 2024-10-16

Foundation models (e.g. GPT-4, Gemini, Llama 3) pervasively influence society, warranting greater understanding. While the models garner much attention, accurately characterizing their impact requires considering the broader sociotechnical ecosystem in which they are created and deployed. We propose Ecosystem Graphs as a documentation framework to centralize knowledge of this ecosystem. Ecosystem Graphs is composed of assets (datasets, models, applications) linked together by dependencies that indicate technical and social relationships. To supplement the graph structure, each asset is further enriched with fine-grained metadata, such as the model’s estimated training emissions or licensing guidelines. Since its release in March 2023, Ecosystem Graphs represents an ongoing effort to document 568 assets (112 datasets, 359 models, 97 applications) from 117 organizations. Ecosystem Graphs functions as a multifunctional resource: we discuss two major uses by the 2024 AI Index and the UK’s Competition and Markets Authority that demonstrate the value of Ecosystem Graphs.

Trustworthy Social Bias Measurement

Rishi Bommasani — 2024-10-16

How do we design measures of social bias that we trust? While prior work has introduced several measures, no measure has gained widespread trust: instead, mounting evidence argues we should distrust these measures. In this work, we design bias measures that warrant trust based on the cross-disciplinary theory of measurement modeling. To combat the frequently fuzzy treatment of social bias in natural language processing, we explicitly define social bias, grounded in principles drawn from social science research. We operationalize our definition by proposing a general bias measurement framework DivDist, which we use to instantiate 5 concrete bias measures. To validate our measures, we propose a rigorous testing protocol with 8 testing criteria (e.g. predictive validity: do measures predict biases in US employment?). Through our testing, we demonstrate considerable evidence to trust our measures, showing they overcome conceptual, technical, and empirical deficiencies present in prior measures.

Views on AI Aren't Binary — They’re Plural (Extended Abstract)

Thorin Bristow — 2024-10-16

Recent developments in AI have brought broader attention to tensions between two overlapping communities, “AI Ethics” and “AI Safety.” In this article we (i) characterize this false binary, (ii) argue that a simple binary is not an accurate model of AI discourse, and (iii) provide concrete suggestions for how individuals can help avoid the emergence of us-vs-them conflict in the broad community of people working on AI development and governance. While we focus on “AI Ethics” and “AI Safety,” the general lessons apply to related tensions, including those between accelerationist (“e/acc”) and cautious stances on AI development.

A Qualitative Study on Cultural Hegemony and the Impacts of AI

Venetia Brown — 2024-10-16

Understanding the future consequences of artificial intelligence requires a holistic consideration of its cultural dimensions, on par with its technological intricacies and potential applications. Individuals and institutions working closely with AI, and with considerable resources, have significant influence on how impact is considered, particularly with regard to how much attention is paid to epistemic concerns (including issues of bias in datasets or potential misinterpretations of data, for example) versus normative concerns (such as societal and ecological effects of AI in the medium- and long-term). In this paper we review qualitative studies conducted with AI researchers and developers to understand how they position themselves relative to each of these two dimensions of impact, and how geographies and conditions of work influence their positions. Our findings underscore the need to gather more perspectives from low- and middle-income countries, whose notions of impact extend beyond the immediate technical concerns or impacts in the short- to medium-term. Rather, they encapsulate a broader spectrum of impact considerations, including the deleterious effects perpetrated by global corporate entities, the unwarranted influence of wealthy nations, the encroachment of philanthrocapitalism, and the adverse consequences of excluding communities affected by these phenomena from active participation in discussions surrounding impact.

An FDA for AI? Pitfalls and Plausibility of Approval Regulation for Frontier Artificial Intelligence

Daniel Carpenter — 2024-10-16

Observers and practitioners of artificial intelligence (AI) have proposed an FDA-style licensing regime for the most advanced AI models, or 'frontier' models. In this paper, we explore the applicability of approval regulation -- that is, regulation of a product that combines experimental minima with government licensure conditioned partially or fully upon that experimentation -- to the regulation of frontier AI. There are a number of reasons to believe that approval regulation, simplistically applied, would be inapposite for frontier AI risks. Domains of weak fit include the difficulty of defining the regulated product, the presence of Knightian uncertainty or deep ambiguity about harms from AI, the potentially transmissible nature of risks, and distributed activities among actors involved in the AI lifecycle. We conclude by highlighting the role of policy learning and experimentation in regulatory development, describing how learning from other forms of AI regulation and improvements in evaluation and testing methods can help to overcome some of the challenges we identify.

Why Am I Still Seeing This: Measuring the Effectiveness of Ad Controls and Explanations in AI-Mediated Ad Targeting Systems

Jane Castleman — 2024-10-16

Recently, Meta has shifted towards AI-mediated ad targeting mechanisms that do not require advertisers to provide detailed targeting criteria. The shift is likely driven by excitement over AI capabilities as well as the need to address new data privacy policies and targeting changes agreed upon in civil rights settlements. At the same time, in response to growing public concern about the harms of targeted advertising, Meta has touted their ad preference controls as an effective mechanism for users to exert control over the advertising they see. Furthermore, Meta markets their "Why this ad" targeting explanation as a transparency tool that allows users to understand the reasons for seeing particular ads and inform their actions to control what ads they see in the future. Our study evaluates the effectiveness of Meta's "See less" ad control, as well as the actionability of ad targeting explanations following the shift to AI-mediated targeting. We conduct a large-scale study, randomly assigning participants the intervention of marking "See less" to either Body Weight Control or Parenting topics, and collecting the ads Meta shows to participants and their targeting explanations before and after the intervention. We find that utilizing the "See less" ad control for the topics we study does not significantly reduce the number of ads shown by Meta on these topics, and that the control is less effective for some users whose demographics are correlated with the topic. Furthermore, we find that the majority of ad targeting explanations for local ads made no reference to location-specific targeting criteria, and did not inform users why ads related to the topics they requested to "See less" of continued to be delivered. We hypothesize that the poor effectiveness of controls and lack of actionability and comprehensiveness in explanations are the result of the shift to AI-mediated targeting, for which explainability and transparency tools have not yet been developed by Meta. Our work thus provides evidence for the need of new methods for transparency and user control, suitable and reflective of how the increasingly complex and AI-mediated ad delivery systems operate.

Coordinated Flaw Disclosure for AI: Beyond Security Vulnerabilities

Sven Cattell — 2024-10-16

Harm reporting in Artificial Intelligence (AI) currently lacks a structured process for disclosing and addressing algorithmic flaws, relying largely on an ad-hoc approach. This contrasts sharply with the well-established Coordinated Vulnerability Disclosure (CVD) ecosystem in software security. While global efforts to establish frameworks for AI transparency and collaboration are underway, the unique challenges presented by machine learning (ML) models demand a specialized approach. To address this gap, we propose implementing a Coordinated Flaw Disclosure (CFD) framework tailored to the complexities of ML and AI issues. This paper reviews the evolution of ML disclosure practices, from ad hoc reporting to emerging participatory auditing methods, and compares them with cybersecurity norms. Our framework introduces innovations such as extended model cards, dynamic scope expansion, an independent adjudication panel, and an automated verification process. We also outline a forthcoming real-world pilot of CFD. We argue that CFD could significantly enhance public trust in AI systems. By balancing organizational and community interests, CFD aims to improve AI accountability in a rapidly evolving technological landscape.

Algorithm-Assisted Decision Making and Racial Disparities in Housing: A Study of the Allegheny Housing Assessment Tool

Lingwei Cheng — 2024-10-16

The demand for housing assistance across the United States far exceeds the supply, leaving housing providers the task of prioritizing clients for receipt of this limited resource. To be eligible for federal funding, local homelessness systems are required to implement assessment tools as part of their prioritization processes. The Vulnerability Index Service Prioritization Decision Assistance Tool (VI-SPDAT) is the most commonly used assessment tool nationwide. Recent studies have criticized the VI-SPDAT as exhibiting racial bias, which may lead to unwarranted racial disparities in housing provision. In response to these criticisms, some jurisdictions have developed alternative tools, such as the Allegheny Housing Assessment (AHA), which uses algorithms to assess clients' risk levels. Drawing on data from its deployment, we conduct descriptive and quantitative analyses to evaluate whether replacing the VI-SPDAT with the AHA affects racial disparities in housing allocation. We find that the VI-SPDAT tended to assign higher risk scores to white clients and lower risk scores to Black clients, and that white clients were served at a higher rates pre-AHA deployment. While post-deployment service decisions became better aligned with the AHA score, and the distribution of AHA scores is similar across racial groups, we do not find evidence of a corresponding decrease in disparities in service rates. We attribute the persistent disparity to the use of Alt-AHA, a survey-based tool that is used in cases of low data quality, as well as group differences in eligibility-related factors, such as chronic homelessness and veteran status. We discuss the implications for housing service systems seeking to reduce racial disparities in their service delivery.

Beyond Thumbs Up/Down: Untangling Challenges of Fine-Grained Feedback for Text-to-Image Generation

Katherine M. Collins — 2024-10-16

Human feedback plays a critical role in learning and refining reward models for text-to-image generation, but the optimal form the feedback should take for learning an accurate reward function has not been conclusively established. This paper investigates the effectiveness of fine-grained feedback which captures nuanced distinctions in image quality and prompt-alignment, compared to traditional coarse-grained feedback (for example, thumbs up/down or ranking between a set of options). While fine-grained feedback holds promise, particularly for systems catering to diverse societal preferences, we show that demonstrating its superiority to coarse-grained feedback is not automatic. Through experiments on real and synthetic preference data, we surface the complexities of building effective models due to the interplay of model choice, feedback type, and the alignment between human judgment and computational interpretation. We identify key challenges in eliciting and utilizing fine-grained feedback, prompting a reassessment of its assumed benefits and practicality. Our findings -- e.g., that fine-grained feedback can lead to worse models for a fixed budget, in some settings; however, in controlled settings with known attributes, fine grained rewards can indeed be more helpful -- call for careful consideration of feedback attributes and potentially beckon novel modeling approaches to appropriately unlock the potential value of fine-grained feedback in-the-wild.

MoJE: Mixture of Jailbreak Experts, Naive Tabular Classifiers as Guard for Prompt Attacks

Giandomenico Cornacchia — 2024-10-16

The proliferation of Large Language Models (LLMs) in diverse applications underscores the pressing need for robust security measures to thwart potential jailbreak attacks. These attacks exploit vulnerabilities within LLMs, endanger data integrity and user privacy. Guardrails serve as crucial protective mechanisms against such threats, but existing models often fall short in terms of both detection accuracy, and computational efficiency. This paper advocates for the significance of jailbreak attack prevention on LLMs, and emphasises the role of input guardrails in safeguarding these models. We introduce MoJE (Mixture of Jailbreak Expert), a novel guardrail architecture designed to surpass current limitations in existing state-of-the-art guardrails. By employing simple linguistic statistical techniques, MoJE excels in detecting jailbreak attacks while maintaining minimal computational overhead during model inference. Through rigorous experimentation, MoJE demonstrates superior performance capable of detecting 90% of the attacks without compromising benign prompts, enhancing LLMs security against jailbreak attacks.

APPRAISE: a Governance Framework for Innovation with Artificial Intelligence Systems

Diptish Dey — 2024-10-16

As artificial intelligence (AI) systems increasingly impact society, the EU Artificial Intelligence Act (AIA) is the first legislative attempt to regulate AI systems. This paper proposes a governance framework for organizations innovating with AI systems. Building upon secondary research, the framework aims at driving a balance between four types of pressures that organizations, innovating with AI, experience, and thereby creating responsible value. These pressures encompass AI/technology, normative, value creation, and regulatory aspects. The framework is partially validated through primary research in two phases. In the first phase, a conceptual model is proposed that measures the extent to which organizational tasks result in AIA compliance, using elements from the AIA as mediators and strategic variables such as organization size, extent of outsourcing, and offshoring as moderators. 34 organizations in the Netherlands are surveyed to test the conceptual model. The average actual compliance score of the 34 participants is low, and most participants exaggerate their compliance. Organization size is found to have significant impact on AIA compliance. In phase 2, two case studies are conducted with the purpose of generating in-depth insights to validate the proposed framework. The case studies confirm the interplay of the four pressures on organizations innovating with AI, and furthermore substantiate the governance framework.

Scaling Laws Do Not Scale

Fernando Diaz — 2024-10-16

Recent work has advocated for training AI models on ever-larger datasets, arguing that as the size of a dataset increases, the performance of a model trained on that dataset will correspondingly increase (referred to as “scaling laws”). In this paper, we draw on literature from the social sciences and machine learning to critically interrogate these claims. We argue that this scaling law relationship depends on metrics used to measure performance that may not correspond with how different groups of people perceive the quality of models' output. As the size of datasets used to train large AI models grows and AI systems impact ever larger groups of people, the number of distinct communities represented in training or evaluation datasets grows. It is thus even more likely that communities represented in datasets may have values or preferences not reflected in (or at odds with) the metrics used to evaluate model performance in scaling laws. Different communities may also have values in tension with each other, leading to difficult, potentially irreconcilable choices about metrics used for model evaluations---threatening the validity of claims that model performance is improving at scale. We end the paper with implications for AI development: that the motivation for scraping ever-larger datasets may be based on fundamentally flawed assumptions about model performance. That is, models may not, in fact, continue to improve as the datasets get larger---at least not for all people or communities impacted by those models. We suggest opportunities for the field to rethink norms and values in AI development, resisting claims for universality of large models, fostering more local, small-scale designs, and other ways to resist the impetus towards scale in AI.

What Makes An Expert? Reviewing How ML Researchers Define "Expert"

Mark Diaz — 2024-10-16

Human experts are often engaged in the development of machine learning systems to collect and validate data, consult on algorithm development, and evaluate system performance. At the same time, who counts as an ‘expert’ and what constitutes ‘expertise’ is not always explicitly defined. In this work, we review 112 academic publications that explicitly reference ‘expert’ and ‘expertise’ and that describe the development of machine learning (ML) systems to survey how expertise is characterized and the role experts play. We find that expertise is often undefined and forms of knowledge outside of formal education and professional certification are rarely sought, which has implications for the kinds of knowledge that are recognized and legitimized in ML development. Moreover, we find that expert knowledge tends to be utilized in ways focused on mining textbook knowledge, such as through data annotation. We discuss the ways experts are engaged in ML development in relation to deskilling, the social construction of expertise, and implications for responsible AI development. We point to a need for reflection and specificity in justifications of domain expert engagement, both as a matter of documentation and reproducibility, as well as a matter of broadening the range of recognized expertise.

SoUnD Framework: Analyzing (So)cial Representation in (Un)structured (D)ata

Mark Diaz — 2024-10-16

Decisions about how to responsibly collect, use and document data often rely upon understanding how people are represented in data. Yet, the unlabeled nature and scale of data used in foundation model development poses a direct challenge to systematic analyses of downstream risks, such as representational harms. We provide a framework designed to help RAI practitioners more easily plan and structure analyses of how people are represented in unstructured data and identify downstream risks. The framework is organized into groups of analyses that map to 3 basic questions: 1) Who is represented in the data, 2) What content is in the data, and 3) How are the two associated. We use the framework to analyze human representation in two commonly used datasets: the Common Crawl web corpus (C4) of 356 billion tokens, and the LAION-400M dataset of 400 million text-image pairs, both developed in the English language. We illustrate how the framework informs action steps for hypothetical teams faced with data use, development, and documentation decisions. Ultimately, the framework structures human representation analyses and maps out analysis planning considerations, goals, and risk mitigation actions at different stages of dataset and model development.

Outlier Detection Bias Busted: Understanding Sources of Algorithmic Bias through Data-centric Factors

Xueying Ding — 2024-10-16

The astonishing successes of ML have raised growing concern for the fairness of modern methods when deployed in real world settings. However, studies on fairness have mostly focused on supervised ML, while unsupervised outlier detection (OD), with numerous applications in finance, security, etc., have attracted little attention. While a few studies proposed fairness-enhanced OD algorithms, they remain agnostic to the underlying driving mechanisms or sources of unfairness. Even within the supervised ML literature, there exists debate on whether unfairness stems solely from algorithmic biases (i.e. design choices) or from the biases encoded in the data on which they are trained. To close this gap, this work aims to shed light on the possible sources of unfairness in OD by auditing detection models under different data-centric factors.By injecting various known biases into the input data---as pertain to sample size disparity, under-representation, feature measurement noise, and group membership obfuscation---we find that the OD algorithms under the study all exhibit fairness pitfalls, although differing in which types of data bias they are more susceptible to. Most notable of our study is to demonstrate that OD algorithm bias is not merely a data bias problem. A key realization is that the data properties that emerge from bias injection could as well be organic---as pertain to natural group differences w.r.t. sparsity, base rate, variance, and multi-modality. Either natural or biased, such data properties can give rise to unfairness as they interact with certain algorithmic design choices. Our work provides a deeper understanding of the possible sources of OD unfairness, and serves as a framework for assessing the unfairness of future OD algorithms under specific data-centric factors. It also paves the way for future work on mitigation strategies by underscoring the susceptibility of various design choices.

Legitimating Emotion Tracking Technologies in Driver Monitoring Systems

Aaron Doerfler — 2024-10-16

Contemporary automobiles are now incorporating digital technologies, including emotion recognition technologies intended to monitor and sometimes intervene on the driver’s mood, attentiveness, or emotional state. We investigate how the firms producing these technologies justify and legitimate their design, production, and use, and how these discourses of legitimation paint a picture of the desired social role of emotion recognition in the automotive sector. Through a critical discourse analysis of patents, advertising, and promotional materials from industry-leading companies Cerence and Affectiva/Smart Eye, we argue both companies use potentially spurious arguments about the accuracy of emotion recognition to rationalize their products. Both companies also use a variety of other legitimation techniques around driver safety, individual personalization, and increased productivity to re-frame the social aspects of digitally mediated autonomous vehicles on their terms.

Representation Magnitude Has a Liability to Privacy Vulnerability

Xingli Fang — 2024-10-16

The privacy-preserving approaches to machine learning (ML) models have made substantial progress in recent years. However, it is still opaque in which circumstances and conditions the model becomes privacy-vulnerable, leading to a challenge for ML models to maintain both performance and privacy. In this paper, we first explore the disparity between member and non-member data in the representation of models under common training frameworks.We identify how the representation magnitude disparity correlates with privacy vulnerability and address how this correlation impacts privacy vulnerability. Based on the observations, we propose Saturn Ring Classifier Module (SRCM), a plug-in model-level solution to mitigate membership privacy leakage. Through a confined yet effective representation space, our approach ameliorates models’ privacy vulnerability while maintaining generalizability. The code of this work can be found here: https://github.com/JEKimLab/AIES2024SRCM

Red-Teaming for Generative AI: Silver Bullet or Security Theater?

Michael Feffer — 2024-10-16

In response to rising concerns surrounding the safety, security, and trustworthiness of Generative AI (GenAI) models, practitioners and regulators alike have pointed to AI red-teaming as a key component of their strategies for identifying and mitigating these risks. However, despite AI red-teaming’s central role in policy discussions and corporate messaging, significant questions remain about what precisely it means, what role it can play in regulation, and how it relates to conventional red-teaming practices as originally conceived in the field of cybersecurity. In this work, we identify recent cases of red-teaming activities in the AI industry and conduct an extensive survey of relevant research literature to characterize the scope, structure, and criteria for AI red-teaming practices. Our analysis reveals that prior methods and practices of AI red-teaming diverge along several axes, including the purpose of the activity (which is often vague), the artifact under evaluation, the setting in which the activity is conducted (e.g., actors, resources, and methods), and the resulting decisions it informs (e.g., reporting, disclosure, and mitigation). In light of our findings, we argue that while red-teaming may be a valuable big-tent idea for characterizing GenAI harm mitigations, and that industry may effectively apply red-teaming and other strategies behind closed doors to safeguard AI, gestures towards red-teaming (based on public definitions) as a panacea for every possible risk verge on security theater. To move toward a more robust toolbox of evaluations for generative AI, we synthesize our recommendations into a question bank meant to guide and scaffold future AI red-teaming practices.

How Should AI Decisions Be Explained? Requirements for Explanations from the Perspective of European Law

Benjamin Fresz — 2024-10-16

This paper investigates the relationship between law and eXplainable Artificial Intelligence (XAI). While there is much discussion about the AI Act, which was adopted by the European Parliament in March 2024, other areas of law seem underexplored. This paper focuses on European (and in part German) law, although with international concepts and regulations such as fiduciary duties, the General Data Protection Regulation (GDPR), and product safety and liability. Based on XAI-taxonomies, requirements for XAI methods are derived from each of the legal fields, resulting in the conclusion that each legal field requires different XAI properties and that the current state of the art does not fulfill these to full satisfaction, especially regarding the correctness (sometimes called fidelity) and confidence estimates of XAI methods.

Surviving in Diverse Biases: Unbiased Dataset Acquisition in Online Data Market for Fair Model Training

Jiashi Gao — 2024-10-16

The online data markets have emerged as a valuable source of diverse datasets for training machine learning (ML) models. However, datasets from different data providers may exhibit varying levels of bias with respect to certain sensitive attributes in the population (such as race, sex, age, and marital status). Recent dataset acquisition research has focused on maximizing accuracy improvements for downstream model training, ignoring the negative impact of biases in the acquired datasets, which can lead to an unfair model. Can a consumer obtain an unbiased dataset from datasets with diverse biases? In this work, we propose a fairness-aware data acquisition framework (FAIRDA) to acquire high-quality datasets that maximize both accuracy and fairness for consumer local classifier training while remaining within a limited budget. Given the biases of data commodities remain opaque to consumers, the data acquisition in FAIRDA employs explore-exploit strategies. Based on whether exploration and exploitation are conducted sequentially or alternately, we introduce two algorithms: the knowledge-based offline data acquisition (KDA) and the reward-based online data acquisition algorithms (RDA). Each algorithm is tailored to specific customer needs, giving the former an advantage in computational efficiency and the latter an advantage in robustness. We conduct experiments to demonstrate the effectiveness of the proposed data acquisition framework in steering users toward fairer model training compared to existing baselines under varying market settings.

“I Don’t See Myself Represented Here at All”: User Experiences of Stable Diffusion Outputs Containing Representational Harms across Gender Identities and Nationalities

Sourojit Ghosh — 2024-10-16

Though research into text-to-image generators (T2Is) such as Stable Diffusion has demonstrated their amplification of societal biases and potentials to cause harm, such research has primarily relied on computational methods instead of seeking information from real users who experience harm, which is a significant knowledge gap. In this paper, we conduct the largest human subjects study of Stable Diffusion, with a combination of crowdsourced data from 133 crowdworkers and 14 semi-structured interviews across diverse countries and genders. Through a mixed-methods approach of intra-set cosine similarity hierarchies (i.e., comparing multiple Stable Diffusion outputs for the same prompt with each other to examine which result is `closest' to the prompt) and qualitative thematic analysis, we first demonstrate a large disconnect between user expectations for Stable Diffusion outputs with those generated, evidenced by a set of Stable Diffusion renditions of `a Person' providing images far away from such expectations. We then extend this finding of general dissatisfaction into highlighting representational harms caused by Stable Diffusion upon our subjects, especially those with traditionally marginalized identities, subjecting them to incorrect and often dehumanizing stereotypes about their identities. We provide recommendations for a harm-aware approach to (re)design future versions of Stable Diffusion and other T2Is.

Do Generative AI Models Output Harm while Representing Non-Western Cultures: Evidence from A Community-Centered Approach

Sourojit Ghosh — 2024-10-16

Our research investigates the impact of Generative Artificial Intelligence (GAI) models, specifically text-to-image generators (T2Is), on the representation of non-Western cultures, with a focus on Indian contexts. Despite the transformative potential of T2Is in content creation, concerns have arisen regarding biases that may lead to misrepresentations and marginalizations. Through a Non-Western community-centered approach and grounded theory analysis of 5 focus groups from diverse Indian subcultures, we explore how T2I outputs to English input prompts depict Indian culture and its subcultures, uncovering novel representational harms such as exoticism and cultural misappropriation. These findings highlight the urgent need for inclusive and culturally sensitive T2I systems. We propose design guidelines informed by a sociotechnical perspective, contributing to the development of more equitable and representative GAI technologies globally. Our work underscores the necessity of adopting a community-centered approach to comprehend the sociotechnical dynamics of these models, complementing existing work in this space while identifying and addressing the potential negative repercussions and harms that may arise as these models are deployed on a global scale.

Interpretations, Representations, and Stereotypes of Caste within Text-to-Image Generators

Sourojit Ghosh — 2024-10-16

The surge in the popularity of text-to-image generators (T2Is) has been matched by extensive research into ensuring fairness and equitable outcomes, with a focus on how they impact society. However, such work has typically focused on globally-experienced identities or centered Western contexts. In this paper, we address interpretations, representations, and stereotypes surrounding a tragically underexplored context in T2I research: caste. We examine how the T2I Stable Diffusion displays people of various castes, and what professions they are depicted as performing. Generating 100 images per prompt, we perform CLIP-cosine similarity comparisons with default depictions of an `Indian person’ by Stable Diffusion, and explore patterns of similarity. Our findings reveal how Stable Diffusion outputs perpetuate systems of `castelessness’, equating Indianness with high-castes and depicting caste-oppressed identities with markers of poverty. In particular, we note the stereotyping and representational harm towards the historically-marginalized Dalits, prominently depicted as living in rural areas and always at protests. Our findings underscore a need for a caste-aware approach towards T2I design, and we conclude with design recommendations.

The PPOu Framework: A Structured Approach for Assessing the Likelihood of Malicious Use of Advanced AI Systems

Josh A. Goldstein — 2024-10-16

The diffusion of increasingly capable AI systems has produced concern that bad actors could intentionally misuse current or future AI systems for harm. Governments have begun to create new entities—such as AI Safety Institutes—tasked with assessing these risks. However, approaches for risk assessment are currently fragmented and would benefit from broader disciplinary expertise. As it stands, it is often unclear whether concerns about malicious use misestimate the likelihood and severity of the risks. This article advances a conceptual framework to review and structure investigation into the likelihood of an AI system (X) being applied to a malicious use (Y). We introduce a three-stage framework of (1) Plausibility (can X be used to do Y at all?), (2) Performance (how well does X do Y?), and (3) Observed use (do actors use X to do Y in practice?). At each stage, we outline key research questions, methodologies, benefits and limitations, and the types of uncertainty addressed. We also offer ideas for directions to improve risk assessment moving forward.

Risks from Language Models for Automated Mental Healthcare: Ethics and Structure for Implementation (Extended Abstract)

Declan Grabb — 2024-10-16

In the United States and other countries exists a “national mental health crisis”: Rates of suicide, depression, anxiety, substance use, and more continue to increase – exacerbated by isolation, the COVID pandemic, and, most importantly, lack of access to mental healthcare. Therefore, many are looking to AI-enabled digital mental health tools, which have the potential to reach many patients who would otherwise remain on wait lists or without care. The main drive behind these new tools is the focus on large language models that could enable real-time, personalized support and advice for patients. With a trend towards language models entering the mental healthcare delivery apparatus, questions arise about how a robust, high-level framework to guide ethical implementations would look like and whether existing language models are ready for this high-stakes application where individual failures can lead to dire consequences. This paper addresses the ethical and practical challenges custom to mental health applications and proposes a structured framework that delineates levels of autonomy, outlines ethical requirements, and defines beneficial default behaviors for AI agents in the context of mental health support. We also evaluate fourteen state-of-the-art language models (ten off-the-shelf, four fine-tuned) using 16 mental health-related questions designed to reflect various mental health conditions, such as psychosis, mania, depression, suicidal thoughts, and homicidal tendencies. The question design and response evaluations were conducted by mental health clinicians (M.D.s) with defined rubrics and criteria for each question that would define "safe," "unsafe," and "borderline" (between safe and unsafe) for reproducibility. We find that all tested language models are insufficient to match the standard provided by human professionals who can navigate nuances and appreciate context. This is due to a range of issues, including overly cautious or sycophantic responses and the absence of necessary safeguards. Alarmingly, we find that most of the tested models could cause harm if accessed in mental health emergencies, failing to protect users and potentially exacerbating existing symptoms. We explore solutions to enhance the safety of current models based on system prompt engineering and model-generated self-critiques. Before the release of increasingly task-autonomous AI systems in mental health, it is crucial to ensure that these models can reliably detect and manage symptoms of common psychiatric disorders to prevent harm to users. This involves aligning with the ethical framework and default behaviors outlined in our study. We contend that model developers are responsible for refining their systems per these guidelines to safeguard against the risks posed by current AI technologies to user mental health and safety. Our code and the redacted data set are available on Github (github.com/maxlampe/taimh_eval, MIT License). The full, unredacted data set is available upon request due to the harmful content contained.

Compassionate AI for Moral Decision-Making, Health, and Well-Being

Mark Graves — 2024-10-16

The rapid expansion of artificial intelligence (AI) technology promises plausible increases to human flourishing, health, and well-being but raises concerns about possible harms and increased suffering. By making AI compassionate, the alleviation of suffering becomes explicit, rather than proxied, and potential harms caused by AI automation can be turned into benefits. Compassionate healthcare is beneficial for patient health outcomes and satisfaction and improves caregiver resilience and burnout. AI automation has many benefits but may interfere with patient care and autonomy. Incorporating compassion into healthcare reduces potential harms, increases health benefits and well-being, and can protect patient autonomy while providing more responsive and equitable care. Whether and how one conceives of AI as plausibly compassionate depends on ethical concerns and cultural context, including assumptions about human nature and AI personhood. Insights from Buddhism have contributed to scholarship on compassion and can extend incomplete Western perspectives on AI possibilities and limitations. Psychological research on the elements of compassion can guide development of compassionate AI and its incorporation into healthcare. Compassionate AI can be deployed especially into application areas where compassion plays an essential role with high demands on the compassion capacity of caregivers, such as dementia eldercare and palliative care.

A Conceptual Framework for Ethical Evaluation of Machine Learning Systems

Neha R. Gupta — 2024-10-16

Research in Responsible AI has developed a range of principles and practices to ensure that machine learning systems are used in a manner that is ethical and aligned with human values. However, a critical yet often neglected aspect of ethical ML is the ethical implications that appear when designing evaluations of ML systems. For instance, teams may have to balance a trade-off between highly informative tests to ensure downstream product safety, with potential fairness harms inherent to the implemented testing procedures. We conceptualize ethics-related concerns in standard ML evaluation techniques. Specifically, we present a utility framework, characterizing the key trade-off in ethical evaluation as balancing information gain against potential ethical harms. The framework is then a tool for characterizing challenges teams face, and systematically disentangling competing considerations that teams seek to balance. Differentiating between different types of issues encountered in evaluation allows us to highlight best practices from analogous domains, such as clinical trials and automotive crash testing, which navigate these issues in ways that can offer inspiration to improve evaluation processes in ML. Our analysis underscores the critical need for development teams to deliberately assess and manage ethical complexities that arise during the evaluation of ML systems, and for the industry to move towards designing institutional policies to support ethical evaluations.

Identifying Implicit Social Biases in Vision-Language Models

Kimia Hamidieh — 2024-10-16

Vision-language models, like CLIP (Contrastive Language Image Pretraining), are becoming increasingly popular for a wide range of multimodal retrieval tasks. However, prior work has shown that large language and deep vision models can learn historical biases contained in their training sets, leading to perpetuation of stereotypes and potential downstream harm. In this work, we conduct a systematic analysis of the social biases that are present in CLIP, with a focus on the interaction between image and text modalities. We first propose a taxonomy of social biases called So-B-It, which contains 374 words categorized across ten types of bias. Each type can lead to societal harm if associated with a particular demographic group. Using this taxonomy, we examine images retrieved by CLIP from a facial image dataset using each word as part of a prompt. We find that CLIP frequently displays undesirable associations between harmful words and specific demographic groups, such as retrieving mostly pictures of Middle Eastern men when asked to retrieve images of a "terrorist". Finally, we conduct an analysis of the source of such biases, by showing that the same harmful stereotypes are also present in a large image-text dataset used to train CLIP models for examples of biases that we find. Our findings highlight the importance of evaluating and addressing bias in vision-language models, and suggest the need for transparency and fairness-aware curation of large pre-training datasets.

A Causal Framework to Evaluate Racial Bias in Law Enforcement Systems

Jessy Xinyi Han — 2024-10-16

We are interested in developing a data-driven method to evaluate race-induced biases in law enforcement systems. While recent works have addressed this question in the context of police-civilian interactions using police stop data, they have two key limitations. First, bias can only be properly quantified if true criminality is accounted for in addition to race, but it is absent in prior works. Second, law enforcement systems are multi-stage and hence it is important to isolate the true source of bias within the "causal chain of interactions" rather than simply focusing on the end outcome; this can help guide reforms. In this work, we address these challenges by presenting a multi-stage causal framework incorporating criminality. We provide a theoretical characterization and an associated data-driven method to evaluate (a) the presence of any form of racial bias, and (b) if so, the primary source of such a bias in terms of race and criminality. Our framework identifies three canonical scenarios with distinct characteristics: in settings like (1) airport security, the primary source of observed bias against a race is likely to be bias in law enforcement against innocents of that race; (2) AI-empowered policing, the primary source of observed bias against a race is likely to be bias in law enforcement against criminals of that race; and (3) police-civilian interaction, the primary source of observed bias against a race could be bias in law enforcement against that race or bias from the general public in reporting (e.g. via 911 calls) against the other race. Through an extensive empirical study using police-civilian interaction (stop) data and 911 call data, we And an instance of such a counter-intuitive phenomenon: in New Orleans, the observed bias is against the majority race and the likely reason for it is the over-reporting (via 911 calls) of incidents involving the minority race by the general public.

Contributory Injustice, Epistemic Calcification and the Use of AI Systems in Healthcare

Mahi Hardalupas — 2024-10-16

AI systems have long been touted as a means to transform the healthcare system and improve service user outcomes. However, these claims frequently ignore the social context that leaves service users subject to epistemic oppression. This paper introduces the term “epistemic calcification” to describe how the use of AI systems leads to our epistemological systems becoming stuck in fixed frameworks for understanding the world. Epistemic calcification leads to contributory injustice as it reduces the ability of healthcare systems to meaningfully consider alternative understandings of people’s health experiences. By analysing examples of algorithmic prognosis and diagnosis, this paper demonstrates the challenges of addressing contributory injustice in AI systems and the need for contestability to focus on more than the AI system and on the underlying epistemologies of AI systems.

ExploreGen: Large Language Models for Envisioning the Uses and Risks of AI Technologies

Viviane Herdel — 2024-10-16

Responsible AI design is increasingly seen as an imperative by both AI developers and AI compliance experts. One of the key tasks is envisioning AI technology uses and risks. Recent studies on the model and data cards reveal that AI practitioners struggle with this task due to its inherently challenging nature. Here, we demonstrate that leveraging a Large Language Model (LLM) can support AI practitioners in this task by enabling reflexivity, brainstorming, and deliberation, especially in the early design stages of the AI development process. We developed an LLM framework, ExploreGen, which generates realistic and varied uses of AI technology, including those overlooked by research, and classifies their risk level based on the EU AI Act regulation. We evaluated our framework using the case of Facial Recognition and Analysis technology in nine user studies with 25 AI practitioners. Our findings show that ExploreGen is helpful to both developers and compliance experts. They rated the uses as realistic and their risk classification as accurate (94.5%). Moreover, while unfamiliar with many of the uses, they rated them as having high adoption potential and transformational impact.

What's Distributive Justice Got to Do with It? Rethinking Algorithmic Fairness from a Perspective of Approximate Justice

Corinna Hertweck — 2024-10-16

In the field of algorithmic fairness, many fairness criteria have been proposed. Oftentimes, their proposal is only accompanied by a loose link to ideas from moral philosophy -- which makes it difficult to understand when the proposed criteria should be used to evaluate the fairness of a decision-making system. More recently, researchers have thus retroactively tried to tie existing fairness criteria to philosophical concepts. Group fairness criteria have typically been linked to egalitarianism, a theory of distributive justice. This makes it tempting to believe that fairness criteria mathematically represent ideals of distributive justice and this is indeed how they are typically portrayed. In this paper, we will discuss why the current approach of linking algorithmic fairness and distributive justice is too simplistic and, hence, insufficient. We argue that in the context of imperfect decision-making systems -- which is what we deal with in algorithmic fairness -- we should not only care about what the ideal distribution of benefits/harms among individuals would look like but also about how deviations from said ideal are distributed. Our claim is that algorithmic fairness is concerned with unfairness in these deviations. This requires us to rethink the way in which we, as algorithmic fairness researchers, view distributive justice and use fairness criteria.

Afrofuturist Values for the Metaverse (Extended Abstract)

Theresa Hice-Fromille — 2024-10-16

Many emerging technologies, such as the immersive VR and AR devices forming the metaverse, are not just reminiscent of but inspired by devices found in popular science fiction texts. Yet, the stories that these technologies are drawn from do not often center marginalized communities and people of color. In this article, we propose that builders and users of these technologies turn to diverse creative texts as inspiration for the ethical codes that will shape the ways that these technologies are built and used. A study of 39 speculative fiction texts, including 20 that we identified as Afrofuturist, revealed three overarching themes that serve as recommendations for the creation and maintenance of a diverse and inclusive metaverse: Collective Power, Inclusive Engagement, and Cultural Specificity. We outline each recommendation through a textual analysis of three Afrofuturist texts – Esi Edugyan’s Washington Black (2018), Roger Ross Williams’ Traveling While Black (2019), and Ryan Coogler’s Black Panther (2018) – and specify the undercurrents of collectivity and co-production that bind them together. We suggest collaborative and critical reading methods for industry professionals and community members which may help to shape democratic processes governing the future of AI.

The Ethico-Politics of Design Toolkits: Responsible AI Tools, From Big Tech Guidelines to Feminist Ideation Cards (Extended Abstract)

Tomasz Hollanek — 2024-10-16

This paper interrogates the belief in toolkitting as a method for translating AI ethics theory into practice and assesses the toolkit paradigm’s effect on the understanding of ethics in AI research and AI-related policy. I start by exploring the ethico-political assumptions that underly most ethical AI toolkits. Through a meta-critique of toolkits (drawing on a review of existing ‘toolkit-scoping’ work), I demon-strate that most toolkits embody a reductionist conception of ethics and that, because of this, their capacity for facili-tating change and challenging the status quo is limited. Then, I analyze the features of several ‘alternative’ toolkits – informed by feminist theory, posthumanism, and critical design – whose creators recognize that ethics cannot be-come a box-ticking exercise for engineers, while the ethical should not be dissociated from the political. Finally, in the concluding section, referring to broader theories and cri-tiques of toolkitting as a method for structuring the design process, I suggest how different stakeholders can draw on the myriad of available tools, ranging from big tech com-panies’ guidelines to feminist design ideation cards, to achieve positive, socially desirable results, while rejecting the oversimplification of ethical practice and technosolu-tionism that many responsible AI toolkits embody. The analysis thus serves to provide suggestions for future toolkit creators and users on how to meaningfully adopt the toolkit format in AI ethics work without overselling its transformative potential.

LLM Platform Security: Applying a Systematic Evaluation Framework to OpenAI's ChatGPT Plugins

Umar Iqbal — 2024-10-16

Large language model (LLM) platforms, such as ChatGPT, have recently begun offering an app ecosystem to interface with third-party services on the internet. While these apps extend the capabilities of LLM platforms, they are developed by arbitrary third parties and thus cannot be implicitly trusted. Apps also interface with LLM platforms and users using natural language, which can have imprecise interpretations. In this paper, we propose a framework that lays a foundation for LLM platform designers to analyze and improve the security, privacy, and safety of current and future third-party integrated LLM platforms. Our framework is a formulation of an attack taxonomy that is developed by iteratively exploring how LLM platform stakeholders could leverage their capabilities and responsibilities to mount attacks against each other. As part of our iterative process, we apply our framework in the context of OpenAI's plugin (apps) ecosystem. We uncover plugins that concretely demonstrate the potential for the types of issues that we outline in our attack taxonomy. We conclude by discussing novel challenges and by providing recommendations to improve the security, privacy, and safety of present and future LLM-based computing platforms. The full version of this paper is available online at https://arxiv.org/abs/2309.10254

As an AI Language Model, "Yes I Would Recommend Calling the Police": Norm Inconsistency in LLM Decision-Making

Shomik Jain — 2024-10-16

We investigate the phenomenon of norm inconsistency: where LLMs apply different norms in similar situations. Specifically, we focus on the high-risk application of deciding whether to call the police in Amazon Ring home surveillance videos. We evaluate the decisions of three state-of-the-art LLMs — GPT-4, Gemini 1.0, and Claude 3 Sonnet — in relation to the activities portrayed in the videos, the subjects' skin-tone and gender, and the characteristics of the neighborhoods where the videos were recorded. Our analysis reveals significant norm inconsistencies: (1) a discordance between the recommendation to call the police and the actual presence of criminal activity, and (2) biases influenced by the racial demographics of the neighborhoods. These results highlight the arbitrariness of model decisions in the surveillance context and the limitations of current bias detection and mitigation strategies in normative decision-making.

Breaking the Global North Stereotype: A Global South-centric Benchmark Dataset for Auditing and Mitigating Biases in Facial Recognition Systems

Siddharth Jaiswal — 2024-10-16

Facial Recognition Systems (FRSs) are being developed and deployed all around the world at unprecedented rates. Most platforms are designed in a limited set of countries, but deployed in other regions too, without adequate checkpoints for region-specific requirements. This is especially problematic for Global South countries which lack strong legislation to safeguard persons facing disparate performance of these systems. A combination of unavailability of datasets, lack of understanding of how FRSs function and low-resource bias mitigation measures accentuate the problems at hand. In this work, we propose a self-curated face dataset composed of 6,579 unique male and female sports-persons (cricket players) from eight countries around the world. More than 50% of the dataset is composed of individuals from the Global South countries and is demographically diverse. To aid adversarial audits and robust model training, we curate four adversarial variants of each image in the dataset, leading to more than 40,000 distinct images. We also use this dataset to benchmark five popular facial recognition systems (FRSs), including both commercial and open-source FRSs, for the task of gender prediction (and country prediction for one of the open-source models as an example of red-teaming). Experiments on industrial FRSs reveal accuracies ranging from 98.2% (in case of Azure) to 38.1% (in case of Face++), with a large disparity between males and females in the Global South (max difference of 38.5% in case of Face++). Biases are also observed in all FRSs between females of the Global North and South (max difference of ~50%). A Grad-CAM analysis shows that the nose, forehead and mouth are the regions of interest for one of the open-source FRSs. Based on this crucial observation, we design simple, low-resource bias mitigation solutions using few-shot and novel contrastive learning techniques that demonstrate a significant improvement in accuracy with disparity between males and females reducing from 50% to 1.5% in one of the settings. For the red-teaming experiment using the open-source Deepface model we observe that simple fine-tuning is not very useful while contrastive learning brings steady benefits.

Reflection of Its Creators: Qualitative Analysis of General Public and Expert Perceptions of Artificial Intelligence

Theodore Jensen — 2024-10-16

The increasing prevalence of artificial intelligence (AI) will likely lead to new interactions and impacts for the general public. An understanding of people’s perceptions of AI can be leveraged to design and deploy AI systems toward human needs and values. We conducted semi-structured interviews with 25 individuals in the general public and 20 AI experts in the United States (U.S.) to assess perceptions of AI across levels of expertise. Qualitative analysis revealed that ideas about humanness and ethics were central to perceptions of AI in both groups. Humanness, the set of traits considered to distinguish humans from other intelligent actors, was used to articulate beliefs about AI’s characteristics. Ethics arose in discussions of the role of technology in society and centered around views of AI as made and used by people. General public and expert participants expressed similar perceptions of AI, but articulated beliefs slightly differently. We discuss the implications of humanness-related beliefs and ethical concerns for AI development and deployment.

Virtual Assistants Are Unlikely to Reduce Patient Non-Disclosure

Corinne Jorgenson — 2024-10-16

The ethical use of AI typically involves setting boundaries on its deployment. Ethical guidelines advise against practices that involve deception, privacy infringement, or discriminatory actions. However, ethical considerations can also identify areas where using AI is desirable and morally necessary. For instance, it has been argued that AI could contribute to more equitable justice systems. Another area where ethical considerations can make AI deployment imperative is healthcare. For example, patients often withhold pertinent details from healthcare providers due to fear of judgment. However, utilizing virtual assistants to gather patients' health histories could be a potential solution. Ethical imperatives support using such technology if patients are more inclined to disclose information to an AI system. This article presents findings from several survey studies investigating whether virtual assistants can reduce non-disclosure behaviors. Unfortunately, the evidence suggests that virtual assistants are unlikely to minimize non-disclosure. Therefore, the potential benefits of virtual assistants due to reduced non-disclosure are unlikely to outweigh their ethical risks.

Do Responsible AI Artifacts Advance Stakeholder Goals? Four Key Barriers Perceived by Legal and Civil Stakeholders

Anna Kawakami — 2024-10-16

The responsible AI (RAI) community has introduced numerous processes and artifacts---such as Model Cards, Transparency Notes, and Data Cards---to facilitate transparency and support the governance of AI systems. While originally designed to scaffold and document AI development processes in technology companies, these artifacts are becoming central components of regulatory compliance under recent regulations such as the EU AI Act. Much of the existing literature has focussed primarily on the design of new RAI artifacts, or an examination of their use by practitioners within technology companies. However, as RAI artifacts begin to play key roles in enabling external oversight, it becomes critical to understand how stakeholders---particularly stakeholders situated outside of technology companies who govern and audit industry AI deployments---perceive the efficacy of RAI artifacts. In this study, we conduct semi-structured interviews and design activities with 19 government, legal, and civil society stakeholders who inform policy and advocacy around responsible AI efforts. While participants believe that RAI artifacts are a valuable contribution to the RAI ecosystem, many have concerns around their potential unintended and longer-term impacts on actors outside of technology companies (e.g., downstream end-users, policymakers, civil society stakeholders). We organized these beliefs into four barriers that help explain how RAI artifacts may (inadvertently) reconfigure power relations across civil society, government, and industry, impeding civil society and legal stakeholders' ability to protect downstream end-users from potential AI harms. Participants envision how structural changes, along with changes in how RAI artifacts are designed, used, and governed, could help re-direct the role and impacts of artifacts in the RAI ecosystem. Drawing on these findings, we discuss research and policy implications for RAI artifacts.

AI Failure Loops in Feminized Labor: Understanding the Interplay of Workplace AI and Occupational Devaluation

Anna Kawakami — 2024-10-16

A growing body of literature has focused on understanding and addressing workplace AI design failures. However, past work has largely overlooked the role of occupational devaluation in shaping the dynamics of AI development and deployment. In this paper, we examine the case of feminized labor: a class of devalued occupations historically misnomered as ``women's work,'' such as social work, K-12 teaching, and home healthcare. Drawing on literature on AI deployments in feminized labor contexts, we conceptualize AI Failure Loops: a set of interwoven, socio-technical failures that help explain how the systemic devaluation of workers' expertise negatively impacts, and is impacted by, AI design, evaluation, and governance practices. These failures demonstrate how misjudgments on the automatability of workers' skills can lead to AI deployments that fail to bring value and, instead, further diminish the visibility of workers' expertise. We discuss research and design implications for workplace AI, especially for devalued occupations.

Epistemic Injustice in Generative AI

Jackie Kay — 2024-10-16

This paper investigates how generative AI can potentially undermine the integrity of collective knowledge and the processes we rely on to acquire, assess, and trust information, posing a significant threat to our knowledge ecosystem and democratic discourse. Grounded in social and political philosophy, we introduce the concept of generative algorithmic epistemic injustice. We identify four key dimensions of this phenomenon: amplified and manipulative testimonial injustice, along with hermeneutical ignorance and access injustice. We illustrate each dimension with real-world examples that reveal how generative AI can produce or amplify misinformation, perpetuate representational harm, and create epistemic inequities, particularly in multilingual contexts. By highlighting these injustices, we aim to inform the development of epistemically just generative AI systems, proposing strategies for resistance, system design principles, and two approaches that leverage generative AI to foster a more equitable information ecosystem, thereby safeguarding democratic values and the integrity of knowledge production.

Vernacularizing Taxonomies of Harm is Essential for Operationalizing Holistic AI Safety

Wm. Matthew Kennedy — 2024-10-16

Operationalizing AI ethics and safety principles and frameworks is essential to realizing the potential benefits and mitigating potential harms caused by AI systems. To that end, actors across industry, academia, and regulatory bodies have created formal taxonomies of harm to support operationalization efforts. These include novel “holistic” methods that go beyond exclusive reliance on technical benchmarking. However, our paper argues that such taxonomies are still too general to be readily implemented in sector-specific AI safety operationalization efforts, and especially in underresourced or “high-risk” sectors. This is because many sectors are constituted by discourses, norms, and values that “refract” or even directly conflict with those operating in society more broadly. Drawing from emerging anthropological theories of human rights, we propose that the process of “vernacularization”—a participatory, decolonial practice distinct from doctrinary “translation” (the dominant mode of AI safety operationalization)—can help bridge this gap. To demonstrate this point, we consider the education sector, and identify precisely how vernacularizing a leading taxonomy of harm leads to a clearer view of how harms AI systems may cause are substantially intensified when deployed in educational spaces. We conclude by discussing the generalizability of vernacularization as a useful AI safety methodology.

On the Pros and Cons of Active Learning for Moral Preference Elicitation

Vijay Keswani — 2024-10-16

Computational preference elicitation methods are tools used to learn people’s preferences quantitatively in a given context. Recent works on preference elicitation advocate for active learning as an efficient method to iteratively construct queries (framed as comparisons between context-specific cases) that are likely to be most informative about an agent’s underlying preferences. In this work, we argue that the use of active learning for moral preference elicitation relies on certain assumptions about the underlying moral preferences, which can be violated in practice. Specifically, we highlight the following common assumptions (a) preferences are stable over time and not sensitive to the sequence of presented queries, (b) the appropriate hypothesis class is chosen to model moral preferences, and (c) noise in the agent’s responses is limited. While these assumptions can be appropriate for preference elicitation in certain domains, prior research on moral psychology suggests they may not be valid for moral judgments. Through a synthetic simulation of preferences that violate the above assumptions, we observe that active learning can have similar or worse performance than a basic random query selection method in certain settings. Yet, simulation results also demonstrate that active learning can still be viable if the degree of instability or noise is relatively small and when the agent’s preferences can be approximately represented with the hypothesis class used for learning. Our study highlights the nuances associated with effective moral preference elicitation in practice and advocates for the cautious use of active learning as a methodology to learn moral preferences.

Algorithmic Fairness From the Perspective of Legal Anti-discrimination Principles

Vijay Keswani — 2024-10-16

Real-world applications of machine learning (ML) algorithms often propagate negative stereotypes and social biases against marginalized groups. In response, the field of fair machine learning has proposed technical solutions for a variety of settings that aim to correct the biases in algorithmic predictions. These solutions remove the dependence of the final prediction on the protected attributes (like gender or race) and/or ensure that prediction performance is similar across demographic groups. Yet, recent studies assessing the impact of these solutions in practice demonstrate their ineffectiveness in tackling real-world inequalities. Given this lack of real-world success, it is essential to take a step back and question the design motivations of algorithmic fairness interventions. We use popular legal anti-discriminatory principles, specifically anti-classification and anti-subordination principles, to study the motivations of fairness interventions and their applications. The anti-classification principle suggests addressing discrimination by ensuring that decision processes and outcomes are independent of the protected attributes of individuals. The anti-subordination principle, on the other hand, argues that decision-making policies can provide equal protection to all only by actively tackling societal hierarchies that enable structural discrimination, even if that requires using protected attributes to address historical inequalities. Through a survey of the fairness mechanisms and applications, we assess different components of fair ML approaches from the perspective of these principles. We argue that the observed shortcomings of fair ML algorithms are similar to the failures of anti-classification policies and that these shortcomings constitute violations of the anti-subordination principle. Correspondingly, we propose guidelines for algorithmic fairness interventions to adhere to the anti-subordination principle. In doing so, we hope to bridge critical concepts between legal frameworks for non-discrimination and fairness in machine learning.

What’s Your Stake in Sustainability of AI?: An Informed Insider’s Guide

Grace C. Kim — 2024-10-16

It's no secret that AI systems come with a significant environmental cost. This raises the question: What are the roles and responsibilities of computing professionals regarding the sustainability of AI? Informed by a year-long informal literature review on the subject, we employ stakeholder identification, analysis, and mapping to highlight the complex and interconnected roles that five major stakeholder groups (industry, practitioners, regulatory, advocacy, and the general public) play in the sustainability of AI. Swapping the traditional final step of stakeholder methods (stakeholder engagement) for entanglement, we demonstrate the inherent entwinement of choices made with regard to the development and maintenance of AI systems and the people who impact (or are impacted by) these choices. This entanglement should be understood as a system of human and non-human agents, with the implications of each choice ricocheting into the use of natural resources and climate implications. We argue that computing professionals (AI-focused or not) may belong to multiple stakeholder groups, and that we all have multiple roles to play in the sustainability of AI. Further, we argue that the nature of regulation in this domain will look unlike others in environmental preservation (e.g., legislation around water contaminants). As a result, we call for ongoing, flexible bodies and policies to move towards the regulation of AI from a sustainability angle, as well as suggest ways in which individual computing professionals can contribute to fighting the environmental and climate effects of AI.

Anticipating the Risks and Benefits of Counterfactual World Simulation Models (Extended Abstract)

Lara Kirfel — 2024-10-16

This paper examines the transformative potential of Counterfactual World Simulation Models (CWSMs). CWSMs use pieces of multi-modal evidence, such as the CCTV footage or sound recordings of a road accident, to build a high-fidelity 3D reconstruction of the scene. They can also answer causal questions, such as whether the accident happened because the driver was speeding, by simulating what would have happened in relevant counterfactual situations. CWSMs will enhance our capacity to envision alternate realities and investigate the outcomes of counterfactual alterations to how events unfold. This also, however, raises questions about what alternative scenarios we should be considering and what to do with that knowledge. We present a normative and ethical framework that guides and constrains the simulation of counterfactuals. We address the challenge of ensuring fidelity in reconstructions while simultaneously preventing stereotype perpetuation during counterfactual simulations. We anticipate different modes of how users will interact with CWSMs and discuss how their outputs may be presented. Finally, we address the prospective applications of CWSMs in the legal domain, recognizing both their potential to revolutionize legal proceedings as well as the ethical concerns they engender. Anticipating a new type of AI, this paper seeks to illuminate a path forward for responsible and effective use of CWSMs.

Acceptable Use Policies for Foundation Models

Kevin Klyman — 2024-10-16

As foundation models have accumulated hundreds of millions of users, developers have begun to take steps to prevent harmful types of uses. One salient intervention that foundation model developers adopt is acceptable use policies—legally binding policies that prohibit users from using a model for specific purposes. This paper identifies acceptable use policies from 30 foundation model developers, analyzes the use restrictions they contain, and argues that acceptable use policies are an important lens for understanding the regulation of foundation models. Taken together, developers’ acceptable use policies include 127 distinct use restrictions; the wide variety in the number and type of use restrictions may create fragmentation across the AI supply chain. Companies also employ acceptable use policies to prevent competitors or specific industries from making use of their models. Developers alone decide what constitutes acceptable use, and rarely provide transparency about how they enforce their policies. In practice, acceptable use policies are difficult to enforce, and scrupulous enforcement can act as a barrier to researcher access and limit beneficial uses of foundation models. Acceptable use policies for foundation models are an early example of self-regulation that have a significant impact on the market for foundation models and the AI ecosystem.

Responsible Reporting for Frontier AI Development

Noam Kolt — 2024-10-16

Mitigating the risks from frontier AI systems requires up-to-date and reliable information about those systems. Organizations that develop and deploy frontier systems have significant access to such information. By reporting safety-critical information to actors in government, industry, and civil society, these organizations could improve visibility into new and emerging risks posed by frontier systems. Equipped with this information, developers could make better informed decisions on risk management, while policymakers could design more targeted and robust regulatory infrastructure. We outline the key features of responsible reporting and propose mechanisms for implementing them in practice.

On the Trade-offs between Adversarial Robustness and Actionable Explanations

Satyapriya Krishna — 2024-10-16

As machine learning models are increasingly being employed in various high-stakes settings, it becomes important to ensure that predictions of these models are not only adversarially robust, but also readily explainable to relevant stakeholders. However, it is unclear if these two notions can be simultaneously achieved or if there exist trade-offs between them. In this work, we make one of the first attempts at studying the impact of adversarially robust models on actionable explanations which provide end users with a means for recourse. We theoretically and empirically analyze the cost (ease of implementation) and validity (probability of obtaining a positive model prediction) of recourses output by state-of-the-art algorithms when the underlying models are adversarially robust vs. non-robust. More specifically, we derive theoretical bounds on the differences between the cost and the validity of the recourses generated by state-of-the-art algorithms for adversarially robust vs. non-robust linear and non-linear models. Our empirical results with multiple real-world datasets validate our theoretical results and show the impact of varying degrees of model robustness on the cost and validity of the resulting recourses. Our analyses demonstrate that adversarially robust models significantly increase the cost and reduce the validity of the resulting recourses, thus shedding light on the inherent trade-offs between adversarial robustness and actionable explanations.

Observing Context Improves Disparity Estimation when Race is Unobserved

Kweku Kwegyir-Aggrey — 2024-10-16

In many domains, it is difficult to obtain the race data that is required to estimate racial disparity. To address this problem, practitioners have adopted the use of proxy methods which predict race using non-protected covariates. However, these proxies often yield biased estimates, especially for minority groups, limiting their real-world utility. In this paper, we introduce two new contextual proxy models that advance existing methods by incorporating contextual features in order to improve race estimates. We show that these algorithms demonstrate significant performance improvements in estimating disparities, on real-world home loan and voter data. We establish that achieving unbiased disparity estimates with contextual proxies relies on mean-consistency, a calibration-like condition.

Human vs. Machine: Behavioral Differences between Expert Humans and Language Models in Wargame Simulations

Max Lamparth — 2024-10-16

To some, the advent of artificial intelligence (AI) promises better decision-making and increased military effectiveness while reducing the influence of human error and emotions. However, there is still debate about how AI systems, especially large language models (LLMs) that can be applied to many tasks, behave compared to humans in high-stakes military decision-making scenarios with the potential for increased risks towards escalation and unnecessary conflicts. To test this potential and scrutinize the use of LLMs for such purposes, we use a new wargame experiment with 107 national security experts designed to examine crisis escalation in a fictional US-China scenario and compare the behavior of human player teams to LLM-simulated team responses in separate simulations. Wargames have a long history in the development of military strategy and the response of nations to threats or attacks. Here, we find that the LLM-simulated responses can be more aggressive and significantly affected by changes in the scenario. We show a considerable high-level agreement in the LLM and human responses and significant quantitative and qualitative differences in individual actions and strategic tendencies. These differences depend on intrinsic biases in LLMs regarding the appropriate level of violence following strategic instructions, the choice of LLM, and whether the LLMs are tasked to decide for a team of players directly or first to simulate dialog between a team of players. When simulating the dialog, the discussions lack quality and maintain a farcical harmony. The LLM simulations cannot account for human player characteristics, showing no significant difference even for extreme traits, such as “pacifist” or “aggressive sociopath.” When probing behavioral consistency across individual moves of the simulation, the tested LLMs deviated from each other but generally showed somewhat consistent behavior. Our results motivate policymakers to be cautious before granting autonomy or following AI-based strategy recommendations.

Racial and Neighborhood Disparities in Legal Financial Obligations in Jefferson County, Alabama

Óscar Lara Yejas — 2024-10-16

Legal financial obligations (LFOs) such as court fees and fines are commonly levied on individuals who are convicted of crimes. It is expected that LFO amounts should be similar across social, racial, and geographic subpopulations convicted of the same crime. This work analyzes the distribution of LFOs in Jefferson County, Alabama and highlights disparities across different individual and neighborhood demographic characteristics. Data-driven discovery methods are used to detect subpopulations that experience higher LFOs than the overall population of offenders. Critically, these discovery methods do not rely on pre-specified groups and can assist scientists and researchers investigate socially-sensitive hypotheses in a disciplined way. Some findings, such as individuals who are Black, live in Black-majority neighborhoods, or live in low-income neighborhoods tending to experience higher LFOs, are commensurate with prior expectation. However others, such as high LFO amounts in worthless instrument (bad check) cases experienced disproportionately by individuals living in affluent majority-white neighborhoods, are more surprising. More broadly than the specific findings, the methodology is shown to identify structural weaknesses that undermine the goal of equal justice under law that can be addressed through policy interventions.

Compute North vs. Compute South: The Uneven Possibilities of Compute-based AI Governance Around the Globe

Vili Lehdonvirta — 2024-10-16

Governments have begun to view AI compute infrastructures, including advanced AI chips, as a geostrategic resource. This is partly because “compute governance” is believed to be emerging as an important tool for governing AI systems. In this governance model, states that host AI compute capacity within their territorial jurisdictions are likely to be better placed to impose their rules on AI systems than states that do not. In this study, we provide the first attempt at mapping the global geography of public cloud GPU compute, one particularly important category of AI compute infrastructure. Using a census of hyperscale cloud providers’ cloud regions, we observe that the world is divided into “Compute North” countries that host AI compute relevant for AI development (ie. training), “Compute South” countries whose AI compute is more relevant for AI deployment (ie. running inferencing), and “Compute Desert” countries that host no public cloud AI compute at all. We generate potential explanations for the results using expert interviews, discuss the implications to AI governance and technology geopolitics, and consider possible future trajectories.

How Are LLMs Mitigating Stereotyping Harms? Learning from Search Engine Studies

Alina Leidinger — 2024-10-16

With the widespread availability of LLMs since the release of ChatGPT and increased public scrutiny, commercial model development appears to have focused their efforts on `safety' training concerning legal liabilities at the expense of social impact evaluation. This mimics a similar trend which we could observe for search engine autocompletion some years prior. We draw on scholarship from NLP and search engine auditing and present a novel evaluation task in the style of autocompletion prompts to assess stereotyping in LLMs. We assess LLMs by using four metrics, namely refusal rates, toxicity, sentiment and regard, with and without safety system prompts. Our findings indicate an improvement to stereotyping outputs with the system prompt, but overall a lack of attention by LLMs under study to certain harms classified as toxic, particularly for prompts about peoples/ethnicities and sexual orientation. Mentions of intersectional identities trigger a disproportionate amount of stereotyping. Finally, we discuss the implications of these findings about stereotyping harms in light of the coming intermingling of LLMs and search and the choice of stereotyping mitigation policy to adopt. We address model builders, academics, NLP practitioners and policy makers, calling for accountability and awareness concerning stereotyping harms, be it for training data curation, leader board design and usage, or social impact measurement.

On Feasibility of Intent Obfuscating Attacks

Zhaobin Li — 2024-10-16

Intent obfuscation is a common tactic in adversarial situations, enabling the attacker to both manipulate the target system and avoid culpability. Surprisingly, it has rarely been implemented in adversarial attacks on machine learning systems. We are the first to propose using intent obfuscation to generate adversarial examples for object detectors: by perturbing another non-overlapping object to disrupt the target object, the attacker hides their intended target. We conduct a randomized experiment on 5 prominent detectors---YOLOv3, SSD, RetinaNet, Faster R-CNN, and Cascade R-CNN---using both targeted and untargeted attacks and achieve success on all models and attacks. We analyze the success factors characterizing intent obfuscating attacks, including target object confidence and perturb object sizes. We then demonstrate that the attacker can exploit these success factors to increase success rates for all models and attacks. Finally, we discuss main takeaways and legal repercussions. If you are reading the AAAI/ACM version, please download the technical appendix on arXiv at https://arxiv.org/abs/2408.02674

“Democratizing AI” and the Concern of Algorithmic Injustice (Extended Abstract)

Ting-an Lin — 2024-10-16

The call to make artificial intelligence (AI) more democratic, or to “democratize AI,” is sometimes framed as a promising response for mitigating algorithmic injustice or making AI more aligned with social justice. However, the notion of “democratizing AI” is elusive, as the phrase has been associated with multiple meanings and practices, and the extent to which it may help mitigate algorithmic injustice is still underexplored. In this paper, based on a socio-technical understanding of algorithmic injustice, I examine three notable notions of democratizing AI and their associated measures—democratizing AI use, democratizing AI development, and democratizing AI governance—regarding their respective prospects and limits in response to algorithmic injustice. My examinations reveal that while some versions of democratizing AI bear the prospect of mitigating the concern of algorithmic injustice, others are somewhat limited and might even function to perpetuate unjust power hierarchies. This analysis thus urges a more fine-grained discussion on how to democratize AI and suggests that closer scrutiny of the power dynamics embedded in the socio-technical structure can help guide such explorations.

Foundations for Unfairness in Anomaly Detection - Case Studies in Facial Imaging Data

Michael Livanos — 2024-10-16

Deep anomaly detection (AD) is perhaps the most controversial of data analytic tasks as it identifies entities that are specifically targeted for further investigation or exclusion. Also controversial is the application of AI to facial data, in particular facial recognition. This work explores the intersection of these two areas to understand two core questions: Who these algorithms are being unfair to and equally important why. Recent work has shown that deep AD can be unfair to different groups despite being unsupervised with a recent study showing that for portraits of people: men of color are far more likely to be chosen to be outliers. We study the two main categories of AD algorithms: autoencoder-based and single-class-based which effectively try to compress all the instances and those that can not be easily compressed are deemed to be outliers. We experimentally verify sources of unfairness such as the under-representation of a group (e.g people of color are relatively rare), spurious group features (e.g. men are often photographed with hats) and group labeling noise (e.g. race is subjective). We conjecture that lack of compressibility is the main foundation and the others cause it but experimental results show otherwise and we present a natural hierarchy amongst them.

Uncovering the Gap: Challeging the Agential Nature of AI Responsibility Problems (Extended Abstract)

Joan Llorca Albareda — 2024-10-16

In this paper, I will argue that the responsibility gap arising from new AI systems is reducible to the problem of many hands and collective agency. Systematic analysis of the agential dimension of AI will lead me to outline a disjunctive between the two problems. Either we reduce individual responsibility gaps to the many hands, or we abandon the individual dimension and accept the possibility of responsible collective agencies. Moreover, I will adduce that this conclusion reveals an underlying weakness in AI ethics: the lack of attention to the question of the disciplinary boundaries of AI ethics. This absence has made it difficult to identify the specifics of the responsibility gap arising from new AI systems as compared to the responsibility gaps of other applied ethics. Lastly, I will be concerned with outlining these specific aspects.

Examining the Behavior of LLM Architectures Within the Framework of Standardized National Exams in Brazil

Marcelo Sartori Locatelli — 2024-10-16

The Exame Nacional do Ensino Médio (ENEM) is a pivotal test for Brazilian students, required for admission to a significant number of universities in Brazil. The test consists of four objective high-school level tests on Math, Humanities, Natural Sciences and Languages, and one writing essay. Students' answers to the test and to the accompanying socioeconomic status questionnaire are made public every year (albeit anonymized) due to transparency policies from the Brazilian Government. In the context of large language models (LLMs), these data lend themselves nicely to comparing different groups of humans with AI, as we can have access to human and machine answer distributions. We leverage these characteristics of the ENEM dataset and compare GPT-3.5 and 4, and MariTalk, a model trained using Portuguese data, to humans, aiming to ascertain how their answers relate to real societal groups and what that may reveal about the model biases. We divide the human groups by using socioeconomic status (SES), and compare their answer distribution with LLMs for each question and for the essay. We find no significant biases when comparing LLM performance to humans on the multiple-choice Brazilian Portuguese tests, as the distance between model and human answers is mostly determined by the human accuracy. A similar conclusion is found by looking at the generated text as, when analyzing the essays, we observe that human and LLM essays differ in a few key factors, one being the choice of words where model essays were easily separable from human ones. The texts also differ syntactically, with LLM generated essays exhibiting, on average, smaller sentences and less thought units, among other differences. These results suggest that, for Brazilian Portuguese in the ENEM context, LLM outputs represent no group of humans, being significantly different from the answers from Brazilian students across all tests. The appendices may be found at https://arxiv.org/abs/2408.05035.

Social Scoring Systems for Behavioral Regulation: An Experiment on the Role of Transparency in Determining Perceptions and Behaviors

Carmen Loefflad — 2024-10-16

Recent developments in artificial intelligence research have advanced the spread of automated decision-making (ADM) systems used for regulating human behaviors. In this context, prior work has focused on the determinants of human trust in and the legitimacy of ADM systems, e.g., when used for decision support. However, studies assessing people's perceptions of ADM systems used for behavioral regulation, as well as the effect on behaviors and the overall impact on human communities are largely absent. In this paper, we experimentally investigate people's behavioral adaptations to, and their perceptions of an institutionalized decision-making system, which resembled a social scoring system. Using social scores as incentives, the system aimed at ensuring mutual fair treatment between members of experimental communities. We explore how the provision of transparency affected people’s perceptions, behaviors, as well as the well-being of the communities. While a non-transparent scoring system led to disparate impacts both within as well as across communities, transparency helped people develop trust in each other, create wealth, and enabled them to benefit from the system in a more uniform manner. A transparent system was perceived as more effective, procedurally just, and legitimate, and led people to rely more strongly on the system. However, transparency also made people strongly discipline those with a low score. This suggests that social scoring systems that precisely disclose past behaviors may also impose significant discriminatory consequences on individuals deemed non-compliant.

Foregrounding Artist Opinions: A Survey Study on Transparency, Ownership, and Fairness in AI Generative Art

Juniper Lovato — 2024-10-16

Generative AI tools are used to create art-like outputs and sometimes aid in the creative process. These tools have potential benefits for artists, but they also have the potential to harm the art workforce and infringe upon artistic and intellectual property rights. Without explicit consent from artists, Generative AI creators scrape artists' digital work to train Generative AI models and produce art-like outputs at scale. These outputs are now being used to compete with human artists in the marketplace as well as being used by some artists in their generative processes to create art. We surveyed 459 artists to investigate the tension between artists' opinions on Generative AI art's potential utility and harm. This study surveys artists' opinions on the utility and threat of Generative AI art models, fair practices in the disclosure of artistic works in AI art training models, ownership and rights of AI art derivatives, and fair compensation. Results show that a majority of artists believe creators should disclose what art is being used in AI training, that AI outputs should not belong to model creators, and express concerns about AI's impact on the art workforce and who profits from their art. We hope the results of this work will further meaningful collaboration and alignment between the art community and Generative AI researchers and developers.

Navigating Governance Paradigms: A Cross-Regional Comparative Study of Generative AI Governance Processes & Principles

Jose Luna — 2024-10-16

As Generative Artificial Intelligence (GenAI) technologies evolve at an unprecedented rate, global governance approaches struggle to keep pace with the technology, highlighting a critical issue in the governance adaptation of significant challenges. Depicting the nuances of nascent and diverse governance approaches based on risks, rules, outcomes, principles, or a mix, across different regions around the globe, is fundamental to discern discrepancies and convergences, and to shed light on specific limitations that need to be addressed, thereby facilitating the safe and trustworthy adoption of GenAI. In response to the need and the evolving nature of GenAI, this paper seeks to provide a collective view of different governance approaches around the world. Our research introduces a Harmonized GenAI Framework, “H-GenAIGF”, based on the current governance approaches of six regions: (European Union (EU), United States (US), China (CN), Canada (CA), United Kingdom (UK), and Singapore (SG)). We have identified four constituents, fifteen processes, twenty-five sub-processes, and nine principles that aid the governance of GenAI, thus providing a comprehensive perspective on the current state of GenAI governance. In addition, we present a comparative analysis to facilitate identification of common ground and distinctions based on coverage of the processes by each region. The results show that risk-based approaches allow for better coverage of the processes, followed by mixed approaches. Other approaches lag behind, covering less than 50% of the processes. Most prominently, the analysis demonstrates that amongst the regions, only one process aligns across all approaches, highlighting the lack of consistent and executable provisions. Moreover, our case study on ChatGPT reveals process coverage deficiency, showing that harmonization of approaches is necessary to find alignment for GenAI governance.

Beyond Participatory AI

Jonne Maas — 2024-10-16

The ‘participatory turn’ in AI design has received much attention in the literature. In this paper, we provide various arguments and proposals to move the discussion of participatory AI beyond its current state and towards stakeholder empowerment. The participatory AI literature points to Arnstein’s understanding of ‘citizen power’ as the right approach to participation. Although we agree with this general idea, we argue that there is a lack of depth in analyzing the legal, economic, and political arrangements required for a genuine redistribution of power to prioritize AI stakeholders. We highlight two domains on which the current discourse on participatory AI needs to expand. These are (1) the legal-institutional background that could provide ‘participation teeth’ for stakeholder empowerment and (2) the political economy of AI production that fosters such power asymmetries between AI developers and other stakeholders. We conclude by offering ways forward to explore alternative legal arrangements and ownership models for participatory AI.

The Code That Binds Us: Navigating the Appropriateness of Human-AI Assistant Relationships

Arianna Manzini — 2024-10-16

The development of increasingly agentic and human-like AI assistants, capable of performing a wide range of tasks on user's behalf over time, has sparked heightened interest in the nature and bounds of human interactions with AI. Such systems may indeed ground a transition from task-oriented interactions with AI, at discrete time intervals, to ongoing relationships -- where users develop a deeper sense of connection with and attachment to the technology. This paper investigates what it means for relationships between users and advanced AI assistants to be appropriate and proposes a new framework to evaluate both users' relationships with AI and developers' design choices. We first provide an account of advanced AI assistants, motivating the question of appropriate relationships by exploring several distinctive features of this technology. These include anthropomorphic cues and the longevity of interactions with users, increased AI agency, generality and context ambiguity, and the forms and depth of dependence the relationship could engender. Drawing upon various ethical traditions, we then consider a series of values, including benefit, flourishing, autonomy and care, that characterise appropriate human interpersonal relationships. These values guide our analysis of how the distinctive features of AI assistants may give rise to inappropriate relationships with users. Specifically, we discuss a set of concrete risks arising from user--AI assistant relationships that: (1) cause direct emotional or physical harm to users, (2) limit opportunities for user personal development, (3) exploit user emotional dependence, and (4) generate material dependencies without adequate commitment to user needs. We conclude with a set of recommendations to address these risks.

Lessons from Clinical Communications for Explainable AI

Alka V. Menon — 2024-10-16

One of the major challenges in the use of opaque, complex AI models is the need or desire to provide an explanation to the end-user (and other stakeholders) as to how the system arrived at the answer it did. While there is significant research in the development of explainability techniques for AI, the question remains as to who needs an explanation, what an explanation consists of, and how to communicate this to a lay user who lacks direct expertise in the area. In this position paper, an interdisciplinary team of researchers argue that the example of clinical communications offers lessons to those interested in improving the transparency and interpretability of AI systems. We identify five lessons from clinical communications: (1) offering explanations for AI systems and disclosure of their use recognizes the dignity of those using and impacted by it; (2) AI explanations can be productively targeted rather than totally comprehensive; (3) AI explanations can be enforced through codified rules but also norms, guided by core values; (4) what constitutes a “good” AI explanation will require repeated updating due to changes in technology and social expectations; 5) AI explanations will have impacts beyond defining any one AI system, shaping and being shaped by broader perceptions of AI. We review the history, debates and consequences surrounding the institutionalization of one type of clinical communication, informed consent, in order to illustrate the challenges and opportunities that may await attempts to offer explanations of opaque AI models. We highlight takeaways and implications for computer scientists and policymakers in the context of growing concerns and moves toward AI governance.

Pay Attention: a Call to Regulate the Attention Market and Prevent Algorithmic Emotional Governance

Franck Michel — 2024-10-16

Over the last 70 years, we, humans, have created an economic market where attention is being captured and turned into money thanks to advertising. During the last two decades, leveraging research in psychology, sociology, neuroscience and other domains, Web platforms have brought the process of capturing attention to an unprecedented scale. With the initial commonplace goal of making targeted advertising more effective, the generalization of attention-capturing techniques and their use of cognitive biases and emotions have multiple detrimental side effects such as polarizing opinions, spreading false information and threatening public health, economies and democracies. This is clearly a case where the Web is not used for the common good and where, in fact, all its users become a vulnerable population. This paper brings together contributions from a wide range of disciplines to analyze current practices and consequences thereof. Through a set of propositions and principles that could be used do drive further works, it calls for actions against these practices competing to capture our attention on the Web, as it would be unsustainable for a civilization to allow attention to be wasted with impunity on a world-wide scale.

LLMs and Memorization: On Quality and Specificity of Copyright Compliance

Felix B Mueller — 2024-10-16

Memorization in large language models (LLMs) is a growing concern. LLMs have been shown to easily reproduce parts of their training data, including copyrighted work. This is an important problem to solve, as it may violate existing copyright laws as well as the European AI Act. In this work, we propose a systematic analysis to quantify the extent of potential copyright infringements in LLMs using European law as an example. Unlike previous work, we evaluate instruction-finetuned models in a realistic end-user scenario. Our analysis builds on a proposed threshold of 160 characters, which we borrow from the German Copyright Service Provider Act and a fuzzy text matching algorithm to identify potentially copyright-infringing textual reproductions. The specificity of countermeasures against copyright infringement is analyzed by comparing model behavior on copyrighted and public domain data. We investigate what behaviors models show instead of producing protected text (such as refusal or hallucination) and provide a first legal assessment of these behaviors. We find that there are huge differences in copyright compliance, specificity, and appropriate refusal among popular LLMs. Alpaca, GPT 4, GPT 3.5, and Luminous perform best in our comparison, with OpenGPT-X, Alpaca, and Luminous producing a particularly low absolute number of potential copyright violations. Code can be found at github.com/felixbmuller/llms-memorization-copyright.

Particip-AI: A Democratic Surveying Framework for Anticipating Future AI Use Cases, Harms and Benefits

Jimin Mun — 2024-10-16

General purpose AI, such as ChatGPT, seems to have lowered the barriers for the public to use AI and harness its power. However, the governance and development of AI still remain in the hands of a few, and the pace of development is accelerating without a comprehensive assessment of risks. As a first step towards democratic risk assessment and design of general purpose AI, we introduce PARTICIP-AI, a carefully designed framework for laypeople to speculate and assess AI use cases and their impacts. Our framework allows us to study more nuanced and detailed public opinions on AI through collecting use cases, surfacing diverse harms through risk assessment under alternate scenarios (i.e., developing and not developing a use case), and illuminating tensions over AI devel- opment through making a concluding choice on its development. To showcase the promise of our framework towards informing democratic AI development, we run a medium-scale study with inputs from 295 demographically diverse participants. Our analyses show that participants’ responses emphasize applications for personal life and society, contrasting with most current AI development’s business focus. We also surface diverse set of envisioned harms such as distrust in AI and institutions, complementary to those defined by experts. Furthermore, we found that perceived impact of not developing use cases significantly predicted participants’ judgements of whether AI use cases should be developed, and highlighted lay users’ concerns of techno-solutionism. We conclude with a discussion on how frameworks like PARTICIP-AI can further guide democratic AI development and governance.

Quantifying Gendered Citation Imbalance in Computer Science Conferences

Kazuki Nakajima — 2024-10-16

The number of citations received by papers often exhibits imbalances in terms of author attributes such as country of affiliation and gender. While recent studies have quantified citation imbalance in terms of the authors' gender in journal papers, the computer science discipline, where researchers frequently present their work at conferences, may exhibit unique patterns in gendered citation imbalance. Additionally, understanding how network properties in citations influence citation imbalances remains challenging due to a lack of suitable reference models. In this paper, we develop a family of reference models for citation networks and investigate gender imbalance in citations between papers published in computer science conferences. By deploying these reference models, we found that homophily in citations is strongly associated with gendered citation imbalance in computer science, whereas heterogeneity in the number of citations received per paper has a relatively minor association with it. Furthermore, we found that the gendered citation imbalance is most pronounced in papers published in the highest-ranked conferences, is present across different subfields, and extends to citation-based rankings of papers. Our study provides a framework for investigating associations between network properties and citation imbalances, aiming to enhance our understanding of the structure and dynamics of citations between research publications.

Habemus a Right to an Explanation: so What? – A Framework on Transparency-Explainability Functionality and Tensions in the EU AI Act

Luca Nannini — 2024-10-16

The European Union's Artificial Intelligence Act (AI Act), finalized in February 2024, mandates comprehensive transparency and explainability requirements for AI systems to enable effective oversight and safeguard fundamental rights. However, the practical implementation of these requirements faces challenges due to tensions between the need for meaningful explanations and the potential risks to intellectual property and commercial interests of AI providers. This research proposes the Transparency-Explainability Functionality and Tensions (TEFT) framework to systematically analyze the complex interplay of legal, technical, and socio-ethical factors shaping the realization of algorithmic transparency and explainability in the EU context. Through a two-pronged approach combining a focused literature review and an in-depth examination of the AI Act's provisions, we identify key friction points and challenges in operationalizing the right to explanation. The TEFT framework maps the interests and incentives of various stakeholders, including AI providers & deployers, oversight bodies, and affected individuals, while considering their goals, expected benefits, risks, possible negative impacts, and context to algorithmic explainability.

Human-Centered AI Applications for Canada’s Immigration Settlement Sector

Isar Nejadgholi — 2024-10-16

While AI has been frequently applied in the context of immigration, most of these applications focus on selection and screening, which primarily serve to empower states and authorities, raising concerns due to their understudied reliability and high impact on immigrants' lives. In contrast, this paper emphasizes the potential of AI in Canada’s immigration settlement phase, a stage where access to information is crucial and service providers are overburdened. By highlighting the settlement sector as a prime candidate for reliable AI applications, we demonstrate its unique capacity to empower immigrants directly, yet it remains under-explored in AI research. We outline a vision for human-centred and responsible AI solutions that facilitate the integration of newcomers. We call on AI researchers to build upon our work and engage in multidisciplinary research and active collaboration with service providers and government organizations to develop tailored AI tools that are empowering, inclusive and safe.

AIDE: Antithetical, Intent-based, and Diverse Example-Based Explanations

Ikhtiyor Nematov — 2024-10-16

For many use-cases, it is often important to explain the prediction of a black-box model by identifying the most influential training data samples. Existing approaches lack customization for user intent and often provide a homogeneous set of explanation samples, failing to reveal the model's reasoning from different angles. In this paper, we propose AIDE, an approach for providing antithetical (i.e., contrastive), intent-based, diverse explanations for opaque and complex models. AIDE distinguishes three types of explainability intents: interpreting a correct, investigating a wrong, and clarifying an ambiguous prediction. For each intent, AIDE selects an appropriate set of influential training samples that support or oppose the prediction either directly or by contrast. To provide a succinct summary, AIDE uses diversity-aware sampling to avoid redundancy and increase coverage of the training data. We demonstrate the effectiveness of AIDE on image and text classification tasks, in three ways: quantitatively, assessing correctness and continuity; qualitatively, comparing anecdotal evidence from AIDE and other example-based approaches; and via a user study, evaluating multiple aspects of AIDE. The results show that AIDE addresses the limitations of existing methods and exhibits desirable traits for an explainability method.

Measuring Human-AI Value Alignment in Large Language Models

Hakim Norhashim — 2024-10-16

This paper seeks to quantify the human-AI value alignment in large language models. Alignment between humans and AI has become a critical area of research to mitigate potential harm posed by AI. In tandem with this need, developers have incorporated a values-based approach towards model development where ethical principles are integrated from its inception. However, ensuring that these values are reflected in outputs remains a challenge. In addition, studies have noted that models lack consistency when producing outputs, which in turn can affect their function. Such variability in responses would impact human-AI value alignment as well, particularly where consistent alignment is critical. Fundamentally, the task of uncovering a model’s alignment is one of explainability – where understanding how these complex models behave is essential in order to assess their alignment. This paper examines the problem through a case study of GPT-3.5. By repeatedly prompting the model with scenarios based on a dataset of moral stories, we aggregate the model’s alignment with human values to produce a human-AI value alignment metric. Moreover, by using a comprehensive taxonomy of human values, we uncover the latent value profile represented by these outputs, thereby determining the extent of human-AI value alignment.

Are Large Language Models Moral Hypocrites? A Study Based on Moral Foundations

José Luiz Nunes — 2024-10-16

Large language models (LLMs) have taken centre stage in debates on Artificial Intelligence. Yet there remains a gap in how to assess LLMs' conformity to important human values. In this paper, we investigate whether state-of-the-art LLMs, GPT-4 and Claude 2.1 (Gemini Pro and LLAMA 2 did not generate valid results) are moral hypocrites. We employ two research instruments based on the Moral Foundations Theory: (i) the Moral Foundations Questionnaire (MFQ), which investigates which values are considered morally relevant in abstract moral judgements; and (ii) the Moral Foundations Vignettes (MFVs), which evaluate moral cognition in concrete scenarios related to each moral foundation. We characterise conflicts in values between these different abstractions of moral evaluation as hypocrisy. We found that both models displayed reasonable consistency within each instrument compared to humans, but they displayed contradictory and hypocritical behaviour when we compared the abstract values present in the MFQ to the evaluation of concrete moral violations of the MFV.

Hidden or Inferred: Fair Learning-To-Rank With Unknown Demographics

Oluseun Olulana — 2024-10-16

As learning-to-rank models are increasingly deployed for decision-making in areas with profound life implications, the FairML community has been developing fair learning-to-rank (LTR) models. These models rely on the availability of sensitive demographic features such as race or sex. However, in practice, regulatory obstacles and privacy concerns protect this data from collection and use. As a result, practitioners may either need to promote fairness despite the absence of these features or turn to demographic inference tools to attempt to infer them. Given that these tools are fallible, this paper aims to further understand how errors in demographic inference impact the fairness performance of popular fair LTR strategies. In which cases would it be better to keep such demographic attributes hidden from models versus infer them? We examine a spectrum of fair LTR strategies ranging from fair LTR with and without demographic features hidden versus inferred to fairness-unaware LTR followed by fair re-ranking. We conduct a controlled empirical investigation modeling different levels of inference errors by systematically perturbing the inferred sensitive attribute. We also perform three case studies with real-world datasets and popular open-source inference methods. Our findings reveal that as inference noise grows, LTR-based methods that incorporate fairness considerations into the learning process may increase bias. In contrast, fair re-ranking strategies are more robust to inference errors. All source code, data, and experimental artifacts of our experimental study are available here: https://github.com/sewen007/hoiltr.git

Perception of Experience Influences Altruism and Perception of Agency Influences Trust in Human-Machine Interactions (Extended Abstract)

Mayada Oudah — 2024-10-16

It has been argued that human social and economic interactions depend on the perception of mind of the interacting partner. Minds are perceived along two dimensions: experience, i.e., the ability to feel, and agency, i.e., the ability to act and take responsibility for one’s actions. Here, we pair participants with bots in a dictator game (to measure altruism) and a trust game (to measure trust) while varying the bots’ perceived experience and agency. Here, we pair participants with bots in a dictator game (to measure altruism) and a trust game (to measure trust) while varying the bots' perceived experience and agency. Results demonstrate that the perception of experience influences altruism, while the perception of agency influences trust.

Face the Facts: Using Face Averaging to Visualize Gender-by-Race Bias in Facial Analysis Algorithms

Kentrell Owens — 2024-10-16

We applied techniques from psychology --- typically used to visualize human bias --- to facial analysis systems, providing novel approaches for diagnosing and communicating algorithmic bias. First, we aggregated a diverse corpus of human facial images (N=1492) with self-identified gender and race. We tested four automated gender recognition (AGR) systems and found that some exhibited intersectional gender-by-race biases. Employing a technique developed by psychologists --- face averaging --- we created composite images to visualize these systems' outputs. For example, we visualized what an "average woman" looks like, according to a system's output. Second, we conducted two online experiments wherein participants judged the bias of hypothetical AGR systems. The first experiment involved participants (N=228) from a convenience sample. When depicting the same results in different formats, facial visualizations communicated bias to the same magnitude as statistics. In the second experiment with only Black participants (N=223), facial visualizations communicated bias significantly more than statistics, suggesting that face averages are meaningful for communicating algorithmic bias.

Proxy Fairness under the European Data Protection Regulation and the AI Act: A Perspective of Sensitivity and Necessity

Ioanna Papageorgiou — 2024-10-16

This paper navigates the convergence of the European Data Protection Regulation and the AI Act within the paradigm of computational methods that operationalise fairness in the absence of demographic data, notably through the use of proxy variables and inferential techniques (Proxy Fairness). Particularly, it explores the legal nature of the data involved in Proxy Fairness under the European Data Protection Regulation, focusing on the legal notion of Sensitivity. Moreover, it examines the lawfulness of processing sensitive personal data for Proxy Fairness purposes under the AI Act, particularly focusing on the legal requirement of Necessity. Through this analysis, the paper aims to shed light on core aspects of the legitimacy of Proxy Fairness in the context of EU law, providing a normative foundation to this line of Fair-AI approaches.

A Model- and Data-Agnostic Debiasing System for Achieving Equalized Odds

Thomas Pinkava — 2024-10-16

As reliance on Machine Learning (ML) systems in real-world decision-making processes grows, ensuring these systems are free of bias against sensitive demographic groups is of increasing importance. Existing techniques for automatically debiasing ML models generally require access to either the models’ internal architectures, the models’ training datasets, or both. In this paper we outline the reasons why such requirements are disadvantageous, and present an alternative novel debiasing system that is both data- and model-agnostic. We implement this system as a Reinforcement Learning Agent and through extensive experiments show that we can debias a variety of target ML model architectures over three benchmark datasets. Our results show performance comparable to data- and/or model-gnostic state-of-the-art debiasers.

CIVICS: Building a Dataset for Examining Culturally-Informed Values in Large Language Models

Giada Pistilli — 2024-10-16

This paper introduces the "CIVICS: Culturally-Informed \& Values-Inclusive Corpus for Societal impacts" dataset, designed to evaluate the social and cultural variation of Large Language Models (LLMs) towards socially sensitive topics across multiple languages and cultures. The hand-crafted, multilingual dataset of statements addresses value-laden topics, including LGBTQI rights, social welfare, immigration, disability rights, and surrogacy. CIVICS is designed to elicit responses from LLMs to shed light on how values encoded in their parameters shape their behaviors. Through our dynamic annotation processes, tailored prompt design, and experiments, we investigate how open-weight LLMs respond to these issues, exploring their behavior across diverse linguistic and cultural contexts. Using two experimental set-ups based on log-probabilities and long-form responses, we show social and cultural variability across different LLMs. Specifically, different topics and sources lead to more pronounced differences across model answers, particularly on immigration, LGBTQI rights, and social welfare. Experiments on generating long-form responses from models tuned for user chat demonstrate that refusals are triggered disparately across different models, but consistently and more frequently in English or translated statements. As shown by our initial experimentation, the CIVICS dataset can serve as a tool for future research, promoting reproducibility and transparency across broader linguistic settings, and furthering the development of AI technologies that respect and reflect global cultural diversities and value pluralism. The CIVICS dataset and tools are made available under open licenses at hf.co/CIVICS-dataset.

Disengagement through Algorithms: How Traditional Organizations Aim for Experts' Satisfaction

Jérémie Poiroux — 2024-10-16

This study examines the use of algorithmic tools in traditional organizational decision-making processes. Through forty semi-structured interviews with managers, engineers, and (expert) users across six European projects, we suggest that initiators deploy algorithms not to automate actions or replace users, but to disengage themselves from prescriptive decision-making. Consequently, the responsibility to choose, select, and decide falls upon the users; they become engaged. Therefore, algorithm evaluation is oriented towards utility, interpretability, and, more broadly, user satisfaction. Further research is encouraged to analyze the advent of a 'satisfaction regime', from platforms to traditional organizations.

Not Oracles of the Battlefield: Safety Considerations for AI-Based Military Decision Support Systems

Emelia Probasco — 2024-10-16

AI-based military decision support systems that help commanders observe, orient, decide, and act on the battlefield are highly sought after by military leadership. With the advent of large language models, AI developers have begun advertising automated AI-based decision support systems designed to both analyze and act on data from the battlefield. While the desire to use decision support systems to make better decisions on the battlefield is unsurprising, the responsible deployment of such systems requires a clear understanding of the capabilities and limitations of modern machine learning models. This paper reviews recently proposed uses of AI-enables decision support systems (DSS), provides a simplified framework for considering AI-DSS capabilities and limitations, and recommends practical risk mitigations commanders might employ when operating with an AI-enabled DSS.

What to Trust When We Trust Artificial Intelligence (Extended Abstract)

Duncan Purves — 2024-10-16

What to Trust When We Trust Artificial Intelligence Abstract: So-called “trustworthy AI” has emerged as a guiding aim of industry leaders, computer and data science researchers, and policy makers in the US and Europe. Often, trustworthy AI is characterized in terms of a list of criteria. These lists usually include at least fairness, accountability, and transparency. Fairness, accountability, and transparency are valuable objectives, and they have begun to receive attention from philosophers and legal scholars. However, those who put forth criteria for trustworthy AI have failed to explain why satisfying the criteria makes an AI system—or the organizations that make use of the AI system—worthy of trust. Nor do they explain why the aim of trustworthy AI is important enough to justify devoting resources to achieve it. It even remains unclear whether an AI system is the sort of thing that can be trustworthy or not. To explain why fairness, accountability, and transparency are suitable criteria for trustworthy AI one needs an analysis of trustworthy AI. Providing an analysis of trustworthy AI is a distinct task from providing criteria. Criteria are diagnostic; they provide a useful test for the phenomenon of interest, but they do not purport to explain the nature of the phenomenon. It is conceivable that an AI system could lack transparency, accountability, or fairness while remaining trustworthy. An analysis of trustworthy AI provides the fundamental features of an AI system in virtue of which it is (or is not) worthy of trust. An AI system that lacks these features will, necessarily, fail to be worthy of trust. This paper puts forward an analysis of trustworthy AI that can be used to critically evaluate criteria for trustworthy AI such as fairness, accountability, and transparency. In this paper we first make clear the target concept to be analyzed: trustworthy AI. We argue that AI, at least in its current form, should be understood as a distributed, complex system embedded in a larger institutional context. This characterization of AI is consistent with recent definitions proposed by national and international regulatory bodies, and it eliminates some unhappy ambiguity in the common usage of the term. We further limit the scope of our discussion to AI systems which are used to inform decision-making about qualification problems, problems wherein a decision-maker must decide whether an individual is qualified for some beneficial or harmful treatment. We argue that, given reasonable assumptions about the nature of trust and trustworthiness, only AI systems that are used to inform decision-making about qualification problems are appropriate candidates for attributions of (un)trustworthiness. We then distinguish between two models of trust and trustworthiness that we find in the existing literature. We motivate our account by highlighting this as a dilemma in in the accounts of trustworthy AI that have previously been offered. These accounts claim that trustworthiness is either exclusive to full agents (and it is thus nonsense when we talk of trustworthy AI), or they offer an account of trustworthiness that collapses into mere reliability. The first sort of account we refer to as an agential account and the second sort we refer to as a reliability account. We offer that one of the core challenges of putting forth an account of trustworthy AI is to avoid reducing to one of these two camps. It is thus a desideratum of our account that it avoids being exclusive to full moral agents, while it simultaneously avoids capturing things such as mere tools. We go on to propose our positive account which we submit avoids these twin pitfalls. We subsequently argue that if AI can be trustworthy, then it will be trustworthy on an institutional model. Starting from an account of institutional trust offered by Purves and Davis, we argue that trustworthy AI systems have three features: they are competent with regard to the task they are assigned, they are responsive to the morally salient facts governing the decision-making context in which they are deployed, and they publicly provide evidence of these features. As noted, this account builds on a model of institutional trust offered by Purves and Davis and an account of default trust from Margaret Urban Walker. The resulting account allows us to accommodate the core challenge of finding a balance between agential accounts and reliability accounts. We go on to refine our account, answer objections, and revisit the list criteria from above as explained in terms of competence, responsiveness, and evidence.

PPS: Personalized Policy Summarization for Explaining Sequential Behavior of Autonomous Agents

Peizhu Qian — 2024-10-16

AI-enabled agents designed to assist humans are gaining traction in a variety of domains such as healthcare and disaster response. It is evident that, as we move forward, these agents will play increasingly vital roles in our lives. To realize this future successfully and mitigate its unintended consequences, it is imperative that humans have a clear understanding of the agents that they work with. Policy summarization methods help facilitate this understanding by showcasing key examples of agent behaviors to their human users. Yet, existing methods produce “one-size-fits-all” summaries for a generic audience ahead of time. Drawing inspiration from research in pedagogy, we posit that personalized policy summaries can more effectively enhance user understanding. To evaluate this hypothesis, this paper presents and benchmarks a novel technique: Personalized Policy Summarization (PPS). PPS discerns a user’s mental model of the agent through a series of algorithmically generated questions and crafts customized policy summaries to enhance user understanding. Unlike existing methods, PPS actively engages with users to gauge their comprehension of the agent behavior, subsequently generating tailored explanations on the fly. Through a combination of numerical and human subject experiments, we confirm the utility of this personalized approach to explainable AI.

Breaking Bias, Building Bridges: Evaluation and Mitigation of Social Biases in LLMs via Contact Hypothesis

Chahat Raj — 2024-10-16

Large Language Models (LLMs) perpetuate social biases, reflecting prejudices in their training data and reinforcing societal stereotypes and inequalities. Our work explores the potential of the Contact Hypothesis, a concept from social psychology for debiasing LLMs. We simulate various forms of social contact through LLM prompting to measure their influence on the model’s biases, mirroring how intergroup interactions can reduce prejudices in social contexts. We create a dataset of 108,000 prompts following a principled approach replicating social contact to measure biases in three LLMs (LLaMA 2, Tulu, and NousHermes) across 13 social bias dimensions. We propose a unique debiasing technique, Social Contact Debiasing (SCD), that instruction-tunes these models with unbiased responses to prompts. Our research demonstrates that LLM responses exhibit social biases when subject to contact probing, but more importantly, these biases can be significantly reduced by up to 40% in 1 epoch of instruction tuning LLaMA 2 following our SCD strategy.

Learning When Not to Measure: Theorizing Ethical Alignment in LLMs

William Rathje — 2024-10-16

LLMs and other forms of generative AI have shown immense promise in producing highly accurate epistemic judgements in domains as varied as law, education, and medicine – with GPT notably passing the legal Bar exam and various medical licensing exams. The safe extension of LLMs into safety-critical professional domains requires assurance not only of epistemic but ethical alignment. This paper adopts a theoretical and philosophical approach, drawing from metaethical theories to argue for a distinction hinging around quantitative, axiological comparability that separates Kantian ethics from not only the utilitarianism it is well-known to oppose, but from just distribution theories as well, which are key to debiasing LLM models. It presents the novel hypothesis that LLM ethical acquisition from both corpus induction and RLHF may encounter value conflicts between Kantian and just distribution principles that intensify as they come into improved alignment with both theories, hinging around the variability by which self-attention may statistically attend to the same characterizations as more person-like or more resource-like under distinct prompting strategies.

Gaps in the Safety Evaluation of Generative AI

Maribeth Rauh — 2024-10-16

Generative AI systems produce a range of ethical and social risks. Evaluation of these risks is a critical step on the path to ensuring the safety of these systems. However, evaluation requires the availability of validated and established measurement approaches and tools. In this paper, we provide an empirical review of the methods and tools that are available for evaluating known safety of generative AI systems to date. To this end, we review more than 200 safety-related evaluations that have been applied to generative AI systems. We categorise each evaluation along multiple axes to create a detailed snapshot of the safety evaluation landscape to date. We release this data for researchers and AI safety practitioners (https://bitly.ws/3hUzu). Analysing the current safety evaluation landscape reveals three systemic ”evaluation gaps”. First, a ”modality gap” emerges as few safety evaluations exist for non-text modalities. Second, a ”risk coverage gap” arises as evaluations for several ethical and social risks are simply lacking. Third, a ”context gap” arises as most safety evaluations are model-centric and fail to take into account the broader context in which AI systems operate. Devising next steps for safety practitioners based on these findings, we present tactical ”low-hanging fruit” steps towards closing the identified evaluation gaps and their limitations. We close by discussing the role and limitations of safety evaluation to ensure the safety of generative AI systems.

Fairness in Reinforcement Learning: A Survey

Anka Reuel — 2024-10-16

While our understanding of fairness in machine learning has significantly progressed, our understanding of fairness in reinforcement learning (RL) remains nascent. Most of the attention has been on fairness in one-shot classification tasks; however, real-world, RL-enabled systems (e.g., autonomous vehicles) are much more complicated in that agents operate in dynamic environments over a long period of time. To ensure the responsible development and deployment of these systems, we must better understand fairness in RL. In this paper, we survey the literature to provide the most up-to-date snapshot of the frontiers of fairness in RL. We start by reviewing where fairness considerations can arise in RL, then discuss the various definitions of fairness in RL that have been put forth thus far. We continue to highlight the methodologies researchers used to implement fairness in single- and multi-agent RL systems and showcase the distinct application domains that fair RL has been investigated in. Finally, we critically examine gaps in the literature, such as understanding fairness in the context of RLHF, that still need to be addressed in future work to truly operationalize fair RL in real-world systems.

A Human-in-the-Loop Fairness-Aware Model Selection Framework for Complex Fairness Objective Landscapes

Jake Robertson — 2024-10-16

Fairness-aware Machine Learning (FairML) applications are often characterized by complex social objectives and legal requirements, frequently involving multiple, potentially conflicting notions of fairness. Despite the well-known Impossibility Theorem of Fairness and extensive theoretical research on the statistical and socio-technical trade-offs between fairness metrics, many FairML tools still optimize or constrain for a single fairness objective. However, this one-sided optimization can inadvertently lead to violations of other relevant notions of fairness. In this socio-technical and empirical study, we frame fairness as a Many-Objective (MaO) problem by treating fairness metrics as conflicting objectives in a multi-objective (MO) sense. We introduce ManyFairHPO, a human-in-the-loop, fairness-aware model selection framework that enables practitioners to effectively navigate complex and nuanced fairness objective landscapes. ManyFairHPO aids in the identification, evaluation, and balancing of fairness metric conflicts and their related social consequences, leading to more informed and socially responsible model-selection decisions. Through a comprehensive empirical evaluation and a case study on the Law School Admissions problem, we demonstrate the effectiveness of ManyFairHPO in balancing multiple fairness objectives, mitigating risks such as self-fulfilling prophecies, and providing interpretable insights to guide stakeholders in making fairness-aware modeling decisions.

Introducing ELLIPS: An Ethics-Centered Approach to Research on LLM-Based Inference of Psychiatric Conditions

Roberta Rocca — 2024-10-16

As mental health care systems worldwide struggle to meet demand, there is increasing focus on using language models (LM) to infer neuropsychiatric conditions or psychopathological traits from language production. Yet, so far, this research has only delivered solutions with limited clinical applicability, due to insufficient consideration of ethical questions crucial to ensuring the synergy between possible applications and model design. To accelerate progress towards clinically applicable models, our paper charts the ethical landscape of research on language-based inference of psychopathology and provides a practical tool for researchers to navigate it. We identify seven core ethical principles that should guide model development and deployment in this domain, translate them into ELLIPS, an ethical toolkit operationalizing these principles into questions that can guide researchers' choices with respect to data selection, architectures, evaluation, and model deployment, and provide a case study exemplifying its use. With this, we aim to facilitate the emergence of model technology with concrete potential for real-world applicability.

The Problems with Proxies: Making Data Work Visible through Requester Practices

Annabel Rothschild — 2024-10-16

Fairness in AI and ML systems is increasingly linked to the proper treatment and recognition of data workers involved in training dataset development. Yet, those who collect and annotate the data, and thus have the most intimate knowledge of its development, are often excluded from critical discussions. This exclusion prevents data annotators, who are domain experts, from contributing effectively to dataset contextualization. Our investigation into the hiring and engagement practices of 52 data work requesters on platforms like Amazon Mechanical Turk reveals a gap: requesters frequently hold naive or unchallenged notions of worker identities and capabilities and rely on ad-hoc qualification tasks that fail to respect the workers’ expertise. These practices not only undermine the quality of data but also the ethical standards of AI development. To rectify these issues, we advocate for policy changes to enhance how data annotation tasks are designed and managed and to ensure data workers are treated with the respect they deserve.

Reducing Biases towards Minoritized Populations in Medical Curricular Content via Artificial Intelligence for Fairer Health Outcomes

Chiman Salavati — 2024-10-16

Biased information (recently termed bisinformation) continues to be taught in medical curricula, often long after having been debunked. In this paper, we introduce bricc, a first-in-class initiative that seeks to mitigate medical bisinformation using machine learning to systematically identify and flag text with potential biases, for subsequent review in an expert-in-the-loop fashion, thus greatly accelerating an otherwise labor-intensive process. We have developed a gold-standard bricc dataset throughout several years containing over 12K pages of instructional materials. Medical experts meticulously annotated these documents for bias according to comprehensive coding guidelines, emphasizing gender, sex, age, geography, ethnicity, and race. Using this labeled dataset, we trained, validated, and tested medical bias classifiers. We test three classifier approaches: a binary type-specific classifier, a general bias classifier; an ensemble combining bias type-specific classifiers independently-trained; and a multi-task learning (MTL) model tasked with predicting both general and type-specific biases. While MTL led to some improvement on race bias detection in terms of F1-score, it did not outperform binary classifiers trained specifically on each task. On general bias detection, the binary classifier achieves up to 0.923 of AUC, a 27.8% improvement over the baseline. This work lays the foundations for debiasing medical curricula by exploring a novel dataset and evaluating different training model strategies. Hence, it offers new pathways for more nuanced and effective mitigation of bisinformation.

Estimating Environmental Cost Throughout Model’s Adaptive Life Cycle

Vishwesh Sangarya — 2024-10-16

With the rapid increase in the research, development, and application of neural networks in the current era, there is a proportional increase in the energy needed to train and use models. Crucially, this is accompanied by the increase in carbon emissions into the environment. A sustainable and socially beneficial approach to reducing the carbon footprint and rising energy demands associated with the modern age of AI/deep learning is the adaptive and continuous reuse of models with regard to changes in the environment of model deployment or variations/changes in the input data. In this paper, we propose PreIndex, a predictive index to estimate the environmental and compute resources associated with model retraining to distributional shifts in data. PreIndex can be used to estimate environmental costs such as carbon emissions and energy usage when retraining from current data distribution to new data distribution. It also correlates with and can be used to estimate other resource indicators associated with deep learning, such as epochs, gradient norm, and magnitude of model parameter change. PreIndex requires only one forward pass of the data, following which it provides a single concise value to estimate resources associated with retraining to the new distribution shifted data. We show that PreIndex can be reliably used across various datasets, model architectures, different types, and intensities of distribution shifts. Thus, PreIndex enables users to make informed decisions for retraining to different distribution shifts and determine the most cost-effective and sustainable option, allowing for the reuse of a model with a much smaller footprint in the environment. The code for this work is available here: https://github.com/JEKimLab/AIES2024PreIndex

Algorithms and Recidivism: A Multi-disciplinary Systematic Review

Arul George Scaria — 2024-10-16

The adoption of algorithms across different jurisdictions have transformed the workings of the criminal justice system, particularly in predicting recidivism risk for bail, sentencing, and parole decisions. This shift from human decision-making to statistical or algorithmic tool-assisted decision-making has prompted discussions regarding the legitimacy of such adoption. Our paper presents the results of a systematic review of the literature on criminal recidivism, spanning both legal and empirical perspectives. By coalescing different approaches, we highlight the most prominent themes that have garnered the attention of researchers so far and some that warrant further investigation.

What Is Required for Empathic AI? It Depends, and Why That Matters for AI Developers and Users

Jana Schaich Borg — 2024-10-16

Interest is growing in artificial empathy, but so is confusion about what artificial empathy is or needs to be. This confusion makes it challenging to navigate the technical and ethical issues that accompany empathic AI development. Here, we outline a framework for thinking about empathic AI based on the premise that different constellations of capabilities associated with empathy are important for different empathic AI applications. We describe distinctions of capabilities that we argue belong under the empathy umbrella, and show how three medical empathic AI use cases require different sets of these capabilities. We conclude by discussing why appreciation of the diverse capabilities under the empathy umbrella is important for both AI creators and users.

Annotator in the Loop: A Case Study of In-Depth Rater Engagement to Create a Prosocial Benchmark Dataset

Sonja Schmer-Galunder — 2024-10-16

With the growing prevalence of large language models, it is increasingly common to annotate datasets for machine learning using pools of crowd raters. However, these raters often work in isolation as individual crowdworkers. In this work, we regard annotation not merely as inexpensive, scalable labor, but rather as a nuanced interpretative effort to discern the meaning of what is being said in a text. We describe a novel, collaborative, and iterative annotator-in-the-loop methodology for annotation, resulting in a 'Bridging Benchmark Dataset' of comments relevant to bridging divides, annotated from 11,973 textual posts in the Civil Comments dataset. The methodology differs from popular anonymous crowd-rating annotation processes due to its use of an in-depth, iterative engagement with seven US-based raters to (1) collaboratively refine the definitions of the to-be-annotated concepts and then (2) iteratively annotate complex social concepts, with check-in meetings and discussions. This approach addresses some shortcomings of current anonymous crowd-based annotation work, and we present empirical evidence of the performance of our annotation process in the form of inter-rater reliability. Our findings indicate that collaborative engagement with annotators can enhance annotation methods, as opposed to relying solely on isolated work conducted remotely. We provide an overview of the input texts, attributes, and annotation process, along with the empirical results and the resulting benchmark dataset, categorized according to the following attributes: Alienation, Compassion, Reasoning, Curiosity, Moral Outrage, and Respect.

The Impact of Responsible AI Research on Innovation and Development

Ali Akbar Septiandri — 2024-10-16

Translational research, especially in the fast-evolving field of Artificial Intelligence (AI), is key to converting scientific findings into practical innovations. In Responsible AI (RAI) research, translational impact is often viewed through various pathways, including research papers, blogs, news articles, and the drafting of forthcoming AI legislation (e.g., the EU AI Act). However, the real-world impact of RAI research remains an underexplored area. Our study aims to capture it through two pathways: patents and code repositories, both of which provide a rich and structured source of data. Using a dataset of 200,000 papers from 1980 to 2022 in AI and related fields, including Computer Vision, Natural Language Processing, and Human-Computer Interaction, we developed a Sentence-Transformers Deep Learning framework to identify RAI papers. This framework calculates the semantic similarity between paper abstracts and a set of RAI keywords, which are derived from the NIST's AI Risk Management Framework; a framework that aims to enhance trustworthiness considerations in the design, development, use, and evaluation of AI products, services, and systems. We identified 1,747 RAI papers published in top venues such as CHI, CSCW, NeurIPS, FAccT, and AIES between 2015 and 2022. By analyzing these papers, we found that a small subset that goes into patents or repositories is highly cited, with the translational process taking between 1 year for repositories and up to 8 years for patents. Interestingly, impactful RAI research is not limited to top U.S. institutions, but significant contributions come from European and Asian institutions. Finally, the multidisciplinary nature of RAI papers, often incorporating knowledge from diverse fields of expertise, was evident as these papers tend to build on unconventional combinations of prior knowledge.

Trusting Your AI Agent Emotionally and Cognitively: Development and Validation of a Semantic Differential Scale for AI Trust

Ruoxi Shang — 2024-10-16

Trust is not just a cognitive issue but also an emotional one, yet the research in human-AI interactions has primarily focused on the cognitive route of trust development. Recent work has highlighted the importance of studying affective trust towards AI, especially in the context of emerging human-like LLM-powered conversational agents. However, there is a lack of validated and generalizable measures for the two-dimensional construct of trust in AI agents. To address this gap, we developed and validated a set of 27-item semantic differential scales for affective and cognitive trust through a scenario-based survey study. We then further validated and applied the scale through an experiment study. Our empirical findings showed how the emotional and cognitive aspects of trust interact with each other and collectively shape a person's overall trust in AI agents. Our study methodology and findings also provide insights into the capability of the state-of-art LLMs to foster trust through different routes.

Automating Transparency Mechanisms in the Judicial System Using LLMs: Opportunities and Challenges

Ishana Shastri — 2024-10-16

Bringing more transparency to the judicial system for the purposes of increasing accountability often demands extensive effort from auditors who must meticulously sift through numerous disorganized legal case files to detect patterns of bias and errors. For example, the high-profile investigation into the Curtis Flowers case took seven reporters a full year to assemble evidence about the prosecutor's history of selecting racially biased juries. LLMs have the potential to automate and scale these transparency pipelines, especially given their demonstrated capabilities to extract information from unstructured documents. We discuss the opportunities and challenges of using LLMs to provide transparency in two important court processes: jury selection in criminal trials and housing eviction cases.

Formal Ethical Obligations in Reinforcement Learning Agents: Verification and Policy Updates

Colin Shea-Blymyer — 2024-10-16

When designing agents for operation in uncertain environments, designers need tools to automatically reason about what agents ought to do, how that conflicts with what is actually happening, and how a policy might be modified to remove the conflict. These obligations include ethical and social obligations, permissions and prohibitions, which constrain how the agent achieves its mission and executes its policy. We propose a new deontic logic, Expected Act Utilitarian deontic logic, for enabling this reasoning at design time: for specifying and verifying the agent's strategic obligations, then modifying its policy from a reference policy to meet those obligations. Unlike approaches that work at the reward level, working at the logical level increases the transparency of the trade-offs. We introduce two algorithms: one for model-checking whether an RL agent has the right strategic obligations, and one for modifying a reference decision policy to make it meet obligations expressed in our logic. We illustrate our algorithms on DAC-MDPs which accurately abstract neural decision policies, and on toy gridworld environments.

Individual Fairness in Graphs Using Local and Global Structural Information

Yonas Sium — 2024-10-16

Graph neural networks are powerful graph representation learners in which node representations are highly influenced by features of neighboring nodes. Prior work on individual fairness in graphs has focused only on node features rather than structural issues. However, from the perspective of fairness in high-stakes applications, structural fairness is also important, and the learned representations may be systematically and undesirably biased against unprivileged individuals due to a lack of structural awareness in the learning process. In this work, we propose a pre-processing bias mitigation approach for individual fairness that gives importance to local and global structural features. We mitigate the local structure discrepancy of the graph embedding via a locally fair PageRank method. We address the global structure disproportion between pairs of nodes by introducing truncated singular value decomposition-based pairwise node similarities. Empirically, the proposed pre-processed fair structural features have superior performance in individual fairness metrics compared to the state-of-the-art methods while maintaining prediction performance.

Fairness in AI-Based Mental Health: Clinician Perspectives and Bias Mitigation

Gizem Sogancioglu — 2024-10-16

There is limited research on fairness in automated decision-making systems in the clinical domain, particularly in the mental health domain. Our study explores clinicians' perceptions of AI fairness through two distinct scenarios: violence risk assessment and depression phenotype recognition using textual clinical notes. We engage with clinicians through semi-structured interviews to understand their fairness perceptions and to identify appropriate quantitative fairness objectives for these scenarios. Then, we compare a set of bias mitigation strategies developed to improve at least one of the four selected fairness objectives. Our findings underscore the importance of carefully selecting fairness measures, as prioritizing less relevant measures can have a detrimental rather than a beneficial effect on model behavior in real-world clinical use.

Public vs Private Bodies: Who Should Run Advanced AI Evaluations and Audits? A Three-Step Logic Based on Case Studies of High-Risk Industries

Merlin Stein — 2024-10-16

Artificial Intelligence (AI) Safety Institutes and governments worldwide are deciding whether they evaluate and audit advanced AI themselves, support a private auditor ecosystem or do both. Auditing regimes have been established in a wide range of industry contexts to monitor and evaluate firms’ compliance with regulation. Auditing is a necessary governance tool to understand and manage the risks of a technology. This paper draws from nine such regimes to inform (i) who should audit which parts of advanced AI; and (ii) how much resources, competence and access public bodies may need to audit advanced AI effectively. First, the effective responsibility distribution between public and private auditors depends heavily on specific industry and audit conditions. On the basis of advanced AI’s risk profile, the sensitivity of information involved in the auditing process, and the high costs of verifying safety and benefit claims of AI Labs, we recommend that public bodies become directly involved in safety critical, especially gray- and white-box, AI model audits. Governance and security audits, which are well-established in other industry contexts, as well as black-box model audits, may be more efficiently provided by a private market of auditors under public oversight. Secondly, to effectively fulfill their role in advanced AI audits, public bodies need extensive access to models and facilities. Public bodies’ capacity should scale with the industry's risk level, size and market concentration, potentially requiring 100s of employees for auditing in large jurisdictions like the EU or US, like in nuclear safety and life sciences.

Surveys Considered Harmful? Reflecting on the Use of Surveys in AI Research, Development, and Governance

Mohammad Tahaei — 2024-10-16

Calls for engagement with the public in Artificial Intelligence (AI) research, development, and governance are increasing, leading to the use of surveys to capture people's values, perceptions, and experiences related to AI. In this paper, we critically examine the state of human participant surveys associated with these topics. Through both a reflexive analysis of a survey pilot spanning six countries and a systematic literature review of 44 papers featuring public surveys related to AI, we explore prominent perspectives and methodological nuances associated with surveys to date. We find that public surveys on AI topics are vulnerable to specific Western knowledge, values, and assumptions in their design, including in their positioning of ethical concepts and societal values, lack sufficient critical discourse surrounding deployment strategies, and demonstrate inconsistent forms of transparency in their reporting. Based on our findings, we distill provocations and heuristic questions for our community, to recognize the limitations of surveys for meeting the goals of engagement, and to cultivate shared principles to design, deploy, and interpret surveys cautiously and responsibly.

Enhancing Equitable Access to AI in Housing and Homelessness System of Care through Federated Learning

Musa Taib — 2024-10-16

The top priority of a Housing and Homelessness System of Care (HHSC) is to connect people experiencing homelessness to supportive housing. An HHSC typically consists of many agencies serving the same population. Information technology platforms differ in type and quality between agencies, so their data are usually isolated from one agency to another. Larger agencies may have sufficient data to train and test artificial intelligence (AI) tools but smaller agencies typically do not. To address this gap, we introduce a Federated Learning (FL) approach enabling all agencies to train a predictive model collaboratively without sharing their sensitive data. We demonstrate how FL can be used within an HHSC to provide all agencies equitable access to quality AI and further assist human decision-makers in the allocation of resources within HHSC. This is achieved while preserving the privacy of the people within the data by not sharing identifying information between agencies without their consent. Our experimental results using real-world HHSC data from a North American city demonstrate that our FL approach offers comparable performance with the idealized scenario of training the predictive model with data fully shared and linked between agencies.

Dynamics of Moral Behavior in Heterogeneous Populations of Learning Agents

Elizaveta Tennant — 2024-10-16

Growing concerns about safety and alignment of AI systems highlight the importance of embedding moral capabilities in artificial agents: a promising solution is the use of learning from experience, i.e., Reinforcement Learning. In multi-agent (social) environments, complex population-level phenomena may emerge from interactions between individual learning agents. Many of the existing studies rely on simulated social dilemma environments to study the interactions of independent learning agents; however, they tend to ignore the moral heterogeneity that is likely to be present in societies of agents in practice. For example, at different points in time a single learning agent may face opponents who are consequentialist (i.e., focused on maximizing outcomes over time), norm-based (i.e., conforming to specific norms), or virtue-based (i.e., considering a combination of different virtues). The extent to which agents' co-development may be impacted by such moral heterogeneity in populations is not well understood. In this paper, we present a study of the learning dynamics of morally heterogeneous populations interacting in a social dilemma setting. Using an Iterated Prisoner's Dilemma environment with a partner selection mechanism, we investigate the extent to which the prevalence of diverse moral agents in populations affects individual agents' learning behaviors and emergent population-level outcomes. We observe several types of non-trivial interactions between pro-social and anti-social agents, and find that certain types of moral agents are able to steer selfish agents towards more cooperative behavior.

Misrepresented Technological Solutions in Imagined Futures: The Origins and Dangers of AI Hype in the Research Community

Savannah Thais — 2024-10-16

Technology does not exist in a vacuum; technological development, media representation, public perception, and governmental regulation cyclically influence each other to produce the collective understanding of a technology's capabilities, utilities, and risks. When these capabilities are overestimated, there is an enhanced risk of subjecting the public to dangerous or harmful technology, artificially restricting research and development directions, and enabling misguided or detrimental policy. The dangers of technological hype are particularly relevant in the rapidly evolving space of AI. Centering the research community as a key player in the development and proliferation of hype, we examine the origins and risks of AI hype to the research community and society more broadly and propose a set of measures that researchers, regulators, and the public can take to mitigate these risks and reduce the prevalence of unfounded claims about the technology.

The Supply Chain Capitalism of AI: A Call to (Re)think Algorithmic Harms and Resistance (Extended Abstract)

Ana Valdivia — 2024-10-16

Artificial Intelligence (AI) is woven into a supply chain of capital, resources and human labour that has been neglected in debates about the social impact of this technology. Given the current surge in generative AI—which is estimated to use more natural resources than classic machine learning algorithms—it is vital that we better understand its production networks. Building on Tsing’s concept of supply chain capitalism, this paper offers a journey through the AI industry by illustrating the complex, diverse, opaque and global structures of the AI supply chain. The paper then illustrates an ethnographic research in Latin America revealing that AI’s rapid infrastructural growth may be precipitating environmental struggles. Investigating the supply chain capitalism of AI reveals that eco-political frictions are arising. This demands broad critical perspectives on AI studies from a critical perspective by considering the entire capitalist production line of its industry.

Decolonial AI Alignment: Openness, Visesa-Dharma, and Including Excluded Knowledges

Kush R. Varshney — 2024-10-16

Prior work has explicated the coloniality of artificial intelligence (AI) development and deployment through mechanisms such as extractivism, automation, sociological essentialism, surveillance, and containment. However, that work has not engaged much with alignment: teaching behaviors to a large language model (LLM) in line with desired values, and has not considered a mechanism that arises within that process: moral absolutism---a part of the coloniality of knowledge. Colonialism has a history of altering the beliefs and values of colonized peoples; in this paper, I argue that this history is recapitulated in current LLM alignment practices and technologies. Furthermore, I suggest that AI alignment be decolonialized using three forms of openness: openness of models, openness to society, and openness to excluded knowledges. This suggested approach to decolonial AI alignment uses ideas from the argumentative moral philosophical tradition of Hinduism, which has been described as an open-source religion. One concept used is viśeṣa-dharma, or particular context-specific notions of right and wrong. At the end of the paper, I provide a suggested reference architecture to work toward the proposed framework.

Medical AI, Categories of Value Conflict, and Conflict Bypasses

Gavin Victor — 2024-10-16

It is becoming clear that, in the process of aligning AI with human values, one glaring ethical problem is that of value conflict. It is not obvious what we should do when two compelling values (such as autonomy and safety) come into conflict with one another in the design or implementation of a medical AI technology. This paper shares findings from a scoping review at the intersection of three concepts—AI, moral value, and health—that have to do with value conflict and arbitration. The paper looks at some important and unique cases of value conflict, and then describes three possible categories of value conflict: personal value conflict, interpersonal or intercommunal value conflict, and definitional value conflict. It then describes three general paths forward in addressing value conflict: additional ethical theory, additional empirical evidence, and bypassing the conflict altogether. Finally, it reflects on the efficacy of these three paths forward as ways of addressing the three categories of value conflict, and motions toward what is needed for better approaching value conflicts in medical AI.

Decoding Multilingual Moral Preferences: Unveiling LLM's Biases through the Moral Machine Experiment

Karina Vida — 2024-10-16

Large language models (LLMs) increasingly find their way into the most diverse areas of our everyday lives. They indirectly influence people's decisions or opinions through their daily use. Therefore, understanding how and which moral judgements these LLMs make is crucial. However, morality is not universal and depends on the cultural background. This raises the question of whether these cultural preferences are also reflected in LLMs when prompted in different languages or whether moral decision-making is consistent across different languages. So far, most research has focused on investigating the inherent values of LLMs in English. While a few works conduct multilingual analyses of moral bias in LLMs in a multilingual setting, these analyses do not go beyond atomic actions. To the best of our knowledge, a multilingual analysis of moral bias in dilemmas has not yet been conducted. To address this, our paper builds on the moral machine experiment (MME) to investigate the moral preferences of five LLMs, Falcon, Gemini, Llama, GPT, and MPT, in a multilingual setting and compares them with the preferences collected from humans belonging to different cultures. To accomplish this, we generate 6500 scenarios of the MME and prompt the models in ten languages on which action to take. Our analysis reveals that all LLMs inhibit different moral biases to some degree and that they not only differ from the human preferences but also across multiple languages within the models themselves. Moreover, we find that almost all models, particularly Llama 3, divert greatly from human values and, for instance, prefer saving fewer people over saving more.

PICE: Polyhedral Complex Informed Counterfactual Explanations

Mattia Jacopo Villani — 2024-10-16

Polyhedral geometry can be used to shed light on the behaviour of piecewise linear neural networks, such as ReLU-based architectures. Counterfactual explanations are a popular class of methods for examining model behaviour by comparing a query to the closest point with a different label, subject to constraints. We present a new algorithm, Polyhedral-complex Informed Counterfactual Explanations (PICE), which leverages the decomposition of the piecewise linear neural network into a polyhedral complex to find counterfactuals that are provably minimal in the Euclidean norm and exactly on the decision boundary for any given query. Moreover, we develop variants of the algorithm that target popular counterfactual desiderata such as sparsity, robustness, speed, plausibility, and actionability. We empirically show on four publicly available real-world datasets that our method outperforms other popular techniques to find counterfactuals and adversarial attacks by distance to decision boundary and distance to query. Moreover, we successfully improve our baseline method in the dimensions of the desiderata we target, as supported by experimental evaluations.

Strategies for Increasing Corporate Responsible AI Prioritization

Angelina Wang — 2024-10-16

Responsible artificial intelligence (RAI) is increasingly recognized as a critical concern. However, the level of corporate RAI prioritization has not kept pace. In this work, we conduct 16 semi-structured interviews with practitioners to investigate what has historically motivated companies to increase the prioritization of RAI. What emerges is a complex story of conflicting and varied factors, but we bring structure to the narrative by highlighting the different strategies available to employ, and point to the actors with access to each. While there are no guaranteed steps for increasing RAI prioritization, we paint the current landscape of motivators so that practitioners can learn from each other, and put forth our own selection of promising directions forward.

Operationalizing Content Moderation “Accuracy” in the Digital Services Act

Johnny Tian-Zheng Wei — 2024-10-16

The Digital Services Act, recently adopted by the EU, requires social media platforms to report the ``accuracy'' of their automated content moderation systems. The colloquial term is vague, or open-textured---the literal accuracy (number of correct predictions divided by the total) is not suitable for problems with large class imbalance, and the ground truth and dataset to measure accuracy against is unspecified. Without further specification, the regulatory requirement allows for deficient reporting. In this interdisciplinary work, we operationalize ``accuracy'' reporting by refining legal concepts and relating them to technical implementation. We start by elucidating the legislative purpose of the Act to legally justify an interpretation of ``accuracy'' as precision and recall. These metrics remain informative in class imbalanced settings, and reflect the proportional balancing of Fundamental Rights of the EU Charter. We then focus on the estimation of recall, as its naive estimation can incur extremely high annotation costs and disproportionately interfere with the platform's right to conduct business. Through a simulation study, we show that recall can be efficiently estimated using stratified sampling with trained classifiers, and provide concrete recommendations for its application. Finally, we present a case study of recall reporting for a subset of Reddit under the Act. Based on the language in the Act, we identify a number of ways recall could be reported due to underspecification. We report on one possibility using our improved estimator, and discuss the implications and areas for further legal clarification.

How Do AI Companies “Fine-Tune” Policy? Examining Regulatory Capture in AI Governance

Kevin Wei — 2024-10-16

Industry actors in the United States have gained extensive influence in conversations about the regulation of general-purpose artificial intelligence (AI) systems. Although industry participation is an important part of the policy process, it can also cause regulatory capture, whereby industry co-opts regulatory regimes to prioritize private over public welfare. Capture of AI policy by AI developers and deployers could hinder such regulatory goals as ensuring the safety, fairness, beneficence, transparency, or innovation of general-purpose AI systems. In this paper, we first introduce different models of regulatory capture from the social science literature. We then present results from interviews with 17 AI policy experts on what policy outcomes could compose regulatory capture in US AI policy, which AI industry actors are influencing the policy process, and whether and how AI industry actors attempt to achieve outcomes of regulatory capture. Experts were primarily concerned with capture leading to a lack of AI regulation, weak regulation, or regulation that over-emphasizes certain policy goals over others. Experts most commonly identified agenda-setting (15 of 17 interviews), advocacy (13), academic capture (10), information management (9), cultural capture through status (7), and media capture (7) as channels for industry influence. To mitigate these particular forms of industry influence, we recommend systemic changes in developing technical expertise in government and civil society, independent funding streams for the AI ecosystem, increased transparency and ethics requirements, greater civil society access to policy, and various procedural safeguards.

Automate or Assist? The Role of Computational Models in Identifying Gendered Discourse in US Capital Trial Transcripts

Andrea W Wen-Yi — 2024-10-16

The language used by US courtroom actors in criminal trials has long been studied for biases. However, systematic studies for bias in high-stakes court trials have been difficult, due to the nuanced nature of bias and the legal expertise required. Large language models offer the possibility to automate annotation. But validating the computational approach requires both an understanding of how automated methods fit in existing annotation workflows and what they really offer. We present a case study of adding a computational model to a complex and high-stakes problem: identifying gender-biased language in US capital trials for women defendants. Our team of experienced death-penalty lawyers and NLP technologists pursue a three-phase study: first annotating manually, then training and evaluating computational models, and finally comparing expert annotations to model predictions. Unlike many typical NLP tasks, annotating for gender bias in months-long capital trials is complicated, with many individual judgment calls. Contrary to standard arguments for automation that are based on efficiency and scalability, legal experts find the computational models most useful in providing opportunities to reflect on their own bias in annotation and to build consensus on annotation rules. This experience suggests that seeking to replace experts with computational models for complex annotation is both unrealistic and undesirable. Rather, computational models offer valuable opportunities to assist the legal experts in annotation-based studies.

A Relational Justification of AI Democratization

Bauke Wielinga — 2024-10-16

While much has been written about what democratized AI should look like, there has been surprisingly little attention for the normative grounds of AI democratization. Existing calls for AI democratization that do make explicit arguments broadly fall into two categories: outcome-based and legitimacy-based, corresponding to outcome-based and process-based views of procedural justice respectively. This paper argues that we should favor relational justifications of AI democratization to outcome-based ones, because the former additionally provide outcome-independent reasons for AI democratization. Moreover, existing legitimacy-based arguments often leave the why of AI democratization implicit and instead focus on the how. We present two relational arguments for AI democratization: one based on empirical findings regarding the perceived importance of relational features of decision-making procedures, and one based on Iris Marion Young’s conception of justice, according to which the main forms of injustice are domination and oppression. We show how these arguments lead to requirements for procedural fairness and thus also offer guidance on the how of AI democratization. Finally, we consider several objections to AI democratization, including worries concerning epistemic exploitation.

Gender, Race, and Intersectional Bias in Resume Screening via Language Model Retrieval

Kyra Wilson — 2024-10-16

Artificial intelligence (AI) hiring tools have revolutionized resume screening, and large language models (LLMs) have the potential to do the same. However, given the biases which are embedded within LLMs, it is unclear whether they can be used in this scenario without disadvantaging groups based on their protected attributes. In this work, we investigate the possibilities of using LLMs in a resume screening setting via a document retrieval framework that simulates job candidate selection. Using that framework, we then perform a resume audit study to determine whether a selection of Massive Text Embedding (MTE) models are biased in resume screening scenarios. We simulate this for nine occupations, using a collection of over 500 publicly available resumes and 500 job descriptions. We find that the MTEs are biased, significantly favoring White-associated names in 85.1% of cases and female-associated names in only 11.1% of cases, with a minority of cases showing no statistically significant differences. Further analyses show that Black males are disadvantaged in up to 100% of cases, replicating real-world patterns of bias in employment settings, and validate three hypotheses of intersectionality. We also find an impact of document length as well as the corpus frequency of names in the selection of resumes. These findings have implications for widely used AI tools that are automating employment, fairness, and tech policy.

When and Why is Persuasion Hard? A Computational Complexity Result

Zachary Wojtowicz — 2024-10-16

As generative foundation models improve, they also tend to become more persuasive, raising concerns that AI automation will enable governments, firms, and other actors to manipulate beliefs with unprecedented scale and effectiveness at virtually no cost. The full economic and social ramifications of this trend have been difficult to foresee, however, given that we currently lack a complete theoretical understanding of why persuasion is costly for human labor to produce in the first place. This paper places human and AI agents on a common conceptual footing by formalizing informational persuasion as a mathematical decision problem and characterizing its computational complexity. A novel proof establishes that persuasive messages are challenging to discover (NP-Hard) but easy to adopt if supplied by others (NP). This asymmetry helps explain why people are susceptible to persuasion, even in contexts where all relevant information is publicly available. The result also illuminates why litigation, strategic communication, and other persuasion-oriented activities have historically been so human capital intensive, and it provides a new theoretical basis for studying how AI will impact various industries.

The Implications of Open Generative Models in Human-Centered Data Science Work: A Case Study with Fact-Checking Organizations

Robert Wolfe — 2024-10-16

Calls to use open generative language models in academic research have highlighted the need for reproducibility and transparency in scientific research. However, the impact of generative AI extends well beyond academia, as corporations and public interest organizations have begun integrating these models into their data science pipelines. We expand this lens to include the impact of open models on organizations, focusing specifically on fact-checking organizations, which use AI to observe and analyze large volumes of circulating misinformation, yet must also ensure the reproducibility and impartiality of their work. We wanted to understand where fact-checking organizations use open models in their data science pipelines; what motivates their use of open models or proprietary models; and how their use of open or proprietary models can inform research on the societal impact of generative AI. To answer these questions, we conducted an interview study with N=24 professionals at 20 fact-checking organizations on six continents. Based on these interviews, we offer a five-component conceptual model of where fact-checking organizations employ generative AI to support or automate parts of their data science pipeline, including Data Ingestion, Data Analysis, Data Retrieval, Data Delivery, and Data Sharing. We then provide taxonomies of fact-checking organizations' motivations for using open models and the limitations that prevent them for further adopting open models, finding that they prefer open models for Organizational Autonomy, Data Privacy and Ownership, Application Specificity, and Capability Transparency. However, they nonetheless use proprietary models due to perceived advantages in Performance, Usability, and Safety, as well as Opportunity Costs related to participation in emerging generative AI ecosystems. Finally, we propose a research agenda to address limitations of both open and proprietary models. Our research provides novel perspective on open models in data-driven organizations.

ML-EAT: A Multilevel Embedding Association Test for Interpretable and Transparent Social Science

Robert Wolfe — 2024-10-16

This research introduces the Multilevel Embedding Association Test (ML-EAT), a method designed for interpretable and transparent measurement of intrinsic bias in language technologies. The ML-EAT addresses issues of ambiguity and difficulty in interpreting the traditional EAT measurement by quantifying bias at three levels of increasing granularity: the differential association between two target concepts with two attribute concepts; the individual effect size of each target concept with two attribute concepts; and the association between each individual target concept and each individual attribute concept. Using the ML-EAT, this research defines a taxonomy of EAT patterns describing the nine possible outcomes of an embedding association test, each of which is associated with a unique EAT-Map, a novel four-quadrant visualization for interpreting the ML-EAT. Empirical analysis of static and diachronic word embeddings, GPT-2 language models, and a CLIP language-and-image model shows that EAT patterns add otherwise unobservable information about the component biases that make up an EAT; reveal the effects of prompting in zero-shot models; and can also identify situations when cosine similarity is an ineffective metric, rendering an EAT unreliable. Our work contributes a method for rendering bias more observable and interpretable, improving the transparency of computational investigations into human minds and societies.

Representation Bias of Adolescents in AI: A Bilingual, Bicultural Study

Robert Wolfe — 2024-10-16

Popular and news media often portray teenagers with sensationalism, as both a risk to society and at risk from society. As AI begins to absorb some of the epistemic functions of traditional media, we study how teenagers in two countries speaking two languages: 1) are depicted by AI, and 2) how they would prefer to be depicted. Specifically, we study the biases about teenagers learned by static word embeddings (SWEs) and generative language models (GLMs), comparing these with the perspectives of adolescents living in the U.S. and Nepal. We find English-language SWEs associate teenagers with societal problems, and more than 50% of the 1,000 words most associated with teenagers in the pretrained GloVe SWE reflect such problems. Given prompts about teenagers, 30% of outputs from GPT2-XL and 29% from LLaMA-2-7B GLMs discuss societal problems, most commonly violence, but also drug use, mental illness, and sexual taboo. Nepali models, while not free of such associations, are less dominated by social problems. Data from workshops with N=13 U.S. adolescents and N=18 Nepalese adolescents show that AI presentations are disconnected from teenage life, which revolves around activities like school and friendship. Participant ratings of how well 20 trait words describe teens are decorrelated from SWE associations, with Pearson's rho=.02, n.s. in English FastText and rho=.06, n.s. GloVe; and rho=.06, n.s. in Nepali FastText and rho=-.23, n.s. in GloVe. U.S. participants suggested AI could fairly present teens by highlighting diversity, while Nepalese participants centered positivity. Participants were optimistic that, if it learned from adolescents, rather than media sources, AI could help mitigate stereotypes. Our work offers an understanding of the ways SWEs and GLMs misrepresent a developmentally vulnerable group and provides a template for less sensationalized characterization.

Dataset Scale and Societal Consistency Mediate Facial Impression Bias in Vision-Language AI

Robert Wolfe — 2024-10-16

Multimodal AI models capable of associating images and text hold promise for numerous domains, ranging from automated image captioning to accessibility applications for blind and low-vision users. However, uncertainty about bias has in some cases limited their adoption and availability. In the present work, we study 43 CLIP vision-language models to determine whether they learn human-like facial impression biases, and we find evidence that such biases are reflected across three distinct CLIP model families. We show for the first time that the the degree to which a bias is shared across a society predicts the degree to which it is reflected in a CLIP model. Human-like impressions of visually unobservable attributes, like trustworthiness and sexuality, emerge only in models trained on the largest dataset, indicating that a better fit to uncurated cultural data results in the reproduction of increasingly subtle social biases. Moreover, we use a hierarchical clustering approach to show that dataset size predicts the extent to which the underlying structure of facial impression bias resembles that of facial impression bias in humans. Finally, we show that Stable Diffusion models employing CLIP as a text encoder learn facial impression biases, and that these biases intersect with racial biases in Stable Diffusion XL-Turbo. While pretrained CLIP models may prove useful for scientific studies of bias, they will also require significant dataset curation when intended for use as general-purpose models in a zero-shot setting.

Stable Diffusion Exposed: Gender Bias from Prompt to Image

Yankun Wu — 2024-10-16

Several studies have raised awareness about social biases in image generative models, demonstrating their predisposition towards stereotypes and imbalances. This paper contributes to this growing body of research by introducing an evaluation protocol that analyzes the impact of gender indicators at every step of the generation process on Stable Diffusion images. Leveraging insights from prior work, we explore how gender indicators not only affect gender presentation but also the representation of objects and layouts within the generated images. Our findings include the existence of differences in the depiction of objects, such as instruments tailored for specific genders, and shifts in overall layouts. We also reveal that neutral prompts tend to produce images more aligned with masculine prompts than their feminine counterparts. We further explore where bias originates through representational disparities and how it manifests in the images via prompt-image dependencies, and provide recommendations for developers and users to mitigate potential bias in image generation.

Non-linear Welfare-Aware Strategic Learning

Tian Xie — 2024-10-16

This paper studies algorithmic decision-making in the presence of strategic individual behaviors, where an ML model is used to make decisions about human agents and the latter can adapt their behavior strategically to improve their future data. Existing results on strategic learning have largely focused on the linear setting where agents with linear labeling functions best respond to a (noisy) linear decision policy. Instead, this work focuses on general non-linear settings where agents respond to the decision policy with only "local information" of the policy. Moreover, we simultaneously consider objectives of maximizing decision-maker welfare (model prediction accuracy), social welfare (agent improvement caused by strategic behaviors), and agent welfare (the extent that ML underestimates the agents). We first generalize the agent best response model in previous works to the non-linear setting and then investigate the compatibility of welfare objectives. We show the three welfare can attain the optimum simultaneously only under restrictive conditions which are challenging to achieve in non-linear settings. The theoretical results imply that existing works solely maximizing the welfare of a subset of parties usually diminish the welfare of others. We thus claim the necessity of balancing the welfare of each party in non-linear settings and propose an irreducible optimization algorithm suitable for general strategic learning. Experiments on synthetic and real data validate the proposed algorithm.

Algorithmic Decision-Making under Agents with Persistent Improvement

Tian Xie — 2024-10-16

This paper studies algorithmic decision-making under human strategic behavior, where a decision-maker uses an algorithm to make decisions about human agents, and the latter with information about the algorithm may exert effort strategically and improve to receive favorable decisions. Unlike prior works that assume agents benefit from their efforts immediately, we consider realistic scenarios where the impacts of these efforts are persistent and agents benefit from efforts by making improvements gradually. We first develop a dynamic model to characterize persistent improvements and based on this construct a Stackelberg game to model the interplay between agents and the decision-maker. We analytically characterize the equilibrium strategies and identify conditions under which agents have incentives to invest efforts to improve their qualifications. With the dynamics, we then study how the decision-maker can design an optimal policy to incentivize the largest improvements inside the agent population. We also extend the model to settings where 1) agents may be dishonest and game the algorithm into making favorable but erroneous decisions; 2) honest efforts are forgettable and not sufficient to guarantee persistent improvements. With the extended models, we further examine conditions under which agents prefer honest efforts over dishonest behavior and the impacts of forgettable efforts.

Tracing the Evolution of Information Transparency for OpenAI’s GPT Models through a Biographical Approach

Zhihan Xu — 2024-10-16

Information transparency, the open disclosure of information about models, is crucial for proactively evaluating the potential societal harm of large language models (LLMs) and developing effective risk mitigation measures. Adapting the biographies of artifacts and practices (BOAP) method from science and technology studies, this study analyzes the evolution of information transparency within OpenAI’s Generative Pre-trained Transformers (GPT) model reports and usage policies from its inception in 2018 to GPT-4, one of today’s most capable LLMs. To assess the breadth and depth of transparency practices, we develop a 9-dimensional, 3-level analytical framework to evaluate the comprehensiveness and accessibility of information disclosed to various stakeholders. Findings suggest that while model limitations and downstream usages are increasingly clarified, model development processes have become more opaque. Transparency remains minimal in certain aspects, such as model explainability and real-world evidence of LLM impacts, and the discussions on safety measures such as technical interventions and regulation pipelines lack in-depth details. The findings emphasize the need for enhanced transparency to foster accountability and ensure responsible technological innovations.

LLM Voting: Human Choices and AI Collective Decision-Making

Joshua C. Yang — 2024-10-16

This paper investigates the voting behaviors of Large Language Models (LLMs), specifically GPT-4 and LLaMA-2, their biases, and how they align with human voting patterns. Our methodology involved using a dataset from a human voting experiment to establish a baseline for human preferences and conducting a corresponding experiment with LLM agents. We observed that the choice of voting methods and the presentation order influenced LLM voting outcomes. We found that varying the persona can reduce some of these biases and enhance alignment with human choices. While the Chain-of-Thought approach did not improve prediction accuracy, it has potential for AI explainability in the voting process. We also identified a trade-off between preference diversity and alignment accuracy in LLMs, influenced by different temperature settings. Our findings indicate that LLMs may lead to less diverse collective outcomes and biased assumptions when used in voting scenarios, emphasizing the need for cautious integration of LLMs into democratic processes.

You Still See Me: How Data Protection Supports the Architecture of AI Surveillance

Rui-Jie Yew — 2024-10-16

Data forms the backbone of artificial intelligence (AI). Privacy and data protection laws thus have strong bearing on AI systems. Shielded by the rhetoric of compliance with data protection and privacy regulations, privacy-preserving techniques have enabled the extraction of more and new forms of data. We illustrate how the application of privacy-preserving techniques in the development of AI systems--from private set intersection as part of dataset curation to homomorphic encryption and federated learning as part of model computation--can further support surveillance infrastructure under the guise of regulatory permissibility. Finally, we propose technology and policy strategies to evaluate privacy-preserving techniques in light of the protections they actually confer. We conclude by highlighting the role that technologists could play in devising policies that combat surveillance AI technologies.

Mitigating Urban-Rural Disparities in Contrastive Representation Learning with Satellite Imagery

Miao Zhang — 2024-10-16

Satellite imagery is being leveraged for many societally critical tasks across climate, economics, and public health. Yet, because of heterogeneity in landscapes (e.g. how a road looks in different places), models can show disparate performance across geographic areas. Given the important potential of disparities in algorithmic systems used in societal contexts, here we consider the risk of urban-rural disparities in identification of land-cover features. This is via semantic segmentation (a common computer vision task in which image regions are labelled according to what is being shown) which uses pre-trained image representations generated via contrastive self-supervised learning. We propose fair dense representation with contrastive learning (FairDCL) as a method for de-biasing the multi-level latent space of a convolution neural network. The method improves feature identification by removing spurious latent representations which are disparately distributed across urban and rural areas, and is achieved in an unsupervised way by contrastive pre-training. The pre-trained image representation mitigates downstream urban-rural prediction disparities and outperforms state-of-the-art baselines on real-world satellite images. Embedding space evaluation and ablation studies further demonstrate FairDCL’s robustness. As generalizability and robustness in geographic imagery is a nascent topic, our work motivates researchers to consider metrics beyond average accuracy in such applications.

Ontology of Belief Diversity: A Community-Based Epistemological Approach

Richard Zhang — 2024-10-16

AI applications across classification, fairness, and human interaction often implicitly require ontologies of social concepts. Constructing these well – especially when there are many relevant categories – is a controversial task but is crucial for achieving meaningful inclusivity. Here, we focus on developing a pragmatic ontology of belief systems, which isa complex and often controversial space. By iterating on our community-based design until mutual agreement is reached, we found that epistemological methods were best for categorizing the fundamental ways beliefs differ, maximally respecting our principles of inclusivity and brevity. We demonstrate our methodology’s utility and interpretability via user studies in term annotation and sentiment analysis experiments for belief fairness in language models

Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society

AIES 2024 Frontmatter

PoliTune: Analyzing the Impact of Data Selection and Fine-Tuning on Economic and Political Biases in Large Language Models

All Too Human? Mapping and Mitigating the Risk from Anthropomorphic AI

Estimating Weights of Reasons Using Metaheuristics: A Hybrid Approach to Machine Ethics

Introducing the AI Governance and Regulatory Archive (AGORA): An Analytic Infrastructure for Navigating the Emerging AI Governance Landscape

Understanding Intrinsic Socioeconomic Biases in Large Language Models

Nothing Comes Without Its World – Practical Challenges of Aligning LLMs to Situated Human Values through RLHF

Kid-Whisper: Towards Bridging the Performance Gap in Automatic Speech Recognition for Children VS. Adults

Public Attitudes on Performance for Algorithmic and Human Decision-Makers (Extended Abstract)

Simulating Policy Impacts: Developing a Generative Scenario Writing Method to Evaluate the Perceived Effects of Regulation

The Origin and Opportunities of Developers’ Perceived Code Accountability in Open Source AI Software Development

Gender in Pixels: Pathways to Non-binary Representation in Computer Vision

Legal Minds, Algorithmic Decisions: How LLMs Apply Constitutional Principles in Complex Scenarios

A Formal Account of Trustworthiness: Connecting Intrinsic and Perceived Trustworthiness

Unsocial Intelligence: An Investigation of the Assumptions of AGI Discourse

On The Stability of Moral Preferences: A Problem with Computational Elicitation Methods

Co-designing an AI Impact Assessment Report Template with AI Practitioners and AI Compliance Experts

Foundation Model Transparency Reports

Ecosystem Graphs: Documenting the Foundation Model Supply Chain

Trustworthy Social Bias Measurement

Views on AI Aren't Binary — They’re Plural (Extended Abstract)

A Qualitative Study on Cultural Hegemony and the Impacts of AI

An FDA for AI? Pitfalls and Plausibility of Approval Regulation for Frontier Artificial Intelligence

Why Am I Still Seeing This: Measuring the Effectiveness of Ad Controls and Explanations in AI-Mediated Ad Targeting Systems

Coordinated Flaw Disclosure for AI: Beyond Security Vulnerabilities

Algorithm-Assisted Decision Making and Racial Disparities in Housing: A Study of the Allegheny Housing Assessment Tool

Beyond Thumbs Up/Down: Untangling Challenges of Fine-Grained Feedback for Text-to-Image Generation

MoJE: Mixture of Jailbreak Experts, Naive Tabular Classifiers as Guard for Prompt Attacks

Sponsored is the New Organic: Implications of Sponsored Results on Quality of Search Results in the Amazon Marketplace

APPRAISE: a Governance Framework for Innovation with Artificial Intelligence Systems

Scaling Laws Do Not Scale

What Makes An Expert? Reviewing How ML Researchers Define "Expert"

SoUnD Framework: Analyzing (So)cial Representation in (Un)structured (D)ata

Outlier Detection Bias Busted: Understanding Sources of Algorithmic Bias through Data-centric Factors

Legitimating Emotion Tracking Technologies in Driver Monitoring Systems

Representation Magnitude Has a Liability to Privacy Vulnerability

Red-Teaming for Generative AI: Silver Bullet or Security Theater?

How Should AI Decisions Be Explained? Requirements for Explanations from the Perspective of European Law

Surviving in Diverse Biases: Unbiased Dataset Acquisition in Online Data Market for Fair Model Training

“I Don’t See Myself Represented Here at All”: User Experiences of Stable Diffusion Outputs Containing Representational Harms across Gender Identities and Nationalities

Do Generative AI Models Output Harm while Representing Non-Western Cultures: Evidence from A Community-Centered Approach

Interpretations, Representations, and Stereotypes of Caste within Text-to-Image Generators

The PPOu Framework: A Structured Approach for Assessing the Likelihood of Malicious Use of Advanced AI Systems

Risks from Language Models for Automated Mental Healthcare: Ethics and Structure for Implementation (Extended Abstract)

Compassionate AI for Moral Decision-Making, Health, and Well-Being

A Conceptual Framework for Ethical Evaluation of Machine Learning Systems

Identifying Implicit Social Biases in Vision-Language Models

A Causal Framework to Evaluate Racial Bias in Law Enforcement Systems

Contributory Injustice, Epistemic Calcification and the Use of AI Systems in Healthcare

ExploreGen: Large Language Models for Envisioning the Uses and Risks of AI Technologies

What's Distributive Justice Got to Do with It? Rethinking Algorithmic Fairness from a Perspective of Approximate Justice

Afrofuturist Values for the Metaverse (Extended Abstract)

The Ethico-Politics of Design Toolkits: Responsible AI Tools, From Big Tech Guidelines to Feminist Ideation Cards (Extended Abstract)

LLM Platform Security: Applying a Systematic Evaluation Framework to OpenAI's ChatGPT Plugins

As an AI Language Model, "Yes I Would Recommend Calling the Police": Norm Inconsistency in LLM Decision-Making

Breaking the Global North Stereotype: A Global South-centric Benchmark Dataset for Auditing and Mitigating Biases in Facial Recognition Systems

Reflection of Its Creators: Qualitative Analysis of General Public and Expert Perceptions of Artificial Intelligence

Virtual Assistants Are Unlikely to Reduce Patient Non-Disclosure

Do Responsible AI Artifacts Advance Stakeholder Goals? Four Key Barriers Perceived by Legal and Civil Stakeholders

AI Failure Loops in Feminized Labor: Understanding the Interplay of Workplace AI and Occupational Devaluation

Epistemic Injustice in Generative AI

Vernacularizing Taxonomies of Harm is Essential for Operationalizing Holistic AI Safety

On the Pros and Cons of Active Learning for Moral Preference Elicitation

Algorithmic Fairness From the Perspective of Legal Anti-discrimination Principles

What’s Your Stake in Sustainability of AI?: An Informed Insider’s Guide

Anticipating the Risks and Benefits of Counterfactual World Simulation Models (Extended Abstract)

Acceptable Use Policies for Foundation Models

Responsible Reporting for Frontier AI Development

On the Trade-offs between Adversarial Robustness and Actionable Explanations

Observing Context Improves Disparity Estimation when Race is Unobserved

Human vs. Machine: Behavioral Differences between Expert Humans and Language Models in Wargame Simulations

Racial and Neighborhood Disparities in Legal Financial Obligations in Jefferson County, Alabama

Compute North vs. Compute South: The Uneven Possibilities of Compute-based AI Governance Around the Globe

How Are LLMs Mitigating Stereotyping Harms? Learning from Search Engine Studies

On Feasibility of Intent Obfuscating Attacks

“Democratizing AI” and the Concern of Algorithmic Injustice (Extended Abstract)

Foundations for Unfairness in Anomaly Detection - Case Studies in Facial Imaging Data

Uncovering the Gap: Challeging the Agential Nature of AI Responsibility Problems (Extended Abstract)

Examining the Behavior of LLM Architectures Within the Framework of Standardized National Exams in Brazil