Interpretations, Representations, and Stereotypes of Caste within Text-to-Image Generators

Authors

  • Sourojit Ghosh, University of Washington

DOI:

https://doi.org/10.1609/aies.v7i1.31652

Abstract

The surge in the popularity of text-to-image generators (T2Is) has been matched by extensive research into ensuring fairness and equitable outcomes, with a focus on how they impact society. However, such work has typically focused on globally experienced identities or centered Western contexts. In this paper, we address interpretations, representations, and stereotypes surrounding a tragically underexplored context in T2I research: caste. We examine how the T2I Stable Diffusion displays people of various castes, and what professions they are depicted as performing. Generating 100 images per prompt, we perform CLIP cosine similarity comparisons with default depictions of an ‘Indian person’ by Stable Diffusion, and explore patterns of similarity. Our findings reveal how Stable Diffusion outputs perpetuate systems of ‘castelessness’, equating Indianness with high castes and depicting caste-oppressed identities with markers of poverty. In particular, we note the stereotyping and representational harm towards the historically marginalized Dalits, prominently depicted as living in rural areas and always at protests. Our findings underscore a need for a caste-aware approach towards T2I design, and we conclude with design recommendations.
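The abstract describes comparing images generated for each prompt against Stable Diffusion's default depictions of an ‘Indian person’ using CLIP cosine similarity. The sketch below illustrates that kind of comparison. It assumes each generated image has already been encoded into an embedding vector by a CLIP image encoder, and that encoding step is not shown. The mean-over-all-pairs aggregation, the ranking step, and the function names `cosine_similarity`, `mean_similarity_to_default` and `rank_prompts` are hypothetical illustrations, not the paper's exact procedure. The vectors in the usage line are toy values, not data from the study.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero-length embedding")
    return dot / (norm_a * norm_b)


def mean_similarity_to_default(
    prompt_embeddings: Sequence[Vector], default_embeddings: Sequence[Vector]
) -> float:
    """Average cosine similarity over every (prompt image, default image) pair.

    HYPOTHETICAL aggregation: the paper may summarize similarities differently.
    """
    scores = [
        cosine_similarity(p, d)
        for p in prompt_embeddings
        for d in default_embeddings
    ]
    if not scores:
        raise ValueError("need at least one embedding on each side")
    return sum(scores) / len(scores)


def rank_prompts(
    prompt_to_embeddings: Dict[str, Sequence[Vector]],
    default_embeddings: Sequence[Vector],
) -> List[Tuple[str, float]]:
    """Order prompts from most to least similar to the default depictions."""
    ranked = [
        (prompt, mean_similarity_to_default(embs, default_embeddings))
        for prompt, embs in prompt_to_embeddings.items()
    ]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Toy usage with made-up 2-d vectors (real CLIP embeddings are much larger).
default = [[2.0, 0.0], [5.0, 0.0]]
print(rank_prompts({"prompt_a": [[0.0, 1.0]], "prompt_b": [[1.0, 0.0]]}, default))
```

In this framing, a prompt whose images score close to 1.0 is depicted much like the model's default ‘Indian person’, while lower scores indicate depictions that diverge from that default.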


Published

2024-10-16

How to Cite

Ghosh, S. (2024). Interpretations, Representations, and Stereotypes of Caste within Text-to-Image Generators. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 7(1), 490-502. https://doi.org/10.1609/aies.v7i1.31652