Interpretations, Representations, and Stereotypes of Caste within Text-to-Image Generators

Sourojit Ghosh

doi:10.1609/aies.v7i1.31652

Authors

Sourojit Ghosh, University of Washington

Abstract

The surge in the popularity of text-to-image generators (T2Is) has been matched by extensive research into ensuring fairness and equitable outcomes, with a focus on how they impact society. However, such work has typically focused on globally experienced identities or centered Western contexts. In this paper, we address interpretations, representations, and stereotypes surrounding a tragically underexplored context in T2I research: caste. We examine how the T2I Stable Diffusion displays people of various castes, and what professions they are depicted as performing. Generating 100 images per prompt, we perform CLIP-cosine similarity comparisons with default depictions of an 'Indian person' by Stable Diffusion, and explore patterns of similarity. Our findings reveal how Stable Diffusion outputs perpetuate systems of 'castelessness', equating Indianness with high castes and depicting caste-oppressed identities with markers of poverty. In particular, we note the stereotyping and representational harm towards the historically marginalized Dalits, prominently depicted as living in rural areas and always at protests. Our findings underscore a need for a caste-aware approach towards T2I design, and we conclude with design recommendations.
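The cosine-similarity comparison mentioned in the abstract can be illustrated with a minimal sketch. It assumes image embeddings have already been computed (for example, by a CLIP image encoder) and are available as plain lists of floats. The function names, the toy vectors, and the choice to average over all pairs are illustrative assumptions, not the paper's stated procedure.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

def mean_pairwise_similarity(set_a, set_b):
    """Average cosine similarity over every (a, b) pair from two embedding sets.

    Averaging over all pairs is an assumption made for this sketch; the
    abstract does not say how per-image scores are aggregated.
    """
    scores = [cosine_similarity(a, b) for a in set_a for b in set_b]
    return sum(scores) / len(scores)

# Toy 3-dimensional stand-ins for image embeddings (hypothetical values).
baseline = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]   # e.g. default-prompt images
candidate = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # e.g. images for another prompt
score = mean_pairwise_similarity(candidate, baseline)
```

A higher `score` would indicate that the candidate image set lies closer, in embedding space, to the baseline set. Real CLIP embeddings have several hundred dimensions, but the arithmetic is the same.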
