Adventures of Trustworthy Vision-Language Models: A Survey

Mayank Vatsa; Anubhooti Jain; Richa Singh

doi:10.1609/aaai.v38i20.30275

Authors

Mayank Vatsa, IIT Jodhpur, India
Anubhooti Jain, IIT Jodhpur, India
Richa Singh, IIT Jodhpur, India

DOI:

https://doi.org/10.1609/aaai.v38i20.30275

Keywords:

Vision-Language Models, Interpretability, Bias, Robustness

Abstract

Recently, transformers have become widely used in computer vision and vision-language tasks. This rise can be attributed primarily to the capabilities offered by attention mechanisms and to the ability of transformers to adapt to a variety of tasks and domains. Their versatility and state-of-the-art performance have established them as indispensable tools for a wide array of applications. However, in the constantly changing landscape of machine learning, ensuring the trustworthiness of transformers is of utmost importance. This paper conducts a thorough examination of vision-language transformers through three fundamental principles of responsible AI: Bias, Robustness, and Interpretability. Its primary objective is to examine the complexities associated with the practical use of transformers, with the overarching goal of advancing our understanding of how to enhance their reliability and accountability.
