Bridging the Copyright Gap: Do Large Vision-Language Models Recognize and Respect Copyrighted Content?

Naen Xu; Jinghuai Zhang; Changjiang Li; Hengyu An; Chunyi Zhou; Jun Wang; Boyu Xu; Yuyuan Li; Tianyu Du; Shouling Ji

doi:10.1609/aaai.v40i42.40910

Authors

Naen Xu (Zhejiang University)
Jinghuai Zhang (University of California, Los Angeles)
Changjiang Li (Palo Alto Networks)
Hengyu An (Zhejiang University)
Chunyi Zhou (Zhejiang University)
Jun Wang (OPPO Research Institute)
Boyu Xu (Hangzhou Xuanye Digital Technology Co., Ltd.)
Yuyuan Li (Hangzhou Dianzi University)
Tianyu Du (Zhejiang University)
Shouling Ji (Zhejiang University)

Abstract

Large vision-language models (LVLMs) have achieved remarkable advancements in multimodal reasoning tasks. However, their widespread accessibility raises critical concerns about potential copyright infringement. Will LVLMs accurately recognize and comply with copyright regulations when they encounter copyrighted content (e.g., user input or retrieved documents) in their context? Failure to comply with copyright regulations may lead to serious legal and ethical consequences, particularly when LVLMs generate responses based on copyrighted materials (e.g., retrieved book excerpts or news reports). In this paper, we present a comprehensive evaluation of various LVLMs, examining how they handle copyrighted content, such as book excerpts, news articles, music lyrics, and code documentation, when it is presented as visual input. To systematically measure copyright compliance, we introduce a large-scale benchmark dataset comprising 50,000 multimodal query-content pairs designed to evaluate how effectively LVLMs handle queries that could lead to copyright infringement. Because real-world copyrighted content may or may not include a copyright notice, the dataset covers two distinct scenarios: query-content pairs with and without a copyright notice. For the former, we cover four types of copyright notices to account for different cases. Our evaluation reveals that even state-of-the-art closed-source LVLMs exhibit significant deficiencies in recognizing and respecting copyrighted content, even when a copyright notice is present. To address this limitation, we introduce a novel tool-augmented defense framework for copyright compliance, which reduces infringement risks in all scenarios. Our findings underscore the importance of developing copyright-aware LVLMs to ensure the responsible and lawful use of copyrighted content.