MFL-Owner: Ownership Protection for Multi-modal Federated Learning via Orthogonal Transform Watermark

Authors

  • Keke Gai, Beijing Institute of Technology
  • Dongjue Wang, Beijing Institute of Technology
  • Jing Yu, Minzu University of China
  • Mohan Wang, Beijing Institute of Technology
  • Liehuang Zhu, Beijing Institute of Technology
  • Qi Wu, The University of Adelaide

DOI:

https://doi.org/10.1609/aaai.v39i3.32313

Abstract

Multi-modal Federated Learning (MFL) is a distributed machine learning paradigm that enables multiple participants with multi-modal data to collaboratively train a global model for multi-modal tasks without sharing their local data. MFL typically deploys the trained global model as an Embedding-as-a-Service (EaaS), allowing participants to obtain embeddings for downstream tasks. However, this deployment increases the risk of unauthorized copying and leakage of the model, and protecting the ownership of the MFL model while maintaining model performance is challenging. In this paper, we propose the first general model ownership protection framework for MFL, named MFL-Owner. MFL-Owner decouples the watermarking process from the model training process and addresses both ownership verification and traceability, effectively safeguarding the interests of the MFL collective. MFL-Owner builds on orthogonal transformations: it incorporates a linear transformation matrix with orthogonal constraints into the model, achieving high-quality ownership verification and traceability with minimal impact on model performance. To enhance the practicality of the watermark and prevent conflicts among multiple clients during tracing, we propose a trigger dataset selection method based on out-of-distribution data combined with Gaussian noise perturbation. Our experiments on multiple datasets demonstrate that MFL-Owner is effective for model ownership verification and traceability in MFL.
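
The abstract attributes the small performance impact to the orthogonal constraint on the linear transformation. The sketch below is not the authors' implementation; it only illustrates the general property that an orthogonal matrix preserves inner products, so the cosine similarity between two embeddings is unchanged after the transform. The names gram_schmidt, apply_matrix, and cosine, the dimension 8, and the random embeddings are all assumptions made for this illustration.

```python
import math
import random


def gram_schmidt(rows):
    """Orthonormalise linearly independent row vectors (modified Gram-Schmidt)."""
    basis = []
    for v in rows:
        w = list(v)
        for b in basis:
            proj = sum(x * y for x, y in zip(w, b))
            w = [x - proj * y for x, y in zip(w, b)]
        norm = math.sqrt(sum(x * x for x in w))
        basis.append([x / norm for x in w])
    return basis


def apply_matrix(matrix, vec):
    """Return matrix @ vec for a matrix stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


rng = random.Random(0)
dim = 8  # hypothetical embedding dimension, chosen only for the demo

# A square matrix with orthonormal rows is orthogonal (W W^T = I).
W = gram_schmidt([[rng.gauss(0, 1) for _ in range(dim)] for _ in range(dim)])

# Two stand-in embeddings (random vectors, not real model outputs).
emb_a = [rng.gauss(0, 1) for _ in range(dim)]
emb_b = [rng.gauss(0, 1) for _ in range(dim)]

cos_before = cosine(emb_a, emb_b)
cos_after = cosine(apply_matrix(W, emb_a), apply_matrix(W, emb_b))

# Largest deviation of W W^T from the identity, as an orthogonality check.
max_orth_error = max(
    abs(sum(x * y for x, y in zip(W[i], W[j])) - (1.0 if i == j else 0.0))
    for i in range(dim)
    for j in range(dim)
)
```

Comparing cos_before with cos_after shows that the two values agree up to floating-point error, which is the geometric reason a downstream task that depends on embedding similarity would be largely unaffected by such a transform.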

Published

2025-04-11

How to Cite

Gai, K., Wang, D., Yu, J., Wang, M., Zhu, L., & Wu, Q. (2025). MFL-Owner: Ownership Protection for Multi-modal Federated Learning via Orthogonal Transform Watermark. Proceedings of the AAAI Conference on Artificial Intelligence, 39(3), 3049–3058. https://doi.org/10.1609/aaai.v39i3.32313

Issue

Vol. 39 No. 3 (2025)

Section

AAAI Technical Track on Computer Vision II