Federated Causally Invariant Feature Learning

Authors

  • Xianjie Guo — Hefei University of Technology, China; Key Laboratory of Knowledge Engineering with Big Data of Ministry of Education, China
  • Kui Yu — Hefei University of Technology, China; Key Laboratory of Knowledge Engineering with Big Data of Ministry of Education, China
  • Lizhen Cui — Shandong University, China
  • Han Yu — Nanyang Technological University, Singapore
  • Xiaoxiao Li — Nanyang Technological University, Singapore; The University of British Columbia, Canada; Vector Institute, Canada

DOI:

https://doi.org/10.1609/aaai.v39i16.33866

Abstract

Federated feature selection (FFS) is a promising field for selecting informative features while preserving data privacy in federated learning (FL) settings. Because existing FFS methods focus on capturing correlations between features and labels, they struggle to achieve satisfactory performance under data-distribution heterogeneity among FL clients, and they cannot address the out-of-distribution (OOD) problem that arises when a significant portion of clients do not actively participate in FL training. To address these limitations, we propose Federated Causally Invariant Feature Learning (FedCIFL), a novel approach for learning causally invariant features in a privacy-preserving manner. We design a sample reweighting strategy to eliminate spurious correlations introduced by selection bias and iteratively estimate the federated causal effect between each feature and the labels (with the remaining features initially treated as confounders). By iteratively refining the confounding feature set to identify the true confounders, FedCIFL mitigates the impact of limited local data on the accuracy of federated causal effect estimation. Theoretical analysis proves the correctness of FedCIFL under reasonable assumptions. Extensive experiments on synthetic and real-world datasets demonstrate the superiority of FedCIFL over eight state-of-the-art baselines, improving on the best-performing approach by 3.19%, 9.07% and 2.65% in terms of average test Accuracy, RMSE and F1 score, respectively. It is a first-of-its-kind FFS approach capable of handling Non-IID and OOD data simultaneously. The source code is available at https://github.com/Xianjie-Guo/FedCIFL.
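To make the abstract's reweighting step concrete: the idea of estimating a feature's causal effect on the label after down-weighting spurious correlations with the remaining (confounding) features can be illustrated with a generic inverse-propensity reweighting scheme. The sketch below is a hypothetical single-client stand-in, not FedCIFL's actual algorithm — FedCIFL estimates effects federatedly and refines the confounder set iteratively; here `ipw_effect`, its parameters, and the logistic-regression propensity model are all illustrative choices.

```python
import numpy as np

def ipw_effect(t, y, X, n_iter=2000, lr=0.1):
    """Estimate the causal effect of a binary feature t on label y,
    adjusting for candidate confounders X via sample reweighting.
    Hypothetical sketch: propensities P(t=1|X) are fit with a plain
    logistic regression trained by gradient descent, then samples are
    reweighted by inverse propensity so that the confounders are
    (approximately) decorrelated from t before comparing group means."""
    Xb = np.hstack([X, np.ones((len(t), 1))])      # add intercept column
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):                        # logistic regression by GD
        p = 1.0 / (1.0 + np.exp(-Xb @ beta))
        beta -= lr * Xb.T @ (p - t) / len(t)
    p = np.clip(1.0 / (1.0 + np.exp(-Xb @ beta)), 0.01, 0.99)
    w = np.where(t == 1, 1.0 / p, 1.0 / (1.0 - p))  # inverse-propensity weights
    g1, g0 = t == 1, t == 0
    # weighted mean difference of y between the two treatment groups
    return (w[g1] @ y[g1]) / w[g1].sum() - (w[g0] @ y[g0]) / w[g0].sum()
```

On data where a confounder drives both the feature and the label, the naive mean difference is inflated, while the reweighted estimate recovers a value close to the true effect — the same failure mode of correlation-based FFS that the abstract describes.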

Published

2025-04-11

How to Cite

Guo, X., Yu, K., Cui, L., Yu, H., & Li, X. (2025). Federated Causally Invariant Feature Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 39(16), 16978–16986. https://doi.org/10.1609/aaai.v39i16.33866

Section

AAAI Technical Track on Machine Learning II