PC-Flow: Preference Alignment in Flow Matching via Classifier
DOI:
https://doi.org/10.1609/aaai.v40i12.37971
Abstract
Flow Matching (FM) is an efficient generative modeling framework, but aligning it with human preferences remains underexplored. Although applying Direct Preference Optimization (DPO) to diffusion models has yielded improvements, directly extending DPO-like methods to FM poses three challenges: 1) incompatibility with ODE-based models, 2) heavy computational cost from full-model fine-tuning, and 3) reliance on reference-model quality. To address these limitations, we propose the Preference Classifier for Flow Matching (PC-Flow), a novel reference-free preference-alignment framework. Specifically, we reinterpret FM's deterministic ODE as an equivalent SDE to enable DPO-style learning, and we introduce a lightweight classifier that models only relative preferences. This approach decouples alignment from the generative model, eliminating the need for costly fine-tuning or a reference model. Theoretically, PC-Flow guarantees consistent preference-guided distribution evolution, achieves a DPO-equivalent objective without a reference model, and progressively steers generation toward preferred outputs. Experiments show that PC-Flow achieves DPO-level alignment with significantly lower training costs.
Published
2026-03-14
How to Cite
Wang, S., Wang, H., Dai, L., & Tang, J. (2026). PC-Flow: Preference Alignment in Flow Matching via Classifier. Proceedings of the AAAI Conference on Artificial Intelligence, 40(12), 10047–10055. https://doi.org/10.1609/aaai.v40i12.37971
Section
AAAI Technical Track on Computer Vision IX
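The abstract's key mechanism, recasting a flow-matching ODE as a marginal-preserving SDE and steering it with a classifier gradient, can be sketched as follows. This is a minimal illustration of the general ODE-to-SDE conversion (drift augmented by a score term, plus diffusion), not the authors' implementation; the `velocity`, `score`, and `classifier_grad` functions and the noise scale `g` are all hypothetical toy stand-ins.

```python
import numpy as np

def sde_sample(x0, velocity, score, classifier_grad,
               g=0.5, steps=100, scale=1.0, seed=0):
    """Euler-Maruyama sampling of an SDE that shares marginals with the
    probability-flow ODE dx = velocity(x, t) dt, via the standard rewrite
    dx = [v + (g^2 / 2) * score] dt + g dW, with an added classifier-gradient
    term as preference guidance (hypothetical stand-in for PC-Flow's classifier).
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        drift = (velocity(x, t)
                 + 0.5 * g**2 * score(x, t)       # score term from ODE-to-SDE rewrite
                 + scale * classifier_grad(x, t))  # preference guidance
        x = x + drift * dt + g * np.sqrt(dt) * rng.standard_normal(x.shape)
    return x

# Toy stand-ins: linear-interpolant velocity toward a target, the score of a
# unit Gaussian centered at that target, and a constant "preference" pull.
target = np.array([2.0, -1.0])
velocity = lambda x, t: (target - x) / max(1.0 - t, 1e-3)
score = lambda x, t: -(x - target)
classifier_grad = lambda x, t: 0.1 * np.ones_like(x)

x_final = sde_sample(np.zeros(2), velocity, score, classifier_grad)
print(x_final.shape)  # prints (2,)
```

The guidance enters only through the extra drift term, so the generative model (`velocity`) is never retrained; this mirrors the paper's claim that alignment is decoupled from the generative model.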