UCPO: A Universal Constrained Combinatorial Optimization Method via Preference Optimization

Authors

  • Zhanhong Fang, Sun Yat-sen University
  • Debing Wang, Sun Yat-sen University
  • Jinbiao Chen, National University of Singapore
  • Jiahai Wang, Sun Yat-sen University
  • Zizhen Zhang, Sun Yat-sen University

DOI:

https://doi.org/10.1609/aaai.v40i43.41017

Abstract

Neural solvers have demonstrated remarkable success in combinatorial optimization, often surpassing traditional heuristics in speed, solution quality, and generalization. However, their efficacy deteriorates significantly when confronted with complex constraints that cannot be effectively managed through simple masking mechanisms. To address this limitation, we introduce Universal Constrained Preference Optimization (UCPO), a novel plug-and-play framework that seamlessly integrates preference learning into existing neural solvers via a specially designed loss function, without requiring architectural modifications. UCPO embeds constraint satisfaction directly into a preference-based objective, eliminating the need for meticulous hyperparameter tuning. Leveraging a lightweight warm-start fine-tuning protocol, UCPO enables pre-trained models to consistently produce near-optimal, feasible solutions on challenging constraint-laden tasks, achieving exceptional performance with as little as 1% of the original training budget.
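
The abstract does not spell out the loss function, so the following is only a minimal sketch of what a constraint-aware preference objective could look like, not the authors' formulation. It assumes a feasibility-first ranking (feasible beats infeasible, then lower violation, then lower cost) and a Bradley-Terry style pairwise loss on the solver's sequence log-probabilities. The names `prefers` and `preference_loss`, and the representation of a solution as a (cost, violation) pair, are hypothetical.

```python
import math
from itertools import combinations


def prefers(a, b):
    """Return True if solution a is strictly preferred to solution b.

    Each solution is a (cost, violation) pair; violation <= 0 means feasible.
    Assumed ranking (not taken from the paper): feasibility first, then
    smaller violation among infeasible solutions, then smaller cost.
    """
    cost_a, viol_a = a
    cost_b, viol_b = b
    feas_a, feas_b = viol_a <= 0.0, viol_b <= 0.0
    if feas_a != feas_b:
        return feas_a
    if not feas_a:
        return viol_a < viol_b
    return cost_a < cost_b


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def preference_loss(log_probs, costs, violations):
    """Mean pairwise loss -log(sigmoid(logp_winner - logp_loser)).

    log_probs[i] is the policy's log-probability of sampled solution i.
    Pairs with no strict preference are skipped.
    """
    losses = []
    for i, j in combinations(range(len(costs)), 2):
        a = (costs[i], violations[i])
        b = (costs[j], violations[j])
        if prefers(a, b):
            winner, loser = i, j
        elif prefers(b, a):
            winner, loser = j, i
        else:
            continue
        margin = log_probs[winner] - log_probs[loser]
        # -log(sigmoid(margin)) equals softplus(-margin)
        losses.append(softplus(-margin))
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    # Solution 0 is feasible but costlier; solution 1 is cheaper but infeasible.
    print(preference_loss([-1.0, -2.0], [10.0, 5.0], [0.0, 3.0]))
```

In this sketch the constraint enters only through the ranking, so there is no penalty weight to tune. In a real fine-tuning loop the log-probabilities would come from the pre-trained solver and the loss would be minimized with an automatic-differentiation framework; both are left out here.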

Published

2026-03-14

How to Cite

Fang, Z., Wang, D., Chen, J., Wang, J., & Zhang, Z. (2026). UCPO: A Universal Constrained Combinatorial Optimization Method via Preference Optimization. Proceedings of the AAAI Conference on Artificial Intelligence, 40(43), 36900–36908. https://doi.org/10.1609/aaai.v40i43.41017

Section

AAAI Technical Track on Search and Optimization