Beyond Monotonicity: Revisiting Factorization Principles in Multi-Agent Q-Learning

Authors

  • Tianmeng Hu, University of Exeter
  • Yongzheng Cui, Central South University
  • Rui Tang, Central South University
  • Biao Luo, Central South University
  • Ke Li, University of Exeter

DOI

https://doi.org/10.1609/aaai.v40i26.39340

Abstract

Value decomposition is a central approach in multi-agent reinforcement learning (MARL), enabling centralized training with decentralized execution by factorizing the global value function into local values. To ensure individual-global-max (IGM) consistency, existing methods either enforce monotonicity constraints, which limit expressive power, or adopt softer surrogates at the cost of algorithmic complexity. In this work, we present a dynamical systems analysis of non-monotonic value decomposition, modeling learning dynamics as continuous-time gradient flow. We prove that, under approximately greedy exploration, all zero-loss equilibria violating IGM consistency are unstable saddle points, while only IGM-consistent solutions are stable attractors of the learning dynamics. Extensive experiments on both synthetic matrix games and challenging MARL benchmarks demonstrate that unconstrained, non-monotonic factorization reliably recovers IGM-optimal solutions and consistently outperforms monotonic baselines. Additionally, we investigate the influence of temporal-difference targets and exploration strategies, providing actionable insights for the design of future value-based MARL algorithms.
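For context, the individual-global-max (IGM) condition referenced in the abstract is conventionally stated as follows (notation follows the standard value-decomposition literature, e.g. QTRAN, and may differ from this paper's own):

  \arg\max_{\mathbf{u}} Q_{\mathrm{tot}}(\boldsymbol{\tau}, \mathbf{u}) \;=\; \Big( \arg\max_{u_1} Q_1(\tau_1, u_1),\; \ldots,\; \arg\max_{u_n} Q_n(\tau_n, u_n) \Big)

That is, the greedy joint action under the global value Q_tot must coincide with each agent i greedily maximizing its local value Q_i over its own action u_i given its history \tau_i. Monotonic methods such as QMIX guarantee this by enforcing the sufficient condition \partial Q_{\mathrm{tot}} / \partial Q_i \geq 0, which is the restriction on expressive power that the abstract refers to.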

Published

2026-03-14

How to Cite

Hu, T., Cui, Y., Tang, R., Luo, B., & Li, K. (2026). Beyond Monotonicity: Revisiting Factorization Principles in Multi-Agent Q-Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 40(26), 21876–21884. https://doi.org/10.1609/aaai.v40i26.39340

Issue

Vol. 40 No. 26 (2026)

Section

AAAI Technical Track on Machine Learning III