Planning and Learning for Decentralized MDPs With Event Driven Rewards

Authors

  • Tarun Gupta, International Institute of Information Technology, Hyderabad
  • Akshat Kumar, Singapore Management University
  • Praveen Paruchuri, International Institute of Information Technology, Hyderabad

Keywords

Multiagent Planning, Coordination and Collaboration, Probabilistic Planning, Sequential Decision Making

Abstract

Decentralized (PO)MDPs provide a rigorous framework for sequential multiagent decision making under uncertainty. However, their high computational complexity limits their practical impact. To address scalability and real-world impact, we focus on settings where a large number of agents primarily interact through complex joint rewards that depend on their entire histories of states and actions. Such history-based rewards encapsulate the notion of events or tasks: the team reward is given only when the joint task is completed. Algorithmically, we contribute: 1) a nonlinear programming (NLP) formulation for this event-based planning model; 2) a probabilistic inference based approach that scales much better than NLP solvers for a large number of agents; and 3) a policy gradient based multiagent reinforcement learning approach that scales well even for exponential state spaces. Our inference and RL-based advances enable us to solve a large real-world multiagent coverage problem, modeling schedule coordination of agents in a real urban subway network, where other approaches fail to scale.
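
The idea of a history-based, event-driven team reward can be illustrated with a minimal sketch. It is not the paper's formulation: the representation of a history as a list of (state, action) pairs, the names `event_occurred` and `joint_reward`, and the example agents and values are all assumptions made for illustration. The sketch only shows the property stated in the abstract, namely that the team is paid when every agent has completed its part of the joint task and receives nothing otherwise.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a history is the list of (state, action)
# pairs that one agent has experienced so far.
History = List[Tuple[str, str]]


def event_occurred(history: History, primitive_event: Tuple[str, str]) -> bool:
    """Return True if the (state, action) pair appears anywhere in the history."""
    return primitive_event in history


def joint_reward(
    histories: Dict[str, History],
    joint_event: Dict[str, Tuple[str, str]],
    reward: float,
) -> float:
    """Pay `reward` only if every agent named in the joint event has
    completed its own part somewhere in its history; otherwise pay 0."""
    done = all(
        event_occurred(histories.get(agent, []), part)
        for agent, part in joint_event.items()
    )
    return reward if done else 0.0


# Illustrative data (assumed, not taken from the paper).
histories = {
    "a1": [("s0", "move"), ("s1", "cover")],
    "a2": [("t0", "wait")],
}
joint_event = {"a1": ("s1", "cover"), "a2": ("t1", "cover")}

before = joint_reward(histories, joint_event, reward=5.0)  # a2 has not finished
histories["a2"].append(("t1", "cover"))
after = joint_reward(histories, joint_event, reward=5.0)   # both parts are done
```

Because the reward depends on whole histories rather than on the current joint state alone, no agent can evaluate it from its latest observation, which is one reason such models are hard to plan for at scale.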

Published

2018-04-26

How to Cite

Gupta, T., Kumar, A., & Paruchuri, P. (2018). Planning and Learning for Decentralized MDPs With Event Driven Rewards. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1). Retrieved from https://ojs.aaai.org/index.php/AAAI/article/view/12096