Deep Reinforcement Learning for Scalable Offline Three-Dimensional Packing
DOI:
https://doi.org/10.1609/aaai.v40i33.40009

Abstract
With the increasing number of items that must be handled simultaneously in complex logistics, offline three-dimensional packing methods need to plan ever larger numbers of items. Existing deep reinforcement learning (DRL)-based packing methods cannot plan large numbers of items while maintaining high-quality solutions, owing to their limited exploration space and high computational complexity. To address this issue, this paper proposes a scalable DRL-based packing method. An attention-based pack-Q-network (PQNet) is constructed to learn the optimal packing policy by integrating unpacked items, available spaces, and packed items. To expand the valid exploration space, a bidding-based multi-policy (BBMP) framework composed of multiple PQNets is designed to efficiently explore more latent valid solutions, thus enhancing solution quality. To reduce computational complexity, a training-free dynamic candidate selection (DCS) framework is proposed to incorporate comprehensive item information during execution with minimal computational overhead, which helps in effectively planning large numbers of items. Experimental results show that across item counts of 20–1000, our method consistently outperforms the best-performing baseline at each tested scale by 3.2%–13.1% in space utilization.

Published
2026-03-14
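The bidding mechanism behind the BBMP framework described in the abstract can be sketched as follows. This is only an illustrative assumption of how "bidding" among multiple policies might work, not the authors' implementation: `pqnet_scores` is a hypothetical linear stand-in for a trained PQNet, and each policy's bid is the best Q-value it can offer for the current state.

```python
import numpy as np

rng = np.random.default_rng(0)

def pqnet_scores(weights, state):
    # Hypothetical stand-in for a trained PQNet: a linear scorer mapping
    # each candidate placement's state features to a Q-value.
    return state @ weights

def bbmp_select(policies, state):
    """Bidding-based multi-policy selection (sketch): every policy scores
    all candidate placements, bids its highest Q-value, and the action of
    the highest bidder is executed."""
    best_policy, best_action, best_bid = None, None, -np.inf
    for k, weights in enumerate(policies):
        q = pqnet_scores(weights, state)
        action = int(np.argmax(q))          # this policy's preferred placement
        if q[action] > best_bid:            # keep the highest bid seen so far
            best_policy, best_action, best_bid = k, action, float(q[action])
    return best_policy, best_action, best_bid

# Toy setup: 6 candidate placements with 4 state features each, 3 policies.
state = rng.normal(size=(6, 4))
policies = [rng.normal(size=4) for _ in range(3)]
k, a, bid = bbmp_select(policies, state)
```

In this sketch the winning bid is, by construction, the largest Q-value any policy assigns to any candidate placement, so the ensemble never does worse (by its own Q-estimates) than its best member.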
How to Cite
Yin, H., He, H., & Chen, F. (2026). Deep Reinforcement Learning for Scalable Offline Three-Dimensional Packing. Proceedings of the AAAI Conference on Artificial Intelligence, 40(33), 27861–27869. https://doi.org/10.1609/aaai.v40i33.40009
Issue
Section
AAAI Technical Track on Machine Learning X