MetaTrader: Learning to Generalize RL Trading Policies Beyond Offline Data

Haochen Yuan; Minting Pan; Yunbo Wang; Siyu Gao; Xiaokang Yang

doi:10.1609/aaai.v40i33.40027

Authors

Haochen Yuan, Shanghai Jiao Tong University
Minting Pan, Shanghai Jiao Tong University
Yunbo Wang, Shanghai Jiao Tong University
Siyu Gao, China International Capital Corporation Limited
Xiaokang Yang, Shanghai Jiao Tong University

Abstract

Reinforcement learning (RL) has shown significant promise in sequential portfolio optimization. A typical solution involves optimizing cumulative returns using historical offline data. However, it may produce less generalizable policies that merely "memorize" optimal buying and selling actions from the offline data while neglecting the non-stationary nature of the financial market. We frame portfolio optimization of stock data as a specific type of offline RL problem. Our method, MetaTrader, presents two key contributions. First, it introduces a novel bilevel RL algorithm that operates on both the original stock data and its transformations. The core idea is that a robust policy should generalize effectively to out-of-distribution data. Second, we propose a new temporal difference (TD) method that leverages a transformation-based conservative TD target to address value overestimation under limited offline data. Empirical results on two publicly available datasets demonstrate that MetaTrader outperforms existing methods, including both traditional stock prediction models and RL-based trading approaches.
