Efficient Heuristic Search for Optimal Environment Redesign
Given an environment, the utility measure of the agents acting within it, a set of possible environment modifications, and a description of design constraints, the objective of equireward utility maximizing design (ER-UMD) is to find a valid sequence of modifications to apply to the environment in order to maximize agent utility. To efficiently traverse the typically large space of possible design options, we use heuristic search and propose new heuristics, which relax the design process; instead of computing the value achieved by a single modification, we use a dominating modification guaranteed to be at least as beneficial. The proposed technique enables heuristic caching for similar nodes thereby saving computational overhead. We specify sufficient conditions under which our approach is guaranteed to produce admissible estimates, and describe a range of models that comply with these requirements. Also, for models with lifted representations of environment modifications, we provide simple methods to automatically generate dominating modifications. We evaluate our approach on a range of stochastic settings for which our heuristic is admissible. We demonstrate its efficiency by comparing it to a previously suggested heuristic, that employs a relaxation of the environment, and to a compilation from ERUMD to planning.