Tree Search-Based Evolutionary Bandits for Protein Sequence Optimization

Jiahao Qiu; Hui Yuan; Jinghong Zhang; Wentao Chen; Huazheng Wang; Mengdi Wang

doi:10.1609/aaai.v38i13.29386

Authors

Jiahao Qiu, Princeton University
Hui Yuan, Princeton University
Jinghong Zhang, University of California San Diego
Wentao Chen, MLAB Biosciences Inc
Huazheng Wang, Oregon State University
Mengdi Wang, Princeton University

DOI:

https://doi.org/10.1609/aaai.v38i13.29386

Keywords:

ML: Online Learning & Bandits, APP: Natural Sciences

Abstract

While modern biotechnologies allow new proteins to be synthesized and their functions measured at scale, efficiently exploring and engineering a protein remains a daunting task because the sequence space of any given protein is vast. Protein engineering is typically conducted through an iterative process of adding mutations to the wild-type or lead sequences, recombining mutations, and running new rounds of screening. To enhance the efficiency of this process, we propose a tree search-based bandit learning method, which expands a tree starting from the initial sequence with the guidance of a bandit machine learning model. Under simplified assumptions and a Gaussian Process prior, we provide theoretical analysis and a Bayesian regret bound, demonstrating that the method can efficiently discover a near-optimal design. The full algorithm is compatible with a suite of randomized tree search heuristics, machine learning models, pre-trained embeddings, and bandit techniques. We test various instances of the algorithm across benchmark protein datasets using simulated screens. Experimental results demonstrate that the algorithm is sample-efficient, promotes diversity, and finds top designs using reasonably small mutation counts.
