Designing Biological Sequences without Prior Knowledge Using Evolutionary Reinforcement Learning

Authors

  • Xi Zeng School of Computer Science, Northwestern Polytechnical University
  • Xiaotian Hao College of Intelligence and Computing, Tianjin University
  • Hongyao Tang College of Intelligence and Computing, Tianjin University
  • Zhentao Tang Noah’s Ark Lab, Huawei
  • Shaoqing Jiao School of Computer Science, Northwestern Polytechnical University
  • Dazhi Lu School of Computer Science, Northwestern Polytechnical University
  • Jiajie Peng School of Computer Science, Northwestern Polytechnical University; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology; School of Computer Science, Research and Development Institute of Northwestern Polytechnical University in Shenzhen

DOI:

https://doi.org/10.1609/aaai.v38i1.27792

Keywords:

APP: Natural Sciences

Abstract

Designing novel biological sequences with desired properties is a significant challenge in biological science because of the extremely large search space. The traditional design process usually involves multiple rounds of costly wet lab evaluations. To reduce the need for expensive wet lab experiments, machine learning methods are used to aid in designing biological sequences. However, the limited availability of biological sequences with known properties hinders the training of machine learning models, significantly restricting their applicability and performance. To fill this gap, we present ERLBioSeq, an Evolutionary Reinforcement Learning algorithm for BIOlogical SEQuence design. ERLBioSeq leverages the capability of reinforcement learning to learn without prior knowledge and the potential of evolutionary algorithms to enhance the exploration of reinforcement learning in the large search space of biological sequences. Additionally, to improve the efficiency of biological sequence design, we developed a predictor for sequence screening in the design process, which incorporates both local and global sequence information. We evaluated the proposed method on three main types of biological sequence design tasks: the design of DNA, RNA, and protein sequences. The results demonstrate that the proposed method achieves significant improvements over existing state-of-the-art methods.

Published

2024-03-25

How to Cite

Zeng, X., Hao, X., Tang, H., Tang, Z., Jiao, S., Lu, D., & Peng, J. (2024). Designing Biological Sequences without Prior Knowledge Using Evolutionary Reinforcement Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 38(1), 383-391. https://doi.org/10.1609/aaai.v38i1.27792

Issue

Section

AAAI Technical Track on Application Domains