Controllable Protein Sequence Generation with LLM Preference Optimization

Authors

  • Xiangyu Liu, State Key Laboratory for Novel Software Technology, Nanjing University, China
  • Yi Liu, State Key Laboratory for Novel Software Technology, Nanjing University, China
  • Silei Chen, Medical School, Nanjing University, China; National Institute of Healthcare Data Science, Nanjing University, China
  • Wei Hu, State Key Laboratory for Novel Software Technology, Nanjing University, China; National Institute of Healthcare Data Science, Nanjing University, China

DOI:

https://doi.org/10.1609/aaai.v39i1.32030

Abstract

Designing proteins with specific attributes offers an important approach to addressing biomedical challenges. Pre-trained protein large language models (LLMs) have shown promising results on protein sequence generation. However, when generation is controlled for specific attributes, sequences produced by existing methods still exhibit poor functionality and structural stability. In this paper, we propose a novel controllable protein design method called CtrlProt. We fine-tune a protein LLM with a new multi-listwise preference optimization strategy to improve generation quality and support multi-attribute controllable generation. Experiments demonstrate that CtrlProt meets functionality and structural stability requirements effectively, achieving state-of-the-art performance in both single-attribute and multi-attribute protein sequence generation.

Published

2025-04-11

How to Cite

Liu, X., Liu, Y., Chen, S., & Hu, W. (2025). Controllable Protein Sequence Generation with LLM Preference Optimization. Proceedings of the AAAI Conference on Artificial Intelligence, 39(1), 505-513. https://doi.org/10.1609/aaai.v39i1.32030

Section

AAAI Technical Track on Application Domains