DS SERVE: A Framework for Efficient and Scalable Neural Retrieval

Jinjian Liu; Yichuan Wang; Xinxi Lyu; Rulin Shao; Joseph E. Gonzalez; Matei Zaharia; Sewon Min

doi:10.1609/aaai.v40i48.42363

Authors

Jinjian Liu, University of California, Berkeley
Yichuan Wang, University of California, Berkeley
Xinxi Lyu, University of Illinois at Urbana-Champaign
Rulin Shao, University of Washington
Joseph E. Gonzalez, University of California, Berkeley
Matei Zaharia, University of California, Berkeley
Sewon Min, University of California, Berkeley

DOI:

https://doi.org/10.1609/aaai.v40i48.42363

Abstract

We present DS SERVE, a framework that transforms large-scale text datasets, comprising half a trillion tokens, into a high-performance neural retrieval system. DS SERVE offers both a web interface and API endpoints, and achieves low latency with modest memory overhead on a single node. The framework also supports inference-time tradeoffs between latency, accuracy, and result diversity. We anticipate that DS SERVE will be broadly useful for applications such as large-scale retrieval-augmented generation (RAG), training data attribution, and training search agents.
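To make the accuracy versus diversity tradeoff mentioned in the abstract concrete, the sketch below reranks retrieved candidates with maximal marginal relevance (MMR), a standard diversity-aware reranking technique. The abstract does not say which method DS SERVE uses, so MMR is an assumption here, and the function names (`cosine`, `mmr_rerank`), the `lam` parameter, and the toy vectors are all hypothetical illustrations rather than part of the DS SERVE API.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_rerank(query_vec, doc_vecs, k, lam=0.5):
    """Return the indices of up to k documents chosen greedily by MMR.

    lam is a hypothetical knob: 1.0 ranks by relevance to the query only,
    and smaller values penalize similarity to already-selected documents.
    """
    relevance = [cosine(query_vec, d) for d in doc_vecs]
    selected = []
    remaining = list(range(len(doc_vecs)))
    while remaining and len(selected) < k:

        def score(i):
            # Redundancy is the highest similarity to anything already chosen.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance[i] - (1.0 - lam) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy example: documents 0 and 1 are near-duplicates, document 2 differs.
query = [1.0, 0.0]
docs = [[1.0, 0.0], [0.99, 0.1], [0.6, 0.8]]
relevance_only = mmr_rerank(query, docs, k=2, lam=1.0)
more_diverse = mmr_rerank(query, docs, k=2, lam=0.3)
```

In this toy example, `lam=1.0` returns the two near-duplicate documents, while `lam=0.3` keeps the top hit and replaces the second with the dissimilar document. A latency tradeoff would be handled separately, for instance by how many candidates the underlying index is asked to return before reranking.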

Published

2026-03-14

How to Cite

Liu, J., Wang, Y., Lyu, X., Shao, R., Gonzalez, J. E., Zaharia, M., & Min, S. (2026). DS SERVE: A Framework for Efficient and Scalable Neural Retrieval. Proceedings of the AAAI Conference on Artificial Intelligence, 40(48), 41631–41633. https://doi.org/10.1609/aaai.v40i48.42363

Issue

Vol. 40 No. 48: EAAI-26 AI for Education, Model AI Assignments, AAAI-26 Emerging Trends, Doctoral Consortium, Student Abstracts, Undergraduate Consortium and Demonstrations

Section

AAAI Demonstration Track
