DS SERVE: A Framework for Efficient and Scalable Neural Retrieval
DOI:
https://doi.org/10.1609/aaai.v40i48.42363
Abstract
We present DS SERVE, a framework that transforms large-scale text datasets, comprising half a trillion tokens, into a high-performance neural retrieval system. DS SERVE offers both a web interface and API endpoints, achieving low latency with modest memory overhead on a single node. The framework also supports inference-time tradeoffs between latency, accuracy, and result diversity. We anticipate that DS SERVE will be broadly useful for applications including large-scale retrieval-augmented generation (RAG), training data attribution, and search-agent training.
Published
2026-03-14
How to Cite
Liu, J., Wang, Y., Lyu, X., Shao, R., Gonzalez, J. E., Zaharia, M., & Min, S. (2026). DS SERVE: A Framework for Efficient and Scalable Neural Retrieval. Proceedings of the AAAI Conference on Artificial Intelligence, 40(48), 41631–41633. https://doi.org/10.1609/aaai.v40i48.42363