Siamese BERT-Based Model for Web Search Relevance Ranking Evaluated on a New Czech Dataset

Authors

  • Matěj Kocián, Seznam.cz
  • Jakub Náplava, Seznam.cz
  • Daniel Štancl, Seznam.cz
  • Vladimír Kadlec, Seznam.cz

DOI:

https://doi.org/10.1609/aaai.v36i11.21502

Keywords:

Information Retrieval, Relevance Ranking, Web Search Engine, Dataset, Siamese Transformers

Abstract

Web search engines focus on serving highly relevant results within hundreds of milliseconds. Pre-trained language transformer models such as BERT are therefore hard to use in this scenario due to their high computational demands. We present our real-time approach to the document ranking problem leveraging a BERT-based siamese architecture. The model is already deployed in a commercial search engine, where it improves production performance by more than 3%. For further research and evaluation, we release DaReCzech, a unique dataset of 1.6 million Czech user query-document pairs with manually assigned relevance levels. We also release Small-E-Czech, an Electra-small language model pre-trained on a large Czech corpus. We believe these resources will support the endeavours of both the search-relevance and the multilingual-focused research communities.
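The key property of a siamese architecture for real-time ranking is that the query and the document are encoded independently by a shared encoder, so document embeddings can be precomputed offline and only the short query needs a forward pass at search time. The following minimal sketch (not the authors' production code) illustrates this idea with mean-pooled embeddings and cosine similarity as a stand-in relevance score; the checkpoint name and pooling choice are assumptions for illustration.

```python
# Minimal sketch of siamese relevance scoring: query and documents are
# embedded separately by one shared encoder, so document vectors can be
# precomputed offline and only the query is encoded at search time.
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed checkpoint name for the released Small-E-Czech model;
# any BERT-like encoder works for this sketch.
MODEL_NAME = "Seznam/small-e-czech"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
encoder = AutoModel.from_pretrained(MODEL_NAME)
encoder.eval()

def embed(texts):
    """Mean-pool token embeddings into one vector per input text."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state   # (batch, seq, dim)
    mask = batch["attention_mask"].unsqueeze(-1)      # ignore padding tokens
    return (hidden * mask).sum(1) / mask.sum(1)

query_vec = embed(["jak uvařit svíčkovou"])           # encoded at query time
doc_vecs = embed(["Recept na svíčkovou na smetaně",   # precomputable offline
                  "Fotbalové výsledky první ligy"])

# Cosine similarity serves here as the relevance score used for ranking.
scores = torch.nn.functional.cosine_similarity(query_vec, doc_vecs)
print(scores)  # higher score = more relevant document
```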

Published

2022-06-28

How to Cite

Kocián, M., Náplava, J., Štancl, D., & Kadlec, V. (2022). Siamese BERT-Based Model for Web Search Relevance Ranking Evaluated on a New Czech Dataset. Proceedings of the AAAI Conference on Artificial Intelligence, 36(11), 12369-12377. https://doi.org/10.1609/aaai.v36i11.21502