DetIE: Multilingual Open Information Extraction Inspired by Object Detection

Authors

  • Michael Vasilkovsky: Skolkovo Institute of Science and Technology, Moscow, Russia; Neuromation OU, Tallinn, Estonia
  • Anton Alekseev: St. Petersburg Department of Steklov Mathematical Institute of Russian Academy of Sciences, St. Petersburg, Russia; St. Petersburg State University, St. Petersburg, Russia
  • Valentin Malykh: Huawei Noah's Ark Lab, Moscow, Russia; St. Petersburg Department of Steklov Mathematical Institute of Russian Academy of Sciences, St. Petersburg, Russia; Kazan Federal University, Kazan, Russia; ISP RAS Research Center for Trusted Artificial Intelligence, Moscow, Russia
  • Ilya Shenbin: St. Petersburg Department of Steklov Mathematical Institute of Russian Academy of Sciences, St. Petersburg, Russia
  • Elena Tutubalina: HSE University, Moscow, Russia; Kazan Federal University, Kazan, Russia; Sber AI, Moscow, Russia
  • Dmitriy Salikhov: Sber AI, Moscow, Russia
  • Mikhail Stepnov: Sber AI, Moscow, Russia
  • Andrey Chertok: Sber AI, Moscow, Russia; Artificial Intelligence Research Institute, Moscow, Russia
  • Sergey Nikolenko: St. Petersburg Department of Steklov Mathematical Institute of Russian Academy of Sciences, St. Petersburg, Russia; ISP RAS Research Center for Trusted Artificial Intelligence, Moscow, Russia; Neuromation OU, Tallinn, Estonia

DOI:

https://doi.org/10.1609/aaai.v36i10.21393

Keywords:

Speech & Natural Language Processing (SNLP), Machine Learning (ML)

Abstract

State-of-the-art neural methods for open information extraction (OpenIE) usually extract triplets (or tuples) iteratively in an autoregressive or predicate-based manner in order not to produce duplicates. In this work, we propose a different approach to the problem that can be equally or more successful. Namely, we present a novel single-pass method for OpenIE inspired by object detection algorithms from computer vision. We use an order-agnostic loss based on bipartite matching that forces unique predictions, together with a Transformer-based encoder-only architecture for sequence labeling. The proposed approach is faster and performs on par with or better than state-of-the-art models on standard benchmarks in terms of both quality metrics and inference time. Our model sets a new state of the art of 67.7% F1 on CaRB evaluated as OIE2016 while being 3.35x faster at inference than the previous state of the art. We also evaluate the multilingual version of our model in the zero-shot setting for two languages and introduce a strategy for generating synthetic multilingual data to fine-tune the model for each specific language. In this setting, we show a performance improvement of 15% on multilingual Re-OIE2016, reaching 75% F1 for both Portuguese and Spanish. Code and models are available at https://github.com/sberbank-ai/DetIE.
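The order-agnostic loss the abstract mentions can be illustrated with a minimal sketch: each of N predicted triplet "slots" is matched to one of M gold triplets so that the total matching cost is minimal, and the loss is computed only over the matched pairs, making it invariant to the order in which triplets are predicted. The function name and the brute-force matcher below are illustrative assumptions, not the paper's implementation (which would use an efficient assignment algorithm such as the Hungarian method):

```python
from itertools import permutations

def order_agnostic_loss(cost):
    """Bipartite-matching loss sketch.

    cost[i][j] is the loss of assigning predicted slot i to gold triplet j
    (e.g. a negative log-likelihood of the gold tag sequence under slot i).
    Brute force over assignments -- fine for a handful of slots; a real
    implementation would use the Hungarian algorithm.
    """
    m = len(cost[0])  # number of gold triplets, m <= number of slots
    best = min(
        sum(cost[slot][gold] for gold, slot in enumerate(perm))
        for perm in permutations(range(len(cost)), m)
    )
    return best / m  # average loss over matched pairs

# Toy example: 3 predicted slots, 2 gold triplets.
cost = [[0.1, 0.9],
        [0.8, 0.2],
        [0.5, 0.5]]
loss = order_agnostic_loss(cost)  # matches slot 0 -> gold 0, slot 1 -> gold 1
```

Because the matching is recomputed for every training example, the model is free to emit triplets in any order without being penalized, which is what removes the need for autoregressive, one-at-a-time extraction.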

Published

2022-06-28

How to Cite

Vasilkovsky, M., Alekseev, A., Malykh, V., Shenbin, I., Tutubalina, E., Salikhov, D., Stepnov, M., Chertok, A., & Nikolenko, S. (2022). DetIE: Multilingual Open Information Extraction Inspired by Object Detection. Proceedings of the AAAI Conference on Artificial Intelligence, 36(10), 11412-11420. https://doi.org/10.1609/aaai.v36i10.21393

Section

AAAI Technical Track on Speech and Natural Language Processing