280 Birds With One Stone: Inducing Multilingual Taxonomies From Wikipedia Using Character-Level Classification

Authors

  • Amit Gupta Ecole Polytechnique Fédérale de Lausanne
  • Rémi Lebret Ecole Polytechnique Fédérale de Lausanne
  • Hamza Harkous Ecole Polytechnique Fédérale de Lausanne
  • Karl Aberer Ecole Polytechnique Fédérale de Lausanne

Keywords:

taxonomy induction, multilinguality, Wikipedia

Abstract

We propose a novel, fully automated approach to inducing multilingual taxonomies from Wikipedia. Given an English taxonomy, our approach first leverages Wikipedia's interlanguage links to automatically construct training datasets for the is-a relation in the target language. Character-level classifiers are trained on these datasets and then used in an optimal path discovery framework to induce high-precision, high-coverage taxonomies in other languages. Through experiments, we demonstrate that our approach significantly outperforms the state-of-the-art, heuristics-heavy approaches for six languages. As a consequence of our work, we release presumably the largest and most accurate multilingual taxonomic resource, spanning more than 280 languages.
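The abstract names two mechanical steps: transferring English is-a edges into a target language through interlanguage links, and choosing taxonomy paths with an optimal path discovery framework. The following is a minimal, illustrative sketch of those two ideas, not the authors' implementation. It assumes the English taxonomy is given as a set of (child, parent) title pairs and the interlanguage links as a plain English-to-target title dictionary; it only projects positive edges (no negative-example construction). For path discovery it assumes edge weights of -log p(child is-a parent) and runs Dijkstra's algorithm, so the cheapest path is the one with the highest product of edge probabilities. The trained character-level classifier is replaced by a hypothetical `prob` callable, and the function names `project_edges` and `best_path_to_root` are hypothetical.

```python
import heapq
import math
from typing import Callable, Dict, List, Optional, Set, Tuple

# Hypothetical representation: an is-a edge is a (child, parent) title pair.
Edge = Tuple[str, str]


def project_edges(en_edges: Set[Edge], ill: Dict[str, str]) -> Set[Edge]:
    """Transfer English is-a edges into the target language.

    An edge survives only if both endpoints have an interlanguage link;
    the surviving pairs can serve as positive training examples.
    """
    return {(ill[c], ill[p]) for c, p in en_edges if c in ill and p in ill}


def best_path_to_root(
    start: str,
    root: str,
    candidates: Dict[str, List[str]],
    prob: Callable[[str, str], float],
) -> Optional[List[str]]:
    """Dijkstra over weights -log p(child is-a parent).

    Minimizing the summed -log p maximizes the product of edge
    probabilities. `prob` stands in for a trained classifier (assumption)
    and must return values in [0, 1] so every weight is non-negative.
    """
    dist: Dict[str, float] = {start: 0.0}
    prev: Dict[str, str] = {}
    heap: List[Tuple[float, str]] = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == root:
            # Walk the predecessor links back to the start node.
            path = [node]
            while path[-1] != start:
                path.append(prev[path[-1]])
            return path[::-1]
        if d > dist.get(node, math.inf):
            continue  # stale heap entry; a cheaper route was already found
        for parent in candidates.get(node, []):
            p = prob(node, parent)
            if p <= 0.0:
                continue  # zero-probability edges cannot be used
            nd = d - math.log(p)
            if nd < dist.get(parent, math.inf):
                dist[parent] = nd
                prev[parent] = node
                heapq.heappush(heap, (nd, parent))
    return None  # root unreachable from start


if __name__ == "__main__":
    en = {("Cat", "Felidae"), ("Felidae", "Animal"), ("Dog", "Canidae")}
    links = {"Cat": "Hauskatze", "Felidae": "Katzen", "Animal": "Tiere"}
    print(project_edges(en, links))  # the Dog edge has no links and is dropped

    # Toy scores standing in for classifier output (hypothetical values).
    scores = {
        ("Hauskatze", "Katzen"): 0.9,
        ("Katzen", "Tiere"): 0.8,
        ("Hauskatze", "Haustier"): 0.6,
        ("Haustier", "Tiere"): 0.95,
    }
    cands = {"Hauskatze": ["Katzen", "Haustier"], "Katzen": ["Tiere"], "Haustier": ["Tiere"]}
    print(best_path_to_root("Hauskatze", "Tiere", cands, lambda c, p: scores.get((c, p), 0.0)))
```

In the toy run, the route through "Katzen" (0.9 × 0.8 = 0.72) beats the route through "Haustier" (0.6 × 0.95 = 0.57), so the sketch selects Hauskatze → Katzen → Tiere. Using -log probabilities keeps all weights non-negative, which is what makes Dijkstra's algorithm applicable here.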


Published

2018-04-26

How to Cite

Gupta, A., Lebret, R., Harkous, H., & Aberer, K. (2018). 280 Birds With One Stone: Inducing Multilingual Taxonomies From Wikipedia Using Character-Level Classification. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1). Retrieved from https://ojs.aaai.org/index.php/AAAI/article/view/11921


Section

Main Track: NLP and Knowledge Representation