Leveraging Large Language Models for Automated Definition Extraction with TaxoMatic - a Case Study on Media Bias

Timo Spinde; Luyang Lin; Smi Hinterreiter; Isao Echizen

doi:10.1609/icwsm.v19i1.35968

Authors

Timo Spinde, National Institute of Informatics
Luyang Lin, The Chinese University of Hong Kong
Smi Hinterreiter, University of Würzburg
Isao Echizen, National Institute of Informatics

DOI:

https://doi.org/10.1609/icwsm.v19i1.35968

Abstract

Defining complex, evolving concepts in academic research and extracting clear taxonomies from a large body of publications is challenging. To streamline systematic reviews and capture shifts in conceptual understanding, we present our ongoing work on TaxoMatic, a framework that uses Large Language Models (LLMs) to automate definition extraction from academic literature. The framework comprises data collection, relevance classification to identify papers that contain definitions, and LLM-based definition extraction. As a first case study, we tested the relevance classification component on 2,398 articles on media bias, a domain particularly rich in varying definitions and sub-concepts. We then evaluated the definition extraction component on manually reviewed papers, yielding 123 definitions from 113 relevant articles. Among five tested LLMs, Claude-3-sonnet achieved the highest F1 score (0.381) for relevance classification and a median cosine similarity of 0.557 for definition extraction with role prompting. Future directions include improving relevance classification, expanding the ground truth datasets, and applying the framework to other domains, potentially enhancing conceptual clarity across disciplines.
