Leveraging Large Language Models for Automated Definition Extraction with TaxoMatic - a Case Study on Media Bias
DOI:
https://doi.org/10.1609/icwsm.v19i1.35968
Abstract
Defining complex, evolving concepts in academic research and extracting clear taxonomies from many publications is challenging. To streamline systematic reviews and capture shifts in conceptual understanding, we present our ongoing work on TaxoMatic - a framework leveraging Large Language Models (LLMs) to automate definition extraction from academic literature. The framework encompasses data collection, relevance classification to identify papers with definitions, and definition extraction using LLMs. As a first case study, we tested our relevance classification component on 2,398 articles on media bias, a domain particularly rich in varying definitions and sub-concepts. Then, we evaluated our definition extraction component on manually reviewed papers, yielding 123 definitions from 113 relevant articles. Among five tested LLMs, Claude-3-sonnet achieved the highest F1 score (0.381) for relevance classification and demonstrated a median cosine similarity of 0.557 for definition extraction with role prompting. Future directions include improving relevance classification, expanding ground truth datasets, and applying this framework to other domains, potentially enhancing conceptual clarity across disciplines.
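The two evaluation measures named in the abstract, the F1 score for relevance classification and the median cosine similarity for definition extraction, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, and the cosine similarity here is computed over simple word-count vectors as a stand-in, since the abstract does not state which text representation was used.

```python
import math
from collections import Counter
from statistics import median


def f1_score(gold, predicted):
    """F1 for binary relevance labels (True = paper contains a definition)."""
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cosine_similarity(text_a, text_b):
    """Cosine similarity of word-count vectors (assumed representation)."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def median_similarity(pairs):
    """Median similarity over (ground-truth, extracted) definition pairs."""
    return median(cosine_similarity(gold, extracted) for gold, extracted in pairs)
```

Calling `f1_score` on the gold and predicted relevance labels, and `median_similarity` on pairs of manually reviewed and LLM-extracted definitions, yields figures of the same kind as those reported above. With a different text representation, such as sentence embeddings, the similarity values would differ.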
Published
2025-06-07
How to Cite
Spinde, T., Lin, L., Hinterreiter, S., & Echizen, I. (2025). Leveraging Large Language Models for Automated Definition Extraction with TaxoMatic - a Case Study on Media Bias. Proceedings of the International AAAI Conference on Web and Social Media, 19(1), 2660–2667. https://doi.org/10.1609/icwsm.v19i1.35968
Section
Poster Papers