Cross Media Entity Extraction and Linkage for Chemical Documents

Authors

  • Su Yan, IBM Almaden Research Lab
  • Scott Spangler, IBM Almaden Research Lab
  • Ying Chen, IBM Almaden Research Lab

DOI

https://doi.org/10.1609/aaai.v25i1.7832

Abstract

Text and images are two major sources of information in scientific literature. Information from these two media typically reinforces and complements each other, which makes it easier for humans to extract and comprehend. Machines, however, can neither create links between images and text nor understand the semantic relationships between them. We propose to integrate text analysis and image processing techniques to bridge the gap between the two media and to discover knowledge from the combined information sources that would otherwise be lost by traditional single-media mining systems. We focus on the chemical entity extraction task because images are well known to add value to the textual content of chemical literature. Annotation of US chemical patent documents demonstrates the effectiveness of our proposal.

Published

2011-08-04

How to Cite

Yan, S., Spangler, S., & Chen, Y. (2011). Cross Media Entity Extraction and Linkage for Chemical Documents. Proceedings of the AAAI Conference on Artificial Intelligence, 25(1), 1455–1460. https://doi.org/10.1609/aaai.v25i1.7832

Section

Special Track on Integrated Intelligence