Document Type Classification in Online Digital Libraries

Authors

  • Cornelia Caragea, University of North Texas
  • Jian Wu, Pennsylvania State University
  • Sujatha Das Gollapalli, Institute for Infocomm Research
  • C. Lee Giles, Pennsylvania State University

DOI:

https://doi.org/10.1609/aaai.v30i2.19075

Abstract

Online digital libraries make it easier for researchers to search for scientific information. They have proven to be powerful resources in many data mining, machine learning, and information retrieval applications that require high-quality data. The quality of the data depends heavily on the accuracy of the classifiers that identify the types of documents crawled from the Web (e.g., research papers, slides, or books) for appropriate indexing. These classifiers, in turn, depend on the choice of feature representation. We propose novel features that result in high-accuracy classifiers for document type classification. Experimental results on several datasets show that our classifiers outperform models that are employed in current systems.

Published

2016-02-18

How to Cite

Caragea, C., Wu, J., Gollapalli, S., & Giles, C. (2016). Document Type Classification in Online Digital Libraries. Proceedings of the AAAI Conference on Artificial Intelligence, 30(2), 3997-4002. https://doi.org/10.1609/aaai.v30i2.19075