Linking Educational Resources on Data Science


  • José Luis Ambite USC Information Sciences Institute
  • Jonathan Gordon Vassar College
  • Lily Fierro USC Information Sciences Institute
  • Gully Burns USC Information Sciences Institute
  • Joel Mathew USC Information Sciences Institute



The availability of massive datasets in genetics, neuroimaging, mobile health, and other subfields of biology and medicine promises new insights but also poses significant challenges. To realize the potential of big data in biomedicine, the National Institutes of Health launched the Big Data to Knowledge (BD2K) initiative, funding several centers of excellence in biomedical data analysis and a Training Coordinating Center (TCC) tasked with facilitating online and in-person training of biomedical researchers in data science. A major initiative of the BD2K TCC is to automatically identify, describe, and organize data science training resources available on the Web and provide personalized training paths for users. In this paper, we describe the construction of ERuDIte, the Educational Resource Discovery Index for Data Science, and its release as linked data. ERuDIte contains over 11,000 training resources including courses, video tutorials, conference talks, and other materials. The metadata for these resources is described uniformly using Schema.org. We use machine learning techniques to tag each resource with concepts from the Data Science Education Ontology, which we developed to further describe resource content. Finally, we map references to people and organizations in learning resources to entities in DBpedia, DBLP, and ORCID, embedding our collection in the web of linked data. We hope that ERuDIte will provide a framework to foster open linked educational resources on the Web.




How to Cite

Ambite, J. L., Gordon, J., Fierro, L., Burns, G., & Mathew, J. (2019). Linking Educational Resources on Data Science. Proceedings of the AAAI Conference on Artificial Intelligence, 33(01), 9404-9409.



IAAI Technical Track: Emerging Papers