Towards Continuous Scientific Data Analysis and Hypothesis Evolution

Authors

  • Yolanda Gil Information Sciences Institute (University of Southern California)
  • Daniel Garijo Information Sciences Institute (University of Southern California)
  • Varun Ratnakar Information Sciences Institute (University of Southern California)
  • Rajiv Mayani Information Sciences Institute (University of Southern California)
  • Ravali Adusumilli Stanford University School of Medicine
  • Hunter Boyce Stanford University School of Medicine
  • Arunima Srivastava Stanford University School of Medicine
  • Parag Mallick Stanford University School of Medicine

DOI:

https://doi.org/10.1609/aaai.v31i1.11157

Keywords:

automated discovery, hypothesis testing, scientific workflows, hypothesis evolution, provenance

Abstract

Scientific data is continuously generated throughout the world. However, analyses of these data are typically performed exactly once and on a small fragment of recently generated data. Ideally, data analysis would be a continuous process that uses all the data available at the time and is automatically re-run and updated when new data appears. We present a framework for automated discovery from data repositories that tests user-provided hypotheses using expert-grade data analysis strategies and reassesses those hypotheses when more data becomes available. Novel contributions of this approach include a framework that triggers new analyses appropriate for the available data through lines of inquiry that support progressive hypothesis evolution, and a representation of hypothesis revisions with provenance records that can be used to inspect the results. We implemented our approach in the DISK framework and evaluated it using two scenarios from cancer multi-omics: (1) data for new patients becomes available over time, and (2) new types of data for the same patients are released. We show that in both scenarios DISK updates its confidence in the original hypotheses as it automatically analyzes new data.
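
The loop the abstract describes can be illustrated with a minimal sketch: a line of inquiry selects the data relevant to a hypothesis, re-runs an analysis only when that selection changes, and records each resulting confidence as a revision with provenance. This is not DISK's implementation. The names `LineOfInquiry`, `Revision`, `data_query`, and `analysis` are hypothetical, and the confidence score (the fraction of samples showing an effect) is a toy stand-in for the expert-grade workflows the paper uses.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Revision:
    """One hypothetical revision of a hypothesis's confidence, with provenance."""
    number: int          # 1 for the first assessment, 2 for the next, ...
    confidence: float    # toy score in [0, 1]
    datasets: List[str]  # provenance: ids of the datasets the analysis used


@dataclass
class LineOfInquiry:
    """Hypothetical structure linking a hypothesis to its data query and analysis."""
    hypothesis: str
    data_query: Callable[[Dict[str, dict]], List[str]]  # picks relevant dataset ids
    analysis: Callable[[List[dict]], float]             # returns a confidence score
    history: List[Revision] = field(default_factory=list)

    def update(self, repository: Dict[str, dict]) -> Optional[Revision]:
        """Re-run the analysis only if the set of relevant datasets has changed."""
        ids = sorted(self.data_query(repository))
        if not ids:
            return None  # no relevant data yet
        if self.history and self.history[-1].datasets == ids:
            return None  # nothing new since the last revision
        confidence = self.analysis([repository[i] for i in ids])
        revision = Revision(len(self.history) + 1, confidence, ids)
        self.history.append(revision)
        return revision


# Toy usage (hypothetical data): confidence = fraction of proteomics samples
# that show the effect named in the hypothesis.
repo = {
    "p1": {"type": "proteomics", "effect": True},
    "p2": {"type": "proteomics", "effect": False},
}
loi = LineOfInquiry(
    hypothesis="Protein X is overexpressed in tumor samples",
    data_query=lambda r: [k for k, v in r.items() if v["type"] == "proteomics"],
    analysis=lambda recs: sum(rec["effect"] for rec in recs) / len(recs),
)

first = loi.update(repo)      # initial assessment on p1, p2
unchanged = loi.update(repo)  # no new data, so no new revision
repo["r1"] = {"type": "transcriptomics", "effect": True}
skipped = loi.update(repo)    # irrelevant data type, so no re-run
repo["p3"] = {"type": "proteomics", "effect": True}
second = loi.update(repo)     # new patient data triggers a revision
print(first.confidence, second.confidence, second.datasets)
```

Keeping the dataset ids in each `Revision` means every confidence value can be traced back to the exact data it was computed from, which mirrors the role provenance records play in inspecting hypothesis revisions.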

Published

2017-02-12

How to Cite

Gil, Y., Garijo, D., Ratnakar, V., Mayani, R., Adusumilli, R., Boyce, H., Srivastava, A., & Mallick, P. (2017). Towards Continuous Scientific Data Analysis and Hypothesis Evolution. Proceedings of the AAAI Conference on Artificial Intelligence, 31(1). https://doi.org/10.1609/aaai.v31i1.11157