Supervised Discovery of Unknown Unknowns through Test Sample Mining (Student Abstract)
Given a fixed hypothesis space, defined to model class structure in a particular domain of application, unknown unknowns (u.u.s) are data examples that form classes in the feature space whose structure is not represented in a trained model. Accordingly, this leads to incorrect class prediction with high confidence, which represents one of the major sources of blind spots in machine learning. Our method seeks to reduce the structural mismatch between the training model and that of the target space in a supervised way. We illuminate further structure through cross-validation on a modified training model, set up to mine and trap u.u.s in a marginal training class, created from examples of a random sample of the test set. Contrary to previous approaches, our method simplifies the solution, as it does not rely on budgeted queries to an Oracle whose outcomes inform adjustments to training. In addition, our empirically results exhibit consistent performance improvements over baselines, on both synthetic and real-world data sets.