Leaf-Smoothed Hierarchical Softmax for Ordinal Prediction


  • Wesley Tansey, Columbia University
  • Karl Pichotta, The University of Texas at Austin
  • James Scott, The University of Texas at Austin


Keywords: density estimation, deep learning, neural networks


We propose a new approach to conditional probability estimation for ordinal labels. First, we present a specialized hierarchical softmax variant, inspired by k-d trees, that leverages the inherent spatial structure of (potentially multivariate) ordinal labels. We then adapt ideas from signal processing on noisy graphs to develop a novel regularizer for such hierarchical softmax models. In a series of simulation studies, the tree structure and the regularizer each independently improve the sample efficiency of a deep learning model. Combining the two techniques yields additive gains, and the resulting model avoids the pathologies of other approaches in the literature. We validate our approach empirically on a suite of real-world datasets, in some cases nearly halving the error relative to other popular methods. Our results indicate that the method is a powerful new technique for conditional probability estimation of ordinal labels, especially in the low-to-mid sample size regimes often found in the biological and other physical sciences.
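The two ingredients described in the abstract can be illustrated with a minimal one-dimensional sketch. It assumes K ordinal bins split recursively at their midpoint (the 1-D analogue of a k-d tree split; a multivariate version would cycle through the label dimensions), with one "go to the lower half" logit per internal node that a network would normally output. The smoothing term is a first-order total-variation penalty on the log-probabilities of adjacent bins, chosen here only for illustration; the paper's actual regularizer, adapted from signal processing on graphs, may differ. All function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open range of ordinal bins [lo, hi)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def internal_nodes(lo: int, hi: int) -> List[Interval]:
    """Internal nodes of a midpoint-split binary tree over bins [lo, hi); K bins give K - 1 nodes."""
    if hi - lo <= 1:
        return []
    mid = (lo + hi) // 2
    return [(lo, hi)] + internal_nodes(lo, mid) + internal_nodes(mid, hi)


def leaf_probs(logits: Dict[Interval, float], num_bins: int) -> List[float]:
    """P(bin) is the product of branch probabilities along its root-to-leaf path."""
    probs = [0.0] * num_bins

    def descend(lo: int, hi: int, mass: float) -> None:
        if hi - lo == 1:
            probs[lo] = mass
            return
        mid = (lo + hi) // 2
        p_left = sigmoid(logits[(lo, hi)])  # probability of the lower half [lo, mid)
        descend(lo, mid, mass * p_left)
        descend(mid, hi, mass * (1.0 - p_left))

    descend(0, num_bins, 1.0)
    return probs


def path_nll(logits: Dict[Interval, float], num_bins: int, y: int) -> float:
    """-log P(y), computed from only the O(log K) nodes on y's root-to-leaf path."""
    lo, hi, nll = 0, num_bins, 0.0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        p_left = sigmoid(logits[(lo, hi)])
        if y < mid:
            nll -= math.log(p_left)
            hi = mid
        else:
            nll -= math.log(1.0 - p_left)
            lo = mid
    return nll


def leaf_smoothing_penalty(probs: List[float], lam: float) -> float:
    """Assumed regularizer: lam * sum_i |log p_{i+1} - log p_i| over adjacent bins (a chain graph)."""
    logs = [math.log(p) for p in probs]
    return lam * sum(abs(b - a) for a, b in zip(logs, logs[1:]))


# Usage: with all-zero logits every split is 50/50, so 8 bins are uniform.
K = 8
logits = {node: 0.0 for node in internal_nodes(0, K)}
probs = leaf_probs(logits, K)
print(probs)  # eight values of 0.125
print(path_nll(logits, K, 3), leaf_smoothing_penalty(probs, lam=0.1))
```

In this sketch the likelihood term for a single label touches only the O(log K) nodes on its path, whereas the smoothing penalty needs the full leaf distribution; the penalty ties neighboring bins together so that sparsely observed bins can borrow strength from their neighbors.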




How to Cite

Tansey, W., Pichotta, K., & Scott, J. (2018). Leaf-Smoothed Hierarchical Softmax for Ordinal Prediction. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1). Retrieved from https://ojs.aaai.org/index.php/AAAI/article/view/11754