Adaptive Mixing of Auxiliary Losses in Supervised Learning

Authors

  • Durga Sivasubramanian (Indian Institute of Technology Bombay; Google Research, India)
  • Ayush Maheshwari (Indian Institute of Technology Bombay)
  • Prathosh AP (Indian Institute of Science, Bengaluru)
  • Pradeep Shenoy (Google Research, India)
  • Ganesh Ramakrishnan (Indian Institute of Technology Bombay)

DOI:

https://doi.org/10.1609/aaai.v37i8.26176

Keywords:

ML: Classification and Regression, ML: Meta Learning, ML: Learning on the Edge & Model Compression

Abstract

In many supervised learning scenarios, auxiliary losses are used to introduce additional information or constraints into the supervised learning objective. For instance, knowledge distillation aims to mimic the outputs of a powerful teacher model; similarly, in rule-based approaches, weak labeling information is provided by labeling functions, which may be noisy rule-based approximations to the true labels. We tackle the problem of learning to combine these losses in a principled manner. Our proposal, AMAL, uses a bi-level optimization criterion on validation data to learn optimal mixing weights, at the instance level, over the training data. We describe a meta-learning approach to solving this bi-level objective and show how it can be applied to different scenarios in supervised learning. Experiments in a number of knowledge distillation and rule denoising domains show that AMAL provides noticeable gains over competitive baselines in those domains. We empirically analyze our method and share insights into the mechanisms through which it provides performance gains. The code for AMAL is at: https://github.com/durgas16/AMAL.git.
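
To make the idea of instance-level mixing weights concrete, the following is a minimal sketch for the knowledge distillation case. It is not the authors' implementation: the function names are hypothetical, the convex combination of a cross-entropy term and a KL-divergence term is an assumed form of the mixed objective, and the weights are passed in as fixed numbers, whereas AMAL learns them through a bi-level objective on validation data.

```python
import math

# Illustrative only: all names are hypothetical and the convex-combination
# form of the loss is an assumption, not taken from the AMAL code base.

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    """Supervised loss: negative log-probability of the true label."""
    return -math.log(probs[label])

def kl_divergence(teacher_probs, student_probs):
    """Auxiliary (distillation) loss: KL(teacher || student)."""
    return sum(
        t * math.log(t / s)
        for t, s in zip(teacher_probs, student_probs)
        if t > 0
    )

def mixed_loss(student_logits, labels, teacher_probs, weights):
    """Mean over instances of w_i * CE_i + (1 - w_i) * KL_i.

    weights[i] in [0, 1] is the mixing weight of training instance i.
    Here the weights are given; learning them is outside this sketch.
    """
    total = 0.0
    for logits, y, t, w in zip(student_logits, labels, teacher_probs, weights):
        p = softmax(logits)
        total += w * cross_entropy(p, y) + (1.0 - w) * kl_divergence(t, p)
    return total / len(labels)
```

With a weight of 1 an instance contributes only its supervised loss, and with a weight of 0 only its distillation loss; intermediate values interpolate between the two on a per-instance basis.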

Published

2023-06-26

How to Cite

Sivasubramanian, D., Maheshwari, A., AP, P., Shenoy, P., & Ramakrishnan, G. (2023). Adaptive Mixing of Auxiliary Losses in Supervised Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 37(8), 9855-9863. https://doi.org/10.1609/aaai.v37i8.26176

Section

AAAI Technical Track on Machine Learning III