TY - JOUR
AU - Holland, Matthew J.
PY - 2021/05/18
Y2 - 2024/03/29
TI - Scaling-Up Robust Gradient Descent Techniques
JF - Proceedings of the AAAI Conference on Artificial Intelligence
JA - AAAI
VL - 35
IS - 9
SE - AAAI Technical Track on Machine Learning II
DO - 10.1609/aaai.v35i9.16940
UR - https://ojs.aaai.org/index.php/AAAI/article/view/16940
SP - 7694-7701
AB - We study a scalable alternative to robust gradient descent (RGD) techniques that can be used when losses and/or gradients can be heavy-tailed, though this will be unknown to the learner. The core technique is simple: instead of trying to robustly aggregate gradients at each step, which is costly and leads to sub-optimal dimension dependence in risk bounds, we choose a candidate which does not diverge too far from the majority of cheap stochastic sub-processes run over partitioned data. This lets us retain the formal strength of RGD methods at a fraction of the cost.
ER -