TY - JOUR
AU - Li, Xiang
AU - Wang, Shusen
AU - Zhang, Zhihua
PY - 2020/04/03
Y2 - 2024/03/28
TI - Do Subsampled Newton Methods Work for High-Dimensional Data?
JF - Proceedings of the AAAI Conference on Artificial Intelligence
JA - AAAI
VL - 34
IS - 04
SE - AAAI Technical Track: Machine Learning
DO - 10.1609/aaai.v34i04.5905
UR - https://ojs.aaai.org/index.php/AAAI/article/view/5905
SP - 4723
EP - 4730
AB - Subsampled Newton methods approximate Hessian matrices through subsampling techniques to reduce the per-iteration cost. Previous results require Ω(d) samples to approximate Hessians, where d is the dimension of the data points, which is impractical for high-dimensional data. The situation worsens when d is comparable to the number of data points n, since the whole dataset must then be taken into account, rendering subsampling useless. This paper theoretically justifies the effectiveness of subsampled Newton methods for strongly convex empirical risk minimization with high-dimensional data. Specifically, we show that only Θ̃(d_eff^γ) samples are needed to approximate the Hessian matrices, where d_eff^γ is the γ-ridge leverage and can be much smaller than d as long as nγ ≫ 1. Our theory covers three types of Newton methods: subsampled Newton, distributed Newton, and proximal Newton.
ER -