Sparse Learning for Stochastic Composite Optimization
DOI: https://doi.org/10.1609/aaai.v28i1.8844

Keywords: Stochastic Optimization, Online Learning, Composite Gradient Mapping, Stochastic Gradient Descent

Abstract
In this paper, we focus on Stochastic Composite Optimization (SCO) for sparse learning, which aims to learn a sparse solution. Although many SCO algorithms with the optimal convergence rate $O(1/T)$ have been developed for sparse learning, they often fail to deliver sparse solutions in the end, either because of the limited sparsity regularization during stochastic optimization or because of limitations in the online-to-batch conversion. To improve the sparsity of solutions obtained by SCO, we propose a simple yet effective stochastic optimization scheme that adds a novel sparse online-to-batch conversion to traditional SCO algorithms. Our theoretical analysis shows that the proposed scheme finds a solution with a better sparsity pattern without affecting the convergence rate. Experimental results on both synthetic and real-world data sets show that the proposed methods are more effective in recovering the sparse solution and converge at a rate comparable to state-of-the-art SCO algorithms for sparse learning.
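To make the high-level idea in the abstract concrete, the sketch below illustrates one plausible instantiation, not the authors' exact algorithm: a standard proximal stochastic gradient loop for an $\ell_1$-regularized least-squares problem, followed by an extra "sparse" online-to-batch conversion step, here realized as a single composite gradient mapping (soft-thresholding) step applied at the averaged iterate so that the returned solution contains exact zeros. All function names, step sizes, and problem data are illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def sparse_sco_sketch(X, y, lam=0.1, T=1000, eta0=1.0, seed=0):
    """Hypothetical sketch: proximal SGD + a final sparsifying conversion step."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    w_avg = np.zeros(d)

    for t in range(1, T + 1):
        i = rng.integers(n)                      # sample one data point
        grad = (X[i] @ w - y[i]) * X[i]          # stochastic gradient of the smooth loss
        eta = eta0 / np.sqrt(t)                  # diminishing step size
        # Composite (proximal) stochastic gradient step for the l1 term.
        w = soft_threshold(w - eta * grad, eta * lam)
        w_avg += (w - w_avg) / t                 # running average (plain online-to-batch)

    # Sparse online-to-batch conversion: one composite gradient mapping step at the
    # averaged point, which restores exact zeros that plain averaging destroys.
    grad_avg = X.T @ (X @ w_avg - y) / n
    eta_final = eta0 / np.sqrt(T)
    return soft_threshold(w_avg - eta_final * grad_avg, eta_final * lam)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    w_true = np.zeros(50)
    w_true[:5] = 1.0                             # 5-sparse ground-truth model
    X = rng.standard_normal((500, 50))
    y = X @ w_true + 0.01 * rng.standard_normal(500)
    w_hat = sparse_sco_sketch(X, y)
    print("nonzeros in recovered solution:", np.count_nonzero(w_hat))
```

The intuition the example captures is that iterate averaging, while needed for the $O(1/T)$ guarantee, fills in small nonzero entries; a single proximal step at the averaged point re-applies the sparsity regularizer without changing the convergence rate.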