Nonlinear Pairwise Layer and Its Training for Kernel Learning
Kernel learning is a fundamental technique that has been intensively studied over the past decades. For complicated practical tasks, traditional "shallow" kernels (e.g., the Gaussian kernel and the sigmoid kernel) are not flexible enough to produce satisfactory performance. To address this shortcoming, this paper introduces a nonlinear layer into kernel learning to enhance model flexibility. The layer is pairwise, so it fully considers the coupling information among examples. Our model therefore contains a fixed single mapping layer (i.e., a Gaussian kernel) as well as a nonlinear pairwise layer, thereby achieving better flexibility than existing kernel structures. Moreover, the proposed structure can be seamlessly embedded into Support Vector Machines (SVM), whose training process can be formulated as a joint optimization problem that includes nonlinear function learning and standard SVM optimization. We theoretically prove that the objective function is gradient-Lipschitz continuous, which further guides how to accelerate the optimization process in a deep kernel architecture. Experimentally, we find that the proposed structure outperforms other state-of-the-art kernel-based algorithms on various benchmark datasets, which demonstrates the effectiveness of the incorporated pairwise layer and its training approach.
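To make the two-layer structure concrete, the following is a minimal sketch and not the paper's method. The abstract does not specify the form of the nonlinear function, so the sketch assumes a polynomial with nonnegative coefficients applied to each pairwise kernel value; the names `gaussian_kernel`, `pairwise_layer`, `gamma`, and `coeffs` are hypothetical. Nonnegative coefficients are chosen because, by the Schur product theorem, they keep the resulting matrix positive semidefinite.

```python
import math


def gaussian_kernel(X, gamma=0.5):
    """Fixed first layer: K[i][j] = exp(-gamma * ||x_i - x_j||^2)."""
    n = len(X)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            K[i][j] = math.exp(-gamma * sq_dist)
    return K


def pairwise_layer(K, coeffs):
    """Assumed second layer: g(k) = sum_m coeffs[m] * k**m on every pair.

    This polynomial form is an illustrative stand-in, not the function
    used in the paper. Nonnegative coefficients preserve positive
    semidefiniteness of the kernel matrix.
    """
    if any(c < 0 for c in coeffs):
        raise ValueError("coefficients must be nonnegative")
    return [
        [sum(c * k ** m for m, c in enumerate(coeffs)) for k in row]
        for row in K
    ]


# Three toy examples in two dimensions (made-up data).
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
K1 = gaussian_kernel(X)                          # fixed mapping layer
K2 = pairwise_layer(K1, coeffs=[0.0, 0.5, 0.5])  # nonlinear pairwise layer
```

In this sketch the coefficients are fixed by hand, whereas the abstract describes learning the nonlinear function jointly with the SVM. A matrix such as `K2` could be passed to any SVM solver that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`.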