In analytical chemistry, high-dimensional regression problems are generally addressed with one of two dimension reduction approaches: projection methods, such as partial least squares (PLS), or variable selection algorithms, such as the lasso. Sparse PLS combines both approaches by adding a variable selection step to the PLS dimension reduction scheme. However, in most existing algorithms, the interpretation of the remaining coefficients is often questionable.
We developed dual-SPLS, a generalization of the classical PLS1 algorithm (PLS with a one-dimensional response) that aims to provide sparse coefficients for good interpretation while maintaining accurate predictions. Dual-SPLS is based on a reformulation of the PLS1 problem as a dual L2-norm procedure. Varying the underlying norm introduces regularization into the PLS1 algorithm.
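As a sketch of what this reformulation means, assume the standard PLS1 setting with a centered data matrix $X$ and centered response $y$; the symbols $X$, $y$, $z$, $w$ and $\Omega$ are notation introduced here for illustration. The first PLS1 weight vector maximizes the covariance between $Xw$ and $y$ under a unit L2-norm constraint, and the optimal value is the L2 norm of $z = X^\top y$, which is its own dual norm:

$$
\max_{\|w\|_2 = 1} z^\top w = \|z\|_2, \qquad \text{attained at } w = \frac{z}{\|z\|_2}, \qquad z = X^\top y .
$$

Replacing the L2 norm with another norm $\Omega$ gives the corresponding dual-norm problem,

$$
\Omega^{*}(z) = \max_{\Omega(w) = 1} z^\top w ,
$$

whose maximizer plays the role of the weight vector, so the choice of $\Omega$ determines the kind of regularization obtained.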
Choosing a mix of L1 and L2 norms introduces shrinkage in the selection of each PLS component, in a way analogous to the lasso procedure and controlled by a parameter $\nu$. Dual-SPLS sets the value of $\nu$ adaptively according to the desired amount of shrinkage, which makes the method user-friendly.
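The following Python sketch illustrates one way such a shrinkage step can work for a single component. It is not the dual.spls implementation. It assumes the mixed norm has the form $\Omega(w) = \lambda\|w\|_1 + \|w\|_2$, for which the maximizer of $z^\top w$ (with $z = X^\top y$) is proportional to the soft-thresholded vector $\mathrm{sign}(z)\,(|z| - \nu)_+$. The function name `sparse_weight`, the parameter `zero_fraction`, and the unit L2 normalization are hypothetical choices made for this example; here $\nu$ is simply taken as an order statistic of $|z|$ so that the requested share of coefficients is set to zero.

```python
import numpy as np


def sparse_weight(X, y, zero_fraction):
    """Illustrative sparse weight vector for one PLS1 component.

    Hypothetical helper, not the dual.spls API. zero_fraction in [0, 1)
    is the share of entries of z = X^T y that should be shrunk to zero.
    """
    if not 0.0 <= zero_fraction < 1.0:
        raise ValueError("zero_fraction must lie in [0, 1)")

    z = X.T @ y                      # covariance direction used by PLS1
    abs_z = np.abs(z)

    # Pick nu as the k-th smallest |z_j| so that (at least) k entries vanish.
    k = int(np.floor(zero_fraction * z.size))
    nu = 0.0 if k == 0 else float(np.sort(abs_z)[k - 1])

    # Soft thresholding: shrink every entry towards zero by nu.
    z_nu = np.sign(z) * np.maximum(abs_z - nu, 0.0)

    norm = np.linalg.norm(z_nu)
    if norm == 0.0:
        raise ValueError("all entries were thresholded to zero")

    # Assumption: unit L2 normalization, chosen here for simplicity.
    return z_nu / norm, nu


# Small example: with X the identity, z equals y.
X_demo = np.eye(4)
y_demo = np.array([3.0, -1.0, 0.5, 2.0])
w_demo, nu_demo = sparse_weight(X_demo, y_demo, zero_fraction=0.5)
```

In this example the two entries of $z$ with the smallest magnitude are set to zero and the remaining two are shrunk by $\nu$, so the user specifies a proportion of zeros rather than a raw threshold.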
In industrial applications, the algorithm provides accurate predictions while pertinently localizing the important variables.
Moreover, extending the underlying norm to cases with heterogeneous data is straightforward.
We present here some applications of these procedures (simulated and real industrial cases), using a dedicated R toolbox, the dual.spls package.
|Keywords||Partial least squares, sparsity, regression, dual norm, lasso algorithm, Chemometrics|