Abstract
High-dimensional data, in which the number of variables exceeds the sample size, are now commonplace in many fields. The Lasso (L1-penalized regression) is a popular method for analyzing such data in regression settings and has been the subject of intensive research. However, in many applications (e.g., genomics), the variables are measured with error. Although measurement error is known to cause bias in traditional methods, it has received little attention in the Lasso literature.
In this paper, we first consider the effect of measurement error on the Lasso. We show analytically how measurement error affects the statistical performance of the Lasso with regard to estimation, prediction, and variable selection. Next, we consider a correction method and derive non-asymptotic conditions under which it selects the relevant variables.
Finally, we compare the corrected and the naive (uncorrected) Lasso in simulation experiments. We also discuss methods for estimating the measurement error in microarray experiments.