Jointly optimizing model complexity and data-processing parameters
Jim Garrett, PhD (Becton Dickinson), jim_garrett@bd.com
Abstract
When predictor selection is applied before modeling and model performance is then assessed by cross-validation (or by most other methods), the resulting performance estimate is biased. This bias is known as selection bias, and it can be severe when the number of predictors outstrips the number of data cases. Fundamentally, a modeling process is applied, of which fitting the model is only the last step, yet performance estimation does not encompass the entire process. Cross-validation that examines the entire process is free of selection bias, but such cross-validation presents a challenging optimization problem. I adapt an efficient multiparameter optimization algorithm, Simultaneous Perturbation Stochastic Approximation ("SPSA"), to handle loss functions having both continuous and ordered discrete inputs. SPSA is relatively efficient, handles noisy loss functions, and is unlikely to become trapped in inferior local optima, particularly when the loss function is noisy. I demonstrate how this mixed-input SPSA can jointly optimize data-processing parameters (including feature selection) and model-complexity parameters, and also provide a type of cross-validation performance estimate that is free of selection bias.
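To make the idea of SPSA over mixed inputs concrete, the following is a minimal sketch and not the adaptation presented in the talk. It assumes one plausible treatment of ordered discrete inputs: each discrete coordinate is carried internally as a real number, and the loss is evaluated only at the two neighboring integers that bracket it. The names `spsa_mixed` and `noisy_loss`, the gain constants, and the toy loss are all hypothetical, chosen for illustration. What the sketch retains from standard SPSA is that every iteration perturbs all coordinates at once with random signs and needs only two loss evaluations, however many parameters there are.

```python
import math
import random

def spsa_mixed(loss, theta0, is_discrete, n_iter=300, a=0.2, c=0.5,
               A=30.0, alpha=0.602, gamma=0.101, seed=0):
    """Minimize a noisy loss over continuous and ordered-discrete inputs.

    Assumption (not from the abstract): a discrete coordinate is perturbed
    around the midpoint of its two bracketing integers with half-width 0.5,
    so the loss only ever sees integer values for it.
    """
    rng = random.Random(seed)
    theta = [float(t) for t in theta0]

    def as_point(x):
        # Discrete coordinates are passed to the loss as ints.
        return [int(round(v)) if d else v for v, d in zip(x, is_discrete)]

    for k in range(n_iter):
        a_k = a / (k + 1 + A) ** alpha   # decaying step size
        c_k = c / (k + 1) ** gamma       # decaying perturbation size
        # Simultaneous random-sign perturbation of every coordinate.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        center = [math.floor(t) + 0.5 if d else t
                  for t, d in zip(theta, is_discrete)]
        half = [0.5 if d else c_k for d in is_discrete]
        plus = [m + h * s for m, h, s in zip(center, half, delta)]
        minus = [m - h * s for m, h, s in zip(center, half, delta)]
        # Two loss evaluations per iteration, regardless of dimension.
        diff = loss(as_point(plus)) - loss(as_point(minus))
        # Per-coordinate gradient estimate and descent step.
        theta = [t - a_k * diff / (2.0 * h * s)
                 for t, h, s in zip(theta, half, delta)]
    return as_point(theta)

_noise = random.Random(1)

def noisy_loss(p):
    # Hypothetical loss: one continuous input x, one ordered-discrete input m.
    x, m = p
    return (x - 1.5) ** 2 + (m - 4) ** 2 + _noise.gauss(0.0, 0.1)

best = spsa_mixed(noisy_loss, [0.0, 0], [False, True])
print(best)
```

In the setting of the abstract, the loss would be a cross-validated error of the whole modeling process, with continuous inputs such as a regularization strength and ordered discrete inputs such as the number of predictors retained. The toy quadratic above only stands in for that loss.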