Model Selection for SVM & Our Intended Works. Songcan Chen, Feb. 8, 2012
Outline • Model Selection for SVM • Our Intended Works
Introduction to 2 works 1. Model selection for primal SVM [MBB11, MLJ11] 2. Selection of Hypothesis Space • Selecting the Hypothesis Space for Improving the Generalization Ability of Support Vector Machines [AGOR11, IJCNN2011] • The Impact of Unlabeled Patterns in Rademacher Complexity Theory for Kernel Classifiers [AGOR11, NIPS2011]
1st work • Model selection for primal SVM [MBB11, MLJ11]. [MBB11] Gregory Moore · Charles Bergeron · Kristin P. Bennett, Machine Learning (2011) 85:175–208
Outline • Primal SVM • Model selection 1) Bilevel Program for CV 2) Two Optimization Methods: Implicit & Explicit 3) Experiments 4) Conclusions
Primal SVM • Advantages: 1) simple to implement, theoretically sound, and easy to customize to different tasks such as classification, regression, ranking, and so forth; 2) very fast, linear in the number of samples • Difficulty: model selection
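Since the slide stresses that primal SVM training is simple and linear in the number of samples, here is a minimal sketch of a primal linear SVM trained by subgradient descent on the L2-regularized hinge loss. It illustrates the primal approach only, not the paper's solver; the regularization weight `lam`, learning rate `lr`, and epoch count are assumed illustrative values, and the bias term is omitted for brevity.

```python
import numpy as np

def train_primal_svm(X, y, lam=0.1, lr=0.01, epochs=200):
    """Subgradient descent on the primal objective
    lam/2 * ||w||^2 + (1/n) * sum_i max(0, 1 - y_i * <w, x_i>),
    with labels y in {-1, +1}. Each epoch costs O(n * d): linear in n.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        margins = y * (X @ w)            # functional margins y_i * <w, x_i>
        viol = margins < 1               # samples violating the margin
        grad = lam * w - (y[viol, None] * X[viol]).sum(axis=0) / n
        w -= lr * grad
    return w

# Toy usage on two separable Gaussian blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, (50, 2)), rng.normal(2.0, 1.0, (50, 2))])
y = np.hstack([-np.ones(50), np.ones(50)])
w = train_primal_svm(X, y)
print("training accuracy:", np.mean(np.sign(X @ w) == y))
```

Note that `lam` and `lr` are exactly the kind of hyperparameters whose selection the rest of the talk is about.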
Model selection An often-adopted approach: Cross-validation (CV) over a grid. Advantage: simple and almost universal! Weakness: high computational cost; the number of CV runs grows exponentially, as (grid points per hyperparameter) raised to the number of hyperparameters
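To see where the exponential cost comes from, a hedged sketch of grid-search CV: with m hyperparameters and k grid points each, the Cartesian product forces k^m full CV runs. The function `grid_search_cv` and the counting stub below are illustrative, not a library API.

```python
import itertools
import numpy as np

def grid_search_cv(X, y, grids, cv_error):
    """Exhaustive CV over the Cartesian product of per-parameter grids.

    The number of candidate settings is prod(len(g) for g in grids),
    i.e. exponential in the number of hyperparameters.
    """
    best_gamma, best_err = None, np.inf
    for gamma in itertools.product(*grids):  # every combination of grid points
        err = cv_error(X, y, gamma)          # run k-fold CV at this setting
        if err < best_err:
            best_gamma, best_err = gamma, err
    return best_gamma, best_err

# Demo with a counting stub in place of real CV: 3 hyperparameters with
# 10 grid points each already means 1000 full CV runs.
calls = []
def stub_cv_error(X, y, gamma):
    calls.append(gamma)
    return float(sum(gamma))

grid_search_cv(None, None, [range(10)] * 3, stub_cv_error)
print(len(calls))  # 1000 = 10**3 evaluations
```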
Motivation • CV is naturally and precisely formulated as a bilevel program (BP), shown below. Bilevel CV Problem (BCP): the leader (outer level) chooses the model hyperparameters γ to minimize the validation error; the follower (inner level) chooses the weights w to minimize the training cost C_trn(w, γ)
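In symbols, a hedged LaTeX rendering of the BCP sketched above; the names Θ_val (validation error) and C_trn (training cost) follow the slide, while the exact functional forms are left abstract.

```latex
\begin{align*}
\min_{\gamma,\, w}\quad & \Theta_{\mathrm{val}}(w)
  && \text{leader (outer level): pick hyperparameters } \gamma \\
\text{s.t.}\quad & w \in \operatorname*{arg\,min}_{w'} \; C_{\mathrm{trn}}(w', \gamma)
  && \text{follower (inner level): train the weights } w
\end{align*}
```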
Bilevel CV Problem (BCP) (1) BCP for a single validation and training split: • The outer-level leader problem selects the hyperparameters γ so as to perform well on a validation set. • The inner-level follower problem trains an optimal model for the given hyperparameters and returns a weight vector w for validation, as sketched below.
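To make the two levels concrete, a hedged sketch of one leader probe: the follower solves an inner training problem at fixed γ and returns w, and the leader scores w on the validation split. Ridge-regularized least squares stands in for the paper's primal SVM inner problem so the inner solve has a closed form; all names and values here are illustrative.

```python
import numpy as np

def follower(X_trn, y_trn, gamma):
    """Inner level: returns argmin_w of a regularized training cost.

    Ridge least squares is a closed-form stand-in for the paper's
    primal SVM inner problem.
    """
    d = X_trn.shape[1]
    return np.linalg.solve(X_trn.T @ X_trn + gamma * np.eye(d),
                           X_trn.T @ y_trn)

def leader_objective(X_trn, y_trn, X_val, y_val, gamma):
    """Outer level: validation error of the weights the follower returns."""
    w = follower(X_trn, y_trn, gamma)
    return float(np.mean((X_val @ w - y_val) ** 2))

# Each hyperparameter value the leader probes triggers one inner solve.
rng = np.random.default_rng(1)
X = rng.normal(size=(80, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=80)
X_trn, y_trn, X_val, y_val = X[:60], y[:60], X[60:], y[60:]
for gamma in (0.01, 0.1, 1.0, 10.0):
    print(gamma, leader_objective(X_trn, y_trn, X_val, y_val, gamma))
```

The bilevel methods in the paper replace this outer probing with gradient-based optimization of γ, which is why the implicit and explicit treatments of the inner argmin matter.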