Support Vector Machines for Pattern Classification

by: Shigeo Abe

Springer-Verlag, 2010

ISBN: 9781849960984, 473 pages

2nd edition

Format: PDF, online reading

Copy protection: watermark

Suitable for: Windows PC, Mac OS X, and all DRM-capable e-readers, Apple iPad, Android tablet PCs. Online reading for: Windows PC, Mac OS X, Linux

Price: 149.79 EUR

More about the content


Preface  6
Acknowledgments  12
Contents  14
Symbols  20

1 Introduction  21
1.1 Decision Functions  22
1.1.1 Decision Functions for Two-Class Problems  22
1.1.2 Decision Functions for Multiclass Problems  24
1.2 Determination of Decision Functions  28
1.3 Data Sets Used in the Book  29
1.4 Classifier Evaluation  33
References  36

2 Two-Class Support Vector Machines  40
2.1 Hard-Margin Support Vector Machines  40
2.2 L1 Soft-Margin Support Vector Machines  47
2.3 Mapping to a High-Dimensional Space  50
2.3.1 Kernel Tricks  50
2.3.2 Kernels  52
2.3.3 Normalizing Kernels  62
2.3.4 Properties of Mapping Functions Associated with Kernels  63
2.3.5 Implicit Bias Terms  66
2.3.6 Empirical Feature Space  69
2.4 L2 Soft-Margin Support Vector Machines  75
2.5 Advantages and Disadvantages  77
2.5.1 Advantages  77
2.5.2 Disadvantages  78
2.6 Characteristics of Solutions  79
2.6.1 Hessian Matrix  79
2.6.2 Dependence of Solutions on C  81
2.6.3 Equivalence of L1 and L2 Support Vector Machines  86
2.6.4 Nonunique Solutions  89
2.6.5 Reducing the Number of Support Vectors  97
2.6.6 Degenerate Solutions  100
2.6.7 Duplicate Copies of Data  102
2.6.8 Imbalanced Data  104
2.6.9 Classification for the Blood Cell Data  104
2.7 Class Boundaries for Different Kernels  107
2.8 Developing Classifiers  112
2.8.1 Model Selection  112
2.8.2 Estimating Generalization Errors  112
2.8.3 Sophistication of Model Selection  116
2.8.4 Effect of Model Selection by Cross-Validation  117
2.9 Invariance for Linear Transformation  121
References  125

3 Multiclass Support Vector Machines  132
3.1 One-Against-All Support Vector Machines  133
3.1.1 Conventional Support Vector Machines  133
3.1.2 Fuzzy Support Vector Machines  135
3.1.3 Equivalence of Fuzzy Support Vector Machines and Support Vector Machines with Continuous Decision Functions  138
3.1.4 Decision-Tree-Based Support Vector Machines  141
3.2 Pairwise Support Vector Machines  146
3.2.1 Conventional Support Vector Machines  146
3.2.2 Fuzzy Support Vector Machines  147
3.2.3 Performance Comparison of Fuzzy Support Vector Machines  148
3.2.4 Cluster-Based Support Vector Machines  151
3.2.5 Decision-Tree-Based Support Vector Machines  152
3.2.6 Pairwise Classification with Correcting Classifiers  162
3.3 Error-Correcting Output Codes  163
3.3.1 Output Coding by Error-Correcting Codes  164
3.3.2 Unified Scheme for Output Coding  165
3.3.3 Equivalence of ECOC with Membership Functions  166
3.3.4 Performance Evaluation  166
3.4 All-at-Once Support Vector Machines  168
3.5 Comparisons of Architectures  171
3.5.1 One-Against-All Support Vector Machines  171
3.5.2 Pairwise Support Vector Machines  171
3.5.3 ECOC Support Vector Machines  172
3.5.4 All-at-Once Support Vector Machines  172
3.5.5 Training Difficulty  172
3.5.6 Training Time Comparison  176
References  177

4 Variants of Support Vector Machines  181
4.1 Least-Squares Support Vector Machines  181
4.1.1 Two-Class Least-Squares Support Vector Machines  182
4.1.2 One-Against-All Least-Squares Support Vector Machines  184
4.1.3 Pairwise Least-Squares Support Vector Machines  186
4.1.4 All-at-Once Least-Squares Support Vector Machines  187
4.1.5 Performance Comparison  188
4.2 Linear Programming Support Vector Machines  192
4.2.1 Architecture  193
4.2.2 Performance Evaluation  196
4.3 Sparse Support Vector Machines  198
4.3.1 Several Approaches for Sparse Support Vector Machines  199
4.3.2 Idea  201
4.3.3 Support Vector Machines Trained in the Empirical Feature Space  202
4.3.4 Selection of Linearly Independent Data  205
4.3.5 Performance Evaluation  207
4.4 Performance Comparison of Different Classifiers  210
4.5 Robust Support Vector Machines  214
4.6 Bayesian Support Vector Machines  215
4.6.1 One-Dimensional Bayesian Decision Functions  217
4.6.2 Parallel Displacement of a Hyperplane  218
4.6.3 Normal Test  219
4.7 Incremental Training  219
4.7.1 Overview  219
4.7.2 Incremental Training Using Hyperspheres  222
4.8 Learning Using Privileged Information  231
4.9 Semi-Supervised Learning  234
4.10 Multiple Classifier Systems  235
4.11 Multiple Kernel Learning  236
4.12 Confidence Level  237
4.13 Visualization  238
References  238

5 Training Methods  245
5.1 Preselecting Support Vector Candidates  245
5.1.1 Approximation of Boundary Data  246
5.1.2 Performance Evaluation  248
5.2 Decomposition Techniques  249
5.3 KKT Conditions Revisited  252
5.4 Overview of Training Methods  257
5.5 Primal-Dual Interior-Point Methods  260
5.5.1 Primal-Dual Interior-Point Methods for Linear Programming  260
5.5.2 Primal-Dual Interior-Point Methods for Quadratic Programming  264
5.5.3 Performance Evaluation  266
5.6 Steepest Ascent Methods and Newton's Methods  270
5.6.1 Solving Quadratic Programming Problems Without Constraints  270
5.6.2 Training of L1 Soft-Margin Support Vector Machines  272
5.6.3 Sequential Minimal Optimization  277
5.6.4 Training of L2 Soft-Margin Support Vector Machines  278
5.6.5 Performance Evaluation  279
5.7 Batch Training by Exact Incremental Training  280
5.7.1 KKT Conditions  281
5.7.2 Training by Solving a Set of Linear Equations  282
5.7.3 Performance Evaluation  290
5.8 Active Set Training in Primal and Dual  291
5.8.1 Training Support Vector Machines in the Primal  291
5.8.2 Comparison of Training Support Vector Machines in the Primal and the Dual  294
5.8.3 Performance Evaluation  297
5.9 Training of Linear Programming Support Vector Machines  299
5.9.1 Decomposition Techniques  300
5.9.2 Decomposition Techniques for Linear Programming Support Vector Machines  307
5.9.3 Computer Experiments  315
References  317

6 Kernel-Based Methods  322
6.1 Kernel Least Squares  322
6.1.1 Algorithm  322
6.1.2 Performance Evaluation  325
6.2 Kernel Principal Component Analysis  328
6.3 Kernel Mahalanobis Distance  331
6.3.1 SVD-Based Kernel Mahalanobis Distance  332
6.3.2 KPCA-Based Mahalanobis Distance  335
6.4 Principal Component Analysis in the Empirical Feature Space  336
6.5 Kernel Discriminant Analysis  337
6.5.1 Kernel Discriminant Analysis for Two-Class Problems  338
6.5.2 Linear Discriminant Analysis for Two-Class Problems in the Empirical Feature Space  341
6.5.3 Kernel Discriminant Analysis for Multiclass Problems  342
References  344

7 Feature Selection and Extraction  347
7.1 Selecting an Initial Set of Features  347
7.2 Procedure for Feature Selection  348
7.3 Feature Selection Using Support Vector Machines  349
7.3.1 Backward or Forward Feature Selection  349
7.3.2 Support Vector Machine-Based Feature Selection  352
7.3.3 Feature Selection by Cross-Validation  353
7.4 Feature Extraction  355
References  356

8 Clustering  358
8.1 Domain Description  358
8.2 Extension to Clustering  364
References  366

9 Maximum-Margin Multilayer Neural Networks  368
9.1 Approach  368
9.2 Three-Layer Neural Networks  369
9.3 CARVE Algorithm  372
9.4 Determination of Hidden-Layer Hyperplanes  373
9.4.1 Rotation of Hyperplanes  374
9.4.2 Training Algorithm  377
9.5 Determination of Output-Layer Hyperplanes  378
9.6 Determination of Parameter Values  378
9.7 Performance Evaluation  379
References  380

10 Maximum-Margin Fuzzy Classifiers  382
10.1 Kernel Fuzzy Classifiers with Ellipsoidal Regions  383
10.1.1 Conventional Fuzzy Classifiers with Ellipsoidal Regions  383
10.1.2 Extension to a Feature Space  384
10.1.3 Transductive Training  385
10.1.4 Maximizing Margins  390
10.1.5 Performance Evaluation  393
10.2 Fuzzy Classifiers with Polyhedral Regions  397
10.2.1 Training Methods  398
10.2.2 Performance Evaluation  406
References  408

11 Function Approximation  410
11.1 Optimal Hyperplanes  410
11.2 L1 Soft-Margin Support Vector Regressors  414
11.3 L2 Soft-Margin Support Vector Regressors  416
11.4 Model Selection  418
11.5 Training Methods  418
11.5.1 Overview  418
11.5.2 Newton's Methods  420
11.5.3 Active Set Training  437
11.6 Variants of Support Vector Regressors  444
11.6.1 Linear Programming Support Vector Regressors  445
11.6.2 ν-Support Vector Regressors  446
11.6.3 Least-Squares Support Vector Regressors  447
11.7 Variable Selection  450
11.7.1 Overview  450
11.7.2 Variable Selection by Block Deletion  451
11.7.3 Performance Evaluation  452
References  453

A Conventional Classifiers  458
A.1 Bayesian Classifiers  458
A.2 Nearest-Neighbor Classifiers  459
References  460

B Matrices  462
B.1 Matrix Properties  462
B.2 Least-Squares Methods and Singular Value Decomposition  464
B.3 Covariance Matrices  467
References  469

C Quadratic Programming  470
C.1 Optimality Conditions  470
C.2 Properties of Solutions  471

D Positive Semidefinite Kernels and Reproducing Kernel Hilbert Space  474
D.1 Positive Semidefinite Kernels  474
D.2 Reproducing Kernel Hilbert Space  478
References  480

Index  482