Preface
Acknowledgments
Contents
Symbols
1 Introduction
1.1 Decision Functions
1.1.1 Decision Functions for Two-Class Problems
1.1.2 Decision Functions for Multiclass Problems
1.2 Determination of Decision Functions
1.3 Data Sets Used in the Book
1.4 Classifier Evaluation
References
2 Two-Class Support Vector Machines
2.1 Hard-Margin Support Vector Machines
2.2 L1 Soft-Margin Support Vector Machines
2.3 Mapping to a High-Dimensional Space
2.3.1 Kernel Tricks
2.3.2 Kernels
2.3.3 Normalizing Kernels
2.3.4 Properties of Mapping Functions Associated with Kernels
2.3.5 Implicit Bias Terms
2.3.6 Empirical Feature Space
2.4 L2 Soft-Margin Support Vector Machines
2.5 Advantages and Disadvantages
2.5.1 Advantages
2.5.2 Disadvantages
2.6 Characteristics of Solutions
2.6.1 Hessian Matrix
2.6.2 Dependence of Solutions on C
2.6.3 Equivalence of L1 and L2 Support Vector Machines
2.6.4 Nonunique Solutions
2.6.5 Reducing the Number of Support Vectors
2.6.6 Degenerate Solutions
2.6.7 Duplicate Copies of Data
2.6.8 Imbalanced Data
2.6.9 Classification for the Blood Cell Data
2.7 Class Boundaries for Different Kernels
2.8 Developing Classifiers
2.8.1 Model Selection
2.8.2 Estimating Generalization Errors
2.8.3 Sophistication of Model Selection
2.8.4 Effect of Model Selection by Cross-Validation
2.9 Invariance for Linear Transformation
References
3 Multiclass Support Vector Machines
3.1 One-Against-All Support Vector Machines
3.1.1 Conventional Support Vector Machines
3.1.2 Fuzzy Support Vector Machines
3.1.3 Equivalence of Fuzzy Support Vector Machines and Support Vector Machines with Continuous Decision Functions
3.1.4 Decision-Tree-Based Support Vector Machines
3.2 Pairwise Support Vector Machines
3.2.1 Conventional Support Vector Machines
3.2.2 Fuzzy Support Vector Machines
3.2.3 Performance Comparison of Fuzzy Support Vector Machines
3.2.4 Cluster-Based Support Vector Machines
3.2.5 Decision-Tree-Based Support Vector Machines
3.2.6 Pairwise Classification with Correcting Classifiers
3.3 Error-Correcting Output Codes
3.3.1 Output Coding by Error-Correcting Codes
3.3.2 Unified Scheme for Output Coding
3.3.3 Equivalence of ECOC with Membership Functions
3.3.4 Performance Evaluation
3.4 All-at-Once Support Vector Machines
3.5 Comparisons of Architectures
3.5.1 One-Against-All Support Vector Machines
3.5.2 Pairwise Support Vector Machines
3.5.3 ECOC Support Vector Machines
3.5.4 All-at-Once Support Vector Machines
3.5.5 Training Difficulty
3.5.6 Training Time Comparison
References
4 Variants of Support Vector Machines
4.1 Least-Squares Support Vector Machines
4.1.1 Two-Class Least-Squares Support Vector Machines
4.1.2 One-Against-All Least-Squares Support Vector Machines
4.1.3 Pairwise Least-Squares Support Vector Machines
4.1.4 All-at-Once Least-Squares Support Vector Machines
4.1.5 Performance Comparison
4.2 Linear Programming Support Vector Machines
4.2.1 Architecture
4.2.2 Performance Evaluation
4.3 Sparse Support Vector Machines
4.3.1 Several Approaches for Sparse Support Vector Machines
4.3.2 Idea
4.3.3 Support Vector Machines Trained in the Empirical Feature Space
4.3.4 Selection of Linearly Independent Data
4.3.5 Performance Evaluation
4.4 Performance Comparison of Different Classifiers
4.5 Robust Support Vector Machines
4.6 Bayesian Support Vector Machines
4.6.1 One-Dimensional Bayesian Decision Functions
4.6.2 Parallel Displacement of a Hyperplane
4.6.3 Normal Test
4.7 Incremental Training
4.7.1 Overview
4.7.2 Incremental Training Using Hyperspheres
4.8 Learning Using Privileged Information
4.9 Semi-Supervised Learning
4.10 Multiple Classifier Systems
4.11 Multiple Kernel Learning
4.12 Confidence Level
4.13 Visualization
References
5 Training Methods
5.1 Preselecting Support Vector Candidates
5.1.1 Approximation of Boundary Data
5.1.2 Performance Evaluation
5.2 Decomposition Techniques
5.3 KKT Conditions Revisited
5.4 Overview of Training Methods
5.5 Primal-Dual Interior-Point Methods
5.5.1 Primal-Dual Interior-Point Methods for Linear Programming
5.5.2 Primal-Dual Interior-Point Methods for Quadratic Programming
5.5.3 Performance Evaluation
5.6 Steepest Ascent Methods and Newton's Methods
5.6.1 Solving Quadratic Programming Problems Without Constraints
5.6.2 Training of L1 Soft-Margin Support Vector Machines
5.6.3 Sequential Minimal Optimization
5.6.4 Training of L2 Soft-Margin Support Vector Machines
5.6.5 Performance Evaluation
5.7 Batch Training by Exact Incremental Training
5.7.1 KKT Conditions
5.7.2 Training by Solving a Set of Linear Equations
5.7.3 Performance Evaluation
5.8 Active Set Training in Primal and Dual
5.8.1 Training Support Vector Machines in the Primal
5.8.2 Comparison of Training Support Vector Machines in the Primal and the Dual
5.8.3 Performance Evaluation
5.9 Training of Linear Programming Support Vector Machines
5.9.1 Decomposition Techniques
5.9.2 Decomposition Techniques for Linear Programming Support Vector Machines
5.9.3 Computer Experiments
References
6 Kernel-Based Methods
6.1 Kernel Least Squares
6.1.1 Algorithm
6.1.2 Performance Evaluation
6.2 Kernel Principal Component Analysis
6.3 Kernel Mahalanobis Distance
6.3.1 SVD-Based Kernel Mahalanobis Distance
6.3.2 KPCA-Based Mahalanobis Distance
6.4 Principal Component Analysis in the Empirical Feature Space
6.5 Kernel Discriminant Analysis
6.5.1 Kernel Discriminant Analysis for Two-Class Problems
6.5.2 Linear Discriminant Analysis for Two-Class Problems in the Empirical Feature Space
6.5.3 Kernel Discriminant Analysis for Multiclass Problems
References
7 Feature Selection and Extraction
7.1 Selecting an Initial Set of Features
7.2 Procedure for Feature Selection
7.3 Feature Selection Using Support Vector Machines
7.3.1 Backward or Forward Feature Selection
7.3.2 Support Vector Machine-Based Feature Selection
7.3.3 Feature Selection by Cross-Validation
7.4 Feature Extraction
References
8 Clustering
8.1 Domain Description
8.2 Extension to Clustering
References
9 Maximum-Margin Multilayer Neural Networks
9.1 Approach
9.2 Three-Layer Neural Networks
9.3 CARVE Algorithm
9.4 Determination of Hidden-Layer Hyperplanes
9.4.1 Rotation of Hyperplanes
9.4.2 Training Algorithm
9.5 Determination of Output-Layer Hyperplanes
9.6 Determination of Parameter Values
9.7 Performance Evaluation
References
10 Maximum-Margin Fuzzy Classifiers
10.1 Kernel Fuzzy Classifiers with Ellipsoidal Regions
10.1.1 Conventional Fuzzy Classifiers with Ellipsoidal Regions
10.1.2 Extension to a Feature Space
10.1.3 Transductive Training
10.1.4 Maximizing Margins
10.1.5 Performance Evaluation
10.2 Fuzzy Classifiers with Polyhedral Regions
10.2.1 Training Methods
10.2.2 Performance Evaluation
References
11 Function Approximation
11.1 Optimal Hyperplanes
11.2 L1 Soft-Margin Support Vector Regressors
11.3 L2 Soft-Margin Support Vector Regressors
11.4 Model Selection
11.5 Training Methods
11.5.1 Overview
11.5.2 Newton's Methods
11.5.3 Active Set Training
11.6 Variants of Support Vector Regressors
11.6.1 Linear Programming Support Vector Regressors
11.6.2 ν-Support Vector Regressors
11.6.3 Least-Squares Support Vector Regressors
11.7 Variable Selection
11.7.1 Overview
11.7.2 Variable Selection by Block Deletion
11.7.3 Performance Evaluation
References
A Conventional Classifiers
A.1 Bayesian Classifiers
A.2 Nearest-Neighbor Classifiers
References
B Matrices
B.1 Matrix Properties
B.2 Least-Squares Methods and Singular Value Decomposition
B.3 Covariance Matrices
References
C Quadratic Programming
C.1 Optimality Conditions
C.2 Properties of Solutions
D Positive Semidefinite Kernels and Reproducing Kernel Hilbert Space
D.1 Positive Semidefinite Kernels
D.2 Reproducing Kernel Hilbert Space
References
Index