Support Vector Machines (SVMs)
were invented by Vladimir Vapnik. They are a method for creating a function
from a set of labeled training data. The function can be a classification
function (the output is binary: is the input in the category or not?) or a
general regression function.
For classification, SVMs
operate by finding a hypersurface in the space of possible inputs that splits
the positive examples from the negative examples. The split is chosen to have
the largest distance from the hypersurface to the nearest of the positive and
negative examples. Intuitively, this makes the classification correct for test
data that is near, but not identical, to the training data. More information
can be found in Burges' tutorial or in Vapnik's book (see below).
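To make the maximum-margin picture concrete, here is a minimal sketch, assuming Python with NumPy and scikit-learn (neither is part of this page), that fits a linear SVM to a toy two-dimensional problem. For a linear SVM with weight vector w, the distance between the two margin hyperplanes is 2/||w||:

    import numpy as np
    from sklearn.svm import SVC

    # Toy linearly separable data: two clusters of 2-D points.
    X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.2],        # positive examples
                  [-1.0, -1.0], [-1.5, -0.5], [-2.0, -1.2]]) # negative examples
    y = np.array([1, 1, 1, -1, -1, -1])

    # A large C approximates the hard-margin SVM on separable data.
    clf = SVC(kernel='linear', C=1e6).fit(X, y)

    w = clf.coef_[0]       # normal vector of the separating hyperplane
    b = clf.intercept_[0]  # offset
    margin = 2.0 / np.linalg.norm(w)  # width of the margin band

    print("decision value for [1, 2]:", w @ np.array([1.0, 2.0]) + b)
    print("margin width:", margin)

The sign of the decision value w.x + b gives the predicted class; points with decision values of magnitude at least 1 lie on or outside the margin.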
There are various ways to
train SVMs. One particularly simple and fast method is Sequential
Minimal Optimization (SMO).
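For concreteness, here is a sketch of a simplified SMO trainer for a linear SVM in Python/NumPy. It is not the implementation from the SMO paper: it keeps the core idea of repeatedly picking a pair of Lagrange multipliers, solving the two-variable subproblem analytically, and updating the bias, but it chooses the second multiplier at random instead of using the full algorithm's heuristics. The function name simplified_smo is illustrative.

    import numpy as np

    def simplified_smo(X, y, C=1.0, tol=1e-3, max_passes=10):
        """Simplified SMO for a linear SVM; labels y must be in {-1, +1}."""
        n = X.shape[0]
        alpha = np.zeros(n)
        b = 0.0
        K = X @ X.T  # linear kernel matrix

        def f(i):
            # SVM output for training example i
            return (alpha * y) @ K[:, i] + b

        passes = 0
        while passes < max_passes:
            changed = 0
            for i in range(n):
                E_i = f(i) - y[i]
                # Only optimize if alpha[i] violates the KKT conditions.
                if (y[i] * E_i < -tol and alpha[i] < C) or (y[i] * E_i > tol and alpha[i] > 0):
                    j = np.random.choice([k for k in range(n) if k != i])
                    E_j = f(j) - y[j]
                    a_i_old, a_j_old = alpha[i], alpha[j]
                    # Bounds keeping both multipliers in [0, C].
                    if y[i] != y[j]:
                        L, H = max(0, a_j_old - a_i_old), min(C, C + a_j_old - a_i_old)
                    else:
                        L, H = max(0, a_i_old + a_j_old - C), min(C, a_i_old + a_j_old)
                    if L == H:
                        continue
                    eta = 2 * K[i, j] - K[i, i] - K[j, j]
                    if eta >= 0:
                        continue
                    # Analytic solution of the two-variable subproblem.
                    alpha[j] = np.clip(a_j_old - y[j] * (E_i - E_j) / eta, L, H)
                    if abs(alpha[j] - a_j_old) < 1e-5:
                        continue
                    alpha[i] = a_i_old + y[i] * y[j] * (a_j_old - alpha[j])
                    # Update the bias term.
                    b1 = b - E_i - y[i] * (alpha[i] - a_i_old) * K[i, i] \
                         - y[j] * (alpha[j] - a_j_old) * K[i, j]
                    b2 = b - E_j - y[i] * (alpha[i] - a_i_old) * K[i, j] \
                         - y[j] * (alpha[j] - a_j_old) * K[j, j]
                    if 0 < alpha[i] < C:
                        b = b1
                    elif 0 < alpha[j] < C:
                        b = b2
                    else:
                        b = (b1 + b2) / 2
                    changed += 1
            passes = passes + 1 if changed == 0 else 0
        w = (alpha * y) @ X  # weight vector (linear kernel only)
        return w, b

Predictions are then sign(w.x + b). The payoff of SMO is that each two-variable subproblem has a closed-form solution, so no numerical QP solver is needed.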
The output of an SVM is
an uncalibrated value, not a posterior probability of a class given an input.
However, I have recently created an algorithm to map SVM outputs into posterior
probabilities. This algorithm is described in
J. Platt, Probabilistic
Outputs for Support Vector Machines and Comparisons to Regularized Likelihood
Methods, in Advances in Large Margin Classifiers, A. Smola, P. Bartlett,
B. Scholkopf, D. Schuurmans, eds., MIT Press, 1999.
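The mapping in that paper fits a sigmoid P(y=1|f) = 1/(1 + exp(A f + B)) to the uncalibrated SVM outputs f by maximum likelihood, using regularized targets (t+ = (N+ + 1)/(N+ + 2), t- = 1/(N- + 2)) to avoid overfitting. Here is a minimal sketch, assuming Python with NumPy and SciPy; the paper gives its own pseudocode for the fit, and a generic optimizer is substituted here, with the function name fit_platt_scaling being illustrative:

    import numpy as np
    from scipy.optimize import minimize

    def fit_platt_scaling(f, y):
        """Fit P(y=1|f) = 1/(1 + exp(A*f + B)) to SVM outputs f, labels y in {-1,+1}."""
        prior1 = np.sum(y == 1)   # number of positive examples
        prior0 = np.sum(y == -1)  # number of negative examples
        # Regularized target probabilities instead of hard 0/1 labels.
        t = np.where(y == 1, (prior1 + 1.0) / (prior1 + 2.0), 1.0 / (prior0 + 2.0))

        def neg_log_likelihood(params):
            A, B = params
            z = A * f + B
            log1pexp = np.logaddexp(0.0, z)      # log(1 + exp(z)), computed stably
            # Cross-entropy between targets t and sigmoid(-z) = P(y=1|f).
            return np.sum(t * log1pexp + (1 - t) * (log1pexp - z))

        res = minimize(neg_log_likelihood,
                       x0=[0.0, np.log((prior0 + 1.0) / (prior1 + 1.0))],
                       method='Nelder-Mead')
        A, B = res.x
        return A, B

    def posterior(f, A, B):
        """Calibrated posterior probability of the positive class."""
        return 1.0 / (1.0 + np.exp(A * f + B))

With A and B fitted on a held-out set of SVM outputs, posterior(f, A, B) turns the raw margin value into a probability.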
Training an SVM on a
large data set with many classes can be slow. Along with
N. Cristianini and J. Shawe-Taylor, I developed a faster approach to multiclass
classification based on large margin decision DAGs, described in:
J. Platt, N. Cristianini, J. Shawe-Taylor, Large Margin
DAGs for Multiclass Classification, in Advances in Neural Information
Processing Systems 12, pp. 547-553, MIT Press, 2000.
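That approach trains one binary classifier per pair of classes (k(k-1)/2 of them for k classes) and arranges them in a decision DAG, so each test point needs only k-1 binary evaluations. Here is a rough sketch of the evaluation procedure, assuming Python with NumPy and scikit-learn for the pairwise SVMs; the function names train_pairwise_svms and ddag_predict are illustrative, not from the paper:

    import numpy as np
    from itertools import combinations
    from sklearn.svm import SVC

    def train_pairwise_svms(X, y, classes):
        """Train one binary SVM per pair of classes (the nodes of the DAG)."""
        models = {}
        for a, b in combinations(classes, 2):
            mask = np.isin(y, [a, b])
            # Label 1 means "looks like class a", 0 means "looks like class b".
            models[(a, b)] = SVC(kernel='linear').fit(X[mask], (y[mask] == a).astype(int))
        return models

    def ddag_predict(x, classes, models):
        """Walk the decision DAG: keep a candidate list and, at each node,
        use the (first, last) pairwise classifier to eliminate one class.
        Only k-1 of the k(k-1)/2 classifiers are evaluated per test point."""
        remaining = list(classes)
        while len(remaining) > 1:
            a, b = remaining[0], remaining[-1]
            key = (a, b) if (a, b) in models else (b, a)
            pred = models[key].predict(x.reshape(1, -1))[0]
            winner = key[0] if pred == 1 else key[1]
            if winner == a:
                remaining.pop()    # eliminate class b
            else:
                remaining.pop(0)   # eliminate class a
        return remaining[0]

Training cost is spread over many small two-class problems, and prediction touches only a single path through the DAG, which is what keeps evaluation cheap even with many classes.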
This page was written by John Platt
of the CCSP Group at Microsoft Research. Last updated:
08/07/01.