Support Vector Machines

The objective is to find the boundary that separates the two classes with the widest possible street (margin) between them.

Imagine a vector w perpendicular to the median line of the street.

SVM-1.PNG

SVM-2.PNG

SVM-3.PNG

The problem is to minimize 1/2*||w||^2 (equivalently, to maximize the street width 2/||w||), while classifying every training point correctly.
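Written out, the problem looks roughly like the following. This is the standard hard-margin form, and the symbols other than w are assumptions not defined in these notes: x_i are the training points, y_i in {+1, -1} are their labels, and b is the offset of the boundary. Maximizing the street width 2/||w|| is the same as minimizing 1/2*||w||^2.

```latex
\min_{w,\,b}\ \tfrac{1}{2}\lVert w\rVert^{2}
\quad\text{subject to}\quad
y_i\,(w\cdot x_i + b)\ \ge\ 1 \quad \text{for all } i
```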

SVM-4.PNG

svm-5.PNG

Kernels of SVM: circular similarity

SVM-6.PNG

 

SVM optimization doesn't get stuck in local optima because its objective is convex, unlike neural nets, whose training can often get stuck in local optima.

For linearly separable points, a linear SVM works fine. For points that are not linearly separable, you need a transformation that projects them into a higher-dimensional space where they become linearly separable. A Gaussian (RBF) or polynomial kernel does this implicitly.
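A minimal sketch of this difference, assuming a synthetic dataset of two concentric circles from sklearn's make_circles (the dataset and its settings are illustrative choices, not from the original notes). It fits a linear and an RBF kernel on the same points and compares training accuracy.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: no straight line can separate them (assumed toy data).
X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)

# Same data, two kernels.
linear_clf = SVC(kernel="linear").fit(X, y)
rbf_clf = SVC(kernel="rbf", gamma="scale").fit(X, y)

# Accuracy on the training points themselves.
linear_acc = linear_clf.score(X, y)
rbf_acc = rbf_clf.score(X, y)

print("linear kernel accuracy:", linear_acc)
print("rbf kernel accuracy:", rbf_acc)
```

On data like this the linear kernel should stay far from perfect, while the RBF kernel should separate the rings almost completely.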

Kernels represent a similarity measure between points and are a way to impart domain knowledge.

Adding a nonlinear feature often makes the data linearly separable.
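As an illustration, here is a small sketch on XOR-style data, where the label depends on the sign of x1*x2. The data generation, the small gap kept around the axes, and the large C value are all assumptions made for the example. A linear SVM on the raw two features struggles, but once the product x1*x2 is added as a third feature the classes can be split by a plane.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# XOR-style toy data: class depends on the sign of x1 * x2.
X = rng.uniform(-1.0, 1.0, size=(400, 2))
X = X[np.abs(X[:, 0] * X[:, 1]) > 0.05]  # keep a small gap around the axes
y = (X[:, 0] * X[:, 1] > 0).astype(int)

# Linear SVM on the raw features.
raw_clf = SVC(kernel="linear").fit(X, y)
acc_raw = raw_clf.score(X, y)

# Same linear SVM after adding the nonlinear feature x1 * x2.
X_aug = np.column_stack([X, X[:, 0] * X[:, 1]])
aug_clf = SVC(kernel="linear", C=1000.0).fit(X_aug, y)
acc_aug = aug_clf.score(X_aug, y)

print("raw features:", acc_raw)
print("with x1*x2 feature:", acc_aug)
```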

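Mercer condition: a function is a valid kernel if it is symmetric and the matrix of its values on any finite set of points (the Gram matrix) is positive semidefinite.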
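A numerical spot check of this condition, as a minimal sketch: for a valid kernel, the Gram matrix on any finite sample should be symmetric with no negative eigenvalues. The random sample and the gamma value below are arbitrary assumptions. Passing on one sample does not prove a kernel is valid, but a clearly negative eigenvalue would show that it is not.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))  # arbitrary sample of 50 points in 3 dimensions

# Gram matrix: K[i, j] = exp(-gamma * ||x_i - x_j||^2)
K = rbf_kernel(X, gamma=0.5)

is_symmetric = np.allclose(K, K.T)
min_eig = np.linalg.eigvalsh(K).min()  # eigvalsh is meant for symmetric matrices

print("symmetric:", is_symmetric)
print("smallest eigenvalue:", min_eig)
```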

Writing svm in sklearn:

from sklearn import svm

clf = svm.SVC(kernel="linear")

Parameters of svm.SVC:

  1. kernel – Kernel type. Besides "linear", other options include kernel="rbf" and kernel="poly".
  2. C – Controls the tradeoff between a smooth decision boundary and classifying training points correctly. A large C takes the side of classifying more training points correctly, at the cost of a less smooth boundary.
  3. gamma – Reach of each training example (used by the nonlinear kernels such as rbf). Low values -> far reach, high values -> close reach.

clf.fit(X, y)  # X: feature matrix, y: class labels
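To see the effect of gamma, here is a hedged sketch on a noisy two-moons dataset. The dataset, the train/test split, and the two gamma values are assumptions picked for illustration, and C is held at 1.0 throughout. With a very high gamma each training point only influences its immediate neighbourhood, so the model tends to fit the training set almost perfectly and score worse on held-out points.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Noisy toy data, split into training and held-out halves.
X, y = make_moons(n_samples=400, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# scores[gamma] = (training accuracy, held-out accuracy)
scores = {}
for gamma in (1.0, 1000.0):
    clf = SVC(kernel="rbf", C=1.0, gamma=gamma).fit(X_train, y_train)
    scores[gamma] = (clf.score(X_train, y_train), clf.score(X_test, y_test))

for gamma, (train_acc, test_acc) in scores.items():
    print(f"gamma={gamma}: train={train_acc:.2f}, test={test_acc:.2f}")
```

A large gap between training and held-out accuracy at the high gamma setting is the overfitting that a close reach produces.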

Conclusion:

  1. SVM is a general method whose optimization is convex, so it is guaranteed to produce a global solution.
  2. A small sigma in the Gaussian kernel (equivalently, a large gamma) can cause overfitting, because the classification regions shrink right around the sample points.
  3. For handwritten character recognition, the polynomial kernel with n=2 (nonlinear) works well.
  4. SVMs don't perform very well on large datasets with a lot of features, because training time grows roughly cubically with the number of samples. Compute/training time becomes way too high for such data.
  5. SVMs don't work well with lots and lots of noise. When the classes overlap heavily, you have to count independent evidence, and a Naive Bayes classifier would be better.