THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Dejan Sarka

Data Mining Algorithms – Support Vector Machines

Support vector machines are both, unsupervised and supervised learning models for classification and regression analysis (supervised) and for anomaly detection (unsupervised). Given a set of training examples, each marked as belonging to one of categories, an SVM training algorithm builds a model that assigns new examples into one category. An SVM model is a representation of the cases as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall on.

A support vector machine constructs a hyper-plane or set of hyper-planes in a high-dimensional space defined by the input variables, which can be used for classification, regression, or other tasks. A SVM is a discrete linear classifier. A good separation is achieved by the hyper-plane that has the largest distance to the nearest training data point of any class (so-called functional margin). The larger the margin the lower the generalization error. Let me show you this on a graphical example. Of course, I am showing a two-dimensional space defined by only two input variables, and therefore my separating hyper-plane is just a line.

The first figure shows the two-dimensional space with cases and a possible single-dimensional hyper-plane (line). Of course, you can see that this line cannot be a separator at all, because there are some cases on the line, or said differently, on both sides of the line.

image

The next try is better. The line is separating the cases in the space. However, this is not the best possible separation. Some cases are pretty close to the line.

image

The third picture shows the best possible separation. The hyper-plane that separates the cases the best is found, and the model is trained.

image

Support Vector Machines are powerful for some specific classifications:

  • Text and hypertext categorization
  • Images classification
  • Classifications in medicine
  • Hand-written characters recognition

One-class SVM can be used for anomaly detection, like detection of dirty data, or fraud detection. It uses a classification function without parameters, the one selected for the separation without regard to a target variable. Cases that are close to the separation hyper-plane are the suspicious cases. Therefore, the result is dichotomous: 1=regular case, 0=outlier.

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

No Comments

Leave a Comment

(required) 
(required) 
Submit

About Dejan Sarka

Dejan Sarka, MCT and SQL Server MVP, is an independent consultant, trainer, and developer focusing on database & business intelligence applications. His specialties are advanced topics like data modeling, data mining, and data quality. He is the founder of the Slovenian SQL Server and .NET Users Group. Dejan Sarka is the main author or coauthor of fourteen books about databases and SQL Server. Dejan Sarka also developed and is developing many courses and seminars for SolidQ, Microsoft and Pluralsight. He is a regular speaker at many conferences worldwide for more than 15 years, including conferences like Microsoft TechEd, PASS Summit and others.

This Blog

Syndication

Privacy Statement