Python || Machine Learning
I promise this article is devoted to SVMs — but let me run you through a quick story. Believe it or not, every word is true!
Long ago the earth was inhabited by flatworms. The thing about these flatworms was that they were completely flat, so much so that they were two-dimensional creatures! One day the king of these flatworms decided that he wanted to know if the earth was flat, just like them. “What other shape would it be, sir!”, inquired one of his subjects. “I don’t know, but I have a strange feeling that it is not flat”, came the reply. So all of his subjects went and found the wisest of flatworms and brought him to the king.
The wise one consulted a lot of books and came up with a theorem that we today know as the Pythagorean theorem (Pythagoras, you thief!). The king, along with the wise flatworm, applied this theorem and deduced that the earth is indeed not flat! This stirred up quite a debate amongst the flatworms, who couldn’t quite perceive how objects could be of any shape other than flat. It ultimately led to a full-blown revolution, and the king was assassinated. The flatworms got a new king and lived on their flat earth happily ever after.
You must be wondering why flatworms are taking over your screen right now. Well, flatworms can actually teach us about the importance of dimensionality. The 2-D flatworms couldn’t perceive the 3-D earth because they were limited by their 2-D perspective, similar to how, in machine learning, 1-D data limits the perspective of your model!
This is the primary problem that the support vector machines (SVMs) aim to solve. To better understand how they work, let’s first explore a support vector classifier.
Support Vector Classifier
Let’s assume we have one-dimensional data distributed in the following format:
Now, we can easily segregate the blue-labeled data from the green-labeled data via a line that goes through the middle, that is, via a ‘hard’ margin. This ‘hard’ margin ensures that none of the training data is classified wrongly. As a result, it tends to overfit. This leads us to the problem of generalization.
What would happen if, during training, a single data point labeled as blue ventured too close to a green data point? The hard margin would shift far from its previous location, so much so that it might enter green-labeled territory, where future green points would arrive, creating chaos!
This is where the ‘soft’ margin classifier comes in. The soft margin classifier is willing to classify a few labels incorrectly to keep the model from overfitting the data. Thus, it does not go out of its way to classify outliers that would ultimately ruin the classification. The soft margin classifier is also known as the support vector classifier.
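As a minimal sketch of that trade-off (using made-up 1-D points, and assuming scikit-learn is available), the ‘softness’ of the margin in scikit-learn’s SVC is controlled by the regularization parameter C: a small C tolerates a few misclassified training points, while a very large C pushes the classifier towards a hard margin that bends to fit every outlier.
from sklearn import svm

# Hypothetical 1-D training data: one blue point (label 0) strays close to the greens (label 1)
X = [[1.0], [2.0], [3.0], [6.5], [7.0], [8.0], [9.0]]
y = [0, 0, 0, 0, 1, 1, 1]  # the blue point at 6.5 is the outlier

soft_clf = svm.SVC(kernel='linear', C=0.1)  # small C: soft margin, willing to misclassify the outlier
hard_clf = svm.SVC(kernel='linear', C=1e6)  # huge C: (nearly) hard margin, chases the outlier
soft_clf.fit(X, y)
hard_clf.fit(X, y)
print(soft_clf.predict([[6.0]]), hard_clf.predict([[6.0]]))  # the two boundaries can disagree near the outlier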
Now, what would happen if we had data that looked like this —
This is data that cannot be classified by either hard or soft margins. This is where support vector machines come into play.
Support Vector Machine
A support vector machine uses a kernel to raise the dimensionality of the input data. But how does that help us classify that data? To visualize how the data can be classified using a support vector machine, let’s increase the dimensionality of the data.
But where do we bring in another dimension from? The new dimension is nothing more than a function of the input dimension. Thus, if the input dimension is x and the new dimension is y, then we can express y as
y = f(x)
where f is a mapping from x to y. In the example data, let’s take the function to be the square of x, so f(x) = x². When we arrange the data after increasing its dimension, we can visualize it as follows:
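Here is a quick sketch of the same idea in code (with made-up 1-D points, assuming NumPy is available): we map each point x to the pair (x, x²), and the points that were mixed together on the line become linearly separable in 2-D.
import numpy as np

# Hypothetical 1-D data: green points (label 1) sit between the blue points (label 0)
x = np.array([-3.0, -2.5, -0.5, 0.0, 0.5, 2.5, 3.0])
labels = np.array([0, 0, 1, 1, 1, 0, 0])

# Add a second dimension y = f(x) = x**2
data_2d = np.column_stack([x, x ** 2])
print(data_2d)

# In 2-D the classes are now separable by a horizontal line:
# every green point has x**2 below 1, every blue point has x**2 above 6
print(data_2d[labels == 1, 1].max(), data_2d[labels == 0, 1].min())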
Lo and behold! We have solved the seemingly complex and unsolvable classification problem, albeit with a little trick 😉 The SVM increases the dimensionality of the data, and the support vector classifier then classifies the data in the new space. Thus, SVM and SVC work hand in hand. But wait… why did we square the data? Why use a squaring function?
This brings us to the topic of kernel functions.
Kernel Functions
A kernel function is one that helps the support vector machine decide how to increase the data’s dimensionality. In other words, SVM algorithms use a set of mathematical functions that are defined as the kernel.
Kernel functions come in different types, namely linear, nonlinear, polynomial, radial basis function (RBF), and sigmoid. The kernel function we used in this example to square the input data is called the polynomial kernel. The polynomial kernel has a parameter ‘d’ that decides the final dimensionality in which the support vector classifier will work (a short sketch of this parameter follows the code example below).
Let’s see the support vector machine in action!
from sklearn import svm

X = [[0, 0], [1, 1]]  # training inputs
y = [0, 1]  # labels
classifier = svm.SVC(kernel='linear')  # setting up the kernel
classifier.fit(X, y)  # fitting the model on the input values
print(classifier.predict([[2., 2.]]))  # getting a prediction from the classifier
The code is pretty simple, but it can tackle powerful classification problems! For more parameters and customization options, check out the code reference I’ve provided above.
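One such parameter is the degree of the polynomial kernel, which plays the role of the ‘d’ mentioned earlier. As a minimal sketch (reusing the made-up 1-D data from above, and assuming scikit-learn), degree=2 corresponds to the squaring trick we applied by hand:
from sklearn import svm

# Same hypothetical 1-D data: greens in the middle, blues on the outside
X = [[-3.0], [-2.5], [-0.5], [0.0], [0.5], [2.5], [3.0]]
y = [0, 0, 1, 1, 1, 0, 0]

poly_clf = svm.SVC(kernel='poly', degree=2)  # degree-2 polynomial kernel
poly_clf.fit(X, y)
print(poly_clf.predict([[0.25], [4.0]]))  # small |x| should come out green (1), large |x| blue (0)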
Let me know in the comments if you’re interested in learning more about the mathematical details of kernel functions, and I will write a sequel to this article. Also, feel free to fire away with any questions you have in the comments; I’ll deal with them all 😉
Check out my blog for faster updates and subscribe for quality content 😀
Hmrishav Bandyopadhyay is a 2nd year Undergraduate at the Electronics and Telecommunication department of Jadavpur University, India. His interests lie in Deep Learning, Computer Vision, and Image Processing. He can be reached at — [email protected] || https://hmrishavbandy.github.io