This article was published as a part of the Data Science Blogathon.

## Introduction

This article aims to provide a basic understanding of the SVM, the optimization that is happening behind the scene, and knowledge about its parameters along with its implementation in Python.

## Table of Content

1. What is SVM?

2. Optimization Technique used in SVM

3. How to choose the Correct SVM

4. Hard and Soft margin SVM

5. Relation between Regularization parameter (C) and SVM

6. Other Parameters of SVM

7. Kernel -trick in SVM

8.Implementation of Support Vector Machine using Python

9. End Notes

## 1. What is SVM?

Support Vector Machine is a supervised learning algorithm that can be used for both classification and regression problems. It is mostly used for classification problems.

We should keep in mind that the main task of the classification problem is to find the best separating hyperplane/ Decision boundary. We can have the ‘n-1’ hyperplane which can be either linear or nonlinear.

Class labels are denoted as -1 for negative class and +1 for positive class in Support Vector Machine.

We can clearly see from the above figure that both the Hyperplane (HP2 & HP3) are not able to correctly classify the test-data points because they are very close to the hyperplane.

Such data points are called Support vectors which are simply feature values in vector form.

From the above figure, we can see that Hyperplane (HP4) is the best as it is able to correctly classify all the data points including support vectors.

#### This brings us to think what exactly are Margins?

Margins are generally defined by the closest data points (called support vectors) on either of the hyperplane

Note: Another point to note from the above figure is that the further the data points are from the margins the more correctly they are classified.

## 2. Optimization Technique used in SVM

The core of any Machine learning algorithm is the Optimization technique that is happening behind the scene.

SVM maximizes the margin by learning a suitable decision boundary/decision surface/separating hyperplane.

It can be mathematically be written as:

#### Points to note from the above Figure:

a. We can clearly see that SVM tries to maximize the margins and thus called Maximum Margin Classifier.

b. The Support Vectors will have values exactly either {+1, -1}.

c. The more negative the values are for the Green data points the better it is for classification.

d. The more positive the values are for the Red data points the better it is for classification

For more in-depth knowledge regarding the maths behind Support Vector Machine refer to this article

## 3. How to choose the Correct SVM

Choosing a correct classifier is really important. Let us understand this with an example.

Suppose we are given 2 Hyperplane one with 100% accuracy (HP1) on the left side and another with >90% accuracy (HP2) on the right side. Which one would you think is the correct classifier?

Most of us would pick the HP2 thinking that it because of the maximum margin. But it is the wrong answer.

But Support Vector Machine would choose the HP1 though it has a narrow margin. Because though HP2 has maximum margin but it is going against the constrain that: each data point must lie on the correct side of the margin and there should be no misclassification. This constrain is the hard constrain that Support Vector Machine follows throughout.

This brings us to the discussion about Hard and Soft SVM.

## 4. Hard and Soft SVM

I would like to again continue with the above example.

We can now clearly state that HP1 is a Hard SVM(left side) while HP2 is a Soft SVM(right side).

By default, Support Vector Machine implements Hard margin SVM. It works well only if our data is linearly separable.

Hard margin SVM does not allow any misclassification to happen.

In case our data is non-separable/ nonlinear then the Hard margin SVM will not return any hyperplane as it will not be able to separate the data. Hence this is where Soft Margin SVM comes to the rescue.

Soft margin SVM allows some misclassification to happen by relaxing the hard constraints of Support Vector Machine.

Soft margin SVM is implemented with the help of the Regularization parameter (C).

Regularization parameter (C): It tells us how much misclassification we want to avoid.

Hard margin SVM generally has large values of C.

Soft margin SVM generally has small values of C.

## 5.Relation between Regularization parameter (C) and SVM

Now that we know what the Regularization parameter (C) does. We need to understand its relation with Support Vector Machine.

– As the value of C increases the margin decreases thus Hard SVM.

If the values of C are very small the margin increases thus Soft SVM.

Large value of C can cause overfitting therefore we need to select the correct value using Hyperparameter Tuning.

## 6.Other Parameters of SVM

Other significant parameters of Support Vector Machine are the Gamma values. It tells us how much will be the influence of the individual data points on the decision boundary.

Large Gamma: Fewer data points will influence the decision boundary. Therefore, decision boundary becomes non-linear leading to overfitting

Small Gamma: More data points will influence the decision boundary. Therefore, the decision boundary is more generic.

## 7. Kernel -trick in SVM

Support Vector Machine deals with nonlinear data by transforming it into a higher dimension where it is linearly separable. Support Vector Machine does so by using different values of Kernel. We have various options available with kernel like, ‘linear’, “rbf”, ”poly” and others (default value is “rbf”). Here “rbf” and “poly” are useful for non-linear hyper-plane.

From the above figure, it is clear that choosing the right kernel is very important in order to get the correct results.

## 8. Implementation of SVM using Python

For this part, I will be using the Iris dataset.

#### 1. Load the libraries and the dataset.

```import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.colors import ListedColormap
from sklearn import svm, datasets
from sklearn.svm import SVC```
```iris = datasets.load_iris()

X = iris.data[:, :2]  # we only take the first two features. We could
# avoid this ugly slicing by using a two-dim dataset
Y = iris.target```

#### 2. I have created a Decision Boundary function for better understanding.

```def decision_boundary(X,y,model,res,test_idx=None):
markers=['s','o','x']
colors = ('red', 'blue', 'lightgreen', 'gray', 'cyan')
colormap=ListedColormap(colors[:len(np.unique(y))])
x_min,x_max=X[:,0].min()-1,X[:,0].max()+1
y_min,y_max=X[:,1].min()-1,X[:,1].max()+1
xx,yy=np.meshgrid(np.arange(x_min,x_max,res),np.arange(y_min,y_max,res))
z=model.predict(np.c_[xx.ravel(), yy.ravel()])
zz=z.reshape(xx.shape)
plt.pcolormesh(xx,yy,zz,cmap=colormap)

for idx,cl in enumerate(np.unique(y)):
plt.scatter(X[y==cl,0],X[y==cl,1],c=colors[idx],cmap=plt.cm.Paired, edgecolors='k',marker=markers[idx],label=cl,alpha=0.8)```

#### 3. Split the dataset and Standardize the data

```from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler```
```X_train,X_test,y_train,y_test=train_test_split(X,Y,test_size=0.3)
scaler=StandardScaler()
scaler.fit(X_train)
X_train_new=scaler.transform(X_train)
X_test_new=scaler.transform(X_test)```

#### 4. I have implemented the Soft & Hard SVM by experimenting with high and low values of C

```model=SVC(C=10**10)model.fit(X_train,y_train) # Hard SVM
decision_boundary(np.vstack((X_train,X_test)),np.hstack((y_train,y_test)),model,0.08,test_idx=None)
plt.xlabel('sepal length ')
plt.ylabel('sepal width ')
plt.legend(loc='upper left')
plt.tight_layout()
plt.show()```

```model=SVC(C=100)  # Soft SVM
model.fit(X_train,y_train)
decision_boundary(np.vstack((X_train,X_test)),np.hstack((y_train,y_test)),model,0.08,test_idx=None)
plt.xlabel('sepal length ')
plt.ylabel('sepal width ')
plt.legend(loc='upper left')
plt.tight_layout()
plt.show()```

We can clearly see that Soft SVM allows for some misclassification, unlike Hard SVM.

#### 5. Experimenting with gamma values.

```plt.figure(figsize=(5,5))
model = SVC(kernel='rbf', random_state=1, gamma=1.0, C=10.0)
model.fit(X_train_new,y_train)
decision_boundary(np.vstack((X_train_new,X_test_new)),np.hstack((y_train,y_test)),model,0.02,test_idx=None)
plt.title('Gamma=1.0')
plt.legend(loc='upper left')
plt.tight_layout()
plt.show()```

```plt.figure(figsize=(5,5))
model = SVC(kernel='rbf', random_state=1, gamma=10.0, C=10.0)
model.fit(X_train_new,y_train)
decision_boundary(np.vstack((X_train_new,X_test_new)),np.hstack((y_train,y_test)),model,0.02,test_idx=None)
plt.title('Gamma=10.0')
plt.legend(loc='upper left')
plt.tight_layout()
plt.show()```
```plt.figure(figsize=(5,5))
model = SVC(kernel='rbf', random_state=1, gamma=100.0, C=10.0)
model.fit(X_train_new,y_train)
decision_boundary(np.vstack((X_train_new,X_test_new)),np.hstack((y_train,y_test)),model,0.02,test_idx=None)
plt.title('Gamma=100.0')
plt.legend(loc='upper left')
plt.tight_layout()
plt.show()```

From the above plots, we can see that when we increase the value of Gamma the decision boundary becomes non-linear and leads to over-fitting.
It is generally preferred to keep Gamma value small in order to have a more ‘Generalized Model’.

#### 6. Implementing the Kernel -trick along with experimenting with the values of C.

For this part, I have created a function for creating sub-plots along with Decision-Boundary.

```def create_mesh(x,y,res=0.02):
x_min,x_max=x.min()-1,x.max()+1
y_min,y_max=y.min()-1,y.max()+1
xx,yy=np.meshgrid(np.arange(x_min,x_max,res),np.arange(y_min,y_max,res))
return xx,yy
def create_contours(ax,clf,xx,yy,**parameters):
z=clf.predict(np.c_[xx.ravel(),yy.ravel()])
zz=z.reshape(xx.shape)
out = ax.contourf(xx, yy, zz)
return out
## Creating the sub-plots
models = (svm.SVC(kernel='linear', C=1.0),
svm.SVC(C=1.0),SVC(C=10**10,kernel='linear'),SVC(C=10**10,kernel='rbf'))
models = (clf.fit(X_train, y_train) for clf in models)
# title for the plots
titles = ('Soft SVC with linear kernel',
'Soft SVC with rbf kernel', 'Hard -SVC with linear kernel','Hard -SVC with rbf kernel')
# Set-up 2x2 grid for plotting.
fig, sub = plt.subplots(2, 2,figsize=(10,10))
xx,yy=create_mesh(X[:,0], X[:,1])
for clf, title, ax in zip(models, titles, sub.flatten()):
markers=['s','o','x']
colors = ('red', 'blue', 'lightgreen', 'gray', 'cyan')
colormap=ListedColormap(colors[:len(np.unique(Y))])
create_contours(ax, clf, xx, yy,cmap=colormap)
for idx,cl in enumerate(np.unique(Y)):
ax.scatter(X[Y==cl,0],X[Y==cl,1],c=colors[idx],cmap=colormap, edgecolors='k',marker=markers[idx],label=cl,alpha=0.8)
ax.set_xlim(xx.min(), xx.max())
ax.set_ylim(yy.min(), yy.max())
ax.set_xlabel('Sepal length')
ax.set_ylabel('Sepal width')
ax.set_xticks(())
ax.set_yticks(())
ax.set_title(title)
plt.show()```

## 9. End Notes

In this article, we looked at the Support Vector Machine in detail. I have discussed in detail the important concepts which are needed for proper understanding along with its implementation in Python. I hope you like this article, if you do please like and share.

For the whole code, implementation refer to my Github Repository

Dishaa Agarwal  My interests lie in the field of Machine Learning and Data Science. Enthusiasm to learn new skills is always present in me. 