# Introduction to SVM(Support Vector Machine) Along with Python Code

*This article was published as a part of the Data Science Blogathon.*

** **

**Introduction**

This article aims to provide a basic understanding of the SVM, the optimization that is happening behind the scene, and knowledge about its parameters along with its implementation in Python.

**Table of Content**

1. What is SVM?

2. Optimization Technique used in SVM

3. How to choose the Correct SVM

4. Hard and Soft margin SVM

5. Relation between Regularization parameter (C) and SVM

6. Other Parameters of SVM

7. Kernel -trick in SVM

8.Implementation of Support Vector Machine using Python

9. End Notes

**1. **__What is SVM?__

__What is SVM?__

Support Vector Machine is a supervised learning algorithm that can be used for both classification and regression problems. It is mostly used for classification problems.

We should keep in mind that the main task of the classification problem is to find the **best separating hyperplane/ Decision boundary. **We can have the ‘n-1’ hyperplane which can be either linear or nonlinear.

Class labels are denoted as **-1** for negative class and **+1** for positive class in Support Vector Machine.

We can clearly see from the above figure that both the Hyperplane (HP2 & HP3) are not able to correctly classify the test-data points because they are very close to the hyperplane.

Such data points are called **Support vectors **which are simply feature values in vector form**.**

From the above figure, we can see that Hyperplane (HP4) is the best as it is able to correctly classify all the data points including support vectors.

**This brings us to think what exactly are Margins?**

Margins are generally defined by the closest data points (called support vectors) on either of the hyperplane

** Note**: Another point to note from the above figure is that the further the data points are from the margins the more correctly they are classified.

**2. **__Optimization Technique used in SVM__

__Optimization Technique used in SVM__

The core of any Machine learning algorithm is the Optimization technique that is happening behind the scene.

SVM ** maximizes the margin** by learning a suitable decision boundary/decision surface/separating hyperplane.

It can be mathematically be written as:

**Points to note from the above Figure:**

**Points to note from the above Figure:**

**a. **We can clearly see that SVM tries to maximize the margins and thus called __Maximum Margin Classifier.__

**b. **The Support Vectors will have values exactly either {+1, -1}.

**c. **The more negative the values are for the Green data points the better it is for classification.

**d. **The more positive the values are for the Red data points the better it is for classification

*For more in-depth knowledge regarding the maths behind Support Vector Machine refer to this* article

__3. __**How to choose the Correct SVM**

**How to choose the Correct SVM**

Choosing a correct classifier is really important. Let us understand this with an example.

Suppose we are given 2 Hyperplane one with 100% accuracy (HP1) on the left side and another with >90% accuracy (HP2) on the right side. Which one would you think is the correct classifier?

Most of us would pick the *HP2* thinking that it because of the maximum margin. But it is the wrong answer.

But Support Vector Machine would choose the *HP1* though it has a narrow margin. Because though HP2 has maximum margin but it is going against the constrain that: **each data point must lie on the correct side of the margin and there should be no misclassification. **This constrain is the

*that Support Vector Machine follows throughout.*

**hard constrain**This brings us to the discussion about Hard and Soft SVM.

**4. **__Hard and Soft SVM__

__Hard and Soft SVM__

I would like to again continue with the above example.

We can now clearly state that __HP1 is a Hard SVM__(left side) while __HP2 is a Soft SVM__(right side).

*By default, Support Vector Machine implements Hard margin SVM*. It works well only if our data is linearly separable.

Hard margin SVM does not allow any misclassification to happen.

In case our data is non-separable/ nonlinear then the Hard margin SVM will not return any hyperplane as it will not be able to separate the data. Hence this is where Soft Margin SVM comes to the rescue.

Soft margin SVM allows some misclassification to happen by relaxing the hard constraints of Support Vector Machine.

Soft margin SVM is implemented with the help of the __Regularization parameter (C).__

** Regularization parameter (C): **It tells us how much misclassification we want to avoid.

– *Hard margin SVM generally has large values of C.*

– *Soft margin SVM generally has small values of C.*

**5.** __Relation between Regularization parameter (C) and SVM__

__Relation between Regularization parameter (C) and SVM__

Now that we know what the Regularization parameter (C) does. We need to understand its relation with Support Vector Machine.

– As the value of C increases the margin decreases thus Hard SVM.

– If the values of C are very small the margin increases thus Soft SVM.

– Large value of C can cause overfitting therefore we need to select the correct value using Hyperparameter Tuning.

**6.** __Other Parameters of SVM__

__Other Parameters of SVM__

Other significant parameters of Support Vector Machine are the ** Gamma** values. It tells us how much will be the influence of the individual data points on the decision boundary.

– Large Gamma: Fewer data points will influence the decision boundary. Therefore, decision boundary becomes non-linear leading to __overfitting__

– Small Gamma: More data points will influence the decision boundary. Therefore, the decision boundary is more *generic.*

**7. **__Kernel -trick in SVM __

__Kernel -trick in SVM__

Support Vector Machine deals with nonlinear data by transforming it into a higher dimension where it is linearly separable. Support Vector Machine does so by using different values of Kernel. We have various options available with kernel like, **‘linear’, “rbf”, ”poly”** and others (default value is “rbf”). *Here “rbf” and “poly” are useful for non-linear hyper-plane.*

From the above figure, it is clear that choosing the right kernel is very important in order to get the correct results.

**8.**__Implementation of SVM using Python__

__Implementation of SVM using Python__

For this part, I will be using the Iris dataset.

**1. Load the libraries and the dataset.**

**Python Code:**

**2. I have created a Decision Boundary function for better understanding.**

** **

def decision_boundary(X,y,model,res,test_idx=None): markers=['s','o','x'] colors = ('red', 'blue', 'lightgreen', 'gray', 'cyan') colormap=ListedColormap(colors[:len(np.unique(y))]) x_min,x_max=X[:,0].min()-1,X[:,0].max()+1 y_min,y_max=X[:,1].min()-1,X[:,1].max()+1 xx,yy=np.meshgrid(np.arange(x_min,x_max,res),np.arange(y_min,y_max,res)) z=model.predict(np.c_[xx.ravel(), yy.ravel()]) zz=z.reshape(xx.shape) plt.pcolormesh(xx,yy,zz,cmap=colormap) for idx,cl in enumerate(np.unique(y)): plt.scatter(X[y==cl,0],X[y==cl,1],c=colors[idx],cmap=plt.cm.Paired, edgecolors='k',marker=markers[idx],label=cl,alpha=0.8)

** **

**3. Split the dataset and Standardize the data**

from sklearn.model_selection import train_test_split from sklearn.preprocessing import StandardScaler

X_train,X_test,y_train,y_test=train_test_split(X,Y,test_size=0.3) scaler=StandardScaler() scaler.fit(X_train) X_train_new=scaler.transform(X_train) X_test_new=scaler.transform(X_test)

** **

**4. I have implemented the Soft & Hard SVM by experimenting with high and low values of C**

model=SVC(C=10**10)model.fit(X_train,y_train) # Hard SVM decision_boundary(np.vstack((X_train,X_test)),np.hstack((y_train,y_test)),model,0.08,test_idx=None) plt.xlabel('sepal length ') plt.ylabel('sepal width ') plt.legend(loc='upper left') plt.tight_layout() plt.show()

** **

model=SVC(C=100) # Soft SVM model.fit(X_train,y_train) decision_boundary(np.vstack((X_train,X_test)),np.hstack((y_train,y_test)),model,0.08,test_idx=None) plt.xlabel('sepal length ') plt.ylabel('sepal width ') plt.legend(loc='upper left') plt.tight_layout() plt.show()

*We can clearly see that Soft SVM allows for some misclassification, unlike Hard SVM.*

** **

**5. ****Experimenting with gamma values.**

plt.figure(figsize=(5,5)) model = SVC(kernel='rbf', random_state=1, gamma=1.0, C=10.0) model.fit(X_train_new,y_train) decision_boundary(np.vstack((X_train_new,X_test_new)),np.hstack((y_train,y_test)),model,0.02,test_idx=None) plt.title('Gamma=1.0') plt.legend(loc='upper left') plt.tight_layout() plt.show()

plt.figure(figsize=(5,5)) model = SVC(kernel='rbf', random_state=1, gamma=10.0, C=10.0) model.fit(X_train_new,y_train) decision_boundary(np.vstack((X_train_new,X_test_new)),np.hstack((y_train,y_test)),model,0.02,test_idx=None) plt.title('Gamma=10.0') plt.legend(loc='upper left') plt.tight_layout() plt.show()

plt.figure(figsize=(5,5)) model = SVC(kernel='rbf', random_state=1, gamma=100.0, C=10.0) model.fit(X_train_new,y_train) decision_boundary(np.vstack((X_train_new,X_test_new)),np.hstack((y_train,y_test)),model,0.02,test_idx=None) plt.title('Gamma=100.0') plt.legend(loc='upper left') plt.tight_layout() plt.show()

** **

__.__

**over-fitting***t is generally preferred to keep Gamma value small in order to have a more ‘Generalized Model’.*

* *

**6. Implementing the Kernel -trick along with experimenting with the values of C.**

For this part, I have created a function for creating sub-plots along with Decision-Boundary.

def create_mesh(x,y,res=0.02): x_min,x_max=x.min()-1,x.max()+1 y_min,y_max=y.min()-1,y.max()+1 xx,yy=np.meshgrid(np.arange(x_min,x_max,res),np.arange(y_min,y_max,res)) return xx,yy def create_contours(ax,clf,xx,yy,**parameters): z=clf.predict(np.c_[xx.ravel(),yy.ravel()]) zz=z.reshape(xx.shape) out = ax.contourf(xx, yy, zz) return out ## Creating the sub-plots models = (svm.SVC(kernel='linear', C=1.0), svm.SVC(C=1.0),SVC(C=10**10,kernel='linear'),SVC(C=10**10,kernel='rbf')) models = (clf.fit(X_train, y_train) for clf in models) # title for the plots titles = ('Soft SVC with linear kernel', 'Soft SVC with rbf kernel', 'Hard -SVC with linear kernel','Hard -SVC with rbf kernel') # Set-up 2x2 grid for plotting. fig, sub = plt.subplots(2, 2,figsize=(10,10)) plt.subplots_adjust(wspace=0.4, hspace=0.4) xx,yy=create_mesh(X[:,0], X[:,1]) for clf, title, ax in zip(models, titles, sub.flatten()): markers=['s','o','x'] colors = ('red', 'blue', 'lightgreen', 'gray', 'cyan') colormap=ListedColormap(colors[:len(np.unique(Y))]) create_contours(ax, clf, xx, yy,cmap=colormap) for idx,cl in enumerate(np.unique(Y)): ax.scatter(X[Y==cl,0],X[Y==cl,1],c=colors[idx],cmap=colormap, edgecolors='k',marker=markers[idx],label=cl,alpha=0.8) ax.set_xlim(xx.min(), xx.max()) ax.set_ylim(yy.min(), yy.max()) ax.set_xlabel('Sepal length') ax.set_ylabel('Sepal width') ax.set_xticks(()) ax.set_yticks(()) ax.set_title(title) plt.show()

**9. **__End Notes__

__End Notes__

In this article, we looked at the Support Vector Machine in detail. I have discussed in detail the important concepts which are needed for proper understanding along with its implementation in Python. I hope you like this article, if you do please like and share.

For the whole code, implementation refer to my Github Repository

Dishaa Agarwal My interests lie in the field of Machine Learning and Data Science. Enthusiasm to learn new skills is always present in me.