Feature Selection Techniques in Machine Learning

Dhanya Thailappan 26 May, 2023 • 10 min read

Feature selection plays a crucial role in building accurate and efficient machine learning models. In this article, we explore various feature selection techniques, from filter to wrapper methods, to help reduce data dimensionality and improve model performance. Learn how to choose the most appropriate approach for your dataset.

This article was published as a part of the Data Science Blogathon.

What Are Feature Selection Techniques in Machine Learning?

Feature selection is an important process in machine learning and data analysis. It involves selecting a subset of relevant features from a larger set of available features. These features are also known as variables, predictors, or attributes. The primary objective of feature selection is to identify and retain the most informative and relevant features while discarding or ignoring the irrelevant or redundant ones. By doing so, we can improve the performance of our models by focusing on the most meaningful information and avoiding noise or unnecessary complexity.

Feature selection techniques in machine learning involve selecting the most relevant features or variables from a dataset, which helps to reduce the dimensionality of the data and improve model performance. There are various methods, including filter and wrapper methods, for selecting the best set of features for a given dataset. The goal is to eliminate irrelevant or redundant features while retaining those that have the most predictive power.

Need for Feature Selection Techniques in Machine Learning

  • Feature selection reduces the dimensionality of the data, making it easier for the model to learn and reducing the risk of overfitting.
  • It removes irrelevant or redundant features that can negatively impact model performance and accuracy.
  • It helps to identify the most important features that have the most predictive power, allowing models to be more efficient and effective.
  • By reducing the number of features, feature selection can also help to reduce training time and computational costs.
  • Feature selection is essential in building accurate and efficient machine learning models that can generalize well to new data.
  • It can also improve the interpretability of models by highlighting the most important factors that contribute to predictions.
  • Different feature selection techniques, including filter, wrapper, and embedded methods, can be used depending on the type of data and the modeling approach.
  • It is an ongoing process, and it may be necessary to revisit feature selection as new data becomes available or as the model is refined.

Types of Feature Selection Techniques

The choice of feature selection technique depends on the type and amount of data available, as well as the modeling approach. It’s important to experiment with different methods to find the best approach for a given problem.

  1. Filter methods: These methods rank the features based on statistical measures such as correlation, mutual information, or chi-squared tests. Features with the highest scores are selected for the model.
  2. Wrapper methods: These methods involve training and evaluating the model with different subsets of features, using a search algorithm to find the optimal set of features that maximizes model performance.
  3. Embedded methods: These methods incorporate feature selection into the model training process, selecting the most relevant features during the training of the model.
  4. Principal Component Analysis (PCA): Strictly speaking, this is a feature extraction (dimensionality reduction) technique rather than feature selection. It transforms the data into a lower-dimensional space by building linear combinations of the original features that capture the most variance in the data.
  5. Recursive Feature Elimination (RFE): This method iteratively removes the least important features from the model until the desired number of features is reached.
  6. Lasso Regression: This method performs regularization by adding a penalty term to the model’s loss function, which encourages the model to select a sparse set of features.
  7. Genetic Algorithms: These methods use an evolutionary search algorithm to find the optimal set of features that maximizes model performance.
  8. Univariate Feature Selection: This method selects the features that have the strongest relationship with the target variable, based on statistical tests such as ANOVA or t-tests.

Let’s understand each of these methods in depth!

Filter Method

We start with the filter method.

In the filter method, we start with the full set of features and select the best subset before any model is trained.

How do we select the best subset?

Several techniques are available. Three common ones are the ANOVA test, the chi-square test, and the correlation coefficient. Each one scores how strongly a feature is related to the target output, and the features with the highest scores are kept as the important features.

Let’s take an example. Here we have an independent variable X and a target variable Y.

X Y
1 10
2 20
3 30
4 40

In this scenario, as X increases, Y also increases, so X and Y are highly correlated. Two related terms are covariance and correlation. Covariance is unbounded: it can take any real value, and its magnitude depends on the units of the variables. Correlation rescales covariance to the range -1 to +1. The correlation used here is the Pearson correlation coefficient.
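To make the difference between covariance and correlation concrete, here is a small sketch that computes both for the toy table above with NumPy. The variable names `x_toy` and `y_toy` are chosen only for this illustration.

```python
import numpy as np

# The toy data from the table above.
x_toy = np.array([1, 2, 3, 4])
y_toy = np.array([10, 20, 30, 40])

# Sample covariance (np.cov divides by n - 1 by default).
covariance = np.cov(x_toy, y_toy)[0, 1]

# Pearson correlation: covariance divided by the product of the
# two standard deviations, so it always lies between -1 and +1.
correlation = np.corrcoef(x_toy, y_toy)[0, 1]

print(f"covariance  = {covariance:.2f}")
print(f"correlation = {correlation:.2f}")
```

The covariance comes out at about 16.67, well outside the range of a correlation, while the Pearson correlation is 1.00 because Y is an exact linear function of X.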

The second technique is the wrapper method.

Wrapper Method

The wrapper method is conceptually simpler than the filter method. Instead of applying statistical tests, it trains a model on different subsets of features and keeps the subset that performs best. There are three basic mechanisms.

Let me explain them.

Forward Selection

This method selects the most important features of a dataset with respect to the target output. Forward selection is an iterative method that starts with no features in the model. In each iteration, it adds the feature that improves the model the most.

Let me explain this with an example.

I am considering A, B, C, D, and E as my independent features. Let F be the output or target feature.

Initially, the model is trained with feature A only and the accuracy is recorded. In the next iteration, it is trained with A and B. If this accuracy is better than the previous accuracy, B is added to the feature set. Likewise, each iteration tries another feature, and the process stops when adding a feature no longer improves the accuracy.

This is what forward selection is.
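The loop described above can be sketched in a few lines. This is a minimal illustration rather than a reference implementation: the helper `forward_selection` is a hypothetical name, the data is synthetic (generated with `make_classification`), and logistic regression with 5-fold cross-validated accuracy is an assumed choice of model and score. At each step it tries every remaining feature and adds the one that helps most.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def forward_selection(X, y, estimator, cv=5):
    """Greedy forward selection driven by cross-validated accuracy."""
    remaining = list(range(X.shape[1]))
    selected = []
    best_score = 0.0
    while remaining:
        # Score each candidate feature when added to the current subset.
        scores = {
            f: cross_val_score(estimator, X[:, selected + [f]], y, cv=cv).mean()
            for f in remaining
        }
        candidate = max(scores, key=scores.get)
        if scores[candidate] <= best_score:
            break  # no remaining feature improves the score, so stop
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = scores[candidate]
    return selected, best_score


# Synthetic data (an assumption for illustration): 6 features, 2 informative.
X_demo, y_demo = make_classification(
    n_samples=200, n_features=6, n_informative=2, n_redundant=0, random_state=0
)
selected, score = forward_selection(
    X_demo, y_demo, LogisticRegression(max_iter=1000)
)
print("Selected feature indices:", selected)
print("Cross-validated accuracy:", round(score, 3))
```

scikit-learn also ships `SequentialFeatureSelector` with `direction="forward"`, which follows the same greedy idea with a fixed number of features to select.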

Next, we will look at backward elimination.

Backward Elimination

This works in the opposite direction. Let’s use the same example: A, B, C, D, and E are independent features and F is the target variable. We start by training the model on all the independent features. A statistical test (for example, the p-value of each coefficient) then tells us which feature has the lowest impact on the target variable. That feature is removed, the model is retrained on the remaining features, and the process repeats until removing a feature no longer helps. This is how backward elimination is implemented.
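A hedged sketch of backward elimination using scikit-learn's `SequentialFeatureSelector` with `direction="backward"`. Note the swap: this class decides which feature to drop from the cross-validated score of the model, not from a statistical test, so it illustrates the start-with-everything-and-remove idea rather than the exact procedure above. The synthetic data and the choice of keeping 3 features are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

# Synthetic data (an assumption for illustration): 6 features, 3 informative.
X_demo, y_demo = make_classification(
    n_samples=200, n_features=6, n_informative=3, n_redundant=0, random_state=0
)

# Start from all 6 features and drop one at a time until 3 remain.
selector = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=3,
    direction="backward",
    cv=5,
)
selector.fit(X_demo, y_demo)

kept = selector.get_support(indices=True)
print("Features kept:", kept)
```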

Let me explain the recursive feature elimination.

Recursive Feature Elimination

It is a greedy optimization algorithm whose aim is to find the best-performing feature subset. It does not select features at random. Instead, it trains a model on the current set of features, ranks them by importance, removes the least important one, and repeats on the remaining features. Finally, it ranks all the features by the order in which they were eliminated and keeps the top ones.
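A minimal sketch of recursive feature elimination with scikit-learn's `RFE` class. The synthetic data, the logistic regression estimator, and the target of 3 features are assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic data (an assumption for illustration): 8 features, 3 informative.
X_demo, y_demo = make_classification(
    n_samples=200, n_features=8, n_informative=3, n_redundant=0, random_state=0
)

# Remove one feature per iteration (step=1) until 3 remain.
rfe = RFE(
    estimator=LogisticRegression(max_iter=1000),
    n_features_to_select=3,
    step=1,
)
rfe.fit(X_demo, y_demo)

print("Selected mask:", rfe.support_)
# Rank 1 marks the selected features; larger ranks were eliminated earlier.
print("Ranking:", rfe.ranking_)
```

The `ranking_` attribute gives every feature a rank, so the same fitted object can be used to pick a larger or smaller subset without refitting from scratch.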

Remember that the above-mentioned techniques are useful when the dataset is small.

But in reality, you will get a large dataset.

Let’s try to understand the third technique called embedded methods.

Embedded Methods

Let me start with an example. A, B, C, D, and E are independent variables and F is the target variable. An embedded method does not train a separate model for every subset of features, which is what wrapper methods do. Instead, feature selection happens while the model itself is being trained. For instance, Lasso regression adds a penalty that shrinks the coefficients of unhelpful features to exactly zero, and tree-based models report an importance score for each feature. The features that end up with non-zero coefficients or high importance form the selected subset. That is how an embedded method works.
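Lasso regression, listed earlier among the techniques, gives a compact illustration of selection happening inside training: the L1 penalty can push some coefficients to exactly zero, and the features with non-zero coefficients are the ones the model keeps. The sketch below uses synthetic regression data and `alpha=1.0`, both of which are assumptions; in practice the penalty strength would be tuned.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data (an assumption for illustration): 10 features, 3 informative.
X_demo, y_demo = make_regression(
    n_samples=200, n_features=10, n_informative=3, noise=5.0, random_state=0
)

# The L1 penalty (controlled by alpha) encourages sparse coefficients.
lasso = Lasso(alpha=1.0)
lasso.fit(X_demo, y_demo)

# Features whose coefficient is non-zero are the ones the model retained.
kept = np.flatnonzero(lasso.coef_)
print("Coefficients:", np.round(lasso.coef_, 2))
print("Columns with non-zero coefficients:", kept)
```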

Let’s go and find out how univariate selection is done.

Univariate Selection

Univariate selection applies a statistical test to each feature individually and selects the features that have the strongest relationship with the target variable.

Here, I am using the SelectKBest class from scikit-learn. If you set k to 5, it finds the 5 best attributes with respect to the target variable.

I am using a mobile price classification dataset. You can download it here.

import pandas as pd
import numpy as np
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2
data = pd.read_csv("train.csv")
X = data.iloc[:,0:20]  
y = data.iloc[:,-1]

The dataset has many features, and we have to select the best ones. Because of the curse of dimensionality, increasing the number of features beyond a certain threshold tends to decrease the accuracy of the model.
For that, I am using univariate selection with SelectKBest.

bestfeatures = SelectKBest(score_func=chi2, k=10)
fit = bestfeatures.fit(X,y)

After fitting, I get two attributes. One is fit.scores_, which holds the chi-square score of each feature. The other is fit.pvalues_, which holds the corresponding p-values.

dfscores = pd.DataFrame(fit.scores_)
dfcolumns = pd.DataFrame(X.columns)

I am concatenating in the next statement for better visualization and I am renaming the column as Specs and Score.

featureScores = pd.concat([dfcolumns,dfscores],axis=1)
featureScores.columns = ['Specs','Score']

Here, you can see all the features. The higher the score, the more important the feature is. Here, ram has the highest score.

featureScores

I am printing the top 10 features.

print(featureScores.nlargest(10,'Score'))

These 10 best features can be used to train the model.
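As a sketch of how the selected features can actually be fed to a model, the fitted selector can return the reduced feature matrix directly. Because the mobile price CSV is not bundled here, this example uses the iris dataset that ships with scikit-learn and `k=2`; both are assumptions for illustration. Keep in mind that `chi2` only accepts non-negative feature values.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

# Iris stands in for the mobile price data (an assumption for illustration).
iris = load_iris(as_frame=True)
X_iris, y_iris = iris.data, iris.target

# Keep the 2 features with the highest chi-square scores.
selector = SelectKBest(score_func=chi2, k=2)
X_reduced = selector.fit_transform(X_iris, y_iris)

# get_support() returns a boolean mask over the original columns.
chosen = X_iris.columns[selector.get_support()]
print("Chosen columns:", list(chosen))
print("Reduced shape:", X_reduced.shape)
```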

Let’s look into the next technique called feature importance.

Feature Importance

Here, you can get the importance of every feature. The higher the score, the more important the feature is. A tree-based classifier that ships with scikit-learn, ExtraTreesClassifier, is used here to extract the best 10 features.

from sklearn.ensemble import ExtraTreesClassifier
import matplotlib.pyplot as plt
model = ExtraTreesClassifier()
model.fit(X,y)

After fitting, you can see the scores of the features.

print(model.feature_importances_)

The best 10 features can be seen like this.

feat_importances = pd.Series(model.feature_importances_, index=X.columns)
feat_importances.nlargest(10).plot(kind='barh')
plt.show()
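Because ExtraTreesClassifier builds randomised trees, the importances can change slightly from run to run. A small sketch, assuming the bundled iris dataset as a stand-in and an arbitrary `random_state=0`, shows how to make the ranking reproducible and confirms that the importances are normalised to sum to 1.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier

# Iris stands in for the mobile price data (an assumption for illustration).
iris = load_iris(as_frame=True)

# Fixing random_state makes the importances repeatable across runs.
forest = ExtraTreesClassifier(n_estimators=100, random_state=0)
forest.fit(iris.data, iris.target)

importances = pd.Series(forest.feature_importances_, index=iris.data.columns)
print(importances.sort_values(ascending=False))
print("Sum of importances:", round(importances.sum(), 6))
```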

Let me explain the last technique.

Correlation Matrix with Heatmap

Here, we check the correlation between every pair of columns, including the target. The correlation can be plotted like this.

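import seaborn as sns
import matplotlib.pyplot as plt
# Pairwise Pearson correlation between all columns, including the target
corrmat = data.corr()
plt.figure(figsize=(20,20))
# Plot the correlation matrix as an annotated heatmap
g = sns.heatmap(corrmat, annot=True, cmap="RdYlGn")
plt.show()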

Here, the correlation value ranges from -1 to 1. The correlation between price_range and ram is very high, while the correlation between battery and price_range is low.
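Beyond reading the heatmap by eye, the correlation matrix can be used programmatically. The sketch below ranks features by their absolute correlation with the target and flags features that nearly duplicate another feature. The synthetic columns (`a`, `a_copy`, `noise`, `target`) and the 0.95 threshold are assumptions for illustration, not values from the mobile price dataset.

```python
import numpy as np
import pandas as pd

# Synthetic data (an assumption for illustration).
rng = np.random.default_rng(0)
n = 500
base = rng.normal(size=n)
frame = pd.DataFrame({
    "a": base,
    "a_copy": base + 0.01 * rng.normal(size=n),  # nearly duplicates "a"
    "noise": rng.normal(size=n),                 # unrelated to the target
    "target": 2.0 * base + 0.1 * rng.normal(size=n),
})

corr = frame.corr()

# 1) Rank features by absolute correlation with the target.
target_corr = corr["target"].drop("target").abs().sort_values(ascending=False)
print(target_corr)

# 2) Flag redundant features: look only at the upper triangle of the
#    feature-feature matrix so that each pair is checked once.
feature_corr = corr.drop(index="target", columns="target").abs()
upper = feature_corr.where(np.triu(np.ones(feature_corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.95).any()]
print("Redundant features to drop:", to_drop)
```

Using the absolute value matters because a strong negative correlation is just as informative as a strong positive one.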

Tips and Tricks for Feature Selection

  1. Understand your data: Before starting feature selection, it is essential to understand your data and its properties, such as the type of features, their correlation, and the target variable.
  2. Use domain knowledge: Incorporating domain knowledge into feature selection can lead to more relevant and meaningful features.
  3. Consider multiple methods: Several feature selection methods are available, and it’s essential to try multiple methods to determine which one works best for your data.
  4. Evaluate performance: It’s important to evaluate your model’s performance with different feature sets and select the one that yields the best results.
  5. Use ensemble methods: Ensemble methods combine the results of multiple feature selection techniques and provide a more robust feature set.
  6. Regularization: Regularization methods can penalize including irrelevant or redundant features in the model.
  7. Visualize feature importance: Plotting feature importance scores can provide a better understanding of the relevance of each feature.
  8. Avoid overfitting: Overfitting can occur when too many features are included in the model, resulting in poorer generalization performance. It’s important to balance the number of features and the model’s complexity.
  9. Consider feature engineering: Feature engineering can be used to create new features that are more informative and relevant to the target variable.
  10. Automate feature selection: Automated feature selection techniques can save time and reduce the risk of human error.
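Tips 4 and 8 can be combined in practice by placing the selector inside a scikit-learn `Pipeline`, so that features are chosen using only the training folds during cross-validation. The sketch below compares a few values of k on synthetic data; the data, the ANOVA score function `f_classif`, the logistic regression model, and the candidate values of k are all assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# Synthetic data (an assumption for illustration): 20 features, 4 informative.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=20, n_informative=4, random_state=0
)

results = {}
for k in (2, 5, 10, 20):
    # The selector is refit on each training fold, so the held-out fold
    # never influences which features are chosen.
    pipe = Pipeline([
        ("select", SelectKBest(score_func=f_classif, k=k)),
        ("model", LogisticRegression(max_iter=1000)),
    ])
    results[k] = cross_val_score(pipe, X_demo, y_demo, cv=5).mean()

for k, score in results.items():
    print(f"k={k:>2}: mean CV accuracy = {score:.3f}")
```

Selecting features on the full dataset before cross-validating would leak information from the validation folds and make the scores look better than they are.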

Master the ML Feature Selection Techniques

These are the basic techniques of feature selection. Now you know that the task is to choose which features are important with respect to the target output. Feature selection reduces the dimensionality of the data, improves model performance, and identifies the features that have the most predictive power. By using a variety of feature selection techniques such as filter, wrapper, and embedded methods, data scientists can select the best set of features for a given dataset and modeling approach.

To enhance your skills in feature selection and other key data science techniques, consider enrolling in our Data Science Black Belt program. This program offers a comprehensive curriculum that covers all aspects of data science, from programming languages and data visualization to machine learning and deep learning. With hands-on projects and mentorship, you’ll gain practical experience and the skills you need to succeed in this exciting field. Enroll today and take your data science skills to the next level.

Frequently Asked Questions

Q1. What are feature selection techniques?

A. Feature selection techniques in machine learning involve selecting the most important features or variables from a dataset, to reduce the dimensionality of the data and improve model performance.

Q2. What are the 3 feature selection techniques?

A. The three main feature selection techniques are filter methods, wrapper methods, and embedded methods.

Q3. What are the 2 different techniques for feature selection?

A. The two main techniques for feature selection are feature ranking and feature subset selection.

Q4. Which technique is a popular technique used for feature attribute selection in machine learning?

A. Filter methods are a popular technique for feature attribute selection in machine learning. These methods rank the features based on statistical measures such as correlation or mutual information, and select the top-ranked features for the model.

Q5. What is an example of feature selection?

A. An example of feature selection is when a researcher tries to determine which variables to include in a regression model. They may use a feature selection method to identify the subset of variables that best predicts the outcome of interest.

Q6. Which are the feature selection methods?

1. Filter methods: Pearson correlation, chi-squared test, mutual information, variance threshold, etc.
2. Wrapper methods: recursive feature elimination, forward feature selection, backward feature selection, etc.
3. Embedded methods: LASSO (Least Absolute Shrinkage and Selection Operator), Elastic Net, tree-based feature importance, etc.

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.
