The Curse of Dimensionality in Machine Learning!

Swapnil Vishwakarma 21 Nov, 2023 • 6 min read

This article was published as a part of the Data Science Blogathon

What is the curse of dimensionality?


Curse of dimensionality
Photo by Tim Carey on Unsplash

It refers to the phenomena of strange/weird things happening as we try to analyze the data in high-dimensional spaces. Let us understand this peculiarity with an example, suppose we are building several machine learning models to analyze the performance of a Formula One (F1) driver. Consider the following cases:

i) Model_1 consists of only two features say the circuit name and the country name.

ii) Model_2 consists of 4 features say weather and max speed of the car including the above two.

iii) Model_3 consists of 8 features say driver’s experience, number of wins, car condition, and driver’s physical fitness including all the above features.

iv) Model_4 consists of 16 features say driver’s age, latitude, longitude, driver’s height, hair color, car color, the car company, and driver’s marital status including all the above features.

v) Model_5 consists of 32 features.

vi) Model_6 consists of 64 features.

vii) Model_7 consists of 128 features.

viii) Model_8 consists of 256 features.

ix) Model_9 consists of 512 features.

x) Model_10 consists of 1024 features.

Assuming the training data remains constant, it is observed that on increasing the number of features the accuracy tends to increase until a certain threshold value and after that, it starts to decrease. From the above example the accuracy of Model_1 < accuracy of Model_2 < accuracy of Model_3 but if we try to extrapolate this trend it doesn’t hold true for all the models having more than 8 features. Now you might wonder if we are providing some extra information for the model to learn why is it so that the performance starts to degrade. My friends welcome to the curse of dimensionality!

If we think logically some of the features provided to Model_4 don’t actually contribute anything towards analyzing the performance of the F1 driver. For example, the driver’s height, hair color, car color, car company, and the driver’s marital status is giving useless information for the model to learn, hence the model gets confused with all this extra information, and the accuracy starts to go down.

The curse of dimensionality was first termed by Richard E. Bellman when considering problems in dynamic programming.

Curse of dimensionality in various domains

There are several domains where we can see the effect of this phenomenon. Machine Learning is one such domain. Other domains include numerical analysis, sampling, combinatorics, data mining, and databases. As it is clear from the title we will see its effect only in Machine Learning.

What problems does curse of dimensionality cause?

The curse of dimensionality refers to the difficulties that arise when analyzing or modeling data with many dimensions. These problems can be summarized in the following points:

  • Data Sparsity: Data points become increasingly spread out, making it hard to find patterns or relationships.
  • Computational Complexity: The computational burden of algorithms increases exponentially.
  • Overfitting: Models become more likely to memorize the training data without generalizing well.
  • Distortion of Distance Metrics: Traditional distance metrics become less reliable in measuring proximity.
  • Visualization Challenges: Projecting high-dimensional data onto lower dimensions leads to loss of information.
  • Data Preprocessing: Identifying relevant features and reducing dimensionality is crucial for effective analysis.
  • Algorithmic Efficiency: Algorithms need to be scalable and efficient to handle the complexity of high-dimensional spaces.
  • Domain-Specific Challenges: Each domain faces unique challenges in high-dimensional spaces, requiring tailored approaches.
  • Interpretability Issues: Understanding the decision-making process of high-dimensional models becomes increasingly difficult.
  • Data Storage Requirements: Efficient data storage and retrieval strategies are essential for managing large volumes of high-dimensional data

    How to overcome its effect

    This was a general overview of the curse of dimensionality. Now we will go slightly technical to understand it completely. In ML, it can be defined as follows: as the number of features or dimensions ‘d’ grows, the amount of data we require to generalize accurately grows exponentially. As the dimensions increase the data becomes sparse and as the data becomes sparse it becomes hard to generalize the model. In order to better generalize the model, more training data is required.

    1. Hughes phenomenon

    Again let’s take an example under this phenomenon. Assume all the features in a dataset are binary. If the dimensionality is 3 i.e. there are 3 features then the total number of data points will be equal to 23 = 8. If the dimensionality is 10 i.e. there are 10 features then the total number of data points will be equal to 210 = 1024. It is clear that as dimensionality increases the number of data points also increases exponentially which implies dimensionality is directly proportional to the number of data points required for training a machine learning model.

    There is a very interesting phenomenon called the Hughes phenomenon which states that for a fixed size dataset the performance of a machine learning model decreases as the dimensionality increases.

    2. Distance functions (especially Euclidean distance)

    Let’s think of a 1D world where n points are spread randomly between 0 and 1, we have a point xi.

    From the above two figures, it is clear that the Euclidean distance between pair of points is very close to 0.

    Now let me define two terms,

    Dist_min (xi) = min{euc-dist(xi, xj} where xi is not equal to xj.

    Dist_max (xi) = max{euc-dist(xi, xj} where xi is not equal to xj.

    For 1D, 2D and 3D,

    {[dist-max(xi) – dist-min(xi)] / dist-min(xi)} > 0

    Taking the limit as d -> infinity, {[dist-max(xi) – dist-min(xi)] / dist-min(xi)} tends towards 0. Now you might wonder what happens if this ratio tends to 0.

    From the above figures, we can see how those peaks are getting formed as the dimensions are increasing. At the heart of KNN, it works well if the pair of points are closer together in a cluster but at higher dimensions, we can see the pair of points that are very close to each other reduces and we have lot many pair of points having distance 5-10 and 15-20 when d=100 and it only increases on increasing the dimensions. So we know for sure KNN will break apart in such conditions.

    Let me break it down for you even further.

    {[dist-max(xi) – dist-min(xi)] / dist-min(xi)}

    The above ratio will only become 0 when the numerator becomes 0 i.e. dist-max and dist-min are equal, which means in higher dimensional spaces every pair of points are equally distant from every other pair of points. For example, the distance between xi and xj is almost equal to the distance between xi and xk. This is true for every pair of points.

    In high dimensional spaces, whenever the distance of any pair of points is the same as any other pair of points, any machine learning model like KNN which depends a lot on Euclidean distance, makes no more sense logically. Hence KNN doesn’t work well when the dimensionality increases. Even though this was theoretically proven for n random points, it has been observed experimentally also that KNN doesn’t work well in higher dimensional spaces. So what is the solution?

    The solution is very simple. Use cosine-similarity instead of Euclidean distance as it is impacted less in higher dimensional spaces. That’s why especially in-text problems where we use a bag of words, TF-IDF, word-to-vec, etc., cosine similarity is preferred because of high dimensional space.

    It is important to note that all these observations were made assuming the spread of points is uniform and random. So the very next thing that comes into mind is what if the spread of points are not uniform and random. We can think of this from a different angle i.e.

    a) When dimensionality is high and points are dense, the impact of dimensionality is high.

    b) When dimensionality is high and points are sparse, the impact of dimensionality is low.

    3. Overfitting and Underfitting

    There is a relationship between ‘d’ and overfitting which is as follows:

    ‘d’ is directly proportional to overfitting i.e. as the dimensionality increases the chances of overfitting also increases.

    Let’s discuss the solutions to tackle this problem.

    a) Model-dependent approach: Whenever we have a large number of features, we can always perform forward feature selection to determine the most relevant features for the prediction.

    b) Unlike the above solution which is classification-oriented, we can also perform dimensionality reduction techniques like PCA and t-SNE which do not use the class labels to determine the most relevant features for the prediction.

    So it is important to keep in mind whenever you download a new dataset that has a large number of features, you can reduce it by some of the techniques like PCA, t-SNE, or forward selection in order to ensure your model is not affected by the curse of dimensionality.

    If you liked my article, you can connect with me via LinkedIn


    In summary, the Curse of Dimensionality in Machine Learning highlights challenges when dealing with high-dimensional data. It affects diverse domains, increasing computational demands and reducing model performance. Overcoming it involves feature selection, dimensionality reduction, and careful algorithm choices. Understanding and addressing these aspects are crucial for efficient and accurate machine learning models in various applications.


    Q1. What is the curse of dimensionality Bayes?

    The curse of dimensionality challenges the performance of Bayesian models in high-dimensional data analysis due to increased computational complexity, data sparsity, sensitivity to outliers, and model selection difficulties. To address these challenges, dimensionality reduction, regularization, hierarchical Bayesian models, and ensemble methods can be employed to improve model performance.

    Q2. What is the curse of dimensionality in SVM?

     SVM classifiers struggle with high-dimensional data due to data sparsity, computational complexity, overfitting, and noise sensitivity. To address these challenges, dimensionality reduction, regularization, feature selection, kernel methods, and ensemble methods can be employed to enhance SVM performance in high-dimensional settings.

    Q3. Which algorithms suffer from curse of dimensionality?

    Many machine learning algorithms, like KNN, k-means clustering, SVM, Decision Trees, and Neural Networks, face difficulties in high-dimensional data due to dimensionality issues. Techniques like dimensionality reduction, regularization, and feature selection can help mitigate these challenges, but the curse of dimensionality remains a significant obstacle in machine learning research.



    The media shown in this article are not owned by Analytics Vidhya and is used at the Author’s discretion.

    Frequently Asked Questions

    Lorem ipsum dolor sit amet, consectetur adipiscing elit,

    Responses From Readers


    Related Courses
    0 Hrs 23 Lessons

    Dimensionality Reduction for Machine Learning

    0 Hrs 261 Lessons

    Machine Learning Certification Course for Beginners-2

    0 Hrs 261 Lessons

    Machine Learning Certification Course for Beginners


    • [tta_listen_btn class="listen"]