Interview Questions on KNN in Machine Learning

Parth Shukla Last Updated : 03 Dec, 2022


This article was published as a part of the Data Science Blogathon.

Introduction

K nearest neighbors (KNN) is one of the most popular algorithms in supervised machine learning. It is simple to understand, quick to set up because it has no real training step, and often gives good results on small and medium-sized datasets. For these reasons, data science interviews often include in-depth questions about KNN. In this article, we will discuss and answer advanced interview questions related to k nearest neighbors in machine learning.

knn — Source: https://resources.biginterview.com/wp-content/uploads/2022/07/Panel-Interview-101-1080×675.jpg

1. Why is the Time Complexity Very High For the Prediction Phase in KNN? Explain with Reasons.

In almost every machine learning algorithm, the algorithm first trains on the training data and then makes predictions based on what it learned. K nearest neighbors works differently. It is a supervised algorithm (not a clustering algorithm) that simply stores the training data. When predicting for a new observation, it calculates the distance from that observation to the stored training points, picks the k closest ones, and predicts from them: the majority class for classification or the average value for regression.

There are two types of machine learning algorithms: lazy learning and eager learning. A lazy learning algorithm does not build a model when the training data is provided. Instead, it defers all computation until a query is made. An eager learning algorithm, in contrast, builds a model as soon as the training data is provided, and then uses that fitted model to predict for new queries. K nearest neighbors only stores the training data. When it is time to predict, it calculates the distances from the query point to the stored points and assigns a label based on the nearest neighbors. Because all the work happens at query time, KNN is known as a lazy learning algorithm.

FINDING NEIGHBOURS — Source: https://qph.cf2.quoracdn.net/main-qimg-53d74e4e12547a448799d5ebb126ebfc

As lazy learning algorithms store the training data, they need more memory. That is why KNN requires more space than a model that keeps only its learned parameters.
Training in KNN is very fast because the algorithm does no computation on the training data; it only stores it.
Predictions tend to be very slow because every query needs a distance calculation against each stored training point. With n training points and d features, a brute-force query costs roughly O(n × d).
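The lazy behaviour described above can be sketched in a few lines of plain Python. This is a minimal brute-force illustration, not a reference implementation: the class name `LazyKNN`, the method `predict_one`, and the toy data are assumptions made for this example.

```python
import math
from collections import Counter


class LazyKNN:
    """Brute-force k nearest neighbors classifier (illustrative sketch)."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # "Training" only stores the data: no distances, no model parameters.
        self.X = list(X)
        self.y = list(y)
        return self

    def predict_one(self, query):
        # All the work happens here: one distance per stored training point.
        distances = [(math.dist(query, x), label) for x, label in zip(self.X, self.y)]
        distances.sort(key=lambda pair: pair[0])
        nearest_labels = [label for _, label in distances[: self.k]]
        # Majority vote among the k nearest neighbors.
        return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups.
X_train = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.5), (6.5, 8.0)]
y_train = ["A", "A", "A", "B", "B", "B"]

model = LazyKNN(k=3).fit(X_train, y_train)
print(model.predict_one((1.2, 1.4)))
print(model.predict_one((6.8, 7.0)))
```

Note that `fit` returns immediately regardless of the dataset size, while `predict_one` loops over every stored point, which is where the prediction cost comes from.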

2. Why is KNN Algorithm Said to be More Flexible?

K nearest neighbors is a non-parametric algorithm: it does not assume any particular form for the relationship between the features and the target. Parametric machine learning algorithms like linear regression, logistic regression, and naive Bayes do make such assumptions, for example that the relationship is linear or that the features are independent of each other. This makes them less flexible, because they work well only when their assumptions are reasonably satisfied. For example, if the relationship in the data is not linear, plain linear regression will fit poorly; if the features are strongly correlated, the independence assumption of naive Bayes is violated and its probability estimates become unreliable.

KNN, being non-parametric, makes no such assumption about the dataset, so it can be applied to a wide range of datasets and often returns good results. This is the main reason behind the flexibility of the KNN algorithm.
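To make the flexibility argument concrete, here is a small sketch on XOR-style data, where no straight line separates the two classes. The data, the function names, and the grid of candidate linear rules are all assumptions chosen for illustration; the linear rule stands in for any model that assumes a linear decision boundary.

```python
import math
from itertools import product

# XOR-style data: no straight line separates the two classes.
points = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 0]


def one_nn_predict(query):
    # 1-nearest-neighbor: copy the label of the closest stored point.
    nearest = min(range(len(points)), key=lambda i: math.dist(query, points[i]))
    return labels[nearest]


def best_linear_score():
    # Try a small grid of linear rules "predict 1 if w1*x + w2*y + b > 0"
    # and report the highest number of training points any rule gets right.
    best = 0
    for w1, w2, b in product(range(-5, 6), repeat=3):
        correct = sum(
            int(w1 * x + w2 * y + b > 0) == label
            for (x, y), label in zip(points, labels)
        )
        best = max(best, correct)
    return best


queries = [(0.1, 0.1), (0.1, 0.9), (0.9, 0.1), (0.9, 0.9)]
knn_predictions = [one_nn_predict(q) for q in queries]
print("1-NN predictions:", knn_predictions)
print("Best linear rule gets", best_linear_score(), "of 4 points right")
```

The nearest-neighbor rule follows the local pattern around each corner, while every linear rule in the grid misclassifies at least one of the four points.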

3. Why is KNN Algorithm Less Efficient Than Other Machine Learning Algorithms?

If you prefer flexibility, KNN can be a good fit for the problem statement, but it has a drawback in efficiency. If prediction speed matters for the model, other algorithms are usually a better choice, as KNN is not very efficient at prediction time compared to many other machine learning algorithms. As KNN is a lazy learning algorithm, it stores the training data and does no computation when the training data is fed.

Instead, all the computation happens when a query for prediction is made, which is the main reason behind the high time complexity of the prediction phase. Eager learning algorithms like linear regression, in contrast, fit their parameters once on the training data and then predict very fast. For this reason, KNN is said to be less efficient than other machine learning algorithms.
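The efficiency gap can be illustrated by counting work rather than timing it. The sketch below assumes a brute-force KNN and a linear model with made-up weights; `CountingDistance`, `knn_predict`, and `linear_predict` are hypothetical helpers written for this example only.

```python
import math
from collections import Counter


class CountingDistance:
    """Euclidean distance that counts how many times it is called."""

    def __init__(self):
        self.calls = 0

    def __call__(self, a, b):
        self.calls += 1
        return math.dist(a, b)


def knn_predict(X, y, query, k, distance):
    # Brute force: rank every stored point by its distance to the query.
    ranked = sorted(zip(X, y), key=lambda pair: distance(query, pair[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def linear_predict(weights, bias, query):
    # A fitted linear model needs one multiplication per feature,
    # no matter how many rows it was trained on.
    return sum(w * v for w, v in zip(weights, query)) + bias


calls_per_query = {}
for n in (10, 100, 1000):
    # Synthetic training set of size n (values are arbitrary).
    X = [(float(i), float(i % 7)) for i in range(n)]
    y = [i % 2 for i in range(n)]
    distance = CountingDistance()
    knn_predict(X, y, (3.0, 3.0), k=3, distance=distance)
    calls_per_query[n] = distance.calls

print(calls_per_query)
print(linear_predict([0.5, -0.25], 1.0, (3.0, 3.0)))
```

The number of distance evaluations for a single KNN query grows with the training set, whereas the cost of the linear prediction depends only on the number of features.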

4. Why Does KNN Perform Well on Normalized Datasets?

We know that K nearest neighbors is a distance-based machine learning algorithm: it calculates the distance (typically the Euclidean distance) between points and predicts from the nearest ones. Now, in some cases, the scales of the features in the dataset are very different, and the feature with the largest scale then dominates the distance. The Euclidean distances no longer reflect how similar two points really are, and the algorithm does not perform well. For example, take a dataset with the Age and Salary of a person. Age may vary from 0 to 99, while salary can be in lakhs or crores. The scale is very different between the two features, so the salary differences swamp the age differences in the Euclidean distance, and the algorithm will perform poorly if the data is not normalized.

Now, if the data is normalized (for example with min-max scaling), all the values will be between 0 and 1. Every feature then contributes to the Euclidean distance on the same scale, and hence the model will perform well.
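The Age and Salary example can be sketched as follows. The records, the query, and the helper names `min_max_scaler` and `nearest` are hypothetical; min-max scaling is used here as one common way to normalize, not the only one.

```python
import math

# Hypothetical (age in years, salary) records.
people = [(25, 52000), (60, 50500), (30, 20000), (55, 90000)]
query = (27, 50400)


def min_max_scaler(rows):
    # Learn the per-feature minimum and range from the training rows.
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    spans = [max(col) - min(col) for col in columns]

    def scale(row):
        return tuple((v - lo) / span for v, lo, span in zip(row, lows, spans))

    return scale


def nearest(rows, target, transform=lambda row: row):
    # Index of the row closest to target after applying the transform to both.
    t = transform(target)
    return min(range(len(rows)), key=lambda i: math.dist(transform(rows[i]), t))


scale = min_max_scaler(people)
raw_match = people[nearest(people, query)]
scaled_match = people[nearest(people, query, scale)]
print("Nearest on raw features:   ", raw_match)
print("Nearest on scaled features:", scaled_match)
```

On the raw features the 60-year-old is picked because the salaries are close, even though the ages differ by 33 years. After scaling, the 25-year-old with a similar salary becomes the nearest neighbor.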

5. How Could a Small Value of K Lead to Overfitting in the KNN Algorithm? Explain.

The value of K in the KNN algorithm is the number of neighbors considered for each prediction. So if the value of k is 3, the three nearest neighbors vote on each prediction. Now take a case with a very small value of k, say 1. Each prediction then depends on a single neighbor, so the decision boundary bends around every individual data point, including noisy ones. The model fits every data point of the training dataset, leading to good performance on training data and poor performance on testing data.

On the other side, if we have a very high value of K, each prediction averages over so many neighbors that local structure is smoothed away; in the extreme case where K equals the size of the training set, the model always predicts the majority class. This is the case of under-fitting, where the model performs poorly on both training and testing data.
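The effect of k can be seen on a tiny example. The sketch below uses hypothetical one-dimensional data with a single deliberately mislabeled training point; the function names and the chosen values of k are assumptions for illustration.

```python
from collections import Counter

# Hypothetical 1-D data: class 0 below 5, class 1 from 5 upward,
# with one deliberately mislabeled training point at x = 2.
train_x = list(range(10))
train_y = [0, 0, 1, 0, 0, 1, 1, 1, 1, 1]
test_x = [x + 0.4 for x in range(10)]
test_y = [0 if x < 5 else 1 for x in test_x]


def knn_predict(query, k):
    # Rank training points by distance to the query and vote among the k closest.
    ranked = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(xs, ys, k):
    hits = sum(knn_predict(x, k) == y for x, y in zip(xs, ys))
    return hits / len(xs)


results = {k: (accuracy(train_x, train_y, k), accuracy(test_x, test_y, k)) for k in (1, 3, 10)}
for k, (train_acc, test_acc) in results.items():
    print(f"k={k:2d}  train accuracy={train_acc:.1f}  test accuracy={test_acc:.1f}")
```

With k = 1 the model reproduces every training label, including the mislabeled one, and carries that mistake over to nearby test points. With k = 10 (the whole training set) it predicts the majority class everywhere and does poorly on both sets. A moderate k such as 3 outvotes the noisy point.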

Conclusion

In this article, we discussed advanced interview questions related to the k nearest neighbors and their solutions with core intuitions and logical reasons behind them. Knowledge about these concepts will help one answer these tricky and different questions efficiently.

Some Key Takeaways from this article are:

1. KNN is a lazy learning algorithm that stores the data during the training phase and does no computation on it at that point. In the prediction phase, many calculations are involved, because the distances to the stored points are computed for every query.

2. The time complexity of KNN is low in the training phase and high in the prediction phase, as it is a lazy learning algorithm that does not do any calculations during training. The space complexity is also high, because the entire training dataset must be kept in memory for prediction.

3. KNN is a non-parametric machine learning algorithm that provides higher flexibility and lower efficiency. As it is non-parametric, it makes no prior assumptions about the data, unlike linear regression.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.


vipul

You explain that KNN uses Euclidean distance, but we can also use Manhattan distance, or more generally Minkowski distance.

Hello there, thank you for commenting. Yes, you are absolutely right that KNN can use other distance measures as well. I mentioned Euclidean distance because it is the easiest to understand. The concept is the important part here; if one wants to use another distance measure, it is easy to swap in. Thank you.
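The distance measures mentioned in this exchange can be written as one function. This is a minimal sketch; the function name `minkowski_distance` and the sample points are assumptions for illustration.

```python
def minkowski_distance(a, b, p=2):
    # p = 1 gives the Manhattan distance, p = 2 gives the Euclidean distance.
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


a, b = (0.0, 0.0), (3.0, 4.0)
manhattan = minkowski_distance(a, b, p=1)
euclidean = minkowski_distance(a, b, p=2)
print("Manhattan:", manhattan)
print("Euclidean:", euclidean)
```

Any of these can replace the Euclidean distance inside a KNN implementation; the rest of the algorithm stays the same.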
