Top 20 Conceptual Questions To Test Your Data Science Skills In 2025

Chirag Goyal Last Updated : 06 Dec, 2024


This article was published as a part of the Data Science Blogathon.

Introduction

In this article, I have curated a list of 20 questions on Data Science concepts, consisting of MCQs (one or more correct options), True/False questions, and Integer Type questions, to check your understanding.

Let’s Get Started.

Question Context: 1-3

Suppose we want to use an automatic classification system to differentiate between COVID-19 negative (Negative class) and COVID-19 positive (Positive class) cases. We have evaluated two pattern classification systems, and the data obtained are given below:

[Image: confusion matrices for System A and System B]

1. The numbers of False Positives (FP) and False Negatives (FN) for the two systems, respectively, are:

(a) System A: FP = 20, FN = 25; System B: FP = 15, FN = 30

(b) System A: FP = 15, FN = 30; System B: FP = 20, FN = 25

(d) System A: FP = 30, FN = 20; System B: FP = 15, FN = 25

Answer: [ a ]

Hint: Read the confusion matrix carefully and use basic concepts.

2. The Sensitivity and Specificity for System-A respectively are:

(a) Sensitivity = 0.75, Specificity = 0.80

(b) Sensitivity = 0.70, Specificity = 0.85

(d) Sensitivity = 0.70, Specificity = 0.80

Answer: [ a ]

Hint: Use the formula to calculate sensitivity and specificity for a given confusion matrix.
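The formulas mentioned in the hint can be sketched in a few lines of Python. The counts below are an assumption: the original confusion-matrix image is not reproduced here, so they are chosen only to be consistent with the stated answers for System A (FP = 20, FN = 25) and assume 100 samples per class.

```python
def sensitivity(tp, fn):
    """True positive rate: share of actual positives that are detected."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: share of actual negatives that are correctly cleared."""
    return tn / (tn + fp)


# Hypothetical counts for System A (assumed, not read from the original image).
tp, fn, fp, tn = 75, 25, 20, 80

print("Sensitivity:", sensitivity(tp, fn))
print("Specificity:", specificity(tn, fp))
```

Sensitivity only looks at the positive class (TP and FN), while specificity only looks at the negative class (TN and FP), which is why the two can move independently.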

3. Which system should we use to rule out the presence of COVID-19?

(a) System-A

(b) System-B

(d) Can’t be determined

Answer: [ b ]

Explanation: System B has a higher specificity than System A.

4. If N is the number of rows/instances in the training dataset, what is the time complexity of a K-nearest neighbors classification run, in Big-O notation?

(a) O(1)

(b) O(N)

(d) O(N²)

Answer: [ b ]

Explanation: To classify a point, K-nearest neighbors needs to compute the distance from that point to each of the N training instances. Hence, for a fixed number of features, the classification run-time complexity is O(N).
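A brute-force version of the prediction step makes the linear cost visible. This is a minimal sketch, not a production implementation: the function name `knn_predict` and the toy data are made up for illustration, and it assumes a small fixed `k`.

```python
import heapq
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    # One distance per training instance: this pass is the O(N) part.
    distances = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    # Selecting the k smallest costs O(N log k), i.e. O(N) for a fixed k.
    nearest = heapq.nsmallest(k, distances, key=lambda pair: pair[0])
    # Majority vote among the k nearest labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy, made-up training data with two well-separated groups.
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["neg", "neg", "neg", "pos", "pos", "pos"]

print(knn_predict(train_X, train_y, (0.5, 0.5)))
print(knn_predict(train_X, train_y, (5.5, 5.5)))
```

There is no training phase to speak of; all the work happens at prediction time, once per query point.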

5. A company manager wants to predict the time before a breakdown of its production machines. As a Machine Learning student, you are asked to solve the problem. How will you formulate it?

(a) as a classification problem statement

(b) as a regression problem statement

(d) as an association rule-based problem statement

Answer: [ b ]

Explanation: The target (the time before a breakdown) is a continuous numerical value, so this is a regression problem.

6. Which of the following statements are correct about the Regression line?

(a) The Regression line always goes through the mean of the data.

(b) The sum of the deviation of the values from their regression line is always zero.

(d) If regression lines coincide with each other, then there is no correlation.

Answer: [ a, b, c ]

Explanation: If the regression lines coincide with each other, it shows perfect correlation, i.e., r = ±1.
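Statements (a) and (b) can be checked numerically for an ordinary least-squares line with an intercept. The sketch below uses a small made-up dataset and a hand-written `fit_line` helper (both are illustrative assumptions, not part of the original question).

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Made-up sample data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# (b) The deviations from the fitted line sum to zero (up to rounding).
print(abs(sum(residuals)) < 1e-9)
# (a) The line passes through the point (mean of x, mean of y).
print(abs((intercept + slope * mean(xs)) - mean(ys)) < 1e-9)
```

Both properties follow from the intercept being defined as `y_bar - slope * x_bar`; a line fitted without an intercept would not generally satisfy them.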

7. Which of the following options are incorrect about the Mahalanobis distance?

(a) It transforms the columns into correlated variables.

(b) It changes the values of the features so that the standard deviation becomes zero.

(d) It includes only variances in its formula while calculating the distance.

Answer: [ a, b, c, d ]

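Explanation: Mahalanobis distance takes covariances, not just variances, into account while calculating distances. In effect, it transforms the features into uncorrelated variables with unit standard deviation before measuring distance.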
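The role of the covariance matrix is easy to see in two dimensions. The following is a minimal sketch that inverts a 2×2 covariance matrix by hand; the helper name `mahalanobis_2d` and the example covariance values are assumptions chosen for illustration.

```python
import math


def mahalanobis_2d(x, mu, cov):
    """Mahalanobis distance between 2-D point x and mean mu for a 2x2 covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    # Closed-form inverse of a 2x2 matrix.
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    # Quadratic form (x - mu)^T * inv(cov) * (x - mu).
    q = dx * (inv[0][0] * dx + inv[0][1] * dy) + dy * (inv[1][0] * dx + inv[1][1] * dy)
    return math.sqrt(q)


identity = [[1.0, 0.0], [0.0, 1.0]]
correlated = [[1.0, 0.8], [0.8, 1.0]]  # hypothetical, strongly positively correlated features

# With an identity covariance the result is the ordinary Euclidean distance.
print(mahalanobis_2d((3.0, 4.0), (0.0, 0.0), identity))
# With correlated features, a point lying against the correlation is "farther away".
print(mahalanobis_2d((1.0, 1.0), (0.0, 0.0), correlated))
print(mahalanobis_2d((1.0, -1.0), (0.0, 0.0), correlated))
```

The two points in the last two lines are equally far from the mean in Euclidean terms, yet the off-diagonal covariance term gives them different Mahalanobis distances.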

8. Choose the correct options for Random Variables X₁ and X₂:

(a) If Cov(X₁, X₂)=0, then the random variables X₁ and X₂ are independent

(b) If random variables X₁ and X₂ are independent, then Cov(X₁, X₂)=0

(d) If Cov(X₁, X₂)=0, then Corr(X₁, X₂)=0

Answer: [ b, c, d ]

Explanation: Independence implies zero covariance, but zero covariance does not necessarily imply independence.
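A standard counterexample for statement (a) is a variable and its square. The sketch below assumes X is uniform on {-1, 0, 1} and Y = X² (an illustrative choice, not from the original question) and uses exact fractions so that no rounding is involved.

```python
from fractions import Fraction

# X is uniform on {-1, 0, 1}; Y = X**2 is completely determined by X.
support = [-1, 0, 1]
p = Fraction(1, 3)

e_x = sum(p * x for x in support)            # E[X]
e_y = sum(p * x**2 for x in support)         # E[Y]
e_xy = sum(p * x * x**2 for x in support)    # E[XY] = E[X^3]
cov = e_xy - e_x * e_y

# Independence would require P(X=0, Y=0) == P(X=0) * P(Y=0).
p_x0 = p
p_y0 = p        # Y == 0 happens only when X == 0
p_x0_y0 = p     # the joint event is the same event as X == 0

print("Cov(X, Y) =", cov)
print("Independent?", p_x0_y0 == p_x0 * p_y0)
```

The covariance is zero because covariance only measures linear association, while the dependence between X and Y here is purely nonlinear.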

9. Which of the following statements are TRUE?

(a) Supervised learning does not require target attributes while unsupervised learning requires it.

(b) In a supermarket, categorization of the items to be placed in aisles and on the shelves can be an application of unsupervised learning.

(d) Decision trees can also be used to do clustering tasks.

Answer: [ b, d ]

Explanation: Unsupervised machine learning does not require target attributes, while supervised machine learning requires them.

10. The algorithm which can only be used when the training data are linearly separable is:

(a) Linear hard-margin SVM

(b) Linear Logistic Regression

(d) The centroid method

Answer: [ a ]

Explanation: A hard-margin SVM can work only when the data is completely linearly separable, without any errors (outliers/noise). In a hard-margin SVM, we have very strict constraints to correctly classify every data point.

11. Which of the following statements are correct about the Backpropagation Algorithm?

(a) It is also known as the Generalized delta rule

(b) In Backpropagation, error in output is propagated backward only to determine weight updates

(d) It is an algorithm for unsupervised learning of artificial neural networks

Answer: [ a, b, c ]

Explanation: The backpropagation algorithm is used for supervised learning of artificial neural networks.

12. Integer Answer Type Question:

How many of the following statements are incorrect about the K-Means Clustering algorithm?

(a) In presence of possible outliers in the data, one should not go for ‘complete link’ distance measures during the clustering tasks

(b) Two different runs of k-means clustering algorithms always result in the same clusters

(d) In k-means clustering, the number of centroids change during the algorithm run

(e) It tries to maximize the within-class variance for a given number of clusters.

(f) It converges to the global optimum if and only if the initial means (initialization) are chosen as some of the samples themselves.

(g) It requires the dimension of the feature space to be no bigger than the number of samples.

Answer: [ {b, c, d, e, f, g} – 6 ]

Explanation: The objective of the K-Means clustering algorithm is to minimize the total intra-cluster variance (the variance inside clusters). Within-cluster variance is an easy-to-understand measure of compactness (compact partitioning).
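The objective described above can be watched directly in a small implementation of Lloyd's algorithm. This is a minimal sketch on made-up 2-D points; the function names and the `seed` parameter are illustrative assumptions. It records the within-cluster sum of squares at every iteration and keeps exactly `k` centroids throughout.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, n_iter=10, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # the result can depend on this random start
    history = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        # Within-cluster sum of squares for the current centroids and assignment.
        history.append(sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels)))
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, history


# Made-up data: two loose groups of points.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
centroids, history = kmeans(points, k=2)

print(len(centroids))
print(history)
```

The recorded values never increase from one iteration to the next, which reflects minimization rather than maximization of within-cluster variance. Changing `seed` changes the starting centroids, which is why two runs are not guaranteed to produce the same clusters.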

13. Which of the following statements are correct about the characteristics of Hierarchical clustering?

(a) It is a Merging approach

(b) Measuring distance between two clusters

(d) It is a semi-unsupervised clustering algorithm

Answer: [ a, b ]

Explanation: Divisive hierarchical clustering works in a top-down approach.

14. Which of the following statements are correct about Bayesian Classification?

(a) Decision boundary in Bayesian classification depends on evidence

(b) Decision boundary in Bayesian classification depends on priors

Answer: [ b, c ]

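Explanation: The decision boundary in Bayesian classification doesn’t depend on the evidence, because the evidence p(x) is the same for every class and cancels out when the posteriors are compared.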
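A small numeric sketch shows why the evidence does not affect the decision while the priors do. The class names, priors, and likelihood values below are hypothetical numbers picked for illustration.

```python
def posteriors(priors, likelihoods):
    """Return (posterior, joint) dictionaries for one observation x."""
    # Joint term prior * likelihood, i.e. the numerator of Bayes' rule.
    joint = {c: priors[c] * likelihoods[c] for c in priors}
    # Evidence p(x): one number shared by every class.
    evidence = sum(joint.values())
    return {c: joint[c] / evidence for c in joint}, joint


# Hypothetical likelihoods p(x | class) at a single point x.
likelihoods = {"pos": 0.9, "neg": 0.3}

# Unequal priors: dividing by the evidence does not change the winner.
post, joint = posteriors({"pos": 0.2, "neg": 0.8}, likelihoods)
print(max(post, key=post.get), max(joint, key=joint.get))

# Equal priors: the same x is now assigned to the other class.
post_eq, _ = posteriors({"pos": 0.5, "neg": 0.5}, likelihoods)
print(max(post_eq, key=post_eq.get))
```

Dividing every class score by the same positive number cannot change which score is largest, whereas changing the priors rescales the classes differently and can move the boundary.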

15. Integer Answer Type Question:

How many of the following statements are incorrect about neural networks?

(a) An activation function must be monotonic in neural networks

(b) The logistic function is a monotonically increasing function

(d) They can only be trained with stochastic gradient descent

(e) Optimize a convex objective function.

(f) Can use a mix of different activation functions

Answer: [ {a, c, d, e} – 4 ]

Explanation: Neural networks can use a mix of different activation functions, such as the sigmoid, tanh, and ReLU functions.

16. The capacity of a neural network model, i.e., the ability of the network to model a complex function, _____________ with an increase in the dropout rate.

(a) Increases

(b) Decreases

(d) First decreases and then increases

Answer: [ b ]

Explanation: The capacity of a neural network model decreases with the increase in the dropout rate.

For further reference, refer to the link
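One common intuition for the answer to Question 16 is that a higher dropout rate leaves fewer active units in each training step. The sketch below implements "inverted" dropout on a plain list of activations; the function name and the choice of the inverted variant are assumptions made for illustration, not something specified in the question.

```python
import random


def dropout(activations, rate, rng):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep = 1.0 - rate
    # Surviving units are scaled by 1/keep so the expected activation is unchanged.
    return [a / keep if rng.random() >= rate else 0.0 for a in activations]


rng = random.Random(0)
layer = [1.0] * 10_000  # hypothetical layer output

for rate in (0.0, 0.2, 0.5, 0.8):
    out = dropout(layer, rate, rng)
    active = sum(1 for v in out if v != 0.0) / len(out)
    print(f"rate={rate}: fraction of active units ~ {active:.2f}")
```

As the rate grows, the fraction of active units shrinks, so each training step effectively uses a smaller sub-network.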

Integer Answer Type Question:

17. How many of the following options are TRUE about Support Vector Machines (SVMs)?

(a) Support vectors only have non-zero Lagrangian multipliers in the formulation of SVMs.

(b) The SVM's linear discriminant function relies on dot products between the test point and the support vectors.

(d) Support vectors are the data points that are closest to the decision boundary.

(e) The only training points necessary to compute f(x) in an SVM are support vectors.

Answer: [ {a, b, c, d, e} – 5 ]

Explanation: A support vector machine (SVM) performs classification by choosing the hyperplane that maximizes the margin between the two classes. The vectors that define the hyperplane are the support vectors, and they have non-zero Lagrangian multipliers.

True or False

18. The linear discriminant function (classifier) with the maximum margin in SVMs is the best since it is robust to outliers and has a strong generalization ability.

Answer: [ True ]

Explanation: A support vector machine tries to find the line that “best” separates two classes of points. By “best”, we mean the line that gives the largest margin between the two classes.

19. For the given dendrogram, if you draw a horizontal line at y = 0.50, how many clusters will be formed?

[Image: dendrogram]

(a) 1 (b) 3 (c) 4 (d) 7

Answer: [ c ]

Hint: Count the vertical lines that the horizontal line intersects.

20. How do you handle missing values or corrupted data in a dataset for categorical variables?

(a) Drop missing rows or columns

(b) Replace missing value with the most frequent value

(d) All of the above

Answer: [ d ]

Hint: For reference, refer to the link
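The two strategies named in options (a) and (b) can be sketched on a single categorical column. This is a minimal illustration that represents missing entries as `None`; the function names and the sample column are made up for the example.

```python
from collections import Counter


def drop_missing(values):
    """Option (a): discard the entries whose value is missing."""
    return [v for v in values if v is not None]


def impute_most_frequent(values):
    """Option (b): replace missing entries with the most frequent observed value."""
    observed = [v for v in values if v is not None]
    most_common = Counter(observed).most_common(1)[0][0]
    return [most_common if v is None else v for v in values]


# Made-up categorical column with two missing entries.
colors = ["red", "blue", None, "red", None]

print(drop_missing(colors))
print(impute_most_frequent(colors))
```

Dropping is simple but shrinks the dataset, while mode imputation keeps every row at the cost of inflating the share of the most frequent category.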

End Notes

Thanks for reading!

I hope you enjoyed the questions and were able to test your knowledge about Data Science.

If you liked this and want to know more, visit my other articles on Data Science and Machine Learning by clicking on the Link

Please feel free to contact me on LinkedIn or by email.

Something not mentioned, or want to share your thoughts? Feel free to comment below and I’ll get back to you.

About the author

Chirag Goyal

I am currently pursuing my Bachelor of Technology (B.Tech) in Computer Science and Engineering at the Indian Institute of Technology Jodhpur (IITJ). I am very enthusiastic about Machine Learning, Deep Learning, and Artificial Intelligence.

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.

