Top 30 MCQs to Ace Your Data Science Interviews
Data Science grows more popular every day as Artificial Intelligence and Machine Learning are applied to increasingly challenging and complex problems. It is one of the most sought-after fields to work in.
According to a recent survey, there has been an increase in the number of opportunities related to Data Science during the COVID-19 pandemic. Ever wonder what it takes to ace the data science interview in top product-based companies and startups?
I have curated a list of 30 questions, spanning Probability and Statistics through Machine Learning and Deep Learning, that I have faced in several data science interviews.
The 30 multiple-choice questions are broadly divided into three sections:
Probability, Statistics and Machine Learning Algorithms
So let’s get started.
1. How do we perform Bayesian classification when some features are missing?
(A) We assume the missing values to be the mean of all values.
(B) We ignore the missing features.
(C) We integrate the posterior probabilities over the missing features.
(D) Drop the features completely.
Explanation: Rather than applying general-purpose methods for handling missing values, we integrate the posterior probabilities over the missing features, which gives better predictions.
2. Which of the following statements is FALSE in the case of the KNN algorithm?
(A) For a very large value of K, points from other classes may be included in the neighborhood.
(B) For a very small value of K, the algorithm is very sensitive to noise.
(C) KNN is used only for classification problem statements.
(D) KNN is a lazy learner.
Explanation: We can use KNN for both regression and classification problems. In classification, we predict the majority class among the K nearest neighbors, while in regression we predict the average of the target values of those K nearest neighbors.
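The difference between the two uses of KNN can be sketched in a few lines of plain Python. This is only a minimal illustration: the helper names (`k_nearest`, `knn_classify`, `knn_regress`) and the toy data are hypothetical, and Euclidean distance is assumed.

```python
import math
from collections import Counter

def k_nearest(train, query, k):
    """Return the targets of the k training points closest to `query`."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    return [target for _, target in by_distance[:k]]

def knn_classify(train, query, k):
    # Classification: majority vote among the k nearest neighbours.
    return Counter(k_nearest(train, query, k)).most_common(1)[0][0]

def knn_regress(train, query, k):
    # Regression: mean of the k nearest neighbours' target values.
    neighbours = k_nearest(train, query, k)
    return sum(neighbours) / len(neighbours)

# Hypothetical toy data: (features, target) pairs.
labelled = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
            ((5.0, 5.0), "b"), ((5.2, 4.9), "b")]
numeric = [((1.0,), 10.0), ((2.0,), 20.0), ((3.0,), 30.0), ((10.0,), 100.0)]

print(knn_classify(labelled, (1.1, 1.0), k=3))  # majority class of the 3 nearest points
print(knn_regress(numeric, (2.1,), k=3))        # mean target of the 3 nearest points
```

Because nothing is fitted ahead of time and all the work happens at query time, the same sketch also shows why KNN is called a lazy learner.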
3. Which of the following statements is TRUE?
(A) Outliers should always be identified and removed from a dataset.
(B) Outliers can never be present in the testing dataset.
(C) An outlier is a data point that is significantly close to other data points.
(D) The nature of our business problem determines how outliers are used.
Explanation: The nature of the business problem often determines how outliers are treated. For example, in class-imbalanced problems such as credit card fraud detection, the records of the fraud class are very few relative to the non-fraud class, so unusual points cannot simply be discarded.
4. The following data
is used to fit a linear regression with the least-squares regression line Y = a1X. The approximate value of a1 is given by: (X: independent variable, Y: dependent variable)
(A) 27.876 (B) 32.650 (C) 40.541 (D) 28.956
Explanation: Hint: Use the ordinary least squares method.
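For a line through the origin, Y = a1X, the ordinary least squares estimate has a simple closed form. The derivation below is a sketch in which (x_i, y_i) stand for the n data points of the question:

```latex
\begin{align*}
S(a_1) &= \sum_{i=1}^{n} \left( y_i - a_1 x_i \right)^2 \\
\frac{dS}{da_1} &= -2 \sum_{i=1}^{n} x_i \left( y_i - a_1 x_i \right) = 0 \\
a_1 &= \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}
\end{align*}
```

Plugging the data of the question into the last line gives the value of a1.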
5. The robotic arm will be able to paint every corner in the automotive parts while minimizing the quantity of paint wasted in the process. Which learning technique is used in this problem?
(A) Supervised Learning.
(B) Unsupervised Learning.
(C) Reinforcement Learning.
(D) Both (A) and (B).
Explanation: Here the robot learns from its environment, receiving rewards for positive actions and penalties for negative actions.
6. Which one of the following statements is TRUE for a Decision Tree?
(A) A decision tree is only suitable for classification problems.
(B) In a decision tree, the entropy of a node decreases as we go down a decision tree.
(C) In a decision tree, entropy determines purity.
(D) A decision tree can only be used for numeric-valued and continuous attributes.
Explanation: Entropy measures the impurity of a node, and it decreases as we go down the decision tree.
7. How do you choose the right node while constructing a decision tree?
(A) An attribute having high entropy
(B) An attribute having high entropy and information gain
(C) An attribute having the lowest information gain.
(D) An attribute having the highest information gain.
Explanation: We split first on the attribute that has the maximum information gain.
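To make the selection rule concrete, here is a small sketch that computes entropy and information gain for a toy dataset. The attribute names (`outlook`, `windy`), the data, and the helper functions are hypothetical and only serve as an illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Parent entropy minus the weighted entropy of the children after splitting."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    weighted = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - weighted

# Hypothetical toy data: "outlook" separates the classes perfectly, "windy" not at all.
rows = [
    {"outlook": "sunny", "windy": True},
    {"outlook": "sunny", "windy": False},
    {"outlook": "rainy", "windy": True},
    {"outlook": "rainy", "windy": False},
]
labels = ["no", "no", "yes", "yes"]

gains = {name: information_gain(rows, labels, name) for name in ("outlook", "windy")}
best_attribute = max(gains, key=gains.get)
print(gains, best_attribute)
```

The attribute with the highest gain produces the purest child nodes, so it is the one chosen for the split.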
8. What kind of distance metric(s) are suitable for categorical variables to find the closest neighbors?
(A) Euclidean distance.
(B) Manhattan distance.
(C) Minkowski distance.
(D) Hamming distance.
Explanation: Hamming distance counts the positions at which two equal-length strings differ. It was originally defined for binary strings, and the same idea makes it suitable for categorical variables.
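As a quick illustration, Hamming distance between two records of categorical values can be computed as below. The function name and the example records are hypothetical.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))

# Hypothetical categorical records: (colour, size, shape).
record_1 = ("red", "small", "round")
record_2 = ("red", "large", "round")
record_3 = ("blue", "large", "square")

print(hamming_distance(record_1, record_2))  # differs only in size
print(hamming_distance(record_1, record_3))  # differs in every attribute
```

No ordering or magnitude of the categories is needed, which is why this metric fits categorical data where Euclidean or Manhattan distance does not.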
9. In the Naive Bayes algorithm, suppose that prior for class w1 is greater than class w2, would the decision boundary shift towards the region R1(region for deciding w1) or towards region R2(region for deciding w2)?
(A) towards region R1.
(B) towards region R2.
(C) No shift in decision boundary.
(D) It depends on the exact value of priors.
Explanation: Since the prior for w1 is greater than the prior for w2, the region R1 grows and the decision boundary shifts towards region R2, which preserves the proportion of the prior probabilities.
10. Which of the following statements is FALSE about Ridge and Lasso Regression?
(A) These are types of regularization methods to solve the overfitting problem.
(B) Lasso Regression is a type of regularization method.
(C) Ridge regression shrinks the coefficient to a lower value.
(D) Ridge regression lowers some coefficients to a zero value.
Explanation: Ridge regression never drops a feature; it only shrinks the coefficients. Lasso regression, however, drops some features by making their coefficients exactly zero. Therefore, Lasso is also used as a feature selection technique.
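The contrast is easiest to see in a simplified setting. The sketch below assumes orthonormal features, where both penalties act on each least-squares coefficient separately: ridge (objective ½‖y − Xβ‖² + (λ/2)‖β‖²) rescales it, while lasso (objective ½‖y − Xβ‖² + λ‖β‖₁) soft-thresholds it. The function names and the coefficient values are hypothetical.

```python
def ridge_shrink(beta_ols, lam):
    # Ridge under orthonormal features: every coefficient is scaled by 1 / (1 + lam).
    return [b / (1.0 + lam) for b in beta_ols]

def soft_threshold(b, lam):
    # Lasso under orthonormal features: move towards zero by lam, clipping at zero.
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0

def lasso_shrink(beta_ols, lam):
    return [soft_threshold(b, lam) for b in beta_ols]

# Hypothetical least-squares coefficients before regularization.
beta_ols = [3.0, -0.4, 0.2]
lam = 0.5

ridge = ridge_shrink(beta_ols, lam)
lasso = lasso_shrink(beta_ols, lam)
print(ridge)  # all coefficients smaller in magnitude, none exactly zero
print(lasso)  # small coefficients become exactly zero
```

In the general, non-orthonormal case there is no such per-coefficient formula, but the qualitative behaviour is the same.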
11. Which of the following is FALSE about Correlation and Covariance?
(A) A zero correlation does not necessarily imply independence between variables.
(B) Correlation and covariance values are the same.
(C) The covariance and correlation always have the same sign.
(D) Correlation is the standardized version of Covariance.
Explanation: Correlation is defined as covariance divided by the product of the standard deviations and is therefore the standardized version of covariance. The two share the same sign, but their values are generally not the same.
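A short numeric check makes the relationship visible. The sketch below uses hypothetical data and hand-written helpers (sample covariance with an n − 1 denominator is assumed); rescaling one variable changes the covariance but leaves the correlation untouched.

```python
import math

def covariance(xs, ys):
    """Sample covariance (n - 1 in the denominator)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical data; xs_scaled is the same variable in different units.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
xs_scaled = [100.0 * x for x in xs]

print(covariance(xs, ys), correlation(xs, ys))
print(covariance(xs_scaled, ys), correlation(xs_scaled, ys))
```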
12. In regression modeling, we develop a mathematical equation that describes how (predictor: independent variable, response: dependent variable)
(A) one predictor and one or more response variables are related.
(B) several predictors and several response variables are related.
(C) one response and one or more predictors are related.
(D) All of these are correct.
Explanation: In a regression problem, we may have several independent variables but only one dependent variable.
13. True or False: In a naive Bayes algorithm, when an attribute value in the testing record has no example in the training set, then the entire posterior probability will be zero.
(A) True (B) False (C) Can’t be determined (D) None of these.
Explanation: If a particular attribute value has no example in the training dataset, its conditional probability is zero. Because naive Bayes multiplies the conditional probabilities together, the entire posterior then becomes zero. This is known as the zero-probability problem of the naive Bayes algorithm. For further reference, see the linked article.
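The zero-probability problem is easy to reproduce. The sketch below uses hypothetical counts and a hypothetical `likelihood` helper; it also shows Laplace (add-one) smoothing, one common remedy that is not discussed above and is included here only as an assumption about how the problem is usually handled.

```python
def likelihood(value_count, class_count, n_values, alpha=0.0):
    """P(attribute = value | class) with optional additive smoothing `alpha`."""
    return (value_count + alpha) / (class_count + alpha * n_values)

# Hypothetical training counts for one class with 10 examples.
prior = 0.5
class_count = 10
seen_count = 8      # attribute A took the test value in 8 of the 10 examples
unseen_count = 0    # attribute B never took the test value in this class
n_values = 3        # each attribute is assumed to have 3 possible values

# Without smoothing, a single zero likelihood wipes out the whole product.
unsmoothed = (prior
              * likelihood(seen_count, class_count, n_values)
              * likelihood(unseen_count, class_count, n_values))

# With add-one smoothing, every likelihood stays strictly positive.
smoothed = (prior
            * likelihood(seen_count, class_count, n_values, alpha=1.0)
            * likelihood(unseen_count, class_count, n_values, alpha=1.0))

print(unsmoothed, smoothed)
```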
14. Which of the following is NOT True about Ensemble Techniques?
(A) Bagging decreases the variance of the classifier.
(B) Boosting helps to decrease the bias of the classifier.
(C) Bagging combines the predictions from different models and then finally gives the results.
(D) Bagging and Boosting are the only available ensemble techniques.
Explanation: Apart from bagging and boosting, there are various other ensemble techniques such as stacking, the extra trees classifier, the voting classifier, etc.
15. Which of the following statements is TRUE about the Bayes classifier?
(A) Bayes classifier works on the Bayes theorem of probability.
(B) Bayes classifier is an unsupervised learning algorithm.
(C) Bayes classifier is also known as maximum apriori classifier.
(D) It assumes the independence between the independent variables or features.
Explanation: The Bayes classifier uses Bayes' theorem internally to make predictions for unseen data points.
16. Which of the following SGD variants is based on both momentum and adaptive learning?
Explanation: Adam, a popular deep learning optimizer, is based on both momentum and adaptive learning rates.
17. Which of the following activation functions has a zero-centered output?
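How the two ideas combine can be sketched with the standard Adam update rule on a one-dimensional problem. The function name, the objective f(x) = (x − 3)², and the hyperparameter values are illustrative assumptions, not part of the question.

```python
import math

def adam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Minimize a 1-D function given its gradient, using the Adam update rule."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # momentum: running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g    # adaptivity: running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias correction for the zero initialization
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)  # step scaled per parameter
    return x

# Hypothetical objective f(x) = (x - 3)^2 with gradient 2 * (x - 3).
x_min = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_min)
```

The first moment `m` plays the role of momentum, while dividing by the square root of the second moment `v` gives each parameter its own effective learning rate.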
(A) Hyperbolic Tangent.
(D) Rectified Linear unit(ReLU).
Explanation: The hyperbolic tangent activation function gives outputs in the range (-1, 1), which is symmetric about zero.
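A small comparison on inputs that are symmetric about zero illustrates the point. The input values and the `relu` helper are hypothetical; only `math.tanh` from the standard library is used.

```python
import math

def relu(x):
    return max(0.0, x)

# Hypothetical pre-activations, symmetric about zero.
inputs = [-2.0, -1.0, 0.0, 1.0, 2.0]

tanh_out = [math.tanh(x) for x in inputs]
relu_out = [relu(x) for x in inputs]

tanh_mean = sum(tanh_out) / len(tanh_out)
relu_mean = sum(relu_out) / len(relu_out)

print(tanh_mean)  # close to zero: negative and positive outputs cancel
print(relu_mean)  # strictly positive: ReLU never outputs negative values
```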
18. Which of the following is FALSE about Radial Basis Function Neural Network?
(A) It resembles Recurrent Neural Networks(RNNs) which have feedback loops.
(B) It uses radial basis function as activation function.
(C) While outputting, it considers the distance of a point with respect to the center.
(D) The output given by the Radial basis function is always an absolute value.
Explanation: A radial basis function network does not resemble an RNN. It is a feed-forward artificial neural network that works with the distance of each point from a center rather than with a weighted sum.
19. In which of the following situations should you NOT prefer Keras over TensorFlow?
(A) When you want to quickly build a prototype using neural networks.
(B) When you want to implement simple neural networks in your initial learning phase.
(C) When you are doing critical and intensive research in any field.
(D) When you want to create simple tutorials for your students and friends.
Explanation: Keras is not preferred in this case because it is a high-level API built on top of TensorFlow, whereas TensorFlow itself provides both high-level and low-level APIs.
20. Which of the following is FALSE about Deep Learning and Machine Learning algorithms?
(A) Deep Learning algorithms work efficiently on a high amount of data.
(B) Feature Extraction needs to be done manually in both ML and DL algorithms.
(C) Deep Learning algorithms are best suited for unstructured data.
(D) Deep Learning algorithms require high computational power.
Explanation: Usually, in deep learning algorithms, feature extraction happens automatically in hidden layers.
21. Which of the following is FALSE for neural networks?
(A) Artificial neurons are similar in operation to biological neurons.
(B) Training time for a neural network depends on network size.
(C) Neural networks can be simulated on conventional computers.
(D) The basic units of neural networks are neurons.
Explanation: An artificial neuron does not work like a biological neuron. An artificial neuron takes a weighted sum of all its inputs plus a bias and then applies an activation function to give the final result, whereas a biological neuron works through axons, synapses, etc.
22. Which of the following logic functions cannot be implemented by a perceptron having 2 inputs?
(A) AND. (B) OR. (C) NOR. (D) XOR.
Explanation: A perceptron always gives a linear decision boundary, but the XOR function is not linearly separable, so implementing it requires a non-linear decision boundary.
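This can be explored with a brute-force search over a small grid of weights and biases. The grid, the gate tables, and the helper names are hypothetical, and a failed search is of course not a proof; it merely agrees with the known result that XOR is not linearly separable.

```python
from itertools import product

def perceptron(w1, w2, b, x1, x2):
    """Single threshold unit: fires when the weighted sum plus bias is positive."""
    return 1 if w1 * x1 + w2 * x2 + b > 0 else 0

GATES = {
    "AND": {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1},
    "OR":  {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1},
    "NOR": {(0, 0): 1, (0, 1): 0, (1, 0): 0, (1, 1): 0},
    "XOR": {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0},
}

def find_weights(truth_table, grid):
    """Return the first (w1, w2, b) on the grid that reproduces the table, else None."""
    for w1, w2, b in product(grid, repeat=3):
        if all(perceptron(w1, w2, b, x1, x2) == out
               for (x1, x2), out in truth_table.items()):
            return (w1, w2, b)
    return None

# Hypothetical search grid: -2.0 to 2.0 in steps of 0.5.
grid = [i / 2 for i in range(-4, 5)]

results = {name: find_weights(table, grid) for name, table in GATES.items()}
for name, weights in results.items():
    print(name, weights)
```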
23. Inappropriate selection of learning rate value in gradient descent gives rise to:
(A) Local Minima.
(C) Slow convergence.
(D) All of the above.
Explanation: The learning rate decides how quickly the optimizer approaches the global minimum. With an inappropriate value, we may get stuck at a local minimum instead of reaching the global one, oscillate around the minimum, or converge slowly.
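Two of these effects can be seen on a very simple function. The sketch below runs plain gradient descent on the hypothetical objective f(x) = x², whose gradient is 2x; since this function is convex, it illustrates slow convergence, oscillation, and divergence, but not local minima. The function name and the learning rates are illustrative choices.

```python
def descend(lr, x0=1.0, steps=50):
    """Run gradient descent on f(x) = x**2 and return every visited point."""
    xs = [x0]
    for _ in range(steps):
        grad = 2.0 * xs[-1]
        xs.append(xs[-1] - lr * grad)
    return xs

too_small = descend(0.001)   # barely moves in 50 steps
reasonable = descend(0.1)    # approaches the minimum at x = 0
oscillating = descend(0.9)   # jumps across the minimum on every step
too_large = descend(1.1)     # each step overshoots further: divergence

for name, path in [("0.001", too_small), ("0.1", reasonable),
                   ("0.9", oscillating), ("1.1", too_large)]:
    print(name, abs(path[-1]))
```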
24. What will be the output of the following code?
import numpy as np
n_array = np.array([1, 0, 2, 0, 3, 0, 0, 5, 6, 7, 5, 0, 8])
res = np.where(n_array == 0)
print(res.sum())
(A) 25 (B) 26 (C) 6 (D) None of these
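Explanation: Called with only a condition, where() returns a tuple containing an array of the indices at which n_array is zero. A tuple has no sum() method, so res.sum() raises an AttributeError; the index array itself is res[0].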
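The return value of `np.where` for the array in question 24 can be inspected directly. This is a small sketch, assuming NumPy is installed; the variable name `zero_indices` is introduced here for illustration.

```python
import numpy as np

n_array = np.array([1, 0, 2, 0, 3, 0, 0, 5, 6, 7, 5, 0, 8])

res = np.where(n_array == 0)
print(type(res))             # a tuple with one index array per dimension

zero_indices = res[0]        # the index array for the only dimension
print(zero_indices)          # positions in n_array that hold a zero
print(zero_indices.sum())    # sum of those positions
```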
25. What will be the output of the following code?
import numpy as np
p = [[1, 0], [0, 1]]
q = [[1, 2], [3, 4]]
result1 = np.cross(p, q)
result2 = np.cross(q, p)
print((result1==result2).shape)
(A) 0 (B) 1 (C) 2 (D) Code is not executable.
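Explanation: The cross product of two vectors is not commutative, so result1 and result2 differ. Comparing them still yields a boolean array with one entry per pair of row vectors.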
26. What will be the output of the following code?
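import pandas as pd
import numpy as np
s = pd.Series(np.random.randn(2))
print(s.size)
(A) 0 (B) 1 (C) 2 (D) Answer not fixed due to randomness.
Explanation: np.random.randn(2) returns two samples from the “standard normal” distribution. The values are random, but size only counts the elements of the Series, so the output does not depend on them.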
27. What will be the output of the following code?
import numpy as np
student_id = np.array([1023, 5202, 6230, 1671, 1682, 5241, 4532])
i = np.argsort(student_id)
print(i)
(A) 2 (B) 3 (C) 4 (D) 5
Explanation: The argsort() function returns the indices that would sort the array in ascending order, that is, for each position in the sorted array, the index of that element in the initial array.
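Applied to the array from question 27, `np.argsort` behaves as sketched below (assuming NumPy is installed; the variable name `order` is introduced here for illustration).

```python
import numpy as np

student_id = np.array([1023, 5202, 6230, 1671, 1682, 5241, 4532])

order = np.argsort(student_id)
print(order)               # indices into student_id, smallest value first
print(student_id[order])   # indexing with them yields the sorted array
```

Indexing the original array with the result is a handy way to reorder related arrays consistently.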
28. What will be the output of the following code?
import pandas as pd
import numpy as np
s = pd.Series(np.random.randn(4))
print(s.ndim)
(A) 1 (B) 2 (C) 0 (D) 3
Explanation: The ndim attribute returns the number of dimensions of the object, and a pandas Series is always one-dimensional.
29. What will be the output of the following code?
import numpy as np
my_array = np.arange(6).reshape(2,3)
result = np.trace(my_array)
print(result)
(A) 2 (B) 4 (C) 6 (D) 8
Explanation: The arange() function gives a 1-D array with the values 0 to 5, and reshape() turns it into a 2-D array with 2 rows and 3 columns. trace() then gives the sum of the main-diagonal elements of the resulting matrix.
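The intermediate steps of question 29 can be printed one by one, as in this small sketch (assuming NumPy is installed; `np.diagonal` is used only to make the summed elements visible).

```python
import numpy as np

my_array = np.arange(6).reshape(2, 3)
print(my_array)                     # two rows: [0 1 2] and [3 4 5]

diagonal = np.diagonal(my_array)    # main-diagonal elements of the 2 x 3 matrix
print(diagonal)

result = np.trace(my_array)         # sum of the main diagonal
print(result)
```

For a non-square matrix the main diagonal simply stops at the shorter dimension, so only two elements contribute here.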
30. What will be the output of the following code?
import numpy as np
from numpy import linalg
a = np.array([[1, 0], [1, 2]])
print(type(np.linalg.det(a)))
(A) INT (B) FLOAT (C) STR (D) BOOL.
Explanation: The output is the type of the determinant of the matrix. np.linalg.det() returns a floating-point value even when the matrix contains only integers.
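The types involved in question 30 can be checked with a short sketch (assuming NumPy is installed; the variable name `det` is introduced here for illustration).

```python
import numpy as np

a = np.array([[1, 0], [1, 2]])
det = np.linalg.det(a)

print(a.dtype)      # the matrix itself has an integer dtype
print(type(det))    # the determinant is a NumPy floating-point scalar
print(det)          # numerically 1*2 - 0*1, up to floating-point rounding
```

Because the determinant is computed in floating point, it is safer to compare it with a tolerance than with exact equality.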
Thanks for reading!
I hope you enjoyed the questions and were able to test your knowledge about machine learning and deep learning.
If you liked this and want to know more, visit my other articles on Data Science and Machine Learning.
Something not mentioned, or want to share your thoughts? Feel free to comment below and I’ll get back to you.
Till then Stay Home, Stay Safe to prevent the spread of COVID-19, and Keep Learning!
About the author
Currently, I am pursuing my Bachelor of Technology (B.Tech) in Computer Science and Engineering at the Indian Institute of Technology Jodhpur (IITJ). I am very enthusiastic about Machine Learning, Deep Learning, and Artificial Intelligence.
The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.