Have you come across a dataset with hundreds of columns and wondered how to build a predictive model on it? Or have you run into a situation where many variables are correlated? It is difficult to escape these situations while working on real-life problems. Thankfully, dimensionality reduction techniques come to our rescue here. Dimensionality reduction is an important technique in artificial intelligence and a must-have skill for any data scientist. To test your knowledge of dimensionality reduction techniques, we conducted this skill test. The questions cover topics like Principal Component Analysis (PCA), t-SNE, and LDA.


A total of 582 people participated in this skill test. The questions varied from theoretical to practical. If you missed taking the test, here is your opportunity to find out how many questions you could have answered correctly.

Below is the distribution of scores; this will help you evaluate your performance.

More than 180 people participated in the skill test and the highest score was 34. Here are a few statistics about the distribution.

**Overall distribution**

Mean Score: 19.52

Median Score: 20

Mode Score: 19

- Beginners Guide To Learn Dimension Reduction Techniques
- Practical Guide to Principal Component Analysis (PCA) in R & Python
- Comprehensive Guide on t-SNE algorithm with implementation in R & Python
- Support Vector Machine (SVM) and Principal Component Analysis Tutorial for Beginners


**You have to select the 100 most important features based on the relationship between input features and the target features. Do you think this is an example of dimensionality reduction?**

A. Yes

B. No

**Solution: (A)**

A. TRUE

B. FALSE

**Solution: (A)**

Explanation: LDA is an example of a supervised dimensionality reduction algorithm.

**Step 1:** **Using the above variables, I have created two more variables, namely E = A + 3 B and F = B + 5 C + D.**

A. True

B. False

**Solution: (A)**

Explanation: Yes, because Step 1 could be used to represent the data in two lower dimensions.
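As a rough sketch of why Step 1 counts as dimensionality reduction, the two new variables can be written as a linear projection of the four original ones. The sample rows below are made up for illustration; only the weights of E and F come from the question.

```python
import numpy as np

# Hypothetical rows holding the four original variables A, B, C, D.
X = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [0.0, 1.0, 0.0, 2.0],
])

# Each column holds the weights of one new variable:
# E = A + 3B and F = B + 5C + D.
W = np.array([
    [1.0, 0.0],
    [3.0, 1.0],
    [0.0, 5.0],
    [0.0, 1.0],
])

# Four columns go in, two columns come out.
Z = X @ W
print(Z.shape)
```

The projection keeps one row per observation but describes it with two numbers instead of four.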

A. Removing columns that have too many missing values

B. Removing columns that have high variance in data

C. Removing columns with dissimilar data trends

D. None of these

**Solution: (A)**

Explanation: If columns have too many missing values (say 99%), then we can remove such columns.
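A minimal sketch of this idea, assuming missing values are stored as NaN in a NumPy array. The helper name `drop_sparse_columns` and the 50% threshold are illustrative choices, not part of the original test.

```python
import numpy as np

def drop_sparse_columns(X, max_missing_fraction=0.5):
    """Return X without the columns whose share of NaN values exceeds the threshold."""
    missing_fraction = np.isnan(X).mean(axis=0)
    keep = missing_fraction <= max_missing_fraction
    return X[:, keep], keep

X = np.array([
    [1.0, np.nan, 3.0],
    [2.0, np.nan, 1.0],
    [3.0, np.nan, np.nan],
    [4.0, 5.0, 2.0],
])

# Column 1 is 75% missing, so it is the only one dropped.
X_reduced, kept = drop_sparse_columns(X, max_missing_fraction=0.5)
print(kept)
print(X_reduced.shape)
```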

A. TRUE

B. FALSE

**Solution: (A)**

Explanation: Reducing the dimension of data will take less time to train a model.

A. t-SNE

B. PCA

C. LDA

D. None of these

**Solution: (D)**

Explanation: All of the algorithms are examples of dimensionality reduction algorithms.

A. TRUE

B. FALSE

**Solution: (A)**

Explanation: Sometimes it is very useful to plot the data in lower dimensions. We can take the first 2 principal components and then use visualization of the data using a scatter plot.
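The projection onto the first two principal components can be sketched with plain NumPy as follows. The data here is synthetic, and the SVD route is just one common way to obtain the components.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))            # synthetic data with 5 features

# PCA works on centered data.
X_centered = X - X.mean(axis=0)

# Rows of Vt are the principal directions, ordered by decreasing variance.
_, _, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Keep the first two components; these two columns can be passed
# to any scatter-plot function.
X_2d = X_centered @ Vt[:2].T
print(X_2d.shape)
```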

1. **PCA is an unsupervised method of optimization**
2. **It searches for the directions that data have the largest variance**
3. **Maximum number of principal components <= number of features**
4. **All principal components are orthogonal to each other**

A. 1 and 2

B. 1 and 3

C. 2 and 3

D. 1, 2 and 3

E. 1, 2 and 4

F. All of the above

**Solution: (F)**

Explanation: All options are self-explanatory.
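Two of these properties are easy to check numerically. The sketch below uses synthetic data and an eigendecomposition of the covariance matrix; it is one possible way to compute the components, not the only one.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))               # synthetic data with 4 features

cov = np.cov(X, rowvar=False)              # 4 x 4 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)     # columns of eigvecs are the components

# There are exactly as many components as features.
n_components = eigvecs.shape[1]

# The components are mutually orthogonal unit vectors,
# so their Gram matrix is the identity.
gram = eigvecs.T @ eigvecs
print(n_components, np.allclose(gram, np.eye(4)))
```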

**And then use these PCA projections as our features. Which of the following statement is correct?**

A. Higher ‘k’ means more regularization

B. Higher ‘k’ means less regularization

C. Can’t say

**Solution: (B)**

Explanation: A higher k leads to less smoothing because more characteristics of the data are preserved, hence less regularization. By increasing regularization, we can avoid overfitting.

A. Dataset with 1 Million entries and 300 features

B. Dataset with 100000 entries and 310 features

C. Dataset with 10,000 entries and 8 features

D. Dataset with 10,000 entries and 200 features

**Solution: (C)**

Explanation: t-SNE has quadratic time and space complexity. Thus it is a very heavy algorithm in terms of system resource utilization.
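To get a feel for what quadratic space means, the sketch below estimates the memory needed for one dense n x n matrix of pairwise similarities, assuming 8-byte floats. The helper name is hypothetical, and approximate variants of t-SNE avoid storing the full matrix, so treat this as an upper-end illustration only.

```python
def pairwise_matrix_gib(n_points, bytes_per_value=8):
    """Memory in GiB for one dense n x n matrix of floats."""
    return n_points ** 2 * bytes_per_value / 2 ** 30

# Ten times more points means a hundred times more memory.
for n in (10_000, 100_000, 1_000_000):
    print(n, round(pairwise_matrix_gib(n), 2))
```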

A. It is asymmetric in nature.

B. It is symmetric in nature.

C. It is the same as the cost function for SNE.

**Solution: (B)**

Explanation: The cost function of SNE is asymmetric in nature, which makes it difficult to converge using gradient descent. A symmetric cost function is one of the major differences between SNE and t-SNE.

**Imagine you are dealing with text data. To represent the words, you are using word embedding (Word2vec). In word embedding, you will end up with 1000 dimensions. Now, you want to reduce the dimensionality of this high-dimensional data such that similar words should have a similar meaning in the nearest neighbor space. In such a case,**

A. t-SNE

B. PCA

C. LDA

D. None of these

**Solution: (A)**

Explanation: t-SNE stands for t-Distributed Stochastic Neighbor Embedding, which considers the nearest neighbors for reducing the data.

A. TRUE

B. FALSE

**Solution: (A)**

Explanation: t-SNE learns a non-parametric mapping, which means that it does not learn an explicit function that maps data from the input space to the map. For more information, refer to this link.

A. t-SNE is linear, whereas PCA is non-linear

B. t-SNE and PCA are both linear

C. t-SNE and PCA are both nonlinear

D. t-SNE is nonlinear, whereas PCA is linear

**Solution: (D)**

Explanation: Option D is correct. Read the explanation from this link

A. Number of dimensions

B. Smooth measure of the effective number of neighbors

C. Maximum number of iterations

D. All of the above

**Solution: (D)**

Explanation: All of the hyper-parameters in the option can be tuned.

A. When the data is huge (in size), t-SNE may fail to produce better results.

B. t-SNE always produces better results regardless of the size of the data

C. PCA always performs better than t-SNE for smaller-sized data.

D. None of these

**Solution: (A)**

Explanation: Option A is correct

- **The similarity of datapoint Xi to datapoint Xj is the conditional probability p (j|i).**
- **The similarity of datapoint Yi to datapoint Yj is the conditional probability q (j|i).**

**Which of the following must be true for a perfect representation of xi and xj in lower dimensional space?**

A. p (j|i) = 0 and q (j|i) = 1

B. p (j|i) < q (j|i)

C. p (j|i) = q (j|i)

D. p (j|i) > q (j|i)

**Solution: (C)**

Explanation: The conditional probabilities related to Bayes’ theorem for the similarity of two points must be equal because the similarity between the points must remain unchanged in both higher and lower dimensions for them to be perfect representations.
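The sketch below illustrates this with a simplified version of the SNE similarities: a Gaussian kernel with one fixed bandwidth for every point (real SNE tunes the bandwidth per point through the perplexity). The mismatch between p and q is measured with the Kullback-Leibler divergence, which is zero only when the two agree.

```python
import numpy as np

def conditional_probabilities(points, sigma=1.0):
    """Row i holds the similarities of point i to every other point j."""
    diff = points[:, None, :] - points[None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1)
    affinity = np.exp(-sq_dist / (2 * sigma ** 2))
    np.fill_diagonal(affinity, 0.0)        # a point is not its own neighbor
    return affinity / affinity.sum(axis=1, keepdims=True)

def kl_divergence(P, Q):
    """Sum over all pairs of p * log(p / q), skipping zero entries of P."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / Q[mask])))

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])   # high-dimensional points
P = conditional_probabilities(X)

# A perfect map reproduces every similarity, so the cost is zero.
cost_perfect = kl_divergence(P, P)

# A distorted one-dimensional map changes the similarities and is penalized.
Y = np.array([[0.0], [1.0], [5.0]])
Q = conditional_probabilities(Y)
cost_distorted = kl_divergence(P, Q)
print(cost_perfect, cost_distorted > 0)
```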

A. LDA aims to maximize the distance between classes and minimize the within-class distance.

B. LDA aims to minimize both distances between classes and the distance within the class.

C. LDA aims to minimize the distance between classes and maximize the distance within the class.

D. LDA aims to maximize both distances between classes and the distance within the class.

**Solution: (A)**

Explanation: Option A is correct.
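For two classes, this objective can be sketched with Fisher's discriminant: the direction that maximizes the squared distance between projected class means relative to the projected within-class variance. The data below is synthetic and the helper `fisher_ratio` is an illustrative name.

```python
import numpy as np

rng = np.random.default_rng(0)
class_a = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2))
class_b = rng.normal(loc=[4.0, 0.0], scale=1.0, size=(100, 2))

mean_a, mean_b = class_a.mean(axis=0), class_b.mean(axis=0)

# Within-class scatter: spread of each class around its own mean.
S_w = (class_a - mean_a).T @ (class_a - mean_a) \
    + (class_b - mean_b).T @ (class_b - mean_b)

# Fisher direction for two classes: S_w^{-1} (mean_b - mean_a).
w = np.linalg.solve(S_w, mean_b - mean_a)
w /= np.linalg.norm(w)

def fisher_ratio(direction):
    """Between-class separation divided by within-class spread along a direction."""
    pa, pb = class_a @ direction, class_b @ direction
    return (pa.mean() - pb.mean()) ** 2 / (pa.var() + pb.var())

# The Fisher direction scores at least as well as either coordinate axis.
print(fisher_ratio(w), fisher_ratio(np.array([1.0, 0.0])), fisher_ratio(np.array([0.0, 1.0])))
```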

A. If the discriminatory information is not in the mean but in the variance of the data

B. If the discriminatory information is in the mean but not in the variance of the data

C. If the discriminatory information is in the mean and variance of the data

D. None of these

**Solution: (A)**

Explanation: Option A is correct

1. **Both LDA and PCA are linear transformation techniques**
2. **LDA uses supervised learning, whereas PCA uses unsupervised learning**
3. **PCA maximizes the variance of the data, whereas LDA maximizes the separation between different classes**

A. 1 and 2

B. 2 and 3

C. 1 and 3

D. Only 3

E. 1, 2, and 3

**Solution: (E)**

Explanation: All of the options are correct

A. PCA will perform outstandingly

B. PCA will perform badly

C. Can’t say

D. None of above

**Solution: (B)**

Explanation: When all eigenvalues are roughly equal, every direction explains about the same amount of variance, so you won’t be able to select a useful subset of principal components.

1. **A linear structure in the data**
2. **If the data lies on a curved surface and not on a flat surface**
3. **If variables are scaled in the same unit**

A. 1 and 2

B. 2 and 3

C. 1 and 3

D. 1, 2 and 3

**Solution: (C)**

Explanation: Option C is correct

1. **The features will still have interpretability**
2. **The features will lose interpretability**
3. **The features must carry all information present in the data**
4. **The features may not carry all information present in the data**

A. 1 and 3

B. 1 and 4

C. 2 and 3

D. 2 and 4

**Solution: (D)**

Explanation: When you get the features in lower dimensions, then you will lose some information of data most of the time, and you won’t be able to interpret the lower dimension data.

**Select the angle which will capture maximum variability along a single axis.**

A. ~ 0 degree

B. ~ 45 degree

C. ~ 60 degree

D. ~ 90 degree

**Solution: (B)**

Explanation: Option B has the largest possible variance in data.

1. **You need to initialize parameters in PCA**
2. **You don’t need to initialize parameters in PCA**
3. **PCA can be trapped in local minima problem**
4. **PCA can’t be trapped in local minima problem**

A. 1 and 3

B. 1 and 4

C. 2 and 3

D. 2 and 4

**Solution: (D)**

Explanation: PCA is a deterministic algorithm. Unlike most machine learning algorithms, it has no parameters to initialize and no local minima problem.

**Question Context: 26**

The below snapshot shows the scatter plot of two features (X1 and X2) with the class information (Red, Blue). You can also see the direction of PCA and LDA.

A. Building a classification algorithm with PCA (A principal component in the direction of PCA)

B. Building a classification algorithm with LDA

C. Can’t say

D. None of these

**Solution: (B)**

Explanation: If our goal is to classify these points, the PCA projection does more harm than good: the majority of blue and red points would overlap on the first principal component. Hence, PCA would confuse the classifier.

1. **It can be used to effectively detect deformable objects.**
2. **It is invariant to affine transforms.**
3. **It can be used for lossy image compression.**
4. **It is not invariant to shadows.**

A. 1 and 2

B. 2 and 3

C. 3 and 4

D. 1 and 4

**Solution: (C)**

Explanation: Option C is correct

A. When data has zero median

B. When data has zero mean

C. Both are always the same

D. None of these

**Solution: (B)**

Explanation: They are the same when the data has a zero mean vector; otherwise, you have to center the data first before taking the SVD.
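The effect of centering can be checked with a small NumPy experiment on synthetic data. The shift of 10 and the column scales are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data with different spreads per column and a non-zero mean.
X = rng.normal(size=(100, 3)) @ np.diag([3.0, 2.0, 1.0]) + 10.0
X_centered = X - X.mean(axis=0)

# PCA direction: leading eigenvector of the covariance matrix.
eigvals, eigvecs = np.linalg.eigh(np.cov(X_centered, rowvar=False))
pc1 = eigvecs[:, -1]                       # eigh sorts eigenvalues in ascending order

# Leading right singular vector with and without centering.
_, _, Vt_centered = np.linalg.svd(X_centered, full_matrices=False)
_, _, Vt_raw = np.linalg.svd(X, full_matrices=False)

# Absolute cosine between directions (the sign of a component is arbitrary).
match_centered = abs(pc1 @ Vt_centered[0])
match_raw = abs(pc1 @ Vt_raw[0])
print(match_centered, match_raw)
```

On the centered data the two directions coincide, while the SVD of the raw data is pulled toward the mean of the data instead.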

**Question Context: 29 – 31**

Consider 3 data points in the 2-d space: (-1, -1), (0,0), (1,1).

1. $[\sqrt{2}/2, \sqrt{2}/2]$
2. $(1/\sqrt{3}, 1/\sqrt{3})$
3. $[-\sqrt{2}/2, -\sqrt{2}/2]$
4. $(-1/\sqrt{3}, -1/\sqrt{3})$

A. 1 and 2

B. 3 and 4

C. 1 and 3

D. 2 and 4

**Solution: (C)**

Explanation: The first principal component is $v = [\sqrt{2}/2, \sqrt{2}/2]^T$ (you shouldn’t really need to solve any SVD or eigenproblem to see this). Note that the principal component should be normalized to unit length. (The negation $v = [-\sqrt{2}/2, -\sqrt{2}/2]^T$ is also correct.)

**What are their coordinates in the 1-d subspace?**

A. (− √ 2 ), (0), (√ 2)

B. (√ 2 ), (0), (√ 2)

C. ( √ 2 ), (0), (-√ 2)

D. (-√ 2 ), (0), (-√ 2)

**Solution: (A)**

Explanation: The coordinates of the three points after projection are $z_1 = x_1^T v = [-1, -1]\,[\sqrt{2}/2, \sqrt{2}/2]^T = -\sqrt{2}$, $z_2 = x_2^T v = 0$, and $z_3 = x_3^T v = \sqrt{2}$.

**For the projected data, you just obtained projections ( (− √ 2 ), (0), (√ 2) ). We then represent them in the original 2-d space and consider them as the reconstruction of the original data points.**

A. 0%

B. 10%

C. 30%

D. 40%

**Solution: (A)**

Explanation: The reconstruction error is 0 since all three points lie exactly along the direction of the first principal component. Alternatively, you can calculate the reconstruction $\hat{x}_i = z_i \cdot v$ directly:

$\hat{x}_1 = -\sqrt{2} \cdot [\sqrt{2}/2, \sqrt{2}/2]^T = [-1, -1]^T$

$\hat{x}_2 = 0 \cdot [\sqrt{2}/2, \sqrt{2}/2]^T = [0, 0]^T$

$\hat{x}_3 = \sqrt{2} \cdot [\sqrt{2}/2, \sqrt{2}/2]^T = [1, 1]^T$

which are exactly $x_1$, $x_2$, $x_3$.
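The whole worked example for these three points (first principal component, one-dimensional coordinates, and reconstruction) can be reproduced in a few lines of NumPy. The sign flip is only there because the sign of a principal component is arbitrary.

```python
import numpy as np

X = np.array([[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]])   # already zero-mean

# The first right singular vector is the first principal component.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
v = Vt[0]
if v[0] < 0:           # fix the arbitrary sign so v points into the first quadrant
    v = -v

z = X @ v                      # coordinates in the 1-d subspace
X_hat = np.outer(z, v)         # reconstruction in the original 2-d space
error = np.linalg.norm(X - X_hat)

print(v, z, error)
```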

A. LD1

B. LD2

C. Both

D. None of these

**Solution: (A)**

Explanation: LD1 is a good projection because it best separates the classes.

**Question Context: 33**

PCA is a good technique to try because it is simple to understand and is commonly used to reduce the dimensionality of the data. Obtain the eigenvalues λ1 ≥ λ2 ≥ ... ≥ λN and plot f(M) to see how it increases with M and takes the maximum value 1 at M = D.

We have two graphs given below:

A. Left

B. Right

C. Any of A and B

D. None of these

**Solution: (A)**

Explanation: PCA is good if f(M) asymptotes rapidly to 1. This happens if the first eigenvalues are big and the remainder is small. PCA is bad if all the eigenvalues are roughly equal. See examples of both cases in the figure.
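The definition of f(M) is not reproduced in the text. The sketch below assumes it is the usual cumulative fraction of variance explained by the M largest eigenvalues, which is consistent with f(M) reaching 1 when all components are kept. The eigenvalues are made up to contrast the two cases.

```python
import numpy as np

def explained_fraction(eigenvalues):
    """Cumulative share of total variance captured by the M largest eigenvalues."""
    ordered = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    return np.cumsum(ordered) / ordered.sum()

# A few dominant eigenvalues: the curve approaches 1 quickly (PCA works well).
fast = explained_fraction([8.0, 1.0, 0.5, 0.3, 0.2])

# Roughly equal eigenvalues: the curve rises linearly (PCA helps little).
flat = explained_fraction([2.0, 2.0, 2.0, 2.0, 2.0])

print(fast)
print(flat)
```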

A. LDA explicitly attempts to model the difference between the classes of data. On the other hand, PCA does not consider any difference in class.

B. Both attempt to model the difference between the classes of data.

C. PCA explicitly attempts to model the difference between the classes of data. LDA, on the other hand, does not consider any difference in class.

D. Both don’t attempt to model the difference between the classes of data.

**Solution: (A)**

Explanation: Options are self-explanatory.

1. **(0.5, 0.5, 0.5, 0.5) and (0.71, 0.71, 0, 0)**
2. **(0.5, 0.5, 0.5, 0.5) and (0, 0, -0.71, -0.71)**
3. **(0.5, 0.5, 0.5, 0.5) and (0.5, 0.5, -0.5, -0.5)**
4. **(0.5, 0.5, 0.5, 0.5) and (-0.5, -0.5, 0.5, 0.5)**

A. 1 and 2

B. 1 and 3

C. 2 and 4

D. 3 and 4

**Solution: (D)**

Explanation: The two loading vectors are not orthogonal for the first two choices.
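The orthogonality check behind this answer is a dot product per candidate pair, as in this short sketch:

```python
import numpy as np

first = np.array([0.5, 0.5, 0.5, 0.5])
candidates = {
    1: np.array([0.71, 0.71, 0.0, 0.0]),
    2: np.array([0.0, 0.0, -0.71, -0.71]),
    3: np.array([0.5, 0.5, -0.5, -0.5]),
    4: np.array([-0.5, -0.5, 0.5, 0.5]),
}

# Two loading vectors are orthogonal when their dot product is zero.
orthogonal = [k for k, second in candidates.items()
              if np.isclose(first @ second, 0.0)]
print(orthogonal)  # [3, 4]
```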

1. **If the classes are well separated, the parameter estimates for logistic regression can be unstable.**
2. **If the sample size is small and the distribution of features is normal for each class. In such cases, linear discriminant analysis is more stable than logistic regression.**

A. 1

B. 2

C. 1 and 2

D. None of these

**Solution: (C)**

Explanation: Refer to this video

A. Vertical offset

B. Perpendicular offset

C. Both

D. None of these

**Solution: (B)**

Explanation: Residuals in regression are always measured as vertical offsets, whereas PCA works with perpendicular offsets.

A. 20

B. 9

C. 21

D. 11

E. 10

**Solution: (B)**

Explanation: LDA produces at most c − 1 discriminant vectors, where c is the number of classes. You may refer to this link for more information.
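The c − 1 limit comes from the rank of the between-class scatter matrix. The sketch below assumes 10 classes (which would match the answer of 9), 20 features, random class means, and equal class sizes; all of these are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, n_features = 10, 20             # hypothetical problem size

class_means = rng.normal(size=(n_classes, n_features))
overall_mean = class_means.mean(axis=0)    # equal class sizes assumed

# Between-class scatter: spread of the class means around the overall mean.
centered = class_means - overall_mean
S_b = centered.T @ centered                # 20 x 20 matrix

# The centered means sum to zero, so at most c - 1 of them are independent.
rank = np.linalg.matrix_rank(S_b)
print(rank)
```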

**Question Context: 39**

The given dataset consists of images of the “Hoover Tower” and some other towers. Now, you want to use PCA (Eigenface) and the nearest neighbor method to build a classifier that predicts whether a new image depicts a “Hoover tower” or not. The figure gives a sample of your input training dataset images.

1. **Align the towers in the same position in the image.**
2. **Scale or crop all images to the same size.**

A. 1

B. 2

C. 1 and 2

D. None of these

**Solution: (C)**

Explanation: Both statements are correct.

A. 7

B. 30

C. 40

D. Can’t Say

**Solution: (B)**

Explanation: We can see in the above figure that 30 components give the highest variance with the lowest number of components. Hence option ‘B’ is the right answer.

A. Yes, we can use a type of neural network called autoencoder with an activation function for dimensionality reduction.

A. Principal Component Analysis, Linear Discriminant Analysis, and t-distributed Stochastic Neighbor Embedding are three examples of dimensionality reduction.

A. It can be used in data mining of big data so that we can easily use various learning techniques on the resultant data.

The Dimensionality Reduction MCQs skill test provided a comprehensive exploration of various techniques essential in artificial intelligence and data science. With 582 participants, the test covered theoretical concepts and practical applications, ranging from Principal Component Analysis (PCA) to Linear Discriminant Analysis (LDA) and t-SNE. The distribution of scores reflected varying levels of understanding among participants. Furthermore, the helpful resources provided offer avenues for further learning and skill enhancement. Whether you aimed to test your knowledge or deepen your understanding, this test served as a valuable tool. Keep exploring and practicing to excel in the dynamic field of data science. Visit our platform for more insightful content and challenging competitions across diverse domains.


Hi, I think the answers and explanations of questions 10 and 11 are not in sync. Please revisit and correct.

Hi Pratima, thanks for noticing! I changed the explanation of question number 10, which was addressing some other issue. The answers for questions 10 and 11 remain the same. Best regards, Ankit Gupta

Hi, could it be that in question 33 the solution and explanation contradict each other, or did I get it wrong?

Hi Marvin, the explanation is correct but the solution was incorrectly marked. Thanks for noticing! Best, Ankit Gupta


Answer for the Q.20 should be C because both LDA and PCA are unsupervised methods.