# Machine Learning Multiple Choice Questions

Carvia Tech | September 10, 2019 | 4 min read | 117,792 views

1. Which of the following is a widely used and effective machine learning algorithm based on the idea of bagging?

1. Decision Tree

2. Regression

3. Classification

4. Random Forest - answer
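
Random Forest is the canonical bagging-based algorithm: each tree is trained on a bootstrap sample of the data, and the ensemble averages the trees' votes. A minimal scikit-learn sketch (the dataset and hyperparameters here are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Each tree trains on a bootstrap resample (bagging); predictions are
# aggregated across all trees in the ensemble.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
clf = RandomForestClassifier(n_estimators=50, bootstrap=True, random_state=0)
clf.fit(X, y)
print(len(clf.estimators_))   # one fitted decision tree per ensemble member
```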

2. To find the minimum or the maximum of a function, we set the gradient to zero because:

1. The value of the gradient at extrema of a function is always zero - answer

2. Depends on the type of problem

3. Both A and B

4. None of the above
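
The rule is easy to verify on a concrete function: for f(x) = (x - 3)^2 the gradient 2(x - 3) vanishes exactly at the minimum x = 3. A quick check:

```python
# f(x) = (x - 3)^2 has its minimum at x = 3, where the gradient vanishes.
def f(x):
    return (x - 3) ** 2

def grad(x):
    return 2 * (x - 3)

print(grad(3.0))   # 0.0 -- the gradient is zero at the extremum
```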

3. The most widely used metrics and tools to assess a classification model are:

1. Confusion matrix

2. Cost-sensitive accuracy

3. Area under the ROC curve

4. All of the above - answer
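
All three tools are available in scikit-learn; a small sketch with made-up labels and scores:

```python
from sklearn.metrics import confusion_matrix, roc_auc_score

y_true = [0, 0, 1, 1, 1, 0]
y_score = [0.1, 0.4, 0.8, 0.9, 0.3, 0.2]      # predicted probabilities
y_pred = [int(s >= 0.5) for s in y_score]      # threshold at 0.5

cm = confusion_matrix(y_true, y_pred)          # rows: true class, cols: predicted
auc = roc_auc_score(y_true, y_score)           # area under the ROC curve
print(cm)
print(auc)
```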

4. Which of the following is a good test dataset characteristic?

1. Large enough to yield meaningful results

2. Is representative of the dataset as a whole

3. Both A and B - answer

4. None of the above

5. Which of the following is a disadvantage of decision trees?

1. Factor analysis

2. Decision trees are robust to outliers

3. Decision trees are prone to overfitting - answer

4. None of the above

6. How do you handle missing or corrupted data in a dataset?

1. Drop missing rows or columns

2. Replace missing values with mean/median/mode

3. Assign a unique category to missing values

4. All of the above - answer
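
The three strategies map directly onto pandas operations; a sketch on a toy frame (the column names are illustrative):

```python
import numpy as np
import pandas as pd

# Toy frame with missing entries.
df = pd.DataFrame({"age": [25, np.nan, 40, 31],
                   "city": ["NY", None, "LA", "NY"]})

dropped = df.dropna()                                        # option 1: drop rows
filled = df.assign(age=df["age"].fillna(df["age"].mean()))   # option 2: mean-impute
flagged = df.assign(city=df["city"].fillna("missing"))       # option 3: own category
print(filled["age"].tolist())
```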

7. What is the purpose of performing cross-validation?

1. To assess the predictive performance of the models

2. To judge how the trained model performs outside the sample on test data

3. Both A and B - answer
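
A minimal scikit-learn sketch of k-fold cross-validation (the iris dataset and 5 folds are illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# 5-fold CV: each fold is held out once as out-of-sample test data
# while the model trains on the remaining four folds.
X, y = load_iris(return_X_y=True)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores.mean())
```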

8. Why is second order differencing in time series needed?

1. To make the series stationary (remove trend/non-stationarity)

2. To find the maxima or minima at the local point

3. Both A and B - answer

4. None of the above
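
Differencing in action: a series with a quadratic trend is non-stationary, and second-order differencing reduces it to a constant. A NumPy sketch:

```python
import numpy as np

# A quadratic trend makes the series non-stationary; differencing twice
# removes it, leaving a constant series.
t = np.arange(8)
series = t ** 2                      # 0, 1, 4, 9, 16, ...
second_diff = np.diff(series, n=2)   # second-order differencing
print(second_diff)                   # constant: the trend is gone
```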

9. When performing regression or classification, which of the following is the correct way to preprocess the data?

1. Normalize the data → PCA → training - answer

2. PCA → normalize PCA output → training

3. Normalize the data → PCA → normalize PCA output → training

4. None of the above
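
The prescribed order drops straight into a scikit-learn Pipeline (the estimator choices here are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Normalize -> PCA -> training, in exactly the order the answer prescribes.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=2)),
    ("clf", LogisticRegression(max_iter=1000)),
])
X, y = load_iris(return_X_y=True)
pipe.fit(X, y)
print(pipe.score(X, y))
```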

10. Which of the following is an example of feature extraction?

1. Constructing bag of words vector from an email

2. Applying a PCA projection to large high-dimensional data

3. Removing stopwords in a sentence

4. All of the above - answer

11. What is `pca.components_` in scikit-learn?

1. The set of eigenvectors for the projection space - answer

2. Matrix of principal components

3. Result of the multiplication matrix

4. None of the above options
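
A quick way to see this: fit a PCA and inspect the attribute. Each row of `pca.components_` is one unit-length eigenvector (principal axis) of the data's covariance structure:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))        # random data, purely illustrative

pca = PCA(n_components=2).fit(X)
# One row per retained component, one column per original feature.
print(pca.components_.shape)                       # (2, 4)
print(np.linalg.norm(pca.components_, axis=1))     # each row has unit norm
```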

12. Which of the following is true about Naive Bayes?

1. Assumes that all the features in a dataset are equally important

2. Assumes that all the features in a dataset are independent

3. Both A and B - answer

4. None of the above options
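
Both assumptions are what make the classifier "naive": class-conditional likelihoods are multiplied feature by feature, with no feature privileged over another. A minimal GaussianNB sketch on iris:

```python
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

# GaussianNB fits an independent Gaussian per feature per class and
# multiplies the per-feature likelihoods at prediction time.
X, y = load_iris(return_X_y=True)
model = GaussianNB().fit(X, y)
print(model.score(X, y))
```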

13. Which of the following statements about regularization is correct?

1. Using too large a value of lambda can cause your hypothesis to underfit the data. - answer

2. Using too large a value of lambda can cause your hypothesis to overfit the data.

3. Using a very large value of lambda cannot hurt the performance of your hypothesis.

4. None of the above

14. How can you prevent a clustering algorithm from getting stuck in bad local optima?

1. Set the same seed value for each run

2. Use multiple random initializations - answer

3. Both A and B

4. None of the above
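
In scikit-learn's KMeans this is the `n_init` parameter: the algorithm is restarted from several random initializations and the run with the lowest inertia is kept. A sketch:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# n_init=10: ten random initializations; the best run (lowest inertia)
# is returned, reducing the risk of a bad local optimum.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.inertia_)
```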

15. Which of the following techniques can be used for normalization in text mining?

1. Stemming

2. Lemmatization

3. Stop Word Removal

4. Both A and B - answer
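
A toy illustration of stemming (a crude suffix stripper, not the real Porter algorithm; the helper `crude_stem` is invented for this sketch):

```python
# Strip common inflectional suffixes, keeping at least a 3-letter stem.
def crude_stem(word):
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print([crude_stem(w) for w in ["running", "jumped", "cats", "run"]])
```

Note that `crude_stem` maps "running" to "runn" rather than "run" — real stemmers carry many more rewrite rules, while lemmatizers instead look words up in a vocabulary to return proper dictionary forms.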

16. In which of the following cases will K-means clustering fail to give good results? 1) Data points with outliers 2) Data points with different densities 3) Data points with nonconvex shapes

1. 1 and 2

2. 2 and 3

3. 1, 2, and 3 - answer

4. 1 and 3

17. Which of the following is a reasonable way to select the number of principal components "k"?

1. Choose k to be the smallest value so that at least 99% of the variance is retained. - answer

2. Choose k to be 99% of m (k = 0.99*m, rounded to the nearest integer).

3. Choose k to be the largest value so that 99% of the variance is retained.

4. Use the elbow method
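
The "smallest k retaining 99% of the variance" rule reads directly off the cumulative explained-variance ratio (random data here, purely for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))

pca = PCA().fit(X)
cumulative = np.cumsum(pca.explained_variance_ratio_)
# Smallest k such that at least 99% of the variance is retained:
k = int(np.searchsorted(cumulative, 0.99)) + 1
print(k, cumulative[k - 1])
```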

18. You run gradient descent for 15 iterations with learning rate alpha = 0.3 and compute J(theta) after each iteration. You find that the value of J(theta) decreases quickly and then levels off. Based on this, which of the following conclusions seems most plausible?

1. Rather than using the current value of alpha, use a larger value (say alpha = 1.0)

2. Rather than using the current value of alpha, use a smaller value (say alpha = 0.1)

3. alpha = 0.3 is an effective choice of learning rate - answer

4. None of the above
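
The scenario is easy to reproduce on a toy objective J(theta) = theta^2 with alpha = 0.3: the cost drops fast, then flattens near the minimum:

```python
# Toy objective J(theta) = theta^2, whose gradient is 2*theta.
def J(theta):
    return theta ** 2

theta, alpha = 5.0, 0.3
history = []
for _ in range(15):
    theta -= alpha * 2 * theta      # gradient descent update
    history.append(J(theta))

print(history[0], history[-1])      # decreases quickly, then levels off
```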

19. What is a sentence parser typically used for?

1. It is used to parse sentences to check if they are UTF-8 compliant.

2. It is used to parse sentences to derive their most likely syntax tree structures. - answer

3. It is used to parse sentences to assign POS tags to all tokens.

4. It is used to check if sentences can be parsed into meaningful tokens.

20. Suppose you have trained a logistic regression classifier, and on a new example x it outputs the prediction h(x) = 0.2. This means:

1. Our estimate for P(y=1 | x) is 0.8

2. Our estimate for P(y=0 | x) is 0.8 - answer

3. Our estimate for P(y=1 | x) is 0.2

4. Our estimate for P(y=0 | x) is 0.2
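
The arithmetic behind the answer: logistic regression's output is read as an estimate of P(y=1 | x), so an output of 0.2 gives P(y=0 | x) = 1 - 0.2 = 0.8:

```python
import math

# Logistic regression outputs sigmoid(theta . x), interpreted as P(y=1 | x).
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

h = 0.2          # the classifier's output on the new example
p_y1 = h         # P(y=1 | x) = 0.2
p_y0 = 1 - h     # P(y=0 | x) = 0.8
print(p_y0)
```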
