Machine Learning Multiple-Choice Questions
Upasana  September 10, 2019  4 min read  117,792 views

Which of the following is a widely used and effective machine learning algorithm based on the idea of bagging?

Decision Tree

Regression

Classification

Random Forest  answer


To find the minimum or the maximum of a function, we set the gradient to zero because:

The value of the gradient at extrema of a function is always zero  answer

Depends on the type of problem

Both A and B

None of the above
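
As a quick numerical illustration of the gradient vanishing at an extremum, the sketch below approximates the derivative of a hypothetical one-variable function. The function `f`, the helper `numerical_gradient`, and the step size `h` are assumptions made for this example only.

```python
def f(x):
    # Hypothetical example function with a single minimum at x = 3.
    return (x - 3.0) ** 2 + 2.0


def numerical_gradient(func, x, h=1e-5):
    # Central-difference approximation of the derivative of func at x.
    return (func(x + h) - func(x - h)) / (2.0 * h)


grad_at_minimum = numerical_gradient(f, 3.0)
grad_elsewhere = numerical_gradient(f, 5.0)

print(grad_at_minimum)  # very close to 0 at the minimum
print(grad_elsewhere)   # clearly non-zero away from the minimum
```

Note that a zero gradient is a necessary condition only: saddle points also have zero gradient, so the candidates found this way still need to be checked.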


The most widely used metrics and tools to assess a classification model are:

Confusion matrix

Cost-sensitive accuracy

Area under the ROC curve

All of the above  answer
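
To make two of these tools concrete, here is a minimal pure-Python sketch of a binary confusion matrix and the area under the ROC curve. The helper names, the toy labels and scores, and the 0.5 decision threshold are all assumptions for illustration; in practice a library implementation would normally be used.

```python
def confusion_matrix(y_true, y_pred):
    # Returns counts as (tn, fp, fn, tp) for binary labels 0/1.
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1 and p == 0:
            fn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def roc_auc(y_true, scores):
    # AUC as the probability that a random positive is scored above a
    # random negative (ties count as one half).
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

cm = confusion_matrix(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(cm)   # (2, 0, 1, 1)
print(auc)  # 0.75
```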


Which of the following is a good test dataset characteristic?

Large enough to yield meaningful results

Is representative of the dataset as a whole

Both A and B  answer

None of the above


Which of the following is a disadvantage of decision trees?

Factor analysis

Decision trees are robust to outliers

Decision trees are prone to overfitting  answer

None of the above


How do you handle missing or corrupted data in a dataset?

Drop missing rows or columns

Replace missing values with mean/median/mode

Assign a unique category to missing values

All of the above  answer
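
The three strategies listed above can be sketched with the standard library alone. The column names and values below are hypothetical, and missing entries are assumed to be represented as `None`.

```python
import statistics

# Hypothetical numeric column with missing entries represented as None.
ages = [25, None, 31, 25, None, 40]
observed = [a for a in ages if a is not None]

# Option 1: drop the rows with missing values.
dropped = observed

# Option 2: replace missing values with the mean, median, or mode.
mean_filled = [a if a is not None else statistics.mean(observed) for a in ages]
median_filled = [a if a is not None else statistics.median(observed) for a in ages]
mode_filled = [a if a is not None else statistics.mode(observed) for a in ages]

# Option 3 (categorical data): give missing values their own category.
colors = ["red", None, "blue", None]
category_filled = [c if c is not None else "missing" for c in colors]

print(mean_filled)      # missing entries become 30.25
print(category_filled)  # ['red', 'missing', 'blue', 'missing']
```

Which option is appropriate depends on how much data is missing and why; the sketch only shows the mechanics.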


What is the purpose of performing cross-validation?

To assess the predictive performance of the models

To judge how the trained model performs outside the sample on test data

Both A and B  answer
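
As a rough sketch of how cross-validation partitions the data, the generator below produces train/test index pairs for k contiguous folds. The function name `k_fold_indices` is hypothetical, and shuffling (which is usually applied first) is left out to keep the example short.

```python
def k_fold_indices(n_samples, k):
    # Yield (train_indices, test_indices) for each of k contiguous folds.
    # The first (n_samples % k) folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(n_samples=10, k=3))
for train_idx, test_idx in folds:
    print(test_idx)
```

Each sample lands in exactly one test fold, so every data point is used once for evaluation and k-1 times for training.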


Why is second order differencing in time series needed?

To remove non-stationarity (that is, to make the series stationary)

To find the maxima or minima at the local point

Both A and B  answer

None of the above
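
Differencing is commonly applied to remove a trend from a series. The sketch below, which uses a hypothetical series with a quadratic trend, shows a case where first-order differencing still leaves a trend and second-order differencing yields a constant series.

```python
def difference(series):
    # First-order differencing: y[t] - y[t-1].
    return [series[t] - series[t - 1] for t in range(1, len(series))]


# Hypothetical series with a quadratic trend: t^2 for t = 0..6.
series = [t * t for t in range(7)]

first_diff = difference(series)       # still trending (linear)
second_diff = difference(first_diff)  # constant, trend removed

print(first_diff)   # [1, 3, 5, 7, 9, 11]
print(second_diff)  # [2, 2, 2, 2, 2]
```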


When performing regression or classification, which of the following is the correct way to preprocess the data?

Normalize the data → PCA → training  answer

PCA → normalize PCA output → training

Normalize the data → PCA → normalize PCA output → training

None of the above
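
PCA is sensitive to the scale of each feature, which is why normalization is usually done first. A minimal sketch of that first step (z-score standardization of one column) is shown below; the helper `standardize` and the sample values are assumptions, and the PCA step itself is not shown.

```python
import statistics


def standardize(column):
    # Z-score scaling: subtract the mean, divide by the population std.
    mean = statistics.mean(column)
    std = statistics.pstdev(column)
    return [(v - mean) / std for v in column]


# Hypothetical feature column (for example, heights in centimetres).
heights = [150.0, 160.0, 170.0, 180.0, 190.0]
scaled = standardize(heights)
print(scaled)
```

After this step every feature has zero mean and unit variance, so no single feature dominates the principal components merely because of its units.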


Which of the following is an example of feature extraction?

Constructing bag of words vector from an email

Applying PCA to project high-dimensional data onto fewer dimensions

Removing stopwords in a sentence

All of the above  answer
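
The bag-of-words and stop-word options can be combined in a few lines. The sketch below is illustrative only: the stop-word list is a tiny hypothetical one, and tokenization is a plain whitespace split.

```python
from collections import Counter

# Hypothetical stop-word list, kept tiny for illustration.
STOP_WORDS = {"the", "a", "is", "to", "of"}


def bag_of_words(text):
    # Lowercase, split on whitespace, strip punctuation, drop stop words,
    # then count the remaining tokens.
    tokens = [tok.strip(".,!?").lower() for tok in text.split()]
    tokens = [tok for tok in tokens if tok and tok not in STOP_WORDS]
    return Counter(tokens)


email = "The meeting is moved to Monday. Reply to confirm the meeting."
vector = bag_of_words(email)
print(vector)
```

The resulting counts can be used directly as features for a classifier.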


What is pca.components_ in scikit-learn?

Set of all eigenvectors for the projection space  answer

Matrix of principal components

Result of the multiplication matrix

None of the above options


Which of the following is true about Naive Bayes?

Assumes that all the features in a dataset are equally important

Assumes that all the features in a dataset are independent

Both A and B  answer

None of the above options


Which of the following statements about regularization is not correct?

Using too large a value of lambda can cause your hypothesis to underfit the data.

Using too large a value of lambda can cause your hypothesis to overfit the data.

Using a very large value of lambda cannot hurt the performance of your hypothesis.

None of the above  answer


How can you prevent a clustering algorithm from getting stuck in bad local optima?

Set the same seed value for each run

Use multiple random initializations  answer

Both A and B

None of the above
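
A minimal sketch of the multiple-initialization idea, using a hand-written 1-D version of Lloyd's algorithm: each run starts from different random centers, and the run with the lowest inertia (within-cluster sum of squares) is kept. The function `kmeans_1d`, the data, and the number of restarts are assumptions for illustration.

```python
import random


def kmeans_1d(points, k, rng, n_iter=20):
    # Plain Lloyd's algorithm on 1-D data; returns (centers, inertia).
    centers = rng.sample(points, k)
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: (p - centers[j]) ** 2)
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)]
    inertia = sum(min((p - c) ** 2 for c in centers) for p in points)
    return centers, inertia


points = [1.0, 1.2, 0.8, 5.0, 5.1, 4.9, 9.0, 9.2, 8.8]
rng = random.Random(0)

# Run several random initializations and keep the lowest-inertia result.
runs = [kmeans_1d(points, k=3, rng=rng) for _ in range(10)]
best_centers, best_inertia = min(runs, key=lambda run: run[1])
print(sorted(best_centers), best_inertia)
```

Fixing a single seed makes a run reproducible but does not by itself help the algorithm escape a poor starting point, which is why several different initializations are compared.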


Which of the following techniques can be used for normalization in text mining?

Stemming

Lemmatization

Stop Word Removal

Both A and B  answer


In which of the following cases will K-means clustering fail to give good results? 1) Data points with outliers 2) Data points with different densities 3) Data points with non-convex shapes

1 and 2

2 and 3

1, 2, and 3  answer

1 and 3


Which of the following is a reasonable way to select the number of principal components "k"?

Choose k to be the smallest value so that at least 99% of the variance is retained.  answer

Choose k to be 99% of m (k = 0.99*m, rounded to the nearest integer).

Choose k to be the largest value so that 99% of the variance is retained.

Use the elbow method
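
The "smallest k that retains 99% of the variance" rule can be sketched as a cumulative sum over the per-component variances. The function `smallest_k` and the eigenvalues below are hypothetical; the input is assumed to be sorted in descending order, as PCA eigenvalues normally are.

```python
def smallest_k(explained_variances, threshold=0.99):
    # explained_variances: per-component variances sorted in descending
    # order (for example, eigenvalues of the covariance matrix).
    total = sum(explained_variances)
    cumulative = 0.0
    for k, var in enumerate(explained_variances, start=1):
        cumulative += var
        if cumulative / total >= threshold:
            return k
    return len(explained_variances)


# Hypothetical eigenvalues of a covariance matrix.
eigenvalues = [6.0, 3.0, 0.95, 0.04, 0.01]
k = smallest_k(eigenvalues)
print(k)  # 3: the first three components retain 99.5% of the variance
```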


You run gradient descent for 15 iterations with α = 0.3 and compute J(θ) after each iteration. You find that the value of J(θ) decreases quickly and then levels off. Based on this, which of the following conclusions seems most plausible?

Rather than using the current value of α, use a larger value of α (say α = 1.0)

Rather than using the current value of α, use a smaller value of α (say α = 0.1)

α = 0.3 is an effective choice of learning rate  answer

None of the above
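
The behaviour described in the question (a cost that drops quickly and then levels off) can be reproduced on a toy problem. The sketch below assumes a hypothetical one-parameter quadratic cost and an arbitrary starting point; only the learning rate of 0.3 and the 15 iterations come from the question.

```python
def cost(theta):
    # Hypothetical convex cost with its minimum at theta = 2.
    return (theta - 2.0) ** 2


def cost_gradient(theta):
    # Derivative of the cost above with respect to theta.
    return 2.0 * (theta - 2.0)


alpha = 0.3   # learning rate from the question
theta = 10.0  # arbitrary starting point (an assumption)
history = []
for _ in range(15):
    theta -= alpha * cost_gradient(theta)
    history.append(cost(theta))

for i, j in enumerate(history, start=1):
    print(i, j)
```

On this toy cost the values shrink rapidly in the first few iterations and then flatten near zero, which is the pattern expected from a well-chosen learning rate.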


What is a sentence parser typically used for?

It is used to parse sentences to check if they are UTF-8 compliant.

It is used to parse sentences to derive their most likely syntax tree structures.  answer

It is used to parse sentences to assign POS tags to all tokens.

It is used to check if sentences can be parsed into meaningful tokens.


Suppose you have trained a logistic regression classifier, and for a new example x it outputs the prediction h_θ(x) = 0.2. This means:

Our estimate for P(y=1 | x)

Our estimate for P(y=0 | x)  answer

Our estimate for P(y=1 | x)

Our estimate for P(y=0 | x)
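
In logistic regression the hypothesis output is read as the estimated probability that y = 1 given x, and the probability of y = 0 is its complement. The sketch below works this through for an output of 0.2; the score `z` is a hypothetical value chosen so that the sigmoid returns 0.2.

```python
import math


def sigmoid(z):
    # Logistic function used as the hypothesis of logistic regression.
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical score z chosen so that the hypothesis outputs 0.2.
z = math.log(0.2 / 0.8)

p_y1 = sigmoid(z)  # estimate of P(y = 1 | x)
p_y0 = 1.0 - p_y1  # estimate of P(y = 0 | x)

print(round(p_y1, 3))  # 0.2
print(round(p_y0, 3))  # 0.8
```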
