It’s time to put what you’ve learned to the test. Get 8 questions right to pass this week.
Q1.
How do you calculate the interquartile range value?
Choose the correct answer.
A
Q3-Q1
B
Q4-Q2
C
Q2-Q1
D
Q4-Q1
Q2.
In which task of the Modeling phase of the Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology do you describe the plan for training, testing, and evaluating the models?
Choose the correct answer.
A
Select modeling technique
B
Generate test design
C
Build model
D
Assess model
Q3.
Which types of analysis can be heavily biased by outliers?
There are 3 correct answers.
A
Linear regression
B
Simple mean
C
Correlation
D
Median
E
Trimmed mean
Q4.
What is the confidence of the rule {Product A} => {Product B} if the total number of baskets analyzed is 300, 100 customers purchased Product A, 60 customers purchased Product B, and 20 customers purchased both Product A and Product B?
Choose the correct answer.
A
20.00%
B
166.67%
C
60.00%
D
6.67%
Q5.
Which statements regarding association rules are true?
There are 2 correct answers.
A
The lift of a rule is symmetric.
B
The support of a rule is symmetric.
C
The confidence of a rule is symmetric.
D
Any rule with a lift greater than 1 does not indicate a real cross-selling opportunity.
Q6.
What does a Silhouette value of -1 indicate?
Choose the correct answer.
A
There is an error, as Silhouette is measured between 0 and +1.
B
All records are located on the cluster centers of some other cluster.
C
On average, the records are equidistant between their own cluster center and the next nearest cluster center.
D
All records are located directly on their cluster centers.
Q7.
Which parameter settings can influence a cluster model?
There are 3 correct answers.
A
The number of clusters to be created
B
The choice of distance measure
C
The support
D
The lift
E
The initial choice of cluster centers
Q8.
What do you try to achieve when building a robust model?
Choose the correct answer.
A
No training error with high test error
B
High training error with low test error
C
Low training error with low test error
D
Low training error with high test error
Q9.
What does a coefficient of determination (R²) value of 0.4 indicate?
Choose the correct answer.
A
40 percent of the variance in Y (the target) is predictable from X (the explanatory variables).
B
60 percent of the variance in Y (the target) is predictable from X (the explanatory variables).
C
60 percent of the variance in X (the explanatory variables) is predictable from Y (the target).
D
40 percent of the variance in X (the explanatory variables) is predictable from Y (the target).
Q10.
Which outputs are generated in the Modeling phase of the Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology?