Ghost Map

1. (1 point) Write a one-sentence summary of how John Snow used his crude form of data mining to conclude that the Broad Street well was the source of the cholera outbreak.

Types of data and measurement scales

2. (8 points) Categorize each description using the four measurement scales: write the letter of the matching scale (a to d) in the blank before it.

Scales:

a. nominal

b. ordinal

c. interval

d. ratio

Descriptions:

____ customer service rating from 1 to 5

____ gender: male or female

____ today’s low temperature is 50°F and today’s high is 75°F

____ hair color, such as black, brown, or red

____ he is 6 feet tall

____ pain level from 1 to 10

____ average age in the course is 24.3

____ he ran the mile in exactly 4 minutes
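To make the distinction between the scales concrete, here is a minimal Python sketch. The `Scale` enum and `allowed_operations` function are hypothetical illustrations, not part of the assignment; they encode the usual rule that each scale supports every operation of the scales below it plus one more.

```python
from enum import Enum


class Scale(Enum):
    """The four measurement scales, ordered from least to most informative."""
    NOMINAL = 1   # unordered categories (e.g., blood type)
    ORDINAL = 2   # ordered categories, unequal gaps (e.g., T-shirt sizes S/M/L)
    INTERVAL = 3  # equal intervals, arbitrary zero (e.g., calendar year)
    RATIO = 4     # equal intervals and a true zero (e.g., weight in kg)


def allowed_operations(scale: Scale) -> set[str]:
    """Return the comparisons/arithmetic that are meaningful on a scale.

    Each scale keeps every operation of the scales below it and adds one.
    """
    ops = {"equality"}                     # nominal: same / different
    if scale.value >= Scale.ORDINAL.value:
        ops.add("ordering")                # ordinal: greater / less than
    if scale.value >= Scale.INTERVAL.value:
        ops.add("difference")              # interval: subtraction is meaningful
    if scale.value >= Scale.RATIO.value:
        ops.add("ratio")                   # ratio: "twice as much" is meaningful
    return ops
```

For instance, on an interval scale such as calendar year, the gap between 2000 and 2010 is meaningful, but calling 2010 "1.005 times" 2000 is not, because year 0 does not mark an absence of time.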

Scatter diagram

3. (1 point) In one sentence, explain the correlation shown in the scatter diagram between the number of beach visitors and the average daily temperature.
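The direction and strength of a correlation can be quantified with Pearson's r. The sketch below is a minimal implementation; the scatter diagram's data is not reproduced here, so the `temps` and `visitors` values are hypothetical placeholders, and `pearson_r` is an illustrative helper name.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient r, ranging from -1 to +1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of x and y around their means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Spread of each variable around its own mean
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# HYPOTHETICAL readings, not the data from the scatter diagram
temps = [60.0, 70.0, 80.0, 90.0]         # average daily temperature (F)
visitors = [100.0, 220.0, 310.0, 450.0]  # number of beach visitors
r = pearson_r(temps, visitors)
```

With these made-up readings r comes out close to +1, a strong positive correlation: warmer days go with more visitors. A value near 0 would indicate no linear relationship, and a value near -1 a strong negative one.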

4. (3 points) Gini Index

Use the following table to calculate your answers to the three questions below.

a. What is the Gini Index for Home Owners?

b. What is the Gini Index for non-Home Owners?

c. Compute the weighted average Gini Index for the Home Owner attribute.

5. (1 point) Bayes' Theorem

Probability of a dangerous fire = 1%

Probability of smoke (common, mainly due to barbecues) = 10%

Probability of smoke when there is a dangerous fire = 90%

Calculate the probability of a dangerous fire when there is smoke.

6. (6 points) Decision Trees

a. Examine the following dataset. If a data point with an x coordinate of 3 is added, what color would it be?

b. Given the following dataset, write a rule for each color of data point.

1) green data points

2) red data points

3) blue data points
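Decision tree rules are typically nested threshold tests on one coordinate at a time, and classifying a new point means walking those tests in order. The sketch below shows that shape only; the dataset is not reproduced here, so the thresholds 2.0 and 1.5 and the `classify` function are hypothetical placeholders to be replaced with the boundaries read off the dataset.

```python
def classify(x: float, y: float) -> str:
    """Axis-aligned rules of the kind a decision tree learns.

    The thresholds below are HYPOTHETICAL placeholders; replace them with
    the boundaries read off the dataset.
    """
    if x < 2.0:      # hypothetical first split, on the x coordinate
        return "green"
    if y < 1.5:      # hypothetical second split, on the y coordinate
        return "red"
    return "blue"


# A new point is classified by following the rules from the top down
new_color = classify(3.0, 1.0)
```

Each path from the top test to a returned color corresponds to one written rule, such as "if x < 2, the point is green".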

c. Calculate the Gini impurity of each branch of the following imperfect split.

1) Left =

2) Right =