The following dataset is a historical record of 14 houses that were sold in a small town in BC. It is used to predict whether a new house in the same town, listed at a specific price, will be sold within 10 days, based on certain attributes. Only four attributes (price, number of bedrooms, size, and distance to the bus stop) are considered, to simplify the calculations in this assignment; real applications should consider more attributes.

House     Price     Number of Bedrooms  Size (sqft)  Distance to Bus-Stop  House sold in 10 days?
House 1   $300,000  1                   3,500 sqft   far                   No
House 2   $300,000  1                   3,500 sqft   near                  No
House 3   $250,000  1                   3,500 sqft   far                   Yes
House 4   $350,000  2                   3,500 sqft   far                   Yes
House 5   $350,000  3                   5,000 sqft   far                   Yes
House 6   $350,000  3                   5,000 sqft   near                  No
House 7   $250,000  3                   5,000 sqft   near                  Yes
House 8   $300,000  2                   3,500 sqft   far                   No
House 9   $300,000  3                   5,000 sqft   far                   Yes
House 10  $350,000  2                   5,000 sqft   far                   Yes
House 11  $300,000  2                   5,000 sqft   near                  Yes
House 12  $250,000  2                   3,500 sqft   near                  Yes
House 13  $250,000  1                   5,000 sqft   far                   Yes
House 14  $350,000  2                   3,500 sqft   near                  No

Build a decision tree that predicts, from the given attributes, whether a new house listing in the same town will be sold within 10 days. Use the ID3 algorithm.
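A minimal Python sketch of ID3 on this table follows; it is an illustration, not a reference implementation. It assumes every attribute is treated as categorical exactly as it appears in the table (no binning of price or size), uses base-2 entropy, and breaks ties between equal gains by taking the earliest column. All names (`ATTRIBUTES`, `ROWS`, `entropy`, `information_gain`, `id3`) are hypothetical.

```python
from collections import Counter
from math import log2

# Column order follows the table; these short attribute names are hypothetical labels.
ATTRIBUTES = ["Price", "Bedrooms", "Size", "Distance"]

# One tuple per house (House 1 .. House 14); the last field is
# "House sold in 10 days?". Values are kept as categories, not numbers.
ROWS = [
    ("$300,000", 1, "3,500 sqft", "far", "No"),
    ("$300,000", 1, "3,500 sqft", "near", "No"),
    ("$250,000", 1, "3,500 sqft", "far", "Yes"),
    ("$350,000", 2, "3,500 sqft", "far", "Yes"),
    ("$350,000", 3, "5,000 sqft", "far", "Yes"),
    ("$350,000", 3, "5,000 sqft", "near", "No"),
    ("$250,000", 3, "5,000 sqft", "near", "Yes"),
    ("$300,000", 2, "3,500 sqft", "far", "No"),
    ("$300,000", 3, "5,000 sqft", "far", "Yes"),
    ("$350,000", 2, "5,000 sqft", "far", "Yes"),
    ("$300,000", 2, "5,000 sqft", "near", "Yes"),
    ("$250,000", 2, "3,500 sqft", "near", "Yes"),
    ("$250,000", 1, "5,000 sqft", "far", "Yes"),
    ("$350,000", 2, "3,500 sqft", "near", "No"),
]


def entropy(labels):
    """Base-2 Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attr):
    """Entropy of `rows` minus the size-weighted entropy of its partitions on column `attr`."""
    remainder = 0.0
    for value in set(r[attr] for r in rows):
        subset = [r[-1] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([r[-1] for r in rows]) - remainder


def id3(rows, attrs):
    """Return a leaf label or a nested dict {attribute: {value: subtree}}."""
    labels = [r[-1] for r in rows]
    if len(set(labels)) == 1:          # pure node -> leaf
        return labels[0]
    if not attrs:                      # attributes exhausted -> majority label
        return Counter(labels).most_common(1)[0][0]
    # Highest gain wins; max() keeps the earliest column on ties (assumed tie-break).
    best = max(attrs, key=lambda a: information_gain(rows, a))
    rest = [a for a in attrs if a != best]
    return {
        ATTRIBUTES[best]: {
            value: id3([r for r in rows if r[best] == value], rest)
            for value in sorted(set(r[best] for r in rows))
        }
    }


if __name__ == "__main__":
    for i, name in enumerate(ATTRIBUTES):
        print(f"Gain(S, {name}) = {information_gain(ROWS, i):.3f}")
    print(id3(ROWS, list(range(len(ATTRIBUTES)))))
```

Running the script prints the root-level gain of each attribute and the learned tree as nested dictionaries of the form {attribute: {value: subtree or label}}. The earliest-column tie-break is an assumption of this sketch; on this data it does not affect the result, because the winning gain is strictly the largest at every split.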

To answer this question, you need to complete the following steps:

  1. Calculate the entropy of the whole dataset. 

  2. Identify the first attribute to split on, i.e., the attribute with the maximal information gain. To do this, calculate the information gain for each of the four attributes.

  3. Draw the final tree. 
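For steps 1 and 2, a hand calculation can be sketched as follows, assuming base-2 logarithms and treating each attribute value as its own category. Of the 14 houses, 9 sold within 10 days and 5 did not. The per-branch label counts (Yes/No) used below, read off the table, are: Price $250,000 4/0, $300,000 2/3, $350,000 3/2; Bedrooms 1: 2/2, 2: 4/2, 3: 3/1; Size 3,500 sqft 3/4, 5,000 sqft 6/1; Distance far 6/2, near 3/3. The final gains are computed from unrounded intermediate values, so they can differ in the last digit from the rounded terms shown.

```latex
\begin{aligned}
H(S) &= -\tfrac{9}{14}\log_2\tfrac{9}{14} - \tfrac{5}{14}\log_2\tfrac{5}{14} \approx 0.940 \\
\operatorname{Gain}(S, A) &= H(S) - \sum_{v \in \operatorname{Values}(A)} \frac{|S_v|}{|S|}\, H(S_v) \\
\operatorname{Gain}(S, \text{Price}) &\approx 0.940 - \left[\tfrac{4}{14}(0) + \tfrac{5}{14}(0.971) + \tfrac{5}{14}(0.971)\right] \approx 0.247 \\
\operatorname{Gain}(S, \text{Bedrooms}) &\approx 0.940 - \left[\tfrac{4}{14}(1) + \tfrac{6}{14}(0.918) + \tfrac{4}{14}(0.811)\right] \approx 0.029 \\
\operatorname{Gain}(S, \text{Size}) &\approx 0.940 - \left[\tfrac{7}{14}(0.985) + \tfrac{7}{14}(0.592)\right] \approx 0.152 \\
\operatorname{Gain}(S, \text{Distance}) &\approx 0.940 - \left[\tfrac{8}{14}(0.811) + \tfrac{6}{14}(1)\right] \approx 0.048
\end{aligned}
```

Price has the largest gain, so under these assumptions it is the first split. Repeating the procedure inside each Price branch: the $250,000 branch is already all Yes; on the $300,000 branch (entropy 0.971) Size has gain 0.971 versus 0.571 for Bedrooms and 0.020 for Distance; on the $350,000 branch (entropy 0.971) Distance has gain 0.971 versus 0.020 for both Bedrooms and Size. Each of those splits yields pure leaves, giving this tree:

    Price
    ├── $250,000 → Yes
    ├── $300,000 → Size
    │   ├── 3,500 sqft → No
    │   └── 5,000 sqft → Yes
    └── $350,000 → Distance to Bus-Stop
        ├── far → Yes
        └── near → No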
