Question

07. [Classification] Consider the following data set for a binary-class problem. [20] Customer ID Gender M Class CO CO M M M

5. Compute the Gini index for the Shirt Size attribute using multi-way split. 6. Which attribute is better, Gender, Car Type,

Answer #1

1.) Here the distinct instances (attribute combinations) are (M,Family,Small), (M,Sports,Medium), and so on. Every combination except (F,Luxury,Large) is pure, meaning all of its examples belong to a single class.

Gini value (the sum of squared class proportions) for (M,Family,Small) = ((1/1)²+(0/1)²) = 1

Its weight is 1/20, since it covers 1 of the 20 examples.

Similarly calculating for other training examples, we get overall gini value as:

((1/20)*((1/1)²+(0/1)²)+(2/20)*((2/2)²+(0/2)²)+(1/20)*((1/1)²+(0/1)²)+(2/20)*((2/2)²+(0/2)²)+(2/20)*((2/2)²+(0/2)²)+(1/20)*((1/1)²+(0/1)²)+(2/20)*((1/2)²+(1/2)²)+(1/20)*((1/1)²+(0/1)²)+(1/20)*((1/1)²+(0/1)²)+(1/20)*((1/1)²+(0/1)²)+(1/20)*((1/1)²+(0/1)²)+(2/20)*((2/2)²+(0/2)²)+(3/20)*((3/3)²+(0/3)²))=19/20

Gini Index=1-(19/20)=1/20=0.05
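The sum above can be checked with exact fractions. This is a minimal sketch, not part of the original answer: the `groups` list copies the (count, count) pairs term by term from the formula, and the names `purity`, `weighted_purity`, and `gini_index` are illustrative assumptions.

```python
from fractions import Fraction

# Class counts per attribute combination, in the order the terms appear
# in the sum above. Only the seventh group, (1, 1), is mixed.
groups = [(1, 0), (2, 0), (1, 0), (2, 0), (2, 0), (1, 0), (1, 1),
          (1, 0), (1, 0), (1, 0), (1, 0), (2, 0), (3, 0)]

total = sum(a + b for a, b in groups)

def purity(a, b):
    """Sum of squared class proportions (the 'gini value' in the answer)."""
    n = a + b
    return Fraction(a, n) ** 2 + Fraction(b, n) ** 2

weighted_purity = sum(Fraction(a + b, total) * purity(a, b) for a, b in groups)
gini_index = 1 - weighted_purity
print(total, weighted_purity, gini_index)  # 20 19/20 1/20
```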

2.) Here each Customer ID value holds exactly one example, so every node is pure: its gini value is 1 and its weighted contribution is (1/20)*1 = 1/20.

=> overall gini value = 20*(1/20) = 1

Gini index = 1-1 = 0
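A short sketch of why any attribute with one example per value scores 0. The class behind each ID does not affect the result, so each node is written as a hypothetical `(1, 0)` count pair; the variable names are assumptions made for illustration.

```python
from fractions import Fraction

n = 20
# One node per Customer ID, each holding a single example of one class.
nodes = [(1, 0)] * n

weighted_purity = sum(
    Fraction(a + b, n) * (Fraction(a, a + b) ** 2 + Fraction(b, a + b) ** 2)
    for a, b in nodes
)
gini_customer_id = 1 - weighted_purity
print(weighted_purity, gini_customer_id)  # 1 0
```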

3.) For Gender (10 M, 10 F)

a)For M : 6 belong to C0 and 4 to C1

gini value=((6/10)²+(4/10)²)=0.52

b)For F : 4 belong to C0 and 6 to C1

gini value=((4/10)²+(6/10)²)=0.52

gini index for gender = 1-((10/20)*0.52+(10/20)*0.52)=0.48
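The Gender calculation can be reproduced with a small helper. The function name `gini_split` and the dictionary layout are illustrative assumptions; the class counts are the ones listed in part 3.

```python
def gini_split(counts):
    """Weighted Gini index of a multi-way split.

    counts maps each attribute value to a (C0, C1) tuple of class counts.
    """
    total = sum(c0 + c1 for c0, c1 in counts.values())
    weighted_purity = 0.0
    for c0, c1 in counts.values():
        n = c0 + c1
        # Sum of squared class proportions: the "gini value" in the answer.
        purity = (c0 / n) ** 2 + (c1 / n) ** 2
        weighted_purity += (n / total) * purity
    return 1 - weighted_purity

gender_counts = {"M": (6, 4), "F": (4, 6)}
gini_gender = gini_split(gender_counts)
print(round(gini_gender, 4))  # 0.48
```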

4.) For Car Type (4 Family, 8 Sports, 8 Luxury)

a)Family(C0:1, C1:3)

  gini value= ((1/4)²+(3/4)²)=10/16

b)Sports(C0:8, C1:0)

  gini value= ((8/8)²+(0/8)²)=1

c)Luxury(C0:1, C1:7)

  gini value= ((1/8)²+(7/8)²)=50/64

Gini index for car type= 1-((4/20)*(10/16)+(8/20)*(1)+(8/20)*(50/64))=1-((1/8)+(2/5)+(5/16))=13/80=0.1625
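As a check on the fraction arithmetic for Car Type, the same steps can be run with Python's `fractions` module. This is a sketch under the class counts stated in part 4; the variable names are assumptions.

```python
from fractions import Fraction

# (C0, C1) counts per Car Type value, as listed in part 4.
car_counts = {"Family": (1, 3), "Sports": (8, 0), "Luxury": (1, 7)}
total = sum(c0 + c1 for c0, c1 in car_counts.values())

weighted_purity = Fraction(0)
for c0, c1 in car_counts.values():
    n = c0 + c1
    purity = Fraction(c0, n) ** 2 + Fraction(c1, n) ** 2
    weighted_purity += Fraction(n, total) * purity

gini_car = 1 - weighted_purity
print(gini_car, float(gini_car))  # 13/80 0.1625
```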

5.) For Shirt Size (5 Small, 7 Medium, 4 Large, 4 Extra Large)

a)Small(C0:3 ,C1:2)

  gini value= ((3/5)²+(2/5)²)=13/25

b)Medium(C0:3 ,C1:4)

  gini value= ((3/7)²+(4/7)²)=25/49

c)Large(C0:2 ,C1:2)

  gini value= ((2/4)²+(2/4)²)=1/2

d)Extra Large(C0:2 ,C1:2)

  gini value= ((2/4)²+(2/4)²)=1/2

Gini Index for shirt size=1-((5/20)*(13/25)+(7/20)*(25/49)+(4/20)*(1/2)+(4/20)*(1/2))=1-(0.13+0.17857+0.1+0.1)=1-0.50857=0.49143
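The rounded decimals in the Shirt Size line can be confirmed exactly. The sketch below assumes the class counts from part 5 and uses illustrative variable names; it gives the exact fraction alongside the rounded value.

```python
from fractions import Fraction

# (C0, C1) counts per Shirt Size value, as listed in part 5.
shirt_counts = [(3, 2), (3, 4), (2, 2), (2, 2)]
total = sum(c0 + c1 for c0, c1 in shirt_counts)

# Weighted sum of squared class proportions over the four sizes.
weighted_purity = sum(
    Fraction(c0 + c1, total)
    * (Fraction(c0, c0 + c1) ** 2 + Fraction(c1, c0 + c1) ** 2)
    for c0, c1 in shirt_counts
)
gini_shirt = 1 - weighted_purity
print(gini_shirt, round(float(gini_shirt), 5))  # 86/175 0.49143
```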

6.) Car Type has the lowest Gini index of the three (0.1625, versus 0.48 for Gender and 0.49143 for Shirt Size), so Car Type is the best attribute.
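The comparison can be sketched by scoring all three attributes with one function and taking the minimum. The helper `gini_split` and the `splits` dictionary are assumptions made for illustration; the class counts come from parts 3 to 5.

```python
from fractions import Fraction

def gini_split(counts):
    """Weighted Gini index for a list of per-value class-count tuples."""
    total = sum(sum(c) for c in counts)
    return 1 - sum(
        Fraction(sum(c), total) * sum(Fraction(k, sum(c)) ** 2 for k in c)
        for c in counts
    )

splits = {
    "Gender": [(6, 4), (4, 6)],
    "Car Type": [(1, 3), (8, 0), (1, 7)],
    "Shirt Size": [(3, 2), (3, 4), (2, 2), (2, 2)],
}
scores = {name: gini_split(counts) for name, counts in splits.items()}
best = min(scores, key=scores.get)  # lowest Gini index wins
print(best)  # Car Type
```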

7.) Although Customer ID has the lowest Gini index (0), it carries no predictive information. Its perfect score is a coincidental irregularity: every ID is unique, so each value of Customer ID would map to its own node in the decision tree, which poses a threat of overfitting. The selection of attributes with many uniformly distributed values should therefore be discouraged.
