Can you give me a poster for Science Writing TOPIC: DECISION TREE Decision Tree Algorithm Pseudocode:-...

Question

Can you give me a poster for Science Writing?

TOPIC: DECISION TREE

Decision Tree Algorithm Pseudocode:-
1) Place the best attribute of the dataset at the root node of the tree.
2) Split the training set into subsets. The subsets should be made in such a way that each subset contains data with the same value for that attribute.
3) Repeat steps 1 and 2 on each subset until you find leaf nodes in all the branches of the tree.
Two measures used for selecting the attribute:-
1) Information gain
2) Gini index
In the case of information gain, the higher the gain value, the more suitable the attribute is as the root node/internal node of the tree.
In the case of the Gini index, the lower the Gini value, the more suitable the attribute is as the root node/internal node of the decision tree.
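
The three steps above can be sketched in Python as a short recursive function. This is only a minimal sketch, assuming categorical attributes and the Gini index as the selection measure; the names `gini`, `weighted_gini`, `build_tree` and the four-row toy dataset are illustrative assumptions, not part of the question.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def weighted_gini(rows, labels, attr):
    """Weighted Gini impurity of the subsets produced by splitting on attr."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    return sum(len(g) / len(labels) * gini(g) for g in groups.values())


def build_tree(rows, labels, attrs):
    """Step 1: pick the best attribute. Step 2: split. Step 3: recurse."""
    majority = Counter(labels).most_common(1)[0][0]
    # Leaf: the subset is pure, or no attributes are left to split on.
    if len(set(labels)) == 1 or not attrs:
        return majority
    best = min(attrs, key=lambda a: weighted_gini(rows, labels, a))
    # Leaf: the best split does not improve on the node's own impurity.
    if weighted_gini(rows, labels, best) >= gini(labels):
        return majority
    remaining = [a for a in attrs if a != best]
    branches = {}
    for value in sorted({row[best] for row in rows}):
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        branches[value] = build_tree(
            [r for r, _ in pairs], [l for _, l in pairs], remaining
        )
    return {best: branches}


# Hypothetical toy data, made up for illustration only.
rows = [
    {"chest_pain": "yes", "blocked": "yes"},
    {"chest_pain": "yes", "blocked": "no"},
    {"chest_pain": "no", "blocked": "yes"},
    {"chest_pain": "no", "blocked": "no"},
]
labels = ["yes", "yes", "no", "no"]
tree = build_tree(rows, labels, ["chest_pain", "blocked"])
print(tree)
```

In this toy data the `chest_pain` attribute separates the labels perfectly, so it ends up at the root and both branches become leaves.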

Examples:-

Solution:-

Similarly, for all the subtrees we calculate the Gini index for each feature.

2) Example for ID3:-

Similarly, do the calculations for all the features for the internal nodes; then we get the final decision tree for this dataset.

Formulas:-

Answer 1

From what I have understood, you want a good example that explains decision trees and impurity measures. Here is a detailed example.

Let us say we want to create a tree that uses chest pain, good blood circulation, and blocked artery status to predict whether or not a patient has heart disease. (Data is as shown in the table attached)
We have to decide which attribute will be at the top; in other words, we need to decide which attribute will become the root node. To do so, we have to calculate 'impurity.' Impurity measures how mixed the classes are in a node: a node that contains only patients with heart disease (or only patients without it) is pure, while a node with an even mix of both is maximally impure. To measure impurity, we use the Gini index or entropy, on which Information Gain is based. Extending this example, we do it something like this.

Assumptions:
number of people with heart disease = x
number of people with no heart disease = y

Let us say that from our data we got the following results:

a) Making 'Chest Pain' as root:
if yes: x=105 and y=39
if no: x=34 and y=125
This means, out of all the people having chest pain, 105 have heart disease, whereas 39 do not. Also, out of all the people not having chest pain, 34 have heart disease, whereas 125 do not.

b) Similarly, making 'Good Blood Circulation' as root:
if yes: x=37 and y=127
if no: x=100 and y=33

c) Making 'Blocked Arteries' as root:
if yes: x=92 and y=31
if no: x=45 and y=129

1) Gini Impurity:
Algorithm:
1) Calculate all of the Gini impurity scores.
2) If the node itself has the lowest score, then there is no point in separating the patients anymore, and it becomes a leaf node.
3) If separating the data results in an improvement, then pick the separation with the lowest impurity value.

Formula:
GI = 1 - (probability of yes)² - (probability of no)²

a) For chest pain:
For yes:
GI = 1 - (105/(105+39))² - (39/(105+39))²
GI = 0.395
For no:
GI = 1 - (34/(34+125))² - (125/(34+125))²
GI = 0.336
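
The two leaf values above can be checked with a few lines of Python. This is a minimal sketch; the function name `gini_impurity` is an assumption chosen for illustration, and the counts are the chest pain numbers from this answer.

```python
def gini_impurity(yes_count, no_count):
    """GI = 1 - (probability of yes)^2 - (probability of no)^2."""
    total = yes_count + no_count
    p_yes = yes_count / total
    p_no = no_count / total
    return 1 - p_yes ** 2 - p_no ** 2


# Chest pain = yes: 105 with heart disease, 39 without.
gi_yes = gini_impurity(105, 39)
# Chest pain = no: 34 with heart disease, 125 without.
gi_no = gini_impurity(34, 125)

print(round(gi_yes, 3), round(gi_no, 3))  # 0.395 0.336
```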

Total GI:
Note: The number of patients on the two sides (yes and no) is not equal. Thus, we take a weighted average.

TGI = ((Total of yes)/(Total patients) * GI of yes) + ((Total of no)/(Total patients) * GI of no)

TGI = (144/(144+159))*0.395 + (159/(144+159))*0.336
TGI = 0.364

b) Similarly, we calculate for good blood circulation:
TGI = 0.360

c) And for blocked arteries:
TGI = 0.381

Thus, good blood circulation has the lowest total Gini impurity, and therefore we use it as the root node.
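
The comparison of the three candidate roots can be reproduced with a short script. This is a sketch under the assumption that each attribute is stored as a dictionary of (x, y) counts per branch; the names `counts`, `total_gini` and `root` are illustrative, while the numbers are the ones listed in this answer.

```python
# (x, y) = (with heart disease, without heart disease) for each branch.
counts = {
    "Chest Pain": {"yes": (105, 39), "no": (34, 125)},
    "Good Blood Circulation": {"yes": (37, 127), "no": (100, 33)},
    "Blocked Arteries": {"yes": (92, 31), "no": (45, 129)},
}


def gini_impurity(x, y):
    """Gini impurity of one branch."""
    total = x + y
    return 1 - (x / total) ** 2 - (y / total) ** 2


def total_gini(branches):
    """Average of the branch impurities, weighted by branch size."""
    total = sum(x + y for x, y in branches.values())
    return sum((x + y) / total * gini_impurity(x, y) for x, y in branches.values())


scores = {name: total_gini(branches) for name, branches in counts.items()}
root = min(scores, key=scores.get)  # lowest impurity wins

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
print("root:", root)
```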

Note: Now the number of patients in each separated node is different, so the Gini impurity has to be calculated again for remaining features.

2) Information Gain:
Algorithm:
1) Calculate the gain score of every candidate split.
2) If no split gives a positive gain (that is, none lowers the entropy of the node itself), then there is no point in separating the patients anymore, and it becomes a leaf node.
3) If separating the data results in an improvement, then pick the separation with the highest gain.

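Formula:
(Base of the log is 2; p = number of patients with heart disease, n = number without; pi and ni are the same counts inside branch i of an attribute)
Entropy of class (Ce) = -(p/(p+n)) * log(p/(p+n)) - (n/(p+n)) * log(n/(p+n))
Entropy of each branch of an attribute (written IG in this answer, although it is an entropy and not a gain) = -(pi/(pi+ni)) * log(pi/(pi+ni)) - (ni/(pi+ni)) * log(ni/(pi+ni))
Entropy of attribute (Ea) = Sum over the branches i of ((pi+ni)/(p+n)) * (IG of branch i)
Gain = Ce - Ea

Using the chest pain counts, p = 105 + 34 = 139 and n = 39 + 125 = 164:
Ce = -(139/(139+164)) * log(139/(139+164)) - (164/(139+164)) * log(164/(139+164))
Ce = 0.995 (approximately 1)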
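
The class entropy can be checked numerically. This is a minimal sketch; the function name `entropy` is an assumption for illustration, and a zero count is treated as contributing nothing, which is the usual convention for 0 * log(0).

```python
from math import log2


def entropy(p, n):
    """Two-class entropy in bits for p positive and n negative examples."""
    total = p + n
    result = 0.0
    for count in (p, n):
        if count:  # skip empty classes: 0 * log2(0) is taken as 0
            fraction = count / total
            result -= fraction * log2(fraction)
    return result


# 139 patients with heart disease, 164 without (chest pain counts).
class_entropy = entropy(139, 164)
print(round(class_entropy, 3))  # 0.995
```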

a) For chest pain:
IG (branch entropy) for yes:
IG = -(105/(105+39)) * log(105/(105+39)) - (39/(105+39)) * log(39/(105+39))
IG = 0.8427

IG (branch entropy) for no:
IG = -(34/(34+125)) * log(34/(34+125)) - (125/(34+125)) * log(125/(34+125))
IG = 0.7488

Ea = ((105+39)/303) * 0.8427 + ((125+34)/303) * 0.7488
Ea = 0.793

Gain = Ce - Ea = 0.995 - 0.793
Gain = 0.202

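Similarly, calculate the gain for the other attributes and find the highest score. Because the patient totals differ between attributes (303 for chest pain, 297 for the other two), Ce is recomputed from each attribute's own counts.
In this case, the gain of Good Blood Circulation comes out to be the highest and that of Blocked Arteries the lowest, and thus we make Good Blood Circulation the root node, in agreement with the Gini result.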
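
The gain of every attribute can be computed from the counts given in this answer with a short script. This is a sketch, assuming the class entropy is computed from each attribute's own counts; the names `counts`, `entropy`, `information_gain` and `root` are illustrative.

```python
from math import log2

# (x, y) = (with heart disease, without heart disease) for each branch.
counts = {
    "Chest Pain": {"yes": (105, 39), "no": (34, 125)},
    "Good Blood Circulation": {"yes": (37, 127), "no": (100, 33)},
    "Blocked Arteries": {"yes": (92, 31), "no": (45, 129)},
}


def entropy(p, n):
    """Two-class entropy in bits; an empty class contributes 0."""
    total = p + n
    return -sum(c / total * log2(c / total) for c in (p, n) if c)


def information_gain(branches):
    """Gain = Ce - Ea for one attribute."""
    p = sum(x for x, _ in branches.values())
    n = sum(y for _, y in branches.values())
    total = p + n
    # Ea: branch entropies weighted by the share of patients in each branch.
    attribute_entropy = sum(
        (x + y) / total * entropy(x, y) for x, y in branches.values()
    )
    return entropy(p, n) - attribute_entropy


gains = {name: information_gain(branches) for name, branches in counts.items()}
root = max(gains, key=gains.get)  # highest gain wins

for name, gain in gains.items():
    print(f"{name}: {gain:.3f}")
print("root:", root)
```

Running it prints the gain of each attribute, so the choice of root node can be checked directly against the counts.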

Note: Now the number of patients in each separated node is different, so the Gain has to be calculated again for remaining features.
