Question

Can you give me a poster for Science Writing?

TOPIC: DECISION TREE

Decision Tree Algorithm Pseudocode:-
1) Place the best attribute of the dataset at the root node of the tree.
2) Split the training set into subsets. The subsets should be made in such a way that each subset contains data with the same value for that attribute.
3) Repeat steps 1 and 2 on each subset until you reach leaf nodes in all the branches of the tree.
Two measures used for attribute selection:-
1) Information gain
2) Gini index
With information gain, the higher the gain value, the more suitable the attribute is as the root node/internal node of the tree.
With the Gini index, the lower the Gini value, the more suitable the attribute is as the root node/internal node of the decision tree.
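
The three pseudocode steps can be sketched in Python roughly as follows. This is only a minimal illustration, not a full ID3 or CART implementation: it assumes categorical attributes, uses the Gini index as the selection measure, and stores the tree as nested dictionaries. The names `gini`, `weighted_gini`, `build_tree` and the four-row toy dataset are made up for this example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())


def weighted_gini(rows, labels, attr):
    """Size-weighted Gini impurity of the subsets produced by splitting on attr."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    return sum(len(g) / len(labels) * gini(g) for g in groups.values())


def build_tree(rows, labels, attrs):
    """Recursively build a tree as nested dicts; leaves are class labels."""
    majority = Counter(labels).most_common(1)[0][0]
    # Leaf: the subset is pure, or there is nothing left to split on.
    if len(set(labels)) == 1 or not attrs:
        return majority
    # Step 1: pick the attribute with the lowest weighted Gini impurity.
    best = min(attrs, key=lambda a: weighted_gini(rows, labels, a))
    # Leaf: the best split does not improve on the node itself.
    if weighted_gini(rows, labels, best) >= gini(labels):
        return majority
    # Step 2: split into subsets that share one value of that attribute.
    subsets = {}
    for row, label in zip(rows, labels):
        sub_rows, sub_labels = subsets.setdefault(row[best], ([], []))
        sub_rows.append(row)
        sub_labels.append(label)
    # Step 3: repeat on every subset with the remaining attributes.
    remaining = [a for a in attrs if a != best]
    return {best: {value: build_tree(r, l, remaining)
                   for value, (r, l) in subsets.items()}}


# Made-up toy data, for illustration only.
rows = [
    {"chest_pain": "yes", "blocked": "yes"},
    {"chest_pain": "yes", "blocked": "no"},
    {"chest_pain": "no", "blocked": "yes"},
    {"chest_pain": "no", "blocked": "no"},
]
labels = ["disease", "disease", "healthy", "healthy"]
tree = build_tree(rows, labels, ["chest_pain", "blocked"])
print(tree)
```

In the toy data, chest pain separates the two classes perfectly, so it is chosen as the root and both branches become leaves straight away.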

Examples:-

Solution:-

Similarly, we calculate the Gini index of each feature for all the subtrees.

2) Example for ID3:-

Similarly, do the calculations for all the features at the internal nodes; we then get the final decision tree for this dataset.

Formulas:-

Answer #1

From what I understand, you want a good example that explains decision trees and impurity measures. Here is a worked example.

Let us say we want to create a tree that uses chest pain, good blood circulation, and blocked artery status to predict whether or not a patient has heart disease. (Data is as shown in the table attached)
We have to decide which attribute will be at the top; in other words, which attribute becomes the root node. To do so, we calculate 'impurity'. A node is impure when it contains a mix of patients with and without heart disease, so a split that leaves the two outcomes mixed together predicts poorly. To measure impurity, we use the Gini index or entropy (the basis of Information Gain). Extending this example, the calculation goes like this.

Assumptions:
number of people with heart disease = x
number of people with no heart disease = y

Let us say that from our data we got the following results:

a) Making 'Chest Pain' as root:
if yes: x=105 and y=39
if no: x=34 and y=125
This means, out of all the people having chest pain, 105 have heart disease, whereas 39 do not. Also, out of all the people not having chest pain, 34 have heart disease, whereas 125 do not.

b) Similarly for making 'Good blood Circulation' as root:
if yes: x=37 and y=127
if no: x=100 and y=33

c) Making 'Blocked Arteries' as root:
if yes: x=92 and y=31
if no: x=45 and y=129


1) Gini Impurity:
Algorithm:

1) Calculate the Gini impurity score of the node itself and of every candidate split.
2) If the node itself has the lowest score, there is no point in separating the patients any further, and it becomes a leaf node.
3) If separating the data results in an improvement, pick the separation with the lowest impurity value.

Formula:
GI = 1 - (probability of yes)^2 - (probability of no)^2

a) For chest pain:
For yes:
GI = 1 - (105/(105+39))^2 - (39/(105+39))^2
GI = 0.395
For no:
GI = 1 - (34/(34+125))^2 - (125/(34+125))^2
GI = 0.336

Total GI:
Note: The number of patients on the two sides (yes and no) is not equal. Thus, we take a weighted average.

TGI = ((Total of yes)/(Total patients)) * (GI of yes) + ((Total of no)/(Total patients)) * (GI of no)

TGI = (144/(144+159))*0.395 + (159/(144+159))*0.336
TGI = 0.364
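
To double-check the arithmetic for chest pain, here is a small Python sketch of the same calculation. The function name `gini` and the variable names are my own choices; the counts are the ones given above.

```python
def gini(yes, no):
    """Gini impurity of a node holding `yes` and `no` patients."""
    total = yes + no
    return 1.0 - (yes / total) ** 2 - (no / total) ** 2


# Chest pain = yes: 105 with heart disease, 39 without.
gi_yes = gini(105, 39)
# Chest pain = no: 34 with heart disease, 125 without.
gi_no = gini(34, 125)

# Weighted average, because the two branches hold different numbers of patients.
n_yes, n_no = 105 + 39, 34 + 125
total_gi = (n_yes * gi_yes + n_no * gi_no) / (n_yes + n_no)

print(round(gi_yes, 3), round(gi_no, 3), round(total_gi, 3))
# prints 0.395 0.336 0.364
```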

b) Similarly, we calculate for good blood circulation:
TGI = 0.360

c) And for blocked arteries:
TGI = 0.381

Thus we find that for good blood circulation total Gini impurity is the least and therefore, we use it as the root node.
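
The comparison of the three attributes can be sketched in Python as below. The helper names (`gini`, `weighted_gini`, `splits`) are assumptions made for this illustration. Note that the counts listed for Good Blood Circulation and Blocked Arteries add up to 297 patients rather than 303, so the sketch weights each attribute by its own total. For the impurity of the node before any split, it assumes the chest pain totals (139 with heart disease, 164 without).

```python
def gini(yes, no):
    """Gini impurity of a node holding `yes` and `no` patients."""
    total = yes + no
    return 1.0 - (yes / total) ** 2 - (no / total) ** 2


def weighted_gini(branches):
    """branches: one (with_disease, without_disease) pair per branch."""
    total = sum(x + y for x, y in branches)
    return sum((x + y) / total * gini(x, y) for x, y in branches)


# Counts from the example: first pair is the 'yes' branch, second the 'no' branch.
splits = {
    "Chest Pain": [(105, 39), (34, 125)],
    "Good Blood Circulation": [(37, 127), (100, 33)],
    "Blocked Arteries": [(92, 31), (45, 129)],
}

scores = {name: weighted_gini(branches) for name, branches in splits.items()}
root = min(scores, key=scores.get)

# Impurity of the unsplit node (assumed totals: 139 with, 164 without).
node_gini = gini(139, 164)

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
print("root:", root, "| improves on unsplit node:", scores[root] < node_gini)
```

Because every split scores below the impurity of the unsplit node, step 2 of the algorithm does not apply here and the node is split rather than turned into a leaf.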

Note: The number of patients in each separated node is now different, so the Gini impurity has to be calculated again for the remaining features.

2) Information Gain.
Algorithm:

1) Calculate the gain score of every candidate split.
2) If no split gives a gain over the node itself, there is no point in separating the patients any further, and it becomes a leaf node.
3) If separating the data results in an improvement, pick the separation with the highest gain.

Formula:
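(Base of the log is 2; p = number of patients with heart disease, n = number without, and p_i, n_i are the same counts within branch i of an attribute)

Entropy of class (Ce) = -(p/(p+n)) log(p/(p+n)) - (n/(p+n)) log(n/(p+n))
Entropy of each branch of an attribute (written IG below) = -(p_i/(p_i+n_i)) log(p_i/(p_i+n_i)) - (n_i/(p_i+n_i)) log(n_i/(p_i+n_i))
Entropy of attribute (Ea) = Sum over i of ((p_i+n_i)/(p+n)) * IG_i
Gain = Ce - Ea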

Ce = -(139/(139+164)) log(139/(139+164)) - (164/(139+164)) log(164/(139+164))
Ce = 0.995, or approximately 1
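
The class entropy can be checked with a few lines of Python. The function name `entropy` is my own; the only library call is `math.log2`, and a count of zero is treated as contributing nothing to the sum.

```python
from math import log2


def entropy(p, n):
    """Base-2 entropy of a node with p positive and n negative cases."""
    total = p + n
    result = 0.0
    for count in (p, n):
        if count:  # skip empty classes, so 0 * log2(0) is taken as 0
            fraction = count / total
            result -= fraction * log2(fraction)
    return result


ce = entropy(139, 164)
print(round(ce, 3))  # prints 0.995
```

An evenly mixed node has entropy 1 and a pure node has entropy 0, which is why a value of 0.995 says the patients are almost evenly mixed before any split.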

a) For chest pain:
IG for yes:
IG = -(105/(105+39)) log(105/(105+39)) - (39/(105+39)) log(39/(105+39))
IG = 0.842

IG for no:
IG = -(34/(34+125)) log(34/(34+125)) - (125/(34+125)) log(125/(34+125))
IG = 0.749

Ea = ((105+39)/303) * 0.842 + ((125+34)/303) * 0.749
Ea = 0.793

Gain = Ce - Ea = 0.995 - 0.793
Gain = 0.202

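Similarly, calculate the gain for the other attributes and find the highest score.
In this case, the gain of Good Blood Circulation comes out highest (about 0.21) and that of Blocked Arteries lowest (about 0.18), so we again make Good Blood Circulation the root node, in agreement with the Gini result.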
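
Here is a hedged Python sketch that computes the gain for all three attributes from the counts given earlier. The names `entropy`, `information_gain` and `splits` are illustrative. Since the counts for Good Blood Circulation and Blocked Arteries add up to 297 rather than 303, the sketch assumes each attribute's class entropy should be taken from its own totals.

```python
from math import log2


def entropy(p, n):
    """Base-2 entropy of a node with p positive and n negative cases."""
    total = p + n
    result = 0.0
    for count in (p, n):
        if count:  # skip empty classes, so 0 * log2(0) is taken as 0
            fraction = count / total
            result -= fraction * log2(fraction)
    return result


def information_gain(branches):
    """branches: one (with_disease, without_disease) pair per branch."""
    p = sum(x for x, _ in branches)
    n = sum(y for _, y in branches)
    class_entropy = entropy(p, n)                      # Ce
    attribute_entropy = sum((x + y) / (p + n) * entropy(x, y)
                            for x, y in branches)      # Ea
    return class_entropy - attribute_entropy           # Gain = Ce - Ea


splits = {
    "Chest Pain": [(105, 39), (34, 125)],
    "Good Blood Circulation": [(37, 127), (100, 33)],
    "Blocked Arteries": [(92, 31), (45, 129)],
}

gains = {name: information_gain(branches) for name, branches in splits.items()}
best = max(gains, key=gains.get)

for name, gain in gains.items():
    print(f"{name}: {gain:.3f}")
print("root:", best)
```

With these counts the sketch ranks Good Blood Circulation first and Blocked Arteries last, the same ordering the Gini impurity gives.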

Note: The number of patients in each separated node is now different, so the gain has to be calculated again for the remaining features.
