| Customer ID | Transaction ID | Items Bought |
| 1 | 0001 | {a,d,e,f} |
| 1 | 0024 | {a,b,c} |
| 2 | 0012 | {b,d,e,f} |
| 2 | 0031 | {a,c,e} |
| 3 | 0015 | {b,d,f} |
| 3 | 0022 | {a,b} |
| 4 | 0029 | {a,b,c} |
| 4 | 0040 | {a,b,d,e} |
| 5 | 0033 | {e,b,d} |
| 5 | 0038 | {f,c,e} |
ANSWER: The support of an itemset is the fraction of the 10 transactions in the table that contain every item in the itemset.
| Itemset | Transaction ID | Support |
| {e} | 0001, 0012, 0031, 0040, 0033, 0038 | 6/10 = .6 = 60% |
| {b, d} | 0012, 0015, 0040, 0033 | 4/10 = .4 = 40% |
| {b, d, e} | 0012, 0040, 0033 | 3/10 = .3 = 30% |
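The support values above can be checked with a short script. This is a minimal sketch, assuming the transactions are copied into a dictionary keyed by transaction ID; the names `transactions` and `support` are illustrative and not part of the original answer.

```python
# Transactions copied from the table above: transaction ID -> items bought.
transactions = {
    "0001": {"a", "d", "e", "f"},
    "0024": {"a", "b", "c"},
    "0012": {"b", "d", "e", "f"},
    "0031": {"a", "c", "e"},
    "0015": {"b", "d", "f"},
    "0022": {"a", "b"},
    "0029": {"a", "b", "c"},
    "0040": {"a", "b", "d", "e"},
    "0033": {"e", "b", "d"},
    "0038": {"f", "c", "e"},
}


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for items in baskets.values() if itemset <= items)
    return hits / len(baskets)


for itemset in ({"e"}, {"b", "d"}, {"b", "d", "e"}):
    print(sorted(itemset), support(itemset, transactions))
```

The subset test `itemset <= items` is what makes a transaction count toward an itemset: every item must be present, not just one of them.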
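ANSWER: The confidence of a rule X → Y is the number of transactions that contain all items of X and Y, divided by the number of transactions that contain X:
Confidence({b, d} → {e}) = count({b, d, e}) / count({b, d})
= 3/4 = 0.75 = 75%
Confidence({e} → {b, d}) = count({b, d, e}) / count({e})
= 3/6 = 0.5 = 50%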
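The two confidence values can be reproduced from raw counts. The following is a sketch under the assumption that each transaction is written as a string of single-letter items; `baskets`, `count`, and `confidence` are hypothetical helper names chosen for illustration.

```python
from fractions import Fraction

# Item sets of the 10 transactions, in the table's row order.
baskets = [set(s) for s in (
    "adef", "abc", "bdef", "ace", "bdf",
    "ab", "abc", "abde", "ebd", "fce",
)]


def count(itemset, baskets):
    """Number of baskets that contain every item in `itemset`."""
    wanted = set(itemset)
    return sum(1 for b in baskets if wanted <= b)


def confidence(antecedent, consequent, baskets):
    """count(antecedent and consequent together) / count(antecedent)."""
    both = set(antecedent) | set(consequent)
    return Fraction(count(both, baskets), count(antecedent, baskets))


print(confidence("bd", "e", baskets))  # prints 3/4
print(confidence("e", "bd", baskets))  # prints 1/2
```

Using `Fraction` keeps the results exact. Note that `confidence` raises `ZeroDivisionError` when no basket contains the antecedent, which is a design choice of this sketch rather than part of the definition.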
ANSWER: No. Confidence is not a symmetric measure: in part (b), {b, d} → {e} has confidence 75%, while the reversed rule {e} → {b, d} has confidence 50%.
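The asymmetry follows directly from the definition. Writing σ(X) for the number of transactions containing X (this symbol is an assumption here; the answer above simply says "count"), the two rules share a numerator but have different denominators:

```latex
c(X \to Y) = \frac{\sigma(X \cup Y)}{\sigma(X)},
\qquad
c(Y \to X) = \frac{\sigma(X \cup Y)}{\sigma(Y)}
```

When σ(X ∪ Y) > 0, the two values are equal only if σ(X) = σ(Y). Here σ({b, d}) = 4 and σ({e}) = 6, so the confidences differ.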
4. Repeat part (a) by treating each customer ID as a market basket. Each item should be treated as a binary variable (1 if an item appears in at least one transaction bought by the customer, and 0 otherwise).
ANSWER: Build the binary customer-by-item table below by merging each customer's two transactions, then recompute support over the 5 customer baskets.
| Customer ID | a | b | c | d | e | f |
| 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 2 | 1 | 1 | 1 | 1 | 1 | 1 |
| 3 | 1 | 1 | 0 | 1 | 0 | 1 |
| 4 | 1 | 1 | 1 | 1 | 1 | 0 |
| 5 | 0 | 1 | 1 | 1 | 1 | 1 |
| Itemset | Customer ID | Support |
| {e} | 1, 2, 4, 5 | 4/5 = .8 = 80% |
| {b, d} | 1, 2, 3, 4, 5 | 5/5 = 1 = 100% |
| {b, d, e} | 1, 2, 4, 5 | 4/5 = .8 = 80% |
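The customer-level table and its support values can be derived by merging each customer's transactions. This is a minimal sketch, assuming the rows are given as (customer ID, items) pairs with items written as single letters; `rows`, `customer_baskets`, `binary`, and `support` are illustrative names.

```python
from collections import defaultdict

# (customer ID, items bought) rows from the transaction table.
rows = [
    (1, "adef"), (1, "abc"),
    (2, "bdef"), (2, "ace"),
    (3, "bdf"), (3, "ab"),
    (4, "abc"), (4, "abde"),
    (5, "ebd"), (5, "fce"),
]

# Union each customer's transactions into a single basket.
customer_baskets = defaultdict(set)
for customer, bought in rows:
    customer_baskets[customer] |= set(bought)

# One binary row per customer: 1 if the item appears in any transaction.
items = "abcdef"
binary = {
    customer: [int(item in basket) for item in items]
    for customer, basket in sorted(customer_baskets.items())
}


def support(itemset, baskets):
    """Fraction of customer baskets that contain every item in `itemset`."""
    wanted = set(itemset)
    return sum(1 for b in baskets.values() if wanted <= b) / len(baskets)


for customer, flags in binary.items():
    print(customer, flags)
for itemset in ("e", "bd", "bde"):
    print(itemset, support(itemset, customer_baskets))
```

Merging transactions can only add items to a basket, which is why every support value here is at least as large as its transaction-level counterpart.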
5. Use the results in part (d) to compute the confidence for the association rules {b, d} → {e} and {e} → {b, d}.
ANSWER: Using the customer-level counts from part (d), confidence can be calculated as:
Confidence({b, d} → {e}) = count({b, d, e}) / count({b, d})
= 4/5 = 0.8 = 80%
Confidence({e} → {b, d}) = count({b, d, e}) / count({e})
= 4/4 = 1 = 100%
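The same calculation can be run directly on the binary customer table. The sketch below assumes the table is stored as one list of 0/1 flags per customer, in column order a to f; `binary`, `count`, and `confidence` are hypothetical names used only for this example.

```python
from fractions import Fraction

items = "abcdef"
# Rows of the binary customer table: customer ID -> flags for a..f.
binary = {
    1: [1, 1, 1, 1, 1, 1],
    2: [1, 1, 1, 1, 1, 1],
    3: [1, 1, 0, 1, 0, 1],
    4: [1, 1, 1, 1, 1, 0],
    5: [0, 1, 1, 1, 1, 1],
}


def count(itemset):
    """Number of customers whose row has a 1 for every item in `itemset`."""
    cols = [items.index(item) for item in itemset]
    return sum(all(row[c] for c in cols) for row in binary.values())


def confidence(antecedent, consequent):
    """count(antecedent and consequent together) / count(antecedent)."""
    return Fraction(count(antecedent + consequent), count(antecedent))


print(confidence("bd", "e"))  # prints 4/5
print(confidence("e", "bd"))  # prints 1
```

At the customer level every basket that contains e also contains b and d, so the second rule reaches 100% even though it was only 50% at the transaction level.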