Question

This question is for frequent pattern mining algorithm Apriori and closed pattern mining algorithm like CLOSET....

This question is about the frequent pattern mining algorithm Apriori and closed pattern mining algorithms such as CLOSET.

Part 1: Implement the Apriori algorithm to mine frequent patterns from a transaction dataset.
Part 2: Implement an algorithm to mine closed frequent patterns from the same dataset. You can either write code to extract the closed patterns from the result of Part 1, or implement CLOSET.

Input Format

The input is a transaction dataset. The first line of the input gives the minimum support. Each following line corresponds to one transaction. Items in each transaction are separated by a space. Please refer to the sample input below. In Sample Input 0, the minimum support is 2, and the dataset contains 3 transactions and 5 item types (A, B, C, D and E).

Constraints

NA

Output Format

The output consists of the frequent patterns mined from the input dataset. Each line of the output should have the format:

Support [frequent pattern]

Frequent patterns should be listed in descending order of support, e.g. 3 [C] is listed before 2 [A]. Ties should be resolved in lexicographical order, e.g. 2 [A] is listed before 2 [A C]. Items within each pattern should also be listed in lexicographical order, separated by a single space, e.g. 2 [B C D].

Print the frequent patterns first and then the closed patterns. Separate the two parts of the output with an empty line. In Sample Output 0, the first 9 lines are the frequent patterns and the last 3 lines are the closed patterns.

Sample Input 0

2
B A C E D
A C
C B D

Sample Output 0

3 [C]
2 [A]
2 [A C]
2 [B]
2 [B C]
2 [B C D]
2 [B D]
2 [C D]
2 [D]

3 [C]
2 [A C]
2 [B C D]

Sample Input 1

2
data mining
frequent pattern mining
mining frequent patterns from the transaction dataset
closed and maximal pattern mining

Sample Output 1

4 [mining]
2 [frequent]
2 [frequent mining]
2 [mining pattern]
2 [pattern]

4 [mining]
2 [frequent mining]
2 [mining pattern]

Note: The solution is expected in Python 2.
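The ordering rule in the output format (descending support, ties broken lexicographically, items sorted inside each pattern) can be sketched as below. This is only an illustration: the function name format_patterns and the choice to represent each pattern as a set mapped to its support are assumptions, not part of the question. Ties are compared on the sorted item lists, which is one reading of "lexicographical order".

```python
def format_patterns(support_by_pattern):
    """Return output lines ordered by descending support, then lexicographically.

    support_by_pattern is a hypothetical mapping {iterable of items: support}.
    """
    rows = []
    for items, support in support_by_pattern.items():
        # Items inside a pattern are listed in lexicographical order.
        rows.append((support, sorted(items)))
    # Negating the support puts larger counts first; ties fall back to
    # comparing the sorted item lists, so ['A'] sorts before ['A', 'C'].
    rows.sort(key=lambda row: (-row[0], row[1]))
    return ['{} [{}]'.format(support, ' '.join(items)) for support, items in rows]
```

For example, a mapping that holds C with support 3 and A, A C, B C D with support 2 yields the lines 3 [C], 2 [A], 2 [A C], 2 [B C D] in that order.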

Answer #1

from __future__ import print_function  # print(a, b) and print() behave the same on Python 2 and 3
import sys
import itertools
my_list = []
for line in sys.stdin:
    data = line.split()
    if data:  # skip blank lines so they do not become empty transactions
        my_list.append(data)
items = []
min_support = int(my_list[0][0])
del my_list[0]
# single frequent items
for i in range(len(my_list)):
    for j in range(len(my_list[i])):
        items.append(my_list[i][j])
items = sorted(list(set(items)))
item_count = {}
for key in items:
    for i in range(len(my_list)):
        if key in my_list[i]:
            if key in item_count:
                item_count[key] += 1
            else:
                item_count[key] = 1
shortlisted_items = []
useless_items = []
for key in item_count:
    if item_count[key] >= min_support:
        shortlisted_items.append(key)
    else:
        useless_items.append(key)
shortlisted_items.sort()
for key in useless_items:
    del item_count[key]
# single frequent items end
# generating all possible combinations
master_list = []
for j in range(0, len(my_list)):
    temp_list = []
    for L in range(2, len(my_list[j]) + 1):
        subset = list(itertools.combinations(sorted(my_list[j]), L))
        temp_list.append(subset)
    master_list.append(temp_list)
super_final_list = []
for j in range(len(master_list)):
    for k in range(len(master_list[j])):
        temp_list = []
        for l in range(len(master_list[j][k])):
            letter = ''
            for m in range(len(master_list[j][k][l])):
                letter += str(master_list[j][k][l][m]) + ' '
            temp_list.append(letter.strip())
        super_final_list.append(sorted(set(temp_list)))
master_list = super_final_list[:]
del super_final_list

# generating all possible combinations end
def has_infrequent_subset(c, L):
    c = c.split(' ')
    if (len(c) == 2): # c[i] in L[0] for i in range(len(c))):
        for i in range(len(c)):
            if str(c[i]) not in L:
                return True
        return False
    else:
        subset = list(itertools.combinations(sorted(c), len(c) - 1))
        temp_list = []
        for m in range(len(subset)):
            letter = ''
            for j in range(len(subset[m])):
                letter += str(subset[m][j]) + ' '
            temp_list.append(letter.strip())
        for s in temp_list:
            if s not in L:
                return True
        return False

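def apriori_gen(L):
    # Join step: build candidates one item longer than the itemsets in L.
    # L is a sorted list of space-joined itemsets of equal size.
    Ck = []
    for i in range(len(L)):
        for j in range(i + 1, len(L)):
            if len(L[i].split(' ')) == 1: # len(list(L[i]))==1:
                c = str(L[i]) + ' ' + str(L[j])
                if not has_infrequent_subset(c, L):
                    Ck.append(c)
            else:
                # NOTE: the source is garbled at this point. The else branch
                # and the shared-prefix test are reconstructed as the
                # standard Apriori join and are an assumption.
                if L[i].split(' ')[:-1] == L[j].split(' ')[:-1]:
                    c = ''
                    for item in L[i].split(' '):
                        c += item + ' '
                    c += L[j].split(' ')[-1]
                    if has_infrequent_subset(c, L):
                        continue
                    else:
                        Ck.append(c)
    return Ck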

L = []
L.append(shortlisted_items)
C = []
for k in itertools.count(1, 1):
    if not L[k - 1]:
        break
    Ck = apriori_gen(L[k - 1])
    temp_dict = {}
    temp_list = []
    for c in Ck:
        for i in range(len(master_list)):
            for j in range(len(master_list[i])):
                if c == master_list[i][j]:
                    if c in temp_dict:
                        temp_dict[c] += 1
                    else:
                        temp_dict[c] = 1
    for key in temp_dict:
        if temp_dict[key] >= min_support:
            temp_list.append(key)
            item_count[key] = temp_dict[key]
    L.append(sorted(temp_list))
count_list = []
for key, value in item_count.items():
    count_list.append(value)
count_list = sorted(list(set(count_list)), reverse=True)
segregated_item_list = []
for j in count_list:
    temp_list = []
    for key, value in item_count.items():
        if value == j:
            temp_list.append(key)
    segregated_item_list.append(sorted(temp_list))
for i in segregated_item_list:
    for j in i:
        print(item_count[j], '[' + j.strip() + ']')
print()
# Closed Pattern Mining
closed_patterns = []
for key, value in item_count.items():
    closed_patterns.append(key.strip().split())
closed_patterns = sorted(closed_patterns, key=len, reverse=True)
i = 0
while i < len(closed_patterns):
    x = ''
    for v in closed_patterns[i]:
        x += v + ' '
    x = x.strip()
    for j in range(1, len(closed_patterns[i])):
        subset = list(itertools.combinations(sorted(closed_patterns[i]), j))
        temp_list = []
        for m in range(len(subset)):
            letter = ''
            for j in range(len(subset[m])):
                letter += str(subset[m][j]) + ' '
            temp_list.append(letter.strip())
        for item in temp_list:
            if item_count[item] == item_count[x]:
                if item.strip().split(' ') in closed_patterns:
                    closed_patterns.remove(item.strip().split(' '))
    i += 1
closed_patterns = sorted(closed_patterns)
final_closed_patterns = []
for itemset in closed_patterns:
    letter = ''
    for item in itemset:
        letter += item + ' '
    final_closed_patterns.append(letter.strip())
closed_patterns = final_closed_patterns[:]
del final_closed_patterns
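# Print the closed patterns in the same order as the frequent patterns.
# NOTE: the source is truncated after the inner loop header; the two lines
# under it are an assumed completion based on the required output format.
for i in segregated_item_list:
    for items in i:
        if items in closed_patterns:
            print(item_count[items], '[' + items + ']')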
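For comparison, here is a more compact sketch of the same two steps using frozensets instead of space-joined strings. It is not the answer's code: the function names apriori and closed_patterns are hypothetical, and it counts support by scanning the transactions rather than precomputing every combination. It is written to run on both Python 2.7 and Python 3.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(pattern): support} for every frequent pattern."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Keep only candidates contained in at least min_support transactions.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    level = count({frozenset([item]) for t in transactions for item in t})
    frequent = dict(level)
    k = 2
    while level:
        # Join step: union pairs of frequent (k-1)-itemsets that form a k-itemset.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        level = count(candidates)
        frequent.update(level)
        k += 1
    return frequent


def closed_patterns(frequent):
    """Keep the patterns that have no proper superset with the same support."""
    return {p: n for p, n in frequent.items()
            if not any(p < q and n == m for q, m in frequent.items())}
```

Calling apriori on the three transactions of Sample Input 0 with a minimum support of 2 and passing the result to closed_patterns gives the 9 frequent and 3 closed patterns listed in Sample Output 0; sorting and printing them is left to the caller.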
