Question

#A common problem in academic settings is plagiarism #detection. Fortunately, software can make this pretty easy!...

#A common problem in academic settings is plagiarism
#detection. Fortunately, software can make this pretty easy!
#
#In this problem, you'll be given two files with text in
#them. Write a function called check_plagiarism with two
#parameters, each representing a filename. The function
#should find if there are any instances of 5 or more
#consecutive words appearing in both files. If there are,
#return the longest such string of words (in terms of number
#of words, not length of the string). If there are not,
#return the boolean False.
#
#For simplicity, the files will be lower-case text and spaces
#only: there will be no punctuation, upper-case text, or
#line breaks.
#
#We've given you three files to experiment with. file_1.txt
#and file_2.txt share a series of 5 words: we would expect
#check_plagiarism("file_1.txt", "file_2.txt") to return the
#string "if i go crazy then". file_1.txt and file_3.txt
#share two series of 5 words, and one series of 11 words:
#we would expect check_plagiarism("file_1.txt", "file_3.txt")
#to return the string "i left my body lying somewhere in the
#sands of time". file_2.txt and file_3.txt do not share any
#text, so we would expect check_plagiarism("file_2.txt",
#"file_3.txt") to return the boolean False.
#
#Be careful: there are a lot of ways to do this problem, but
#some would be massively time- or memory-intensive. If you
#get a MemoryError, it means that your solution requires
#storing too much in memory for the code to ever run to
#completion. If you get a message that says "KILLED", it
#means your solution takes too long to run.


#Add your code here!


#Below are some lines of code that will test your function.
#You can change the value of the variable(s) to test your
#function with different inputs.
#
#If your function works correctly, this will originally
#print:
#if i go crazy then
#i left my body lying somewhere in the sands of time
#False
print(check_plagiarism("file_1.txt", "file_2.txt"))
print(check_plagiarism("file_1.txt", "file_3.txt"))
print(check_plagiarism("file_2.txt", "file_3.txt"))

0 0
Add a comment Improve this question Transcribed image text
Answer #1

def check_plagiarism(file1, file2):
    d1 = {}
    f1 = open(file1, "r")

    for line in f1:
        words = line.split()
        for i in range(0, len(words) - 5):
            if words[i] not in d1:
                d1[words[i]] = []
            d1[words[i]].append(words[i+1:i+5])
    f1.close()
    myList = []
    f2 = open(file2, "r")
    for line in f2:
        words = line.split()
        for i in range(0, len(words) - 5):
            if words[i] in d1:
                if words[i+1:i+5] in d1[words[i]]:
                    j = i + 5
                    k = i + 1
                    # print(words[k:j+1],d1[words[k]])
                    while(words[k+1:j+1] in d1[words[k]]):
                        k+=1
                        j+=1
                        if (j == len(words)):
                            break
                    if(len(words[i:j]) > len(myList)):
                        myList = words[i:j]
  
    f2.close()
    if len(myList) == 0:
        return False
    else:
        return " ".join(myList)

#Below are some lines of code that will test your function.
#You can change the value of the variable(s) to test your
#function with different inputs.
#
#If your function works correctly, this will originally
#print:
#if i go crazy then
#i left my body lying somewhere in the sands of time
#False
print(check_plagiarism("file_1.txt", "file_2.txt"))
print(check_plagiarism("file_1.txt", "file_3.txt"))
print(check_plagiarism("file_2.txt", "file_3.txt"))

Add a comment
Know the answer?
Add Answer to:
#A common problem in academic settings is plagiarism #detection. Fortunately, software can make this pretty easy!...
Your Answer:

Post as a guest

Your Name:

What's your source?

Earn Coins

Coins can be redeemed for fabulous gifts.

Not the answer you're looking for? Ask your own homework help question. Our experts will answer your question WITHIN MINUTES for Free.
Similar Homework Help Questions
  • #A common problem in academic settings is plagiarism #detection. Fortunately, software can make t...

    #A common problem in academic settings is plagiarism #detection. Fortunately, software can make this pretty easy! # #In this problem, you'll be given two files with text in #them. Write a function called check_plagiarism with two #parameters, each representing a filename. The function #should find if there are any instances of 5 or more #consecutive words appearing in both files. If there are, #return the longest such string of words (in terms of number #of words, not length of the...

  • This looks long but it is easy just having some trouble please help out thank you....

    This looks long but it is easy just having some trouble please help out thank you. Find and fix all syntax and semantic errors which prevent the program from compiling. Find and fix all logical errors which cause the program to crash and/or produce unexpected results. In addition to making sure the code compiles, we have one final method which we need to complete. There is a method in the code, printAll, which is responsible for printing out the entirety...

  • JAVA Primitive Editor (Please help, I am stuck on this assignment which is worth a lot...

    JAVA Primitive Editor (Please help, I am stuck on this assignment which is worth a lot of points. Make sure that the program works because I had someone answer this incorrectly!) The primary goal of the assignment is to develop a Java based primitive editor. We all know what an editor of a text file is. Notepad, Wordpad, TextWrangler, Pages, and Word are all text editors, where you can type text, correct the text in various places by moving the...

  • JAVA Primitive Editor The primary goal of the assignment is to develop a Java based primitive...

    JAVA Primitive Editor The primary goal of the assignment is to develop a Java based primitive editor. We all know what an editor of a text file is. Notepad, Wordpad, TextWrangler, Pages, and Word are all text editors, where you can type text, correct the text in various places by moving the cursor to the right place and making changes. The biggest advantage with these editors is that you can see the text and visually see the edits you are...

  • Hey guys I need help with this assignment. However it contains 7 sub-problems to solve but I figured only 4 of them can...

    Hey guys I need help with this assignment. However it contains 7 sub-problems to solve but I figured only 4 of them can be solved in one post so I posted the other on another question so please check them out as well :) Here is the questions in this assignment: Note: Two helper functions Some of the testing codes for the functions in this assignment makes use of the print_dict in_key_order (a dict) function which prints dictionary keyvalue pairs...

  • Instructions Good news! You have are nearly finished with the semester! Today’s problem is going to...

    Instructions Good news! You have are nearly finished with the semester! Today’s problem is going to be to modify a program to match the actual check used to verify Social Insurance Numbers. Don’t worry, the first part of the solution has been taken care of for you. You just need to modify it to also check the bits in BOLD font. A valid SIN is calculated by multiply odd-positioned digits (1st, 3rd, 5th, 7th, & 9th) by 1 and even-positioned...

  • Write the following functions, as defined by the given IPO comments. Remember to include the IPO...

    Write the following functions, as defined by the given IPO comments. Remember to include the IPO comments in your submission. Be sure to test your code -- you can use the examples given for each function, but you should probably test additional examples as well -- and then comment-out or remove your testing code from your submission!. We will be testing your functions using a separate program, and if your own testing code runs when we load your file it...

  • Let’s build a dynamic string tokenizer! Start with the existing template and work on the areas...

    Let’s build a dynamic string tokenizer! Start with the existing template and work on the areas marked with TODO in the comments: Homework 8 Template.c Note: If you turn the template back into me without adding any original work you will receive a 0. By itself the template does nothing. You need to fill in the code to dynamically allocate an array of strings that are returned to the user. Remember: A string is an array. A tokenizer goes through...

  • Breaking it Down Problem One problem that I find in new Java programmers is a tendency...

    Breaking it Down Problem One problem that I find in new Java programmers is a tendency to put too much code in "main()". It is important to learn how to break a program down into methods and classes. To give you practice at this, I have a program that has too much code in main. Your job will be to restructure the code into appropriate methods to make it more readable. I will provide you with lots of coaching on...

  • IN C ONLY As mentioned earlier there are two changes we are going to make from...

    IN C ONLY As mentioned earlier there are two changes we are going to make from lab 5, The file you read into data structures can be any length. studentInfo array will be stored in another struct called studentList that will contain the Student pointer and current length of the list. Sometimes data can be used in structs that correlate between variables so it's convenient to store the data in the same struct. Instead of tracking a length variable all...

ADVERTISEMENT
Free Homework Help App
Download From Google Play
Scan Your Homework
to Get Instant Free Answers
Need Online Homework Help?
Ask a Question
Get Answers For Free
Most questions answered within 3 hours.
ADVERTISEMENT
ADVERTISEMENT