Question

    Find the most frequently used words in the book “Sense and Sensibility”. The book is in the sense_andsensibility.txt file.

  1. The words should not be case sensitive, meaning “Mother” and “mother” are considered the same word.
  2. Replace all the punctuation marks with a space.
  3. Use the “stopwords.txt” file to remove all the stop words in text. (Do NOT modify the stopwords.txt file)
  4. Create a histogram similar to the “histogram.jpg” file. The diagram should contain the ranking, the top 30 words, and the number of times they appeared in the book. The number of stars is the number of appearances divided by 10, rounded down. For example, “mother” appears 263 times, so 26 stars are displayed. (You may not get exactly the same result as in histogram.jpg.)

I would like the answer in Python 3.

Answer #1

#package to remove stopwords
from nltk.corpus import stopwords

#package for plotting
import matplotlib.pyplot as plt

#package to sort dictionary by value
import operator

#package to remove punctuations
import re

#initializing filename
filename = "sense_andsensibility.txt"

#opening file in read mode and reading it (the with block closes the file afterwards)
with open(filename, "r") as f:
    text = f.read()
             
#converting all characters to lowercase
text = text.lower()

#removing punctuation
text = re.sub(r'[^\w\s]',' ',text)

#removing digits and underscores
text = re.sub(r'[0-9_]',' ',text)

#converting text(string) into words(list)
text = text.split()

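#removing stop words
#note: the question asks for stopwords.txt; this answer uses NLTK's English stop word list instead
#building the set once avoids reloading the list for every word
stop_words = set(stopwords.words('english'))
text = [txt for txt in text if txt not in stop_words]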

#creating frequency dictionary
frequency = {}

#counting how many times each word occurs
for word in text:
    #a word seen for the first time starts from 0; every occurrence adds 1
    frequency[word] = frequency.get(word, 0) + 1

#sorting (word, count) pairs by count in descending order
sorted_x = sorted(frequency.items(), key=operator.itemgetter(1),reverse=True)

#keeping only the top 30 words (all of them if there are fewer than 30)
sorted_x = dict(sorted_x[:30])

#Displays the top 30 frequent words on the console (disabled; remove the triple quotes to enable)
'''
print("Most Frequent 30 Words: \n")
for x,val in sorted_x.items():
    stars = int(val/10)
    print(x+" : "+str(stars)+" stars")
'''

#getting keys, values in lists
key = list(sorted_x.keys())
val = list(sorted_x.values())

#getting the star value of each frequent word (count divided by 10, rounded down)
val = [v // 10 for v in val]

#plotting a bar
plt.bar(key,val)

#setting vertical style for xlabel
plt.xticks(rotation='vertical')

#adding ylabel as star
plt.ylabel("Stars")

#fixing bottom margin problem (called after all labels are set)
plt.tight_layout()

#displaying the plot
plt.show()
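
The code above draws a bar chart, while the question asks for a text histogram showing rank, word, count, and stars. Here is a minimal sketch of that output. The function name format_histogram, its parameters, and the column widths are assumptions made for illustration; only the figure of 263 for "mother" comes from the question.

```python
def format_histogram(counts, top_n=30, per_star=10):
    """Return text rows with rank, word, count, and one star per `per_star` occurrences."""
    # Sort by count (descending); ties are broken alphabetically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
    rows = []
    for rank, (word, count) in enumerate(ranked, start=1):
        stars = "*" * (count // per_star)  # integer division rounds down
        rows.append(f"{rank:>2} {word:<15} {count:>5} {stars}")
    return rows

# Hypothetical counts for illustration; only "mother": 263 is taken from the question.
sample = {"mother": 263, "sister": 120, "time": 45}
for row in format_histogram(sample):
    print(row)
```

Passing the frequency dictionary built above instead of sample should print the top 30 rows in the same layout.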

Answer #2

To achieve this task, you can follow these steps using Python 3 code:

  1. Read the contents of the "sense_andsensibility.txt" file.

  2. Convert the text to lowercase to make it case-insensitive.

  3. Remove punctuation marks from the text.

  4. Read the stopwords from the "stopwords.txt" file.

  5. Tokenize the text into words.

  6. Remove stopwords from the list of words.

  7. Count the frequency of each word.

  8. Sort the words based on their frequency in descending order.

  9. Generate the histogram with the top 30 words and their frequencies, represented by stars.

Here is Python code that accomplishes these steps:

import string

# Step 1: Read the contents of the "sense_andsensibility.txt" file
with open("sense_andsensibility.txt", "r", encoding="utf-8") as file:
    text = file.read()

# Step 2: Convert the text to lowercase
text = text.lower()

# Step 3: Replace punctuation marks with spaces, as the question requires
translator = str.maketrans(string.punctuation, " " * len(string.punctuation))
text = text.translate(translator)

# Step 4: Read the stopwords from the "stopwords.txt" file
with open("stopwords.txt", "r", encoding="utf-8") as file:
    stopwords = set(file.read().splitlines())

# Step 5: Tokenize the text into words
words = text.split()

# Step 6: Remove stopwords from the list of words
filtered_words = [word for word in words if word not in stopwords]

# Step 7: Count the frequency of each word
word_freq = {}
for word in filtered_words:
    word_freq[word] = word_freq.get(word, 0) + 1

# Step 8: Sort the words based on their frequency in descending order
sorted_words = sorted(word_freq.items(), key=lambda x: x[1], reverse=True)

# Step 9: Generate the histogram with the top 30 words and their frequencies
print("Rank\tWord\t\tFrequency\tHistogram")
print("----------------------------------------------")
for rank, (word, freq) in enumerate(sorted_words[:30], 1):
    stars = "*" * (freq // 10)
    print(f"{rank}\t{word}\t\t{freq}\t\t{stars}")

This code reads the text and the stop words from the provided files, processes the data as described, and prints the top 30 words with their ranks, frequencies, and stars. The output may not exactly match the "histogram.jpg" file, because the counts depend on how punctuation is split into words and on the contents of stopwords.txt.
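
The counting and sorting steps can also be written more compactly with collections.Counter from the standard library. The sketch below is one possible variant, not a replacement for the answer above: the function name top_words and the sample data are made up for illustration, and the regular expression assumes plain English text, since it treats every character other than a-z and whitespace (digits, curly quotes, accented letters) as a separator.

```python
import re
from collections import Counter

def top_words(text, stop_words, n=30):
    """Count words case-insensitively, skip stop words, and return the n most common."""
    # Replace every character that is not a lowercase letter or whitespace with a
    # space, so "well-known" becomes two words instead of "wellknown".
    cleaned = re.sub(r"[^a-z\s]", " ", text.lower())
    words = [w for w in cleaned.split() if w not in stop_words]
    return Counter(words).most_common(n)

# Tiny made-up sample, not text from the book.
sample_text = "Mother, mother! The well-known mother and the sister."
sample_stops = {"the", "and"}
print(top_words(sample_text, sample_stops, n=3))
```

Replacing punctuation with a space rather than deleting it matters for hyphenated words and contractions: deleting would merge "well-known" into a single token that matches neither original word.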


answered by: mervetokaz