Find the top frequently used words in the book of “Sense and Sensibility”. The book...

Question


Find the top frequently used words in the book of “Sense and Sensibility”. The book is in the sense_andsensibility.txt file.

The words should not be case sensitive, meaning “Mother” and “mother” are considered the same word.
Replace all the punctuation marks with a space.
Use the “stopwords.txt” file to remove all the stop words in text. (Do NOT modify the stopwords.txt file)
Create a histogram similar to the “histogram.jpg” file. The diagram should contain the ranking, the top 30 words, and the number of times each appeared in the book. The number of stars is the number of appearances divided by 10. For example, “mother” appears 263 times, so 26 stars are displayed. (You may not get exactly the same result as in the histogram.jpg file.)

I would like the answer in Python 3 code.


Answer 1


#package to remove stopwords
#NOTE: the question asks for stopwords.txt; this answer uses NLTK's English
#stop word list instead, which needs a one-time nltk.download('stopwords')
from nltk.corpus import stopwords

#package for plotting
import matplotlib.pyplot as plt

#package to sort dictionary by value
import operator

#package to remove punctuation
import re

#initializing filename
filename = "sense_andsensibility.txt"

#opening file in read mode and reading it (the file is closed automatically)
with open(filename, "r") as f:
    text = f.read()

#converting all characters to lowercase
text = text.lower()

#replacing punctuation with spaces
text = re.sub(r'[^\w\s]', ' ', text)

#replacing digits and underscores with spaces
text = re.sub(r'[0-9_]', ' ', text)

#converting text(string) into words(list)
words = text.split()

#removing stop words (the set is built once so each lookup is fast)
stop_words = set(stopwords.words('english'))
words = [word for word in words if word not in stop_words]

#creating frequency dictionary
frequency = {}

#counting occurrences of each word
for word in words:

    #if word is not in dictionary, then creating a new key-value pair
    if word not in frequency:
        #initializing value to 1
        frequency[word] = 1

    #if word is in dictionary, increments value by 1
    else:
        frequency[word] += 1

#sorting by count in descending order
sorted_x = sorted(frequency.items(), key=operator.itemgetter(1), reverse=True)

#keeping only the top 30 words (or all of them if there are fewer than 30)
sorted_x = dict(sorted_x[:30])

#Displays top 30 frequent words on the console (disabled)
'''
print("Most Frequent 30 Words: \n")
for x, val in sorted_x.items():
    stars = val // 10
    print(x + " : " + str(stars) + " stars")
'''

#getting keys, values in lists
key = list(sorted_x.keys())
val = list(sorted_x.values())

#getting the star value of each frequent word (count divided by 10, rounded down)
for i in range(len(val)):
    val[i] = val[i] // 10

#plotting a bar chart
plt.bar(key, val)

#setting vertical style for x-axis tick labels
plt.xticks(rotation='vertical')

#adding ylabel as star
plt.ylabel("Stars")

#fixing bottom margin problem
plt.tight_layout()

#displaying the plot
plt.show()

#Sample Output histogram
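The code above draws the star counts as a bar chart. If a text histogram like the one described in the question is wanted instead (rank, word, count, and one star per 10 occurrences), a minimal sketch might look like the following. The function name `format_histogram`, its parameters, and the sample counts are assumptions for illustration; only the "mother" count of 263 comes from the question.

```python
def format_histogram(frequency, top_n=30, per_star=10):
    """Return text rows: rank, word, count, and one star per `per_star` occurrences."""
    # Sort by count, highest first; words with equal counts keep their original order.
    ranked = sorted(frequency.items(), key=lambda item: item[1], reverse=True)
    rows = []
    for rank, (word, count) in enumerate(ranked[:top_n], start=1):
        stars = "*" * (count // per_star)  # integer division drops the remainder
        rows.append(f"{rank:>2} {word:<15} {count:>5} {stars}")
    return rows


# Hypothetical counts; only "mother": 263 is taken from the question.
sample = {"mother": 263, "elinor": 41, "time": 9}
for row in format_histogram(sample):
    print(row)
```

Passing the `frequency` dictionary built earlier in place of `sample` should print up to 30 rows, with words that appear fewer than 10 times getting no stars.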


Answer 2


To achieve this task, you can follow these steps using Python 3 code:

Read the contents of the "sense_andsensibility.txt" file.
Convert the text to lowercase to make it case-insensitive.
Remove punctuation marks from the text.
Read the stopwords from the "stopwords.txt" file.
Tokenize the text into words.
Remove stopwords from the list of words.
Count the frequency of each word.
Sort the words based on their frequency in descending order.
Generate the histogram with the top 30 words and their frequencies, represented by stars.

Here is Python code to accomplish these steps:

import string

# Step 1: Read the contents of the "sense_andsensibility.txt" file
with open("sense_andsensibility.txt", "r", encoding="utf-8") as file:
    text = file.read()

# Step 2: Convert the text to lowercase
text = text.lower()

# Step 3: Replace punctuation marks with spaces, as the question asks
translator = str.maketrans(string.punctuation, " " * len(string.punctuation))
text = text.translate(translator)

# Step 4: Read the stopwords from the "stopwords.txt" file
with open("stopwords.txt", "r", encoding="utf-8") as file:
    stopwords = set(file.read().splitlines())

# Step 5: Tokenize the text into words
words = text.split()

# Step 6: Remove stopwords from the list of words
filtered_words = [word for word in words if word not in stopwords]

# Step 7: Count the frequency of each word
word_freq = {}
for word in filtered_words:
    word_freq[word] = word_freq.get(word, 0) + 1

# Step 8: Sort the words based on their frequency in descending order
sorted_words = sorted(word_freq.items(), key=lambda x: x[1], reverse=True)

# Step 9: Generate the histogram with the top 30 words and their frequencies
print("Rank\tWord\t\tFrequency\tHistogram")
print("----------------------------------------------")
for rank, (word, freq) in enumerate(sorted_words[:30], 1):
    stars = "*" * (freq // 10)
    print(f"{rank}\t{word}\t\t{freq}\t\t{stars}")

This code reads the text from the provided files, processes the data as described, and then prints the histogram with the top 30 words and their frequencies, represented by stars. Note that the output may not exactly match the "histogram.jpg" file, because the counts depend on how punctuation and stop words are handled.
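As a possible variation, the counting and sorting steps can be shortened with `collections.Counter` from the standard library, whose `most_common(n)` method returns the `n` highest counts in descending order. The sketch below is only an illustration: the function name `top_words` and the sample sentence are made up, and the regular expression assumes plain ASCII letters, so digits and accented characters are also replaced with spaces.

```python
import re
from collections import Counter


def top_words(text, stop_words, n=30):
    """Return the n most common non-stop words as (word, count) pairs."""
    # Lowercase first, then replace anything that is not a-z or whitespace with a space.
    cleaned = re.sub(r"[^a-z\s]", " ", text.lower())
    words = [word for word in cleaned.split() if word not in stop_words]
    return Counter(words).most_common(n)


# Made-up sample sentence and stop words, for illustration only.
sample_text = "Mother, mother! The mother of Elinor."
print(top_words(sample_text, stop_words={"the", "of"}))
# [('mother', 3), ('elinor', 1)]
```

With the real files, `text` and the stop word set read in Steps 1 and 4 could be passed in directly, and the resulting pairs printed with the same loop as in Step 9.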

answered by: mervetokaz
