Question

Using RStudio, find a text file (.txt file) on your own. Create a word cloud. Submit the file, your code, and the result.

Answer #1

Choose the text file from which you want to create a word cloud. As an example, I am going to create a word cloud from the following passage about the Mr. Robot series: "Welcome back, my tenderfoot hackers! Well, the first season of Mr. Robot just ended and Elliot and fsociety successfully took down Evil Corp! They have effectively destroyed over 70% of the world's consumer and student debt! Free at last! Free at last! Of course, global financial markets crashed as well, but that's another story."

I saved this text as hacks.txt. Its path, written with the doubled backslashes that R requires inside a string, is C:\\Desktop\\Word_Cloud\\MrRobot\\project\\hacks.txt

Installing Packages

Open RStudio. You will need to install the packages "tm", "wordcloud" and "RColorBrewer". Next you need to load the packages in R.

Run the following commands in RStudio.

#Installing Packages

install.packages("tm")

install.packages("wordcloud")

install.packages("RColorBrewer")

#Loading Packages

library(tm)

library(wordcloud)

library(RColorBrewer)

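# Complete script (each step is explained in the sections below)
library(tm)
library(wordcloud)
library(RColorBrewer)

# Read the text file: one element per line
speech = "C:\\Desktop\\Word_Cloud\\MrRobot\\project\\hacks.txt"
hack_txt = readLines(speech)

# Convert the text into a corpus and look at the first documents
hack <- Corpus(VectorSource(hack_txt))
inspect(hack[1:min(10, length(hack))])

# Clean the text
hack_data <- tm_map(hack, stripWhitespace)
hack_data <- tm_map(hack_data, content_transformer(tolower))
hack_data <- tm_map(hack_data, removeNumbers)
hack_data <- tm_map(hack_data, removePunctuation)
hack_data <- tm_map(hack_data, removeWords, stopwords("english"))
hack_data <- tm_map(hack_data, removeWords, c("and","the","our","that","for","are","also","more","has","must","have","should","this","with"))

# Build the term-document matrix and the word frequencies
tdm_hack <- TermDocumentMatrix(hack_data)
TDM1 <- as.matrix(tdm_hack)                  # Convert this into a matrix format
v = sort(rowSums(TDM1), decreasing = TRUE)   # Gives you the frequencies for every word
summary(v)

# Draw the word cloud
# min.freq = 1 keeps words that occur only once (the default of 3 would drop most words of a short text)
# max.words caps the number of words drawn; adjust it after looking at summary(v)
wordcloud(hack_data, scale = c(5, 0.5), min.freq = 1, max.words = 100, random.order = FALSE, rot.per = 0.35, use.r.layout = FALSE, colors = brewer.pal(8, "Dark2"))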

Reading the File

Following is the command to read a text file in R:

speech = "C:\\Desktop\\Word_Cloud\\MrRobot\\project\\hacks.txt"

hack_txt = readLines(speech)
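As a minimal sketch of the reading step, the snippet below writes a short sample to a temporary file and reads it back with readLines(). The temporary file and the names sample_path and sample_txt are assumptions made so the example runs on any machine; in practice you would point readLines() at your own hacks.txt.

```r
# Write a two-line sample to a temporary file (a stand-in for hacks.txt).
sample_path <- tempfile(fileext = ".txt")
writeLines(c("Welcome back, my tenderfoot hackers!",
             "Free at last! Free at last!"),
           sample_path)

# Stop early with a clear error if the path is wrong.
stopifnot(file.exists(sample_path))

# readLines() returns a character vector with one element per line.
sample_txt <- readLines(sample_path, warn = FALSE)
length(sample_txt)
```

Checking file.exists() first is optional, but it turns a mistyped path into an immediate, readable error.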

Converting the text file into a Corpus

To process or clean the text with the tm package, you first need to convert the plain text into a format called a corpus. A corpus is a collection of documents. VectorSource() treats each element of the character vector, that is, each line read by readLines(), as a separate document, so a file with a single line gives a corpus with a single document. The following command converts the text into a corpus.

hack<-Corpus(VectorSource(hack_txt))

To see the first few documents in the corpus, type the R command: inspect(hack[1:10]) (this assumes the corpus has at least 10 documents; use inspect(hack) to see all of them).

Data Cleaning

Execute the following commands in RStudio:

hack_data<-tm_map(hack,stripWhitespace)

hack_data<-tm_map(hack_data,content_transformer(tolower)) #tolower is a base R function, so it is wrapped for tm

hack_data<-tm_map(hack_data,removeNumbers)

hack_data<-tm_map(hack_data,removePunctuation)

hack_data<-tm_map(hack_data,removeWords, stopwords("english"))

The commands above use tm_map() from the tm package to process the text. They strip unnecessary white space, convert everything to lower case (so that, for example, "Free" and "free" are counted as the same word), remove numbers and punctuation with the removeNumbers and removePunctuation transformations, and remove common English words such as "the" (so-called stop words).

After looking at the text document, I also noticed the following words, which I wanted to remove as additional stop words:

hack_data<-tm_map(hack_data, removeWords, c("and","the","our","that","for","are","also","more","has","must","have","should","this","with"))
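To make the cleaning steps concrete, here is a rough base R imitation that works on a plain character vector and needs no packages. The helper clean_text() is hypothetical and is not part of tm; it only approximates what the tm transformations do.

```r
# Hypothetical helper (not part of tm): imitates the cleaning steps
# on a plain character vector using base R only.
clean_text <- function(x, stop_words = character(0)) {
  x <- tolower(x)                      # convert to lower case
  x <- gsub("[[:digit:]]+", "", x)     # remove numbers
  x <- gsub("[[:punct:]]+", "", x)     # remove punctuation
  if (length(stop_words) > 0) {
    # Remove whole words only; \\b marks a word boundary,
    # so "the" is removed but "they" is left alone.
    pattern <- paste0("\\b(", paste(stop_words, collapse = "|"), ")\\b")
    x <- gsub(pattern, "", x, perl = TRUE)
  }
  x <- gsub("[[:space:]]+", " ", x)    # collapse runs of white space
  trimws(x)                            # drop leading and trailing space
}

cleaned <- clean_text("They have destroyed over 70% of the debt!",
                      stop_words = c("the", "of", "have"))
cleaned
# "they destroyed over debt"
```

Here white space is collapsed last, because removing words and punctuation leaves gaps behind.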

Create a Term Document Matrix

A term-document matrix (TDM) is a matrix that describes the frequency of the terms that occur in a collection of documents. In a term-document matrix, rows correspond to terms and columns correspond to documents (a document-term matrix is its transpose).

We could create a word cloud without a TDM, but building one lets us look at the word frequencies.

tdm_hack<-TermDocumentMatrix(hack_data) #Creates a TDM

TDM1<-as.matrix(tdm_hack) #Convert this into a matrix format

v = sort(rowSums(TDM1), decreasing = TRUE) #Gives you the frequencies for every word

summary(v)

summary(v) gives the distribution of the word frequencies, so we can see the minimum and maximum number of times a word occurs. This helps us set the "max.words" parameter in the next step.
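The idea of a term-document matrix and its row sums can be sketched in base R with two toy documents. The documents and the names toy_docs, toy_tdm and toy_v are made up for illustration; TermDocumentMatrix() in tm does this work for you and applies further rules of its own (for example a minimum word length), so its output can differ.

```r
# Two toy "documents", already cleaned (lower case, no punctuation).
toy_docs <- c("free at last free at last",
              "markets crashed at last")

# Split each document into words on white space.
tokens <- strsplit(toy_docs, "[[:space:]]+")

# Count every term in every document: rows are terms, columns are documents.
terms <- sort(unique(unlist(tokens)))
toy_tdm <- vapply(tokens,
                  function(words) as.vector(table(factor(words, levels = terms))),
                  integer(length(terms)))
rownames(toy_tdm) <- terms
colnames(toy_tdm) <- c("doc1", "doc2")
toy_tdm

# Row sums give the total frequency of each term across all documents.
toy_v <- sort(rowSums(toy_tdm), decreasing = TRUE)
toy_v
summary(toy_v)
```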

Create your first word cloud!

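scale sets the range of font sizes (largest, then smallest); max.words limits the number of words in the cloud (if you omit it, R will try to squeeze every word that meets min.freq into the diagram); min.freq drops words that occur fewer times than the given value (the default is 3, so set it to 1 for a short text); rot.per is the proportion of words drawn vertically; and colors sets the palette, which is applied from the least to the most frequent words.

wordcloud(hack_data, scale = c(5, 0.5), min.freq = 1, max.words = 100, random.order = FALSE, rot.per = 0.35, use.r.layout = FALSE, colors = brewer.pal(8, "Dark2"))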

[Figure: the resulting word cloud, showing words such as markets, consumer, took, successfully, hackers and Elliot]
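Since the question asks you to submit the result, one possible way to save the plot to a file is sketched below. The counts in toy_freq are made up for illustration, the output goes to a temporary directory, and the bar plot fallback is only an assumption so that the snippet still runs when the wordcloud package is not installed.

```r
# Made-up word counts, for illustration only (not taken from hacks.txt).
toy_freq <- c(hackers = 3, markets = 2, debt = 2, free = 2, robot = 1)

# Open a PDF graphics device; everything plotted until dev.off() goes to the file.
out_file <- file.path(tempdir(), "wordcloud.pdf")
pdf(out_file, width = 7, height = 7)

if (requireNamespace("wordcloud", quietly = TRUE)) {
  # Draw the cloud from explicit words and frequencies.
  wordcloud::wordcloud(words = names(toy_freq), freq = toy_freq,
                       min.freq = 1, random.order = FALSE)
} else {
  # Fallback when wordcloud is not installed: a simple bar plot.
  barplot(toy_freq, las = 2)
}

# Close the device so the file is written to disk.
invisible(dev.off())
file.exists(out_file)
```

In RStudio you can also use the Export button in the Plots pane; the device approach has the advantage that the saved file is reproducible from your code.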

I hope this answers your question.
