Question

Imagine that you are asked to build a search engine for finding relevant tweets. Please describe...

Imagine that you are asked to build a search engine for finding relevant tweets. Please describe what methods that you plan to use to (i) build indexes; (ii) rank the tweets; and (iii) evaluate the results. Please provide justifications on your choice and explain potential limitations.

0 0
Add a comment Improve this question Transcribed image text
Answer #1

Answer:

import codecs

#variable to hold number of retweets
number_of_retweets = 0

# open the unicode file for parsing
f = codecs.open('content_polluters_tweets.txt', encoding='UTF-8')

# input the date to search for number of tweets on that day
target_day = raw_input("Enter a date in format yyyy-mm-dd\n>").strip()
tweets_per_day = 0

# input the suspicious words to search in a tweet
suspicous_words = raw_input("Enter suspicious words to search in tweets\n>").strip().lower().split()
bad_tweets = 0

# input words to search for repetition
words_rept = raw_input("Enter words to search for repetition\n>").strip().lower().split()
frequency_rept = dict()
for word in words_rept:
    frequency_rept[word] = 0

# iterate over each line in the file - each line is one tweet
for line in f:

    # extract a line from the file
    tweet = repr(line)
    # ignore first 2 characters, which are garbage
    tweet = tweet[2:]

    # strip the line for unnecessary whitespace
    tweet = tweet.rstrip("\\r\\n")
    tweet = tweet.rstrip("\r\n")
    tweet = tweet.strip("\\t")
    tweet = tweet.replace("\\t", " ")
    tweet = tweet.replace("\\u", "")
    tweet = tweet.replace("\u", "")
    tweet = tweet.strip()

    # make all letters lowercase for easier matching
    tweet = tweet.lower()

    # split on whitespace characters to generate tokens
    tokens = tweet.split()

    # determine if it is a retweet
    if tweet.__contains__(" rt "):
        number_of_retweets += 1

    # determine if the tweet is from the given day
    if tweet.__contains__(target_day):
        tweets_per_day += 1

    # search for suspicious words
    for word in suspicous_words:
        if tweet.__contains__(word):
            bad_tweets += 1

    # search for the given words
    for word in words_rept:
        if tweet.__contains__(word):
            frequency_rept[word] += 1


print "Total number of retweets: " + str(number_of_retweets)
print "Total number of tweets on " + target_day + " is " + str(tweets_per_day)
print "Tweets with entered suspicious words: " + str(bad_tweets)
print "The frequency of the given words is: "
for word in words_rept:
    print word + ": " + str(frequency_rept[word]) + " times."

OUTPUT:

DEAR PLEASE DO RATE IT IF HELPS ELSE LET ME KNOW YOUR DOUBT.

THANK YOU!!!

Add a comment
Know the answer?
Add Answer to:
Imagine that you are asked to build a search engine for finding relevant tweets. Please describe...
Your Answer:

Post as a guest

Your Name:

What's your source?

Earn Coins

Coins can be redeemed for fabulous gifts.

Not the answer you're looking for? Ask your own homework help question. Our experts will answer your question WITHIN MINUTES for Free.
Similar Homework Help Questions
  • Imagine that you are the data manager for a hospital. The hospital is tasked with managing a larg...

    Imagine that you are the data manager for a hospital. The hospital is tasked with managing a large volume of data – often referred to as big data. Managing big data also requires a data governance plan. You are asked to perform a feasibly analysis to implement a data governance program. You are responsible for the first phase of the project. The first phase of your analysis is to explore big data, the responsibilities of a data governance officer, and...

  • Netflix Database - PROGRAMMED IN JAVA PLEASE :) In this program, you will build a Netflix...

    Netflix Database - PROGRAMMED IN JAVA PLEASE :) In this program, you will build a Netflix movie database using the provided file, netflix.csv. The file contains more than two hundred records of movies and TV programs, with each record consisting a title, a rating, a release year, and a user rated score. There are three classes that you will need to implement: Movie, NetflixHandler, and NetflixApp. Class: Movie Movie - title: String - rating: String - year: int - score:...

  • I need help finding a simple approach to the following modeling problem. Please provide details s...

    I need help finding a simple approach to the following modeling problem. Please provide details so I can understand. In 1981 Peter Sutcliffe was convicted of thirteen murders and subjecting a number of other people to vicious attacks. One of the methods used to narrow the search for Mr. Sutcliffe was to find a ‘center of mass’ of the locations of the attacks. In the end, the suspect happened to live in the same town predicted by this technique. Since...

  • Please answer the following Question in detail of the following question in 350 Word count in you...

    Please answer the following Question in detail of the following question in 350 Word count in your own words. Please cite your reference from internet search. please answer each question individually. Perform an internet search for a current health care organization of your choice (preferably publicly traded for-profit organizations because these organizations must report all financial data and make it available to the public). In your search, select and evaluate the report of the financial information from the past 4...

  • In this case, you are an auditor who was asked by the company leadership to carry out audit procedures to the sales department (sales division). You received information that the sales department has experienced a decline in the last 2 years due to the CO

    In this case, you are an auditor who was asked by the company leadership to carry out audit procedures to the sales department (sales division). You received information that the sales department has experienced a decline in the last 2 years due to the COVID-19 pandemic which is supported by a decrease in employee relations, resulting in a shortage of employees which results in a lack of segregation of duties taking place in it.   As an auditor, you are asked...

  • Directions: You are asked to prepare a short piece of writing indicating the two career paths...

    Directions: You are asked to prepare a short piece of writing indicating the two career paths you have chosen to research for your Career Development Research Paper, and what types of research strategies you will use to complete the assignment. You have the option to research two potential entry level positions you may obtain upon completion of your degree, or you can research career ladder opportunities you may be interested in. You will compose a proposal in the form of...

  • Please post complete answer . please only post answers if you know all . Imagine that...

    Please post complete answer . please only post answers if you know all . Imagine that two closely related species of frog live in the same general area and the males of one species exhibit parental care but males of the species do not. Which of the following attributes would you expect are associated with the non-paternal care species? Explain your choices). (8 marks) A. The tadpoles grow extremely rapidly. B. The adults are relatively small. The tadpoles and adults...

  • : For the earlier projects in this class, I asked you to go out and perform...

    : For the earlier projects in this class, I asked you to go out and perform an experiment and then write about it. This time, it’s a library exercise. Find a news article related to recent developments in some type of science, whether medicine, social science, physics, or something else. You can find it in a newspaper, magazine, or on the Internet, but if you use the Internet, the source must be a real media company (nytimes.com, foxnews.com, etc). Your...

  • 3. What outcomes will you evaluate? The awareness about the dangers of smoking among members of...

    3. What outcomes will you evaluate? The awareness about the dangers of smoking among members of the community The effectiveness of drugs that help to quit smoking. The effectiveness of smoking cessation centers established in the Kingdom in regarding to the outcoumes evaluation listed above please answer the folowing question: the outcome evaluation plan that you began writing in weeks 6, 9, and 10     Chapter 8, 9 and Appendix A section c) Format for an evaluation plan in Hoggarth...

  • For this activity, you have been hired as a team of consultants on a multi-year basis...

    For this activity, you have been hired as a team of consultants on a multi-year basis for a global washer and dryer manufacturer. They currently offer two core washer and dryer sets: a high-end model and an economic model. You are tasked to complete several calculations and present your findings to the company stakeholders. 1. For your first assignment, management has provided the following revenue and cost information: High-End Set Economical Set Sales price $3,500 per unit    $1,000 per unit  ...

ADVERTISEMENT
Free Homework Help App
Download From Google Play
Scan Your Homework
to Get Instant Free Answers
Need Online Homework Help?
Ask a Question
Get Answers For Free
Most questions answered within 3 hours.
ADVERTISEMENT
ADVERTISEMENT