Question

C++ Data Structures TREES

You are to write a C++ program to count the frequency (number of occurrences) of n-grams in a text file. The definition of an n-gram is simple: it is a sequence of n consecutive letters in a given text. For example, for the word bilkent the 2-grams (bigrams) are bi, il, lk, ke, en, nt. You may ignore capitalization and assume that the text file contains only the English letters 'a'…'z' and 'A'…'Z', plus blank spaces separating the words. Your program should take the value of n as a parameter. While processing the input text, if your program encounters a word whose length is smaller than n, it can simply ignore that word and continue with the following words.

Answer #1

Thanks for the question.

Here is the completed code for this problem. Let me know if you have any doubts or if you need anything changed.
Thank you!

==============================================================================================

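#include <cctype>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <map>
#include <string>
using namespace std;

// Reads the file word by word and counts every n-gram (n consecutive letters
// inside one word). Capitalization is ignored; words shorter than n are skipped.
void file_ngrams(const char* filename, map<string, int>& frequency, size_t n) {
    ifstream infile(filename);
    if (!infile.is_open()) {
        cerr << "Unable to open the file " << filename << ". Program will terminate now!" << endl;
        exit(1);
    }

    string word;
    while (infile >> word) {  // >> splits on blanks and newlines
        // Ignore capitalization: convert the word to lowercase.
        for (size_t i = 0; i < word.size(); i++)
            word[i] = static_cast<char>(tolower(static_cast<unsigned char>(word[i])));

        if (word.size() < n) continue;  // too short to contain an n-gram

        // A word of length L contains L - n + 1 n-grams.
        for (size_t i = 0; i + n <= word.size(); i++)
            frequency[word.substr(i, n)]++;
    }
    // infile is closed automatically when it goes out of scope.
}

int main(int argc, char* argv[]) {
    // Usage: program [filename] [n]
    const char* filename = "F:\\abc.txt";  // default input file
    int n = 3;                             // default n-gram size
    if (argc > 1) filename = argv[1];
    if (argc > 2) n = atoi(argv[2]);
    if (n <= 0) {
        cerr << "n must be a positive integer." << endl;
        return 1;
    }

    // std::map keeps its keys sorted (it is normally implemented as a
    // balanced binary search tree), so the n-grams print alphabetically.
    map<string, int> frequency;
    file_ngrams(filename, frequency, static_cast<size_t>(n));

    for (map<string, int>::const_iterator itr = frequency.begin(); itr != frequency.end(); ++itr)
        cout << itr->first << " frequency: " << itr->second << endl;
    return 0;
}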

==============================================================================================
