Question

How can I find the most repeated bi-gram (pair of adjacent words) in a text using Java, without using a HashMap?

Answer #1

Program:

import java.util.*;
import java.io.*;

//Bigrams class
class Bigrams
{
   //main method
   public static void main (String[] args) throws IOException
   {
       //open the text file
       Scanner sc = new Scanner(new File("bigrams.txt"));

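       //create an array of strings; it is enlarged when it fills up
       String text[] = new String[1000];
      
       int n = 0;
       while(sc.hasNext())
       {
           //read a String from the file
           String s = sc.next();
           //convert to lower case
           s = s.toLowerCase();
           //remove the punctuation
           s = s.replaceAll("\\p{Punct}","");
           //skip tokens that contained only punctuation
           if(s.isEmpty())
               continue;
           //double the array size once it is full
           if(n == text.length)
               text = Arrays.copyOf(text, 2*n);
           text[n++] = s;
       }
       //close the file
       sc.close();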
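       //a bigram needs at least two words
       if(n < 2)
       {
           System.out.println("No bi-grams found.");
           return;
       }
       //declare the array of count
       int count[] = new int[n-1];
       //declare the array of bigrams
       String bigrams[][] = new String[n-1][2];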
      
       int m = 0, j;
       //processing
       for(int i=0; i<n-1; i++)
       {
           for(j=0; j<m; j++)
           {
               //check for existing bigrams
               if(text[i].equalsIgnoreCase(bigrams[j][0]) && text[i+1].equalsIgnoreCase(bigrams[j][1]))
               {
                   count[j]++;
                   break;
               }
           }
           //for non-existing bigrams
           if(j==m)
           {
               bigrams[m][0] = text[i];
               bigrams[m][1] = text[i+1];
               count[j] = 1;
               m++;
           }
       }
      
       int max=0;
       j = 0;
       //calculate maximum frequency
       for(int i=0; i<m; i++)
       {
           if(count[i]>max)
           {
               max = count[i];
               j = i;
           }
       }
       //print the most repeated bi-grams
       System.out.println("Most repeated bi-grams: " + bigrams[j][0] + " " + bigrams[j][1]);
   }
}
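
The program above compares every word pair against all bigrams stored so far, so its running time grows roughly with the square of the number of words. For longer texts, one alternative that still avoids HashMap is to sort the bigrams and count runs of equal neighbours. The following is only a sketch of that idea, not part of the original answer; the class name SortedBigrams and the method mostFrequent are made up for illustration, and the words are assumed to be already lower-cased and stripped of punctuation.

```java
import java.util.Arrays;

// Hypothetical helper class; not part of the original answer.
class SortedBigrams
{
    // Returns the most frequent bigram in words, or null if there are fewer than two words.
    // When several bigrams tie, the alphabetically first one is returned.
    static String mostFrequent(String[] words)
    {
        if (words.length < 2)
            return null;

        // Join each adjacent pair into one string so the pairs can be sorted.
        String[] pairs = new String[words.length - 1];
        for (int i = 0; i < pairs.length; i++)
            pairs[i] = words[i] + " " + words[i + 1];

        // After sorting, equal bigrams sit next to each other.
        Arrays.sort(pairs);

        String best = pairs[0];
        int bestCount = 1;
        int run = 1; // length of the current run of equal bigrams
        for (int i = 1; i < pairs.length; i++)
        {
            // Extend the current run or start a new one.
            run = pairs[i].equals(pairs[i - 1]) ? run + 1 : 1;
            if (run > bestCount)
            {
                bestCount = run;
                best = pairs[i];
            }
        }
        return best;
    }

    public static void main(String[] args)
    {
        String[] words = "the cat saw the cat and the dog".split(" ");
        System.out.println("Most repeated bi-gram: " + mostFrequent(words));
    }
}
```

Note that the tie rule differs from the program above: the sorted version prefers the alphabetically first bigram, while the original prefers the one that appears first in the text.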

bigrams.txt

The book I read was called A Wrinkle In Time. In the book there is a main character named Meg. Meg and her brother Charles Wallace and a guy named Calvin go on a trip across time and space. They are trying to save their father, a scientist. The dad has been captured by a creature in another galaxy. The kids save the dad and go home using a tesseract.

Output:

Most repeated bi-grams: the book
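
In this sample text "the dad" also occurs twice, so the result is a tie; the program reports "the book" only because it was stored first and the comparison uses a strict greater-than. If every tied bigram should be reported, the final step can collect all entries whose count equals the maximum. A minimal sketch, assuming the same array-based counting as the answer; the class name TiedBigrams and the method allMostFrequent are hypothetical, and the input words are assumed to be already normalised.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; not part of the original answer.
class TiedBigrams
{
    // Returns every bigram that reaches the highest count, in first-seen order.
    static List<String> allMostFrequent(String[] words)
    {
        List<String> result = new ArrayList<>();
        if (words.length < 2)
            return result;

        String[][] bigrams = new String[words.length - 1][2];
        int[] count = new int[words.length - 1];
        int m = 0;   // number of distinct bigrams stored so far
        int max = 0; // highest count seen so far

        for (int i = 0; i < words.length - 1; i++)
        {
            // Linear search for an existing entry, as in the answer's program.
            int j = 0;
            while (j < m && !(words[i].equals(bigrams[j][0]) && words[i + 1].equals(bigrams[j][1])))
                j++;
            // Not found: store it as a new bigram.
            if (j == m)
            {
                bigrams[m][0] = words[i];
                bigrams[m][1] = words[i + 1];
                m++;
            }
            count[j]++;
            if (count[j] > max)
                max = count[j];
        }

        // Collect every bigram whose count equals the maximum.
        for (int j = 0; j < m; j++)
            if (count[j] == max)
                result.add(bigrams[j][0] + " " + bigrams[j][1]);
        return result;
    }

    public static void main(String[] args)
    {
        String[] words = "one two three one two three".split(" ");
        System.out.println("Most repeated bi-grams: " + allMostFrequent(words));
    }
}
```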
