Question

Construct a cross-reference index for a given text file. Such index structures have applications in the design of compilers and databases.

Our task is to write a program that, while reading a text file, collects all words of the text and retains the numbers of the lines in which each word occurred. When this scan is terminated, a table is printed showing all collected words in alphabetical order, each with the list of line numbers where it occurred. The table has exactly one line for each word.

Represent the words encountered in the text by a binary search tree (also called a lexicographic tree). For example, if there were three words
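As a rough illustration of the binary search tree representation described above, here is a minimal sketch in Java. The class name `CrossRefTree` and its methods `add` and `format` are hypothetical names chosen for this example, and the sketch assumes that words are compared case-insensitively and that line numbers arrive in increasing order. The tree is not balanced, so it is a teaching sketch rather than a production structure.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class: a hand-written binary search tree keyed by word.
class CrossRefTree {
    // One node per distinct word; each node keeps its list of line numbers.
    private static final class Node {
        final String word;
        final List<Integer> lines = new ArrayList<>();
        Node left, right;

        Node(String word, int line) {
            this.word = word;
            lines.add(line);
        }
    }

    private Node root;

    // Record that `word` occurs on line number `line`.
    void add(String word, int line) {
        root = insert(root, word, line);
    }

    private Node insert(Node node, String word, int line) {
        if (node == null) {
            return new Node(word, line);
        }
        int cmp = word.compareToIgnoreCase(node.word);
        if (cmp < 0) {
            node.left = insert(node.left, word, line);
        } else if (cmp > 0) {
            node.right = insert(node.right, word, line);
        } else {
            // Assumption: lines arrive in increasing order, so comparing with
            // the last stored line is enough to skip a repeat on the same line.
            List<Integer> ls = node.lines;
            if (ls.get(ls.size() - 1) != line) {
                ls.add(line);
            }
        }
        return node;
    }

    // An in-order traversal visits the words in alphabetical order,
    // producing one output line per word.
    String format() {
        StringBuilder sb = new StringBuilder();
        format(root, sb);
        return sb.toString();
    }

    private void format(Node node, StringBuilder sb) {
        if (node == null) {
            return;
        }
        format(node.left, sb);
        sb.append(node.word).append(' ');
        for (int i = 0; i < node.lines.size(); i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(node.lines.get(i));
        }
        sb.append('\n');
        format(node.right, sb);
    }

    public static void main(String[] args) {
        CrossRefTree index = new CrossRefTree();
        index.add("tree", 1);
        index.add("binary", 1);
        index.add("search", 2);
        index.add("tree", 2);
        System.out.print(index.format());
    }
}
```

The in-order traversal is what gives the alphabetical listing without a separate sorting step: everything in a node's left subtree precedes its word, and everything in the right subtree follows it.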

Answer #1

Ans: As you have not specified which language you want the answer in, I am providing the solution in Java. Please go through the program; you are free to make changes as per your requirements.

    import java.util.*;
    import java.io.*;

    // Note: TextReader is not part of the standard Java library. This program
    // assumes a TextReader class providing peek(), getAnyChar(), getAlpha(),
    // and eof() is available on the classpath.
    public class Concordance {

       static class AlphabeticalOrder implements Comparator<String> {
          /* A Comparator for comparing Strings according to alphabetical
             order, ignoring case. */
          public int compare(String str1, String str2) {
             str1 = str1.toLowerCase();     // Convert to lower case.
             str2 = str2.toLowerCase();
             return str1.compareTo(str2);   // Compare lower-case Strings.
          }
       }

       static TextReader in;     // An input stream for reading the input file.
       static PrintWriter out;   // Output stream for writing the output file.

       /* This TreeMap holds the concordance. Words from the file are used as
          keys in the map. The value associated with each word is a set that
          contains the line numbers on which the word occurs in the file. */
       static TreeMap<String, TreeSet<Integer>> index =
             new TreeMap<>(new AlphabeticalOrder());

       public static void main(String[] args) {
          openFiles(args);    // Open input and output files.
          int lineNum = 1;    // The number of the line in the input file
                              // that is currently being processed.
          while (true) {
             /* Skip over non-letter characters, stopping when end-of-file
                ('\0') or a letter is seen. If the character is an end-of-line
                character, add one to lineNum to reflect the fact that we are
                moving on to the next line in the file. */
             while (in.peek() != '\0' && !Character.isLetter(in.peek())) {
                char ch = in.getAnyChar();
                if (ch == '\n') {
                   lineNum++;    // Start a new line.
                }
             }
             if (in.eof()) {
                // The end-of-file has been reached, so exit from the loop.
                break;
             }
             String word = in.getAlpha();    // The next word from the file.
             /* Add the reference to word to the concordance, unless the word
                is "the" or the word has length <= 2. */
             if (word.length() > 2 && !word.equalsIgnoreCase("the")) {
                addReference(word, lineNum);
             }
          }
          printConcordance();    // Print the data to the output file.
          if (out.checkError()) {    // checkError() also flushes the stream.
             // Some error occurred on the output stream.
             System.out.println("An error occurred while writing the data.");
             System.out.println("Output file might be missing or incomplete.");
             System.exit(1);
          }
          out.close();
          System.out.println(index.size() + " distinct words were found.");
       }

       static void openFiles(String[] args) {
          /* Open the global variable "in" as an input file with name args[0].
             Open the global variable "out" as an output file with name args[1].
             If args.length != 2, or if an error occurs while trying to open
             the files, an error message is printed and the program is
             terminated. */
          if (args.length != 2) {
             System.out.println("Error: Please specify file names on command line.");
             System.exit(1);
          }
          try {
             in = new TextReader(new FileReader(args[0]));
          }
          catch (IOException e) {
             System.out.println("Error: Can't open input file " + args[0]);
             System.exit(1);
          }
          try {
             out = new PrintWriter(new FileWriter(args[1]));
          }
          catch (IOException e) {
             System.out.println("Error: Can't open output file " + args[1]);
             System.exit(1);
          }
       }   // end openFiles()

       static void addReference(String term, int lineNum) {
          /* Add a line reference to the concordance. The concordance is
             stored in the global variable, index. The word term has been
             found on line number lineNum. */
          TreeSet<Integer> references;    // The set of line references that
                                          // we have so far for the term.
          references = index.get(term);
          if (references == null) {
             /* This is the first reference that we have found for the term.
                Make a new set containing the line number and add it to the
                index, with the term as the key. */
             TreeSet<Integer> firstRef = new TreeSet<>();
             firstRef.add(lineNum);
             index.put(term, firstRef);
          }
          else {
             /* references is the set of line references that we have found
                previously for the term. Add the new line number to that set. */
             references.add(lineNum);
          }
       }

       static void printConcordance() {
          // Print each entry from the concordance to the output file.
          /* Each entry in the index has a key (the term) and a value (the
             set of line numbers on which the term occurs). */
          for (Map.Entry<String, TreeSet<Integer>> entry : index.entrySet()) {
             String term = entry.getKey();
             Set<Integer> lines = entry.getValue();
             out.print(term + " ");
             printIntegers(lines);
             out.println();
          }
       }

       static void printIntegers(Set<Integer> integers) {
          /* Print the integer values on one line, separated by commas. The
             commas make this a little tricky, since no comma is printed
             before the first integer. */
          if (integers.isEmpty()) {
             // There is nothing to print if the set is empty.
             return;
          }
          Iterator<Integer> iter = integers.iterator();   // For traversing the set.
          out.print(iter.next());                         // Print the first item.
          while (iter.hasNext()) {
             // Print additional items, if any, with separating commas.
             out.print(", " + iter.next());
          }
       }

    }   // end class Concordance
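The program above relies on a `TextReader` class, which is not part of the standard Java library. If that class is not available, the scanning step could be sketched with standard classes only, roughly as follows. The class name `StandardConcordance` and the method `buildIndex` are hypothetical names for this example; the sketch assumes the same filtering rule as the answer (skip "the" and any word of length 2 or less) and treats a word as a run of Unicode letters.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical class: the concordance built with standard-library I/O only.
class StandardConcordance {

    // Build the index from any Reader, one line at a time.
    static Map<String, TreeSet<Integer>> buildIndex(Reader source) {
        // Keys are ordered alphabetically, ignoring case.
        Map<String, TreeSet<Integer>> index =
                new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            int lineNum = 0;
            while ((line = in.readLine()) != null) {
                lineNum++;
                // Split on runs of non-letters; \P{L} matches any character
                // that is not a Unicode letter.
                for (String word : line.split("\\P{L}+")) {
                    // Same rule as the answer: skip "the" and short words.
                    // Empty strings from the split are filtered out here too.
                    if (word.length() > 2 && !word.equalsIgnoreCase("the")) {
                        index.computeIfAbsent(word, k -> new TreeSet<>())
                             .add(lineNum);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return index;
    }

    public static void main(String[] args) {
        String text = "The quick brown fox\njumps over the lazy dog.\nThe fox sleeps.";
        buildIndex(new StringReader(text))
                .forEach((word, lines) -> System.out.println(word + " " + lines));
    }
}
```

To read from a file instead of a string, a `java.io.FileReader` can be passed to `buildIndex`. Reading whole lines with `readLine()` removes the need to count newline characters by hand, which is the main simplification compared with the character-by-character loop.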

This answer is based on my knowledge of the subject. I hope it is clear.

Thanks.
