COSC 1010 - Assignment #7
Making an Index
Due Date: Friday, Dec 11, 2009


[You can work in teams of two people or by yourself. Each team will hand in just one set of files.]

Many computer authoring systems, designed to help authors write books, provide some easy way to produce the index at the end of the book. This is the section at the end of the book where key terms are listed in alphabetical order with the page numbers where the terms appeared. In this assignment, you're asked to write a Java program which would help an author prepare an index for their book.

Since I don't want to have to write a whole book just to test your programs, we'll simplify the situation. Instead of reading a book and reporting page numbers, we'll just read a paragraph and report line numbers. Also, to make the program easier, there will be no punctuation.

GOAL: Write a Java program which reads in a paragraph in which the words to be indexed have been marked. The program should print out the paragraph with the indexing marks removed and the lines numbered, followed by an index listing the line numbers for each marked word.

The program should read in the marked paragraph from a file. It should also use a class "Reference" to record the marked words and their lines.

Words to be indexed are marked by placing curly braces around them. Note in the following example that it is OK to list a repeated line number if the word is marked more than once in that line (e.g. `key'). Also, only list marked occurrences of a word. For instance, `computer' occurs on line 5, but it is not enclosed in braces and so line 5 is not listed in the index entry for `computer'.

For example, given the input file:

When using a {computer} to {type}
school {papers} one usually
uses a {keyboard}  You
press a {key} and the letter on the {key}
appears on the computer screen
You can {type} many {papers}
on a {computer}

your program should produce the output:

1: When using a computer to type
2: school papers one usually
3: uses a keyboard  You
4: press a key and the letter on the key
5: appears on the computer screen
6: You can type many papers
7: on a computer

INDEX:
computer  1 7
type  1 6
papers  2 6
keyboard  3
key  4 4

METHOD: This program is more complicated than any we've done so far in class. You'll definitely want to come up with a careful plan before you start writing code.

You should write the class Reference to keep track of the line numbers of a given word. So, a Reference object should have an instance variable to store a word, an int array to list the line numbers, and an int to store how many line numbers are recorded in the array. You can assume that a word will never have more than 10 occurrences.

The class should have (at least) the following methods:

  public Reference(String w, int num)
    // A constructor which records one line number (num)
    // for the word w.

  public void addLineNumber(int num)
    // Put the line number num into the array.

  public String getWord()
    // Return the word for this instance

  public void print()
    // Print out the word followed by the line numbers,
    // all on one line.
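
One possible shape for this class is sketched below. It is only a sketch under the assumptions stated above (at most 10 occurrences per word). The constant MAX_LINES and the helper format() are hypothetical names added for illustration; the assignment only requires the four methods listed. The output layout (word, two spaces, then the line numbers separated by single spaces) is inferred from the sample index.

```java
// A minimal sketch of Reference; MAX_LINES and format() are hypothetical additions.
class Reference {
    private static final int MAX_LINES = 10; // assumed limit from the assignment text

    private String word;        // the indexed word
    private int[] lineNumbers;  // line numbers where the word was marked
    private int count;          // how many slots of lineNumbers are in use

    // Record one line number (num) for the word w.
    public Reference(String w, int num) {
        word = w;
        lineNumbers = new int[MAX_LINES];
        lineNumbers[0] = num;
        count = 1;
    }

    // Put the line number num into the next free slot of the array.
    public void addLineNumber(int num) {
        lineNumbers[count] = num;
        count++;
    }

    // Return the word for this instance.
    public String getWord() {
        return word;
    }

    // Hypothetical helper: build the text of one index entry.
    public String format() {
        String s = word + " ";
        for (int i = 0; i < count; i++) {
            s += " " + lineNumbers[i];
        }
        return s;
    }

    // Print the word followed by its line numbers, all on one line.
    public void print() {
        System.out.println(format());
    }
}
```

Keeping the string-building in a separate helper is a design choice, not a requirement; it just makes the entry easy to check without capturing printed output.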

Your program will need two arrays to keep track of the lines of input and the Reference instances. In each case, you'll need to initialize a large array and then just keep track of how many entries you are using at any time. You can assume there will never be more than 100 lines in an input file and never more than 100 separate words indexed. You'll also need to keep track of the current line number.
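
The "large array plus a count" idea can be sketched as follows for the lines of input. The names LineStore, readLines, and numLines are hypothetical. The demo reads from a String so that it runs without a file; in your program you would more likely build the Scanner from a File such as one of the sample inputs.

```java
import java.util.Scanner;

// A sketch of a partially filled array: the array is large, and a separate
// int records how many entries are actually in use.
class LineStore {
    static final int MAX_LINES = 100; // assumed limit from the assignment text

    // Read lines from in into lines[] and return how many were stored.
    static int readLines(Scanner in, String[] lines) {
        int count = 0;
        while (in.hasNextLine() && count < lines.length) {
            lines[count] = in.nextLine();
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String[] lines = new String[MAX_LINES];
        // For a real file: new Scanner(new java.io.File(name)), which can
        // throw FileNotFoundException.
        Scanner in = new Scanner("school papers one usually\non a computer\n");
        int numLines = readLines(in, lines);
        // Only the first numLines entries are meaningful; the rest are null.
        for (int i = 0; i < numLines; i++) {
            System.out.println((i + 1) + ": " + lines[i]);
        }
    }
}
```

The array of Reference objects can be handled the same way, with its own count.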

As in the Pig Latin example, you'll want to split each input line into words and then process each word. If the word starts with "{" and ends with "}", then you need to process it as an index word. If not, you just append it onto the line of text.
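
Here is a rough sketch of just the splitting and brace-removal part, leaving out the recording of index words. The names BraceSketch, stripBraces, and cleanLine are hypothetical and are not the methods suggested later in this handout. It assumes words are separated by single spaces; splitting on " " and rejoining with " " happens to keep a double space like the one in line 3 of the example.

```java
// A sketch of removing index marks from one line (no recording is done here).
class BraceSketch {
    // If w is marked like {word}, return the enclosed word; otherwise return w.
    static String stripBraces(String w) {
        if (w.length() >= 2 && w.startsWith("{") && w.endsWith("}")) {
            return w.substring(1, w.length() - 1);
        }
        return w;
    }

    // Split s into words, strip braces from each, and rejoin with spaces.
    static String cleanLine(String s) {
        String[] words = s.split(" ");
        String result = "";
        for (int i = 0; i < words.length; i++) {
            if (i > 0) {
                result += " ";
            }
            result += stripBraces(words[i]);
        }
        return result;
    }
}
```

The length check guards against a lone "{" or "}" token, for which substring(1, 0) would otherwise throw an exception.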

To process a word as an index word, you'll search through the array of Reference objects to see if you've indexed this word before. If so, you use addLineNumber to include the new line number. Otherwise, you use the constructor to create a new Reference and add it to the array of References.

You might want methods like these in your program:

  static String checkLineForBraces(String s, int curr_line)
  // Return a string which equals s with all braces
  // removed and any index words recorded for line curr_line.
  // (Basically, this splits s into an array of
  // words and then calls checkWordsForBraces().)

  static String checkWordsForBraces(String[] words, int curr_line)
  // Return a string made up of the words in the
  // array words with any braces removed separated
  // by spaces.  Any index word should be recorded
  // for line curr_line.

  static String checkOneWord(String w, int curr_line)
  // If w begins with "{" and ends with "}" then
  // record the enclosed word at line curr_line
  // and return the enclosed word.
  // Otherwise, just return w.

  static void recordWord(String w, int curr_line)
  // Search through the array of Reference objects to
  // see if any have a word equal to w.  If so, use 
  // addLineNumber(curr_line) on that Reference.
  // If w isn't equal to any of the words, then
  // create a new Reference(w, curr_line) and
  // put it in the array of Reference objects.
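
The search-then-add logic of recordWord might look something like the sketch below. The names IndexSketch, refs, and numRefs are hypothetical, and the nested Reference is only a stripped-down stand-in so that the sketch compiles by itself; your program would use your own Reference class instead.

```java
// A sketch of recordWord using a partially filled array of Reference objects.
class IndexSketch {
    // Stand-in for the real Reference class (assumption: at most 10 lines per word).
    static class Reference {
        String word;
        int[] lineNumbers = new int[10];
        int count = 0;

        Reference(String w, int num) {
            word = w;
            addLineNumber(num);
        }

        void addLineNumber(int num) {
            lineNumbers[count] = num;
            count++;
        }

        String getWord() {
            return word;
        }
    }

    static Reference[] refs = new Reference[100]; // assumed limit of 100 indexed words
    static int numRefs = 0;                        // how many entries of refs are in use

    static void recordWord(String w, int curr_line) {
        // Look for an existing Reference whose word equals w.
        for (int i = 0; i < numRefs; i++) {
            if (refs[i].getWord().equals(w)) {
                refs[i].addLineNumber(curr_line);
                return; // found it, so nothing new to create
            }
        }
        // Not found: create a new Reference in the next free slot.
        refs[numRefs] = new Reference(w, curr_line);
        numRefs++;
    }
}
```

Note the use of equals() rather than == to compare the words; == on Strings compares object identity, not the characters. Because new words are appended in the order they are first seen, the entries end up in the same order as the sample index.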

I'll be providing sample input files inp1.txt, inp2.txt, and inp3.txt on the class webpage.

HAND-IN: Submit your program using D2L. You should have at least one *.java file for your main program and Reference.java for the Reference class. Be sure the names of the team members are in each file.