Project 2 - Scanner

Project Errata

In Project 2 you are going to implement a Scanner for our dialect of the MiniJava programming language, as detailed below. After reading Appel Chapter 2 you should know that Scanning is the first step in compiling a program and aims to identify each token in the program being compiled.

You may be aware of scanner-generator and parser-generator tools like Lex, Yacc, JavaCC, and many more. You will NOT (and MUST NOT) use any scanner-generator or parser-generator tools for Project 2. If you have an overwhelming urge to learn about these tools, be patient; Project 3 will require their use. Your Scanner must consist only of Java code.

Project 2 is to be completed individually.

Due date: 2004 Sep 08 (Wed) 12:00 Noon.


Directions:

  • Read Chapter 2 of Appel.
  • Run our sample scanner. It can be run with the command: /homes/cs352/bin/p2 program.java
  • Create Scanner.java and implement your scanner in this file (it MUST be run with the command: javac Scanner.java; java Scanner program.java).
  • Compare your scanner's output with ours. If you desire a passing grade you will maintain a large set of testcases and use the diff utility to discover any differences between running your testcases with our scanner and yours.
  • Debug your scanner until the output matches ours.

Specification:

    The Scanner should take a MiniJava program file name on the command line and write the token stream to stdout with a single token on each line. To capture the output redirect stdout to a file. We must be able to run your program and capture the output with the command:

    javac Scanner.java; java Scanner program.java > tokenStream
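    The command-line contract above can be sketched as a small driver. This is only an illustration under stated assumptions: the class name ScannerSketch, the tokenize method, and the "EOF" token name are hypothetical, and the whitespace-splitting tokenizer is a placeholder for real token recognition. The exact token names come from the reference implementation, not from this sketch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical driver sketch. The class name, method names, and the "EOF"
// token name are placeholders, not the reference implementation's names.
public class ScannerSketch {

    // Placeholder tokenizer: splits on whitespace only. A real scanner
    // would recognize operators, literals, comments, and errors here.
    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < source.length()) {
            if (Character.isWhitespace(source.charAt(i))) {
                i++; // skip whitespace between tokens
                continue;
            }
            int start = i;
            while (i < source.length() && !Character.isWhitespace(source.charAt(i))) {
                i++;
            }
            tokens.add(source.substring(start, i));
        }
        tokens.add("EOF"); // assumed name for the end-of-file token
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ScannerSketch program.java");
            return;
        }
        // Read the whole input file named on the command line.
        String source = new String(Files.readAllBytes(Paths.get(args[0])));
        for (String token : tokenize(source)) {
            System.out.println(token); // one token per line on stdout
        }
    }
}
```

    Because every token goes to stdout on its own line, redirecting stdout to a file yields something that can be compared line by line with diff.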

    The tokens to be handled are listed below:

  • Logical Binary Operators: && ||
  • Bitwise Binary Operators: & | ^ ~
  • Mathematical Binary Operators: + - * /
  • Comparison Binary Operators: < > == !=
  • Unary Operators: - !
  • Punctuation: ( ) [ ] { } , . ;
  • Reserved words in the grammar in appendix A.2 of Appel plus: throws System.in.read System.out.print System.out.write java.io.IOException
  • Identifiers as defined in appendix A.1
  • Decimal Integer Literals as defined in A.1
  • Octal Integer Literals (Not in Book). Same as Decimal except they start with the character '0' and cannot contain the characters '8' and '9'. Examples: 0123, 0567.
  • Hexadecimal Integer Literals (Not in Book). Same as Decimal except they start with "0x" or "0X" and can contain the characters 'a'-'f' and 'A'-'F'. Examples: 0x123, 0x123abc, 0x123AbC, 0Xa0.
  • Simple Strings (Not in Book): Double quote (") followed by any characters followed by another double quote. We call them simple because you do not have to handle escape sequences.
  • Comments as defined in A.1 except you do not have to handle nested comments.
  • End of file token.
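    The three integer literal forms in the list above can be told apart by their prefix. The following is a minimal sketch, not the reference behavior: the class IntLiteralSketch, the classify method, and the result names "DECIMAL", "OCTAL", and "HEX" are hypothetical. Treating a lone "0" as decimal and a bare "0x" as a hex error are assumptions that should be checked against the reference scanner.

```java
// Hypothetical helper for classifying a lexeme that starts with a digit.
// Result names are placeholders; error strings are the ones this page lists.
public class IntLiteralSketch {

    static boolean isHexDigit(char c) {
        return (c >= '0' && c <= '9')
            || (c >= 'a' && c <= 'f')
            || (c >= 'A' && c <= 'F');
    }

    static String classify(String lexeme) {
        if (lexeme.isEmpty() || lexeme.charAt(0) < '0' || lexeme.charAt(0) > '9') {
            throw new IllegalArgumentException("lexeme must start with a digit");
        }
        // Hexadecimal: "0x" or "0X" followed by hex digits.
        if (lexeme.startsWith("0x") || lexeme.startsWith("0X")) {
            if (lexeme.length() == 2) {
                return "Invalid character in hex number."; // assumption: bare prefix is an error
            }
            for (int i = 2; i < lexeme.length(); i++) {
                if (!isHexDigit(lexeme.charAt(i))) {
                    return "Invalid character in hex number.";
                }
            }
            return "HEX";
        }
        // Octal: leading '0' followed by digits 0-7 (assumption: lone "0" is decimal).
        if (lexeme.length() > 1 && lexeme.charAt(0) == '0') {
            for (int i = 1; i < lexeme.length(); i++) {
                char c = lexeme.charAt(i);
                if (c < '0' || c > '7') {
                    return "Invalid character in octal number.";
                }
            }
            return "OCTAL";
        }
        // Decimal: digits 0-9 only.
        for (int i = 0; i < lexeme.length(); i++) {
            char c = lexeme.charAt(i);
            if (c < '0' || c > '9') {
                return "Invalid character in number.";
            }
        }
        return "DECIMAL";
    }
}
```

    For example, classify("0567") yields "OCTAL" and classify("0Xa0") yields "HEX", while classify("089") yields the octal error message.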

    The exact names of all tokens can be found by running the reference implementation at /homes/cs352/bin/p2.

    There are several errors that you must identify and report. You must match the errors exactly.

  • Unterminated Comments. (Comment not terminated at end of input.)
  • Invalid characters in numbers. (Invalid character in hex number., Invalid character in octal number., Invalid character in number.)
  • Unterminated Strings. (String not terminated at end of line.)
  • All other illegal tokens. (Illegal token.)
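    Two of the error cases above, unterminated comments and unterminated strings, come down to looking for a closing delimiter before the input or the line runs out. The sketch below illustrates one way to detect them; the class ErrorSketch and its method names are hypothetical, and the -1 return convention is an assumption of this sketch. How and where the reference scanner prints the messages must be checked against its actual output.

```java
// Hypothetical helpers for detecting unterminated comments and strings.
public class ErrorSketch {

    // 'start' is the index of the "/*" that opens a (non-nested) block comment.
    // Returns the index just past the closing "*/", or -1 if the input ends
    // first (the "Comment not terminated at end of input." case).
    static int skipBlockComment(String src, int start) {
        // Search from start + 2 so that the '*' of "/*" cannot close the comment.
        int end = src.indexOf("*/", start + 2);
        return end < 0 ? -1 : end + 2;
    }

    // 'start' is the index of the opening double quote of a simple string.
    // Returns the index just past the closing quote, or -1 if the line or the
    // input ends first (the "String not terminated at end of line." case).
    static int scanString(String src, int start) {
        for (int i = start + 1; i < src.length(); i++) {
            char c = src.charAt(i);
            if (c == '"') {
                return i + 1; // no escape sequences to handle in simple strings
            }
            if (c == '\n' || c == '\r') {
                return -1; // line ended before the closing quote
            }
        }
        return -1; // input ended before the closing quote
    }
}
```

    A caller would print the matching error message when either method returns -1 and otherwise continue scanning from the returned index.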


    What To Turn In:

    Submit the file Scanner.java. Your Scanner should work with any proper MiniJava input file we provide for testing.

    Submit your code using the turnin command on the lab machines.
       turnin -c cs352=0X01 -p project2
    where "0X01" should be your division number. Please see the manpage for turnin for other helpful options.


    [Rev 1.1 2004 Sep 02 11:31 DWB]