Project 2 - Scanner

Project Errata

In Project 2 you are going to implement a Scanner for our dialect of the MiniJava programming language, as detailed below. After reading Appel Chapter 2 you should know that Scanning is the first step in compiling a program and aims to identify each token in the program being compiled.

You may be aware of scanner-generator and parser-generator tools such as Lex, Yacc, JavaCC, and many more. You will NOT (and MUST NOT) use any scanner-generator or parser-generator tools for Project 2. If you have an overwhelming urge to learn about these tools, be patient; Project 3 will require their use. Your Scanner will consist of only Java code.

Project 2 is to be completed individually.

Due date: 2004 Sep 08 (Wed) 12:00 Noon.


  • Read Chapter 2 of Appel.
  • Run our sample scanner. It can be run with the command: /homes/cs352/bin/p2
  • Create and implement your scanner in this file (it MUST be run with the command: javac; java Scanner).
  • Compare your scanner's output with ours. If you desire a passing grade, you will maintain a large set of test cases and use the diff utility to discover any differences between our scanner's output and yours on each test case.
  • Debug your scanner until the output matches ours.

  • Specification

    The Scanner should take a MiniJava program file name on the command line and write the token stream to stdout, with a single token on each line. To capture the output, redirect stdout to a file. We must be able to run your program and capture the output with the command:

    javac; java Scanner > tokenStream
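
    The command-line contract above can be sketched as a small driver. This is only an illustration under stated assumptions: the class name ScannerDriverSketch and the token names PUNCT, OTHER, and EOF are hypothetical (the real names come from the reference scanner), and only punctuation is recognized here.

```java
import java.io.IOException;
import java.io.PrintStream;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical driver: reads one file and prints one token per line.
class ScannerDriverSketch {
    static final String PUNCTUATION = "()[]{},.;";

    // Writes the (simplified) token stream for `source` to `out`.
    // PUNCT, OTHER, and EOF are placeholder names, not the required ones.
    static void run(String source, PrintStream out) {
        int pos = 0;
        while (pos < source.length()) {
            char c = source.charAt(pos);
            if (Character.isWhitespace(c)) {
                pos++;                          // whitespace separates tokens
            } else if (PUNCTUATION.indexOf(c) >= 0) {
                out.println("PUNCT " + c);      // single-character token
                pos++;
            } else {
                // Everything else is lumped together; a real scanner would
                // distinguish operators, words, literals, and comments here.
                int start = pos;
                while (pos < source.length()
                        && !Character.isWhitespace(source.charAt(pos))
                        && PUNCTUATION.indexOf(source.charAt(pos)) < 0) {
                    pos++;
                }
                out.println("OTHER " + source.substring(start, pos));
            }
        }
        out.println("EOF");                     // end-of-file token comes last
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ScannerDriverSketch <file>");
            System.exit(1);
        }
        String source = new String(Files.readAllBytes(Paths.get(args[0])));
        run(source, System.out);
    }
}
```

    Because the tokens go to stdout and the usage message goes to stderr, redirecting stdout to a file captures only the token stream.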

    The tokens to be handled are listed below:

  • Logical Binary Operators: && ||
  • Bitwise Binary Operators: & | ^ ~
  • Mathematical Binary Operators: + - * /
  • Comparison Binary Operators: < > == !=
  • Unary Operators: - !
  • Punctuation: ( ) [ ] { } , . ;
  • Reserved words in the grammar in appendix A.2 of Appel plus: throws System.out.print System.out.write
  • Identifiers as defined in appendix A.1
  • Decimal Integer Literals as defined in A.1
  • Octal Integer Literals (Not in Book). Same as Decimal except they start with the character '0' and cannot contain the characters '8' and '9'. Examples: 0123, 0567.
  • Hexadecimal Integer Literals (Not in Book). Same as Decimal except they start with "0x" or "0X" and can contain the characters 'a'-'f' and 'A'-'F'. Examples: 0x123, 0x123abc, 0x123AbC, 0Xa0.
  • Simple Strings (Not in Book): Double quote (") followed by any characters followed by another double quote. We call them simple because you do not have to handle escape sequences.
  • Comments as defined in A.1 except you do not have to handle nested comments.
  • End of file token.
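
    The three integer literal forms in the list above can be told apart by their prefix. The following is a minimal sketch, assuming the scanner has already collected a maximal run of letters and digits that starts with a digit. The class name IntLiteralSketch and the labels DECIMAL, OCTAL, and HEX are hypothetical, and the treatment of edge cases such as a bare "0" or a bare "0x" is an assumption that should be checked against the reference scanner.

```java
// Hypothetical helper: classifies a lexeme that begins with a digit.
class IntLiteralSketch {
    static boolean isHexDigit(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    // Returns a placeholder label, or one of the error messages from the handout.
    static String classify(String lexeme) {
        if (lexeme.startsWith("0x") || lexeme.startsWith("0X")) {
            // Assumption: a bare "0x" with no digits is accepted here.
            for (int i = 2; i < lexeme.length(); i++) {
                if (!isHexDigit(lexeme.charAt(i))) {
                    return "Invalid character in hex number.";
                }
            }
            return "HEX";
        }
        if (lexeme.length() > 1 && lexeme.charAt(0) == '0') {
            // A leading '0' followed by more characters is treated as octal.
            for (int i = 1; i < lexeme.length(); i++) {
                char c = lexeme.charAt(i);
                if (c < '0' || c > '7') {
                    return "Invalid character in octal number.";
                }
            }
            return "OCTAL";
        }
        // Assumption: a lone "0" counts as a decimal literal.
        for (int i = 0; i < lexeme.length(); i++) {
            char c = lexeme.charAt(i);
            if (c < '0' || c > '9') {
                return "Invalid character in number.";
            }
        }
        return "DECIMAL";
    }
}
```

    For instance, the handout's examples 0123 and 0Xa0 fall into the octal and hex branches respectively, while 09 would reach the octal error branch under these assumptions.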

    The exact names of all tokens can be found by running the reference implementation at /homes/cs352/bin/p2.

    There are several errors that you must identify and report. You must match the errors exactly.

  • Unterminated Comments. (Comment not terminated at end of input.)
  • Invalid characters in numbers. (Invalid character in hex number., Invalid character in octal number., Invalid character in number.)
  • Unterminated Strings. (String not terminated at end of line.)
  • All other illegal tokens. (Illegal token.)
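
    Two of the errors above concern delimited tokens that never close. The sketch below shows one way to detect them, assuming block comments are written /* ... */ and are not nested. The class and method names are hypothetical, and a return value of -1 stands for the point where the scanner would report "String not terminated at end of line." or "Comment not terminated at end of input."

```java
// Hypothetical helpers for finding the end of a string or block comment.
class DelimitedSketch {
    // `start` is assumed to index the opening double quote.
    // Returns the index just past the closing quote, or -1 if the line
    // (or the input) ends before a closing quote is seen.
    static int endOfString(String src, int start) {
        for (int i = start + 1; i < src.length(); i++) {
            char c = src.charAt(i);
            if (c == '"') {
                return i + 1;   // simple strings: no escape sequences to skip
            }
            if (c == '\n') {
                return -1;      // string not terminated at end of line
            }
        }
        return -1;              // input ended while the string was still open
    }

    // `start` is assumed to index the '/' of an opening "/*".
    // Returns the index just past the closing delimiter, or -1 if the
    // input ends first. Searching from start + 2 keeps "/*/" from
    // being mistaken for a complete comment.
    static int endOfBlockComment(String src, int start) {
        int close = src.indexOf("*/", start + 2);
        return close < 0 ? -1 : close + 2;
    }
}
```

    A caller would advance its position to the returned index on success and print the matching error message on -1.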

    What To Turn In:

    Submit the file that contains your Scanner. Your Scanner should work with any proper MiniJava input file we provide for testing.

    Submit your code using the turnin command on the lab machines.
       turnin -c cs352=0X01 -p project2
    where "0X01" should be your division number. Please see the manpage for turnin for other helpful options.

    [Rev 1.1 2004 Sep 02 11:31 DWB]