COSC 170 Compiler Construction

Spring 2009

Project #2: Scanner

Implement a Scanner for our dialect of the MiniJava programming language, as detailed below.
Project 2 may be completed in teams of three.
Due: Friday, January 30, 11:59PM CST.
Submit: the Scanner.java source file through the turnin command on Morbius.

Directions

  • Read Chapter 2 of Appel.
  • Implement a Scanner for our dialect of the MiniJava programming language, as detailed below. After reading Appel Chapter 2, you should know that scanning is the first step in compiling a program and aims to identify each token in the program being compiled.
  • You may NOT use scanner-generator or parser-generator tools like Lex, Yacc, JavaCC, etc., for this project. I encourage you to employ the regular expression and finite automata techniques we have discussed to tackle this problem.
  • Run the reference implementation of the scanner with the command: ~brylow/cosc170/Projects/scanner program.java
  • Create your scanner in a file called "Scanner.java". My grading protocol will assume that your project can be compiled and run with the following command line: javac Scanner.java; java Scanner program.java.
  • Build a decent set of MiniJava test cases. Several exist in the book and on the web. Having a good set of test inputs will be critical to your success in later phases of the project. The majority of project points will be assigned by running diff to compare your output against mine.
  • Debug until done.

    The Scanner should take a MiniJava program file name on the command line and write the token stream to stdout with a single token on each line. To capture the output to a file, use the UNIX redirect operator:

    javac Scanner.java; java Scanner program.java > tokenStream.out
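The input/output shape described above can be sketched as follows. This is only a minimal illustration, not the reference implementation: the class name DriverSketch, the helper writeTokens, and the "<EOF>" marker are hypothetical, and the whitespace split is a placeholder for real scanning. The actual token names must come from the reference scanner.

```java
import java.io.IOException;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Sketch only: DriverSketch, writeTokens, and "<EOF>" are hypothetical names.
final class DriverSketch {
    // Placeholder tokenizer: prints each whitespace-separated chunk on its
    // own line, then a made-up end-of-file marker. A real scanner would
    // recognize tokens character by character instead of splitting.
    static void writeTokens(String source, PrintStream out) {
        for (String chunk : source.trim().split("\\s+")) {
            if (!chunk.isEmpty()) {
                out.println(chunk);
            }
        }
        out.println("<EOF>");
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java Scanner program.java");
            System.exit(1);
        }
        // Read the whole program file named on the command line.
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        writeTokens(new String(bytes, StandardCharsets.UTF_8), System.out);
    }
}
```

Passing the PrintStream as a parameter keeps the scanning code independent of stdout, which makes it easier to capture output when testing.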

    The tokens to be handled are listed below:

  • Logical Binary Operators: && ||
  • Bitwise Binary Operators: & | ^ ~
  • Mathematical Binary Operators: + - * /
  • Comparison Binary Operators: < > == !=
  • Unary Operators: - !
  • Punctuation: ( ) [ ] { } , .
  • Reserved words in the grammar in appendix A.2 of Appel plus: Xinu.print Xinu.println Xinu.printint Xinu.readint
  • Identifiers as defined in appendix A.1
  • Decimal Integer Literals as defined in A.1
  • Octal Integer Literals (Not in Book). Same as Decimal except they start with the character '0' and cannot contain the characters '8' or '9'. Examples: 0123, 0567.
  • Hexadecimal Integer Literals (Not in Book). Same as Decimal except they start with "0x" and can contain the characters 'a'-'f' and 'A'-'F'. Examples: 0x123, 0x123abc, 0x123AbC.
  • Simple Strings (Not in Book): Double quote (") followed by any characters followed by another double quote. We call them simple because you do not have to handle escape sequences.
  • Comments as defined in A.1 except you do not have to handle nested comments.
  • End of file token.
  • The exact names of all tokens can be found by running your test cases against the reference implementation at Morbius:~brylow/cosc170/Projects/scanner.
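The three integer literal forms listed above can be told apart by their prefix, as in the rough sketch below. Everything named here is hypothetical: the class LiteralSketch, the method classify, and the kind strings "DECIMAL", "OCTAL", and "HEX" are placeholders, not the reference token names. The sketch also assumes that a lone "0" counts as decimal and that "0x" with no digits after it is an error; check both cases against the reference implementation.

```java
// Sketch only: class, method, and kind names are hypothetical.
final class LiteralSketch {
    static final String ERR_HEX = "Invalid character in hex number.";
    static final String ERR_OCT = "Invalid character in octal number.";
    static final String ERR_DEC = "Invalid character in number.";

    static boolean isHexDigit(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    // Classifies a lexeme made of letters and digits that begins with a digit.
    // Returns a placeholder kind name, or one of the error messages above.
    static String classify(String lexeme) {
        if (lexeme.startsWith("0x")) {
            // Assumption: at least one hex digit must follow "0x".
            if (lexeme.length() == 2) return ERR_HEX;
            for (int i = 2; i < lexeme.length(); i++) {
                if (!isHexDigit(lexeme.charAt(i))) return ERR_HEX;
            }
            return "HEX";
        }
        if (lexeme.length() > 1 && lexeme.charAt(0) == '0') {
            // Leading '0' followed by more characters: octal digits only.
            for (int i = 1; i < lexeme.length(); i++) {
                char c = lexeme.charAt(i);
                if (c < '0' || c > '7') return ERR_OCT;
            }
            return "OCTAL";
        }
        for (int i = 0; i < lexeme.length(); i++) {
            char c = lexeme.charAt(i);
            if (c < '0' || c > '9') return ERR_DEC;
        }
        return "DECIMAL";
    }
}
```

Under these assumptions, classify("0567") yields the octal kind and classify("089") yields the octal error message, since '8' is not an octal digit.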

    There are several errors that you should identify and report. Please try to match error messages exactly.

  • Unterminated Comments. (Comment not terminated at end of input.)
  • Invalid characters in numbers. (Invalid character in hex number., Invalid character in octal number., Invalid character in number.)
  • Unterminated Strings. (String not terminated at end of line.)
  • All other illegal tokens. (Illegal token.)
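The two "unterminated" errors can be detected with a simple forward search, roughly as sketched below. The class and method names are hypothetical, and the sketch assumes block comments are delimited by /* and */ (without nesting) and that a simple string must close before the end of its line. Returning -1 is just one way to signal the error; the caller would then print the matching message.

```java
// Sketch only: class and method names are hypothetical.
final class DelimiterSketch {
    static final String ERR_STRING = "String not terminated at end of line.";
    static final String ERR_COMMENT = "Comment not terminated at end of input.";

    // src.charAt(start) is the opening double quote.
    // Returns the index just past the closing quote, or -1 if the line or
    // the input ends first (the ERR_STRING case).
    static int endOfString(String src, int start) {
        for (int i = start + 1; i < src.length(); i++) {
            char c = src.charAt(i);
            if (c == '"') return i + 1;
            if (c == '\n' || c == '\r') return -1;
        }
        return -1;
    }

    // src contains the two-character comment opener at index start.
    // Returns the index just past the closer, or -1 if the input ends
    // first (the ERR_COMMENT case). Nested comments are not handled.
    static int endOfBlockComment(String src, int start) {
        // Search from start + 2 so the '*' of the opener cannot be reused
        // as the '*' of the closer.
        int close = src.indexOf("*/", start + 2);
        return close < 0 ? -1 : close + 2;
    }
}
```

Starting the comment search two characters past the opener matters: without it, the three-character input /*/ would be wrongly accepted as a complete comment.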

    [Rev 1.4 2009 Jan 23 19:13 DWB]