COSC 170 Compiler Construction
Spring 2009
Project #2: Scanner
Implement a Scanner for our dialect of the MiniJava programming language, as detailed below.
Project 2 may be completed in teams of three.
Due: Friday, January 30, 11:59PM CST.
Submit: Scanner.java source file through the turnin command on
Morbius.
Directions
Read Chapter 2 of Appel.
Implement a Scanner for our dialect of the MiniJava programming
language, as detailed below. After reading Appel Chapter 2 you should
know that scanning is the first step in compiling a program, and that it
aims to identify each token in the program being compiled.
You may NOT use scanner-generator or parser-generator tools like
Lex, Yacc, JavaCC, etc., for this project. I encourage you to employ
the regular expression and finite automata techniques we have
discussed to tackle this problem.
Run the reference implementation of the scanner with the
command: ~brylow/cosc170/Projects/scanner program.java
Create your scanner in a file called "Scanner.java". My
grading protocol will assume that your project can be compiled and run
with the following command line: javac Scanner.java; java Scanner
program.java.
Build a decent set of MiniJava testcases. Several exist in the
book and on the web. Having a good set of test inputs will be
critical to your success in later phases of the project. The majority
of project points will be assigned by running diff to compare your
output against mine.
Debug until done.
Directions
The Scanner should take a MiniJava program file name on the command
line and write the token stream to stdout with a single token on each
line. To capture the output to file, use the UNIX redirect operator:
javac Scanner.java; java Scanner program.java > tokenStream.out
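The command-line behavior described above could be wired up roughly as follows. This is only a sketch: the class name ScannerDriverSketch, the method scan, and the token name "EOF" are hypothetical placeholders, not the names used by the reference implementation. A submitted version would live in a class named Scanner, and the real token names should be taken from the reference scanner's output.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class ScannerDriverSketch {
    // Placeholder for the real scanning loop. It recognizes nothing yet and
    // only appends an end-of-file marker. "EOF" is a hypothetical name.
    static List<String> scan(String source) {
        List<String> tokens = new ArrayList<>();
        // ... walk over `source` here and add one entry per token ...
        tokens.add("EOF");
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java Scanner program.java");
            System.exit(1);
        }
        // Read the whole program into memory; MiniJava sources are small.
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        String source = new String(bytes, StandardCharsets.UTF_8);
        // One token per line on stdout, so the output can be redirected and diffed.
        for (String token : scan(source)) {
            System.out.println(token);
        }
    }
}
```

Keeping scan separate from main makes it possible to exercise the scanner on in-memory strings while testing, without creating a file for every case.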
The tokens to be handled are listed below:
Logical Binary Operators: && ||
Bitwise Binary Operators: & | ^ ~
Mathematical Binary Operators: + - * /
Comparison Binary Operators: < > == !=
Unary Operators: - !
Punctuation: ( ) [ ] { } , .
Reserved words in the grammar in appendix A.2 of Appel plus: Xinu.print Xinu.println Xinu.printint Xinu.readint
Identifiers as defined in appendix A.1
Decimal Integer Literals as defined in A.1
Octal Integer Literals (Not in Book). Same as Decimal except they start with the character '0' and cannot contain the characters '8' and '9'. Examples: 0123, 0567.
Hexadecimal Integer Literals (Not in Book). Same as Decimal except they start with "0x" and can contain the characters 'a'-'f' and 'A'-'F'. Examples: 0x123, 0x123abc, 0x123AbC.
Simple Strings (Not in Book): Double quote (") followed by any characters followed by another double quote. We call them simple because you do not have to handle escape sequences.
Comments as defined in A.1 except you do not have to handle nested comments.
End of file token.
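Several operators in the list above share a first character (& and &&, | and ||, ! and !=), so a hand-written scanner has to prefer the longest match. One possible way to do that is sketched below. The class and method names are hypothetical, the table covers only the operators listed above, and the sketch assumes the caller has already ruled out a comment before trying to match "/".

```java
class OperatorMatchSketch {
    // Two-character operators come first so that they win over their
    // one-character prefixes (longest match).
    private static final String[] OPERATORS = {
        "&&", "||", "==", "!=",
        "&", "|", "^", "~", "+", "-", "*", "/", "<", ">", "!"
    };

    // Returns the operator lexeme starting at `pos`, or null if none of the
    // listed operators starts there.
    static String matchOperator(String src, int pos) {
        for (String op : OPERATORS) {
            if (src.startsWith(op, pos)) {
                return op;
            }
        }
        return null;
    }
}
```

A scanner loop using this helper would advance its position by the length of the returned lexeme, and fall through to its other cases when the result is null.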
The exact names of all tokens can be found by running your
testcases against the reference implementation at
Morbius:~brylow/cosc170/Projects/scanner.
There are several errors that you should identify and report. Please try
to match error messages exactly.
Unterminated Comments. (Comment not terminated at end of input.)
Invalid characters in numbers. (Invalid character in hex number., Invalid character in octal number., Invalid character in number.)
Unterminated Strings. (String not terminated at end of line.)
All other illegal tokens. (Illegal token.)
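As an illustration of the three number-related errors, the sketch below classifies a numeric lexeme as decimal, octal, or hexadecimal, or returns the matching error message. It rests on several assumptions that should be checked against the reference implementation: that the lexeme is the maximal run of letters and digits beginning with a digit, that a lone "0" counts as decimal, and that "0x" with no digits after it is a hex error. The class name and the category names "DECIMAL", "OCTAL", and "HEX" are hypothetical and are not the reference token names.

```java
class IntegerLiteralSketch {
    static final String ERR_HEX = "Invalid character in hex number.";
    static final String ERR_OCT = "Invalid character in octal number.";
    static final String ERR_DEC = "Invalid character in number.";

    private static boolean isHexDigit(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    // `lexeme` is assumed to be non-empty and to start with a digit.
    static String classify(String lexeme) {
        if (lexeme.isEmpty()) {
            throw new IllegalArgumentException("empty lexeme");
        }
        if (lexeme.startsWith("0x")) {
            if (lexeme.length() == 2) {
                return ERR_HEX; // assumption: "0x" alone is an error
            }
            for (int i = 2; i < lexeme.length(); i++) {
                if (!isHexDigit(lexeme.charAt(i))) return ERR_HEX;
            }
            return "HEX";
        }
        if (lexeme.length() > 1 && lexeme.charAt(0) == '0') {
            // Octal: leading '0', then only the digits 0 through 7.
            for (int i = 1; i < lexeme.length(); i++) {
                char c = lexeme.charAt(i);
                if (c < '0' || c > '7') return ERR_OCT;
            }
            return "OCTAL";
        }
        for (int i = 0; i < lexeme.length(); i++) {
            char c = lexeme.charAt(i);
            if (c < '0' || c > '9') return ERR_DEC;
        }
        return "DECIMAL"; // assumption: a lone "0" also lands here
    }
}
```

Under these assumptions, 0123 and 0x123AbC are accepted, while 089, 0x12g, and 12a each produce the octal, hex, and plain-number error respectively.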
[Rev 1.4 2009 Jan 23 19:13 DWB]