Project 2 - Scanner
Project Errata
In Project 2 you are going to implement a Scanner for our dialect of the
MiniJava programming language, as detailed below. After reading Appel Chapter
2 you should know that Scanning is the first step in compiling a program and
aims to identify each token in the program being compiled.
You may be aware of scanner- and parser-generator tools such as Lex, Yacc, and JavaCC.
You will NOT (and MUST NOT) use any scanner-generator or parser-generator tools for Project 2. If you have an
overwhelming urge to learn about these tools, be patient; Project 3 will
require their use. Your Scanner must consist of only Java code.
Project 2 is to be completed individually.
Due date: 2004 Sep 08 (Wed) 12:00 Noon.
Directions:
Read Chapter 2 of Appel.
Run our sample scanner. It can
be run with the command: /homes/cs352/bin/p2 program.java
Create Scanner.java and implement
your scanner in this file (it MUST be run with the command: javac
Scanner.java; java Scanner program.java).
Compare your scanner's output with ours. If you desire a passing
grade you will maintain a large set of testcases and use the
diff utility to discover any differences between running your
testcases with our scanner and yours.
Debug your scanner until the output matches ours.
Specification
The Scanner should take a MiniJava program file name on the command
line and write the token stream to stdout with a single token on each
line. To capture the output, redirect stdout to a file. We must be
able to run your program and capture the output with the command:
javac Scanner.java; java Scanner program.java > tokenStream
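The command-line contract above can be sketched roughly as follows. This is only an illustration of the input/output shape (file name in args[0], one token per line on stdout, an end-of-file line last). The class name ScannerDriverSketch, the render method, and the labels "LEXEME" and "EOF" are all made up for the example; your real class must be named Scanner, and the real token names come from the reference scanner. The whitespace splitting is a placeholder, not a real scanner. The sketch also assumes a JDK recent enough to provide java.nio.file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical sketch of the command-line contract only. The real submission
// must be a class named Scanner in Scanner.java.
class ScannerDriverSketch {

    // Placeholder for the real scanner: emits each whitespace-separated chunk
    // on its own line, then an end-of-file line. "LEXEME" and "EOF" are made-up
    // labels, not the reference scanner's token names.
    static String render(String source) {
        StringBuilder out = new StringBuilder();
        for (String chunk : source.trim().split("\\s+")) {
            if (!chunk.isEmpty()) {
                out.append("LEXEME ").append(chunk).append('\n');
            }
        }
        out.append("EOF\n");
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java Scanner program.java");
            System.exit(1);
        }
        String source = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        // Tokens go to stdout so that "> tokenStream" captures them.
        System.out.print(render(source));
    }
}
```

Keeping the token-producing code separate from main makes it easy to run the same logic over many testcases and compare the results against the reference output.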
The tokens to be handled are listed below:
Logical Binary Operators: && ||
Bitwise Binary Operators: & | ^ ~
Mathematical Binary Operators: + - * /
Comparison Binary Operators: < > == !=
Unary Operators: - !
Punctuation: ( ) [ ] { } , . ;
Reserved words in the grammar in appendix A.2 of Appel plus: throws System.in.read System.out.print System.out.write java.io.IOException
Identifiers as defined in appendix A.1
Decimal Integer Literals as defined in A.1
Octal Integer Literals (Not in Book). Same as Decimal except they start with the character '0' and cannot contain the characters '8' and '9'. Examples: 0123, 0567.
Hexadecimal Integer Literals (Not in Book). Same as Decimal except they start with "0x" or "0X" and can contain the characters 'a'-'f' and 'A'-'F'. Examples: 0x123, 0x123abc, 0x123AbC, 0Xa0.
Simple Strings (Not in Book): Double quote (") followed by any characters followed by another double quote. We call them simple because you do not have to handle escape sequences.
Comments as defined in A.1 except you do not have to handle nested comments.
End of file token.
The exact names of all tokens can be found by running the
reference implementation at /homes/cs352/bin/p2.
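As an illustration of the three integer literal forms listed above, here is a minimal sketch that classifies a run of letters and digits beginning with a digit. The class name IntLiteralSketch and the labels "HEX", "OCTAL", and "DECIMAL" are hypothetical; the error strings are the ones quoted in the error list of this handout. Several choices are assumptions rather than documented behavior: treating a lone "0" as decimal, reporting a bare "0x" as a hex error, and treating trailing letters as part of the number. Check each of these against the reference scanner.

```java
// Hypothetical helper, not part of the assignment's required interface.
class IntLiteralSketch {

    static boolean isDecimalDigit(char c) { return c >= '0' && c <= '9'; }

    static boolean isOctalDigit(char c) { return c >= '0' && c <= '7'; }

    static boolean isHexDigit(char c) {
        return isDecimalDigit(c) || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Index just past the run of letters and digits that begins at start.
    static int wordEnd(String src, int start) {
        int i = start;
        while (i < src.length()
                && (isDecimalDigit(src.charAt(i)) || isLetter(src.charAt(i)))) {
            i++;
        }
        return i;
    }

    // Returns "HEX", "OCTAL", or "DECIMAL" (made-up labels), or one of the
    // error messages quoted in this handout. word must start with a digit.
    static String classify(String word) {
        boolean hexPrefix = word.length() >= 2 && word.charAt(0) == '0'
                && (word.charAt(1) == 'x' || word.charAt(1) == 'X');
        if (hexPrefix) {
            // Assumption: a bare "0x" with no digits is reported as a hex error.
            if (word.length() == 2) return "Invalid character in hex number.";
            for (int i = 2; i < word.length(); i++) {
                if (!isHexDigit(word.charAt(i))) return "Invalid character in hex number.";
            }
            return "HEX";
        }
        // Assumption: a lone "0" falls through and is treated as decimal.
        if (word.length() > 1 && word.charAt(0) == '0') {
            for (int i = 1; i < word.length(); i++) {
                if (!isOctalDigit(word.charAt(i))) return "Invalid character in octal number.";
            }
            return "OCTAL";
        }
        for (int i = 0; i < word.length(); i++) {
            if (!isDecimalDigit(word.charAt(i))) return "Invalid character in number.";
        }
        return "DECIMAL";
    }
}
```

For example, classify("0Xa0") yields the hex label, while classify("089") yields the octal error message because '8' is not an octal digit.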
There are several errors that you must identify and report. You must match the error messages (shown in parentheses) exactly.
Unterminated Comments. (Comment not terminated at end of input.)
Invalid characters in numbers. (Invalid character in hex number., Invalid character in octal number., Invalid character in number.)
Unterminated Strings. (String not terminated at end of line.)
All other illegal tokens. (Illegal token.)
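The two "unterminated" errors above could be detected along these lines. This is a sketch under assumptions, not the reference behavior. The class and method names are hypothetical. The sketch assumes comments are the non-nested /* ... */ form plus // to end of line, and that a simple string must close on the line where it opens. Signalling the error with an exception is only one option; whether the reference scanner stops or keeps scanning after an error is something to check by running it.

```java
// Hypothetical helpers; each takes the source text and the index where the
// construct begins, and returns the index where scanning should resume.
class StringCommentSketch {

    // Returns the index just past the closing "*/" of a non-nested comment.
    static int skipBlockComment(String src, int start) {
        int close = src.indexOf("*/", start + 2);   // start + 2 skips the opening "/*"
        if (close < 0) {
            throw new IllegalStateException("Comment not terminated at end of input.");
        }
        return close + 2;
    }

    // Returns the index of the newline (or src.length()) that ends a "//" comment.
    static int skipLineComment(String src, int start) {
        int i = start;
        while (i < src.length() && src.charAt(i) != '\n') {
            i++;
        }
        return i;
    }

    // Returns the index just past the closing quote of a simple string.
    // No escape sequences are handled, as the specification allows.
    static int scanString(String src, int start) {
        for (int i = start + 1; i < src.length(); i++) {
            char c = src.charAt(i);
            if (c == '"') return i + 1;
            if (c == '\n' || c == '\r') break;   // assumption: strings may not span lines
        }
        throw new IllegalStateException("String not terminated at end of line.");
    }
}
```

Searching for the closing "*/" from start + 2 matters: it keeps an input such as /*/ from being mistaken for a complete comment.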
What To Turn In:
Submit the file Scanner.java.
Your Scanner should work with any proper MiniJava input
file we provide for testing.
Submit your code using the turnin command on the lab machines.
   turnin -c cs352=0X01 -p project2
where "0X01" should be your division number. Please see the manpage for turnin
for other helpful options.
[Rev 1.1 2004 Sep 02 11:31 DWB]