COSC 170 Compiler Construction

Spring 2007

Project #2

Scanner

Implement a Scanner for our dialect of the MiniJava programming language, as detailed below.

Project 2 may be completed in teams of two. Choose your partner wisely; I expect you to work together for the remainder of the term.

Due date: 2007 Feb 02 (Fri) 2:00PM CST.

Submit: E-mail a compressed archive of your Java source to the professor, with the subject header "COSC 170 P2".

Directions

  • Read Chapter 2 of Appel.
  • Implement a Scanner for our dialect of the MiniJava programming language, as detailed below. After reading Appel Chapter 2 you should know that scanning is the first step in compiling a program and aims to identify each token in the program being compiled.
  • You may NOT use scanner-generator or parse-generator tools like Lex, Yacc, JavaCC, etc., for this project. I encourage you to employ the regular expression and finite automata techniques we have discussed to tackle this problem.
  • Run the reference implementation of the scanner on Morbius with the command: ~brylow/cosc170/Projects/scanner program.java
  • Create your scanner in a file called "Scanner.java". My grading protocol will assume that your project can be compiled and run with the following command line: javac Scanner.java; java Scanner program.java
  • Build a decent set of MiniJava testcases. Several exist in the book, and on the web. Having a good set of test inputs will be critical to your success in later phases of the project. The majority of project points will be assigned by running diff to compare your output against mine.
  • Debug until done.

Specification

    The Scanner should take a MiniJava program file name on the command line and write the token stream to stdout with a single token on each line. To capture the output to file, use the UNIX redirect operator:

    javac Scanner.java; java Scanner program.java > tokenStream.out
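
    As a rough illustration of the command-line contract above, here is a minimal driver sketch. The class name ScannerDriverSketch, the tokenize helper, and the "CHUNK" and "EOF" labels are all hypothetical placeholders; the real token names must come from the reference implementation, and the whitespace splitting shown here stands in for real token recognition.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class ScannerDriverSketch {
    // Placeholder tokenizer: emits one output line per whitespace-separated
    // chunk. A real scanner would recognize tokens with hand-built automata.
    // "CHUNK" and "EOF" are made-up labels, not the reference token names.
    static List<String> tokenize(String source) {
        List<String> lines = new ArrayList<>();
        int i = 0;
        while (i < source.length()) {
            if (Character.isWhitespace(source.charAt(i))) {
                i++;
                continue;
            }
            int start = i;
            while (i < source.length() && !Character.isWhitespace(source.charAt(i))) {
                i++;
            }
            lines.add("CHUNK " + source.substring(start, i));
        }
        lines.add("EOF");
        return lines;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ScannerDriverSketch <file>");
            System.exit(1);
        }
        // Read the whole input file named on the command line.
        String source = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        // Write the token stream to stdout, a single token on each line.
        for (String line : tokenize(source)) {
            System.out.println(line);
        }
    }
}
```

    Because all output goes to stdout, the UNIX redirect shown above captures it without any extra file handling in the program.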

    The tokens to be handled are listed below:

  • Logical Binary Operators: && ||
  • Bitwise Binary Operators: & | ^ ~
  • Mathematical Binary Operators: + - * /
  • Comparison Binary Operators: < > == !=
  • Unary Operators: - !
  • Punctuation: ( ) [ ] { } , .
  • Reserved words in the grammar in Appendix A.2 of Appel plus: throws System.in.read System.out.print System.out.write java.io.IOException
  • Identifiers as defined in appendix A.1
  • Decimal Integer Literals as defined in A.1
  • Octal Integer Literals (Not in Book). Same as Decimal except they start with the character '0' and cannot contain the characters '8' and '9'. Examples: 0123 , 0567.
  • Hexadecimal Integer Literals (Not in Book). Same as Decimal except they start with "0x" and can contain the characters 'a'-'f' and 'A'-'F'. Examples: 0x123, 0x123abc, 0x123AbC.
  • Simple Strings (Not in Book): Double quote (") followed by any characters followed by another double quote. We call them simple because you do not have to handle escape sequences.
  • Comments as defined in A.1 except you do not have to handle nested comments.
  • End of file token.
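
    The three integer literal forms above can be told apart by inspecting the prefix and then checking each remaining character. The sketch below is one possible reading, not the reference behavior. It assumes the scanner has already collected a maximal run of letters and digits that begins with a digit, that a lone 0 counts as decimal, that only a lowercase "0x" prefix is accepted, and that a bare "0x" is reported as an illegal token. The kind names DECIMAL, OCTAL, and HEX are hypothetical; the error strings are the ones quoted in the error list in this handout.

```java
class IntLiteralSketch {
    static boolean isHexDigit(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    // Classifies a run of letters/digits that starts with a digit.
    // Returns a placeholder kind name or one of the handout's error messages.
    static String classify(String run) {
        if (run.startsWith("0x")) {
            if (run.length() == 2) {
                return "Illegal token."; // assumption: bare "0x" has no digits
            }
            for (int i = 2; i < run.length(); i++) {
                if (!isHexDigit(run.charAt(i))) {
                    return "Invalid character in hex number.";
                }
            }
            return "HEX";
        }
        if (run.length() > 1 && run.charAt(0) == '0') {
            for (int i = 1; i < run.length(); i++) {
                char c = run.charAt(i);
                if (c < '0' || c > '7') {
                    return "Invalid character in octal number.";
                }
            }
            return "OCTAL";
        }
        for (int i = 0; i < run.length(); i++) {
            char c = run.charAt(i);
            if (c < '0' || c > '9') {
                return "Invalid character in number.";
            }
        }
        return "DECIMAL"; // assumption: a lone "0" lands here
    }

    public static void main(String[] args) {
        String[] samples = {"0123", "0567", "0x123AbC", "42", "089", "0x12g", "12a"};
        for (String s : samples) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

    Where this sketch and the reference scanner disagree on an edge case, the reference scanner is the authority.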

    The exact names of all tokens can be found by running your testcases against the reference implementation at Morbius:~brylow/cosc170/Projects/scanner.
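
    Several operators in the token list share a first character (& and &&, | and ||, ! and !=), so a hand-written scanner usually looks one character ahead and prefers the longest match. The following sketch shows that idea only; the class and method names are hypothetical, and it returns the raw lexeme rather than a token name, since the names must be taken from the reference implementation. It assumes the caller has already checked for comments before treating '/' as division, and it leaves a lone '=' to the caller because '=' by itself does not appear in the operator lists above.

```java
class OperatorSketch {
    // Returns the longest operator lexeme starting at src[pos],
    // or null if no operator from the handout's lists starts there.
    static String matchOperator(String src, int pos) {
        char c = src.charAt(pos);
        // One character of lookahead; '\0' stands for "no next character".
        char next = pos + 1 < src.length() ? src.charAt(pos + 1) : '\0';
        switch (c) {
            case '&': return next == '&' ? "&&" : "&";
            case '|': return next == '|' ? "||" : "|";
            case '=': return next == '=' ? "==" : null; // lone '=' left to the caller
            case '!': return next == '=' ? "!=" : "!";
            case '^': case '~': case '+': case '-':
            case '*': case '/': case '<': case '>':
                return String.valueOf(c);
            default:
                return null;
        }
    }

    public static void main(String[] args) {
        String src = "a&&b";
        System.out.println(matchOperator(src, 1)); // longest match at index 1
    }
}
```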

    There are several errors that you should identify and report. Please try to match error messages exactly.

  • Unterminated Comments. (Comment not terminated at end of input.)
  • Invalid characters in numbers. (Invalid character in hex number., Invalid character in octal number., Invalid character in number.)
  • Unterminated Strings. (String not terminated at end of line.)
  • All other illegal tokens. (Illegal token.)
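
    Two of the errors above amount to running out of input (or out of line) before a closing delimiter appears. The sketch below shows one way to detect them. The class, method, and constant names are hypothetical, the -1 return value is simply this sketch's way of signaling the error, and treating end of input inside a string the same as end of line is an assumption. Only the /* ... */ comment form is covered, without nesting.

```java
class StringCommentSketch {
    // Error messages as quoted in the handout.
    static final String UNTERMINATED_STRING = "String not terminated at end of line.";
    static final String UNTERMINATED_COMMENT = "Comment not terminated at end of input.";

    // Precondition: src.charAt(start) is the opening double quote.
    // Returns the index just past the closing quote, or -1 if the line
    // (or the input) ends first. No escape sequences are handled.
    static int endOfString(String src, int start) {
        for (int i = start + 1; i < src.length(); i++) {
            char c = src.charAt(i);
            if (c == '"') {
                return i + 1;
            }
            if (c == '\n') {
                return -1;
            }
        }
        return -1; // assumption: end of input is reported like end of line
    }

    // Precondition: src.startsWith("/*", start).
    // Returns the index just past the closing "*/", or -1 if there is none.
    // Searching from start + 2 keeps "/*/" from counting as a whole comment.
    static int endOfBlockComment(String src, int start) {
        int close = src.indexOf("*/", start + 2);
        return close < 0 ? -1 : close + 2;
    }

    public static void main(String[] args) {
        if (endOfString("\"abc", 0) < 0) {
            System.out.println(UNTERMINATED_STRING);
        }
        if (endOfBlockComment("/* never closed", 0) < 0) {
            System.out.println(UNTERMINATED_COMMENT);
        }
    }
}
```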

Turn in

    Submit the file Scanner.java in a compressed tarball named with your MSCS login name. For example, I would run the command tar cvzf brylow.tar.gz Scanner.java to create a compressed archive of my source file under my login name. E-mail this to me as an attachment. Your Scanner should work with any proper MiniJava input file I provide for testing.



    [Rev 1.3 2007 Jan 25 17:40 DWB]