COSC 4400
Compiler for BabyGustave language
NEW Due date: Monday, April 8


GOAL: Write a working compiler for the BabyGustave subset of the Gustave language. The compiler should use your earlier assignments to turn source code into syntax trees, then translate those trees into ARM assembly code that can be assembled and run on the Raspberry Pis. The output of your compiler should therefore be ARM assembly code printed to standard output (the screen).

METHOD: Chapters 5-12 in Appel describe one way to write a compiler. However, Appel's approach is more complicated than we have time to absorb this semester. Consequently, I am suggesting a simpler approach. Rather than translating the syntax trees into intermediate representation trees and then translating those into assembly instructions, we will go directly from the syntax trees to assembly instructions. An example of this approach has been discussed in class and the code is available on morbius in the directory ~mike/cosc4400/initcomp.

Here's my best advice on how to work on this:

1. Make a new copy of your assignment #4 code (so you can always start over if things get really messed up).

2. Write your own CodeGenPkg/Coder.java to handle all of the features of BabyGustave.
This is, of course, the main part of the assignment.

Start small and try to get pieces working as soon as possible. For instance, if you write the Coder.visit() for Program, RootClass, PutInteger, and NumberExpr, you should be able to compile and run the program:

class ONE

creation make

method make is
  do
    io.put_integer(5)
  end

end -- ONE

If you then add code for PutString, you can print a newline after the 5, and you can compile hello_world.e. Add features one or two at a time, then test (and debug) them before adding more.
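To make the "start small" advice concrete, here is a minimal sketch of what the first two visit methods might look like. It is not the initcomp code: the node classes below are hypothetical stand-ins for your own syntax-tree classes, and the sketch assumes (a) every expression leaves its result in r0, (b) integers are printed by calling the C library routine printf, and (c) a label fmt_int for a "%d" format string is defined elsewhere in the generated .data section. Program and RootClass are omitted.

```java
public class TinyCoderSketch {
    // Hypothetical stand-ins for the syntax-tree classes from earlier assignments.
    interface Node { void accept(Coder c); }

    static class NumberExpr implements Node {
        final int value;
        NumberExpr(int value) { this.value = value; }
        public void accept(Coder c) { c.visit(this); }
    }

    static class PutInteger implements Node {
        final Node arg;
        PutInteger(Node arg) { this.arg = arg; }
        public void accept(Coder c) { c.visit(this); }
    }

    // Hypothetical code generator: collects assembly text in a buffer.
    static class Coder {
        final StringBuilder out = new StringBuilder();

        // Assumption: every expression leaves its value in r0.
        void visit(NumberExpr n) {
            out.append("\tldr\tr0, =").append(n.value).append("\n");
        }

        // Assumption: printf does the printing; fmt_int is a label for "%d"
        // that the compiler must emit elsewhere in the .data section.
        void visit(PutInteger p) {
            p.arg.accept(this);                    // value to print is now in r0
            out.append("\tmov\tr1, r0\n");         // second printf argument
            out.append("\tldr\tr0, =fmt_int\n");   // first argument: format string
            out.append("\tbl\tprintf\n");
        }
    }

    // Generates the code for io.put_integer(value).
    static String compilePutInteger(int value) {
        Coder c = new Coder();
        new PutInteger(new NumberExpr(value)).accept(c);
        return c.out.toString();
    }

    public static void main(String[] args) {
        System.out.print(compilePutInteger(5));
    }
}
```

Once this much assembles and runs, each new statement or expression is one more visit method that follows the same pattern.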

I suggest first implementing all of the statements and expressions that do not involve arrays. You can then add arrays and test those features.

Arrays involve two steps: allocation and access. To allocate an array in the heap, we will use the C runtime library routine malloc(). malloc() takes one parameter (the number of bytes desired) and returns the address of a block of memory at least that large. In assembly, you call malloc by putting the number of bytes (4 times the number of integer entries) into r0, calling malloc, and then reading the returned address from r0. That address is stored as the value of the variable representing the array.
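The allocation sequence can be sketched as a small code-emitting helper. The helper name and the variable layout are assumptions for illustration only: the sketch supposes the element count has already been computed into r0 and that the array variable lives at a negative offset from fp. Adapt both to however your Coder stores variables.

```java
import java.util.ArrayList;
import java.util.List;

public class ArrayAllocSketch {
    // Hypothetical helper: emits ARM code that allocates an integer array.
    // Assumptions: the element count is already in r0, and the array
    // variable is stored at [fp, #-fpOffset].
    static List<String> emitArrayAlloc(int fpOffset) {
        List<String> out = new ArrayList<>();
        out.add("\tmov\tr0, r0, lsl #2");               // r0 = count * 4 bytes
        out.add("\tbl\tmalloc");                        // r0 = address of the new block
        out.add("\tstr\tr0, [fp, #-" + fpOffset + "]"); // array variable = base address
        return out;
    }

    public static void main(String[] args) {
        for (String line : emitArrayAlloc(8)) {
            System.out.println(line);
        }
    }
}
```

Keep in mind that malloc is an ordinary C function, so under the ARM calling convention it may overwrite r0-r3 and r12, and the bl itself overwrites lr. Any live value held in those registers needs to be saved before the call.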

To access an array location, generate code that puts the value of the array variable (the base address) into a register, adds 4 times the index to it, and then uses the resulting address to access memory: a load if you are reading from the array, a store if you are assigning to it.
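The access sequence might be emitted along these lines. Again, this is a sketch under stated assumptions rather than the required design: the method names are hypothetical, the index is assumed to be in r1 already (and already zero-based), the array variable is assumed to live at [fp, #-fpOffset], r2 is used as a scratch register, and the value being loaded or stored travels in r0.

```java
public class ArrayAccessSketch {
    // Hypothetical helper: leaves the address of element [r1] in r2.
    // Assumptions: index in r1, array variable at [fp, #-fpOffset].
    static String emitElementAddress(int fpOffset) {
        return "\tldr\tr2, [fp, #-" + fpOffset + "]\n"  // r2 = base address
             + "\tadd\tr2, r2, r1, lsl #2\n";           // r2 = base + 4 * index
    }

    // Reading a[i]: the element value ends up in r0.
    static String emitLoad(int fpOffset) {
        return emitElementAddress(fpOffset) + "\tldr\tr0, [r2]\n";
    }

    // Assigning to a[i]: the value to store is assumed to be in r0.
    static String emitStore(int fpOffset) {
        return emitElementAddress(fpOffset) + "\tstr\tr0, [r2]\n";
    }

    public static void main(String[] args) {
        System.out.print(emitLoad(8));
        System.out.print(emitStore(8));
    }
}
```

The shifted operand (r1, lsl #2) performs the multiply-by-4 and the add in a single instruction, so no separate multiply is needed.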

Sample programs for testing are available here. Don't forget that you can (and should) write your own test programs as needed.