+125
This is an extension of Assignment 7. We’re building a compiler for the Calculator language. Previously, we supported only straight-line code, so we could accumulate every instruction into one basic block. Now we will extend the language with if
statements and while
loops. There are several new test cases in src/test/resources/examples/run-expect
.
The changes to the source language are all in the stmt
grammar:
| 'while' expr 'do' stmt #WhileStmt
| 'if' expr 'then' stmt #IfStmt
| 'if' expr 'then' stmt ';' 'else' stmt #IfElseStmt
| 'begin' prog 'end' #BlockStmt
Here are a few simple examples of their usage:
var x = 3;
if x < 4 then print "Yes"; #expect Yes
print x; #expect 3
if x < 2 then print "Cool"; #no output
print "Ok"; #expect Ok
An example with compound statements (begin
/end
):
var d = 8;
if d > 5 then #true
begin
print "Okay"; #expect Okay
if d < 10 then #true
begin
print d; #expect 8
print "Yup"; #expect Yup
d = d - 2;
end;
print d; #expect 6
if d > 7 then #false
print "Woo"; #no output
end;
print d+1; #expect 7
And a program that calculates the factorial of 20:
var n = 20;
var k = 1;
while n > 0 do
begin
k = k * n;
n = n - 1;
end;
print k; #expect 2432902008176640000
The begin
/end
introduces a new scope, so here’s an example where the variable x
gets shadowed twice, but then is restored after each end
.
var x = "Hello";
begin
var x = 5;
begin
var x = 3.14;
print x; #expect 3.140000e+00
end;
print x; #expect 5
end;
print x; #expect Hello
To support implementing these conditional and loop statements, we have extended the intermediate representation as follows.
In addition to constants, variables, and temporaries, the Address
class can now also represent labels to which control can be transferred. To construct a label, just call the constructor with no arguments:
Address label = new Address();
Like temporaries, labels are represented by a serial number, and they appear prefixed by an L
in the output. Below is an example of one of the sample programs, translated into the IR.
var x = 3; L391:
if x < 4 then print "Yes"; INT vx30 = STORE(INT 3)
print x; BOOL t392 = LT(INT vx30, INT 4)
if x < 2 then print "Cool"; BRANCH(BOOL t392, L394, L393)
print "Ok"; L393:
PRINT(STRING Yes)
JUMP(L394)
L394:
PRINT(INT vx30)
BOOL t395 = LT(INT vx30, INT 2)
BRANCH(BOOL t395, L397, L396)
L396:
PRINT(STRING Cool)
JUMP(L397)
L397:
PRINT(STRING Ok)
END()
That example also demonstrates several new instructions: BRANCH
, JUMP
, and END
:
The BRANCH
instruction takes a boolean address and two labels. If the boolean is zero it jumps to the first address, if it’s one it jumps to the second.
The JUMP
instruction is an unconditional jump, so it takes just one label and transfers control there.
The END
instruction indicates the end of the program. Previously, the program ended when we reached the end of the basic block. Requiring END
gives us flexibility in the ordering of basic blocks in the program… we don’t need to arrange for the last block to appear last.
Unlike in most assembly languages, we don’t allow control to “fall through” the end of one block into the start of the next one. Every basic block should end with either BRANCH
, JUMP
, or END
so there is no ambiguity about what to do next. Again, this is to allow flexibility in block ordering.
The ConvertToIR
visitor now keeps an object of type IRGraph
rather than BasicBlock
. The IRGraph
is a map from label numbers to basic blocks — here is the essence of the class:
class IRGraph implements Iterable<Integer> {
private final TreeMap<Integer, BasicBlock> graph = new TreeMap<>();
private BasicBlock block;
public void newBlock(Address label) {
block = new BasicBlock();
add(label, block);
}
void add(Address label, BasicBlock bb) {
assert label.kind == Address.Kind.LABEL;
graph.put(label.serialNum, bb);
}
void add(Instruction instr) {
block.add(instr);
}
}
It keeps track of the current block, to which you can add new instructions. But it also allows you to create a new block, making that one the current block. Here’s an example of using the methods of IRGraph
to terminate the current block with a jump to the next one:
// We're already in some current block
// Generate label for the new block
Address label = new Address();
// Add an instruction to jump to that label
graph.add(new Instruction(Instruction.Kind.JUMP, label));
// Now we switch to the new block
graph.newBlock(label);
// Further instructions will be in the new block
graph.add(new Instruction(Instruction.Kind.PRINT,
new Address(Address.Kind.CONST, Type.INT, "42")));
graph.add(new Instruction(Instruction.Kind.END));
And here’s a representation of what that Java code would generate:
L100:
JUMP(L101)
L101:
PRINT(INT 42)
END()
Finally, to support nested scopes of variables in begin
/end
blocks, we now use a SymbolTable
data structure, rather than a simple map to keep track of variable names. We covered this representation of a symbol table in the online session on March 23rd (recording available). It features enter()
and leave()
methods to support a stack of scopes.