11. pyxc: Statement Blocks
Where We Are
Chapter 10 added mutable variables, but the function body was still a single expression. The var form needed a : and a body expression, and for loops were expressions that produced 0.0. This chapter introduces real statement blocks and indentation-sensitive syntax. After this chapter you'll be able to write code more naturally:
ready> def sum_to(n): var acc = 0 for var i = 1, i <= n, 1: acc = acc + i return accParsed a function definition.
Source Code
git clone --depth 1 https://github.com/alankarmisra/pyxc-llvm-tutorial
cd pyxc-llvm-tutorial/code/chapter-11
Grammar
The central shift: if, for, var, and return move out of the expression grammar and become statements. Expressions are now purely value-producing.
What changed or is new:
(* changed: body is now a statement or indented block, not an expression *)
definition = "def" prototype ":" ( simplestmt | eols block ) ;
decorateddef = binarydecorator eols "def" binaryopprototype ":" ( simplestmt | eols block )
| unarydecorator eols "def" unaryopprototype ":" ( simplestmt | eols block ) ;
(* new: statement forms *)
ifstmt = "if" expression ":" suite [ eols "else" ":" suite ] ;
forstmt = "for" [ "var" ] identifier "=" expression "," expression "," expression ":" suite ;
varstmt = "var" varbinding { "," varbinding } ; (* no body — var is now a statement *)
assignstmt = identifier "=" expression ;
returnstmt = "return" expression ;
simplestmt = returnstmt | varstmt | assignstmt | expression ;
compoundstmt = ifstmt | forstmt ;
statement = simplestmt | compoundstmt ;
suite = simplestmt | compoundstmt | eols block ;
block = indent statement { eols statement } dedent ;
indent = INDENT ;
dedent = DEDENT ;
INDENT = ? synthetic token emitted by the lexer when indentation increases ? ;
DEDENT = ? synthetic token emitted by the lexer when indentation decreases ? ;
(* simplified: var and assignment removed; if/for removed from primary *)
expression = unaryexpr binoprhs ;
primary = identifierexpr | numberexpr | parenexpr ;
suite— what follows a:. Either a single statement on the same line, or a newline followed by an indented block.simplestmt— statements that fit on one line:return,var, assignment, or a bare expression.compoundstmt— statements that introduce a new suite:ifandfor.block— anINDENTtoken, one or more statements separated by newlines, aDEDENTtoken.INDENT/DEDENT— tokens emitted by the lexer when indentation increases or decreases. OneINDENTis emitted when a block opens, oneDEDENTwhen it closes — not one per line. The parser sees them like matched parentheses:
def f():
var x = 5 # ← INDENT emitted here (indentation increased)
x = x + 1 # ← nothing (same level)
return x # ← nothing (same level)
# ← DEDENT emitted here (indentation decreased)
Full grammar — pyxc.ebnf:
program = [ eols ] [ top { eols top } ] [ eols ] ;
eols = eol { eol } ;
top = definition | decorateddef | external | toplevelexpr ;
definition = "def" prototype ":" ( simplestmt | eols block ) ;
decorateddef = binarydecorator eols "def" binaryopprototype ":" ( simplestmt | eols block )
| unarydecorator eols "def" unaryopprototype ":" ( simplestmt | eols block ) ;
binarydecorator = "@" "binary" "(" integer ")" ;
unarydecorator = "@" "unary" ;
binaryopprototype = customopchar "(" identifier "," identifier ")" ;
unaryopprototype = customopchar "(" identifier ")" ;
external = "extern" "def" prototype ;
toplevelexpr = expression ;
prototype = identifier "(" [ identifier { "," identifier } ] ")" ;
ifstmt = "if" expression ":" suite [ eols "else" ":" suite ] ;
forstmt = "for" [ "var" ] identifier "=" expression "," expression "," expression ":" suite ;
varstmt = "var" varbinding { "," varbinding } ;
assignstmt = identifier "=" expression ;
simplestmt = returnstmt | varstmt | assignstmt | expression ;
compoundstmt = ifstmt | forstmt ;
statement = simplestmt | compoundstmt ;
suite = simplestmt | compoundstmt | eols block ;
returnstmt = "return" expression ;
block = indent statement { eols statement } dedent ;
expression = unaryexpr binoprhs ;
binoprhs = { binaryop unaryexpr } ;
varbinding = identifier [ "=" expression ] ;
unaryexpr = unaryop unaryexpr | primary ;
unaryop = "-" | userdefunaryop ;
primary = identifierexpr | numberexpr | parenexpr ;
identifierexpr = identifier | callexpr ;
callexpr = identifier "(" [ expression { "," expression } ] ")" ;
numberexpr = number ;
parenexpr = "(" expression ")" ;
binaryop = builtinbinaryop | userdefbinaryop ;
indent = INDENT ;
dedent = DEDENT ;
builtinbinaryop = "+" | "-" | "*" | "<" | "<=" | ">" | ">=" | "==" | "!=" ;
userdefbinaryop = ? any opchar defined as a custom binary operator ? ;
userdefunaryop = ? any opchar defined as a custom unary operator ? ;
customopchar = ? any opchar that is not "-" or a builtinbinaryop,
and not already defined as a custom operator ? ;
opchar = ? any single ASCII punctuation character ? ;
identifier = (letter | "_") { letter | digit | "_" } ;
integer = digit { digit } ;
number = digit { digit } [ "." { digit } ]
| "." digit { digit } ;
letter = "A".."Z" | "a".."z" ;
digit = "0".."9" ;
eol = "\r\n" | "\r" | "\n" ;
ws = " " | "\t" ;
INDENT = ? synthetic token emitted by lexer ? ;
DEDENT = ? synthetic token emitted by lexer ? ;
A side effect of this grammar change. In chapter 10, var was an expression with a body — var x = 5 in x + 1. The variable and the code that used it were a single syntactic unit, so the variable's lifetime was self-contained. Now that var is a free-standing statement, a variable declared in one statement could in principle be referenced in any later statement — including one compiled in a completely separate module.
That last part is the problem. In the REPL, each top-level input is compiled into its own throw-away module and immediately freed after evaluation. A var at the top level would need its storage to survive across module boundaries, which the current JIT design doesn't support. Chapter 12 fixes this properly — both for the REPL and for compiled executables.
Statements vs Expressions
Before this chapter, if, for, and var were expressions — they produced a value and could be nested:
var acc = 0: for var i = 1, ...: acc = acc + i
Statements don't produce values — they do things. Once if, for, var, and return are statements, a function body becomes a flat list of them:
var acc = 0
for var i = 1, ...:
acc = acc + i
return acc
ParseExpression no longer handles var, if, for, or assignment =. Those are all in ParseStatement and ParseSimpleStmt. Expressions are now purely value-producing — operators, calls, variable reads.
New Tokens and AST Nodes
Two new token values are added to the lexer's enum:
tok_indent = -19, // synthetic: start of an indented block
tok_dedent = -20, // synthetic: end of an indented block
And three new AST node classes:
ReturnExprAST — a return statement:
class ReturnExprAST : public ExprAST {
unique_ptr<ExprAST> Expr;
public:
ReturnExprAST(unique_ptr<ExprAST> Expr) : Expr(std::move(Expr)) {}
Value *codegen() override;
};
BlockExprAST — a sequence of statements evaluated in order. If execution reaches the end without a return, the function implicitly returns 0.0. Use an explicit return if you need a specific value:
class BlockExprAST : public ExprAST {
vector<unique_ptr<ExprAST>> Stmts;
public:
BlockExprAST(vector<unique_ptr<ExprAST>> Stmts)
: Stmts(std::move(Stmts)) {}
Value *codegen() override;
};
VarStmtAST — the statement form of var. Unlike VarExprAST from chapter 10, it has no body. Variables declared here persist for the rest of the function:
class VarStmtAST : public ExprAST {
vector<pair<string, unique_ptr<ExprAST>>> VarNames;
public:
VarStmtAST(vector<pair<string, unique_ptr<ExprAST>>> VarNames)
: VarNames(std::move(VarNames)) {}
Value *codegen() override;
};
IfStmtAST also exists — the statement form of if. It differs from IfExprAST in that it doesn't need to produce a value, so it has no PHI node and the else branch is optional.
INDENT and DEDENT
A single counter isn't enough to track indentation — nested blocks need to remember every level that was opened. When indentation drops, the lexer needs to know which level it's returning to, and how many blocks it's closing at once. That's why the lexer keeps an IndentStack and a pending-token queue.
At the start of each line it finds the indentation level, compares it to the top of the stack, and pushes INDENT or DEDENT tokens into the queue. When indentation drops by multiple levels in one step, one DEDENT is queued per level closed and the parser drains them one at a time:
def f(x): # stack: [0]
if x > 0: # stack: [0, 4] → INDENT
if x > 10: # stack: [0, 4, 8] → INDENT
return x # stack: [0, 4, 8, 12] → INDENT
return 0 # col 4: three levels closed → DEDENT, DEDENT, DEDENT queued
# stack drains back to [0, 4]; parser sees them one at a time
Blocks are also automatically closed at end of file — no trailing blank line needed:
def f():
var x = 5 # stack: [0, 4] → INDENT
return x # stack: [0, 4] nothing
# EOF # col 0: stack has [0, 4] → DEDENT pushed into PendingTokens
# parser drains it on the next getNextToken() call
static vector<int> IndentStack = {0}; // starts at column 0
static deque<int> PendingTokens; // buffered tokens the parser hasn't seen yet
static bool AtLineStart = true; // true right after a newline
Inside gettok(), before any normal token logic, the indentation is processed in three steps.
Step 1: Find the indentation level of the current line.
if (AtLineStart) {
int IndentCol = 0;
while (LastChar == ' ' || LastChar == '\t') {
IndentCol += (LastChar == ' ') ? 1 : (8 - IndentCol % 8); // tabs → columns
LastChar = advance();
}
Spaces contribute 1 column each. Tabs advance to the next multiple of 8 — the delta is 8 - (IndentCol % 8):
| IndentCol before tab | IndentCol % 8 | delta | IndentCol after tab |
|---|---|---|---|
| 0 | 0 | 8 | 8 |
| 1 | 1 | 7 | 8 |
| 7 | 7 | 1 | 8 |
| 8 | 0 | 8 | 16 |
| 11 | 3 | 5 | 16 |
A tab always snaps forward to the next 8-column boundary, never backward and never past it.
Step 2: Compare to the top of the stack and queue INDENT or DEDENT tokens.
if (IndentCol > IndentStack.back()) {
// More indented → push one INDENT.
IndentStack.push_back(IndentCol);
PendingTokens.push_back(tok_indent);
} else if (IndentCol < IndentStack.back()) {
// Less indented → push one DEDENT per level closed.
while (IndentStack.size() > 1 && IndentCol < IndentStack.back()) {
IndentStack.pop_back();
PendingTokens.push_back(tok_dedent);
}
// Dedenting to a level that was never opened is an error.
if (IndentCol != IndentStack.back()) {
fprintf(stderr, "Error (...): inconsistent indentation\n");
return tok_error;
}
}
A single dedent can push multiple DEDENT tokens — one for each level that closed. Each time the parser calls gettok(), PendingTokens is drained first; only when it is empty does the lexer go looking for the next real token.
Step 3: Drain the queue — return the first pending token if any.
AtLineStart = false;
if (!PendingTokens.empty()) {
int Tok = PendingTokens.front();
PendingTokens.pop_front();
return Tok;
}
}
gettok() is called again for each subsequent token, draining the queue one entry at a time before returning to normal lexing.
At EOF, the lexer flushes one DEDENT per still-open block:
if (LastChar == EOF) {
if (IndentStack.size() > 1) {
IndentStack.pop_back();
return tok_dedent; // gettok is called again for the next one
}
return tok_eof;
}
In REPL mode, a blank line ends the current indented block immediately — the same behavior as the Python REPL.
Pyxc Indentation Rules
These are similar to Python's indentation rules, with one difference: Pyxc allows mixing tabs and spaces (Python 3 disallows it).
- Each space advances one column; each tab advances to the next multiple of 8.
- Mixing tabs and spaces is allowed — the column count is what matters.
- Dedenting to a column that was never opened is an error.
- Blank lines and comment-only lines do not affect indentation in file mode. In REPL mode, a blank line closes the current block immediately.
- A block opens after
:followed by a newline and a deeper indentation level.
Parse-Time Variable Tracking
Assignment to an undeclared variable is a parse-time error:
ready> x = 1
Error: Assignment to undeclared variable
To detect this, the parser maintains a scope stack of declared variable names. var declarations register names; for var loops introduce a loop variable into a temporary inner scope:
static vector<set<string>> VarScopes;
static void BeginFunctionScope(const vector<string> &Args) {
VarScopes.clear();
VarScopes.emplace_back();
for (const auto &Arg : Args)
VarScopes.front().insert(Arg); // parameters are pre-declared
}
static void EndFunctionScope() { VarScopes.clear(); }
static void DeclareVar(const string &Name) {
if (!VarScopes.empty())
VarScopes.back().insert(Name); // declare in the innermost (current) scope
}
static bool IsDeclaredInCurrentScope(const string &Name) {
if (VarScopes.empty()) return false;
return VarScopes.back().count(Name) > 0;
}
static void BeginBlockScope() { VarScopes.emplace_back(); }
static void EndBlockScope() {
if (VarScopes.size() > 1) VarScopes.pop_back();
}
static bool IsDeclaredVar(const string &Name) {
for (auto It = VarScopes.rbegin(); It != VarScopes.rend(); ++It)
if (It->count(Name)) return true;
return false;
}
Each scope guard is a small C++ struct. The constructor opens the scope; the destructor closes it. When the guard variable goes out of scope — at the end of a block, or when an early return is hit — the scope closes automatically without any explicit cleanup calls:
struct FunctionScopeGuard {
FunctionScopeGuard(const vector<string> &Args) { BeginFunctionScope(Args); }
~FunctionScopeGuard() { EndFunctionScope(); }
};
struct LoopScopeGuard {
LoopScopeGuard(const string &Name) { EnterLoopScope(Name); }
~LoopScopeGuard() { ExitLoopScope(); }
};
struct BlockScopeGuard {
BlockScopeGuard() { BeginBlockScope(); }
~BlockScopeGuard() { EndBlockScope(); }
};
ParseDefinition creates a FunctionScopeGuard immediately after parsing the prototype — before parsing the body:
static unique_ptr<FunctionAST> ParseDefinition() {
getNextToken(); // eat 'def'
auto Proto = ParsePrototype();
if (!Proto) return nullptr;
FunctionScopeGuard Scope(Proto->getArgs()); // parameters enter scope here
// ... parse ':' and body ...
}
ParseForStmt creates a LoopScopeGuard only when the loop introduces a new
variable with var. Without var, the loop reuses an existing variable and
errors if it is undeclared.
string VarName = IdentifierStr;
getNextToken(); // eat identifier
LoopScopeGuard LoopScope(VarName); // only when "var" is present
Parsing a Suite
After every :, the parser calls ParseSuite. A suite is either an inline statement or an indented block:
/// suite = simplestmt | compoundstmt | eols block ;
static unique_ptr<ExprAST> ParseSuite(bool *EndedWithBlock) {
if (CurTok == tok_eol) {
// Newline after ':' → expect an indented block.
consumeNewlines();
if (CurTok != tok_indent)
return LogError("Expected an indented block");
*EndedWithBlock = true;
return ParseBlock();
}
// Same line after ':' → parse an inline statement.
auto Stmt = ParseStatement();
if (!Stmt) return nullptr;
*EndedWithBlock = LastStatementWasBlock;
return Stmt;
}
ParseIfStmt and ParseForStmt both call ParseSuite after eating :. A def body works slightly differently — the inline form only accepts a simplestmt, not a compound statement. You cannot write def f(x): if x > 0: return 1 on one line.
Parsing a Block
ParseBlock consumes INDENT, reads statements separated by newlines until DEDENT, then consumes DEDENT:
/// block = INDENT statement { eols statement } DEDENT ;
static unique_ptr<ExprAST> ParseBlock() {
if (CurTok != tok_indent)
return LogError("Expected an indented block");
getNextToken(); // eat INDENT
BlockScopeGuard Scope; // each block gets its own var scope
consumeNewlines();
if (CurTok == tok_dedent)
return LogError("Expected at least one statement in block");
vector<unique_ptr<ExprAST>> Stmts;
while (true) {
if (CurTok == tok_dedent) break;
auto Stmt = ParseStatement();
if (!Stmt) return nullptr;
Stmts.push_back(std::move(Stmt));
if (CurTok == tok_eol) { consumeNewlines(); continue; }
if (CurTok == tok_dedent) break;
// A nested compound statement (if/for) leaves the parser just after
// its suite — no newline is required before the next statement.
if (LastStatementWasBlock) continue;
return LogError("Expected newline or end of block");
}
getNextToken(); // eat DEDENT
return make_unique<BlockExprAST>(std::move(Stmts));
}
LastStatementWasBlock is a global flag set by ParseIfStmt / ParseForStmt when their suite ended with an indented block. It lets the enclosing block keep parsing without requiring a newline first.
Parsing Statements
ParseStatement dispatches to compound or simple statement parsers:
/// statement = simplestmt | compoundstmt ;
static unique_ptr<ExprAST> ParseStatement() {
LastStatementWasBlock = false;
if (CurTok == tok_if) return ParseIfStmt();
if (CurTok == tok_for) return ParseForStmt();
return ParseSimpleStmt();
}
ParseSimpleStmt handles return, var, assignment, and bare expressions:
/// simplestmt = returnstmt | varstmt | assignstmt | expression ;
static unique_ptr<ExprAST> ParseSimpleStmt() {
if (CurTok == tok_return) return ParseReturnStmt();
if (CurTok == tok_var) return ParseVarStmt();
// Fast path: if the current token is an identifier, peek at what follows
// before committing to a full expression parse. This lets us detect
// "x = expr" (assignment) without going through ParseExpression first.
if (CurTok == tok_identifier) {
string Name = IdentifierStr;
getNextToken(); // eat identifier
if (CurTok == '=') {
if (!IsDeclaredVar(Name))
return LogError("Assignment to undeclared variable");
getNextToken(); // eat '='
auto RHS = ParseExpression();
if (!RHS) return nullptr;
return make_unique<AssignmentExprAST>(Name, std::move(RHS));
}
// Not an assignment — parse the rest as an expression.
auto Expr = ParseIdentifierExprWithName(std::move(Name));
if (!Expr) return nullptr;
return ParseBinOpRHS(0, std::move(Expr));
}
// Non-identifier start: parse a full expression, then check for '='.
auto Expr = ParseExpression();
if (!Expr) return nullptr;
if (CurTok != '=')
return Expr; // bare expression statement
const string *AssignedName = Expr->getLValueName();
if (!AssignedName)
return LogError("Destination of '=' must be a variable");
string Name = *AssignedName;
if (!IsDeclaredVar(Name))
return LogError("Assignment to undeclared variable");
getNextToken(); // eat '='
auto RHS = ParseExpression();
if (!RHS) return nullptr;
return make_unique<AssignmentExprAST>(Name, std::move(RHS));
}
Assignment to an undeclared variable is rejected at parse time via IsDeclaredVar — no codegen is needed to catch it.
Parsing Var as a Statement
var in chapter 11 has no body. It declares one or more names that persist for the rest of the function:
/// varstmt = "var" varbinding { "," varbinding } ;
static unique_ptr<ExprAST> ParseVarStmt() {
getNextToken(); // eat 'var'
vector<pair<string, unique_ptr<ExprAST>>> VarNames;
while (true) {
if (CurTok != tok_identifier)
return LogError("Expected identifier after 'var'");
string Name = IdentifierStr;
getNextToken(); // eat identifier
unique_ptr<ExprAST> Init;
if (CurTok == '=') {
getNextToken(); // eat '='
Init = ParseExpression();
if (!Init) return nullptr;
} else {
Init = make_unique<NumberExprAST>(0.0); // default to 0.0
}
DeclareVar(Name); // register name in the current block scope
VarNames.push_back({Name, std::move(Init)});
if (CurTok != ',') break;
getNextToken(); // eat ','
}
return make_unique<VarStmtAST>(std::move(VarNames));
}
The critical difference from chapter 10: no : and no body. DeclareVar(Name) registers Name in the current block scope — so later assignments to it will pass the IsDeclaredVar check. If the var is inside an if or for block, that name is only visible inside that block.
Return
ParseReturnStmt is straightforward:
/// returnstmt = "return" expression ;
static unique_ptr<ExprAST> ParseReturnStmt() {
getNextToken(); // eat 'return'
auto Expr = ParseExpression();
if (!Expr) return nullptr;
return make_unique<ReturnExprAST>(std::move(Expr));
}
ReturnExprAST::codegen emits a real LLVM terminator — a ret instruction that ends the current basic block:
Value *ReturnExprAST::codegen() {
Value *RetVal = Expr->codegen();
if (!RetVal) return nullptr;
Builder->CreateRet(RetVal); // terminates the current basic block
return RetVal;
}
Block Codegen
BlockExprAST::codegen evaluates statements in order. It stops early if a return has already terminated the current block — statements after a return are unreachable. It also saves and restores NamedValues around the block body so that variables declared inside the block with var don't leak to the outer scope:
Value *BlockExprAST::codegen() {
auto SavedBindings = NamedValues; // snapshot outer bindings
Value *Last = nullptr;
for (auto &Stmt : Stmts) {
// If a previous statement already emitted a terminator (e.g. 'return'),
// skip the rest — we'd be emitting into a block with no successor.
if (Builder->GetInsertBlock()->getTerminator()) break;
Last = Stmt->codegen();
if (!Last) {
NamedValues = SavedBindings;
return nullptr;
}
}
NamedValues = SavedBindings; // restore outer bindings when block exits
if (!Last)
return LogErrorV("Empty block");
return ConstantFP::get(*TheContext, APFloat(0.0));
}
Var and Assignment Codegen
VarStmtAST::codegen allocates stack slots and initializes them. Duplicate declarations in the same scope are caught at parse time, so codegen just sets up the alloca and records the binding:
Value *VarStmtAST::codegen() {
Function *TheFunction = Builder->GetInsertBlock()->getParent();
for (auto &Var : VarNames) {
const string &VarName = Var.first;
ExprAST *Init = Var.second.get();
Value *InitVal = Init->codegen();
if (!InitVal) return nullptr;
AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
Builder->CreateStore(InitVal, Alloca);
NamedValues[VarName] = Alloca;
}
return ConstantFP::get(*TheContext, APFloat(0.0)); // var statement produces 0.0
}
AssignmentExprAST::codegen is unchanged from chapter 10 — it loads the alloca from NamedValues, stores the new value, and returns it.
if as a Statement
Chapter 10 had IfExprAST, which always produced a value via a PHI node and required both then and else branches. Chapter 11 adds IfStmtAST — the statement form. The condition check, basic block creation, and branch structure are identical to chapter 10. Two things change:
elseis optional. If there is noelse, the else block just falls through to merge.- No PHI node. The statement doesn't produce a value.
// Emit the then branch.
Builder->SetInsertPoint(ThenBB);
if (!Then->codegen()) return nullptr;
if (!Builder->GetInsertBlock()->getTerminator())
Builder->CreateBr(MergeBB);
// Emit the else branch — skipped entirely if there is no else.
Builder->SetInsertPoint(ElseBB);
if (Else) {
if (!Else->codegen()) return nullptr;
}
if (!Builder->GetInsertBlock()->getTerminator())
Builder->CreateBr(MergeBB);
Builder->SetInsertPoint(MergeBB);
return ConstantFP::get(*TheContext, APFloat(0.0));
// No PHI node — statements don't produce values.
The getTerminator() check before each CreateBr is what makes return inside an if work correctly. If the then block already has a ret, we don't emit a second branch — that would be ill-formed IR.
Implicit Return
In chapter 10, FunctionAST::codegen always emitted CreateRet(RetVal) unconditionally after Body->codegen() returned. That breaks now that return statements emit their own ret instructions.
Chapter 11 checks whether the current block already has a terminator before deciding whether to add one:
// Step 4: codegen the body, verify, optimize — or erase on failure.
if (Value *RetVal = Body->codegen()) {
// Only emit a return if the body didn't already terminate the block.
// A 'return' statement in the body emits its own 'ret', so we don't
// want to emit a second one.
if (!Builder->GetInsertBlock()->getTerminator())
Builder->CreateRet(RetVal);
verifyFunction(*TheFunction);
TheFPM->run(*TheFunction, *TheFAM);
return TheFunction;
}
This is what makes the following valid — the if path returns explicitly; the fall-through path gets an implicit return 0.0:
def threshold(x):
if x > 10: return x
# no explicit return — implicit return 0.0 inserted by codegen
After a Top-Level Block
When a def body ends with an indented block, the parser lands on the next top-level token — not a newline. Without special handling, HandleDefinition would treat that as a parse error. A flag tracks this:
static bool LastTopLevelEndedWithBlock = false;
static void HandleDefinition() {
auto FnAST = ParseDefinition();
bool HasTrailing = (CurTok != tok_eol && CurTok != tok_eof);
if (!FnAST || (HasTrailing && !LastTopLevelEndedWithBlock)) {
// error recovery
}
// ...
}
This is what lets a file contain two top-level definitions back to back without a blank line between them.
Things Worth Knowing
varwithout an initializer defaults to0.0.varis block-scoped. A variable declared inside aniforforblock is not visible after that block exits. An outer variable with the same name is shadowed inside the block and restored when the block exits.- Declaring the same variable twice in the same block is a parse-time error.
- Assignment only works on variables that were declared with
varor are function parameters. Undeclared assignments are rejected at parse time. for varintroduces a loop variable scoped to the loop body only.- The inline body of a
defaccepts only asimplestmt. Compound statements (if,for) require an indented block.
Known Limitations
No global variables. var is only valid inside a function body. ParseTopLevelExpr calls ParseExpression, so var x = 10 at the top level is a parse error. Each top-level expression also gets its own fresh function scope, so there is no way to declare a variable on one REPL line and reference it on the next.
This is the main practical limitation of the current chapter. In the REPL it means you cannot build up state across lines:
# Does not work in the REPL:
var x = 10 # parse error — var is not an expression
x = x + 10 # x is undeclared in this expression's scope
printd(x)
For now, keep mutable state inside a function:
def f():
var x = 10
x = x + 10
return x
printd(f()) # prints 20.000000
Chapter 12 addresses this properly. When compiling to an executable, all top-level statements are collected into a synthesized main(), so var declarations and assignments at the top level work naturally. Full REPL support for global state requires additional runtime infrastructure and is also covered in chapter 12.
Try It
Simple function with multiple statements:
ready> def f(x): if x > 10: return 20 return 10Parsed a function definition.ready> f(5)Parsed a top-level expression. Evaluated to 10.000000ready> f(20)Parsed a top-level expression. Evaluated to 20.000000
Accumulator loop — the chapter 10 workaround, now written naturally:
ready> def sum_to(n): var acc = 0 for var i = 1, i <= n, 1: acc = acc + i return accParsed a function definition.ready> sum_to(5)Parsed a top-level expression. Evaluated to 15.000000
Build and Run
cd code/chapter-11
cmake -S . -B build && cmake --build build
./build/pyxc
What's Next
Chapter 12 resolves the global variable limitation described in Known Limitations. For compiled programs, all top-level statements are collected into a synthesized main() so var declarations and assignments work naturally. For the REPL, a persistent variable store backed by a runtime helper lets state survive across JIT module boundaries. Once globals work in both modes, chapter 13 emits object files and chapter 14 links them into native executables — turning Pyxc from a JIT toy into a real compiler.
Need Help?
Build issues? Questions?
- GitHub Issues: Report problems
- Discussions: Ask questions
Include:
- Your OS and version
- Full error message
- Output of
cmake --versionandninja --version
We'll figure it out.