10. Pyxc: Mutable Variables
Where We Are
Chapter 9 added user-defined operators, but every variable in Pyxc is still immutable. Function parameters can be read, loop variables can be introduced by for, but there is no way to create a local variable and update it. This chapter adds var — a scoped mutable binding — and = assignment:
ready> def bump(n): return var x = n: x = x + 1
Parsed a function definition.
ready> bump(5)
Parsed a top-level expression. Evaluated to 6.000000
One caveat up front: the new syntax is intentionally transitional. var x = ... : expression is not especially Pythonic. It exists because Pyxc still has expression bodies only. The next two chapters replace this temporary shape with real statement blocks and indentation-sensitive syntax.
Source Code
git clone --depth 1 https://github.com/alankarmisra/pyxc-llvm-tutorial
cd pyxc-llvm-tutorial/code/chapter-10
Grammar
pyxc.ebnf
This chapter extends the grammar in two places: a new varexpr production for local bindings, and assignment as the loosest expression form.
expression = varexpr | unaryexpr binoprhs [ "=" expression ] ; -- new
varexpr = "var" varbinding { "," varbinding } ":" [ eols ] expression ; -- new
varbinding = identifier [ "=" expression ] ; -- new
Two forms are new:
- var x = 1, y = 2: expression, which introduces one or more mutable locals, then evaluates the body under those bindings
- x = x + 1, which assigns to an existing mutable local (or a function parameter)
var must come first in the expression, and the : is mandatory. The body after : can stay on the same line or start on the next line, because ParseVarExpr calls consumeNewlines() right after consuming the :.
Full Grammar
program = [ eols ] [ top { eols top } ] [ eols ] ;
eols = eol { eol } ;
top = definition | decorateddef | external | toplevelexpr ;
definition = "def" prototype ":" [ eols ] "return" expression ;
decorateddef = binarydecorator eols "def" binaryopprototype ":" [ eols ] "return" expression
| unarydecorator eols "def" unaryopprototype ":" [ eols ] "return" expression ;
binarydecorator = "@" "binary" "(" integer ")" ;
unarydecorator = "@" "unary" ;
binaryopprototype = customopchar "(" identifier "," identifier ")" ;
unaryopprototype = customopchar "(" identifier ")" ;
external = "extern" "def" prototype ;
toplevelexpr = expression ;
prototype = identifier "(" [ identifier { "," identifier } ] ")" ;
ifexpr = "if" expression ":" [ eols ] expression [ eols ] "else" ":" [ eols ] expression ;
forexpr = "for" identifier "=" expression "," expression "," expression ":" [ eols ] expression ;
expression = varexpr | unaryexpr binoprhs [ "=" expression ] ; -- new
binoprhs = { binaryop unaryexpr } ;
varexpr = "var" varbinding { "," varbinding } ":" [ eols ] expression ; -- new
varbinding = identifier [ "=" expression ] ; -- new
unaryexpr = unaryop unaryexpr | primary ;
unaryop = "-" | userdefunaryop ;
primary = identifierexpr | numberexpr | parenexpr
| ifexpr | forexpr ;
identifierexpr = identifier | callexpr ;
callexpr = identifier "(" [ expression { "," expression } ] ")" ;
numberexpr = number ;
parenexpr = "(" expression ")" ;
binaryop = builtinbinaryop | userdefbinaryop ;
builtinbinaryop = "+" | "-" | "*" | "<" | "<=" | ">" | ">=" | "==" | "!=" ;
userdefbinaryop = ? any opchar defined as a custom binary operator ? ;
userdefunaryop = ? any opchar defined as a custom unary operator ? ;
customopchar = ? any opchar that is not "-" or a builtinbinaryop,
and not already defined as a custom operator ? ;
opchar = ? any single ASCII punctuation character ? ;
identifier = (letter | "_") { letter | digit | "_" } ;
integer = digit { digit } ;
number = digit { digit } [ "." { digit } ]
| "." digit { digit } ;
letter = "A".."Z" | "a".."z" ;
digit = "0".."9" ;
eol = "\r\n" | "\r" | "\n" ;
ws = " " | "\t" ;
Mutable Variables Are Still Expressions
Pyxc still has no statement blocks. So chapter 10 adds mutable variables in expression form:
var x = 1: x = x + 2
This means:
- var introduces one or more local variables
- each variable gets its own mutable storage
- the body expression runs with those bindings in scope
- the whole var expression evaluates to the value of the body
Multiple bindings are allowed, and later initializers can reference earlier ones:
var x = 1, y = x + 2: y
New Token and AST Nodes
The lexer gains one new keyword token:
tok_var = -18,
Added to the keyword table like every other reserved word:
{"binary", tok_binary}, {"unary", tok_unary}, {"var", tok_var}
Two new AST nodes do the real work.
AssignmentExprAST represents x = x + 1. It stores the destination name and the right-hand side:
class AssignmentExprAST : public ExprAST {
  string Name;
  unique_ptr<ExprAST> Expr;
public:
  AssignmentExprAST(const string &Name, unique_ptr<ExprAST> Expr)
      : Name(Name), Expr(std::move(Expr)) {}
  Value *codegen() override;
};
VarExprAST represents var a = 1, b = 2: body. It stores the list of bindings plus the body:
class VarExprAST : public ExprAST {
  vector<pair<string, unique_ptr<ExprAST>>> VarNames;
  unique_ptr<ExprAST> Body;
public:
  VarExprAST(vector<pair<string, unique_ptr<ExprAST>>> VarNames,
             unique_ptr<ExprAST> Body)
      : VarNames(std::move(VarNames)), Body(std::move(Body)) {}
  Value *codegen() override;
};
Parsing var
ParseVarExpr reads the var keyword, one or more name [= initializer] bindings separated by commas, the mandatory :, then the body expression:
static unique_ptr<ExprAST> ParseVarExpr() {
  getNextToken(); // eat 'var'
  vector<pair<string, unique_ptr<ExprAST>>> VarNames;
  while (true) {
    if (CurTok != tok_identifier)
      return LogError("Expected identifier after 'var'");
    string Name = IdentifierStr;
    getNextToken(); // eat identifier
    unique_ptr<ExprAST> Init;
    if (CurTok == '=') {
      getNextToken(); // eat '='
      Init = ParseExpression();
      if (!Init) return nullptr;
    } else {
      Init = make_unique<NumberExprAST>(0.0); // no initializer → default to 0.0
    }
    VarNames.push_back({Name, std::move(Init)});
    if (CurTok != ',') break; // no more bindings
    getNextToken(); // eat ',' and loop for the next binding
  }
  if (CurTok != ':')
    return LogError("Expected ':' after var bindings");
  getNextToken(); // eat ':'
  // Allow the body to start on the next line:
  //   var x = 1:
  //     x + 2
  consumeNewlines();
  auto Body = ParseExpression();
  if (!Body) return nullptr;
  return make_unique<VarExprAST>(std::move(VarNames), std::move(Body));
}
ParseExpression simply gives var first refusal before the usual binary-expression path:
static unique_ptr<ExprAST> ParseExpression() {
  if (CurTok == tok_var)
    return ParseVarExpr();
  auto LHS = ParseUnary();
  // ...
}
Parsing Assignment
Assignment is parsed after the binary expression has been built. If the result of ParseBinOpRHS is a plain variable reference and the next token is =, we consume the = and parse the right-hand side recursively:
/// expression
///   = varexpr | unaryexpr binoprhs [ "=" expression ] ;
static unique_ptr<ExprAST> ParseExpression() {
  if (CurTok == tok_var)
    return ParseVarExpr();
  auto LHS = ParseUnary();
  if (!LHS) return nullptr;
  auto Expr = ParseBinOpRHS(0, std::move(LHS));
  if (!Expr) return nullptr;
  if (CurTok != '=')
    return Expr; // no assignment: return the binary expression
  // The left-hand side must be a plain variable name.
  const string *AssignedName = Expr->getVariableName();
  if (!AssignedName)
    return LogError("Destination of '=' must be a variable");
  string Name = *AssignedName;
  getNextToken(); // eat '='
  auto RHS = ParseExpression(); // right-recursive, so chains right-to-left
  if (!RHS) return nullptr;
  return make_unique<AssignmentExprAST>(Name, std::move(RHS));
}
This makes assignment:
- lower precedence than all binary operators: the entire left-hand binary expression is parsed before = is checked
- right-associative: a = b = 1 parses as a = (b = 1)
The parser enforces that the left-hand side is a plain variable name, not an arbitrary expression. (1 + 2) = 3 is a parse error.
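The parser relies on Expr->getVariableName(), which this chapter's listings do not show. One plausible shape, offered only as a sketch, is a virtual hook on the base class that returns nullptr by default and is overridden by the variable node. The class bodies below are simplified stand-ins (no codegen), and isAssignable and targetName are hypothetical helpers added for illustration; none of this is taken from the tutorial's source.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Simplified stand-ins for the tutorial's AST classes (codegen omitted).
class ExprAST {
public:
  virtual ~ExprAST() = default;
  // Assumed hook: the name if this node is a plain variable reference,
  // nullptr for every other kind of expression.
  virtual const std::string *getVariableName() const { return nullptr; }
};

class NumberExprAST : public ExprAST {
  double Val;
public:
  explicit NumberExprAST(double Val) : Val(Val) {}
  double getValue() const { return Val; }
};

class VariableExprAST : public ExprAST {
  std::string Name;
public:
  explicit VariableExprAST(std::string Name) : Name(std::move(Name)) {}
  const std::string *getVariableName() const override { return &Name; }
};

// Hypothetical helper mirroring the parser's check: only nodes that expose
// a name may appear on the left of '='.
bool isAssignable(const ExprAST &E) { return E.getVariableName() != nullptr; }

// Hypothetical helper: fetch the target name once assignability is known.
const std::string &targetName(const ExprAST &E) {
  const std::string *Name = E.getVariableName();
  assert(Name && "targetName called on a non-variable expression");
  return *Name;
}
```

With this shape, a parenthesized expression such as (1 + 2) produces a node that keeps the default nullptr, which is what lets the parser reject it as an assignment target.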
Stack Slots: From Values to Storage
Through chapter 9, NamedValues mapped variable names directly to LLVM Value*: the SSA values behind function arguments and for loop variables. That worked only because variables were immutable: a name could always refer to the same SSA value forever.
Mutable variables break that model. Once x can be reassigned, the name x can no longer mean "this one fixed SSA value". It has to mean "the place where the current value of x lives".
So NamedValues changes from:
static map<string, Value *> NamedValues;
to:
static map<string, AllocaInst *> NamedValues;
Each variable name now maps to an AllocaInst — a stack slot in the current function's entry block. That is the entire core implementation change.
CreateEntryBlockAlloca
This helper creates the stack slots:
/// CreateEntryBlockAlloca - Create a stack slot in the current function's
/// entry block for a mutable variable.
static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction,
                                          const string &VarName) {
  IRBuilder<> TmpB(
      &TheFunction->getEntryBlock(),         // insert into the entry block
      TheFunction->getEntryBlock().begin()); // at the very start, before any instructions
  return TmpB.CreateAlloca(
      Type::getDoubleTy(*TheContext), // type: a single double
      nullptr,                        // no array size (scalar slot)
      VarName);                       // name for the IR printout
}
A temporary IRBuilder (TmpB) is used instead of the main Builder because we may be codegenning deep inside a branch or loop body, but allocas for local variables belong in the function entry block — not wherever the main builder happens to be pointing. Placing all allocas at the start of the entry block is a convention LLVM's mem2reg pass depends on when promoting allocas to SSA registers.
Loading and Storing Variables
Once names map to stack slots, reading and writing a variable becomes explicit load and store instructions.
A variable reference loads the current value:
Value *VariableExprAST::codegen() {
  AllocaInst *A = NamedValues[Name];
  if (!A)
    return LogErrorV("Unknown variable name");
  // Load the current value from the stack slot.
  return Builder->CreateLoad(Type::getDoubleTy(*TheContext), A, Name.c_str());
}
An assignment evaluates the right-hand side, stores it into the stack slot, and returns the assigned value:
Value *AssignmentExprAST::codegen() {
  Value *Val = Expr->codegen(); // evaluate the right-hand side first
  if (!Val) return nullptr;
  AllocaInst *A = NamedValues[Name];
  if (!A)
    return LogErrorV("Unknown variable name");
  Builder->CreateStore(Val, A); // write the new value into the stack slot
  return Val; // return the assigned value (makes a = b = 1 work)
}
Returning the assigned value is what makes assignment fit naturally into an expression language.
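To see why returning the stored value matters, it can help to model the same idea without LLVM. In the sketch below a std::map of doubles stands in for the stack slots, and the hypothetical helpers load and assign play the roles of CreateLoad and CreateStore. This is an illustration of the semantics only, not how the compiler is implemented.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy stand-in for NamedValues: each name owns one mutable double "slot".
// (Illustrative only; the real compiler maps names to AllocaInst*.)
using Slots = std::map<std::string, double>;

// Reading a variable: find the slot and return its current contents.
double load(const Slots &S, const std::string &Name) {
  auto It = S.find(Name);
  assert(It != S.end() && "Unknown variable name");
  return It->second;
}

// Assignment: store into the slot, then hand the stored value back so the
// assignment can itself be used as an operand.
double assign(Slots &S, const std::string &Name, double Val) {
  auto It = S.find(Name);
  assert(It != S.end() && "Unknown variable name");
  It->second = Val;
  return Val;
}
```

With slots a and b in scope, a = b = 1 corresponds to assign(S, "a", assign(S, "b", 1.0)): the inner call runs first, and its return value is what gets stored into a.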
VarExprAST::codegen
VarExprAST::codegen does four things:
- Evaluate each initializer and allocate a stack slot for it
- Install the new bindings in NamedValues, saving any previously shadowed bindings
- Codegen the body under the new bindings
- Restore the old bindings (or remove the names) after the body finishes
Value *VarExprAST::codegen() {
  vector<pair<string, AllocaInst *>> OldBindings; // saved outer bindings to restore later
  Function *TheFunction = Builder->GetInsertBlock()->getParent();
  for (auto &Var : VarNames) {
    const string &VarName = Var.first;
    ExprAST *Init = Var.second.get();
    // Evaluate the initializer before installing the new binding,
    // so "var x = x: ..." looks up the outer x, not the new one.
    Value *InitVal = Init->codegen();
    if (!InitVal) return nullptr;
    AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
    Builder->CreateStore(InitVal, Alloca);
    // Save the old binding (may be nullptr if name was not in scope).
    OldBindings.push_back({VarName, NamedValues[VarName]});
    NamedValues[VarName] = Alloca; // shadow any outer binding
  }
  Value *BodyVal = Body->codegen();
  if (!BodyVal) return nullptr;
  // Restore outer bindings in reverse order.
  for (auto I = OldBindings.rbegin(), E = OldBindings.rend(); I != E; ++I) {
    if (I->second)
      NamedValues[I->first] = I->second; // restore saved binding
    else
      NamedValues.erase(I->first); // name was not in scope before, so remove it
  }
  return BodyVal;
}
This gives var normal lexical shadowing behavior. If an outer variable already has the same name, the inner var temporarily replaces it, then the old binding is restored after the body.
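The save-and-restore pattern does not depend on LLVM. As a minimal sketch, assume an int slot id stands in for the AllocaInst pointer and a hypothetical withBinding helper plays the role of a single var binding; both are inventions for this illustration.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Toy scope table: name -> slot id. The real NamedValues holds AllocaInst*.
static std::map<std::string, int> Scope;

// Hypothetical helper: bind Name to Slot, run Body, then restore whatever
// Name meant before (or erase it if it was not in scope at all).
int withBinding(const std::string &Name, int Slot,
                const std::function<int()> &Body) {
  auto It = Scope.find(Name);
  bool HadOld = It != Scope.end();
  int Old = HadOld ? It->second : 0;

  Scope[Name] = Slot;  // shadow any outer binding
  int Result = Body(); // the body sees the new binding

  if (HadOld)
    Scope[Name] = Old; // restore the outer binding
  else
    Scope.erase(Name); // the name did not exist before
  return Result;
}
```

If x is bound to slot 1 and the body of withBinding("x", 2, ...) reads x, it sees slot 2; once the call returns, x refers to slot 1 again. A name that was not in scope beforehand disappears entirely.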
Parameters Become Mutable Too
Once NamedValues holds allocas, function parameters must use the same representation. FunctionAST::codegen now creates an entry-block alloca for each argument and stores the incoming LLVM argument value into it:
NamedValues.clear();
for (auto &Arg : TheFunction->args()) {
  // Create a stack slot for each parameter.
  AllocaInst *Alloca =
      CreateEntryBlockAlloca(TheFunction, string(Arg.getName()));
  // Copy the incoming argument value into the slot.
  Builder->CreateStore(&Arg, Alloca);
  NamedValues[string(Arg.getName())] = Alloca;
}
This unifies the whole language: parameters, var locals, and loop variables all live in stack slots. Variable references always load; assignments always store. One model everywhere.
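One consequence worth spelling out: because each parameter is copied into its own slot on entry, assigning to a parameter changes only the callee's copy. C++ by-value parameters behave the same way, so a rough analogue of def bump(n): return n = n + 1 (hand-written for illustration, not Pyxc output) is:

```cpp
#include <cassert>

// n is a private, mutable copy of the argument, just like a parameter
// that has been stored into its own stack slot.
double bump(double n) {
  n = n + 1.0; // store into the parameter's own slot
  return n;    // the assignment's value is the function result
}
```

Calling bump with a variable holding 5 yields 6 and leaves the caller's variable at 5.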
for Loops Switch to the Same Model
The old for implementation bound the loop variable directly to an SSA Value*. That no longer fits now that all mutable locals use allocas. So chapter 10 changes ForExprAST::codegen to use a stack slot for the loop variable too:
// Allocate a stack slot for the loop variable and store the start value.
AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
Builder->CreateStore(StartVal, Alloca);
// ...
// In the loop body, load the current value, add the step, store back.
Value *CurVar =
    Builder->CreateLoad(Type::getDoubleTy(*TheContext), Alloca, VarName);
Value *NextVar = Builder->CreateFAdd(CurVar, StepVal, "nextvar");
Builder->CreateStore(NextVar, Alloca);
The loop variable name is installed in NamedValues as an alloca for the duration of the loop, then restored (or removed) afterward — the same pattern as VarExprAST::codegen.
What the IR Looks Like
A tiny update like var x = 1: x = x + 2 often optimizes all the way down to a single constant result. To see the mutable-variable machinery clearly, it helps to use a slightly larger example. Here is a sum written with an accumulator:
@binary(1)
def ;(x, y): return y
def sum_to(n): return var acc = 0:
    (for i = 1, i < n + 1, 1: acc = acc + i) ; acc
With -O0 -v, Pyxc prints the unoptimized IR:
define double @sum_to(double %n) {
entry:
  %i = alloca double, align 8
  %acc = alloca double, align 8
  %n1 = alloca double, align 8
  store double %n, ptr %n1, align 8
  store double 0.000000e+00, ptr %acc, align 8
  br label %loop_cond
loop_cond:
  store double ..., ptr %i, align 8
  %addtmp = fadd double %n, 1.000000e+00
  %cmptmp = fcmp olt double ..., %addtmp
  br i1 %cmptmp, label %loop_body, label %after_loop
loop_body:
  %addtmp6 = fadd double ..., ...
  store double %addtmp6, ptr %acc, align 8
  ...
  br label %loop_cond
after_loop:
  %binop = call double @"binary;"(double 0.000000e+00, double ...)
  ret double %binop
}
Three things to notice:
- %n1, %acc, and %i are stack slots created by alloca
- store writes new values into those slots; load (elided in the summary above) reads them out
- the IR still contains explicit alloca/load/store: LLVM's mem2reg pass would promote most of these to pure SSA form, but chapter 10 does not run that pass, so the storage model stays visible
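If the IR is hard to follow, it may help to read it against a hand-written C++ equivalent. The sketch below approximates the control flow and is not compiler output: each local is an ordinary mutable double (the analogue of an alloca), the loop tests its condition before each iteration as loop_cond does, and a hypothetical seq function stands in for the user-defined ; operator.

```cpp
#include <cassert>

// Stand-in for the user-defined `;` operator: discard x, yield y.
double seq(double /*x*/, double y) { return y; }

double sum_to(double n) {
  double acc = 0.0;       // %acc slot, initialised by `var acc = 0`
  double i = 1.0;         // %i slot, initialised with the start value
  while (i < n + 1.0) {   // loop_cond: compare, then branch
    acc = acc + i;        // loop_body: load acc, load i, fadd, store acc
    i = i + 1.0;          // step: load i, fadd, store i
  }
  double forResult = 0.0; // after_loop passes 0.0 as the loop's value
  return seq(forResult, acc);
}
```

Every assignment in this version corresponds to a store in the IR, and every read of acc, i, or n corresponds to a load.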
Build and Run
cd code/chapter-10
cmake -S . -B build && cmake --build build
./build/pyxc
Try It
Simple local update:
ready> var x = 1: x = x + 2
Parsed a top-level expression. Evaluated to 3.000000
Multiple bindings — later initializers see earlier ones:
ready> var x = 1, y = x + 2: y
Parsed a top-level expression. Evaluated to 3.000000
Local variable inside a function:
ready> def bump(n): return var x = n: x = x + 1
Parsed a function definition.
ready> bump(5)
Parsed a top-level expression. Evaluated to 6.000000
Accumulator with a loop:
ready> @binary(1)
def ;(x, y): return y
Parsed a user-defined operator.
ready> def sum_to(n): return var acc = 0: (for i = 1, i < n + 1, 1: acc = acc + i) ; acc
Parsed a function definition.
ready> sum_to(5)
Parsed a top-level expression. Evaluated to 15.000000
Invalid assignment target:
ready> (1 + 2) = 3
Error (Line 1, Column 9): Destination of '=' must be a variable
(1 + 2) =
        ^~~~
What's Next
Chapter 11 replaces the single-expression function body with real statement blocks. That makes mutable variables much more natural to use: assignment can stand on its own line, return can appear anywhere in a function body, and examples stop needing expression-level workarounds like var acc = 0: (for ...) ; acc.
Need Help?
Build issues? Questions?
- GitHub Issues: Report problems
- Discussions: Ask questions
Include:
- Your OS and version
- Full error message
- Output of cmake --version, ninja --version, and llvm-config --version
We'll figure it out.