18. pyxc: Pointers
Where We Are
Chapter 17 added structs, but with a catch: structs are passed by value. If you hand a struct to a function and the function modifies a field, the caller's copy is unchanged. That's fine for pure computations, and it's a deliberate design choice — but sometimes you actually want to modify the caller's data.
That's what pointers are for. After this chapter, you can write:
```
extern def printd(x: float64)

struct Point:
    x: int
    y: int

def translate(p: ptr[Point], dx: int, dy: int) -> None:
    p[0].x = p[0].x + dx
    p[0].y = p[0].y + dy

def main() -> int:
    var pt: Point
    pt.x = 3
    pt.y = 4
    translate(addr(pt), 10, 20)
    printd(float64(pt.x))  # 13.000000
    printd(float64(pt.y))  # 24.000000
    return 0
```
Source Code
```bash
git clone --depth 1 https://github.com/alankarmisra/pyxc-llvm-tutorial
cd pyxc-llvm-tutorial/code/chapter-18
```
Grammar
```
type         ::= ...
               | 'ptr' '[' type ']'    (* pointer to type; no nesting allowed *)
addr-expr    ::= 'addr' '(' lvalue ')'
lvalue       ::= identifier ('.' identifier)*
index-expr   ::= lvalue '[' expression ']' ('.' identifier)*
index-assign ::= lvalue '[' expression ']' ('.' identifier)* '=' expression
```
ptr[T] is a type annotation only: apart from the null default of an uninitialized variable, the only way to obtain a pointer value is addr. addr takes an lvalue (a named variable, optionally followed by field access) and returns a pointer to it. p[i] reads or writes the i-th element of type T starting at the pointer, so p[0] is the pointee itself. p[i].field chains field access after indexing, for pointers to structs.
Nested pointer types (ptr[ptr[int]]) and pointers to None are rejected at parse time.
Two New Keywords
```cpp
tok_ptr = -35,
tok_addr = -36,
```
Registered in the keyword map:
```cpp
{"ptr", tok_ptr}, {"addr", tok_addr}
```
ValueType::Pointer
Pointer is a new entry in the ValueType enum, after Struct. Unlike scalar types, a pointer value is not self-describing — ValueType::Pointer alone does not tell you what the pointer points to. You need the pointee type alongside it.
For structs, the struct name already travels in the StructName string field that every ExprAST node carries. For pointers, that same field is reused to carry an encoded string describing the pointee:
```cpp
static string EncodePointerType(ValueType PointeeType,
                                const string &PointeeStructName) {
  return std::to_string(static_cast<int>(PointeeType)) + ":" + PointeeStructName;
}

static bool DecodePointerType(const string &Encoded, ValueType &PointeeType,
                              string &PointeeStructName) {
  auto Pos = Encoded.find(':');
  // split on ':', parse the int as ValueType, rest is struct name
  ...
}
```
Every type in the compiler is described by two fields: a ValueType enum and a StructName string. For most types StructName is empty, for structs it holds the struct name, and for pointers it holds the serialized pointee description:
| Type | ValueType | StructName |
|---|---|---|
| int, float64, … | Int, Float64, … | "" |
| Point (struct) | Struct | "Point" |
| ptr[int] | Pointer | "1:" |
| ptr[Point] | Pointer | "10:Point" |
The format is "<ValueType int>:<struct name>". ptr[int] encodes as "1:" (ValueType::Int = 1, no struct name). ptr[Point] encodes as "10:Point" (ValueType::Struct = 10, struct name "Point"). The caller that created the pointer type is responsible for encoding; any site that needs the pointee type decodes it.
This reuses the existing type-tracking infrastructure without adding a new field to the base class. It is a tradeoff: the encoding is not beautiful, but it works and the surface is small.
The LLVM Pointer Type
In LLVM IR, all pointer types are the same opaque type:
```cpp
case ValueType::Pointer:
  return PointerType::getUnqual(*TheContext);
```
PointerType::getUnqual produces the opaque ptr type — LLVM does not distinguish ptr[int] from ptr[Point] in the type system. The element type only appears in getelementptr and load/store instructions, not in the pointer type itself. This is LLVM's opaque pointer model, which has been the default since LLVM 15.
The zero value for a pointer is null:
```cpp
case ValueType::Pointer:
  return ConstantPointerNull::get(cast<PointerType>(LLVMTypeFor(ValueType::Pointer)));
```
var p: ptr[int] with no initializer starts as a null pointer.
Parsing ptr[T]
ParseTypeToken handles the tok_ptr case:
```cpp
case tok_ptr: {
  getNextToken(); // eat 'ptr'
  // expect '['
  getNextToken(); // eat '['
  string PointeeStructName;
  ValueType PointeeType = ParseTypeToken(&PointeeStructName);
  // reject None and nested ptr
  getNextToken(); // eat ']'
  if (StructName)
    *StructName = EncodePointerType(PointeeType, PointeeStructName);
  return ValueType::Pointer;
}
```
The parsed pointee type is immediately encoded and written into the StructName output parameter. From this point on, the pointer's pointee information travels with it as an opaque string through VarScopes, PrototypeAST::ArgInfo, VariableExprAST, and every other place that stores a ValueType alongside a StructName.
addr: Taking the Address of an Lvalue
addr(x) returns a pointer to x. addr(p.x) returns a pointer to the field x of struct p. ParseAddrExpr handles both:
```cpp
static unique_ptr<ExprAST> ParseAddrExpr() {
  getNextToken(); // eat 'addr'
  // expect '('
  getNextToken(); // eat '('
  // expect identifier: addr requires an lvalue
  string BaseName = IdentifierStr;
  getNextToken(); // eat identifier
  ValueType CurType = LookupVarType(BaseName);
  // walk optional field chain: addr(o.inner.value)
  vector<string> Path;
  while (CurTok == '.') {
    // validate each field, advance CurType and CurStruct
    Path.push_back(Field);
  }
  // expect ')'
  return make_unique<AddrExprAST>(BaseName, Path, CurType,
                                  EncodePointerType(CurType, CurStruct));
}
```
The resulting AddrExprAST has type ValueType::Pointer and its StructName holds the encoded pointee type.
addr only accepts a named variable, optionally with field access. Expressions like addr(1 + 2) are rejected immediately, because the parser requires a tok_identifier right after the opening parenthesis.
AddrExprAST::codegen
```cpp
Value *AddrExprAST::codegen() {
  if (FieldPath.empty()) {
    // addr(x): return the alloca or global directly
    auto It = NamedValues.find(BaseName);
    if (It != NamedValues.end() && It->second)
      return It->second;
    if (auto *GV = GetGlobalVariable(BaseName))
      return GV;
    return LogErrorV("Unknown variable name");
  }
  // addr(p.x): return the field pointer from GetFieldAddress
  Value *Ptr = GetFieldAddress(BaseName, FieldPath);
  return Ptr;
}
```
A local variable is already represented as an alloca, which is a pointer to its stack slot, and a global is likewise a pointer to its storage. So addr(x) simply returns that pointer without emitting any new instruction. For a struct field, GetFieldAddress (from chapter 17) computes and returns the GEP pointer for that field.
```llvm
; var x: int = 42
; var p: ptr[int] = addr(x)
%x = alloca i64
store i64 42, ptr %x
%p = alloca ptr
store ptr %x, ptr %p        ; addr(x) is just %x, the alloca itself

; var pt: Point
; var px: ptr[int] = addr(pt.x)
%pt = alloca %struct.Point
...
%fieldptr = getelementptr inbounds %struct.Point, ptr %pt, i32 0, i32 0
%px = alloca ptr
store ptr %fieldptr, ptr %px
```
p[i]: Pointer Indexing
p[i] computes the address at offset i from the pointer and loads from it. p[i] = v stores to it.
Parsing
ParseIndexExpr is called from ParseIdentifierExpr whenever '[' follows a pointer-typed variable or field:
```cpp
static unique_ptr<ExprAST> ParseIndexExpr(string BaseName,
                                          vector<string> FieldPath,
                                          ValueType BaseType,
                                          const string &BaseStructName) {
  // reject if base is not a pointer
  getNextToken(); // eat '['
  auto Index = ParseExpression();
  // reject if index is not an integer type
  getNextToken(); // eat ']'
  // decode pointee type from BaseStructName
  // Index is a unique_ptr, so it must be moved into the node
  return make_unique<IndexExprAST>(std::move(BaseName), std::move(FieldPath),
                                   std::move(Index), ElemType, ElemStruct);
}
```
The element type (what the pointer points to) is decoded from the encoded string in BaseStructName.
BuildIndexElementPtr: the shared address computation
Both reads and writes need the element address. BuildIndexElementPtr computes it without loading:
```cpp
static Value *BuildIndexElementPtr(IndexExprAST *IdxExpr) {
  // load the pointer value from the base variable or field
  Value *BasePtr = LoadPointerValue(IdxExpr->getBaseName(),
                                    IdxExpr->getFieldPath(), ...);
  // widen index to i64 if needed
  Value *IdxVal = ...;
  // GEP: &base[i]
  return Builder->CreateInBoundsGEP(
      LLVMTypeFor(IdxExpr->getType(), IdxExpr->getStructName()),
      BasePtr, IdxVal, "elemptr");
}
```
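The index is always widened to i64 before the GEP, so every element access uses the same 64-bit index type. LLVM itself accepts any integer width as a pointer index (treating it as a signed value), so this is a uniformity choice rather than a hard requirement.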
Read codegen
```cpp
Value *IndexExprAST::codegen() {
  Value *ElemPtr = BuildIndexElementPtr(this);
  if (!ElemPtr) // propagate errors from the address computation
    return nullptr;
  return Builder->CreateLoad(LLVMTypeFor(getType(), getStructName()),
                             ElemPtr, "elemload");
}
```
For p[0] where p: ptr[int]:
```llvm
%ptrload = load ptr, ptr %p          ; load the pointer value
%elemptr = getelementptr inbounds i64, ptr %ptrload, i64 0
%elemload = load i64, ptr %elemptr
```
For p[1]:
```llvm
%ptrload = load ptr, ptr %p
%elemptr = getelementptr inbounds i64, ptr %ptrload, i64 1
%elemload = load i64, ptr %elemptr
```
Write codegen
```cpp
Value *IndexAssignmentExprAST::codegen() {
  Value *ElemPtr = BuildIndexElementPtr(LHS.get());
  Value *Val = RHS->codegen();
  if (!ElemPtr || !Val) // propagate errors from either side
    return nullptr;
  Val = EmitImplicitCast(Val, RHS->getType(), getType());
  Builder->CreateStore(Val, ElemPtr);
  return Val;
}
```
For p[0] = 99 where p: ptr[int]:
```llvm
%ptrload = load ptr, ptr %p
%elemptr = getelementptr inbounds i64, ptr %ptrload, i64 0
store i64 99, ptr %elemptr
```
The implicit cast rules from chapter 16 apply to the stored value: storing an integer through a ptr[float64] is a type error, while storing an int8 through a ptr[int] widens it.
p[i].field: Field Access After Indexing
For pointers to structs, you can chain field access after the index: p[0].x. This requires a separate AST node because the base is an index expression, not a named variable.
IndexedFieldExprAST
```cpp
class IndexedFieldExprAST : public ExprAST {
  unique_ptr<IndexExprAST> BaseIndex; // the p[i] part
  vector<string> FieldPath;           // the field chain
  ...
};
```
ParseIndexedFieldAccessExpr is called when ParseIdentifierExpr sees a '.' after parsing an index expression. It walks the field chain exactly like ParseFieldAccessExpr from chapter 17:
```cpp
static unique_ptr<ExprAST>
ParseIndexedFieldAccessExpr(unique_ptr<IndexExprAST> BaseIndex) {
  // walk '.field' chain, validating each step against StructTypes
  // BaseIndex is a unique_ptr, so ownership is moved into the new node
  return make_unique<IndexedFieldExprAST>(std::move(BaseIndex), Path, CurType,
                                          CurStruct);
}
```
Codegen
```cpp
Value *IndexedFieldExprAST::codegen() {
  // get the element address without loading (BuildIndexElementPtr)
  Value *Ptr = BuildIndexElementPtr(BaseIndex.get());
  if (!Ptr)
    return nullptr;
  // walk field GEPs from that address
  for (const auto &FieldName : FieldPath) {
    Ptr = Builder->CreateStructGEP(BaseLLVM, Ptr, Idx, "fieldptr");
    // advance type
  }
  return Builder->CreateLoad(LLVMTypeFor(getType(), getStructName()), Ptr,
                             "fieldload");
}
```
BuildIndexElementPtr computes the element address without loading the struct value — so the struct GEPs can chain directly from the element pointer.
For p[0].x where p: ptr[Point]:
```llvm
%ptrload = load ptr, ptr %p
%elemptr = getelementptr inbounds %struct.Point, ptr %ptrload, i64 0
%fieldptr = getelementptr inbounds %struct.Point, ptr %elemptr, i32 0, i32 0
%fieldload = load i64, ptr %fieldptr
```
For p[0].x = v (write):
```llvm
%ptrload = load ptr, ptr %p
%elemptr = getelementptr inbounds %struct.Point, ptr %ptrload, i64 0
%fieldptr = getelementptr inbounds %struct.Point, ptr %elemptr, i32 0, i32 0
store i64 %v, ptr %fieldptr
```
Two GEPs: the first steps from the pointer to element 0, the second reaches field x inside that element. There is no load between them; the pointer chains straight through.
Mutation Through a Pointer Parameter
This is the payoff. A function that takes ptr[T] can modify the caller's data:
```
def set_value(p: ptr[int], v: int) -> None:
    p[0] = v

def main() -> int:
    var x: int = 5
    set_value(addr(x), 100)
    # x is now 100
    return 0
```

```llvm
define void @set_value(ptr %p, i64 %v) {
entry:
  %p.addr = alloca ptr
  store ptr %p, ptr %p.addr
  %ptrload = load ptr, ptr %p.addr
  %elemptr = getelementptr inbounds i64, ptr %ptrload, i64 0
  store i64 %v, ptr %elemptr
  ret void
}
```
The pointer is passed by value (it's just an address), but the store through it writes to x's alloca in the caller's stack frame. The caller sees the updated value.
Pointer arguments are type-checked: passing ptr[float64] where ptr[int] is expected is a type error.
Parse Flow in ParseIdentifierExpr
The full sequence of what ParseIdentifierExpr handles, in order:
1. Parse the base identifier.
2. If '.' follows, parse a field chain (FieldExprAST).
3. If '[' follows, parse an index expression (IndexExprAST).
4. If '.' follows after step 3, parse a field chain on the index result (IndexedFieldExprAST).
This covers: x, p.field, p[i], p.field[i], p[i].field, p[i].field.subfield.
The statement parser follows the same sequence when parsing the left-hand side of an assignment.
Build and Run
```bash
cd code/chapter-18
cmake -S . -B build && cmake --build build
```
Try It
Take an address, read through it
```
extern def printd(x: float64)

def main() -> int:
    var x: int = 42
    var p: ptr[int] = addr(x)
    printd(float64(p[0]))
    return 0
```

```
42.000000
```
Write through a pointer, see it in the caller
```
extern def printd(x: float64)

def main() -> int:
    var x: int = 5
    var p: ptr[int] = addr(x)
    p[0] = 99
    printd(float64(x))
    return 0
```

```
99.000000
```
Pass a struct by pointer
```
extern def printd(x: float64)

struct Point:
    x: int
    y: int

def set_x(p: ptr[Point], v: int) -> None:
    p[0].x = v

def main() -> int:
    var pt: Point
    pt.x = 3
    set_x(addr(pt), 7)
    printd(float64(pt.x))
    return 0
```

```
7.000000
```
Address of a struct field
```
extern def printd(x: float64)

struct Point:
    x: int
    y: int

def main() -> int:
    var p: Point
    p.x = 11
    var px: ptr[int] = addr(p.x)
    printd(float64(px[0]))
    return 0
```

```
11.000000
```
Inspect the IR
```bash
pyxc --emit llvm-ir -o out.ll program.pyxc
grep 'getelementptr\|load\|store' out.ll
```
Known Limitations
No pointer arithmetic. p + 1 is not supported — use p[1] to access adjacent elements.
No nested pointers. ptr[ptr[int]] is rejected at parse time.
No pointer comparisons. p == nullptr is not supported.
No pointer-to-pointer casting. You cannot reinterpret a ptr[int] as a ptr[float64].
Null pointer is silent. var p: ptr[int] with no initializer is a null pointer, and dereferencing it is undefined behavior; in practice it usually crashes at runtime with no helpful error. Bounds checking and null safety are not implemented.
Pointee type is encoded in a string. The StructName field on ExprAST nodes doubles as pointer type metadata, stored as "<ValueType int>:<struct name>" (e.g. "1:" for ptr[int], "10:Point" for ptr[Point]). It works, but a dedicated pointee field would be cleaner. This is a consequence of the single-AST-hierarchy design established in chapter 12.
What's Next
Chapter 19 adds fixed-size arrays: T[N], stack allocation, indexing, and array-to-pointer decay. With arrays and pointers in place, you have the building blocks for the string and C interop chapter that follows.
Need Help?
Build issues? Questions?
- GitHub Issues: Report problems
- Discussions: Ask questions
Include:
- Your OS and version
- Full error message
- Output of cmake --version, ninja --version, and llvm-config --version
We'll figure it out.