24. pyxc: Classes
Where We Are
Chapter 23 added arrays. We now have a decent type system, but the only aggregate type is struct. After this chapter, the class keyword is available:
class Point:
    x: int
    y: int

def main() -> int:
    var p: Point
    p.x = 3
    p.y = 4
    return 0
On its own, class behaves identically to struct — same field layout, same IR, same field access syntax. The difference lives inside the compiler: a class sets IsClass = true in StructTypeInfo. That flag is what the next three chapters gate everything on.
Source Code
git clone --depth 1 https://github.com/alankarmisra/pyxc-llvm-tutorial
cd pyxc-llvm-tutorial/code/chapter-24
Grammar
This chapter adds one new production and extends top.
top = typealias | structdef | classdef | definition | decorateddef | external | toplevelexpr ; -- changed
classdef = "class" identifier ":" eols structblock ; -- new
classdef and structdef share the same body grammar (structblock). The only syntactic difference is the keyword.
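To make the "only the keyword differs" point concrete, here is a minimal sketch that recognises the shared header shape, keyword then identifier then ":", over a list of token strings. The string-based token representation and the names AggregateHeader, IsIdentifier and ParseAggregateHeader are assumptions made for this example; the real parser works on the lexer's token stream instead.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <string>
#include <vector>

// Hypothetical result type for this sketch; not part of the pyxc source.
struct AggregateHeader {
  bool IsClass;
  std::string Name;
};

// identifier = (letter | "_") { letter | digit | "_" }
bool IsIdentifier(const std::string &S) {
  if (S.empty())
    return false;
  unsigned char First = static_cast<unsigned char>(S[0]);
  if (!(std::isalpha(First) || First == '_'))
    return false;
  for (unsigned char C : S)
    if (!(std::isalnum(C) || C == '_'))
      return false;
  return true;
}

// Matches ("struct" | "class") identifier ":" at the start of Toks.
// Both keywords go through the same checks; only IsClass differs.
std::optional<AggregateHeader>
ParseAggregateHeader(const std::vector<std::string> &Toks) {
  if (Toks.size() < 3)
    return std::nullopt;
  bool IsClass = Toks[0] == "class";
  if (!IsClass && Toks[0] != "struct")
    return std::nullopt;
  if (!IsIdentifier(Toks[1]) || Toks[2] != ":")
    return std::nullopt;
  return AggregateHeader{IsClass, Toks[1]};
}
```

Calling ParseAggregateHeader on the tokens "class", "Point", ":" and on "struct", "Point", ":" yields the same name, with IsClass as the only difference in the result.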
Full Grammar
code/chapter-24/pyxc.ebnf
program = [ eols ] [ top { eols top } ] [ eols ] ;
eols = eol { eol } ;
top = typealias | structdef | classdef | definition | decorateddef | external | toplevelexpr ;
typealias = "type" identifier "=" type ;
structdef = "struct" identifier ":" eols structblock ;
classdef = "class" identifier ":" eols structblock ;
structblock = indent fielddecl { eols fielddecl } dedent ;
fielddecl = identifier ":" type ;
definition = "def" prototype [ "->" type ] ":" ( simplestmt | eols block ) ;
decorateddef = binarydecorator eols "def" binaryopprototype [ "->" type ] ":" ( simplestmt | eols block )
| unarydecorator eols "def" unaryopprototype [ "->" type ] ":" ( simplestmt | eols block ) ;
binarydecorator = "@" "binary" "(" integer ")" ;
unarydecorator = "@" "unary" ;
binaryopprototype = customopchar "(" typedparam "," typedparam ")" ;
unaryopprototype = customopchar "(" typedparam ")" ;
external = "extern" "def" prototype [ "->" type ] ;
toplevelexpr = expression ;
prototype = identifier "(" [ typedparam { "," typedparam } ] ")" ;
typedparam = identifier ":" type ;
ifstmt = "if" expression ":" suite
[ eols "else" ":" suite ] ;
forstmt = "for"
( "var" identifier ":" type | identifier )
"=" expression "," expression "," expression ":" suite ;
varstmt = "var" varbinding { "," varbinding } ;
assignstmt = lvalue "=" expression ;
simplestmt = returnstmt | varstmt | assignstmt | expression ;
compoundstmt = ifstmt | forstmt ;
statement = simplestmt | compoundstmt ;
suite = simplestmt | compoundstmt | eols block ;
returnstmt = "return" [ expression ] ;
block = indent statement { eols statement } dedent ;
expression = unaryexpr binoprhs ;
binoprhs = { binaryop unaryexpr } ;
lvalue = identifier | fieldaccess | indexexpr ;
varbinding = identifier ":" type [ "=" expression ] ;
unaryexpr = unaryop unaryexpr | primary ;
unaryop = "-" | userdefunaryop ;
primary = castexpr | sizeofexpr | addrexpr | arrayliteral | stringliteral | identifierexpr | fieldaccess | indexexpr | numberexpr | bool_literal | parenexpr ;
castexpr = casttype "(" expression ")" ;
sizeofexpr = "sizeof" "(" type ")" ;
addrexpr = "addr" "(" lvalue ")" ;
identifierexpr = identifier | callexpr ;
callexpr = identifier "(" [ expression { "," expression } ] ")" ;
fieldaccess = identifier "." identifier { "." identifier } ;
indexexpr = identifier "[" expression "]" ;
numberexpr = number ;
arrayliteral = "[" [ expression { "," expression } ] "]" ;
stringliteral = "\"" { ? any char except " and newline ? | escape } "\"" ;
escape = "\\" ( "\\" | "\"" | "n" | "t" | "0" ) ;
parenexpr = "(" expression ")" ;
binaryop = builtinbinaryop | userdefbinaryop ;
indent = INDENT ;
dedent = DEDENT ;
builtinbinaryop = "+" | "-" | "*" | "<" | "<=" | ">" | ">=" | "==" | "!=" ;
userdefbinaryop = ? any opchar defined as a custom binary operator ? ;
userdefunaryop = ? any opchar defined as a custom unary operator ? ;
customopchar = ? any opchar that is not "-" or a builtinbinaryop,
and not already defined as a custom operator ? ;
opchar = ? any single ASCII punctuation character ? ;
identifier = (letter | "_") { letter | digit | "_" } ;
builtintype = "int" | "int8" | "int16" | "int32" | "int64"
| "float" | "float32" | "float64"
| "bool" | "None" ;
aliastype = identifier ;
structtype = identifier ;
pointertype = "ptr" "[" type "]" ;
type = basetype [ arraysuffix ] ;
basetype = builtintype | aliastype | structtype | pointertype ;
arraysuffix = "[" integer "]" ;
casttype = "int" | "int8" | "int16" | "int32" | "int64"
| "float" | "float32" | "float64"
| "bool" | pointertype ;
integer = digit { digit } ;
number = digit { digit } [ "." { digit } ]
| "." digit { digit } ;
bool_literal = "True" | "False" ;
letter = "A".."Z" | "a".."z" ;
digit = "0".."9" ;
eol = "\r\n" | "\r" | "\n" ;
ws = " " | "\t" ;
INDENT = ? synthetic token emitted by lexer ? ;
DEDENT = ? synthetic token emitted by lexer ? ;
New Token
tok_class = -40,
Registered in the keyword table alongside tok_struct:
{"struct", tok_struct},
{"class", tok_class},
tok_class is also added to the token name map so error messages print 'class' rather than a raw integer.
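The two tables can be pictured roughly as follows. This is a sketch, not the tutorial's code: only tok_class = -40 comes from the chapter, the other enum values are placeholders, and the names Keywords, TokenNames, ClassifyWord and TokenName are assumed for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>

// Only tok_class = -40 is taken from the chapter. The other values are
// placeholders chosen for this sketch.
enum Token {
  tok_identifier = -4, // assumed value
  tok_struct = -39,    // assumed value
  tok_class = -40,
};

// Keyword table: maps reserved words to their tokens.
const std::map<std::string, Token> Keywords = {
    {"struct", tok_struct},
    {"class", tok_class},
};

// Token name map used when formatting diagnostics.
const std::map<int, std::string> TokenNames = {
    {tok_struct, "'struct'"},
    {tok_class, "'class'"},
};

// A word found in the keyword table is a keyword; anything else is
// treated as an identifier.
Token ClassifyWord(const std::string &Word) {
  auto It = Keywords.find(Word);
  return It == Keywords.end() ? tok_identifier : It->second;
}

// Falls back to the raw integer when a token has no registered name,
// which is the unreadable output the name map is there to avoid.
std::string TokenName(int Tok) {
  auto It = TokenNames.find(Tok);
  return It == TokenNames.end() ? std::to_string(Tok) : It->second;
}
```

With both entries registered, a diagnostic that mentions tok_class can print 'class', while an unregistered token would still show up as its integer value.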
One Parser, Two Keywords — ParseAggregateDefinition
Previously the struct parser was ParseStructDefinition(). This chapter replaces it with a single function that handles both keywords. The caller passes "struct" or "class" as a string, and the parser uses it only to:
- Produce readable error messages ("Expected struct name" vs "Expected class name")
- Set the IsClass flag (see Group 3)
static bool ParseAggregateDefinition(const char *KindName) {
  // CurTok is tok_struct or tok_class
  getNextToken(); // eat keyword
  if (CurTok != tok_identifier)
    return LogError((string("Expected ") + KindName + " name").c_str());
  string StructName = IdentifierStr;
  // Check for redefinition
  if (StructTypes.count(StructName))
    return LogError(("Aggregate '" + StructName + "' is already defined").c_str());
  getNextToken(); // eat aggregate name
  // ... parse ':', INDENT, fields, DEDENT ...
  Info.IsClass = (strcmp(KindName, "class") == 0);
  StructTypes[StructName] = std::move(Info);
  return true;
}
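Error strings that name the keyword are parameterised on KindName, so a malformed class body reports "Expected dedent after class body" rather than a struct-specific message. The redefinition error is the exception: it uses the neutral word "Aggregate" for both kinds.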
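The parameterisation can be isolated into a few small helpers, sketched below. The helper names ExpectedNameMessage, ExpectedDedentMessage and KindIsClass are hypothetical, and the struct variant of the dedent message is inferred from the class variant quoted in the text.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Builds the "Expected <kind> name" diagnostic.
std::string ExpectedNameMessage(const char *KindName) {
  return std::string("Expected ") + KindName + " name";
}

// Builds the "Expected dedent after <kind> body" diagnostic.
std::string ExpectedDedentMessage(const char *KindName) {
  return std::string("Expected dedent after ") + KindName + " body";
}

// Mirrors the strcmp-based flag computation in ParseAggregateDefinition.
// Comparing the pointers with == would compare addresses, not contents.
bool KindIsClass(const char *KindName) {
  return std::strcmp(KindName, "class") == 0;
}
```

Passing the kind as a string keeps one code path for both keywords, at the cost of a string comparison where an enum or bool parameter would also have worked.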
HandleStructDef and HandleClassDef each call this with "struct" or "class":
static void HandleStructDef() {
  bool Ok = ParseAggregateDefinition("struct");
  if (!Ok) { SynchronizeToLineBoundary(); return; }
  // check for trailing tokens on the same line
}

static void HandleClassDef() {
  bool Ok = ParseAggregateDefinition("class");
  // same error recovery
}
HandleStructDef now also includes improved error recovery — if parsing succeeds but there are unexpected tokens on the same line, it logs an error and synchronises to the line boundary.
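The trailing-token check might look something like the sketch below. It runs over a plain vector of strings rather than the real lexer, with "\n" standing in for an end-of-line token. The Cursor type, CheckNoTrailingTokens and the wording of the error message are assumptions for illustration, and this SynchronizeToLineBoundary is a simplified stand-in for the function of the same name in the source.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token stream for this sketch: plain strings, with "\n"
// standing in for an end-of-line token.
struct Cursor {
  std::vector<std::string> Toks;
  std::size_t Pos = 0;
  std::vector<std::string> Errors;

  bool AtLineEnd() const { return Pos >= Toks.size() || Toks[Pos] == "\n"; }
};

// Skips forward until the cursor rests on an end-of-line token (or the
// end of input), so the main loop can resume with the next line.
void SynchronizeToLineBoundary(Cursor &C) {
  while (!C.AtLineEnd())
    ++C.Pos;
}

// Called after a definition has parsed successfully: anything left on
// the same line is reported once and then skipped.
bool CheckNoTrailingTokens(Cursor &C, const std::string &KindName) {
  if (C.AtLineEnd())
    return true;
  // Assumed wording; the chapter does not quote this message.
  C.Errors.push_back("Unexpected token after " + KindName + " definition");
  SynchronizeToLineBoundary(C);
  return false;
}
```

Reporting once and then skipping to the line boundary avoids a cascade of follow-on errors from the same stray tokens.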
The dispatch loops in MainLoop and FileModeLoop add a tok_class case:
case tok_class:
  HandleClassDef();
  break;
The IsClass Flag
StructTypeInfo gains one boolean field:
struct StructTypeInfo {
  string Name;
  bool IsClass = false; // new
  vector<FieldInfo> Fields;
  // ...
};
ParseAggregateDefinition sets IsClass to true when KindName is "class" (compared with strcmp) and to false otherwise. This flag is the sole distinction between structs and classes in StructTypes. It has no effect yet in chapter 24, but chapter 25 checks it before parsing methods, chapter 26 checks it for constructors, and so on.
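As a rough idea of what "checking the flag" could look like in a later chapter, the sketch below looks a type up in StructTypes and answers whether it was declared with class. The function AllowsMethods is hypothetical, and FieldInfo is reduced to two strings because its real definition is not shown in this chapter.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Simplified stand-in: the real FieldInfo is not shown in this chapter.
struct FieldInfo {
  std::string Name;
  std::string TypeName;
};

struct StructTypeInfo {
  std::string Name;
  bool IsClass = false;
  std::vector<FieldInfo> Fields;
};

// Structs and classes live in the same table, keyed by name.
std::map<std::string, StructTypeInfo> StructTypes;

// Hypothetical gate: true only for a known aggregate declared with
// 'class'. Unknown names and plain structs both yield false.
bool AllowsMethods(const std::string &TypeName) {
  auto It = StructTypes.find(TypeName);
  return It != StructTypes.end() && It->second.IsClass;
}
```

Because the flag defaults to false, any entry created by a code path that does not set it is treated as a struct.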
IR Layout
A class has exactly the same IR layout as a struct with the same fields. When naming the LLVM aggregate type, the compiler follows the conventional "struct." prefix for both source-level struct and class. A comment in the code makes this explicit:
// LLVM named aggregate types use a conventional "struct." prefix here for
// both source-level 'struct' and 'class' in chapter 24. They are layout-
// equivalent at this stage; chapter 25 can layer semantic distinctions.
class Vec2:
    x: float64
    y: float64
%Vec2 = type { double, double }
There is nothing in the generated LLVM IR that distinguishes a class Vec2 from a struct Vec2. The distinction is a compile-time concept only.
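One way to see this is a toy renderer that produces a type line in the shape shown above from a field list. It is only a string-building sketch with assumed names (Field, Aggregate, RenderIRType), not the compiler's code generation, which goes through the LLVM API. The point is that IsClass is carried along but never read.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical types for this sketch.
struct Field {
  std::string Name;
  std::string IRType; // e.g. "double" for a float64 field
};

struct Aggregate {
  std::string Name;
  bool IsClass = false;
  std::vector<Field> Fields;
};

// Renders "%Name = type { t1, t2, ... }" from the field types alone.
// IsClass is deliberately ignored: it has no effect on layout.
std::string RenderIRType(const Aggregate &A) {
  std::string Out = "%" + A.Name + " = type { ";
  for (std::size_t I = 0; I < A.Fields.size(); ++I) {
    if (I != 0)
      Out += ", ";
    Out += A.Fields[I].IRType;
  }
  Out += " }";
  return Out;
}
```

Two Aggregate values that differ only in IsClass render to the same string, which matches the claim that the distinction exists only at compile time.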
Conflict Rules
Class names and struct names share the same namespace (StructTypes). Defining a class and a struct with the same name, in either order, is rejected:
struct Foo:
    x: int
class Foo:    # Error: Aggregate 'Foo' is already defined
    y: int
Type alias names also conflict: a class name that collides with an existing type alias, or vice versa, is rejected with "Name '...' is already defined as an aggregate type".
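The rules above amount to a pair of cross-checks between two name tables. The sketch below models them with two sets. NameTables, DefineAggregate and DefineAlias are hypothetical names, and the two messages marked "assumed wording" are not quoted in the chapter.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-ins: the real compiler keeps richer records in
// StructTypes and in its alias table.
struct NameTables {
  std::set<std::string> Aggregates; // struct and class names together
  std::set<std::string> Aliases;    // type alias names
};

// Each function returns an empty string on success, or a diagnostic.
std::string DefineAggregate(NameTables &T, const std::string &Name) {
  if (T.Aggregates.count(Name))
    return "Aggregate '" + Name + "' is already defined";
  if (T.Aliases.count(Name)) // assumed wording
    return "Name '" + Name + "' is already defined as a type alias";
  T.Aggregates.insert(Name);
  return "";
}

std::string DefineAlias(NameTables &T, const std::string &Name) {
  if (T.Aggregates.count(Name))
    return "Name '" + Name + "' is already defined as an aggregate type";
  if (T.Aliases.count(Name)) // assumed wording
    return "Type alias '" + Name + "' is already defined";
  T.Aliases.insert(Name);
  return "";
}
```

Because structs and classes share one set, the order in which struct Foo and class Foo appear does not matter: whichever comes second is rejected.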
Build and Run
cd code/chapter-24
cmake -S . -B build && cmake --build build
What's Next
Chapter 25 adds methods — functions defined inside a class body and called with obj.method(args). The IsClass flag gates all of this: structs do not get methods.
Need Help?
Build issues? Questions?
- GitHub Issues: Report problems
- Discussions: Ask questions
Include:
- Your OS and version
- Full error message
- Output of cmake --version, ninja --version, and llvm-config --version
We'll figure it out.