34. pyxc: Bitwise Operators
Where We Are
Chapter 33 completed the loop story. The last major gap before K&R-style systems programming is bitwise manipulation. After this chapter, flags, masks, and bit-shifting all work:
extern def printd(x: float64)
def main() -> int:
var flags: int = 0
flags = flags | 1 # set bit 0
flags = flags | 4 # set bit 2
flags = flags & ~2 # clear bit 1 (already clear, but pattern works)
var shifted: int = 1 << 3 # 8
var masked: int = shifted & 0xFF
printd(float64(flags + masked))
return 0
13.000000
Source Code
git clone --depth 1 https://github.com/alankarmisra/pyxc-llvm-tutorial
cd pyxc-llvm-tutorial/code/chapter-34
New Tokens for << and >>
Single-character operators &, |, ^, and ~ are already handled by the catch-all ASCII path — they are returned as their character values. The shift operators require two new token values because < and > are already comparison operators:
tok_shl = -58, // <<
tok_shr = -59, // >>
Lexer Peek-Ahead for Shifts
The existing < and > paths in the lexer previously looked ahead for = only. They are expanded to also look for a second < or >:
if (LexerLastChar == '<') {
int Next = peek();
int Tok = '<';
if (Next == '=')
Tok = (advance(), tok_leq); // '<=' — comparison
else if (Next == '<')
Tok = (advance(), tok_shl); // '<<' — left shift
LexerLastChar = advance();
return Tok;
}
if (LexerLastChar == '>') {
int Next = peek();
int Tok = '>';
if (Next == '=')
Tok = (advance(), tok_geq); // '>=' — comparison
else if (Next == '>')
Tok = (advance(), tok_shr); // '>>' — right shift
LexerLastChar = advance();
return Tok;
}
peek() reads ahead without consuming. If the next character matches, advance() consumes it and the two-character token is returned. Any other character leaves Tok as the single character.
Precedence — A Full Restructuring
Adding bitwise operators requires reorganising the existing precedence table. Previously all comparisons sat at the same level (10). The new table follows C exactly:
{tok_or, 5}, // ||
{tok_and, 7}, // &&
{'|', 10}, // |
{'^', 11}, // ^
{'&', 12}, // &
{tok_eq, 13}, // ==
{tok_neq,13}, // !=
{tok_leq,14}, // <=
{tok_geq,14}, // >=
{'<', 14}, // <
{'>', 14}, // >
{tok_shl,15}, // <<
{tok_shr,15}, // >>
// arithmetic remains at 20–40
The important consequence: & is below == and !=. So a & b == 0 parses as a & (b == 0), not (a & b) == 0. If you want the latter, add parentheses.
Type-Checking Predicates and GetBinaryResultType
Two predicates identify the new operator families:
static bool IsBitwiseOp(int Op) { return Op == '&' || Op == '|' || Op == '^'; }
static bool IsShiftOp(int Op) { return Op == tok_shl || Op == tok_shr; }
GetBinaryResultType gains two new branches. For bitwise ops, both operands must be integers; IsAssignable picks the wider of the two as the result type:
if (IsBitwiseOp(Op)) {
if (!IsIntType(L) || !IsIntType(R))
return ValueType::Error;
if (IsAssignable(L, R))
return L;
if (IsAssignable(R, L))
return R;
return ValueType::Error;
}
For shifts, the result type is always the left operand's type, regardless of the shift count's type:
if (IsShiftOp(Op)) {
if (!IsIntType(L) || !IsIntType(R))
return ValueType::Error;
return L;
}
In ParseBinOpRHS, the built-in type-check dispatch is extended:
if (IsComparisonOp(BinOp) || IsArithmeticOp(BinOp) || IsLogicalOp(BinOp) ||
IsBitwiseOp(BinOp) || IsShiftOp(BinOp)) {
ResultType = GetBinaryResultType(BinOp, LHS->getType(), ...);
if (ResultType == ValueType::Error)
return LogError("Type mismatch in binary operator");
LHS = make_unique<BinaryExprAST>(...);
continue;
}
Type errors are caught at parse time. Codegen never runs on a bad operand pair.
Parsing Unary ~
~ is parsed in ParseUnary alongside - and !. The operand must be an integer type; the result type is the same as the operand:
if (CurTok == '~') {
getNextToken(); // eat '~'
auto Operand = ParseUnary();
if (!Operand)
return nullptr;
if (!IsIntType(Operand->getType()))
return LogError("Unary '~' requires an integer operand");
return make_unique<UnaryExprAST>('~', std::move(Operand),
Operand->getType());
}
~~x (double complement) and ~(x + 1) both parse naturally because the operand is a full unaryexpr.
Codegen: Binary Bitwise Operators
BinaryExprAST::codegen gains cases for &, |, and ^. Both operands are coerced to the result type via EmitImplicitCast — this handles widening selected by GetBinaryResultType (e.g., int32 & int64 widens int32 to int64 before the instruction):
case '&':
case '|':
case '^': {
ValueType Ty = getType();
L = EmitImplicitCast(L, LType, Ty);
R = EmitImplicitCast(R, RType, Ty);
if (!L || !R)
return LogErrorV("Type mismatch in binary operator");
if (Op == '&')
return Builder->CreateAnd(L, R, "bwand");
if (Op == '|')
return Builder->CreateOr(L, R, "bwor");
return Builder->CreateXor(L, R, "bwxor");
}
Each maps directly to a single LLVM instruction: and, or, or xor. These are integer instructions — LLVM has no floating-point equivalents.
Shifts follow the same structure:
case tok_shl:
case tok_shr: {
ValueType Ty = getType();
L = EmitImplicitCast(L, LType, Ty);
R = EmitImplicitCast(R, RType, Ty);
if (!L || !R)
return LogErrorV("Type mismatch in binary operator");
if (Op == tok_shl)
return Builder->CreateShl(L, R, "shltmp");
return Builder->CreateAShr(L, R, "shrtmp");
}
CreateShl emits shl — shift left, filling low bits with zeros. CreateAShr emits ashr — arithmetic shift right, which fills high bits with the sign bit of the left operand. For a negative int64, x >> 1 stays negative. LLVM also has CreateLShr for logical (zero-filling) right shift, but pyxc doesn't expose it — ashr is correct for signed integers.
Codegen: Unary ~
UnaryExprAST::codegen handles ~ after existing cases for - and !:
if (Opcode == '~') {
if (!IsIntType(getType()))
return LogErrorV("Unary '~' not supported for this type");
return Builder->CreateNot(Op, "bnottmp");
}
CreateNot lowers to xor %val, -1 — XOR-ing every bit with 1 flips it. The name bnottmp (bitwise not) distinguishes it from nottmp (logical not on i1) in the IR.
For a concrete example:
var x: int = 9 # binary: ...0001001
var y: int = ~x # binary: ...1110110 → -10 in two's complement
var z: int = y & 7 # mask the low 3 bits → 6
~9 is -10 because in two's complement, flipping all bits and adding 1 negates: ~x = -(x + 1).
Grammar
unaryop = "-" | "!" | "~" | "++" | "--" | userdefunaryop ; -- changed
builtinbinaryop = "+" | "-" | "*" | "/" | "%"
| "<" | "<=" | ">" | ">=" | "==" | "!="
| "&&" | "||"
| "&" | "|" | "^" | "<<" | ">>" ; -- changed
Error Cases
Bitwise op on float:
var x: float64 = 1.0
var y: float64 = 2.0
var z: float64 = x & y # Error: Type mismatch in binary operator
~ on non-integer:
var x: float64 = 1.0
var y: float64 = ~x # Error: Unary '~' requires an integer operand
Both are caught at parse time and never reach codegen.
Things Worth Knowing
& and | are not && and ||. The single-character forms are bitwise and operate on integers. The double-character forms are logical, operate on bool, and short-circuit.
Compound assignment works with all bitwise operators. x &= mask, flags |= bit, x ^= pattern, x <<= 2, and x >>= 1 are all valid — the compound assignment path already handles any binary operator by token value.
Right shift is arithmetic (sign-extending). For a negative int, x >> 1 fills the high bit with the sign bit. Chapter 38 adds unsigned integer types, which use logical shift right (lshr).
The C precedence gotcha. a & b == 0 means a & (b == 0) in both C and pyxc. Write (a & b) == 0 when you want that.
What's Next
Chapter 35 adds switch statements.
Need Help?
Build issues? Questions?
- GitHub Issues: Report problems
- Discussions: Ask questions
Include:
- Your OS and version
- Full error message
- Output of
cmake --version,ninja --version, andllvm-config --version
We'll figure it out.