38. pyxc: Unsigned Integer Types
Where We Are
Chapter 37 added character literals. pyxc has had signed integers since Chapter 16, but all of them interpret their top bit as a sign. Systems code commonly stores sizes, counts, and bit masks as unsigned values, and without unsigned types the compiler cannot tell when to emit unsigned division, comparison, shift, or conversion instructions for them. After this chapter, uint8, uint16, uint32, and uint64 are available:
extern def printd(x: float64)
def main() -> int:
    var flags: uint32 = 0
    flags |= uint32(1) << uint32(3)  # set bit 3
    flags |= uint32(1) << uint32(7)  # set bit 7
    var mask: uint32 = uint32(0xFF)
    printd(float64(flags & mask))    # 136.000000
    return 0
136.000000
Source Code
git clone --depth 1 https://github.com/alankarmisra/pyxc-llvm-tutorial
cd pyxc-llvm-tutorial/code/chapter-38
New Tokens, Keywords, and ValueType Enum Values
Four new tokens and keywords:
tok_uint8 = -65,
tok_uint16 = -66,
tok_uint32 = -67,
tok_uint64 = -68,
{"uint8", tok_uint8}, {"uint16", tok_uint16},
{"uint32", tok_uint32}, {"uint64", tok_uint64},
Four new values in the ValueType enum:
UInt8,
UInt16,
UInt32,
UInt64,
ParseTypeToken gets cases for all four so they work in type annotations and the casttype production:
case tok_uint8: getNextToken(); BaseType = ValueType::UInt8; break;
case tok_uint16: getNextToken(); BaseType = ValueType::UInt16; break;
case tok_uint32: getNextToken(); BaseType = ValueType::UInt32; break;
case tok_uint64: getNextToken(); BaseType = ValueType::UInt64; break;
No New LLVM IR Types
LLVM has no separate "unsigned integer" types. uint32 and int32 are both i32 in the IR. LLVMTypeFor maps the four new ValueType values to the same LLVM types as their signed counterparts:
case ValueType::UInt8: return Type::getInt8Ty(*TheContext);
case ValueType::UInt16: return Type::getInt16Ty(*TheContext);
case ValueType::UInt32: return Type::getInt32Ty(*TheContext);
case ValueType::UInt64: return Type::getInt64Ty(*TheContext);
The signedness lives entirely in which instruction the compiler emits.
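One way to see why a single IR type can serve both: in two's complement, addition produces the same bit pattern whichever way the operands are read, so only operations such as division, right shift, and ordered comparison need separate signed and unsigned forms. The sketch below is plain C++, not pyxc compiler code, and the helper names `addBitsUnsigned` and `addBitsSigned` are made up for illustration.

```cpp
#include <cstdint>

// Add two 32-bit patterns, reading them as unsigned values.
inline std::uint32_t addBitsUnsigned(std::uint32_t a, std::uint32_t b) {
  return a + b; // wraps modulo 2^32
}

// Add two values read as signed, then expose the resulting bit pattern.
// Callers should pick inputs whose signed sum does not overflow,
// because signed overflow is undefined behavior in C++.
inline std::uint32_t addBitsSigned(std::int32_t a, std::int32_t b) {
  return static_cast<std::uint32_t>(a + b); // signed -> unsigned is modular
}
```

`addBitsSigned(-5, 3)` and `addBitsUnsigned(0xFFFFFFFB, 3)` both return `0xFFFFFFFE`, because -5 and 0xFFFFFFFB are the same 32 bits.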
IsUnsignedIntType and IsSignedIntType
Two new predicate functions drive all instruction selection:
static bool IsUnsignedIntType(ValueType Type) {
  return Type == ValueType::UInt8 || Type == ValueType::UInt16 ||
         Type == ValueType::UInt32 || Type == ValueType::UInt64;
}
static bool IsSignedIntType(ValueType Type) {
  return IsIntType(Type) && !IsUnsignedIntType(Type);
}
IsIntType is expanded to include all four unsigned types:
return Type == ValueType::Int || Type == ValueType::Int8 || ... ||
       Type == ValueType::UInt8 || Type == ValueType::UInt16 ||
       Type == ValueType::UInt32 || Type == ValueType::UInt64;
Implicit Widening Rule — Same Signedness Only
IsAssignable gains a signedness gate. The bit-width comparison added in the previous chapter now runs only when the source and destination types have the same signedness:
if (IsIntType(From) && IsIntType(To)) {
  unsigned FromBits = LLVMTypeFor(From)->getIntegerBitWidth();
  unsigned ToBits = LLVMTypeFor(To)->getIntegerBitWidth();
  if (IsUnsignedIntType(From) != IsUnsignedIntType(To))
    return false; // signed/unsigned mixing forbidden implicitly
  return FromBits <= ToBits;
}
uint8 → uint64 widens without a cast. Both int32 → uint32 and uint32 → int64 require an explicit cast, even though int64 can hold every uint32 value. This matches the design intent: implicit signed/unsigned conversion is a common source of bugs in C, so pyxc never does it silently.
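The rule can be exercised without LLVM. The following is a minimal standalone model, assuming a trimmed-down `ValueType` enum and a hypothetical `bitWidth` helper in place of `LLVMTypeFor(...)->getIntegerBitWidth()`; the lower-case function names are illustrative and are not the compiler's own.

```cpp
// Hypothetical, trimmed-down stand-in for pyxc's ValueType enum.
enum class ValueType { Int8, Int16, Int32, Int64, UInt8, UInt16, UInt32, UInt64 };

inline bool isUnsigned(ValueType t) {
  return t == ValueType::UInt8 || t == ValueType::UInt16 ||
         t == ValueType::UInt32 || t == ValueType::UInt64;
}

// Bit width by direct lookup; the real code asks the LLVM type instead.
inline unsigned bitWidth(ValueType t) {
  switch (t) {
  case ValueType::Int8:  case ValueType::UInt8:  return 8;
  case ValueType::Int16: case ValueType::UInt16: return 16;
  case ValueType::Int32: case ValueType::UInt32: return 32;
  case ValueType::Int64: case ValueType::UInt64: return 64;
  }
  return 0; // not reached for valid enum values
}

// Integer branch of the assignability rule:
// same signedness, and the destination is at least as wide.
inline bool isAssignable(ValueType from, ValueType to) {
  if (isUnsigned(from) != isUnsigned(to))
    return false;
  return bitWidth(from) <= bitWidth(to);
}
```

With this model, `isAssignable(ValueType::UInt8, ValueType::UInt64)` is true, while `Int32` to `UInt32`, `UInt32` to `Int64`, and `UInt64` to `UInt8` are all rejected.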
Instruction Selection — Seven Changed Sites
Integer widening (EmitImplicitCast)
// Before: always sext
return Builder->CreateSExt(V, LLVMTypeFor(To), "sext");
// After:
return IsUnsignedIntType(From)
? Builder->CreateZExt(V, LLVMTypeFor(To), "zext")
: Builder->CreateSExt(V, LLVMTypeFor(To), "sext");
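Unsigned types use zext (zero-extend) rather than sext (sign-extend), so a uint8 holding 255 widens to 255 instead of having its set top bit copied into the new high bits.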
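The difference is easiest to see on the byte 0xFF. The sketch below uses C++ integer conversions, which behave like zext for unsigned sources and like sext for signed sources; the function names are hypothetical.

```cpp
#include <cstdint>

// Zero-extension: the behavior zext gives a uint8 -> uint32 widening.
inline std::uint32_t widenUnsigned(std::uint8_t v) {
  return static_cast<std::uint32_t>(v);
}

// Sign-extension: the behavior sext gives an int8 -> int32 widening.
// The result is returned as raw bits so the two cases can be compared.
inline std::uint32_t widenSignedBits(std::int8_t v) {
  return static_cast<std::uint32_t>(static_cast<std::int32_t>(v));
}
```

`widenUnsigned(0xFF)` yields `0x000000FF` (255), whereas `widenSignedBits(-1)`, which starts from the same byte, yields `0xFFFFFFFF`.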
Integer → float
return IsUnsignedIntType(From)
? Builder->CreateUIToFP(V, LLVMTypeFor(To), "uitofp")
: Builder->CreateSIToFP(V, LLVMTypeFor(To), "sitofp");
uitofp treats the bit pattern as an unsigned integer, so float64(uint32(-1)) produces 4294967295.0. sitofp on the same bits would produce -1.0.
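As a quick check of that claim, here is a small C++ sketch in which the source type alone decides the result, just as the choice between uitofp and sitofp does. The helper names are made up for illustration.

```cpp
#include <cstdint>

// Unsigned source: the analogue of uitofp.
inline double toDoubleUnsigned(std::uint32_t bits) {
  return static_cast<double>(bits);
}

// Signed source: the analogue of sitofp.
inline double toDoubleSigned(std::int32_t value) {
  return static_cast<double>(value);
}
```

`toDoubleUnsigned(0xFFFFFFFF)` is `4294967295.0` and `toDoubleSigned(-1)` is `-1.0`, although both arguments have all 32 bits set.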
Float → integer
return IsUnsignedIntType(To)
? Builder->CreateFPToUI(V, LLVMTypeFor(To), "fptoui")
: Builder->CreateFPToSI(V, LLVMTypeFor(To), "fptosi");
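Here the destination type picks the instruction, because the same float can be in range for one signedness and out of range for the other. The sketch below illustrates this in C++ with hypothetical helper names. Both conversions truncate toward zero. Converting an out-of-range value is undefined behavior in C++ and yields a poison value in LLVM IR, so the example only uses in-range inputs.

```cpp
#include <cstdint>

// Analogue of fptoui: the destination is an unsigned 32-bit integer.
inline std::uint32_t toUInt32(double x) {
  return static_cast<std::uint32_t>(x);
}

// Analogue of fptosi: the destination is a signed 32-bit integer.
inline std::int32_t toInt32(double x) {
  return static_cast<std::int32_t>(x);
}
```

`toUInt32(3000000000.0)` is `3000000000`, a value that does not fit in int32. `toInt32(-7.9)` is `-7`.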
Division and remainder
// / operator:
return IsUnsignedIntType(ResultType) ? Builder->CreateUDiv(L, R, "divtmp")
: Builder->CreateSDiv(L, R, "divtmp");
// % operator:
return IsUnsignedIntType(ResultType) ? Builder->CreateURem(L, R, "modtmp")
: Builder->CreateSRem(L, R, "modtmp");
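To illustrate why the choice matters, the sketch below divides the same bit pattern, 0xFFFFFFF8, by 3 under both readings. It is plain C++ with hypothetical struct and function names; C++ `/` and `%` on unsigned operands behave like udiv and urem, and on signed operands like sdiv and srem (truncating toward zero).

```cpp
#include <cstdint>

struct UDivRem { std::uint32_t quot; std::uint32_t rem; };
struct SDivRem { std::int32_t quot; std::int32_t rem; };

// Unsigned reading: analogue of udiv / urem.
inline UDivRem divRemUnsigned(std::uint32_t l, std::uint32_t r) {
  return {l / r, l % r};
}

// Signed reading: analogue of sdiv / srem.
inline SDivRem divRemSigned(std::int32_t l, std::int32_t r) {
  return {l / r, l % r};
}
```

`divRemUnsigned(0xFFFFFFF8, 3)` gives quotient `1431655762` and remainder `2`. `divRemSigned(-8, 3)`, the same bits read as signed, gives quotient `-2` and remainder `-2`.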
Right shift
return IsUnsignedIntType(Ty) ? Builder->CreateLShr(L, R, "shrtmp")
: Builder->CreateAShr(L, R, "shrtmp");
lshr fills the vacated high bits with zeros; ashr fills them with copies of the sign bit.
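A small C++ sketch of the two shifts, with made-up helper names. One assumption to note: `>>` on a negative signed value is only guaranteed to be an arithmetic shift from C++20 onward, although mainstream compilers have long behaved that way.

```cpp
#include <cstdint>

// Logical shift: vacated high bits become zero (analogue of lshr).
inline std::uint32_t shiftRightLogical(std::uint32_t bits, unsigned n) {
  return bits >> n;
}

// Arithmetic shift: vacated high bits copy the sign bit (analogue of ashr).
// Guaranteed by the standard since C++20; implementation-defined before that.
inline std::int32_t shiftRightArithmetic(std::int32_t value, unsigned n) {
  return value >> n;
}
```

`shiftRightLogical(0x80000000, 4)` is `0x08000000`, while `shiftRightArithmetic` applied to the same bits (the most negative int32) with a shift of 4 gives `-134217728`, which is `0xF8000000`.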
Comparisons (<, <=, >, >=)
// '<':
return IsUnsignedIntType(CompareType)
? Builder->CreateICmpULT(L, R, "cmptmp")
: Builder->CreateICmpSLT(L, R, "cmptmp");
// '>':
return IsUnsignedIntType(CompareType)
? Builder->CreateICmpUGT(L, R, "cmptmp")
: Builder->CreateICmpSGT(L, R, "cmptmp");
// '<=':
return IsUnsignedIntType(CompareType)
? Builder->CreateICmpULE(L, R, "cmptmp")
: Builder->CreateICmpSLE(L, R, "cmptmp");
// '>=':
return IsUnsignedIntType(CompareType)
? Builder->CreateICmpUGE(L, R, "cmptmp")
: Builder->CreateICmpSGE(L, R, "cmptmp");
== and != are signedness-agnostic (icmp eq / icmp ne); they are unchanged.
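The following sketch shows the same pair of bit patterns ordering differently under the two readings, while equality does not care. It is plain C++ with hypothetical function names, where `<` on unsigned operands corresponds to icmp ult and on signed operands to icmp slt.

```cpp
#include <cstdint>

// Unsigned ordering: analogue of icmp ult.
inline bool lessUnsigned(std::uint32_t l, std::uint32_t r) { return l < r; }

// Signed ordering: analogue of icmp slt.
inline bool lessSigned(std::int32_t l, std::int32_t r) { return l < r; }

// Equality compares bits only, so one version serves both signednesses.
inline bool equalBits(std::uint32_t l, std::uint32_t r) { return l == r; }
```

`lessSigned(-1, 1)` is true, but `lessUnsigned(0xFFFFFFFF, 1)` is false: the all-ones pattern is the smallest negative step below zero when signed and the largest value when unsigned.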
Literal range check
ParseNumberExpr already checks that a literal fits in the target type. The max value calculation is updated to use APInt::getAllOnes(Bits) for unsigned types:
APInt Max = IsUnsignedIntType(Type) ? APInt::getAllOnes(Bits)
: APInt::getSignedMaxValue(Bits);
getAllOnes is the all-bits-set value (0xFF, 0xFFFF, etc.), which is the maximum for an unsigned type. getSignedMaxValue is 0x7F, 0x7FFF, etc.
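The same limits can be computed with ordinary 64-bit arithmetic. The sketch below is a standalone model that does not use APInt; `maxLiteral` and `literalFits` are hypothetical names, `bits` is assumed to be between 1 and 64, and only the upper bound of a non-negative literal is checked.

```cpp
#include <cstdint>

// Largest value representable in `bits` bits (1..64) for the given signedness.
inline std::uint64_t maxLiteral(unsigned bits, bool isUnsigned) {
  // A signed type spends one bit on the sign.
  const unsigned valueBits = isUnsigned ? bits : bits - 1;
  // Shifting a 64-bit value by 64 is undefined, so handle that case directly.
  return valueBits == 64 ? ~std::uint64_t{0}
                         : (std::uint64_t{1} << valueBits) - 1;
}

// True when a non-negative literal does not exceed the type's maximum.
inline bool literalFits(std::uint64_t literal, unsigned bits, bool isUnsigned) {
  return literal <= maxLiteral(bits, isUnsigned);
}
```

Under this model, 255 fits in uint8 but not in int8, and 256 fits in neither.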
Explicit Casts
Explicit casts between signed and unsigned types are always allowed. They reinterpret the bit pattern:
var x: int32 = -1
var y: uint32 = uint32(x) # 4294967295
var z: int32 = int32(y) # -1
Same bit width: bits are unchanged. Narrowing truncates to the low bits.
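Both behaviors can be mimicked in C++. This is a sketch with hypothetical helper names: `std::memcpy` copies the bits unchanged for the same-width case, and a conversion to a narrower unsigned type keeps only the low bits.

```cpp
#include <cstdint>
#include <cstring>

// Same width: copy the 32 bits unchanged and read them as unsigned.
inline std::uint32_t asUInt32(std::int32_t v) {
  std::uint32_t out;
  std::memcpy(&out, &v, sizeof out);
  return out;
}

// Same width: copy the 32 bits unchanged and read them as signed.
inline std::int32_t asInt32(std::uint32_t v) {
  std::int32_t out;
  std::memcpy(&out, &v, sizeof out);
  return out;
}

// Narrowing: keep only the low 8 bits.
inline std::uint8_t narrowToUInt8(std::uint32_t v) {
  return static_cast<std::uint8_t>(v);
}
```

`asUInt32(-1)` is `4294967295`, `asInt32(4294967295)` is `-1` again, and `narrowToUInt8(0x12345678)` is `0x78`.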
Grammar
builtintype = "int" | "int8" | "int16" | "int32" | "int64"
| "uint8" | "uint16" | "uint32" | "uint64" -- changed
| "float" | "float32" | "float64"
| "bool" | "None" ;
casttype = "int" | "int8" | "int16" | "int32" | "int64"
| "uint8" | "uint16" | "uint32" | "uint64" -- changed
| "float" | "float32" | "float64"
| "bool" | pointertype ;
Error Cases
Implicit signed/unsigned mix:
var a: uint32 = 1
var b: int32 = 2
a = a + b # Error: Type mismatch
Cast explicitly: a = a + uint32(b).
Things Worth Knowing
uint64(-1) is 18446744073709551615. Converting it to float64 rounds up to 18446744073709551616.0 (2^64) because float64 has a 53-bit significand and cannot represent every integer above 2^53.
Right shift is always logical for unsigned types. uint32(-1) >> 1 fills the vacated high bit with zero, giving 2147483647.
size_t maps to uint64 on 64-bit targets. When calling C functions that take or return size_t, declare the parameter or return type as uint64.
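The float64 rounding in the first point can be reproduced in C++. This sketch assumes IEEE 754 doubles with the default round-to-nearest mode, which is what typical 64-bit targets provide; `toDouble` and `roundTrips` are hypothetical names.

```cpp
#include <cstdint>

// Unsigned 64-bit integer to double, the analogue of uitofp to float64.
inline double toDouble(std::uint64_t v) {
  return static_cast<double>(v);
}

// True when v survives a trip through double unchanged.
inline bool roundTrips(std::uint64_t v) {
  const double d = toDouble(v);
  // 2^64 is out of range for uint64, so reject it before converting back.
  return d < 18446744073709551616.0 && static_cast<std::uint64_t>(d) == v;
}
```

`toDouble(18446744073709551615)` compares equal to `18446744073709551616.0`. `roundTrips(9007199254740992)` (2^53) is true, but `roundTrips(9007199254740993)` (2^53 + 1) is false.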
What's Next
Chapter 39 allows assignment to appear inside an expression — enabling the while (c = getchar()) != EOF pattern from K&R.
Need Help?
Build issues? Questions?
- GitHub Issues: Report problems
- Discussions: Ask questions
Include:
- Your OS and version
- Full error message
- Output of cmake --version, ninja --version, and llvm-config --version
We'll figure it out.