14. pyxc: One-Step Executables
Where We Are
Chapter 13 added --emit obj, --emit asm, and --emit llvm-ir. Producing a runnable binary from a Pyxc program still required an external tool:
pyxc --emit obj -o program.o program.pyxc
clang program.o runtime.c -o program # ← still needed clang
./program
This chapter removes that second step. After it:
pyxc --emit exe -o program program.pyxc
./program
No clang, no runtime.c, no separate link invocation. One command.
Source Code
git clone --depth 1 https://github.com/alankarmisra/pyxc-llvm-tutorial
cd pyxc-llvm-tutorial/code/chapter-14
Grammar
No grammar changes. The language is unchanged — this is purely a compiler-driver extension.
The Design
The key insight is that LLVM ships LLD — a full production linker — as a C++ library. Instead of shelling out to clang or ld, Pyxc calls lld::macho::link (or lld::elf::link on Linux) directly in-process. The pipeline becomes:
.pyxc → compile → temp .o
.o inputs → pass through
synthesize runtime .o (printd, putchard)
─────────────────────────────────────────
LLD links all .o files into executable
The runtime functions printd and putchard are generated as LLVM IR and emitted to a temporary .o — no runtime.c or external compiler needed.
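The pipeline above routes each input by kind: .pyxc files are compiled and .o files pass straight to the linker. The chapter does not show how the driver tells them apart, so the following is only a minimal sketch of one way to do it, assuming a plain file-extension check. InputKind, HasSuffix, and ClassifyInput are hypothetical names, not the chapter's helpers.

```cpp
#include <string_view>

// Hypothetical classification of a command-line input (not from the chapter).
enum class InputKind { PyxcSource, Object, Unsupported };

// True when `s` ends with `suffix` and has at least one character before it.
static bool HasSuffix(std::string_view s, std::string_view suffix) {
  return s.size() > suffix.size() &&
         s.substr(s.size() - suffix.size()) == suffix;
}

// Assumption: the kind is decided by extension alone, without opening the file.
InputKind ClassifyInput(std::string_view path) {
  if (HasSuffix(path, ".pyxc"))
    return InputKind::PyxcSource; // compile to a temporary .o
  if (HasSuffix(path, ".o"))
    return InputKind::Object;     // hand to the linker unchanged
  return InputKind::Unsupported;  // the driver reports an error
}
```

With this sketch, ClassifyInput("main.pyxc") yields PyxcSource, ClassifyInput("lib.o") yields Object, and anything else is rejected.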
What Changes
Six new pieces on top of chapter 13:
- cl::list<string> InputFiles: the positional argument changes from a single string to a list, enabling multiple inputs.
- EmitRuntimeObject: synthesizes printd and putchard as LLVM IR and emits them to a temporary .o.
- CompileFileToObject: per-file compilation: open → lex → parse → codegen → .o.
- PrepareFileModeModule: refactored out of EmitFileMode; the shared logic that builds __pyxc.global_init, registers it in llvm.global_ctors, and wraps main().
- LinkExecutable + FindMacOSSDKRoot: LLD-as-library dispatch with platform-aware system library detection.
- EmitExecutable: the new orchestrator that wires all of the above together.
Multiple Inputs: cl::list
In chapter 13, the positional argument was a single optional cl::opt:
// Chapter 13
static cl::opt<std::string> InputFile(cl::Positional, ...);
Chapter 14 changes it to a list so the driver can accept any number of .pyxc and .o files:
// Chapter 14
static cl::list<std::string>
    InputFiles(cl::Positional, cl::desc("[inputs]"), cl::ZeroOrMore,
               cl::cat(PyxcCategory));
IsRepl is now derived from whether the list is empty:
IsRepl = InputFiles.empty();
The --emit exe path also enforces one rule for multiple inputs: when more than one input is given, -o is required, since the default output name is derived from a single input:
} else if (EmitKindOpt == "exe") {
  EmitMode = EmitKind::EXE;
  if (OutputFile.empty() && InputFiles.size() > 1) {
    fprintf(stderr, "Error: multiple inputs require -o\n");
    return -1;
  }
  if (!OutputFile.empty())
    EmitOutputPath = OutputFile.getValue();
}
--emit llvm-ir, --emit asm, and --emit obj still require exactly one input and are unchanged.
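The input-count rules can be summarized in one place. The following is a rough sketch rather than the driver's actual code: the EmitKind enum here is a local stand-in, CheckInputs is a hypothetical helper, and the error strings (other than "multiple inputs require -o") are made up for illustration. It assumes the empty-list (REPL) case has already been handled.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Local stand-in for the driver's emit modes (illustrative only).
enum class EmitKind { LLVMIR, ASM, OBJ, EXE };

// Returns an error message when the combination is rejected,
// or std::nullopt when it is accepted.
std::optional<std::string> CheckInputs(EmitKind kind, std::size_t numInputs,
                                       bool hasOutputFlag) {
  if (kind == EmitKind::EXE) {
    // Any number of inputs is fine, but several inputs need an explicit -o.
    if (numInputs > 1 && !hasOutputFlag)
      return std::string("multiple inputs require -o");
    return std::nullopt;
  }
  // llvm-ir, asm, and obj keep the chapter 13 rule: exactly one input.
  if (numInputs != 1)
    return std::string("this emit mode takes exactly one input");
  return std::nullopt;
}
```

For example, CheckInputs(EmitKind::EXE, 2, false) returns an error, while CheckInputs(EmitKind::EXE, 2, true) and CheckInputs(EmitKind::OBJ, 1, false) both return std::nullopt.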
Output Naming: DefaultExeOutputPath
When -o is omitted and there is exactly one input, the output is the input with its extension stripped — and .exe appended on Windows:
static string DefaultExeOutputPath(StringRef InputPath) {
  SmallString<256> Out(InputPath);
  sys::path::replace_extension(Out, "");
  string OutStr = Out.str().str();
#ifdef _WIN32
  OutStr += ".exe";
#endif
  return OutStr;
}
sys::path::replace_extension handles both .pyxc and .o inputs uniformly: foo.pyxc → foo, mylib.o → mylib.
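To experiment with the naming rule outside the LLVM tree, the same behavior can be approximated with std::filesystem. This is a sketch under assumptions, not the chapter's code: DefaultExeName is a hypothetical name, and the targetIsWindows parameter replaces the #ifdef so both branches can be exercised on any host.

```cpp
#include <filesystem>
#include <string>

// Strips the final extension and appends ".exe" when targeting Windows.
// `targetIsWindows` is a hypothetical parameter standing in for _WIN32.
std::string DefaultExeName(const std::string &inputPath, bool targetIsWindows) {
  std::filesystem::path out(inputPath);
  out.replace_extension(); // removes only the last extension, if any
  std::string name = out.string();
  if (targetIsWindows)
    name += ".exe";
  return name;
}
```

Under this sketch, DefaultExeName("foo.pyxc", false) gives "foo", DefaultExeName("mylib.o", false) gives "mylib", and DefaultExeName("foo.pyxc", true) gives "foo.exe".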
Synthesizing the Runtime: EmitRuntimeObject
In --emit obj mode (chapter 13), tests linked against runtime.c to get printd and putchard. In --emit exe mode, Pyxc synthesizes those functions itself — no C file, no external compiler:
static bool EmitRuntimeObject(const string &ObjPath) {
  LLVMContext Ctx;
  auto M = std::make_unique<Module>("pyxc.runtime", Ctx);
  auto *DoubleTy = Type::getDoubleTy(Ctx);
  auto *Int32Ty = Type::getInt32Ty(Ctx);
  auto *PtrTy = PointerType::get(Ctx, 0);
  // Declare printf and putchar (provided by libc).
  FunctionType *PrintfTy = FunctionType::get(Int32Ty, {PtrTy}, /*vararg=*/true);
  Function *Printf = Function::Create(PrintfTy, Function::ExternalLinkage,
                                      "printf", M.get());
  FunctionType *PutcharTy = FunctionType::get(Int32Ty, {Int32Ty}, false);
  Function *Putchar = Function::Create(PutcharTy, Function::ExternalLinkage,
                                       "putchar", M.get());
  // Define printd(double) → double.
  FunctionType *PrintdTy = FunctionType::get(DoubleTy, {DoubleTy}, false);
  Function *Printd = Function::Create(PrintdTy, Function::ExternalLinkage,
                                      "printd", M.get());
  {
    BasicBlock *BB = BasicBlock::Create(Ctx, "entry", Printd);
    IRBuilder<> B(BB);
    auto *FmtGV = B.CreateGlobalString("%f\n", "fmt");
    Value *Zero = ConstantInt::get(Int32Ty, 0);
    Value *Fmt = B.CreateInBoundsGEP(FmtGV->getValueType(), FmtGV,
                                     {Zero, Zero}, "fmt_ptr");
    B.CreateCall(Printf, {Fmt, Printd->getArg(0)});
    B.CreateRet(ConstantFP::get(Ctx, APFloat(0.0)));
  }
  // Define putchard(double) → double.
  FunctionType *PutchardTy = FunctionType::get(DoubleTy, {DoubleTy}, false);
  Function *Putchard = Function::Create(PutchardTy, Function::ExternalLinkage,
                                        "putchard", M.get());
  {
    BasicBlock *BB = BasicBlock::Create(Ctx, "entry", Putchard);
    IRBuilder<> B(BB);
    Value *Ch = B.CreateFPToUI(Putchard->getArg(0), Int32Ty, "ch");
    B.CreateCall(Putchar, {Ch});
    B.CreateRet(ConstantFP::get(Ctx, APFloat(0.0)));
  }
  return EmitModuleToFile(M.get(), EmitKind::OBJ, ObjPath);
}
The key points:
- A fresh, independent LLVMContext and Module are used, separate from the user's program module. This isolates the runtime from user IR.
- printf and putchar are declared as extern (they come from libc at link time).
- printd and putchard are defined with ExternalLinkage so the linker can resolve the extern def printd(x) declarations in user code.
- EmitRuntimeObject ends by calling EmitModuleToFile to write a real .o to a temp path. That .o is added to the link list alongside user objects.
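It can help to read the synthesized IR as ordinary source. The sketch below is a source-level approximation of the two functions built above, written by hand for illustration. It is not the contents of the repository's runtime.c, which this chapter does not show.

```cpp
#include <cstdint>
#include <cstdio>

// extern "C" keeps the symbol names unmangled, so they match the names that
// Pyxc code declares with `extern def`.
extern "C" double printd(double x) {
  std::printf("%f\n", x); // same format string as the IR's "fmt" global
  return 0.0;
}

extern "C" double putchard(double x) {
  // Mirrors FPToUI to a 32-bit integer followed by a call to putchar.
  // Like the IR, this is only well-defined for values in [0, 2^32).
  std::putchar(static_cast<int>(static_cast<std::uint32_t>(x)));
  return 0.0;
}
```

Calling printd(42.0) prints 42.000000 followed by a newline and returns 0.0, which matches the output shown for hello.pyxc in the Try It section.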
Per-File Compilation: CompileFileToObject
Each .pyxc input goes through its own full parse-codegen-emit cycle:
static bool CompileFileToObject(const string &Path, const string &ObjPath,
                                bool *HasMain) {
  if (!OpenInputFile(Path))
    return false;
  ResetLexerState();
  ResetParserStateForFile();
  InitializeModuleAndManagers(false); // false = emit mode, no JIT
  IsRepl = false;
  getNextToken();
  FileModeLoop();
  CloseInputFile();
  if (HasMain)
    *HasMain = FunctionProtos.find("main") != FunctionProtos.end();
  if (!PrepareFileModeModule())
    return false;
  return EmitModuleToFile(TheModule.get(), EmitKind::OBJ, ObjPath);
}
ResetLexerState and ResetParserStateForFile clear the persistent lexer and parser state between files, so each .pyxc is compiled independently. This is what makes multi-file compilation safe — a global declared in a.pyxc does not silently bleed into b.pyxc.
Linking with LLD: LinkExecutable
LinkExecutable dispatches to the right LLD driver based on the host triple:
static bool LinkExecutable(const vector<string> &Inputs,
                           const string &OutputPath) {
  Triple TT(sys::getDefaultTargetTriple());
  vector<string> ArgStorage;
  auto Push = [&](const string &A) { ArgStorage.push_back(A); };
  if (TT.isOSDarwin()) {
    Push("ld64.lld");
    Push("-arch"); Push(TT.getArchName().str());
    Push("-o"); Push(OutputPath);
    string SDK = FindMacOSSDKRoot();
    if (!SDK.empty()) {
      Push("-syslibroot"); Push(SDK);
      Push("-L" + SDK + "/usr/lib");
      string OSVer = DefaultMacOSVersion(TT);
      Push("-platform_version"); Push("macos"); Push(OSVer); Push(OSVer);
      for (auto &Crt : {SDK+"/usr/lib/crt1.o", SDK+"/usr/lib/crti.o"})
        if (sys::fs::exists(Crt)) Push(Crt);
    }
    for (auto &I : Inputs) Push(I);
    Push("-lSystem");
    vector<const char*> Args;
    for (auto &A : ArgStorage) Args.push_back(A.c_str());
    return lld::macho::link(Args, llvm::outs(), llvm::errs(), false, false);
  }
  if (TT.isOSLinux()) {
    Push("ld.lld");
    Push("-o"); Push(OutputPath);
    for (auto &I : Inputs) Push(I);
    Push("-lc"); Push("-lm");
    vector<const char*> Args;
    for (auto &A : ArgStorage) Args.push_back(A.c_str());
    return lld::elf::link(Args, llvm::outs(), llvm::errs(), false, false);
  }
  if (TT.isOSWindows()) {
    Push("lld-link");
    Push("/OUT:" + OutputPath);
    for (auto &I : Inputs) Push(I);
    vector<const char*> Args;
    for (auto &A : ArgStorage) Args.push_back(A.c_str());
    return lld::coff::link(Args, llvm::outs(), llvm::errs(), false, false);
  }
  fprintf(stderr, "Error: unsupported target for --emit exe\n");
  return false;
}
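The LLD API is the same on every platform: an array of const char* arguments (identical to what you'd pass on the command line), plus output/error streams and two flags. exitEarly lets LLD terminate the process itself instead of returning to the caller, so a library user passes false. disableOutput runs the link without writing the output file, so passing false writes it. The return value is true on success.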
This is a key architectural choice: LLD is called as a library, not as a subprocess. There is no fork/exec, no temporary shell script, no PATH lookup. If the library is linked into the pyxc binary, it is available.
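One detail of the argument handling is easy to miss: the const char* array is built only after every string has been pushed, because growing the owning vector can move the strings and invalidate earlier c_str() pointers. The sketch below isolates that pattern for the ELF case without calling LLD. BuildElfLinkArgs and ToArgv are hypothetical helper names, not functions from the chapter.

```cpp
#include <string>
#include <vector>

// Builds the ELF-style argument list from the chapter: program name, -o,
// inputs, then libraries. The returned vector owns the strings.
std::vector<std::string> BuildElfLinkArgs(const std::vector<std::string> &inputs,
                                          const std::string &outputPath) {
  std::vector<std::string> args = {"ld.lld", "-o", outputPath};
  args.insert(args.end(), inputs.begin(), inputs.end());
  args.push_back("-lc");
  args.push_back("-lm");
  return args;
}

// Creates the pointer view only once `storage` is complete. The pointers stay
// valid as long as `storage` is neither modified nor destroyed.
std::vector<const char *> ToArgv(const std::vector<std::string> &storage) {
  std::vector<const char *> argv;
  argv.reserve(storage.size());
  for (const std::string &s : storage)
    argv.push_back(s.c_str());
  return argv;
}
```

For inputs {"a.o", "rt.o"} and output "prog", the resulting list is ld.lld -o prog a.o rt.o -lc -lm, and the argv view has the same seven entries.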
SDK Detection: FindMacOSSDKRoot and DefaultMacOSVersion
LLD's Mach-O linker needs a sysroot to find system libraries such as libSystem. FindMacOSSDKRoot checks three locations in order:
static string FindMacOSSDKRoot() {
  // 1. Respect an explicit user override.
  if (const char *EnvSDK = getenv("SDKROOT"))
    return string(EnvSDK);
  // 2. Xcode.app installation.
  const char *XcodeSDK =
      "/Applications/Xcode.app/Contents/Developer/Platforms/"
      "MacOSX.platform/Developer/SDKs/MacOSX.sdk";
  if (sys::fs::exists(XcodeSDK))
    return string(XcodeSDK);
  // 3. Command Line Tools installation (no full Xcode).
  const char *CLTSDK =
      "/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk";
  if (sys::fs::exists(CLTSDK))
    return string(CLTSDK);
  return ""; // caller will try without -syslibroot
}
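The probing order can be tried on any machine with a generic version of the same idea. The following is a minimal sketch using std::filesystem in place of llvm::sys::fs. FirstExistingPath is a hypothetical helper, and the override is passed as a parameter rather than read from SDKROOT so the behavior is easy to exercise.

```cpp
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

// Returns `overridePath` unchecked when it is non-null (mirroring how the
// SDKROOT value is trusted as-is), otherwise the first candidate that exists
// on disk, otherwise an empty string.
std::string FirstExistingPath(const char *overridePath,
                              const std::vector<std::string> &candidates) {
  if (overridePath)
    return std::string(overridePath);
  for (const std::string &candidate : candidates) {
    std::error_code ec; // non-throwing overload: errors count as "missing"
    if (std::filesystem::exists(candidate, ec))
      return candidate;
  }
  return "";
}
```

Note that, as in the chapter's function, an override is returned even if it points at a path that does not exist. A stricter variant could validate it first.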
DefaultMacOSVersion extracts the OS version from the host triple (e.g., arm64-apple-macosx14.2.0 → "14.2") and falls back to "11.0" if the triple carries no version:
static string DefaultMacOSVersion(const Triple &TT) {
  VersionTuple Ver = TT.getOSVersion();
  if (Ver.getMajor()) {
    std::ostringstream OS;
    OS << Ver.getMajor() << "." << Ver.getMinor().value_or(0);
    if (Ver.getSubminor().value_or(0) != 0)
      OS << "." << Ver.getSubminor().value();
    return OS.str();
  }
  return "11.0";
}
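The formatting rule (always print major.minor, print the subminor only when it is non-zero, fall back to "11.0" when no version is known) can be checked in isolation. This sketch replaces VersionTuple with plain integers and std::optional. FormatMacOSVersion is a hypothetical name used only for illustration.

```cpp
#include <optional>
#include <string>

// Formats "major.minor[.subminor]". A zero major is treated as "no version
// known" and yields the same "11.0" fallback as the chapter's code.
std::string FormatMacOSVersion(unsigned major, std::optional<unsigned> minor,
                               std::optional<unsigned> subminor) {
  if (major == 0)
    return "11.0";
  std::string out =
      std::to_string(major) + "." + std::to_string(minor.value_or(0));
  if (subminor.value_or(0) != 0)
    out += "." + std::to_string(*subminor);
  return out;
}
```

So a triple version of 14.2.0 formats as "14.2", 14.2.1 as "14.2.1", and a bare major of 13 as "13.0".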
The -platform_version macos <min> <sdk> flag is required by the Mach-O linker to set the LC_BUILD_VERSION load command. Without it, the linker produces a warning or errors depending on the LLD version.
EmitExecutable: The Orchestrator
static bool EmitExecutable() {
  vector<string> ObjectFiles;
  vector<string> TempFiles; // removed before returning, on success or failure
  bool SawMain = false;
  for (const auto &InputPath : InputFiles) {
    if (IsPyxcInput(InputPath)) {
      // Compile .pyxc → temp .o
      SmallString<128> TmpPath;
      if (sys::fs::createTemporaryFile("pyxc", "o", TmpPath))
        return CleanupAndFail(TempFiles); // could not create the temp file
      TempFiles.push_back(TmpPath.str().str());
      bool FileHasMain = false;
      if (!CompileFileToObject(InputPath, TmpPath.str().str(), &FileHasMain))
        return CleanupAndFail(TempFiles);
      SawMain |= FileHasMain;
      ObjectFiles.push_back(TmpPath.str().str());
    } else if (IsObjectInput(InputPath)) {
      // Pass .o straight to the linker.
      ObjectFiles.push_back(InputPath);
    } else {
      fprintf(stderr, "Error: unsupported input '%s'\n", InputPath.c_str());
      return CleanupAndFail(TempFiles);
    }
  }
  // Synthesize and add the runtime object.
  SmallString<128> RuntimePath;
  if (sys::fs::createTemporaryFile("pyxc_runtime", "o", RuntimePath))
    return CleanupAndFail(TempFiles); // could not create the temp file
  TempFiles.push_back(RuntimePath.str().str());
  if (!EmitRuntimeObject(RuntimePath.str().str()))
    return CleanupAndFail(TempFiles);
  ObjectFiles.push_back(RuntimePath.str().str());
  // Resolve the output path if -o was omitted.
  if (EmitOutputPath.empty())
    EmitOutputPath = DefaultExeOutputPath(InputFiles.front());
  bool OK = LinkExecutable(ObjectFiles, EmitOutputPath);
  for (auto &P : TempFiles) sys::fs::remove(P);
  return OK;
}
The cleanup pattern (accumulating temp files in TempFiles and removing them whether or not linking succeeded) ensures no orphaned .o files are left in the system temporary directory.
New Headers and Build Changes
The new headers in chapter 14:
#include "lld/Common/Driver.h" // lld::macho::link, lld::elf::link, lld::coff::link
#include "llvm/Support/VersionTuple.h" // VersionTuple for OS version extraction
The three LLD driver macros must appear at file scope. Each one declares the link entry point for its driver (for example, LLD_HAS_DRIVER(elf) declares lld::elf::link):
LLD_HAS_DRIVER(elf)
LLD_HAS_DRIVER(coff)
LLD_HAS_DRIVER(macho)
CMakeLists.txt links the LLD libraries explicitly — llvm-config --libs all does not include them:
set(LLD_FLAGS "-llldCommon -llldELF -llldMachO -llldCOFF")
set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} ${LLD_FLAGS}")
Known Limitations
Target is always the host. The SDK is detected for the machine running pyxc. Cross-compilation is not supported.
main() always exits 0. The int main() wrapper returns 0 unconditionally. There is no way to set a non-zero exit code from Pyxc yet.
Runtime is always linked in. Even if a program never calls printd or putchard, the runtime object is included in every --emit exe link. A future chapter could strip unreferenced symbols with LTO.
SDK detection does not use xcrun. The SDK path is found by probing well-known filesystem locations. If you have a non-standard Xcode installation, set the SDKROOT environment variable to the SDK path before running pyxc.
No debug information. The emitted executables contain no DWARF. Debuggers cannot map instructions back to Pyxc source lines.
Globals cannot be shared across files. Each .pyxc file gets its own __pyxc.global_init, and there is no way to reference a global declared in another file. If two files declare globals with the same name, the linker will report a duplicate-symbol error.
Try It
The minimal case
cat hello.pyxc
extern def printd(x)
def main():
    printd(42)
pyxc --emit exe -o hello hello.pyxc
./hello
42.000000
Default output name
pyxc --emit exe hello.pyxc # produces ./hello
./hello
Global init runs before main
extern def printd(x)
var total = 0
for var i = 1, i < 6, 1:
    total = total + i
def main():
    printd(total) # 15.000000
Linking two files
# lib.pyxc
def add(a, b): return a + b
# main.pyxc
extern def printd(x)
extern def add(a, b)
def main():
    printd(add(3, 4))
pyxc --emit exe -o prog main.pyxc lib.pyxc
./prog
7.000000
Linking a pre-built object
pyxc --emit obj -o lib.o lib.pyxc
pyxc --emit exe -o prog main.pyxc lib.o
./prog
Inspect the IR before linking
pyxc --dump-ir --emit exe -o prog main.pyxc
Build and Run
cd code/chapter-14
cmake -S . -B build && cmake --build build
./build/pyxc --emit exe -o hello hello.pyxc
./hello
What's Next
At this point Pyxc can compile multi-file programs to standalone native binaries without invoking an external compiler or linker. The system libraries (libc, or the macOS SDK) must still be present on the host. Future chapters will extend the language itself: a type system beyond double, string literals, and structured data.