GuyWithGames

Systems & infrastructure engineer · I build and run private AI

I run a private AI stack — a 120-billion-parameter brain on my own GPU — and the homelab that runs it.

One idea runs through all of it: never trust a confident claim — human or AI — until it's checked against what's real. It's why my news engine corroborates instead of pronounces, why my story world pauses rather than invent, and how I catch an AI agent the moment it starts confidently making things up.

I'm Halfax — a senior infrastructure & automation engineer. By day: large-scale storage, cloud, and end-to-end automation. Off the clock I turn that same discipline loose on a lab where I own and control every layer, silicon to reverse proxy — and run it like production. No cloud LLMs in the loop — nothing I don't own, nothing that leaves the house.

Inside it: a multi-model LLM server running three models resident at once on a single GPU's 96 GiB — headlined by a 120-billion-parameter reasoner running on an integrated GPU the vendor's own AI stack won't even boot on, and it's fast. Plus an always-on AI-driven story world, a cross-source news engine that fact-checks itself against the sources it reads, and even a from-scratch operating system with its own kernel, TCP/IP stack, and windowed desktop. I don't just build these — I run them like real infrastructure: wired across a small private fleet, backed up automatically, and reproducible enough to rebuild the entire stack from scratch after a total loss — the backups even land on a machine in another room.

And lately, the fun part — and the genuinely hard part: getting specialized AI agents to work together as one — splitting a big problem across them, handing results between them, having them cross-check each other. Orchestrating them is the easy half. The real skill is the conducting: knowing my own systems well enough to catch an agent the moment it drifts — when it's confidently producing something that looks right but doesn't match what I know to be true — and pulling it back before it ships. That's the thesis up top, turned on the AI itself: a fast, capable partner that still needs someone who knows the ground truth to keep it honest. The back-and-forth is where the good work comes from. A few of these are live right now — go poke at them.

6 hosts in the fleet · 3 LLMs on one iGPU · 1 OS from scratch · 0 cloud LLMs in the loop

Live right now

Things you can open and use

Not demos, not screenshots — production services running on my own hardware right now, served to the public through a hardened reverse proxy. Go poke at them.

live

Halfax AI Chat

Chat with my private LLM server — the same model that drives my IDE extension and code tools, served over an OpenAI-compatible API on my own GPU. Nothing leaves the house.

Open the chat →
live

Heimdall Augur

Reads nearly 50 news sources — and actively searches the wider web for more — then reads how each one frames a story from the writing itself, and confirms a fact when sources that frame it differently still agree. No pronouncements, every claim quote-checked.

Open the Augur →
live

The Story Universe

An AI-driven story world that keeps running on its own across three machines — characters, factions, and arcs evolving in the background. The public World Browser reads straight from the canonical chronicle.

Open the World Browser →
live

Ashes of Grace

A narrative "interactive-book" RPG set in a state-run-reincarnation dystopia — death isn't a fail state, it's the loop. Branching, stat-gated choices with no hidden dice; your skills, memories, and convictions carry across every life toward one of ~30 endings. Runs entirely in your browser, saves locally, no account.

Play it →
live

Bird Cam

An always-on outdoor camera anyone can watch — no account, about three seconds behind live even on a phone. Its indoor sibling stays dark until armed behind a login + TOTP, and the privacy is engineered at the stream level, not a settings page: the public endpoint is hardcoded to the outdoor camera with no audio track in the stream at all.

Watch the birds →

The hardware

The Lab — six hosts, one private mesh

Each host does what its hardware is good at, not what's convenient. All but one of them talk over the same WireGuard mesh, so internal traffic stays internal even when machines are continents apart.

See the fleet live — real-time resource & workload stats →

Betelgeuse AI inference

GMKtec EVO-X2 · Ryzen AI MAX+ 395 (16C/32T) · Radeon 8060S, 96 GiB UMA VRAM (96/32 split) · 128 GB unified · Ubuntu
  • The GPU box: three LLMs resident (a 120B reasoner, a creative model, a coder), embeddings, the Story Universe narrative engine, PictureAI
  • Strix Halo iGPU does double duty for the AI stack and the desktop — no discrete card, Vulkan path

Cygnus data & infra

Beelink SER9 MAX · Ryzen 7 H 255 (8C/16T) · Radeon 780M · 64 GB DDR5 · Ubuntu 26.04
  • The fleet's data/infra tier: GitLab, the KeySecrets vault, MongoDB, Qdrant, SearXNG, Tor
  • The Heimdall Augur's workers and web live here too, calling Betelgeuse's GPU for inference
  • Plus the public camera rig (the bird cam) and the network-intelligence sweeps behind the live dashboard

UYScuti gaming + dev

Alienware 18 Area-51 · Core Ultra 9 275HX · RTX 5080 Laptop (16 GB) · Windows 11 Pro
  • My daily driver — code, games, anything that wants a real keyboard and a discrete GPU
  • Home to the Windows-native work (the kernel-driver telemetry, VMs) and the Windows-side DR copy of the backups, for the "both Linux boxes gone" case

hera-pi always-on edge

Raspberry Pi 5 · Cortex-A76 ×4 · 8 GB · NVMe · Debian 13
  • The low-power always-on node: the Story Universe's Chronicle Keeper, the DDNS updater, the public reverse proxy, and one of two off-box backup targets

Guy public host

Linode 4 GB · Rocky Linux 9 · Apache + Postfix/Dovecot
  • The public face: this site, mail (mail.guywithgames.com), and the live World Browser

Shop secondary web

Linode Nanode · Rocky Linux 9 · Apache
  • A small secondary web host (shopmom.net), off the mesh, reached by direct SSH

NetBird mesh

Managed WireGuard. Every mesh host reaches the others directly over a private mesh address: SSH, the database, the vault API, the story pub/sub, none of it on the public internet.

BIND9 private DNS

An authoritative private zone gives every service a stable name; move a service and one DNS record changes — nothing is hardcoded to an IP.

Right workload, right node

Heavy compute on the GPU box, data/infra on Cygnus, always-on light work on the Pi, public entry points on the VPSes where uptime beats throughput.

Let's Encrypt TLS

Every public service uses free auto-renewing certs from Let's Encrypt (ISRG) — worth a donation to keep it free for everyone.

What makes it all work

A brain I own, a memory every agent shares — and rules they all obey

The projects below aren't islands. They're held together by three things I built on purpose: a private AI server I fully own, a shared memory so every AI agent works from the same durable context instead of starting cold in its own vendor silo, and a governance layer that makes them all work my way — my written rules outrank the model's defaults. I don't just use these assistants; I direct them, and I know my own systems well enough to catch one the moment it drifts. This is what makes a one-person lab punch far above its size.

And none of it is theoretical. A capable AI will hand me something that looks right — several times a day — and the craft is catching it against the ground truth and writing the correction into the rulebook so no agent repeats it. The verification isn't a feature I bolted on; it's the discipline the whole lab runs on — including how this very page is kept honest.

Hal-AI — the brain, on my own hardware

Hal-AI is a private multi-model LLM server running on my GPU, behind an OpenAI-compatible API. Any tool that already knows how to talk to OpenAI — or to MCP — can point straight at it, so I'm never renting someone else's model to work on my own code. Two MCP bridges expose its inference, memory, file operations, and the whole homelab as plain tool calls, which is how an outside agent reaches my hardware at all.

One shared memory, not four vendor silos

I genuinely run Claude Code, Windsurf Cascade, GitHub Copilot, and my own Hal-AI extension on the same codebase. Each ships its own per-conversation memory; left alone, they'd each learn my conventions separately and forget them separately. So I gave them common ground: a single ADR-style rulebook (DECISIONS.md) symlinked so every tool finds its own native convention — .cursorrules, AGENTS.md, CLAUDE.md — plus a shared semantic memory any client can search. One source of truth, written once, read by every agent.

Promote-to-durable — memory that outlives the tool

The rule that ties it together: when any agent learns something load-bearing mid-session — a corrected approach, a hard-won invariant — it gets promoted from that tool's throwaway memory into the shared rulebook in git, and off-boxed like production data. So the knowledge survives a context reset, a harness change, or swapping the assistant entirely. The lesson a model learns on Tuesday is still there, for every agent, on Friday.

My rules outrank the model's defaults

The shared rulebook isn't a suggestion — it's the top authority. When a rule I've written collides with a model's own instinct or a vendor's built-in default, my rule wins, whichever assistant is driving. It governs not just what the AI knows but how it works — verify before claiming, don't wave off a hard question, no AI watermarks in my git history. And the load-bearing ones aren't left to good faith: they're compiled into hooks that block the wrong action before it runs. I set the standard; the assistant meets it or gets stopped.

Many specialists, one conductor

The current edge of the work: getting several specialized agents to act as one. I fan a big job out across them, hand each one's result to the next, and have them cross-check each other before anything lands — then synthesize the survivors. Wiring that up is the easy half; the real skill is the conducting — sequencing who does what, and knowing my own stack well enough to spot the one that's confidently wrong and pull it before it ships. A fast room of assistants is only as good as the person who can tell which one to trust.

Why it works so well for my projects

It compounds. Every session makes the shared context sharper about my specific stack, so the agents behave consistently instead of relitigating the same decisions. There's no lock-in — I can switch tools tomorrow and the memory stays, because it lives in git and on my hardware, not in a vendor's account. And because the rules travel with the code and are recoverable, the same rigor I put into backups protects how my assistants think.

Honest about the lineage: ADRs (Nygard, 2011), mem0, MemGPT, and .cursorrules all exist. What's mine is the synthesis — owned compute, portable cross-tool memory, and a promote-to-durable discipline tied into one workflow — run as if it were production infrastructure.

What I build

Projects

AI tooling, distributed systems, security research, an operating system from bare metal, and games — every one built end to end and run for real on the lab, not left as a weekend demo. Skim the headlines; expand what grabs you.

AI & Story

Story Universe

live

A persistent, AI-driven narrative world that runs unattended across three machines — closer to an MMO backend than a chatbot, and it will pause the entire world before it fabricates a single scene.

  • Three hosts, split by strength: a Raspberry Pi holds the canonical world state in SQLite and broadcasts a world-clock over ZeroMQ; my GPU box generates each event with the in-house LLM; a public VPS renders the live World Browser.
  • It builds on itself. Every generation is grounded in the canonical chronicle — the recent world events and the active arc's full history are fed back in as context — so each new beat continues the story instead of restarting cold.
  • It pauses, it never lies. If a dependency is down — or a generation comes back malformed — the engine skips that beat rather than fabricate one. A validation guard blocks anything that isn't real prose, so no invented or half-formed event can enter the canonical world.
prose-quality loop

Generated beats run through a filter that rejects atmospheric-formula openings, cliché tension families, time-of-day contradictions, and any output that leaks the prompt template instead of writing prose; each rejection is fed back as a negative example with rising temperature, up to six tries, and a beat that still won't come back clean is skipped rather than persisted. Combined with grammar-constrained JSON decoding, parse failures dropped from ~51% of LLM calls to 0%.
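A minimal sketch of that reject-and-retry loop, assuming a generate(negatives, temperature) callable that wraps the LLM call. The two regex rules, the 0.7 starting temperature, and the 0.1 step are hypothetical stand-ins; only the shape (negative examples fed back, rising temperature, six tries, skip on failure) comes from the description above.

```python
import re
from typing import Callable, List, Optional

# Hypothetical reject rules standing in for the real filter.
REJECT_PATTERNS = {
    "atmospheric opening": re.compile(r"^(the air|the wind|the sky|silence)\b", re.I),
    "template leak": re.compile(r"\{\{|\}\}|<\|.*?\|>"),
}
MAX_TRIES = 6

def first_rejection(text: str) -> Optional[str]:
    """Return the name of the first rule the beat breaks, or None if it is clean."""
    for name, pattern in REJECT_PATTERNS.items():
        if pattern.search(text):
            return name
    return None

def generate_beat(generate: Callable[[List[str], float], str],
                  base_temp: float = 0.7, step: float = 0.1) -> Optional[str]:
    """Retry with negative examples and rising temperature; None means skip the beat."""
    negatives: List[str] = []
    for attempt in range(MAX_TRIES):
        temperature = base_temp + step * attempt
        text = generate(list(negatives), temperature)
        reason = first_rejection(text)
        if reason is None:
            return text
        negatives.append(f"Rejected ({reason}), do not repeat: {text[:200]}")
    return None  # never persist a beat that would not come back clean
```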

Halfax AI Stack

live

A private multi-model AI server on my own hardware that runs a 120-billion-parameter model on an integrated GPU — fast — even though the vendor's own compute stack doesn't work on this chip. OpenAI-compatible, so any tool that talks to OpenAI talks to mine instead.

  • Three models resident on one iGPU. A GPT-OSS 120B mixture-of-experts reasoner (lm1) drives the analytical work — the Augur, agent coding, reasoning; an abliterated 24B creative model (lm2) serves the Story Universe; and a fast Qwen2.5-Coder (lm3) handles editor autocomplete — all together in the AMD Strix Halo's 96 GiB of UMA VRAM via Vulkan (~78 GiB resident).
  • MoE is the trick. On a bandwidth-bound integrated GPU, a 120B model that activates only ~5B parameters per token runs roughly an order of magnitude faster than a dense model its size — frontier capability without the bandwidth tax.
  • Supervised + self-healing. Each model runs as a watched llama-server child; a watchdog restarts a dead one within seconds, and requests can target a slot or auto-failover.
  • Output that can't be malformed. json_schema / grammar pass straight through to llama.cpp, which masks any token that would break the schema before it's sampled.
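The bandwidth argument can be checked with back-of-envelope arithmetic. On a bandwidth-bound device, each decoded token has to stream every active weight from memory once, so the throughput ceiling is roughly bandwidth divided by active bytes per token. The 256 GB/s figure and the ~4.25 bits per weight below are assumptions for illustration, not measurements from this box.

```python
def decode_ceiling_tok_s(active_params: float, bits_per_param: float,
                         bandwidth_gb_s: float) -> float:
    """Upper bound on tokens/s when every active weight is read once per token."""
    bytes_per_token = active_params * bits_per_param / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token

BANDWIDTH_GB_S = 256.0  # assumed theoretical memory bandwidth of the APU
BITS_PER_PARAM = 4.25   # assumed ~4-bit weights plus block scales

dense = decode_ceiling_tok_s(120e9, BITS_PER_PARAM, BANDWIDTH_GB_S)
moe = decode_ceiling_tok_s(5e9, BITS_PER_PARAM, BANDWIDTH_GB_S)
print(f"dense 120B ceiling:      {dense:6.1f} tok/s")
print(f"MoE, ~5B active ceiling: {moe:6.1f} tok/s")
print(f"ratio:                   {moe / dense:6.1f}x")
```

Whatever bandwidth and quantization you plug in, the ratio between the two ceilings is just 120 / 5 = 24, which is where "roughly an order of magnitude" comes from; real throughput sits below both ceilings because attention, KV-cache reads, and expert routing add traffic the weights alone don't account for.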
memory + embeddings

Three opt-in memory layers: a SHA-256-cached RAG store (~99k chunks at ~4 ms median lookup), the shared ChromaDB semantic memory, and a two-tier personal memory (always-loaded CORE + cosine-retrieved archive). A separate Vulkan llama-server serves embeddings, since Python sentence-transformers hangs on HIP init on this GPU.
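A minimal sketch of two ideas in that paragraph, assuming an embed(text) callable (for example, a client for the embeddings llama-server): a SHA-256-keyed cache so identical chunks are embedded once, and a two-tier context where CORE entries are always included and the archive contributes only its top-k matches by cosine similarity. The class and function names are hypothetical.

```python
import hashlib
import math
from typing import Callable, Dict, List

Vector = List[float]

class EmbeddingCache:
    """Embed each distinct chunk once, keyed by the SHA-256 of its text."""

    def __init__(self, embed: Callable[[str], Vector]):
        self._embed = embed
        self._store: Dict[str, Vector] = {}
        self.misses = 0

    def get(self, text: str) -> Vector:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._embed(text)
        return self._store[key]

def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def build_context(query: str, core: List[str], archive: List[str],
                  cache: EmbeddingCache, k: int = 3) -> List[str]:
    """CORE is always loaded; the archive adds only its k nearest entries."""
    q = cache.get(query)
    ranked = sorted(archive, key=lambda t: cosine(q, cache.get(t)), reverse=True)
    return core + ranked[:k]
```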

Heimdall Augur

live

A news engine that fact-checks itself. It ingests ~50 editorial feeds — and searches the open web for coverage it doesn't even subscribe to — clusters every outlet's take on the same story, and confirms only the facts that survive differently-framed reporting. It cuts through the spin to the shared truth and refuses to hand you an opinion.

  • ~50 feeds plus the open web, deliberately diverse. Mainstream international desks (BBC, NPR, Guardian, NYT, DW, France 24, ABC, Al Jazeera) and divergent US voices (Fox, NY Post, The Intercept, Mother Jones, Politico) sit alongside state and regional press: Russian state TASS read against independent Russian outlets (Meduza, The Moscow Times) and the Kyiv Independent, plus Press TV, SCMP, Times of Israel, Al Arabiya, Middle East Eye, Al-Monitor, Daily Sabah, Dawn, MercoPress, Premium Times, Yonhap, Mainichi (Japan), YLE (Finland), Bellingcat (OSINT), tech desks (Ars Technica, The Verge, 404 Media), and India read across four desks (The Hindu, Times of India, Hindustan Times, Indian Express). SEC EDGAR filings and the Bluesky firehose round it out. When a story is thinly covered it searches the web itself (a self-hosted SearXNG) to pull in outlets it doesn't even subscribe to.
  • Cluster on meaning, anchor against drift. Events are embedded (nomic-embed) and kNN-clustered in Qdrant, with a wire-syndication dedup (AP→BBC + AP→NPR isn't two confirmations) and a seed-anchor gate so aging clusters can't drift off-topic.
  • Corroboration is computed, not pronounced. The model extracts atomic factual claims with verbatim quotes, and every quote is validated against the source text; fabricated or unverifiable quotes are dropped, often dozens on a single busy story. Then, reading each article blind to who published it, the engine groups the articles by how the story is actually framed in the writing, and confirms a fact when sources that frame it differently still assert it. Agreement across genuinely divergent framings is the strongest signal; a plain fact many sources report is confirmed too. (Earlier versions leaned on hand-labeled country "blocs", but assigning those labels was itself a bias to see past, so the framing is now read from the article, not assigned by the operator.)
  • It holds contradictions open instead of resolving them. When sources genuinely conflict it surfaces both, attributed, rather than picking a winner — and it can tell a real contradiction from two different events described in similar words, flagging the narrative tension without inventing a resolution.
  • Two ways to read it. A consolidated story written strictly from the validated facts (contested points attributed to both sides, spin surfaced and avoided), tab-linked to the evidence view with the spine, contradictions, and quotes.
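A sketch of the counting rule under stated assumptions: each report already carries its syndication origin (the wire service for a reprint, otherwise the outlet) and a framing label read from the text. Keying on origin is what stops AP→BBC plus AP→NPR from counting twice, and a claim only earns "divergent" when two different origins frame it differently. The threshold and the two reason strings are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set, Tuple

@dataclass(frozen=True)
class Report:
    outlet: str
    origin: str               # wire service for a reprint ("AP"), else the outlet itself
    framing: str              # framing group read from the writing (labels hypothetical)
    claims: FrozenSet[str]

def _divergent(pairs: Set[Tuple[str, str]]) -> bool:
    """True if two DIFFERENT origins frame the story differently."""
    return any(o1 != o2 and f1 != f2 for o1, f1 in pairs for o2, f2 in pairs)

def corroborate(reports: List[Report], min_origins: int = 3) -> Dict[str, str]:
    """Return {claim: reason} for every claim that clears either bar."""
    seen: Dict[str, Set[Tuple[str, str]]] = {}
    for r in reports:
        for claim in r.claims:
            seen.setdefault(claim, set()).add((r.origin, r.framing))
    confirmed: Dict[str, str] = {}
    for claim, pairs in seen.items():
        if _divergent(pairs):
            confirmed[claim] = "divergent framings agree"
        elif len({origin for origin, _ in pairs}) >= min_origins:
            confirmed[claim] = "widely reported"
        # otherwise: one origin, however often reprinted, corroborates nothing
    return confirmed
```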
why no verdicts

The first build asked the model to pronounce what events mean. Every model tried — up to the largest I run — over-read: confident, well-cited-looking significance manufactured from a single beat, and the bigger the model, the more persuasively it did it. The failure was the genre, not the model. So the verdict layer was removed and replaced with extract-then-corroborate: the model only reads and quotes; the source graph decides what's true.
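The "only reads and quotes" half can be sketched as a quote gate: a claim survives only if its quote appears in the source text after folding curly quotes and collapsing whitespace. This normalization is an assumption for illustration; the engine's actual matching rules aren't published here.

```python
import re
from typing import Dict, List, Tuple

_QUOTE_FOLD = str.maketrans({"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"})

def _normalize(text: str) -> str:
    """Fold curly quotes to straight ones and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text.translate(_QUOTE_FOLD)).strip()

def validate_claims(claims: List[Dict[str, str]],
                    source_text: str) -> Tuple[List[Dict[str, str]], List[Dict[str, str]]]:
    """Split claims into (kept, dropped): kept only if the quote occurs in the source."""
    haystack = _normalize(source_text)
    kept: List[Dict[str, str]] = []
    dropped: List[Dict[str, str]] = []
    for claim in claims:
        quote = _normalize(claim.get("quote", ""))
        if quote and quote in haystack:
            kept.append(claim)
        else:
            dropped.append(claim)  # fabricated, paraphrased, or missing quote
    return kept, dropped
```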

Halfax AI VSCode Extension

TypeScript

A from-scratch IDE client — chat, autonomous edits, autocomplete, one-click apply — that talks to my GPU, logs to plain JSON I own, works with the internet unplugged, and auto-resumes its agent mid-task after a crash, exactly where it stopped — a resilience the mainstream AI-coding tools still skip.

  • Not a fork. Custom React webview, Zustand state, an in-process MCP stdio client, OpenAI-compatible streaming to local llama.cpp. Distributed only as a sideloaded VSIX from the lab.
  • The agent loop is a real state machine (pending → in_progress → completed | awaiting_user | error | cancelled), and every successful turn flushes history to disk — quit or crash mid-task and it auto-resumes where it stopped.
  • One GPU slot, prioritized: autocomplete ahead of chat, chat ahead of agent, so a long agent task never starves typing-time completion. Edits are previewed as a diff, then written through MCP with mtime conflict detection.
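A sketch of a resumable agent state machine in the spirit of that design, written in Python for consistency with the other examples here even though the extension itself is TypeScript. The transition table and the single-file JSON checkpoint are assumptions; the write-then-rename step is one common way to make each flush atomic.

```python
import json
from enum import Enum
from pathlib import Path
from typing import Dict, List, Set

class TaskState(Enum):
    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    AWAITING_USER = "awaiting_user"
    ERROR = "error"
    CANCELLED = "cancelled"

# Assumed transition table; terminal states have no exits.
TRANSITIONS: Dict[TaskState, Set[TaskState]] = {
    TaskState.PENDING: {TaskState.IN_PROGRESS, TaskState.CANCELLED},
    TaskState.IN_PROGRESS: {TaskState.COMPLETED, TaskState.AWAITING_USER,
                            TaskState.ERROR, TaskState.CANCELLED},
    TaskState.AWAITING_USER: {TaskState.IN_PROGRESS, TaskState.CANCELLED},
    TaskState.COMPLETED: set(),
    TaskState.ERROR: set(),
    TaskState.CANCELLED: set(),
}

class AgentTask:
    def __init__(self, checkpoint: Path) -> None:
        self.checkpoint = checkpoint
        self.state = TaskState.PENDING
        self.history: List[dict] = []

    def transition(self, new: TaskState) -> None:
        if new not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.value} -> {new.value}")
        self.state = new
        self._flush()

    def record_turn(self, message: dict) -> None:
        self.history.append(message)
        self._flush()  # every successful turn reaches disk before the next starts

    def _flush(self) -> None:
        tmp = self.checkpoint.with_suffix(".tmp")
        tmp.write_text(json.dumps({"state": self.state.value, "history": self.history}))
        tmp.replace(self.checkpoint)  # rename over the old file: no half-written checkpoint

    @classmethod
    def resume(cls, checkpoint: Path) -> "AgentTask":
        data = json.loads(checkpoint.read_text())
        task = cls(checkpoint)
        task.state = TaskState(data["state"])
        task.history = data["history"]
        return task
```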

PictureAI

FastAPI + Vulkan

A local image generator running entirely on the server's integrated GPU — no CUDA, no cloud, no account. A ground-up rebuild tuned for a bandwidth-bound integrated GPU, where nearly every image tool assumes an NVIDIA card and would simply refuse to run.

  • Resident model, FastAPI UI. A long-lived sd-server keeps weights in GPU memory across generations; supports SDXL, Flux, SD3.5, Chroma, AuraFlow, HiDream with on-demand weights and a browsable catalog.
  • Tuned for the iGPU. Models are converted to q8_0 (≈half the per-step bandwidth on this bandwidth-bound APU) with flash attention on; runs as a systemd service that hands the GPU back to the LLM stack on request.
Infrastructure & Operations

Homelab MCP Server

Python + Paramiko

Glue that lets any MCP-capable AI inspect and operate the whole fleet in plain English — "is hera-pi healthy?", "which containers restarted?", "deploy this site."

  • Six hosts, 40+ services, under two seconds — checked concurrently over pooled Paramiko SSH with a thread pool; the first call cold-starts the pool, every call after is one round-trip per host.
  • Fleet tools: systemd / Docker / process control on Linux, PowerShell on Windows, log retrieval, SFTP between any two hosts, arbitrary commands behind a destructive-pattern denylist. The host map is YAML and hot-reloads.
  • Site deploy + Story Universe orchestration are first-class: this page deploys through it, and one natural-language call can snapshot or fully reset the narrative stack across hosts.
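A minimal sketch of the fan-out, assuming a check(host) callable that would, in the real tool, run its commands over a pooled Paramiko SSH connection. A thread pool fits because each check spends its time waiting on the network, so six hosts cost roughly one round-trip instead of six. The function name and result shape are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List

def check_fleet(hosts: List[str], check: Callable[[str], dict],
                max_workers: int = 8) -> Dict[str, dict]:
    """Run one health check per host in parallel; a failing host reports its error."""
    results: Dict[str, dict] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(check, host): host for host in hosts}
        for fut in as_completed(futures):
            host = futures[fut]
            try:
                results[host] = fut.result()
            except Exception as exc:  # one dead host must not sink the whole sweep
                results[host] = {"ok": False, "error": str(exc)}
    return results
```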

hal-web-mcp

Python + MCP

The AI agent's web reach — behind a real SSRF wall. Typed web tools instead of letting the model shell out to curl, so a prompt-injected agent can't be turned into a request-forging weapon against my own network.

  • Six typed tools (get, head, search, extract, cache, config): SearXNG-federated search and Readability article extraction, with HTML auto-converted to markdown so smaller models can actually read it.
  • The safety boundary is the whole point: it DNS-resolves every URL up front and refuses loopback, private, link-local, and carrier-grade-NAT ranges; unwraps IPv4-mapped IPv6; and rejects mixed public+private DNS answers — the DNS-rebinding trick most homemade fetchers miss.
  • Hardened end to end: an http/https-only scheme allowlist, every redirect hop re-validated with HTTPS→HTTP downgrades refused, and a streaming byte cap that kills the connection rather than trusting Content-Length. A per-domain rate limiter stops agent feedback loops; 31 tests cover it.
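A sketch of the resolve-then-judge step using only the standard library's ipaddress and socket modules. CGNAT (100.64.0.0/10) is checked explicitly because ipaddress does not count it as private, and IPv4-mapped IPv6 addresses are unwrapped before judging. Function names are hypothetical; the redirect re-validation and byte cap are not shown.

```python
import ipaddress
import socket
from typing import List
from urllib.parse import urlsplit

CGNAT = ipaddress.ip_network("100.64.0.0/10")

def is_blocked(ip_text: str) -> bool:
    ip = ipaddress.ip_address(ip_text)
    if isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped  # judge ::ffff:127.0.0.1 as 127.0.0.1
    if isinstance(ip, ipaddress.IPv4Address) and ip in CGNAT:
        return True
    return (ip.is_loopback or ip.is_private or ip.is_link_local
            or ip.is_multicast or ip.is_reserved or ip.is_unspecified)

def check_answers(addresses: List[str]) -> None:
    """Refuse if ANY answer is internal, so a mixed public+private answer cannot slip by."""
    if not addresses:
        raise ValueError("name did not resolve")
    blocked = [a for a in addresses if is_blocked(a)]
    if blocked:
        raise ValueError(f"refusing internal address(es): {blocked}")

def vet_url(url: str) -> List[str]:
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError("only http/https URLs with a host are allowed")
    infos = socket.getaddrinfo(parts.hostname, parts.port, proto=socket.IPPROTO_TCP)
    addresses = sorted({info[4][0] for info in infos})
    check_answers(addresses)
    return addresses  # connect to one of THESE, never a fresh lookup
```

Returning the vetted addresses matters: connecting by hostname afterwards would trigger a fresh lookup, which is exactly the window a DNS-rebinding attacker uses.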

Halfax AI MCP

Python + MCP

The bridge that lets any MCP-capable agent reach my AI stack and hardware — 49 tools, with a command surface that's deliberately not a shell.

  • Inference with teeth: one call fans the same prompt to every loaded model in parallel (wall-time = the slowest), and another forces grammar-constrained JSON so the output physically can't come back malformed.
  • "Run a command" is an allowlist, not a shell: operation names map to specific binaries with fixed args, service restarts are frozen to a handful of named units, and any model-supplied argument carrying shell metacharacters is rejected — far narrower than handing an agent SSH.
  • File ops deny before they allow: .ssh, .gnupg, cloud credentials, and shell histories stay unreachable even if the agent's roots are widened; writes are atomic, with timestamped backups, mtime conflict detection, and a dry-run.
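A sketch of the "allowlist, not a shell" and "deny before allow" ideas together. The operation table, unit names, and deny lists below are hypothetical; the point is the shape: an operation name maps to a fixed argv (meant for subprocess.run with shell=False), free-form arguments are refused unless the operation expects one, and sensitive paths are rejected before the root check ever runs.

```python
import re
from pathlib import Path
from typing import List, Optional

# Hypothetical allowlist: operation name -> fixed argv, run with shell=False.
OPERATIONS = {
    "disk_usage": ["df", "-h"],
    "uptime": ["uptime"],
    "restart_service": ["systemctl", "restart"],
}
RESTARTABLE = {"hal-ai.service", "augur-web.service"}  # hypothetical unit names
SHELL_META = re.compile(r"[;&|`$<>(){}\\\n*?!~]")
DENY_DIRS = {".ssh", ".gnupg", ".aws", ".azure"}
DENY_NAMES = {".bash_history", ".zsh_history", ".python_history"}

def build_argv(op: str, arg: Optional[str] = None) -> List[str]:
    if op not in OPERATIONS:
        raise PermissionError(f"unknown operation: {op}")
    if arg is not None and SHELL_META.search(arg):
        raise PermissionError("argument contains shell metacharacters")
    argv = list(OPERATIONS[op])
    if op == "restart_service":
        if arg not in RESTARTABLE:
            raise PermissionError(f"unit not restartable: {arg}")
        argv.append(arg)
    elif arg is not None:
        raise PermissionError(f"{op} takes no argument")
    return argv

def path_allowed(path: str, roots: List[Path]) -> bool:
    p = Path(path).expanduser().resolve()
    # Deny first, so widening a root to $HOME or / never exposes these.
    if p.name in DENY_NAMES or any(part in DENY_DIRS for part in p.parts):
        return False
    return any(p.is_relative_to(root.resolve()) for root in roots)
```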

AI Collaboration Memory

Markdown + git

My AI assistants don't start cold, aren't locked to one vendor, and don't get to freelance: I treat their rules like Architecture Decision Records — the top authority for how any assistant acts on my systems, in flat markdown, off-boxed like production data.

  • DECISIONS.md is the durable layer — each rule in ADR shape (rule / why / how to verify), in a shared fleet-docs git repo every host pulls, off-boxed nightly in the encrypted archive.
  • The rulebook outranks the model — and some of it is enforced. A rule here beats the model's own instinct and the vendor harness's built-in default; it governs how the AI writes and reasons, not just what it knows. The load-bearing action rules compile into PreToolUse hooks that block the wrong move before it runs — governance with teeth, not a wish list.
  • One source, every tool. The same file is reachable as DECISIONS.md, .cursorrules, AGENTS.md, and CLAUDE.md via symlink, so Claude Code, Cursor, Windsurf, and my own extension all read the same rules.
  • Promote-to-durable: when an assistant learns something load-bearing mid-session, the convention is to promote it from per-tool working memory into the shared rulebook — the workflow is itself written down so it applies recursively.
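A sketch of one rule compiled into a hook, assuming Claude Code's PreToolUse contract as I understand it: the hook receives the pending tool call as JSON on stdin (tool_name, tool_input) and exit code 2 blocks the call, with stderr shown to the agent. Check the current hook documentation before relying on those details. The force-push rule is a hypothetical example; the watermark rule mirrors the one described above.

```python
import json
import re
import sys
from typing import Optional

# Hypothetical rules standing in for compiled DECISIONS.md entries.
BLOCK_RULES = [
    (re.compile(r"git\s+commit\b.*(Co-Authored-By:\s*Claude|Generated with Claude)",
                re.I | re.S),
     "No AI watermarks in git history (DECISIONS.md)."),
    (re.compile(r"git\s+push\b.*(--force\b|\s-f\b)"),
     "Force-push is blocked by rule; ask the operator."),
]

def decide(event: dict) -> Optional[str]:
    """Return a block reason for this pending tool call, or None to let it run."""
    if event.get("tool_name") != "Bash":
        return None
    command = event.get("tool_input", {}).get("command", "")
    for pattern, reason in BLOCK_RULES:
        if pattern.search(command):
            return reason
    return None

def main() -> int:
    reason = decide(json.load(sys.stdin))
    if reason:
        print(reason, file=sys.stderr)  # shown to the agent as the reason for the block
        return 2                         # exit code 2 blocks the tool call
    return 0
```

Saved as a script, the entry point is a single sys.exit(main()) line, registered as a PreToolUse hook for the Bash tool.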

Self-Hosted DevOps Stack

GitLab · live-look · drift-detect

I didn't build GitLab — but the way I run it is the point: the entire stack rebuilds from compose files and secrets bundles in a single afternoon, no vendor in the loop. That's the property no hosted version gives you.

  • GitLab CE on Cygnus is the authoritative remote for every project here, LAN-only. Nightly it produces a data dump plus a config/secrets bundle that GitLab's own backup tool omits, off-boxed to two separate machines.
  • Observability is the Fleet Live-Look (guywithgames.com/livelook) — real-time per-service CPU/RAM, GPU/VRAM, and live AI-stack throughput across every host — backed by a gethomepage dashboard and a nightly drift-detector. The collector and page live in git, so a clean rebuild reproduces them.
  • Recoverable by design: that afternoon rebuild runs from the recoveryplan/ tree, which holds the compose files and secrets bundles.

Recovery Plan + Drift Detector

Bash + Python

Not just backups — a recovery tree: the backups plus the runbook to use them, both off-boxed, both reproducible, with a drift guard that recently caught a service that had been silently dead for six months.

  • The chain: Mongo dump, GitLab + secrets bundle, the Qdrant vectors and the Augur source-snapshot archive, and a nightly AES-256 Tier-A archive — three local mirrors plus nightly off-box rsync to two separate machines (one in a separate room). A fourth copy lands on the Windows workstation automatically — a daytime job that pushes the moment the laptop is online — for the "both Linux boxes gone" case.
  • Cold-metal bootstrap is chained so each step unlocks the next: decrypt archive → restore the vault's bootstrap files → mongorestore → start KeySecrets → ks-get works → clone from GitLab with the PAT pulled from the freshly-up vault.
  • Drift-detect runs paired with the chain, checking failed or flapping units, crontab vs baseline, enabled units vs captured, untracked recovery-tree files, disk/memory pressure, and backup-log freshness.
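A sketch of the comparison core of such a drift check. How the current sets are collected is assumed (for example, by parsing systemctl list-unit-files --state=enabled and crontab -l), as are the 26-hour freshness window and the function names; the value is in diffing against a captured baseline rather than eyeballing it.

```python
import time
from typing import Dict, List, Optional, Set

def diff_sets(label: str, baseline: Set[str], current: Set[str]) -> List[str]:
    findings = [f"{label}: missing {x}" for x in sorted(baseline - current)]
    findings += [f"{label}: unexpected {x}" for x in sorted(current - baseline)]
    return findings

def stale_logs(logs: Dict[str, float], max_age_h: float, now: float) -> List[str]:
    """logs maps a backup log name to its last-modified time (epoch seconds)."""
    return [f"backup log stale: {name} ({(now - mtime) / 3600:.0f}h old)"
            for name, mtime in sorted(logs.items())
            if now - mtime > max_age_h * 3600]

def drift_report(baseline_units: Set[str], current_units: Set[str],
                 baseline_cron: Set[str], current_cron: Set[str],
                 backup_logs: Dict[str, float], max_age_h: float = 26.0,
                 now: Optional[float] = None) -> List[str]:
    """Every finding is one plain line; an empty list means no drift."""
    now = time.time() if now is None else now
    return (diff_sets("enabled units", baseline_units, current_units)
            + diff_sets("crontab", baseline_cron, current_cron)
            + stale_logs(backup_logs, max_age_h, now))
```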

Homelab Tor Stack

Docker

Tor isn't my code, but I use it for things it wasn't built for. The country-pinned exits aren't about anonymity; they let the Augur read state media as a reader in another country sees it (Cloudflare posture, GDPR splash, editorial blurbs all change by viewer country). It's built reproducibly from GPG-verified source, with no third-party image in the trust path, bound strictly off the public internet, and watched by a health check that proves a real circuit rather than just checking the port.

  • Built from source-of-truth: Tor from the project's signed apt repo, the Browser tarball GPG-verified at build time — no third-party Docker Hub image in the trust path.
  • Never public: ports bind LAN + NetBird only, with a second SocksPolicy gate; a health check builds a real circuit each interval rather than just checking the port.
  • Country-pinned siblings (DE / NL / JP) let the Augur read state media as a reader in each of those countries sees it.

tor-recon

Python + SOCKS5

Security recon that turns Tor into a free attacker's-eye view of my own network — a rotating IP on none of my allowlists, so a probe sees my public surface exactly as an outside scanner would. Its first runs found real holes I'd missed: a public dev server running as root over plaintext, a silently-expired TLS cert, and an orphan port-forward — each caught, fixed, and re-verified.

  • Parses the SOCKS5 reply byte to tell a target's refusal from the Tor exit's own port policy, which most probes conflate into false findings.
  • Negative assertions: verifies that services which should never be public (the agent API, MongoDB, the vault, GitLab, dashboards) actually refuse from outside, across every external face.
  • First runs earned their keep: caught a silently-expired mail cert, a public Flask dev server running as root over plaintext, and an orphan UPnP forward. Each fixed and re-verified.
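A sketch of reply-byte parsing per RFC 1928, where the second byte of the server's reply (REP) says why a CONNECT failed. The mapping of Tor's exit-policy rejection to 0x02 ("not allowed by ruleset") and of timeouts to 0x06 reflects my understanding of Tor's behavior and should be verified against the Tor version in use; the verdict labels and the 9050 default port are assumptions.

```python
import socket
from typing import Tuple

# REP byte of a SOCKS5 reply (RFC 1928) -> (verdict, meaning). Verdict labels are hypothetical.
REPLIES = {
    0x00: ("open", "target accepted the connection"),
    0x05: ("closed", "target actively refused"),
    0x02: ("inconclusive", "exit policy forbids this port; says nothing about the target"),
    0x03: ("unreachable", "network unreachable"),
    0x04: ("unreachable", "host unreachable"),
    0x06: ("filtered", "TTL expired, usually a timeout"),
}

def classify(rep: int) -> Tuple[str, str]:
    return REPLIES.get(rep, ("error", f"SOCKS5 reply 0x{rep:02x}"))

def _recv_exact(sock: socket.socket, n: int) -> bytes:
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("proxy closed the connection")
        buf += chunk
    return buf

def socks5_probe(host: str, port: int, proxy: Tuple[str, int] = ("127.0.0.1", 9050),
                 timeout: float = 30.0) -> Tuple[str, str]:
    with socket.create_connection(proxy, timeout=timeout) as s:
        s.sendall(b"\x05\x01\x00")                     # v5, one method offered: no auth
        if _recv_exact(s, 2) != b"\x05\x00":
            raise ConnectionError("proxy rejected no-auth")
        name = host.encode("idna")
        # CONNECT by domain name (ATYP 0x03), port in network byte order.
        s.sendall(b"\x05\x01\x00\x03" + bytes([len(name)]) + name + port.to_bytes(2, "big"))
        return classify(_recv_exact(s, 4)[1])         # VER REP RSV ATYP
```

For a negative assertion, "closed" or "filtered" is a pass, "open" is a finding, and "inconclusive" means retry through a different exit rather than report anything.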

Network Intelligence

discovery + correlation

Most homelabs end up with five disconnected tools — nmap here, DHCP leases there, an audit log somewhere else. This joins them against one device identity — active scans, BIND9 records, DHCP leases, audit hits, mesh peer state — so a device shows up once, with everything any tool knows about it.

  • Drift-aware: runs as an always-on service with scheduled sweeps, flagging new hosts, disappeared services, unexpected ports, and mesh peers gone dark.
  • Live dashboard drills from a device into every record and distinguishes "a new device" from "an old device with new behavior."
  • Privacy-preserving by design: the full map stays on the LAN, but a sanitized counts-only feed — devices, containers, subnets, mesh health, never an IP, MAC, or hostname — is what powers the public Fleet Live-Look.
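A sketch of the join and of the "new device" versus "new behavior" distinction, assuming every record has already been tagged with the device's MAC address. That tagging is the genuinely hard part (a BIND9 record, for instance, carries no MAC), so treat MAC as a hypothetical identity key.

```python
from collections import defaultdict
from typing import Dict, List

def correlate(records: List[dict]) -> Dict[str, dict]:
    """Fold every tool's records into one entry per device (MAC as the assumed key)."""
    devices: Dict[str, dict] = defaultdict(
        lambda: {"sources": set(), "ips": set(), "names": set(), "ports": set()})
    for rec in records:
        dev = devices[rec["mac"].lower()]
        dev["sources"].add(rec["source"])
        if rec.get("ip"):
            dev["ips"].add(rec["ip"])
        if rec.get("name"):
            dev["names"].add(rec["name"])
        dev["ports"].update(rec.get("ports", ()))
    return dict(devices)

def classify_changes(known: Dict[str, dict], seen: Dict[str, dict]) -> List[str]:
    """Separate brand-new devices from known devices doing something new."""
    events: List[str] = []
    for mac, dev in sorted(seen.items()):
        if mac not in known:
            events.append(f"new device {mac}")
            continue
        new_ports = dev["ports"] - known[mac]["ports"]
        if new_ports:
            events.append(f"known device {mac}: new ports {sorted(new_ports)}")
    for mac in sorted(set(known) - set(seen)):
        events.append(f"device gone dark {mac}")
    return events
```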

Halfax System Reporter

Windows kernel driver

A hardware-telemetry platform fronted by a custom WDM (Ring 0) driver that reads MSRs, PCI config space, and SMBus/I2C directly — the same access level commercial tools like HWiNFO use, written from scratch. A driver that loads into Ring 0 is a different league than calling an API.

  • Three layers: the kernel driver for raw access, a C++ user-mode broker that wraps IOCTLs into JSON, and native helpers for CPUID topology, RAM SPD timings, NVMe SMART, and EDID.
  • From MSRs: per-core temps with thermal margins, RAPL power, turbo ratios, C-state residency, microcode version — with multi-method fallback chains so even locked-down systems yield most readings.
  • Cross-platform parity across Windows (driver + WMI), Linux (sysfs + dmidecode), and the Pi (device tree).
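A sketch of decoding two of those readings from raw MSR values, using Intel's documented layouts (the Core Ultra in UYScuti is an Intel part; AMD uses different registers). Core temperature is reported as degrees below TjMax, and package power comes from a wrapping 32-bit energy counter scaled by the RAPL energy unit. The raw reads themselves (the driver's IOCTL path, or /dev/cpu/N/msr on Linux) are not shown, and the function names are hypothetical.

```python
from typing import Tuple

# Intel MSR addresses (AMD parts use different registers).
IA32_THERM_STATUS = 0x19C
MSR_TEMPERATURE_TARGET = 0x1A2
MSR_RAPL_POWER_UNIT = 0x606
MSR_PKG_ENERGY_STATUS = 0x611

def bits(value: int, hi: int, lo: int) -> int:
    """Extract bits hi..lo (inclusive) of a raw register value."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)

def core_temp_c(therm_status: int, temperature_target: int) -> Tuple[int, int]:
    """(temperature, margin to TjMax) from IA32_THERM_STATUS and MSR_TEMPERATURE_TARGET."""
    if not bits(therm_status, 31, 31):
        raise ValueError("digital readout not valid")
    margin = bits(therm_status, 22, 16)       # degrees below TjMax
    tjmax = bits(temperature_target, 23, 16)
    return tjmax - margin, margin

def package_watts(energy_before: int, energy_after: int,
                  power_unit: int, seconds: float) -> float:
    """Average package power between two MSR_PKG_ENERGY_STATUS reads."""
    joules_per_tick = 1.0 / (1 << bits(power_unit, 12, 8))
    ticks = (energy_after - energy_before) & 0xFFFFFFFF   # 32-bit counter wraps
    return ticks * joules_per_tick / seconds
```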
Systems & Security

HalfaxOS

x86_64, from scratch

A 64-bit, capability-based operating system written entirely from scratch in C and assembly — no borrowed kernel code — that boots on VMware and real hardware. Most hobby "OSes" stop at a bootloader printing "hello"; this one has its own TCP/IP stack and a windowed desktop where the shell, terminal, and compositor are all Ring-3 programs, not kernel code. The largest single thing I've ever written — and every line is mine.

  • Capability security model: typed, permission-checked handles replace UNIX file descriptors (cap_dup() only attenuates, never escalates), and no syscall takes a raw user pointer — memory crosses the boundary only as a (region, offset, length) capability the kernel granted. Closer to seL4 than Unix, by design; same direction as Fuchsia and CHERI.
  • Userspace is the system: the shell, the terminal emulator, and the window compositor all run as Ring-3 programs. A process's stdout/stdin are capability IPC ports, so the terminal hosts the shell with no kernel-side console, and the compositor shares each window's pixel buffer with its client zero-copy via a memory-region capability. ACPI S5 powers the machine off from an on-screen menu.
  • Kernel internals: 52 syscalls, a preemptive scheduler with per-process CR3, 4-level paging VMM, message-passing IPC, Ring-3 userspace with an ELF64 loader, and SMP bring-up — ACPI MADT CPU discovery, APIC/IOAPIC, and an AP startup trampoline (multicore scheduling is the active work).
  • From-scratch TCP/IP stack (ARP, IPv4, UDP, TCP, ICMP, DNS, HTTP, DHCP), an E1000 driver, a framebuffer display driver, and a VFS with RAM/dev/exFAT filesystems.
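A Python model of the attenuation rule, offered only to illustrate the invariant (HalfaxOS itself is C, and its real handle layout isn't described here). The rights names are hypothetical; the property shown is that a duplicate may hold any subset of the original's rights and never a superset.

```python
from enum import IntFlag
from typing import Dict, Tuple

class Rights(IntFlag):
    READ = 1
    WRITE = 2
    MAP = 4
    GRANT = 8   # may hand the capability to another process

class CapabilityError(PermissionError):
    pass

class CapTable:
    """Per-process handle table: handle -> (kernel object, rights bitmask)."""

    def __init__(self) -> None:
        self._caps: Dict[int, Tuple[str, int]] = {}
        self._next = 1

    def install(self, obj: str, rights: int) -> int:
        handle = self._next
        self._next += 1
        self._caps[handle] = (obj, int(rights))
        return handle

    def cap_dup(self, handle: int, requested: int) -> int:
        """Duplicate a handle with a SUBSET of its rights; asking for more fails."""
        obj, held = self._caps[handle]
        if int(requested) & ~held:
            raise CapabilityError("cap_dup can only attenuate rights")
        return self.install(obj, requested)

    def check(self, handle: int, needed: int) -> str:
        """Return the object only if the handle exists and carries every needed right."""
        entry = self._caps.get(handle)
        if entry is None or int(needed) & ~entry[1]:
            raise CapabilityError("missing handle or insufficient rights")
        return entry[0]
```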

ComboServer

BGP + Steam analytics

An unusual question I couldn't find anyone else asking: when the internet itself misbehaves — a route leak, a cable cut, a country going dark — what happens to the people playing games on top of it? BGP routing analysis and Steam behavioral analytics are separate worlds; correlating them was the framing I went looking for and didn't find prior art on.

  • Correlates real-time BGP (RIPE RIS Live, RouteViews replay) with Steam behavioral data (player counts, reviews, patch impact) to surface patterns nobody else tools for.
  • Forensic features: hijack-vs-misconfig classification, AS "personality profiles," prefix-ghosting, review-bomb detection, sentiment drift, review-DNA fingerprinting.
  • TimescaleDB time-series with auto-bootstrap that pulls weeks of history on first run and catches up after downtime.

Halfax KeySecrets

secrets vault, HTTPS-only

A self-hosted, end-to-end encrypted secrets vault — hybrid post-quantum by default (X25519 + ML-KEM-768), built before most vendors ship it at all. The server only ever sees ciphertext: steal the entire database and you get nothing — even an admin with full disk access cannot read a single password.

  • Per-recipient cryptographic sharing: every secret has its own data key, sealed individually to each recipient; revoking is one wrapped-key delete. The wrap is hybrid post-quantum by default (X25519 + ML-KEM-768) — the server won't start without ML-KEM.
  • Conservative crypto: Argon2id, ChaCha20-Poly1305, mandatory TOTP; master passwords are never stored, existing only as a session-key input in memory.
  • The consumption story, halfax-secrets: a tiny library plus the ks-get CLI fetch credentials over the API at runtime, so programs hold no .env files, and per-host service tokens authenticate unattended scripts.
  • An MCP front door for AI agents, multiply gated: only one of its nine tools ever returns plaintext, mutations need both an env flag and a change-role token (deletes add a confirm), and it never logs a secret value — so an agent can fetch a credential at task-time without it ever touching a config file or the model's context.
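
Mandatory TOTP is one of the few pieces here that is fully specified by a public standard, so it can be sketched exactly. This is a stdlib-only Python sketch of RFC 6238 code generation and verification, not the vault's implementation. It assumes HMAC-SHA1, a 30-second step, and a one-step skew window (common authenticator defaults), and `totp`/`verify_totp` are illustrative names.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret: bytes, for_time: float, digits: int = 6, step: int = 30) -> str:
    """RFC 6238 TOTP code using HMAC-SHA1, the common authenticator-app default."""
    counter = int(for_time // step)                        # time-step number T
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                                # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify_totp(secret: bytes, submitted: str, now: Optional[float] = None,
                window: int = 1, digits: int = 6, step: int = 30) -> bool:
    """Accept a code from the current step or +/- `window` steps (clock skew)."""
    now = time.time() if now is None else now
    return any(
        # compare_digest avoids leaking how many leading characters matched
        hmac.compare_digest(totp(secret, now + k * step, digits, step).encode(),
                            submitted.encode())
        for k in range(-window, window + 1)
    )
```

With the RFC's SHA-1 test seed `b"12345678901234567890"`, `totp(seed, 59, digits=8)` reproduces the published vector `94287082`. A production verifier would also remember the last accepted time step so a code can't be replayed inside its window; that is omitted here.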

Usenet Reader

Python + PySide6/Qt

A desktop Usenet newsreader with a hand-rolled NNTP client written straight against RFC 3977 — Python 3.13 removed nntplib, so I reimplemented the slice I needed over raw sockets, TLS and all.

  • Classic three-pane UI (groups / threaded articles / body) built from References headers, multi-server subscriptions, and full read + post + quoted reply.
  • All network I/O on worker threads so the Qt UI never blocks — and honest in its own README about the bits that aren't wired up yet.
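
The one framing rule every hand-rolled NNTP client must get right is RFC 3977's multi-line data block: it ends at a line containing only ".", and any content line starting with "." arrives with that dot doubled. Here is a minimal Python sketch of the reader, assuming a binary file object such as `makefile("rb")` on a TLS-wrapped socket; `read_multiline` is an illustrative name, not this project's function.

```python
import io
from typing import BinaryIO, List


def read_multiline(stream: BinaryIO) -> List[bytes]:
    """Read one RFC 3977 multi-line data block and undo dot-stuffing.

    The block ends at a line holding only "."; any content line that began
    with "." was sent with that dot doubled, so one leading dot is dropped.
    """
    lines: List[bytes] = []
    while True:
        raw = stream.readline()
        if not raw:                        # EOF before the terminator
            raise ConnectionError("connection closed mid-block")
        line = raw.rstrip(b"\r\n")
        if line == b".":                   # end of the data block
            return lines
        if line.startswith(b".."):         # dot-stuffed content line
            line = line[1:]
        lines.append(line)


if __name__ == "__main__":
    # Against a server this would be ssl_sock.makefile("rb") instead.
    demo = io.BytesIO(b"Hello\r\n..dotted\r\n\r\n.\r\n")
    print(read_multiline(demo))    # [b'Hello', b'.dotted', b'']
```

Reading line by line also leaves the stream positioned at the next status line, so the next command's response can be parsed from the same file object.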

Games

Ashes of Grace

live

A data-driven narrative "interactive-book" RPG in a dark-satirical, state-run-reincarnation dystopia — vanilla JS, no build, no server, no dependencies; it runs entirely in the player's browser and is playable now at grace.guywithgames.com.

  • Death is the loop, not a fail state. Each death is a Reissue: the State re-rolls your body, but your attributes, memories, convictions, and the bonds you remember persist, so a finite "cheat death" resource and your choices compound from life to life toward one of ~30 endings spread over 10 chapters.
  • Deterministic choices, no hidden dice. Stat-checks are pure thresholds, not RNG; a locked choice shows its requirement, and knowledge-gated options only appear once you remember the right thing — branching that rewards what you've built, not luck.
  • Strict engine/content split. A single IIFE engine knows nothing about the story; every scene is data registered with Story.scene(...), so writing a chapter never touches engine code. Versioned, checksummed, lightly obfuscated saves live in localStorage with a copy-pasteable export code, and a Node smoke test asserts scene-graph integrity plus a 500-run fuzz (no dead ends, no loops) after every change.
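
Because every scene is plain data, the static half of that smoke test reduces to a graph check. The real test is a Node script with its own schema; this Python sketch assumes a hypothetical scene shape of `{"choices": [target_id, ...], "ending": bool}` and covers only dangling choices, dead ends, and unreachable scenes, not the random-walk fuzz.

```python
from collections import deque
from typing import Dict, List


def check_scene_graph(scenes: Dict[str, dict], start: str) -> List[str]:
    """Return a list of problems; an empty list means the graph passes.

    Hypothetical scene shape: {"choices": [target_id, ...], "ending": bool}.
    """
    problems: List[str] = []
    if start not in scenes:
        problems.append(f"missing start scene: {start}")
    for sid, scene in scenes.items():
        targets = scene.get("choices", [])
        if not targets and not scene.get("ending", False):
            problems.append(f"dead end: {sid}")        # no way forward, not an ending
        for target in targets:
            if target not in scenes:
                problems.append(f"dangling choice: {sid} -> {target}")
    # Breadth-first walk from the start scene; anything never reached is orphaned.
    seen, queue = {start}, deque([start])
    while queue:
        for target in scenes.get(queue.popleft(), {}).get("choices", []):
            if target in scenes and target not in seen:
                seen.add(target)
                queue.append(target)
    problems += [f"unreachable: {sid}" for sid in scenes if sid not in seen]
    return problems
```

Returning a list of problems rather than failing on the first one lets a single run report every broken link introduced by a chapter edit.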

Halfax Dungeons

Godot 4 roguelike

A from-scratch Angband-style roguelike, 100 levels deep — ported file-by-file from a ~8,900-line Python build to ~7,200 lines of Godot 4 GDScript. The standout isn't the game; it's the Borg, an autonomous agent that plays the whole thing by itself.

  • The game: depth-scaled difficulty (rarer items and elite monsters deeper, minibosses at floors 33/66/90 and a final boss at 100), melee + ranged combat with Bresenham line-of-fire, a magic system, town generation, and save/load — all driven by 12 hand-authored data tables (43 monsters, 25 weapons, 16 armor, plus artifacts, wands, scrolls, potions).
  • The Borg autoplayer: an autonomous bot built on BFS pathfinding with path caching and anti-oscillation, reimplemented cleanly in GDScript. It runs the dungeon headless as a balance and regression harness, so the game effectively tests itself.
  • Built with discipline: the old Python version is deprecated but kept, with a documented file-by-file migration table to the Godot port so the two implementations never drift.
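
The Borg's navigation rests on breadth-first search over the dungeon grid. The real bot is GDScript; this is a minimal Python sketch assuming a list-of-strings map where `#` is a wall and movement is 4-connected (Angband-style games usually allow diagonals too). `bfs_path` and the map format are illustrative, and path caching and anti-oscillation are not modeled.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]
WALL = "#"


def bfs_path(grid: List[str], start: Cell, goal: Cell) -> Optional[List[Cell]]:
    """Shortest 4-connected path from start to goal as (row, col) cells, or None."""
    rows, cols = len(grid), len(grid[0])
    came_from: Dict[Cell, Optional[Cell]] = {start: None}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == goal:
            path = []
            node: Optional[Cell] = cur
            while node is not None:              # follow parent links back to start
                path.append(node)
                node = came_from[node]
            return path[::-1]
        r, c = cur
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != WALL and (nr, nc) not in came_from):
                came_from[(nr, nc)] = cur        # first visit is the shortest route
                queue.append((nr, nc))
    return None
```

BFS is optimal here because every step costs the same. Caching the returned list and replaying it until the map changes is one plausible way to avoid re-searching every turn.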

Skyhaven: Scrolls of the Elder Realm

browser RPG

A Skyrim-inspired RPG built in pure HTML, CSS, and JavaScript — no framework, no build tools — with a parallel Godot 4 port scaffolded.

  • Hand-authored depth: ~11 playable races and 6 classes in full character creation, 13 skills, ~66 items, 13 enemy types, 10 locations, 8 NPCs, and 7 multi-stage quests.
  • Real Elder-Scrolls systems: magicka/stamina pools, skill leveling, stealth, smithing / enchanting / alchemy, turn-based combat, and localStorage saves — themed throughout, dragons and all.

Halcity

Python + Pygame

A SimCity-style city builder in pure Python + Pygame (~3,900 lines), grown up from an earlier Tkinter prototype — with a genuinely deep simulation under the pixel art.

  • A two-layer world: a 150×100 surface map plus a Tab-toggled underground layer for water and sewer pipes (each servicing a 4-tile Manhattan radius), with coverage math feeding back into happiness and pollution.
  • A real economy: 10 tracked stats — money, population, jobs, food, energy, happiness, pollution, water, sewage, prestige — updated by an explicit per-turn formula, with random events and a prestige→decree system of city-wide buffs.
  • Building subtypes (5 power-plant types, three variants each of residential / commercial / industrial / farm), each with a runtime-rendered 48×48 sprite, smart auto-tiling roads, and JSON save/load that auto-migrates old saves.
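
The underground layer's service rule (each pipe covers a 4-tile Manhattan radius on a 150×100 map) is easy to make concrete. This Python sketch computes the set of covered tiles under that rule; `coverage` is an illustrative name, and how Halcity turns coverage into happiness and pollution isn't shown.

```python
from typing import Set, Tuple

Tile = Tuple[int, int]
W, H = 150, 100      # surface map size from the description
RADIUS = 4           # each pipe services a 4-tile Manhattan radius


def coverage(pipes: Set[Tile], width: int = W, height: int = H,
             radius: int = RADIUS) -> Set[Tile]:
    """Return every (x, y) tile within `radius` Manhattan distance of any pipe."""
    covered: Set[Tile] = set()
    for px, py in pipes:
        for dx in range(-radius, radius + 1):
            span = radius - abs(dx)               # remaining budget for |dy|
            for dy in range(-span, span + 1):
                x, y = px + dx, py + dy
                if 0 <= x < width and 0 <= y < height:   # clip at map edges
                    covered.add((x, y))
    return covered
```

A single pipe away from the edges covers 41 tiles (the diamond 2r² + 2r + 1 for r = 4), while one in a corner covers only 15. Returning a set makes overlapping pipes count each tile once.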