The Controller Board: A Custom Zynq Linux Host

You might be wondering why I didn’t just slap a Raspberry Pi inside the aluminum monolith and call it a day.

The original Connection Machine CM-1 didn’t operate on its own; it was a massively parallel accelerator attached to a “front-end” host, typically a Symbolics Lisp Machine or a VAX. To recreate this architecture, I needed a host computer capable of running a full Linux OS, compiling my StarC code natively, and, most importantly, orchestrating 16 deterministic TDMA hypercube backplanes simultaneously.

If you ask a Raspberry Pi to juggle 16 high-speed custom serial buses with microsecond accuracy, it will fail. A general-purpose CPU running Linux simply cannot do it: the kernel’s interrupt latency and scheduling jitter alone would blow the timing budget. You cannot bit-bang a supercomputer. You need the hardened processing of an ARM core bolted directly to the deterministic, programmable logic of an FPGA.

So, I designed a custom supercomputer motherboard from scratch.

The Silicon Strategy: Designing for the Footprint

The heart of the controller board is an AMD/Xilinx Zynq UltraScale+ MPSoC. The specific part number I landed on for the initial bring-up is the XCZU2CG-L1SFVC784I.

Choosing this specific piece of silicon was a highly calculated exercise in balancing performance against the punishing physical constraints of my 1:1:1 aluminum cube enclosure.

The Pin-Compatible Upgrade Path

Let’s clear something up about the CG variant: it is a dual-core Cortex-A53 part, and it lacks the Mali GPU. For initial testing, I don’t care about the GPU or the core count; I just need to verify my power delivery and DDR4 routing.

But the beauty of Xilinx’s UltraScale+ architecture is pin compatibility. Because I designed this entire motherboard around the 0.8mm SFVC784 footprint, I can later spin a final revision of the board and drop in an XCZU3EG. That chip upgrades the system to a Quad-Core Cortex-A53 and includes the Mali-400 GPU for smooth, hardware-accelerated desktop rendering. I can upgrade the brain of the machine without re-routing a single copper trace.

People always ask, “Is this just a glorified Raspberry Pi 4?” Yes and no. If you look purely at the Linux desktop experience, a Quad-Core A53 is functionally identical to a Raspberry Pi 3, though the 16nm hardened USB 3.0 and Gigabit Ethernet give it the I/O bandwidth of a Pi 4. But comparing a Zynq to a Raspberry Pi entirely misses the point. A Raspberry Pi is just a computer. The Zynq is a computer bolted to a deterministic hardware factory.

Feature              | Raspberry Pi 4                 | Zynq UltraScale+ (XCZU3EG)
CPU Architecture     | Quad-core Cortex-A72           | Quad-core Cortex-A53
Hardware UARTs       | 6 (max, shared with other I/O) | Effectively infinite (FPGA fabric)
Real-Time Processing | None (Linux kernel overhead)   | Dual Cortex-R5 cores + FPGA logic
Custom I/O Routing   | Fixed to specific GPIO pins    | Any signal to any of the 200+ PL pins

System Memory: 2 Gigabytes of DDR4

There are two distinct memory domains in this machine. The entire 4,096-node hypercube possesses a grand total of 32 Megabytes of distributed SRAM, a mathematically perfect match to the original 1985 CM-1, whose 65,536 one-bit processors carried 4,096 bits of RAM apiece.
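That "perfect match" is easy to cross-check. The CM-1 side of the comparison (65,536 one-bit processors with 4,096 bits of RAM each) is the historical spec; the 8 KiB per-node figure on my side simply falls out of the division:

```python
# Sanity-check the two memory-domain figures.
# CM-1 (1985): 65,536 one-bit processors, 4,096 bits of RAM each.
cm1_bytes = 65_536 * 4_096 // 8

# Replica: 4,096 RISC-V nodes sharing 32 MiB, i.e. 8 KiB of SRAM per node.
node_count = 4_096
per_node_bytes = 8 * 1024          # derived: 32 MiB / 4,096 nodes
replica_bytes = node_count * per_node_bytes

assert cm1_bytes == replica_bytes == 32 * 1024 * 1024
print(f"{replica_bytes // (1024 * 1024)} MiB in both machines")
```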

However, the Linux host needs real breathing room. I gave the Zynq 2 Gigabytes of system RAM using two Micron MT40A512M16TB DDR4 chips.

Routing the Fly-By Topology

Because the Zynq requires a 32-bit memory bus to run the OS efficiently, and each Micron chip is 16 bits wide (x16), I had to place two chips side-by-side.

Routing this was a pain.

The lower 16 data bits (DQ0-DQ15) go point-to-point to Chip 1. The upper 16 data bits (DQ16-DQ31) go point-to-point to Chip 2. However, the address, command, and clock lines have to be routed in a Fly-By Topology. They exit the Zynq, hit the pads on the first RAM chip, continue along the same layer to the second RAM chip, and then terminate into 39.2-ohm resistors tied to a 0.6V VTT tracking regulator. Length-matching this on a custom 8-layer board to picosecond tolerances is exactly as stressful as it sounds.
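For reference, that 0.6V VTT rail is just half of DDR4’s 1.2V I/O rail, and the 39.2-ohm terminators set the static current each address/command/clock line can sink. A back-of-envelope sketch with illustrative numbers (not the board’s actual signal-integrity simulation results):

```python
# Back-of-envelope check on the fly-by termination described above.
vddq = 1.2          # DDR4 I/O rail (V)
vtt = vddq / 2      # tracking terminator target: 0.6 V (VDDQ / 2)
r_tt = 39.2         # parallel termination resistor (ohms)

# Worst-case static current per terminated line, driver held at the rail:
i_line = (vddq - vtt) / r_tt
print(f"VTT = {vtt:.2f} V, ~{i_line * 1000:.1f} mA per CA/CK line")
```

Multiply that ~15 mA by a few dozen terminated lines and you see why the VTT regulator has to be a real tracking regulator, not a resistor divider.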

The Interface: The DisplayPort Trapdoor

I wanted a standard HDMI port on the back of the machine so I could easily plug it into a bench monitor. There is just one massive catch: the hardened video controller inside the Zynq Processing System (PS) is strictly DisplayPort.

The high-speed GTR transceivers on the Zynq do not support “Dual-Mode” (DP++), meaning they cannot passively fall back to sending HDMI/TMDS signals. If you wire them directly to an HDMI connector, the screen stays black.

To solve this, I placed a Parade Technologies PS176 on the board. This is an active DisplayPort-to-HDMI 2.0 converter IC. I routed two high-speed DisplayPort lanes from the Zynq directly into the PS176, which actively decodes the DP packet stream, regenerates a fully compliant HDMI video stream, and pushes it out to the connector on the rear IO panel.

The 16-Channel Routing Nightmare

Here is the exact reason a standard single-board computer could never run this machine: I have to talk to 16 separate hypercube backplanes simultaneously.

Each of the 16 compute cards requires its own independent, high-speed serial channel (RX/TX) to pass instructions, load StarC binaries, and retrieve data.

As mentioned, Linux is terrible at this. So, I bypassed the CPU entirely. I went into the FPGA fabric (the Programmable Logic) and manually instantiated 16 independent hardware UART MACs. I wired these MACs directly to the AXI memory interconnect.
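Since the whole bank sits on the AXI interconnect, the Linux side can talk to any channel by memory-mapping physical addresses. Here is a hedged Python sketch of what that plumbing could look like; the base address, per-channel stride, and TX FIFO offset are placeholders, because the real values come out of the Vivado address editor, not this post.

```python
import mmap
import os
import struct

# Hypothetical address map for the 16 PL UARTs (placeholder values):
UART_BASE   = 0xA000_0000   # assumed AXI base address of the UART bank
UART_STRIDE = 0x1000        # assumed: one 4 KiB page per channel
TX_FIFO     = 0x04          # assumed TX FIFO register offset

def uart_reg(channel: int, offset: int) -> int:
    """Physical address of one register on one of the 16 channels."""
    assert 0 <= channel < 16
    return UART_BASE + channel * UART_STRIDE + offset

def send_byte(channel: int, byte: int) -> None:
    """Poke one byte into a PL UART's TX FIFO via /dev/mem (needs root)."""
    addr = uart_reg(channel, TX_FIFO)
    page = addr & ~0xFFF                      # mmap offsets are page-aligned
    fd = os.open("/dev/mem", os.O_RDWR | os.O_SYNC)
    try:
        m = mmap.mmap(fd, 0x1000, offset=page)
        m[addr - page : addr - page + 4] = struct.pack("<I", byte & 0xFF)
        m.close()
    finally:
        os.close(fd)
```

In practice you would wrap this in a proper UIO or platform driver, but /dev/mem poking is how the first bring-up smoke test usually looks.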

The Orchestrator: TDMA Sync and the LED Display

The 16 compute cards are just the engines. The Zynq also has to conduct the rest of the orchestra.

First, there is the TDMA Synchronization. To prevent the 4,096 RISC-V chips from talking over each other, the entire machine runs on a globally synchronized Time Division Multiple Access (TDMA) schedule. Linux is terrible at microsecond-accurate timing because of kernel preemption. But the FPGA fabric? It only operates in hard, deterministic real-time. The PL fabric generates a flawless, jitter-free master phase clock and distributes it across the backplane to all 16 cards simultaneously.
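For intuition, the slotting can be sketched in a few lines. Everything below (clock rate, slot width, one-slot-per-dimension ordering) is an illustrative assumption, not the board’s actual schedule, which is fixed in the PL fabric.

```python
# Minimal sketch of deterministic TDMA framing (illustrative numbers).
PL_CLOCK_HZ = 100_000_000   # assumed PL master clock
SLOT_TICKS  = 1_000         # assumed slot width: 10 us at 100 MHz
DIMENSIONS  = 12            # 2^12 = 4,096 nodes, one slot per hypercube axis

FRAME_TICKS = DIMENSIONS * SLOT_TICKS

def active_dimension(tick: int) -> int:
    """Which hypercube axis owns the wires at global clock tick `tick`.

    Every node exchanges along the same axis in the same slot, so no two
    transmitters ever share a link and nothing ever collides."""
    return (tick % FRAME_TICKS) // SLOT_TICKS
```

Because the schedule is a pure function of the global tick counter, every node can compute it locally; the only thing the Zynq has to distribute is the tick itself.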

Second, there is the LED Display. The 64x64 front panel is controlled by an RP2040. The Zynq needs to blast display state data to that RP2040 fast enough to maintain a 60fps refresh rate. I spun up yet another dedicated high-speed SPI channel in the FPGA fabric, wiring it directly to the front-panel connector.
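A quick bandwidth check shows why one dedicated SPI channel is plenty, assuming a 3-byte RGB pixel (the actual frame format on the RP2040 side may differ):

```python
# Sustained SPI bandwidth for the 64x64 front panel at 60 fps,
# assuming 3 bytes per pixel (RGB); the real frame format may differ.
width, height, fps = 64, 64, 60
bytes_per_pixel = 3
frame_bytes = width * height * bytes_per_pixel   # 12,288 bytes per frame
bits_per_second = frame_bytes * 8 * fps
print(f"{bits_per_second / 1e6:.1f} Mbit/s sustained")
```

Under 6 Mbit/s sustained is loafing territory for an SPI link that the fabric can comfortably clock in the tens of MHz.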

Tying It All Together

Designing a 64-bit Linux motherboard from scratch just to serve as a glorified conductor for an array of RISC-V microcontrollers is objectively ridiculous. But by utilizing the Zynq’s hardened AXI interconnect, the Linux kernel can memory-map the entire hypercube directly into its address space.

When you sit at the terminal and execute a StarC program, the ARM cores compile the C code natively, pass the deterministic TDMA schedules directly to the FPGA fabric, and the hardware MACs blast the data across the backplane at gigabit speeds. The CPU never breaks a sweat. It is an incredibly elegant, full-stack architecture wrapped inside a piece of brutalist modern art.