# I regret building this $3000 Pi AI cluster

Jeff Geerling reflects on building a 10-node Raspberry Pi Compute Blade AI cluster for around $3,000 including shipping. Ordered in April 2023, the blades were upgraded from Raspberry Pi CM4 to CM5 Lite 16GB modules, for a total of 160GB of RAM across the cluster.

---

## Cluster Build and Setup

- Used 10 Compute Blades with 16GB CM5 modules.
- The initial assortment of random NVMe SSDs proved unreliable; they were replaced with Patriot P300 256GB SSDs.
- Thermals improved after securely attaching aluminum heatsinks.
- The build was complex and required multiple rebuilds for stability.
- A video walkthrough is available on YouTube.

---

## Benchmarking Results

### High Performance Linpack (HPL)

- Initial HPL benchmark: 275 Gflops at 105W power draw.
- After thermal fixes: 325 Gflops at 130W, roughly a 10x speedup over a single CM5.
- Compared to an $8,000 4-node Framework Desktop cluster, the Framework cluster is about 4x faster, while the Pi cluster is slightly more energy efficient (Gflops/W).
- However, the Pi cluster is less cost-effective when considering performance per dollar ($3,000 vs. $8,000, but far lower throughput).

### AI Performance Tests

- The cluster's 160GB of RAM is shared between CPU and iGPU, but the lack of Vulkan acceleration for AI workloads means inference is CPU-only.
- A small model (Llama 3.2:3B) on one Pi runs at ~6 tokens/sec, significantly slower than a cheap Intel N100 or the Framework Desktop.
- A large model (Llama 3.3:70B, 40GB) distributed over multiple nodes:
  - The llama.cpp RPC method was too slow and required reducing the token-generation batch size.
  - It achieved only 0.28 tokens/sec (~25x slower than the Framework cluster).
  - Alternative tools (Exo, distributed-llama) reached at most 0.85 tokens/sec, with instability, and were still 5x slower than the Framework cluster.
- Detailed AI benchmarks are available on GitHub.

---

## Final Thoughts and Use Cases

- The cluster is compact, quiet, and energy efficient.
- It is a good fit for use cases needing high node density or physical separation (e.g., CI jobs, high-security edge deployments).
- Example: Unredacted Labs uses Pi clusters for Tor exit relays because of their efficiency and node count.
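The inference figures quoted above can be sanity-checked with quick arithmetic. This sketch back-calculates the implied Framework cluster speeds from the stated ratios; all numbers come from the summary itself, and the derived values are estimates, not measured results.

```python
# Token-generation rates quoted for the Pi cluster (Llama 3.3:70B).
pi_rpc_tps = 0.28      # llama.cpp RPC across nodes
pi_dllama_tps = 0.85   # best case with alternative tools (e.g. distributed-llama)

# The summary states these runs were ~25x and ~5x slower than the
# Framework Desktop cluster, so the implied Framework speeds are:
framework_tps_from_rpc = pi_rpc_tps * 25
framework_tps_from_alt = pi_dllama_tps * 5

print(f"Implied Framework speed (vs. RPC run):      {framework_tps_from_rpc:.2f} tok/s")
print(f"Implied Framework speed (vs. best Pi run):  {framework_tps_from_alt:.2f} tok/s")
```

The two implied figures (7.0 and 4.25 tokens/sec) differ, which is plausible since the comparisons likely come from different runs or configurations; either way, both put the Framework cluster well ahead for 70B-class models.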
- Overall, for most users, the cluster is neither cost-effective nor powerful enough compared to x86 alternatives.

---

## Related Hardware and Notes

- Gateworks GBlade: industrial-grade compute blades with roughly Pi 4-level performance and 10 Gbps networking, now discontinued.
- The Raspberry Pi Compute Blade remained a niche, cult-classic product without broad adoption.

---

## Parts List

- Compute Blade DEV and fan units
- Raspberry Pi CM5 16GB Lite modules
- Aluminum heatsinks (GLOTRENDS)
- Patriot P300 256GB NVMe SSDs (10-pack)
- GigaPlus 2.5 Gbps 10-port PoE+ switch
- Slim Cat6A cables for networking
- DeskPi RackMate TT enclosure
- 3D-printed rack mounts for the blades and switch

---

## Further Reading

- I clustered four Framework Mainboards to test huge LLMs
- Sipeed NanoCluster fits 7-node Pi cluster in 6cm
- LLMs accelerated with eGPU on a Raspberry Pi 5

---

This cluster project highlights the challenges and trade-offs of building large Raspberry Pi AI and HPC clusters: while educational and niche uses exist, they generally cannot compete with traditional x86 clusters on cost, speed, or AI inference performance.
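The cost and efficiency trade-offs above can be put into numbers. This sketch uses only the Pi cluster figures quoted in the summary and computes the break-even point for the per-dollar comparison, without assuming either cluster's exact throughput advantage:

```python
# HPL figures for the Pi cluster, as quoted above.
pi_gflops = 325.0   # post-thermal-fix HPL result
pi_watts = 130.0    # power draw during that run
pi_cost = 3000.0    # total cluster cost (USD)
fw_cost = 8000.0    # quoted Framework Desktop cluster cost (USD)

efficiency = pi_gflops / pi_watts        # Gflops per watt
perf_per_dollar = pi_gflops / pi_cost    # Gflops per dollar

# Break-even speedup: how much faster the Framework cluster must be
# to merely match the Pi cluster in performance per dollar.
break_even = fw_cost / pi_cost

print(f"Pi cluster efficiency:    {efficiency:.2f} Gflops/W")
print(f"Pi cluster perf per $:    {perf_per_dollar:.3f} Gflops/$")
print(f"Framework break-even:     {break_even:.2f}x the Pi's throughput")
```

At $8,000 vs. $3,000, the Framework cluster only needs about 2.7x the Pi cluster's throughput to match it per dollar, so any larger speedup makes it the better value despite the higher sticker price.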