486Tang - 486 on a Credit-Card-Sized FPGA Board

Author: nand2mario
Date: September 13, 2025

---

Overview

486Tang v0.1 is a port of the ao486 MiSTer PC core to the Sipeed Tang Console 138K FPGA. This is reportedly the first time ao486 has been ported to a non-Altera FPGA. The project adapts the x86 core to a smaller, different FPGA environment while still booting and operating like a PC.

---

486Tang Architecture

Porting ao486 meant adapting several major components to the Tang Console platform:

- Memory Changes: Main memory moved from DDR3 (on MiSTer) to SDRAM, which is a natural fit for 80486-era memory timing. DDR3 is now used only for the framebuffer. The SDRAM is 16 bits wide, so it is clocked at double speed to deliver a 32-bit word per CPU cycle.
- Storage Interface: On MiSTer, IDE requests are forwarded to the ARM HPS MCU. The Tang Console lacks a comparably fast MCU-to-FPGA link, so disk storage moved to an SD card that the FPGA accesses directly.
- Boot Loading: The BIOS, VGA BIOS, CMOS settings, and IDE IDENTIFY data are stored in the first 128 KB of the SD card. A boot loader reads them from the SD card into memory before starting the CPU.

---

System Bring-Up Using Whole-System Simulation

Debugging was challenging because of the large Verilog codebase (more than 25K lines) and long build times. Verilator was used to simulate individual subsystems (VGA, IDE, Sound Blaster) as well as the entire boot to DOS, which sped up diagnosis significantly.

Debug aids included:

- BIOS debug prints to port 0x8888, intercepted by both the simulator and the FPGA UART.
- Subsystem I/O tracing flags (--sound, --ide).
- The official Bochs BIOS assembly listings for symbolic debugging.

Many bugs were in the newly added FPGA stitching code, and some toolchain-dependent behaviors were discovered along the way.

---

Performance Optimizations

The initial Tang implementation ran like a 25 MHz 80386. The main challenge was shortening long combinational logic paths:

- Reset Tree and Fan-Out: Manually replicating the reset net reduced fan-out delays.
- Instruction Fetch Improvements: The fetch-buffer space calculation no longer depends on the decoded instruction length, which shortens the critical path.
A slight fetch under-utilization was traded for a higher maximum clock.
- TLB Optimization: The TLB changed from a 32-entry fully-associative design to a 4-way set-associative one, simplifying the lookup logic and allowing a faster clock.

With these changes, 486Tang gained about 35% in performance on the Landmark 6 benchmark, putting it close to a 486SX-20-class CPU.

---

Reflections

- Clock Speed Scaling: Raising the system clock is the most effective way to improve performance, up to the point where memory latency dominates and caches and pipelining become critical.
- x86 vs ARM: x86 remains complex because of its legacy (variable-length instructions, many addressing modes). ARM7 (used in GBATang) is simpler, with fixed-length instructions and more straightforward addressing.

---

Summary

486Tang v0.1 successfully ports an x86 486 core to a small, resource-constrained FPGA platform, with new approaches to memory, storage, and boot loading. Simulation played a crucial role in debugging, and speed optimizations brought the core into a practical performance range. The project highlights the trade-offs in legacy CPU design and FPGA adaptation, and shows that a full 486-class PC can run on a credit-card-sized FPGA board.

---

Links:

- 486Tang on GitHub
- test386.asm verification suite
- GBATang - ARM7-based FPGA project

---

486Tang represents a compelling blend of retro PC emulation and cutting-edge FPGA engineering.
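
To make the TLB change from the Performance Optimizations section more concrete, here is a minimal Python model of a 4-way set-associative lookup. It is only a sketch, not the ao486 or 486Tang logic: the class and field names are hypothetical, the set count of 8 assumes the total stayed at 32 entries (the text does not say), and round-robin replacement is likewise an assumption. The point it illustrates is that a lookup compares the tag against 4 entries in one set instead of all 32 entries at once, which is what shortens the comparison logic in hardware.

```python
from dataclasses import dataclass
from typing import List, Optional

PAGE_SHIFT = 12   # 4 KB x86 pages
NUM_WAYS = 4      # from the text: 4-way set-associative
NUM_SETS = 8      # assumption: 32 entries in total / 4 ways


@dataclass
class TlbEntry:
    valid: bool = False
    tag: int = 0
    frame: int = 0   # physical page frame number


class SetAssociativeTlb:
    """Hypothetical software model of a small set-associative TLB."""

    def __init__(self) -> None:
        self.sets: List[List[TlbEntry]] = [
            [TlbEntry() for _ in range(NUM_WAYS)] for _ in range(NUM_SETS)
        ]
        # Round-robin victim pointer per set (replacement policy is an assumption).
        self.next_victim = [0] * NUM_SETS

    @staticmethod
    def _split(vaddr: int):
        vpn = vaddr >> PAGE_SHIFT             # virtual page number
        return vpn % NUM_SETS, vpn // NUM_SETS  # (set index, tag)

    def lookup(self, vaddr: int) -> Optional[int]:
        index, tag = self._split(vaddr)
        for entry in self.sets[index]:        # only NUM_WAYS tag comparisons
            if entry.valid and entry.tag == tag:
                offset = vaddr & ((1 << PAGE_SHIFT) - 1)
                return (entry.frame << PAGE_SHIFT) | offset
        return None                            # miss: a page walk would follow

    def insert(self, vaddr: int, paddr: int) -> None:
        index, tag = self._split(vaddr)
        frame = paddr >> PAGE_SHIFT
        for entry in self.sets[index]:        # refresh an existing mapping
            if entry.valid and entry.tag == tag:
                entry.frame = frame
                return
        way = self.next_victim[index]
        self.sets[index][way] = TlbEntry(True, tag, frame)
        self.next_victim[index] = (way + 1) % NUM_WAYS


tlb = SetAssociativeTlb()
tlb.insert(0x00401000, 0x0009A000)
print(hex(tlb.lookup(0x00401234)))  # prints 0x9a234
print(tlb.lookup(0x00402234))       # prints None (different page, not cached)
```

The cost of this organization is conflict misses: five pages that map to the same set cannot all stay cached, even if the other sets are empty. A fully-associative TLB avoids that, but needs a comparator per entry on the critical path.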
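
The debug-print aid from the bring-up section (BIOS writes to port 0x8888, picked up by the simulator) can be modeled roughly as below. This is a Python sketch of the idea only: the actual simulator is built on Verilator, and the class name, method name, and the assumption that each port write carries one ASCII byte are hypothetical choices for illustration.

```python
DEBUG_PORT = 0x8888  # from the text: BIOS debug prints go to this I/O port


class DebugPortLogger:
    """Hypothetical I/O-write hook that collects BIOS debug text line by line."""

    def __init__(self) -> None:
        self.lines = []      # completed lines of debug output
        self._pending = []   # characters of the line currently being built

    def io_write(self, port: int, value: int) -> bool:
        """Return True if the write was a debug print and has been consumed."""
        if port != DEBUG_PORT:
            return False     # not ours: let the normal device model handle it
        ch = chr(value & 0xFF)   # assumption: one ASCII byte per write
        if ch == "\n":
            self.lines.append("".join(self._pending))
            self._pending.clear()
        elif ch != "\r":
            self._pending.append(ch)
        return True


logger = DebugPortLogger()
for byte in b"POST ok\r\n":
    logger.io_write(DEBUG_PORT, byte)
print(logger.lines)  # prints ['POST ok']
```

A simulator loop would call a hook like this on every I/O write cycle it observes, so BIOS progress messages appear without any change to the simulated hardware.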