HackerSqueeze — AI‑curated tech & startup insights

Condor Computing, a subsidiary of Andes Technology formed in 2023, is a new player in the RISC-V core market, with a business model similar to Arm's and SiFive's (licensing CPU core IP). At Hot Chips 2025, Condor introduced Cuzco, a high-performance, 8-wide out-of-order RISC-V CPU core designed to compete with top-tier cores such as SiFive's P870 and Ventana's Veyron V1, and to surpass earlier cores already available in silicon, such as Alibaba T-HEAD's C910 and SiFive's P550. Cuzco targets clock speeds of 2 to 2.5 GHz on TSMC's 5nm process and has a 12-stage pipeline and a 256-entry reorder buffer. Its execution slices and cache parameters are configurable to suit different customers' needs, and the design supports clusters of up to eight cores linked via a CHI bus.

The branch predictor is based on TAGE-SC-L (a TAgged GEometric history length predictor with a Statistical Corrector and Loop predictor), with a 16K-entry bimodal table as its base component. The branch target buffer holds 8K entries split across two levels, and the front end also has a 32-entry return stack and an indirect branch predictor. The instruction fetch unit reads from a 64 KB, 8-way set-associative instruction cache with a 64-entry fully associative TLB, and the decoders handle up to eight instructions per cycle.

Cuzco's backend uses a novel "time-based" static scheduling scheme that shifts scheduling complexity out of traditional dynamic schedulers and into the rename and allocate stage. A Time Resource Matrix (TRM) tracks resource usage up to 256 cycles into the future, and the allocator searches an 8-cycle window of it for free resources. Each instruction is then issued after a predetermined wait, so the backend schedulers only need to count cycles rather than dynamically check whether operands are ready. Scheduling mispredictions are handled by instruction replay; about 7% of instructions are replayed, a rate considered acceptable given the available execution resources. Each execution slice contains two pipelines and can execute all supported RISC-V instructions.
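
The time-based scheme can be illustrated with a toy Python model of a TRM-style allocator. This is a minimal sketch under stated assumptions, not Condor's design: only the 256-cycle horizon and the 8-cycle search window come from the description above. The unit classes (`alu`, `fma`), their per-cycle capacities, the fixed latencies, and the policy of stalling allocation when the window has no free slot are all hypothetical, and replay after a latency misprediction (such as a cache miss) is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

HORIZON = 256       # future cycles tracked by the TRM (figure from the article)
SEARCH_WINDOW = 8   # cycles searched for a free resource (figure from the article)


@dataclass
class Uop:
    name: str
    unit: str                     # hypothetical functional-unit class: "alu" or "fma"
    latency: int                  # assumed fixed execution latency in cycles
    srcs: Tuple[str, ...] = ()    # source registers
    dst: Optional[str] = None     # destination register, if any


class TimeResourceMatrix:
    """Toy TRM: row t counts how many units of each class are booked t cycles from now."""

    def __init__(self, capacity: Dict[str, int], horizon: int = HORIZON):
        self.capacity = capacity  # hypothetical units per class per cycle
        self.rows = [dict.fromkeys(capacity, 0) for _ in range(horizon)]

    def reserve(self, unit: str, earliest: int) -> Optional[int]:
        """Book the first free `unit` slot in [earliest, earliest + SEARCH_WINDOW)."""
        last = min(earliest + SEARCH_WINDOW, len(self.rows))
        for t in range(earliest, last):
            if self.rows[t][unit] < self.capacity[unit]:
                self.rows[t][unit] += 1
                return t
        return None  # no free slot inside the search window (or beyond the horizon)


def allocate(uops: List[Uop], trm: TimeResourceMatrix) -> List[Tuple[str, int]]:
    """Give each uop a fixed issue cycle at rename/allocate time (cycle 0 = now)."""
    ready_at: Dict[str, int] = {}  # register -> predicted cycle its value is available
    schedule = []
    for u in uops:
        # Earliest cycle all operands are predicted ready; unseen registers are ready now.
        earliest = max([0] + [ready_at.get(r, 0) for r in u.srcs])
        issue = trm.reserve(u.unit, earliest)
        if issue is None:
            break  # hypothetical policy: stall allocation until slots free up
        if u.dst is not None:
            ready_at[u.dst] = issue + u.latency
        # The backend queue just counts down `issue` cycles and then fires the uop,
        # with no per-cycle operand-ready check.
        schedule.append((u.name, issue))
    return schedule


trm = TimeResourceMatrix({"alu": 2, "fma": 1})
prog = [
    Uop("fmul", "fma", 4, ("f2", "f3"), "f1"),
    Uop("fadd", "fma", 2, ("f1", "f4"), "f5"),   # needs fmul's result
    Uop("fmul2", "fma", 4, ("f6", "f7"), "f8"),  # independent, but the FMA slot at cycle 0 is taken
    Uop("add", "alu", 1, ("x1",), "x2"),
    Uop("sub", "alu", 1, ("x2",), "x3"),         # needs add's result
]
print(allocate(prog, trm))
# -> [('fmul', 0), ('fadd', 4), ('fmul2', 1), ('add', 0), ('sub', 1)]
```

Each micro-op leaves allocation with a fixed issue cycle, so the backend only has to count down to it. Here the dependent `fadd` lands at cycle 4 because `fmul` is predicted to finish then, and `fmul2` slides to cycle 1 because the only FMA slot at cycle 0 is already booked.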

Execution queues (XEQs), one per functional unit, hold micro-ops awaiting execution and are sized according to workload characteristics. Each slice has one FMA unit for floating-point operations, so an eight-slice configuration can perform up to eight FP32 FMA operations per cycle; FP add latency is 2 cycles and FP multiply latency is 4 cycles.

The load/store unit has 64-entry queues for loads, stores, and cache misses, and a four-slice core has four load/store pipelines, providing 64 bytes per cycle of bandwidth. The L1 data cache is physically indexed and addressed and is assisted by a 64-entry fully associative data TLB. The L2 TLB is 4-way set-associative, with a configurable size of 1K to 4K entries.

L2 and L3 cache capacities are configurable. Each cluster shares an L3 cache among its cores, split into slices for bandwidth, with each slice supporting 64 B/cycle. A crossbar within the cluster links the cores and cache slices, and the cluster connects to the rest of the system through a 64 B/cycle CHI interface.

Cuzco's static scheduling differs from traditional out-of-order designs, which dynamically check dependencies every cycle to decide which instructions to issue. Instead, Cuzco predicts instruction sc