[CTF] Google CTF 2025 - webz: Exploiting zlib's Huffman Code Table Author: The Amazing Digital Orange Date: August 26, 2025 Tags: CTF, Google CTF, zlib, Huffman Coding, Exploitation --- Overview The challenge webz is a zlib exploitation challenge from Google CTF 2025. The Google-zlib implementation here includes a patch (Increase Huffman Table Size) that looks like an optimization but introduces subtle vulnerabilities. While fuzzing quickly finds the bug, the write-up derives the root cause via detailed source analysis. --- Background Google-zlib implements Deflate compression with bit-level data manipulation and optimizations that reduce readability. Core components relevant to the vulnerability: inflate.c: Main decompression state machine. inftrees.c: Constructs Huffman tables for decoding. inffast.c: High-speed Huffman decoding via inflatefast. 2-1. google-zlib-Increase-Huffman-Table-Size.patch Increases Huffman table sizes substantially (ENOUGHLENS and ENOUGHDISTS) removing some oversubscription checks to trade memory for speed. This patch removes critical checks leading to vulnerabilities related to Huffman tables. 2-2. Deflate Algorithm Recap Uses LZ77 (replace repeated strings with length/distance pairs) and Huffman coding (replaces data with variable-length bit codes). Huffman codes are prefix-free, cannot be prefixes of other codes. Deflate supports: Fixed Huffman tables: static predefined tables, no table included in stream. Dynamic Huffman tables: optimal per-data Huffman tables included as code lengths in compressed stream. Deflate compresses not only literal bytes but also length and distance values via Huffman coding. Huffman tables are stored as arrays of code lengths, from which actual codes can be reconstructed. --- Code Analysis 3-1. Inflate inflate() works as a finite-state machine decoding compressed Deflate streams in blocks. Each block can be: Stored (raw data) Fixed Huffman coded Dynamic Huffman coded Dynamic blocks include code length tables compressed themselves by a code Huffman table. The process: Read block header and type. For dynamic tables, read and decode the code length Huffman table. Decode Literal/Length and Distance Huffman tables. Call inflatefast() for efficient decoding when input/output buffer has enough data. Huffman codes are decoded with bit-level macros to pull bits: inflatefast() is optimized to avoid repetitive checks, decoding literals, lengths, and distances quickly. 3-2. Huffman Table Construction (inflatetable) Inputs: Code type (Code, Literal/Length, or Distance) Code lengths array Count of codes Pointers to output Huffman decode table The code struct holds decoding info: op: operation type or extra bits bits: number of bits in this code val: symbol or base value Tables are multi-leveled (root table + sub-tables) to handle codes longer than root bits. Construction steps: Count how many symbols have each code length. Determine min/max code lengths and set root bits accordingly. Sort symbols by code length and symbol order. Assign codes in bit-reversed order (due to LSB-first stream). Fill table entries with code structs for each code and replicate entries for unused indexes. op values meaning: 0: literal > 0 and < 32: length or distance with the number indicating extra bits 32: end of block 64: invalid code marker Base/extra arrays (lbase, lext for lengths and dbase, dext for distances) allow efficient representation of lengths/distances with extra bits. 3-3. Huffman Decode in inflate_fast Maintains a bit buffer hold and number of