Postmortem: Three Recent Infrastructure Issues Affecting Claude’s Response Quality

Published: Sep 17, 2025

Summary: This technical report explains three infrastructure bugs that intermittently degraded Claude's response quality between August and early September 2025. It details what went wrong, why the fixes took time, and the measures taken to prevent recurrence.

---

Overview of the Incident

Starting in early August, user feedback indicated degraded response quality from Claude. The initial reports were hard to distinguish from normal variation in feedback, but they grew in frequency by late August. Investigation uncovered three distinct infrastructure bugs, none of them related to demand or server load. Anthropic maintains a high bar for consistent model quality and did not meet it during these incidents. Because the issues were complex, this postmortem explains them in detail.

---

How Claude Is Served at Scale

Claude is served through the first-party API, Amazon Bedrock, and Google Cloud’s Vertex AI. Deployments span multiple hardware platforms: AWS Trainium, NVIDIA GPUs, and Google TPUs. Each platform requires its own optimizations but must deliver equivalent model quality, so every infrastructure change needs extensive validation across platforms and configurations.

---

Timeline of Events

The first bug was introduced on August 5 and affected about 0.8% of Sonnet 4 requests. Two more bugs appeared after deployments on August 25 and 26. A load balancing change on August 29 then increased the share of affected traffic, so that up to 16% of Sonnet 4 requests were impacted at the peak. User experiences were mixed and inconsistent, which complicated diagnosis.

---

Details of the Three Overlapping Bugs

Context Window Routing Error

Some Sonnet 4 requests were misrouted: short-context queries were sent to servers configured for the 1 million token context window. The impact was initially minor (about 0.8% of requests) and peaked at 16% of requests on August 31 because of the load balancing change. Claude Code users were particularly affected: about 30% had at least one misrouted message. Amazon Bedrock and Vertex AI saw only a small impact.
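
The misrouting described above can be illustrated with a minimal sketch. This is not Anthropic's routing code: the pool names, the `STANDARD_CONTEXT_LIMIT` threshold, and both functions are hypothetical and exist only to show the kind of check that separates a correctly routed request from a misrouted one.

```python
# Hypothetical pool names and threshold, assumed purely for illustration.
STANDARD_POOL = "standard"
LONG_CONTEXT_POOL = "long-context-1m"
STANDARD_CONTEXT_LIMIT = 200_000  # assumed token capacity of the standard pool


def choose_pool(prompt_tokens: int) -> str:
    """Return the pool a request should use, based only on its prompt length."""
    if prompt_tokens > STANDARD_CONTEXT_LIMIT:
        return LONG_CONTEXT_POOL
    return STANDARD_POOL


def is_misrouted(prompt_tokens: int, assigned_pool: str) -> bool:
    """Flag a short-context request that landed on a long-context server."""
    return (
        assigned_pool == LONG_CONTEXT_POOL
        and choose_pool(prompt_tokens) == STANDARD_POOL
    )


if __name__ == "__main__":
    # A 1,000-token request assigned to the 1M-context pool is misrouted.
    print(is_misrouted(1_000, LONG_CONTEXT_POOL))
    # The same request on the standard pool is routed correctly.
    print(is_misrouted(1_000, STANDARD_POOL))
```

A check of this shape could also run as a monitor over routing logs, counting the fraction of requests for which `is_misrouted` is true.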

Fix: The routing logic was corrected so that requests are assigned to the proper servers. The fix was deployed on Sept 4 and rolled out across platforms by Sept 16, with the Bedrock rollout still ongoing.

Output Corruption

A misconfiguration on TPU servers caused token generation errors starting August 25. Responses could contain unexpected tokens, such as Thai or Chinese characters in answers to English prompts, or syntax errors in code. The bug affected Opus 4.1 and Opus 4 (Aug 25-28) and Sonnet 4 (Aug 25-Sept 2). Third-party platforms were not affected.

Fix: The change was rolled back on Sept 2, and detection tests for abnormal character outputs were added.

Approximate Top-k XLA:TPU Miscompilation

A deployment on August 25 triggered a latent compiler bug affecting top-k token selection on TPUs using XLA. It affected Claude Haiku 3.5 and possibly some Sonnet 4 and Opus 3 requests. It was not present on third-party platforms. The bug led to incorrect token probability calculations, sometimes dropping the highest-probability tokens, which caused inconsistent output quality.

Fix: The change was rolled back for Haiku 3.5 (Sept 4) and Opus 3 (Sept 12). Sampling was switched from approximate to exact top-k with enhanced precision, and Anthropic is collaborating with the XLA:TPU compiler team on a permanent fix.

---

Technical Deep Dive: XLA Compiler Bug

Claude uses top-p sampling, computed across distributed TPU chips, to select tokens during generation. Mixed precision arithmetic (bf16 and fp32) introduced inconsistencies. A workaround deployed in December 2024 addressed tokens being dropped when sampling at temperature zero. The August 26 changes removed that workaround and in doing so exposed a deeper bug in the approximate top-k optimization on TPU: because of the latent compiler bug, the approximate method sometimes returned the wrong top tokens. The bug's behavior was inconsistent and varied with batch size, runtime conditions, and the use of debugging tools. Sampling was therefore switched to exact top-k, accepting a minor efficiency cost to preserve model quality.

---

Challenges in