
"In August and early September 2025, users of Anthropic's Claude AI began reporting degraded or inconsistent responses. What initially appeared as normal performance variation turned out to be three distinct infrastructure bugs affecting Claude's output quality. While none of these issues were caused by heavy load or demand, each bug emerged in the underlying infrastructure, routing logic, or compilation pipelines."
"The team described the three overlapping issues: a context window routing error that, at the worst impacted hour on August 31, affected 16% of Sonnet 4 requests; an output corruption caused by a misconfiguration to the Claude API TPU servers that triggered an error during token generation, affecting requests made to Opus 4.1 and Opus 4 on August 25-28 and requests to Sonnet 4 from August 25 to September 2; and finally, an approximate top-k XLA:TPU miscompilation due to a latent bug in the compiler that affected requests to Claude Haiku 3.5 for almost two weeks."
"We never reduce model quality due to demand, time of day, or server load. The problems our users reported were due to infrastructure bugs alone (...) Each bug produced different symptoms on different platforms at different rates. This created a confusing mix of reports that didn't point to any single cause."
In August and early September 2025, users reported degraded or inconsistent Claude responses. An investigation found three separate, overlapping infrastructure bugs that lowered output quality; none was related to demand or server load. The first was a context-window routing error that affected 16% of Sonnet 4 requests at the worst-impacted hour on August 31. The second was a TPU server misconfiguration that corrupted output for Opus 4.1 and Opus 4 (August 25-28) and for Sonnet 4 (August 25 to September 2). The third was an approximate top-k XLA:TPU miscompilation, caused by a latent compiler bug, that affected Claude Haiku 3.5 for almost two weeks. Claude runs on AWS Trainium, NVIDIA GPUs, and Google TPUs, each of which requires platform-specific optimizations. The issues have been resolved, and internal processes were updated to reduce recurrence.
Read at InfoQ