
The Fix

Fixes data corruption in WebSocket compressed messages when send operations are cancelled: the fix holds a lock around the whole compress+write sequence and runs it under asyncio.shield(), so a cancelled caller can no longer leave the compressor in a half-sent state.

Based on closed aio-libs/aiohttp issue #11725; fixed in PR #11726.

Production note: This tends to surface only under concurrency. Reproduce with load tests and watch for lock contention/cancellation paths.
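The shield()+lock pattern can be sketched in a few lines. This is an illustrative class, not aiohttp's actual writer code; the names here are invented for the example:

```python
import asyncio
import zlib

class ShieldedCompressedSender:
    """Sketch of the shield()+lock pattern: compression and write happen
    atomically, so cancelling a caller cannot corrupt compressor state."""

    def __init__(self, write):
        self._lock = asyncio.Lock()          # serializes access to the compressor
        self._compressor = zlib.compressobj()
        self._write = write                  # async callable that puts bytes on the wire

    async def _compress_and_write(self, data: bytes) -> None:
        # Compressing mutates shared compressor state: once compression
        # starts, the frame MUST reach the wire or the next send leaks
        # stale bytes into its frame.
        payload = self._compressor.compress(data)
        payload += self._compressor.flush(zlib.Z_SYNC_FLUSH)
        await self._write(payload)

    async def send(self, data: bytes) -> None:
        async with self._lock:
            # shield() lets the inner compress+write run to completion
            # even if the task awaiting send() is cancelled mid-flight.
            await asyncio.shield(self._compress_and_write(data))
```

Note the two cancellation windows: if send() is cancelled while waiting on the lock, nothing has been compressed yet and the state stays clean; if it is cancelled after shielding, the frame still goes out whole.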

From the fix's changelog entry (the patched file is aiohttp/_websocket/writer.py):

"Fixed WebSocket compressed sends to be cancellation safe. Tasks are now shielded during compression to prevent compressor state corruption. This ensures that the stateful compressor remains consistent even when send operations are cancelled -- by :user:`bdraco`."

Why This Fix Works in Production

  • Trigger: a compressed WebSocket send is cancelled (for example by a timeout) after data has been compressed but before the frame is fully written.
  • Mechanism: the compressor is stateful and shared across sends, so a cancelled send leaves its internal state advanced past bytes that never reached the wire, corrupting every subsequent frame.
Production impact:
  • If left unfixed, failures are intermittent under concurrency and hard to reproduce: receivers sporadically get concatenated or undecodable messages, which can surface downstream as sporadic 5xx errors or timeouts.

Why This Breaks in Prod

  • Shows up under Python 3.13.7 in real deployments (not just unit tests).
  • Cancellation during a WebSocket send leaves the shared compressor in an inconsistent state, corrupting subsequent compressed frames.
  • Production symptom (often without a traceback): receivers see concatenated or garbled messages even when the application serializes its own sends -- the reporter "took very good care to lock our consecutive call" at the application level and still hit the bug.

Proof / Evidence

Discussion

High-signal excerpts from the issue thread (symptoms, repros, edge-cases).

“I think we can accept this as a bug or should document its not cancellation safe”
@bdraco · 2025-10-27 · repro detail · source
“So the whole API design is not cancel safe. We need to flush while holding the lock as well”
@bdraco · 2025-10-27 · source
“As soon as we compress data the internal state is modified.. we HAVE to send the data or the state is corrupt”
@bdraco · 2025-10-27 · source
“So the lock is in the wrong place. We need to have a lock around compress+write.”
@bdraco · 2025-10-27 · source

Failure Signature (Search String)

  • Note that we took very good care to lock our consecutive call to make sure there was no race at the application level.
  • We would expect that a send statement is "cancel safe" and that subsequent send statement do not use/leak previous data.
Copy-friendly signature
signature.txt
Failure Signature
-----------------
Note that we took very good care to lock our consecutive call to make sure there was no race at the application level.
We would expect that a send statement is "cancel safe" and that subsequent send statement do not use/leak previous data.

Error Message

Signature-only (no traceback captured)

Minimal Reproduction

repro.py (environment capture: the report's `pip show aiohttp` output; the reproduction script itself was not preserved here)

Name: aiohttp
Version: 3.13.1
Summary: Async http client/server framework (asyncio)
Home-page: https://github.com/aio-libs/aiohttp
License: Apache-2.0 AND MIT
Location: /Users/jonathanr/src/reprodeucer/js_ws_python/.venv/lib/python3.13/site-packages
Requires: aiohappyeyeballs, aiosignal, attrs, frozenlist, multidict, propcache, yarl

Environment

  • Python: 3.13.7
  • aiohttp: 3.13.1 (per the pip show output)

What Broke

Users experience concatenated messages on the receiving side due to send operation cancellations.

Why It Broke

Cancellation during a WebSocket send corrupts compressed data because the compressor state is shared across sends: as soon as data is compressed, the internal state is modified, so the bytes have to reach the wire or the state is corrupt and leaks into the next message.
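The statefulness is easy to demonstrate with zlib alone (illustrative code, not aiohttp's): once a message has been compressed, its bytes must reach the peer, because the next message's compressed frame only makes sense as a continuation of the same stream:

```python
import zlib

comp = zlib.compressobj()

# Message one is compressed, but its bytes are "lost" -- as if the
# send was cancelled after compression but before the write.
lost = comp.compress(b"message-one") + comp.flush(zlib.Z_SYNC_FLUSH)

# Message two is compressed with the SAME (now-advanced) compressor.
sent = comp.compress(b"message-two") + comp.flush(zlib.Z_SYNC_FLUSH)

# A receiver that only got `sent` cannot recover message two: the
# stream header and earlier compressor state went down with the lost frame.
d = zlib.decompressobj()
try:
    recovered = d.decompress(sent)
except zlib.error:
    recovered = None
# Either decompression errors out or yields garbage -- never b"message-two".
```

A receiver holding the full stream (`lost + sent`) decompresses both messages cleanly, which is exactly why the fix forces every compressed frame onto the wire.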

Fix Options (Details)

Option A — Apply the official fix

Fixes data corruption in WebSocket compressed messages when send operations are cancelled: the fix holds a lock around the whole compress+write sequence and runs it under asyncio.shield(), so a cancelled caller can no longer leave the compressor in a half-sent state.

When NOT to use: This fix should not be used if the application cannot tolerate slight changes in cancellation timing.

Option C — Temporary workaround

Wrap sends in asyncio.shield() at the call site, as described in the issue; this ensures the message is always fully sent even if the awaiting task is cancelled. The official fix applies the same shielding internally.

When NOT to use: This fix should not be used if the application cannot tolerate slight changes in cancellation timing.

Use only if you cannot change versions today. Treat this as a stopgap and remove once upgraded.
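The call-site workaround can be sketched as follows; `send_with_timeout` and `ws_send` are illustrative names, not aiohttp APIs, and this is a sketch under the assumption that your send calls are already serialized:

```python
import asyncio

async def send_with_timeout(ws_send, data, timeout):
    """Shield the send so a timeout cancels our *wait*, not the send itself.

    The frame is still written in full, keeping the shared compressor
    state consistent; only the caller stops waiting for it.
    """
    task = asyncio.ensure_future(ws_send(data))
    try:
        await asyncio.wait_for(asyncio.shield(task), timeout)
    except asyncio.TimeoutError:
        # The underlying send keeps running to completion in the background.
        pass
    return task  # callers may await this later to confirm delivery
```

The trade-off is exactly the "When NOT to use" caveat: cancellation timing changes, because the message is still sent even though the caller gave up waiting.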

Fix reference: https://github.com/aio-libs/aiohttp/pull/11726

Last verified: 2026-02-09. Validate in your environment.


When NOT to Use This Fix

  • This fix should not be used if the application cannot tolerate slight changes in cancellation timing.

Verify Fix

verify
Re-run the minimal reproduction on your broken version, then apply the fix and re-run.


Prevention

  • Add a stress test that sends compressed WebSocket messages under high concurrency while cancelling sends (e.g. via timeouts), and fail it if the receiver ever sees concatenated or undecodable frames.
  • Enable watchdog dumps in prod (faulthandler, a stack-dump signal handler) to capture hangs and stuck locks quickly.
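For the watchdog bullet, Python's stdlib faulthandler provides on-demand stack dumps. The SIGUSR1 choice below is an assumption (any spare signal works, and signal registration is unavailable on Windows):

```python
import faulthandler
import signal
import sys

# Dump all thread stacks to stderr on fatal signals (SIGSEGV, SIGABRT, ...).
faulthandler.enable()

if sys.platform != "win32":
    # After this, `kill -USR1 <pid>` prints every thread's stack to stderr,
    # which is enough to spot a hung send or a lock that never releases.
    faulthandler.register(signal.SIGUSR1, all_threads=True)
```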

Related Issues

No related fixes found.

Sources

We don’t republish the full GitHub discussion text. Use the links above for context.