
The Fix

Refactor connection waiters to be cancellation safe, fixing a deadlock that could occur while attempting to get a new connection slot after a timeout.

Based on closed aio-libs/aiohttp issue #9670; fixed by PR #9671.

Production note: This tends to surface only under concurrency. Reproduce with load tests and watch for lock contention/cancellation paths.


Why This Fix Works in Production

  • Trigger: deadlock with TCPConnector limit after timeout
  • Mechanism: a timeout can cancel a request while its waiter future is still queued in the connector's self._waiters; the freed connection slot can then be handed to that already-cancelled waiter, so the slot is lost and later requests queue indefinitely ("sticky" timeouts)
Production impact:
  • If left unfixed, failures can be intermittent under concurrency (hard to reproduce; shows up as sporadic 5xx/timeouts).

Why This Breaks in Prod

  • Shows up under Python 3.12.7 in real deployments (not just unit tests).
  • A request cancelled by a timeout can leave a dead waiter future in TCPConnector's self._waiters; a freed connection slot handed to that waiter is lost, so new requests are queued but never actually sent
  • Production symptom (often without a traceback): deadlock with TCPConnector limit after timeout

Proof / Evidence

Discussion

High-signal excerpts from the issue thread (symptoms, repros, edge-cases).

“It would be great if you could come up with a reproducer without external dependencies as we will need to be able to create a…”
@bdraco · 2024-11-04 · source
“It looks like there are two race points the available and key in self._waiters --- could have cancelled futures there The ValueError suppression is a…”
@bdraco · 2024-11-04 · source
“Looks like its been a problem for a long time. reproducible on 3.9.5 as well”
@bdraco · 2024-11-04 · source
“I'm not sure its fixable with the current design. key in self._waiters its not atomic in respect to if the connection is available so its…”
@bdraco · 2024-11-04 · source

Failure Signature (Search String)

  • deadlock with TCPConnector limit after timeout
  • When using a limit in TCPConnector, timeouts can lead to a condition where new requests are not actually sent, resulting in "sticky" timeouts.
Copy-friendly signature
signature.txt
Failure Signature
-----------------
deadlock with TCPConnector limit after timeout

When using a limit in TCPConnector, timeouts can lead to a condition where new requests are not actually sent, resulting in "sticky" timeouts.

Error Message

Signature-only (no traceback captured)
error.txt
Error Message
-------------
deadlock with TCPConnector limit after timeout

When using a limit in TCPConnector, timeouts can lead to a condition where new requests are not actually sent, resulting in "sticky" timeouts.

Minimal Reproduction

repro.py
from fastapi import FastAPI
import asyncio
from time import time

app = FastAPI()

sleep_time = 0.3
timestamp_last_change = time()
last_recv = time()

@app.get("/getme")
async def getme():
    global sleep_time, last_recv
    if time() - last_recv > 8:
        # Go "up", forcing 0 sleep time if we don't know anymore about clients
        sleep_time = 0
    elif time() - timestamp_last_change > 2:
        if sleep_time:
            sleep_time = 0
        else:
            sleep_time = 0.3
    print(f"recv {sleep_time=:.3f}")
    last_recv = time()
    await asyncio.sleep(sleep_time)
    print(f"fin {sleep_time=:.3f}")
    return {
        "asdf": "asdf",
    }

Environment

  • Python: 3.12.7

What Broke

Requests time out but are never actually sent; once the connector's limit is consumed by lost slots, every subsequent request hangs, which presents as an outage in production environments.

Why It Broke

A timeout can cancel a request while its waiter future is still queued in TCPConnector's self._waiters. The freed connection slot may then be handed to the already-cancelled waiter, so the slot is lost and subsequent requests queue forever: the "sticky" timeout described in the issue.
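The race can be sketched with a stdlib-only toy pool. This is illustrative code for the mechanism only, not aiohttp's actual connector; ToyPool and its methods are invented for the example:

```python
import asyncio
from collections import deque

class ToyPool:
    # Toy stand-in for a limited connection pool. The release path hands
    # the freed slot to the first queued waiter without handling the case
    # where that waiter was already cancelled by a timeout.
    def __init__(self, limit: int = 1):
        self._limit = limit
        self._in_use = 0
        self._waiters: deque[asyncio.Future] = deque()

    async def acquire(self) -> None:
        if self._in_use < self._limit:
            self._in_use += 1
            return
        fut = asyncio.get_running_loop().create_future()
        self._waiters.append(fut)
        await fut  # slot ownership is transferred in release()

    def release(self) -> None:
        if self._waiters:
            fut = self._waiters.popleft()
            if not fut.cancelled():
                fut.set_result(None)  # hand the slot to a live waiter
            # BUG: if the popped waiter was cancelled, the slot is handed
            # to nobody: _in_use is never decremented and no other waiter
            # is woken, so every later acquire() queues forever.
        else:
            self._in_use -= 1

async def main() -> bool:
    pool = ToyPool(limit=1)
    await pool.acquire()                    # take the only slot
    waiter = asyncio.create_task(pool.acquire())
    await asyncio.sleep(0)                  # let the waiter enqueue itself
    waiter.cancel()                         # simulate a client-side timeout
    await asyncio.sleep(0)
    pool.release()                          # freed slot is lost here
    late = asyncio.create_task(pool.acquire())
    done, _pending = await asyncio.wait({late}, timeout=0.1)
    late.cancel()
    return bool(done)                       # False: the pool is deadlocked

if __name__ == "__main__":
    print("late acquire completed:", asyncio.run(main()))
```

The "late acquire completed: False" result mirrors the production symptom: the pool still has capacity on paper, but no request ever gets a slot again.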

Fix Options (Details)

Option A — Apply the official fix

Refactor connection waiters to be cancellation safe, fixing a deadlock that could occur while attempting to get a new connection slot after a timeout.
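A minimal sketch of what "cancellation safe" means here, using the same toy-pool shape (illustrative only, not the actual aiohttp patch): the release path skips waiters that a timeout already cancelled, so the freed slot always reaches a live waiter or returns to the pool.

```python
import asyncio
from collections import deque

class FixedPool:
    # Toy pool whose release() is cancellation-safe: waiters already
    # cancelled by a timeout are skipped instead of being handed the
    # freed slot (illustrative sketch, not aiohttp's real code).
    def __init__(self, limit: int = 1):
        self._limit = limit
        self._in_use = 0
        self._waiters: deque[asyncio.Future] = deque()

    async def acquire(self) -> None:
        if self._in_use < self._limit:
            self._in_use += 1
            return
        fut = asyncio.get_running_loop().create_future()
        self._waiters.append(fut)
        await fut  # slot ownership is transferred in release()

    def release(self) -> None:
        while self._waiters:
            fut = self._waiters.popleft()
            if not fut.cancelled():
                fut.set_result(None)  # slot goes to a live waiter
                return
            # cancelled waiters are skipped; keep looking for a live one
        self._in_use -= 1  # nobody is waiting: return the slot to the pool

async def main() -> bool:
    pool = FixedPool(limit=1)
    await pool.acquire()                    # take the only slot
    waiter = asyncio.create_task(pool.acquire())
    await asyncio.sleep(0)                  # let the waiter enqueue itself
    waiter.cancel()                         # simulate a client-side timeout
    await asyncio.sleep(0)
    pool.release()                          # cancelled waiter is skipped
    late = asyncio.create_task(pool.acquire())
    done, _pending = await asyncio.wait({late}, timeout=0.1)
    return bool(done)                       # True: no deadlock

if __name__ == "__main__":
    print("late acquire completed:", asyncio.run(main()))
```

With the same timeout-then-release sequence as before, the late request now completes, which is the behavioral change the official fix delivers.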

When NOT to use: This fix should not be used if the application requires strict timeout handling without cancellation.

Fix reference: https://github.com/aio-libs/aiohttp/pull/9671

Last verified: 2026-02-09. Validate in your environment.


When NOT to Use This Fix

  • This fix should not be used if the application requires strict timeout handling without cancellation.

Verify Fix

verify
Re-run the minimal reproduction on your broken version, then apply the fix and re-run.


Prevention

  • Add a stress test that runs high-concurrency workloads and fails on thread dumps / blocked locks.
  • Enable watchdog dumps in prod (faulthandler, thread dump endpoint) to capture deadlocks quickly.
  • Make timeouts explicit and test them (unit + integration) to avoid silent behavior changes.
  • Instrument retries (attempt count + reason) and alert on spikes to catch dependency slowdowns.
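The watchdog bullet above can be covered with the stdlib faulthandler module; a minimal sketch:

```python
import faulthandler
import sys

# Arm a watchdog: if the process is still running after `timeout` seconds,
# dump every thread's stack to stderr; repeat=True re-arms it so a
# long-lived deadlock keeps producing evidence in the logs.
faulthandler.dump_traceback_later(timeout=30, repeat=True, file=sys.stderr)

# ... run the application ...

# Disarm during clean shutdown so a normal exit does not dump stacks.
faulthandler.cancel_dump_traceback_later()
```

In a real service you would arm this once at startup; the periodic stack dumps make a wedged connector pool visible without needing to reproduce the race locally.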

Related Issues

No related fixes found.

Sources

We don’t republish the full GitHub discussion text. Use the links above for context.