The Fix
Refactor connection waiters to be cancellation safe, fixing a deadlock that could occur while attempting to get a new connection slot after a timeout.
Based on the closed aio-libs/aiohttp issue #9670; fixed upstream in PR #9671.
Production note: this tends to surface only under concurrency, when requests time out while waiting for a free connection slot. Reproduce with load tests and watch the cancellation and timeout paths.
Why This Fix Works in Production
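- Trigger: requests time out while waiting for a connection slot on a TCPConnector created with a limit.
- Mechanism: per the issue thread, the cancelled wait can leave the connector's waiter bookkeeping inconsistent, so a released slot is never handed to a live request and new requests are never sent.
- Fix: connection waiters are now cancellation safe, so a request that times out while waiting no longer strands a connection slot.
- If left unfixed, failures can be intermittent under concurrency (hard to reproduce; they show up as sporadic timeouts or 5xx errors).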
Why This Breaks in Prod
- Reported on Python 3.12.7 in a real deployment, not just in unit tests; one commenter reports it reproducible on 3.9.5 as well, so it is not a recent regression.
- Production symptom (often without a traceback): once slots are stranded, requests through the affected session keep timing out even after the upstream is healthy again ("sticky" timeouts).
Proof / Evidence
- GitHub issue: #9670
- Fix PR: https://github.com/aio-libs/aiohttp/pull/9671
- Reproduced locally: No (not executed)
- Last verified: 2026-02-09
- Confidence: 0.80
- Did this fix it?: Yes (upstream fix exists)
Discussion
High-signal excerpts from the issue thread (symptoms, repros, edge cases).
“It would be great if you could come up with a reproducer without external dependencies as we will need to be able to create a…”
“It looks like there are two race points the available and key in self._waiters --- could have cancelled futures there The ValueError suppression is a…”
“Looks like its been a problem for a long time. reproducible on 3.9.5 as well”
“I'm not sure its fixable with the current design. key in self._waiters its not atomic in respect to if the connection is available so its…”
Failure Signature (Search String)
- deadlock with TCPConnector limit after timeout
- When using a limit in TCPConnector, timeouts can lead to a condition where new requests are not actually sent, resulting in "sticky" timeouts.
Error Message
Signature-only (no traceback captured)
Minimal Reproduction
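# Server half of the reproduction (FastAPI).
# Run with e.g.: uvicorn repro_server:app  (the module name repro_server is an assumption)
from fastapi import FastAPI
import asyncio
from time import time

app = FastAPI()

# Per-request delay; alternates between 0 s ("fast") and 0.3 s ("slow") phases.
sleep_time = 0.3
timestamp_last_change = time()
last_recv = time()


@app.get("/getme")
async def getme():
    global sleep_time, last_recv, timestamp_last_change
    if time() - last_recv > 8:
        # Go "up", forcing 0 sleep time if we don't know anymore about clients
        sleep_time = 0
    elif time() - timestamp_last_change > 2:
        # Flip between the fast and slow phases roughly every 2 s.
        # Resetting the timestamp here is an assumption about the intent;
        # without it the delay flips on every request once 2 s have passed.
        timestamp_last_change = time()
        if sleep_time:
            sleep_time = 0
        else:
            sleep_time = 0.3
    print(f"recv {sleep_time=:.3f}")
    last_recv = time()
    await asyncio.sleep(sleep_time)
    print(f"fin {sleep_time=:.3f}")
    return {
        "asdf": "asdf",
    }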
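The reproduction above is only the server half. A client along the following lines could drive it: it sends many concurrent requests through one TCPConnector with a limit and a total timeout shorter than the server's 0.3 s slow phase, so requests keep getting cancelled while they wait for a slot. This is a hypothetical sketch, not the reporter's original script; the names hammer and URL, the default address, and every numeric default are assumptions.

```python
import asyncio
import time
from collections import Counter, defaultdict

import aiohttp

# Assumption: the FastAPI app is served by uvicorn on its default address.
URL = "http://127.0.0.1:8000/getme"


async def hammer(url: str = URL, limit: int = 10, concurrency: int = 50,
                 timeout_s: float = 0.2, duration_s: float = 20.0):
    """Send requests from `concurrency` workers through one limited connector.

    Returns {elapsed_second: Counter(ok=..., timeout=..., error=...)},
    bucketed by the second in which each request started.
    """
    connector = aiohttp.TCPConnector(limit=limit)      # caps concurrent connections
    timeout = aiohttp.ClientTimeout(total=timeout_s)   # whole operation, incl. slot wait
    buckets = defaultdict(Counter)
    start = time.monotonic()

    async with aiohttp.ClientSession(connector=connector, timeout=timeout) as session:

        async def worker() -> None:
            while (elapsed := time.monotonic() - start) < duration_s:
                try:
                    async with session.get(url) as resp:
                        await resp.read()
                    outcome = "ok"
                except asyncio.TimeoutError:
                    outcome = "timeout"
                except aiohttp.ClientError:
                    outcome = "error"
                buckets[int(elapsed)][outcome] += 1

        await asyncio.gather(*(worker() for _ in range(concurrency)))
    return buckets


if __name__ == "__main__":
    for second, counts in sorted(asyncio.run(hammer()).items()):
        print(f"{second:3d}s {dict(counts)}")
```

Reading the per-second output: during the server's slow phases every request should time out. The hypothesis this sketch is built to test is that on an affected aiohttp version the timeouts persist through later fast phases as slots leak, while on a fixed version ok counts return whenever the server is fast. This page has not executed it (Reproduced locally: No), so treat those expectations as unverified.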
Environment
- Python: 3.12.7
What Broke
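Once the connector's slots are stranded, requests through that session stay unresponsive and keep timing out even after the upstream recovers, which can turn a brief slowdown into a sustained production outage.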
Why It Broke
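Per the issue thread, TCPConnector's connection-slot bookkeeping was not cancellation safe. When a request timed out while waiting for a slot, its cancelled waiter future could be left behind in self._waiters, and the checks "is a slot available?" and "is this key in self._waiters?" were not atomic with respect to each other. A released slot could then go to a waiter that no longer existed, so new requests were never sent and kept timing out ("sticky" timeouts).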
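The failure mode can be illustrated with a toy slot limiter. This is a hypothetical model written for this page, not aiohttp's TCPConnector code; NaiveSlots, SafeSlots and timeout_then_retry are illustrative names. NaiveSlots hands a released slot straight to the first queued future and loses it if that future was already cancelled by a timeout. SafeSlots skips cancelled waiters on release, removes its own entry when cancelled (suppressing ValueError in case release() already popped it, echoing the thread's mention of ValueError suppression), and passes a slot on if it was granted one but cancelled before it could resume.

```python
import asyncio
from collections import deque
from contextlib import suppress


class NaiveSlots:
    """Toy connection-slot limiter with cancellation-UNSAFE waiters (hypothetical)."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.in_use = 0
        self.waiters = deque()  # futures of requests waiting for a slot

    async def acquire(self) -> None:
        if self.in_use < self.limit and not self.waiters:
            self.in_use += 1
            return
        fut = asyncio.get_running_loop().create_future()
        self.waiters.append(fut)
        await fut  # a timeout here leaves a cancelled future in the queue

    def release(self) -> None:
        if self.waiters:
            fut = self.waiters.popleft()
            if not fut.done():
                fut.set_result(None)  # hand the slot straight to the waiter
            # BUG: if fut was already cancelled, the slot is silently lost
        else:
            self.in_use -= 1


class SafeSlots(NaiveSlots):
    """Same toy limiter with cancellation-safe waiters (sketch)."""

    async def acquire(self) -> None:
        if self.in_use < self.limit and not self.waiters:
            self.in_use += 1
            return
        fut = asyncio.get_running_loop().create_future()
        self.waiters.append(fut)
        try:
            await fut
        except asyncio.CancelledError:
            if fut.done() and not fut.cancelled():
                # A slot was handed to us, but we were cancelled before
                # resuming: pass it on instead of leaking it.
                self.release()
            else:
                # Still queued: drop our entry; release() may already have popped it.
                with suppress(ValueError):
                    self.waiters.remove(fut)
            raise

    def release(self) -> None:
        # Only a live waiter may take the slot; skip cancelled ones.
        while self.waiters:
            fut = self.waiters.popleft()
            if not fut.done():
                fut.set_result(None)
                return
        self.in_use -= 1


async def timeout_then_retry(slots: NaiveSlots) -> bool:
    """Return True if a fresh acquire succeeds after one waiter timed out."""
    await slots.acquire()  # request A holds the only slot
    with suppress(asyncio.TimeoutError):
        # request B gives up while waiting for a slot
        await asyncio.wait_for(slots.acquire(), timeout=0.05)
    slots.release()  # request A finishes
    try:
        await asyncio.wait_for(slots.acquire(), timeout=0.05)  # request C
        return True
    except asyncio.TimeoutError:
        return False


if __name__ == "__main__":
    print("naive:", asyncio.run(timeout_then_retry(NaiveSlots(limit=1))))  # naive: False
    print("safe:", asyncio.run(timeout_then_retry(SafeSlots(limit=1))))    # safe: True
```

With limit=1, a single timed-out waiter is enough to wedge NaiveSlots permanently, which mirrors the "sticky" timeout symptom; the real connector's fix is more involved than this model.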
Fix Options (Details)
Option A — Apply the official fix
Refactor connection waiters to be cancellation safe, fixing a deadlock that could occur while attempting to get a new connection slot after a timeout.
Fix reference: https://github.com/aio-libs/aiohttp/pull/9671
Last verified: 2026-02-09. Validate in your environment.
When NOT to Use This Fix
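- The fix targets how the connector handles cancelled waiters, not timeout semantics. asyncio timeouts, including aiohttp's ClientTimeout, work by cancelling the waiting task, so this code path is hit whenever a request times out while waiting for a slot; the practical reason to hold off is being pinned to an aiohttp release you cannot upgrade yet.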
Verify Fix
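Run the minimal reproduction against your current (broken) aiohttp version with a client that uses TCPConnector with a limit and a total timeout shorter than the server's slow phase. Then upgrade to an aiohttp release that includes PR #9671 and re-run: requests should succeed again whenever the server is in its fast phase instead of timing out indefinitely.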
Prevention
- Add a stress test that runs high-concurrency workloads and fails on thread dumps / blocked locks.
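- Enable watchdog dumps in prod to capture deadlocks quickly: faulthandler or a thread dump endpoint for thread-level hangs, plus a dump of pending asyncio task stacks for event-loop code, where a deadlock like this one leaves the loop running but the affected tasks stuck.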
- Make timeouts explicit and test them (unit + integration) to avoid silent behavior changes.
- Instrument retries (attempt count + reason) and alert on spikes to catch dependency slowdowns.
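For asyncio code, a watchdog along these lines is a minimal stdlib sketch, assuming an asyncio application; dump_pending_tasks, watchdog and the 30 s interval are hypothetical choices. It prints the stack of every pending task, which in a waiter deadlock shows which requests are parked and where.

```python
import asyncio
import sys


def dump_pending_tasks(file=None) -> int:
    """Print the stack of every pending task except the caller; return the count."""
    out = file if file is not None else sys.stderr
    current = asyncio.current_task()
    count = 0
    for task in asyncio.all_tasks():
        if task is current or task.done():
            continue
        print(f"--- task {task.get_name()} ---", file=out)
        task.print_stack(file=out)
        count += 1
    return count


async def watchdog(interval_s: float = 30.0) -> None:
    """Hypothetical watchdog: dump pending task stacks every interval_s seconds.

    It runs on the same event loop. That works for a waiter deadlock, which
    leaves the loop itself running; it would not help if synchronous code
    blocked the loop thread (use faulthandler for that case).
    """
    while True:
        await asyncio.sleep(interval_s)
        dump_pending_tasks()


async def _demo() -> None:
    # A task waiting on an event nobody sets, standing in for a stuck request.
    async def stuck_request() -> None:
        await asyncio.Event().wait()

    stuck = asyncio.create_task(stuck_request(), name="stuck-request")
    await asyncio.sleep(0)  # let the task start and block
    dump_pending_tasks()
    stuck.cancel()


if __name__ == "__main__":
    asyncio.run(_demo())
```

In a service, start it once at startup with asyncio.create_task(watchdog()) and keep a reference to the returned task; dumping on a timer is the simplest design, and a production version might dump only when request latency or pending-task counts cross a threshold.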
Sources
We don’t republish the full GitHub discussion text. Use the links above for context.