The Fix
Bump pycares from 4.8.0 to 4.9.0 to resolve hanging tests in test_leaks.py.
Based on closed aio-libs/aiohttp issue #11244 · PR/commit linked
Production note: This usually shows up under retries/timeouts. Treat it as a side-effect risk until you can verify behavior with a canary + real traffic.
@@ -36,7 +36,7 @@ propcache==0.3.2
# -r requirements/runtime-deps.in
# yarl
-pycares==4.8.0
+pycares==4.9.0
# via aiodns
[ 57s] + pytest-3.11 --ignore=_build.python311 --ignore=_build.python312 --ignore=_build.python313 -v --timeout=10 tests/test_leaks.py
[ 58s] ============================= test session starts ==============================
[ 58s] platform linux -- Python 3.11.13, pytest-8.3.5, pluggy-1.6.0 -- /usr/bin/python3.11
[ 58s] cachedir: .pytest_cache
[ 58s] rootdir: /home/abuild/rpmbuild/BUILD/python-aiohttp-3.12.13-build/aiohttp-3.12.13
[ 58s] configfile: setup.cfg
[ 58s] plugins: cov-5.0.0, mock-3.14.0, timeout-2.3.1, xdist-3.6.1, time-machine-2.16.0
[ 58s] timeout: 10.0s
[ 58s] timeout method: signal
[ 58s] timeout func_only: False
[ 58s] created: 4/4 workers
[ 58s] 4 workers [2 items]
[ 58s]
[ 58s] scheduling tests via LoadScheduling
[ 58s]
[ 58s] tests/test_leaks.py::test_leak[check_for_client_response_leak.py-ClientResponse leaked]
[ 59s] tests/test_leaks.py::test_leak[check_for_request_leak.py-Request leaked]
(after a long long time the VM would be killed for inactivity, no useful logs)
Re-run the minimal reproduction on your broken version, then apply the fix and re-run.
Option A — Apply the official fix\nBump pycares from 4.8.0 to 4.9.0 to resolve hanging tests in test_leaks.py.\nWhen NOT to use: This fix is not safe if the application relies on maximum threading for performance.\n\n
Why This Fix Works in Production
- Trigger: Some tests from test_leaks.py hang with pycares 4.9
- Mechanism: Bump pycares from 4.8.0 to 4.9.0 to resolve hanging tests in test_leaks.py.
- If left unfixed, retry loops can amplify load and turn a small outage into a cascade (thundering herd).
Why This Breaks in Prod
- Shows up under Python 3.11.13 in real deployments (not just unit tests).
- Production symptom (often without a traceback): Some tests from test_leaks.py hang with pycares 4.9
Proof / Evidence
- GitHub issue: #11244
- Fix PR: https://github.com/aio-libs/aiohttp/pull/11206
- Reproduced locally: No (not executed)
- Last verified: 2026-02-09
- Confidence: 0.70
- Did this fix it?: Yes (upstream fix exists)
- Own content ratio: 0.46
Discussion
High-signal excerpts from the issue thread (symptoms, repros, edge-cases).
“We'll probably need some more information than this, given that the test suite successfully upgraded to 4.9 already: https://github.com/aio-libs/aiohttp/pull/11206”
“Well, then I will investigate more. Currently I am AFK, but I will take a look ať some point.”
“Hi, after some investigation I found out what caused this issue”
Failure Signature (Search String)
- Some tests from test_leaks.py hang with pycares 4.9
- Run tests in test_leaks.py
Copy-friendly signature
Failure Signature
-----------------
Some tests from test_leaks.py hang with pycares 4.9
Run tests in test_leaks.py
Error Message
Signature-only (no traceback captured)
Error Message
-------------
Some tests from test_leaks.py hang with pycares 4.9
Run tests in test_leaks.py
Minimal Reproduction
[ 57s] + pytest-3.11 --ignore=_build.python311 --ignore=_build.python312 --ignore=_build.python313 -v --timeout=10 tests/test_leaks.py
[ 58s] ============================= test session starts ==============================
[ 58s] platform linux -- Python 3.11.13, pytest-8.3.5, pluggy-1.6.0 -- /usr/bin/python3.11
[ 58s] cachedir: .pytest_cache
[ 58s] rootdir: /home/abuild/rpmbuild/BUILD/python-aiohttp-3.12.13-build/aiohttp-3.12.13
[ 58s] configfile: setup.cfg
[ 58s] plugins: cov-5.0.0, mock-3.14.0, timeout-2.3.1, xdist-3.6.1, time-machine-2.16.0
[ 58s] timeout: 10.0s
[ 58s] timeout method: signal
[ 58s] timeout func_only: False
[ 58s] created: 4/4 workers
[ 58s] 4 workers [2 items]
[ 58s]
[ 58s] scheduling tests via LoadScheduling
[ 58s]
[ 58s] tests/test_leaks.py::test_leak[check_for_client_response_leak.py-ClientResponse leaked]
[ 59s] tests/test_leaks.py::test_leak[check_for_request_leak.py-Request leaked]
(after a long long time the VM would be killed for inactivity, no useful logs)
Environment
- Python: 3.11.13
What Broke
Tests hang indefinitely, leading to VM termination due to inactivity.
Fix Options (Details)
Option A — Apply the official fix
Bump pycares from 4.8.0 to 4.9.0 to resolve hanging tests in test_leaks.py.
Option D — Guard side-effects with OnceOnly Guardrail for side-effects
Mitigate duplicate external side-effects under retries/timeouts/agent loops by gating the operation before calling external systems.
- Place OnceOnly between your code/agent and real side-effects (Stripe, emails, CRM, APIs).
- Use a stable key per side-effect (e.g., customer_id + action + idempotency_key).
- Fail-safe: configure fail-open vs fail-closed based on blast radius and spend risk.
- This is most useful when retries/timeouts can re-trigger the same external call.
Show example snippet (optional)
from onceonly import OnceOnly
import os
once = OnceOnly(api_key=os.environ["ONCEONLY_API_KEY"], fail_open=True)
# Stable idempotency key per real side-effect.
# Use a request id / job id / webhook delivery id / Stripe event id, etc.
event_id = "evt_..." # replace
key = f"stripe:webhook:{event_id}"
res = once.check_lock(key=key, ttl=3600)
if res.duplicate:
return {"status": "already_processed"}
# Safe to execute the side-effect exactly once.
handle_event(event_id)
Fix reference: https://github.com/aio-libs/aiohttp/pull/11206
Last verified: 2026-02-09. Validate in your environment.
When NOT to Use This Fix
- This fix is not safe if the application relies on maximum threading for performance.
- Do not use this to hide logic bugs or data corruption. Use it to block duplicate external side-effects and enforce tool permissions/spend caps.
Verify Fix
Re-run the minimal reproduction on your broken version, then apply the fix and re-run.
Did This Fix Work in Your Case?
Quick signal helps us prioritize which fixes to verify and improve.
Prevention
- Make timeouts explicit and test them (unit + integration) to avoid silent behavior changes.
- Instrument retries (attempt count + reason) and alert on spikes to catch dependency slowdowns.
Related Issues
No related fixes found.
Sources
We don’t republish the full GitHub discussion text. Use the links above for context.