
The Fix

Bump pycares from 4.8.0 to 4.9.0 to resolve hanging tests in test_leaks.py.

Based on closed aio-libs/aiohttp issue #11244 · PR/commit linked

Production note: This usually shows up under retries/timeouts. Treat it as a side-effect risk until you can verify behavior with a canary + real traffic.

@@ -36,7 +36,7 @@ propcache==0.3.2
     # -r requirements/runtime-deps.in
     # yarl
-pycares==4.8.0
+pycares==4.9.0
     # via aiodns
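
To confirm which pycares version an environment actually resolved after the bump, a small check like the following can help. It is a minimal sketch: the helper names `parse_version` and `pycares_is_at_least` are illustrative and not part of aiohttp or pycares, and the tuple comparison assumes plain numeric `X.Y.Z` version strings.

```python
from importlib import metadata


def parse_version(text):
    """Turn a plain numeric version such as '4.9.0' into the tuple (4, 9, 0)."""
    return tuple(int(part) for part in text.split("."))


def pycares_is_at_least(minimum="4.9.0", installed=None):
    """Return True or False, or None when pycares is not installed.

    `installed` can be passed explicitly (useful in tests); otherwise the
    version is read from the installed distribution metadata.
    """
    if installed is None:
        try:
            installed = metadata.version("pycares")
        except metadata.PackageNotFoundError:
            return None
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    print("pycares >= 4.9.0:", pycares_is_at_least())
```

Running the file as a script prints the result for the current environment. Pre-release or local version suffixes would need a real version parser.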

Why This Fix Works in Production

  • Trigger: Some tests from test_leaks.py hang with pycares 4.9
  • Mechanism: Bump pycares from 4.8.0 to 4.9.0 to resolve hanging tests in test_leaks.py.
Production impact:
  • If left unfixed, retry loops can amplify load and turn a small outage into a cascade (thundering herd).
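
One common way to keep retry loops from amplifying load is capped exponential backoff with jitter. The sketch below is illustrative only and does not come from the aiohttp issue: the function name `backoff_delays` and the default `base` and `cap` values are assumptions chosen for the example.

```python
import random


def backoff_delays(attempts, base=0.5, cap=30.0, rng=random.random):
    """Yield one delay per retry: capped exponential backoff with full jitter.

    `rng` must return a float in [0, 1); it is injectable so the schedule
    can be made deterministic in tests.
    """
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        # Full jitter: pick a random point below the ceiling so that many
        # clients retrying at once do not line up on the same instant.
        yield rng() * ceiling


if __name__ == "__main__":
    for delay in backoff_delays(5):
        print(f"sleep {delay:.2f}s before the next attempt")
```

The randomization is what spreads retries out; the cap bounds the worst-case wait.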

Why This Breaks in Prod

  • Reported on Python 3.11.13, in a pytest run inside an rpmbuild package build (see the rootdir in the reproduction log).
  • Production symptom (often without a traceback): Some tests from test_leaks.py hang with pycares 4.9

Proof / Evidence

Discussion

High-signal excerpts from the issue thread (symptoms, repros, edge-cases).

“We'll probably need some more information than this, given that the test suite successfully upgraded to 4.9 already: https://github.com/aio-libs/aiohttp/pull/11206”
@Dreamsorcerer · 2025-06-26 · source
“Well, then I will investigate more. Currently I am AFK, but I will take a look at some point.”
@MeggyCal · 2025-06-26 · source
“Hi, after some investigation I found out what caused this issue”
@tiltingpenguin · 2025-07-24 · source

Failure Signature (Search String)

  • Some tests from test_leaks.py hang with pycares 4.9
  • Run tests in test_leaks.py

Error Message

Signature-only (no traceback captured)

Minimal Reproduction

repro.py
[ 57s] + pytest-3.11 --ignore=_build.python311 --ignore=_build.python312 --ignore=_build.python313 -v --timeout=10 tests/test_leaks.py
[ 58s] ============================= test session starts ==============================
[ 58s] platform linux -- Python 3.11.13, pytest-8.3.5, pluggy-1.6.0 -- /usr/bin/python3.11
[ 58s] cachedir: .pytest_cache
[ 58s] rootdir: /home/abuild/rpmbuild/BUILD/python-aiohttp-3.12.13-build/aiohttp-3.12.13
[ 58s] configfile: setup.cfg
[ 58s] plugins: cov-5.0.0, mock-3.14.0, timeout-2.3.1, xdist-3.6.1, time-machine-2.16.0
[ 58s] timeout: 10.0s
[ 58s] timeout method: signal
[ 58s] timeout func_only: False
[ 58s] created: 4/4 workers
[ 58s] 4 workers [2 items]
[ 58s]
[ 58s] scheduling tests via LoadScheduling
[ 58s]
[ 58s] tests/test_leaks.py::test_leak[check_for_client_response_leak.py-ClientResponse leaked]
[ 59s] tests/test_leaks.py::test_leak[check_for_request_leak.py-Request leaked]
(after a long long time the VM would be killed for inactivity, no useful logs)

Environment

  • Python: 3.11.13

What Broke

Tests hang indefinitely, leading to VM termination due to inactivity.

Fix Options (Details)

Option A — Apply the official fix

Bump pycares from 4.8.0 to 4.9.0 to resolve hanging tests in test_leaks.py.

When NOT to use: This fix is not safe if the application relies on maximum threading for performance.

Option D — Guard side-effects with the OnceOnly guardrail

Mitigate duplicate external side-effects under retries/timeouts/agent loops by gating the operation before calling external systems.

  • Place OnceOnly between your code/agent and real side-effects (Stripe, emails, CRM, APIs).
  • Use a stable key per side-effect (e.g., customer_id + action + idempotency_key).
  • Fail-safe: configure fail-open vs fail-closed based on blast radius and spend risk.
  • This is most useful when retries/timeouts can re-trigger the same external call.
onceonly.py
import os

from onceonly import OnceOnly

once = OnceOnly(api_key=os.environ["ONCEONLY_API_KEY"], fail_open=True)


def process_event(event_id):
    # Stable idempotency key per real side-effect.
    # Use a request id / job id / webhook delivery id / Stripe event id, etc.
    key = f"stripe:webhook:{event_id}"

    res = once.check_lock(key=key, ttl=3600)
    if res.duplicate:
        return {"status": "already_processed"}

    # Not a duplicate: safe to execute the side-effect.
    # handle_event is your own handler; it is not defined in this snippet.
    handle_event(event_id)
    return {"status": "processed"}


process_event("evt_...")  # replace with the real event id


When NOT to use: Do not use this to hide logic bugs or data corruption. Use it to block duplicate external side-effects and enforce tool permissions/spend caps.

Fix reference: https://github.com/aio-libs/aiohttp/pull/11206

Last verified: 2026-02-09. Validate in your environment.

When NOT to Use This Fix

  • This fix is not safe if the application relies on maximum threading for performance.
  • Do not use this to hide logic bugs or data corruption. Use it to block duplicate external side-effects and enforce tool permissions/spend caps.

Verify Fix

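Re-run the minimal reproduction (pytest on tests/test_leaks.py with --timeout=10) against the broken version and confirm that it hangs. Then apply the fix and re-run; both test_leak cases should now finish instead of stalling.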
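
Because the original failure produced no traceback and only ended when the VM was killed, it can help to wrap the re-run in a hard deadline so a hang is reported explicitly. The following is a minimal sketch, not tooling from the aiohttp project: `run_with_deadline` and the 300-second deadline are assumptions, and the pytest command is the one from the reproduction log without the build-specific `--ignore` flags (the `--timeout` option requires the pytest-timeout plugin).

```python
import subprocess
import sys


def run_with_deadline(cmd, deadline_s):
    """Run cmd and return 'passed', 'failed', or 'hung' if the deadline expires."""
    try:
        result = subprocess.run(cmd, timeout=deadline_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the direct child before raising. Processes
        # spawned by that child (e.g. xdist workers) may need separate cleanup.
        return "hung"
    return "passed" if result.returncode == 0 else "failed"


if __name__ == "__main__":
    cmd = [sys.executable, "-m", "pytest", "-v", "--timeout=10", "tests/test_leaks.py"]
    print(run_with_deadline(cmd, deadline_s=300))
```

Run it once before and once after the version bump, from the aiohttp source tree; a change from `hung` to `passed` is the signal to look for.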

Prevention

  • Make timeouts explicit and test them (unit + integration) to avoid silent behavior changes.
  • Instrument retries (attempt count + reason) and alert on spikes to catch dependency slowdowns.
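
The two points above can be combined in one small wrapper. This is a sketch under stated assumptions: `call_with_retries`, its parameters, and the logger name are hypothetical, and a real implementation would usually also wait between attempts.

```python
import asyncio
import logging

log = logging.getLogger("retry")


async def call_with_retries(make_call, *, timeout_s, max_attempts):
    """Await make_call() under an explicit timeout; log attempt count and reason."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await asyncio.wait_for(make_call(), timeout=timeout_s)
        except asyncio.TimeoutError:
            reason = "timeout"
        except OSError as exc:
            reason = f"os_error:{exc!r}"
        # One log line per failed attempt gives alerting something to count.
        log.warning("attempt=%d/%d failed reason=%s", attempt, max_attempts, reason)
    raise RuntimeError(f"gave up after {max_attempts} attempts")


if __name__ == "__main__":
    async def quick():
        await asyncio.sleep(0.01)
        return "ok"

    print(asyncio.run(call_with_retries(quick, timeout_s=1.0, max_attempts=3)))
```

`make_call` is a zero-argument function returning a fresh awaitable, so each attempt starts a new operation rather than re-awaiting a finished one.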

Sources

We don’t republish the full GitHub discussion text. Use the links above for context.