The Fix
pip install celery==5.3.0b2
Based on the closed celery/celery issue #6819; the fixing PR/commit is linked below.
Production note: This usually shows up under retries/timeouts. Treat it as a side-effect risk until you can verify behavior with a canary + real traffic.
```diff
@@ -229,6 +229,7 @@ def __init__(self, main=None, loader=None, backend=None,
         self._local = threading.local()
+        self._backend_cache = None
         self.clock = LamportClock()
```
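If your backend is thread-safe, the fixed release lets you opt in to sharing one backend instance (and one connection pool) across threads. A minimal sketch, assuming the `result_backend_thread_safe` setting introduced by the fix (verify the name against your Celery version's docs):

```python
# Minimal sketch: opt in to cross-thread backend sharing.
# Assumes celery>=5.3.0b2 and a thread-safe result backend.
from celery import Celery

app = Celery(
    "proj",
    broker="redis://localhost:6379/0",
    backend="redis://localhost:6379/1",
)

# Per the fix, this lets Celery reuse a single backend instance across
# threads instead of building one (with its own pool) per thread.
app.conf.result_backend_thread_safe = True
```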
Why This Fix Works in Production
- Trigger: Redis result backend connections leak
- Mechanism: the backend object is kept in thread-local storage, so each new thread in a threaded server builds its own backend instance and Redis connection pool, and those connections are never closed
- Why the fix works: it lets users opt in to sharing the backend object across threads when the backend is thread-safe, so one connection pool is reused instead of one leaking per thread (first fixed release: 5.3.0b2).
- If left unfixed, leaked connections accumulate until Redis hits its client limit or the host runs out of file descriptors, at which point task submission and result fetching start failing.
Why This Breaks in Prod
- The backend is created per thread via thread-local storage; short-lived request threads orphan their Redis connections when they exit (see the sketch below)
- Production symptom (often without a traceback): Redis result backend connections leak, visible as a steadily climbing connected-client count
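A minimal sketch of the failure pattern (illustrative stand-in names, not Celery's actual internals):

```python
import threading

class LeakyApp:
    """Illustrates why per-thread backend storage leaks connections."""

    def __init__(self):
        self._local = threading.local()

    @property
    def backend(self):
        # Every thread sees an empty thread-local slot, so every thread
        # builds its own backend (a stand-in object here; a RedisBackend
        # with its own connection pool in Celery). When a short-lived
        # request thread exits, its pool is orphaned with sockets open.
        if not hasattr(self._local, "backend"):
            self._local.backend = object()
        return self._local.backend

app = LeakyApp()
threads = [threading.Thread(target=lambda: app.backend) for _ in range(8)]
for t in threads:
    t.start()
    t.join()
# Eight threads -> eight backend instances; none is ever closed.
```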
Proof / Evidence
- GitHub issue: #6819
- Fix PR: https://github.com/celery/celery/pull/8058
- First fixed release: 5.3.0b2
- Reproduced locally: No (not executed)
- Last verified: 2026-02-09
- Confidence: 0.75
- Did this fix it?: Yes (upstream fix exists)
- Own content ratio: 0.61
Discussion
High-signal excerpts from the issue thread (symptoms, repros, edge-cases).
“@matusvalo there is a repro of the leak available here: https://github.com/LivePreso/redis-leak”
“I've confirmed the repro at https://github.com/LivePreso/redis-leak still holds with the new 5.2.0 release.”
“An update: I tried making oid and backend properties of celery shared between all threads by using if-lock-if but got into trouble as I think…”
“G'day folks, we've been seeing what I think is the same issue”
Failure Signature (Search String)
- Redis result backend connections leak
Copy-friendly signature
Failure Signature
-----------------
Redis result backend connections leak
Error Message
Signature-only (no traceback captured)
Error Message
-------------
Redis result backend connections leak
Minimal Reproduction
```python
from django.http import HttpResponse
from redis_leak.celery import debug_task
import threading

_local = threading.local()

def run_task(request):
    # Dispatching the task and fetching its result is what touches the
    # backend; uncomment these two lines to trigger the leak:
    # res = debug_task.apply_async()
    # result = res.get()
    print("\n start request")
    print("get_ident", threading.get_ident())
    print("current_thread", threading.current_thread())
    # The prints show each request can land on a fresh thread, i.e. an
    # empty thread-local, which is exactly how the backend leaks.
    if hasattr(_local, "attr"):
        print("_local has attr", _local.attr)
    else:
        _local.attr = threading.get_ident()
        print("_local has no attr, store", _local.attr, "as attr on _local")
    return HttpResponse("foo")
```
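To exercise the view from many short-lived request threads, a hypothetical driver (assuming the Django dev server on localhost:8000, a matching URL route, and the task lines above uncommented):

```python
# Hypothetical load driver for the repro endpoint.
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

URL = "http://localhost:8000/run-task/"  # adjust to your URLconf

def hit(_):
    with urlopen(URL) as resp:
        return resp.status

with ThreadPoolExecutor(max_workers=16) as pool:
    statuses = list(pool.map(hit, range(200)))
print("responses received:", len(statuses))
```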
What Broke
Connections to Redis were not being closed, causing resource exhaustion.
Why It Broke
The result backend was kept in thread-local storage, so each request-handling thread created its own backend instance and Redis connection pool; those connections were never closed when the thread exited.
Fix Options (Details)
Option A — Upgrade to fixed release (safe default, recommended)
pip install celery==5.3.0b2
Use when you can deploy the upstream fix. It is usually lower-risk than long-lived workarounds.
Option C — Workaround (temporary)
Setting the `timeout` directive in redis.conf ([see here](https://redis.io/topics/clients#client-timeouts)) seems to make Redis eventually close the stale connections from its side; a sample setting follows below.
Use only if you cannot change versions today. Treat this as a stopgap and remove it once upgraded.
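For example (value in seconds; 0, the default, disables idle timeouts):

```
# redis.conf: close client connections idle for more than 5 minutes
timeout 300
```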
Option D — Guard side-effects with OnceOnly (guardrail for side-effects)
Mitigate duplicate external side-effects under retries/timeouts/agent loops by gating the operation before calling external systems.
- Place OnceOnly between your code/agent and real side-effects (Stripe, emails, CRM, APIs).
- Use a stable key per side-effect (e.g., customer_id + action + idempotency_key).
- Fail-safe: configure fail-open vs fail-closed based on blast radius and spend risk.
- This does NOT fix data corruption; it only prevents duplicate side-effects.
Example snippet:
```python
from onceonly import OnceOnly
import os

once = OnceOnly(api_key=os.environ["ONCEONLY_API_KEY"], fail_open=True)

def process_webhook(event_id):
    # Stable idempotency key per real side-effect.
    # Use a request id / job id / webhook delivery id / Stripe event id, etc.
    key = f"stripe:webhook:{event_id}"
    res = once.check_lock(key=key, ttl=3600)
    if res.duplicate:
        return {"status": "already_processed"}
    # Safe to execute the side-effect exactly once.
    return handle_event(event_id)
```
Fix reference: https://github.com/celery/celery/pull/8058
First fixed release: 5.3.0b2
Last verified: 2026-02-09. Validate in your environment.
When NOT to Use This Fix
- Do not enable cross-thread backend sharing if your result backend is not thread-safe.
- Do not use this to hide logic bugs or data corruption. Use it to block duplicate external side-effects and enforce tool permissions/spend caps.
Verify Fix
Re-run the minimal reproduction on your broken version, then apply the fix and re-run.
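One way to verify, assuming redis-py and a local Redis instance, is to watch `connected_clients` while the repro runs; the count should climb on the broken version and stay flat after the fix:

```python
# Watch Redis's client count while the repro endpoint is being hit.
# Assumes redis-py is installed and Redis listens on localhost:6379.
import time

import redis

r = redis.Redis()
for _ in range(12):
    print("connected_clients:", r.info("clients")["connected_clients"])
    time.sleep(5)
```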
Prevention
- Capture the exact failing error string in logs and tests so you can reproduce via a minimal script.
- Pin production dependencies and upgrade only with a reproducible test that hits the failing path; a sketch follows below.
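A minimal regression-test sketch for the second point, assuming redis-py, a running worker, and the repro's `debug_task` (names are illustrative):

```python
# Regression guard: fail if result-backend connections grow without bound.
import redis

from redis_leak.celery import debug_task  # the repro project's task

def test_result_backend_does_not_leak_connections():
    r = redis.Redis()
    before = r.info("clients")["connected_clients"]
    for _ in range(50):
        debug_task.apply_async().get(timeout=10)
    after = r.info("clients")["connected_clients"]
    # A small fixed-size pool is fine; unbounded growth is the bug.
    assert after - before < 10
```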
Version Compatibility Table
| Version | Status |
|---|---|
| 5.2.x and earlier | Affected (leak reproduced on 5.2.0 per the issue thread) |
| 5.3.0b2 and later | Fixed |
Related Issues
No related fixes found.
Sources
We don’t republish the full GitHub discussion text. Use the links above for context.