
The Fix

pip install celery==5.3.0b2

Based on closed celery/celery issue #6819 · PR/commit linked

Production note: This usually shows up under retries/timeouts. Treat it as a side-effect risk until you can verify behavior with a canary + real traffic.

```diff
@@ -229,6 +229,7 @@ def __init__(self, main=None, loader=None, backend=None,
         self._local = threading.local()
+        self._backend_cache = None
         self.clock = LamportClock()
```
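A hedged reading of the diff: the new `_backend_cache` attribute gives the app a place to cache one shared backend instead of rebuilding a backend (and its connection pool) per thread. A minimal sketch of that pattern, where `App`, `_make_backend`, and `thread_safe_backend` are illustrative names, not Celery internals:

```python
import threading


class App:
    """Illustrative sketch (not Celery's real code) of the pattern the
    one-line change enables: cache a single backend instead of building
    one per thread, since each backend owns its own connection pool."""

    def __init__(self, thread_safe_backend=False):
        self._local = threading.local()   # per-thread storage (default path)
        self._backend_cache = None        # shared cache (opt-in path)
        self.thread_safe_backend = thread_safe_backend

    def _make_backend(self):
        # Stands in for constructing a real result-backend instance.
        return object()

    @property
    def backend(self):
        if self.thread_safe_backend:
            # Opt-in: one backend (one connection pool) shared by all threads.
            if self._backend_cache is None:
                self._backend_cache = self._make_backend()
            return self._backend_cache
        # Default: a fresh backend per thread -> N threads, N pools.
        if not hasattr(self._local, "backend"):
            self._local.backend = self._make_backend()
        return self._local.backend
```

Under the default path, every new thread that touches `backend` constructs a new one; the opt-in path returns the same cached object to all threads.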

Why This Fix Works in Production

  • Trigger: Redis result backend connections leak
  • Mechanism: The Redis backend was not properly managing connection lifecycles, leading to leaks
  • Why the fix works: Addresses the Redis connection leak issue by allowing users to opt-in to sharing the backend object across threads if the backend is thread safe. (first fixed release: 5.3.0b2).
Production impact:
  • If left unfixed, leaked connections accumulate until Redis hits its client limit or the host exhausts file descriptors, at which point task-result operations start failing.

Why This Breaks in Prod

  • The Redis backend was not properly managing connection lifecycles, leading to leaks
  • Production symptom (often without a traceback): Redis result backend connections leak

Proof / Evidence

  • GitHub issue: #6819
  • Fix PR: https://github.com/celery/celery/pull/8058
  • First fixed release: 5.3.0b2
  • Reproduced locally: No (not executed)
  • Last verified: 2026-02-09
  • Confidence: 0.75
  • Did this fix it?: Yes (upstream fix exists)
  • Own content ratio: 0.61

Discussion

High-signal excerpts from the issue thread (symptoms, repros, edge-cases).

“@matusvalo there is a repro of the leak available here: https://github.com/LivePreso/redis-leak”
@bennullgraham · 2022-03-29 · repro detail · source
“I've confirmed the repro at https://github.com/LivePreso/redis-leak still holds with the new 5.2.0 release.”
@bennullgraham · 2021-11-16 · repro detail · source
“An update: I tried making oid and backend properties of celery shared between all threads by using if-lock-if but got into trouble as I think…”
@ronlut · 2021-06-30 · source
“G'day folks, we've been seeing what I think is the same issue”
@bennullgraham · 2021-11-03 · repro detail · source

Failure Signature (Search String)

  • Redis result backend connections leak
Copy-friendly signature
signature.txt
Failure Signature
-----------------
Redis result backend connections leak

Error Message

Signature-only (no traceback captured)
error.txt
Error Message
-------------
Redis result backend connections leak

Minimal Reproduction

repro.py
```python
from django.http import HttpResponse
from redis_leak.celery import debug_task
import threading

_local = threading.local()


def run_task(request):
    # res = debug_task.apply_async()
    # result = res.get()
    print("\n start request")
    print("get_ident", threading.get_ident())
    print("current_thread", threading.current_thread())
    if hasattr(_local, "attr"):
        print("_local has attr", _local.attr)
    else:
        _local.attr = threading.get_ident()
        print("_local has no attr, store", _local.attr, "as attr on _local")
    return HttpResponse("foo")
```
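A framework-free sketch of the mechanism the repro exercises: every new server thread sees an empty `threading.local`, so per-thread state (such as a backend and its connections) is re-created once per thread. All names below are illustrative:

```python
import threading

_local = threading.local()
seen = []
barrier = threading.Barrier(3)


def handler():
    barrier.wait()  # keep all three threads alive at the same time
    # Each thread finds _local empty and stores its own value, just like
    # the per-thread backend object in the leak.
    if not hasattr(_local, "attr"):
        _local.attr = threading.get_ident()
    seen.append(_local.attr)


threads = [threading.Thread(target=handler) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Three threads -> three distinct per-thread values.
assert len(set(seen)) == 3
```

In a threaded web server the same thing happens per worker thread, and each per-thread backend brings its own Redis connections with it.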

What Broke

Connections to Redis were not being closed, causing resource exhaustion.

Why It Broke

The Redis backend was not properly managing connection lifecycles, leading to leaks

Fix Options (Details)

Option A — Upgrade to fixed release (safe default, recommended)

pip install celery==5.3.0b2

When NOT to use: do not opt in to sharing the backend across threads unless your result backend is thread-safe; with a non-thread-safe backend, leave the sharing behavior off.

Use when you can deploy the upstream fix. It is usually lower-risk than long-lived workarounds.
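After upgrading, the new behavior is opt-in via configuration. A sketch, assuming the `result_backend_thread_safe` setting shipped alongside this fix (verify the exact name against your Celery version's configuration reference):

```python
from celery import Celery

app = Celery(
    "proj",
    broker="redis://localhost:6379/0",
    backend="redis://localhost:6379/1",
)

# Opt in to sharing one backend (and one connection pool) across threads.
# Only enable this if your result backend is thread-safe.
app.conf.result_backend_thread_safe = True
```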

Option C — Workaround (temporary)

Set the ``timeout`` directive in redis.conf ([see here](https://redis.io/topics/clients#client-timeouts)) so that Redis eventually closes the stale connections from its side.

When NOT to use: this does not stop the leak at its source; connections still accumulate between Redis-side closes, and an aggressive timeout can drop legitimately idle clients.

Use only if you cannot change versions today. Treat this as a stopgap and remove once upgraded.
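The workaround amounts to one directive in redis.conf; the value below is illustrative, and should exceed your longest legitimately idle client connection:

```
# redis.conf — close client connections idle for more than 300 seconds.
# The default, timeout 0, never closes idle connections.
timeout 300
```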

Option D — Guard side-effects with OnceOnly (guardrail for side-effects)

Mitigate duplicate external side-effects under retries/timeouts/agent loops by gating the operation before calling external systems.

  • Place OnceOnly between your code/agent and real side-effects (Stripe, emails, CRM, APIs).
  • Use a stable key per side-effect (e.g., customer_id + action + idempotency_key).
  • Fail-safe: configure fail-open vs fail-closed based on blast radius and spend risk.
  • This does NOT fix data corruption; it only prevents duplicate side-effects.
onceonly.py
```python
import os

from onceonly import OnceOnly

once = OnceOnly(api_key=os.environ["ONCEONLY_API_KEY"], fail_open=True)


def process_event(event_id):
    # Stable idempotency key per real side-effect.
    # Use a request id / job id / webhook delivery id / Stripe event id, etc.
    key = f"stripe:webhook:{event_id}"

    res = once.check_lock(key=key, ttl=3600)
    if res.duplicate:
        return {"status": "already_processed"}

    # Safe to execute the side-effect exactly once.
    handle_event(event_id)
```

See OnceOnly SDK

When NOT to use: Do not use this to hide logic bugs or data corruption. Use it to block duplicate external side-effects and enforce tool permissions/spend caps.

Fix reference: https://github.com/celery/celery/pull/8058

First fixed release: 5.3.0b2

Last verified: 2026-02-09. Validate in your environment.


When NOT to Use This Fix

  • This fix should not be used if the backend is not thread-safe.
  • Do not use this to hide logic bugs or data corruption. Use it to block duplicate external side-effects and enforce tool permissions/spend caps.

Verify Fix

verify
Re-run the minimal reproduction on your broken version, then apply the fix and re-run.


Prevention

  • Capture the exact failing error string in logs and tests so you can reproduce via a minimal script.
  • Pin production dependencies and upgrade only with a reproducible test that hits the failing path.
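The pinning advice above, as a requirements fragment (the version is the first fixed release cited on this page):

```
# requirements.txt — pin the exact release; bump only after the minimal
# repro passes against the candidate version.
celery==5.3.0b2
```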

Version Compatibility Table

| Version | Status |
| --- | --- |
| 5.3.0b2 | Fixed |

Related Issues

No related fixes found.

Sources

We don’t republish the full GitHub discussion text. Use the links above for context.