The Fix

pip install celery==4.4.0rc5

Based on closed celery/celery issue #5483 · PR/commit linked

Diff from PR #4292 (per the issue thread, the change that broke the second step):

@@ -183,7 +183,7 @@ def close_database(self, **kwargs):
         for conn in self._db.connections.all():
             try:
-                conn.close()
+                conn.close_if_unusable_or_obsolete()
             except self.interface_errors:

Why This Fix Works in Production

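  • Trigger: After a fork, inherited Django DB connections are cleaned up in two steps: first the inherited connection's file descriptor is closed without network I/O, then the connection itself is closed. The first step is necessary so that the second does not close the DB connections of the parent process they were inherited from.
  • Mechanism: PR #4292 changed the second step from conn.close() to conn.close_if_unusable_or_obsolete(), which does not always close Django DB connections after a fork.
  • Why the fix works: The fixed release addresses the issue where Celery does not reuse DB connections with Django when `CONN_MAX_AGE` is set, which leads to high database load (first fixed release: 4.4.0rc5).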
Production impact:
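  • If left unfixed, Django DB connections in worker processes can remain in an invalid state, which shows up as errors such as 'mysql server has gone away' (reported in the issue thread), high database load, and potential timeouts.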

Why This Breaks in Prod

  • The method close_if_unusable_or_obsolete does not always close Django DB connections after a fork, so a worker process can keep a connection inherited from its parent.
  • Production symptom (often without a traceback): errors such as 'mysql server has gone away', as reported in the issue thread. The thread suggests setting CONN_MAX_AGE to 0 as an interim way to avoid triggering it.

Proof / Evidence

  • GitHub issue: #5483
  • Linked PR: https://github.com/celery/celery/pull/4292 (identified in the issue thread as the change that broke the second step)
  • First fixed release: 4.4.0rc5
  • Reproduced locally: No (not executed)
  • Last verified: 2026-02-09
  • Confidence: 0.85
  • Did this fix it?: Yes (upstream fix exists)

Discussion

High-signal excerpts from the issue thread (symptoms, repros, edge-cases).

“@Jokairui you need to undo the changes of #4116. Until this is fixed in celery, you can either set CONN_MAX_AGE to 0 to avoid triggering…”
@Chronial · 2019-04-26 · confirmation · source
“so, what exactly should we deal with this 'mysql server has gone away' bug? @Chronial @auvipy”
@Jokairui · 2019-04-26 · source
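
The interim mitigation from the first excerpt, setting CONN_MAX_AGE to 0, can be sketched as a Django settings fragment. The engine and database name below are placeholders and are not taken from the issue; only the CONN_MAX_AGE key reflects the thread.

```python
# settings.py (fragment). ENGINE and NAME are hypothetical placeholders.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "example_db",
        # 0 is Django's default: connections are closed at the end of each
        # request, which disables persistent connections.
        "CONN_MAX_AGE": 0,
    }
}
```

This trades connection reuse for safety, so it is best treated as a stopgap until the upgrade is deployed.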

Failure Signature (Search String)

  • The first step is necessary so the second step doesn't actually close the db connections of the parent process that they where inherited from.
  • Unfortunately, the second step was broken by #4292, with this change: https://github.com/celery/celery/pull/4292/files#diff-7da7bceb78d87096e818d34c80115d18L186.

Error Message

Signature-only (no traceback captured)

Minimal Reproduction

repro.py
import os

import celery.signals
import django.db


def fix_django_db(**kwargs):
    # Calling db.close() on some DB connections will cause the inherited DB
    # conn to also get broken in the parent process so we need to remove it
    # without triggering any network IO that close() might cause.
    for c in django.db.connections.all():
        if c and c.connection:
            try:
                # Step 1: drop the inherited file descriptor locally.
                os.close(c.connection.fileno())
            except (AttributeError, OSError, TypeError,
                    django.db.InterfaceError):
                pass
        try:
            # Step 2: close the connection object unconditionally.
            c.close()
        except django.db.InterfaceError:
            pass
        except django.db.DatabaseError as exc:
            str_exc = str(exc)
            if 'closed' not in str_exc and 'not connected' not in str_exc:
                raise


# Run the cleanup in every worker process right after it starts.
celery.signals.worker_process_init.connect(fix_django_db)

What Broke

Django DB connections remain in an invalid state, causing high database load and potential timeouts.

Why It Broke

The method close_if_unusable_or_obsolete does not always close Django DB connections after a fork
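
To make the difference between the two calls concrete, here is a minimal sketch using a stand-in class. FakeConnection is hypothetical and is not Django's implementation: it models only two close conditions (errors on the connection, and a maximum age), and it simplifies the error case by closing immediately rather than checking usability first.

```python
import time


class FakeConnection:
    """Simplified stand-in for a Django DB connection wrapper (hypothetical)."""

    def __init__(self, max_age):
        self.is_open = True
        self.errors_occurred = False
        # close_at is None when the connection may persist indefinitely.
        self.close_at = None if max_age is None else time.monotonic() + max_age

    def close(self):
        # Unconditional close.
        self.is_open = False

    def close_if_unusable_or_obsolete(self):
        # Conditional close: only when broken or past its maximum age.
        if self.errors_occurred:
            self.close()
        elif self.close_at is not None and time.monotonic() >= self.close_at:
            self.close()


# A healthy connection with a 600-second maximum age, as inherited by a child.
inherited = FakeConnection(max_age=600)
inherited.close_if_unusable_or_obsolete()
still_open_after_conditional = inherited.is_open

inherited.close()
open_after_unconditional = inherited.is_open
```

In this model the conditional call leaves a healthy, not-yet-expired connection open, which is the situation a freshly forked worker is in when CONN_MAX_AGE is set.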

Fix Options (Details)

Option A: Upgrade to fixed release (safe default, recommended)

pip install celery==4.4.0rc5

When NOT to use: This fix is not suitable if the application relies on persistent DB connections across forks.

Use when you can deploy the upstream fix. It is usually lower-risk than long-lived workarounds.

Last verified: 2026-02-09. Validate in your environment.

When NOT to Use This Fix

  • This fix is not suitable if the application relies on persistent DB connections across forks.

Verify Fix

Re-run the minimal reproduction on your broken version, then apply the fix and re-run.
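
Before re-running, it can help to confirm that the environment actually has the fixed release. The sketch below is one way to do that; parse_version and has_upstream_fix are hypothetical helpers written for this page, and the parser only handles versions of the form X.Y.Z with an optional a/b/rc suffix.

```python
import re

# First fixed release according to this page.
FIRST_FIXED = "4.4.0rc5"

# Pre-release stages sort before the final release (None).
_STAGE_RANK = {"a": 0, "b": 1, "rc": 2, None: 3}


def parse_version(text):
    """Parse 'X.Y.Z' with an optional a/b/rc suffix into a sortable tuple."""
    match = re.match(r"^(\d+)\.(\d+)\.(\d+)(?:(a|b|rc)(\d+))?$", text)
    if match is None:
        raise ValueError(f"unsupported version string: {text!r}")
    major, minor, patch, stage, stage_num = match.groups()
    return (int(major), int(minor), int(patch),
            _STAGE_RANK[stage], int(stage_num or 0))


def has_upstream_fix(installed):
    """Return True if `installed` is at or after the first fixed release."""
    return parse_version(installed) >= parse_version(FIRST_FIXED)
```

In a real environment the installed version could be read with importlib.metadata.version("celery") and passed to has_upstream_fix. A version check only confirms the package is in place; it does not replace re-running the reproduction.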

Prevention

  • Capture the exact failing error string in logs and tests so you can reproduce via a minimal script.
  • Pin production dependencies and upgrade only with a reproducible test that hits the failing path.
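
As a rough illustration of the first point, the sketch below scans log lines for a known failure string. The signature comes from the issue thread; find_signature_hits and the sample log lines are made up for this example.

```python
# Signature strings to look for; this one comes from the issue thread.
KNOWN_SIGNATURES = ("server has gone away",)


def find_signature_hits(log_lines, signatures=KNOWN_SIGNATURES):
    """Return (line_number, line) pairs whose text contains a signature."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        lowered = line.lower()
        if any(sig in lowered for sig in signatures):
            hits.append((number, line.rstrip("\n")))
    return hits


# Illustrative log lines, not captured from a real worker.
sample_log = [
    "INFO task started\n",
    "ERROR OperationalError: (2006, 'MySQL server has gone away')\n",
    "INFO task finished\n",
]
hits = find_signature_hits(sample_log)
```

The same check can run in a test against captured worker logs, so a regression surfaces before an upgrade reaches production.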

Version Compatibility Table

Version     Status
4.4.0rc5    Fixed

Sources

We don’t republish the full GitHub discussion text. Use the links above for context.