The Fix

pip install celery==4.4.0rc5

Based on closed celery/celery issue #5220 · PR/commit linked

Production note: Most teams hit this during upgrades or environment changes. Roll out with a canary and smoke-test a representative chord/chain workflow before going to 100%.

@@ -1203,11 +1203,14 @@ def freeze(self, _id=None, group_id=None, chord=None,
         header_result = self.tasks.freeze(
             parent_id=parent_id, root_id=root_id, chord=self.body)
-        bodyres = self.body.freeze(_id, root_id=root_id)
+
+        body_result = self.body.freeze(

Why This Fix Works in Production

  • Trigger: chord(scene_chains)(fuse.s()).get()
  • Mechanism: Combining chords with chains in certain configurations deadlocks: the workflow hangs and the final task never runs.
  • Why the fix works: The upstream change fixes the case where chords nested in chords with a chain leave the last task unexecuted, which caused the hang (first fixed release: 4.4.0rc5).
Production impact:
  • If left unfixed, the calling .get() blocks indefinitely and the group of chains never executes, so any pipeline built on this canvas shape stalls.

Why This Breaks in Prod

  • Reported under Python 3.6.4 in a real deployment (not just unit tests).
  • Combining chords with chains in certain configurations leads to a deadlock.
  • Surfaces as: a traceback that starts at chord(scene_chains)(fuse.s()).get() and descends through celery/canvas.py (see Stack trace).

Proof / Evidence

  • GitHub issue: #5220
  • Fix PR: https://github.com/celery/celery/pull/5222
  • First fixed release: 4.4.0rc5
  • Reproduced locally: No (not executed)
  • Last verified: 2026-02-09
  • Confidence: 0.85
  • Did this fix it?: Yes (upstream fix exists)

Discussion

High-signal excerpts from the issue thread (symptoms, repros, edge-cases).

“Right, so I solved this by **removing** this from my Celery config: #broker_pool_limit = 1 <-- Remove this setting or groups of chains do not…”
@danielmeppiel · 2021-06-10 · source
“Hi, I have exactly the same issue on Celery 5.0.5 (same result in 5.1.0) using Redis as the result backend Trying to trigger a group…”
@danielmeppiel · 2021-06-10 · source
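
The first excerpt points at a single setting. As a minimal sketch, assuming a module-style Celery configuration file (the filename celeryconfig.py is an assumption, not something named in the thread), the reported change amounts to dropping the override so that Celery's default pool size applies:

```python
# celeryconfig.py (hypothetical filename; adapt to wherever your settings live)

# Reported in the issue thread to make groups of chains hang:
# broker_pool_limit = 1

# Assumption: 10 is Celery's documented default for this setting. It is written
# out here only to make the intent explicit; deleting the line has the same effect.
broker_pool_limit = 10
```

This only mirrors what one commenter reported; whether it applies to your deployment depends on your broker and result backend.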

Failure Signature (Search String)

  • chord(scene_chains)(fuse.s()).get()

Error Message

Stack trace
error.txt
Traceback (most recent call last):
  File "pipeline/chords_test.py", line 17, in <module>
    chord(scene_chains)(fuse.s()).get()
  File "/usr/local/lib/python3.6/site-packages/celery/canvas.py", line 1328, in __call__
    return self.apply_async((), {'body': body} if body else {}, **options)
  File "/usr/local/lib/python3.6/site-packages/celery/canvas.py", line 1381, in apply_async
    return self.run(tasks, body, args, task_id=task_id, **merged_options)
  File "/usr/local/lib/python3.6/site-packages/celery/canvas.py", line 1449, in run
    header_result = header(*partial_args, task_id=group_id, **options)
  File "/usr/local/lib/python3.6/site-packages/celery/canvas.py", line 1077, in __call__
    return self.apply_async(partial_args, **options)
  File "/usr/local/lib/python3.6/site-packages/celery/canvas.py", line 1103, in apply_async
    args=args, kwargs=kwargs, **options))
  File "/usr/local/lib/python3.6/site-packages/celery/canvas.py", line 1186, in _apply_tasks
    **options)
  File "/usr/local/lib/python3.6/site-packages/celery/canvas.py", line 650, in apply_async
    dict(self.options, **options) if options else self.options))
  File "/usr/local/lib/python3.6/site-packages/celery/canvas.py", line 677, in run
    result_from_apply = first_task.apply_async(**options)
  File "/usr/local/lib/python3.6/site-packages/celery/canvas.py", line 225, in apply_async
    return _a ... (truncated) ...

Minimal Reproduction

repro.py
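from pprint import pprint
from time import sleep

from celery import Celery

# Assumption: the original snippet does not show how `app` is created.
# Configure the broker and result backend for your environment.
app = Celery('repro')


@app.task
def a(i):
    result = 'A %s' % i
    sleep((i % 3) / 10.0)
    pprint(result)
    return result


# Not bound (no bind=True), so `self` is just the first positional argument;
# when the task follows another one in a chain it receives that task's result.
@app.task
def b(self, i):
    result = 'B %s' % i
    sleep((i % 3) / 10.0)
    pprint(result)
    return result


@app.task
def c(self, i):
    result = 'C %s' % i
    sleep((i % 3) / 10.0)
    pprint(result)
    return result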

Environment

  • Python: 3.6.4

What Broke

The group of chains hangs indefinitely without executing any tasks.

Why It Broke

Combining chords with chains in certain configurations leads to a deadlock. One commenter on the issue traced the same symptom to broker_pool_limit = 1.

Fix Options (Details)

Option A: Upgrade to fixed release (safe default, recommended)

pip install celery==4.4.0rc5

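When NOT to use: Upgrading alone may not resolve the hang if broker_pool_limit is set to 1. A commenter on the issue reported the same symptom on Celery 5.0.5 and 5.1.0 and resolved it by removing that setting.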

Use when you can deploy the upstream fix. It is usually lower-risk than long-lived workarounds.
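
Before and after upgrading, it can help to confirm which side of the fix an environment is on. The following is a rough sketch, not an official Celery utility: the helper names version_key and has_upstream_fix are hypothetical, and the parser only understands versions of the form X.Y.Z or X.Y.ZrcN (anything else, such as beta tags, is outside its scope).

```python
import re
from importlib import metadata

FIRST_FIXED = "4.4.0rc5"  # first fixed release, as stated for this issue

# Matches "X.Y.Z" optionally followed by "rcN"; other suffixes are ignored.
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:rc(\d+))?")


def version_key(version):
    """Turn 'X.Y.Z' or 'X.Y.ZrcN' into a tuple that sorts in release order."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor, patch, rc = match.groups()
    # A final release sorts after all of its release candidates.
    rc_rank = int(rc) if rc is not None else float("inf")
    return (int(major), int(minor), int(patch), rc_rank)


def has_upstream_fix(version):
    """True if `version` is at or after the first fixed release."""
    return version_key(version) >= version_key(FIRST_FIXED)


if __name__ == "__main__":
    try:
        installed = metadata.version("celery")
    except metadata.PackageNotFoundError:
        print("celery is not installed in this environment")
    else:
        status = "includes the fix" if has_upstream_fix(installed) else "is affected"
        print(f"celery {installed} {status}")
```

A version that includes the fix can still hang for other reasons (see the broker_pool_limit report in the issue thread), so treat this as a necessary check rather than a sufficient one.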

Option D: Guard side-effects with OnceOnly (guardrail for side-effects)

Mitigate duplicate external side-effects under retries/timeouts/agent loops by gating the operation before calling external systems.

  • Place OnceOnly between your code/agent and real side-effects (Stripe, emails, CRM, APIs).
  • Use a stable key per side-effect (e.g., customer_id + action + idempotency_key).
  • Fail-safe: configure fail-open vs fail-closed based on blast radius and spend risk.
onceonly.py
import os

from onceonly import OnceOnly

once = OnceOnly(api_key=os.environ["ONCEONLY_API_KEY"], fail_open=True)


# `process_webhook` is a hypothetical wrapper added so that the early `return`
# is valid Python; `handle_event` is your own side-effecting function.
def process_webhook(event_id):
    # Stable idempotency key per real side-effect.
    # Use a request id / job id / webhook delivery id / Stripe event id, etc.
    key = f"stripe:webhook:{event_id}"
    res = once.check_lock(key=key, ttl=3600)
    if res.duplicate:
        return {"status": "already_processed"}
    # Safe to execute the side-effect exactly once.
    handle_event(event_id)


process_webhook("evt_...")  # replace with a real event id


When NOT to use: Do not use this to hide logic bugs or data corruption. Use it to block duplicate external side-effects and enforce tool permissions/spend caps.

When NOT to Use This Fix

  • Upgrading alone may not resolve the hang if broker_pool_limit is set to 1; a commenter on the issue resolved the same symptom by removing that setting.
  • Do not use OnceOnly to hide logic bugs or data corruption. Use it to block duplicate external side-effects and enforce tool permissions/spend caps.

Verify Fix

verify
Re-run the minimal reproduction on the affected version to confirm the hang, then apply the fix and re-run. After the fix, the chord's .get() should return instead of blocking.
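
Because the failure mode is a hang, a verification run needs a deadline or it will simply never finish on the affected version. One way to sketch that, using only the standard library, is to run the blocking call in a daemon thread and wait on it for a bounded time. The helper name finishes_within is hypothetical, and the 30-second budget in the usage comment is an arbitrary assumption.

```python
import threading


def finishes_within(fn, seconds):
    """Run fn() in a daemon thread; return True if it ends within `seconds`.

    "Ends" includes raising an exception, so a True result means "did not
    hang", not "succeeded". A False result leaves the thread running in the
    background; as a daemon it will not keep the process alive at exit.
    """
    finished = threading.Event()

    def runner():
        try:
            fn()
        finally:
            finished.set()

    threading.Thread(target=runner, daemon=True).start()
    return finished.wait(seconds)


# Usage with the failure signature from this issue (requires a configured
# Celery app, a running worker, and the scene_chains / fuse objects):
#
#   ok = finishes_within(lambda: chord(scene_chains)(fuse.s()).get(), 30)
#   print("completed" if ok else "still blocked after 30s")
```

Passing a timeout directly to the result's get() call is an alternative; the thread-based version is shown because it also bounds time spent before a result object exists.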

Prevention

  • Add a stress test that runs high-concurrency workloads and fails on thread dumps / blocked locks.
  • Enable watchdog dumps in prod (faulthandler, thread dump endpoint) to capture deadlocks quickly.
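
As a rough illustration of the second point, Python's standard faulthandler module can write every thread's stack to a file on a timer or on a signal. The sketch below is one possible setup, not a Celery feature: the function name enable_hang_diagnostics and the 300-second interval are assumptions, and the periodic dump fires on a fixed schedule whether or not the process is actually stuck.

```python
import faulthandler
import signal
import sys


def enable_hang_diagnostics(dump_after_seconds=300, stream=None):
    """Turn on stack dumps that help locate where a process is blocked.

    `stream` must be a real file with a file descriptor and must stay open
    for as long as the dumps are enabled. Defaults to sys.stderr.
    """
    stream = sys.stderr if stream is None else stream

    # Dump tracebacks on fatal errors such as segfaults.
    faulthandler.enable(file=stream, all_threads=True)

    # Dump all thread stacks every `dump_after_seconds` seconds.
    faulthandler.dump_traceback_later(dump_after_seconds, repeat=True, file=stream)

    # On platforms that support it, also dump on demand: kill -USR1 <pid>.
    if hasattr(faulthandler, "register") and hasattr(signal, "SIGUSR1"):
        faulthandler.register(signal.SIGUSR1, file=stream, all_threads=True)
```

Calling this once at process start is enough. faulthandler.cancel_dump_traceback_later() stops the periodic dump if it proves too noisy.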

Version Compatibility Table

Version     Status
4.4.0rc5    Fixed

Sources

We don’t republish the full GitHub discussion text. Use the links above for context.