The Fix
pip install celery==4.4.0rc5
Based on closed celery/celery issue #5220 · PR/commit linked
Production note: Most teams hit this during upgrades or environment changes. Roll out with a canary and smoke-test critical task workflows (chords, groups of chains) before going to 100%.
@@ -1203,11 +1203,14 @@ def freeze(self, _id=None, group_id=None, chord=None,
header_result = self.tasks.freeze(
parent_id=parent_id, root_id=root_id, chord=self.body)
- bodyres = self.body.freeze(_id, root_id=root_id)
+
+ body_result = self.body.freeze(
Why This Fix Works in Production
- Trigger: chord(scene_chains)(fuse.s()).get()
- Mechanism: Using chords in chains with a specific configuration (the issue thread points to broker_pool_limit = 1) leads to a deadlock.
- Why the fix works: The upstream change fixes a case where chords in chords with a chain cause the last task to never run, which hangs the workflow (first fixed release: 4.4.0rc5).
- If left unfixed, the same config can fail only in production (env differences), causing hung workflows or partial feature outages.
Why This Breaks in Prod
- Shows up under Python 3.6.4 in real deployments (not just unit tests).
- Using chords in chains with a specific configuration leads to a deadlock.
- Surfaces as: a traceback originating at chord(scene_chains)(fuse.s()).get() (see Stack trace).
Proof / Evidence
- GitHub issue: #5220
- Fix PR: https://github.com/celery/celery/pull/5222
- First fixed release: 4.4.0rc5
- Reproduced locally: No (not executed)
- Last verified: 2026-02-09
- Confidence: 0.85
- Did this fix it?: Yes (upstream fix exists)
Discussion
High-signal excerpts from the issue thread (symptoms, repros, edge-cases).
“Right, so I solved this by **removing** this from my Celery config: #broker_pool_limit = 1 <-- Remove this setting or groups of chains do not…”
“Hi, I have exactly the same issue on Celery 5.0.5 (same result in 5.1.0) using Redis as the result backend Trying to trigger a group…”
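The first excerpt above describes removing a broker_pool_limit = 1 line from the Celery config. As a rough sketch of what that change might look like in a config module: the module name and the Redis URLs below are placeholders chosen for illustration, not values taken from the issue.

```python
# celeryconfig.py (hypothetical module name, loaded with
# app.config_from_object("celeryconfig")).

# Assumption: placeholder URLs; substitute your own broker and result backend.
broker_url = "redis://localhost:6379/0"
result_backend = "redis://localhost:6379/0"

# The commenter's change was to delete this line, so Celery falls back
# to its default connection pool size:
# broker_pool_limit = 1
```

Whether this alone resolves the hang in a given deployment is not established by the thread; treat it as one thing to check alongside the upgrade.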
Failure Signature (Search String)
- chord(scene_chains)(fuse.s()).get()
Error Message
Stack trace
Traceback (most recent call last):
  File "pipeline/chords_test.py", line 17, in <module>
    chord(scene_chains)(fuse.s()).get()
  File "/usr/local/lib/python3.6/site-packages/celery/canvas.py", line 1328, in __call__
    return self.apply_async((), {'body': body} if body else {}, **options)
  File "/usr/local/lib/python3.6/site-packages/celery/canvas.py", line 1381, in apply_async
    return self.run(tasks, body, args, task_id=task_id, **merged_options)
  File "/usr/local/lib/python3.6/site-packages/celery/canvas.py", line 1449, in run
    header_result = header(*partial_args, task_id=group_id, **options)
  File "/usr/local/lib/python3.6/site-packages/celery/canvas.py", line 1077, in __call__
    return self.apply_async(partial_args, **options)
  File "/usr/local/lib/python3.6/site-packages/celery/canvas.py", line 1103, in apply_async
    args=args, kwargs=kwargs, **options))
  File "/usr/local/lib/python3.6/site-packages/celery/canvas.py", line 1186, in _apply_tasks
    **options)
  File "/usr/local/lib/python3.6/site-packages/celery/canvas.py", line 650, in apply_async
    dict(self.options, **options) if options else self.options))
  File "/usr/local/lib/python3.6/site-packages/celery/canvas.py", line 677, in run
    result_from_apply = first_task.apply_async(**options)
  File "/usr/local/lib/python3.6/site-packages/celery/canvas.py", line 225, in apply_async
    return _a
... (truncated) ...
Minimal Reproduction
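from pprint import pprint
from time import sleep

from celery import Celery

# Assumption: the original snippet omits app setup; these URLs are placeholders.
app = Celery('chords_test', broker='redis://localhost:6379/0',
             backend='redis://localhost:6379/0')


@app.task
def a(i):
    result = 'A %s' % i
    sleep((i % 3) / 10.0)
    pprint(result)
    return result


# bind=True is required for a task that takes `self` as its first argument.
@app.task(bind=True)
def b(self, i):
    result = 'B %s' % i
    sleep((i % 3) / 10.0)
    pprint(result)
    return result


@app.task(bind=True)
def c(self, i):
    result = 'C %s' % i
    sleep((i % 3) / 10.0)
    pprint(result)
    return result

# The excerpt ends here: the fuse task and the chord call seen in the
# stack trace, chord(scene_chains)(fuse.s()).get(), are not shown.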
Environment
- Python: 3.6.4
What Broke
The group of chains hangs indefinitely without executing any tasks.
Why It Broke
Using chords in chains with a specific configuration (the issue thread points to broker_pool_limit = 1) leads to a deadlock.
Fix Options (Details)
Option A: Upgrade to fixed release (safe default, recommended)
pip install celery==4.4.0rc5
Use when you can deploy the upstream fix. It is usually lower-risk than long-lived workarounds.
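Before and after deploying, it can help to confirm which Celery version an environment is actually running. The following is a minimal sketch, not an official Celery utility: version_key and at_least are hypothetical helper names, and the parser only understands plain X.Y.Z versions with an optional a/b/rc suffix.

```python
import re

# Pre-release tags sort below the final release of the same X.Y.Z.
_PRE_RANK = {"a": 0, "b": 1, "rc": 2, None: 3}


def version_key(version):
    """Turn 'X.Y.Z' or 'X.Y.Z(a|b|rc)N' into a sortable tuple."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)(?:(a|b|rc)(\d+))?", version)
    if match is None:
        raise ValueError("unsupported version string: %r" % version)
    major, minor, patch, pre, pre_num = match.groups()
    return (int(major), int(minor), int(patch), _PRE_RANK[pre], int(pre_num or 0))


def at_least(installed, minimum="4.4.0rc5"):
    """Return True if `installed` is the same as or newer than `minimum`."""
    return version_key(installed) >= version_key(minimum)
```

On Python 3.8+, the installed version string can be read with importlib.metadata.version("celery") and passed to at_least. A passing check only shows the release includes the upstream change; the thread also contains a report of the same symptom on 5.0.5 and 5.1.0, so re-running the reproduction is still worthwhile.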
Option D: Guard side-effects with OnceOnly (guardrail for side-effects)
Mitigate duplicate external side-effects under retries/timeouts/agent loops by gating the operation before calling external systems.
- Place OnceOnly between your code/agent and real side-effects (Stripe, emails, CRM, APIs).
- Use a stable key per side-effect (e.g., customer_id + action + idempotency_key).
- Fail-safe: configure fail-open vs fail-closed based on blast radius and spend risk.
Example snippet (optional):
import os

from onceonly import OnceOnly

once = OnceOnly(api_key=os.environ["ONCEONLY_API_KEY"], fail_open=True)


def process_event(event_id):
    # Stable idempotency key per real side-effect.
    # Use a request id / job id / webhook delivery id / Stripe event id, etc.
    key = f"stripe:webhook:{event_id}"
    res = once.check_lock(key=key, ttl=3600)
    if res.duplicate:
        return {"status": "already_processed"}
    # Safe to execute the side-effect exactly once.
    handle_event(event_id)  # handle_event is your own handler, not defined here
    return {"status": "processed"}


process_event("evt_...")  # replace with a real event id
Fix reference: https://github.com/celery/celery/pull/5222
First fixed release: 4.4.0rc5
Last verified: 2026-02-09. Validate in your environment.
When NOT to Use This Fix
- This fix is not applicable if the broker pool limit (broker_pool_limit) is set to 1. A commenter in the issue thread resolved the hang by removing that setting.
- Do not use this to hide logic bugs or data corruption. Use it to block duplicate external side-effects and enforce tool permissions/spend caps.
Verify Fix
Re-run the minimal reproduction on your broken version, then apply the fix and re-run.
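Because the failure mode is a hang rather than an error, a verification run needs a time limit to produce a clear pass/fail signal. One way to do that is sketched below; completes_within is a hypothetical helper, not part of Celery, and the timeout value is something you would pick for your own workload.

```python
import threading


def completes_within(fn, timeout_s):
    """Return True if fn() returns within timeout_s seconds, else False.

    fn runs in a daemon thread so a permanently hung call cannot keep the
    interpreter alive. If fn raises, this also returns False after the timeout.
    """
    finished = threading.Event()

    def runner():
        fn()
        finished.set()

    threading.Thread(target=runner, daemon=True).start()
    return finished.wait(timeout_s)
```

With the reproduction's tasks in scope, a call such as completes_within(lambda: chord(scene_chains)(fuse.s()).get(), 60) would be expected to return False on an affected setup and True once the hang is resolved.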
Prevention
- Add a stress test that runs high-concurrency workloads and fails on thread dumps / blocked locks.
- Enable watchdog dumps in prod (faulthandler, thread dump endpoint) to capture deadlocks quickly.
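The faulthandler suggestion above can be sketched with the standard library alone. This is an illustrative wrapper under stated assumptions: hang_watchdog is a hypothetical name, and the dump path and timeout are yours to choose. It arms a timer that writes every thread's stack to a file if the wrapped block is still running when the timeout expires.

```python
import faulthandler
from contextlib import contextmanager


@contextmanager
def hang_watchdog(timeout_s, dump_path):
    """Dump all thread stacks to dump_path if the block runs past timeout_s."""
    with open(dump_path, "w") as dump_file:
        # The file must stay open until the timer fires or is cancelled.
        faulthandler.dump_traceback_later(timeout_s, file=dump_file)
        try:
            yield
        finally:
            # Disarm the timer so a block that finished in time leaves no dump.
            faulthandler.cancel_dump_traceback_later()
```

Wrapping the blocking call, for example the .get() on a chord result, in "with hang_watchdog(120, '/tmp/chord_hang.txt'):" leaves an empty file when the call returns in time and a stack dump when it does not. The dump shows where the calling process is blocked; it does not inspect worker processes.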
Version Compatibility Table
| Version | Status |
|---|---|
| 4.4.0rc5 | Fixed |
Related Issues
No related fixes found.
Sources
We don’t republish the full GitHub discussion text. Use the links above for context.