The Fix
pip install celery==5.1.0rc1
Based on closed celery/celery issue #6220; the fix PR and commit are linked below.
Production note: This usually shows up under retries/timeouts. Treat it as a side-effect risk until you can verify behavior with a canary + real traffic.
Excerpt from the fix PR's diff (truncated in the source; the import list from celery.exceptions is cut off mid-parenthesis):
@@ -22,6 +22,7 @@
 from celery import current_app, group, maybe_signature, states
 from celery._state import get_current_task
+from celery.app.task import Context
 from celery.exceptions import (BackendGetMetaError, BackendStoreError,
                                ChordError, ImproperlyConfigured,
To verify, re-run the Minimal Reproduction (see below) on your broken version, then apply the fix and re-run.
Option A — Upgrade to fixed release
pip install celery==5.1.0rc1
When NOT to use: skip this fix if your application relies on the previous error-handling behavior.
Why This Fix Works in Production
- Trigger: an intermediate task in a chain inside a chord raises; the chord's errback never fires and the client hangs waiting on the result.
- Mechanism: error propagation and errback calling from chord and group children was not handled correctly.
- Why the fix works: PR #6746 fixes propagation of errors and errback calling from chord and group children, addressing issue #6220 (first fixed release: 5.1.0rc1).
- If left unfixed, this can cause silent data inconsistencies that propagate (bad cache entries, incorrect downstream decisions).
Why This Breaks in Prod
- Reported under Python 3.6 in real deployments (not just unit tests); a later comment reports it on Python 3.11.2 with Celery 5.3.
- Error propagation and errback calling from chord and group children was not handled correctly.
- Surfaces as: the client hangs on the chord result while the worker log shows only routine lines such as: [2020-07-10 15:28:52,940: INFO/MainProcess] Received task: task_fail.simple_pass[65241973-d598-4bbe-a0c8-1f86694c9f33]
Proof / Evidence
- GitHub issue: #6220
- Fix PR: https://github.com/celery/celery/pull/6746
- First fixed release: 5.1.0rc1
- Reproduced locally: No (not executed)
- Last verified: 2026-02-09
- Confidence: 0.75
- Did this fix it?: Yes (upstream fix exists)
- Own content ratio: 0.19
Discussion
High-signal excerpts from the issue thread (symptoms, repros, edge-cases).
“Test case added in this PR: https://github.com/celery/celery/pull/6226”
“Just want to give a shout out and thank you to everyone who worked on fixing this! I know how hard it is to find…”
“This is still occurring - experiencing this issue right now on Python 3.11.2 with Celery 5.3”
“> This is still occurring - experiencing this issue right now on Python 3.11.2 with Celery 5.3 help us to reproduce that please”
Failure Signature (Search String)
- [2020-07-10 15:28:52,940: INFO/MainProcess] Received task: task_fail.simple_pass[65241973-d598-4bbe-a0c8-1f86694c9f33]
Error Message (excerpt 1: Test 1, chord errback fires)
-------------
[2020-07-10 15:28:52,940: INFO/MainProcess] Received task: task_fail.simple_pass[65241973-d598-4bbe-a0c8-1f86694c9f33]
[2020-07-10 15:28:52,945: WARNING/ForkPoolWorker-8] Passing
[2020-07-10 15:28:52,954: INFO/ForkPoolWorker-8] Task task_fail.simple_pass[65241973-d598-4bbe-a0c8-1f86694c9f33] succeeded in 0.008173699956387281s: None
[2020-07-10 15:28:52,967: INFO/MainProcess] Received task: task_fail.simple_pass[fc15c163-0616-4b87-94c3-2c1b225f2e0e]
[2020-07-10 15:28:52,972: WARNING/ForkPoolWorker-8] Passing
[2020-07-10 15:28:52,985: INFO/ForkPoolWorker-8] Task task_fail.simple_pass[fc15c163-0616-4b87-94c3-2c1b225f2e0e] succeeded in 0.013192899990826845s: None
[2020-07-10 15:28:52,985: INFO/MainProcess] Received task: task_fail.simple_fail[0b0a3925-7ea8-4423-8e87-3a9f00d5c967]
[2020-07-10 15:28:52,992: WARNING/ForkPoolWorker-8] Failing!
[2020-07-10 15:28:53,010: ERROR/ForkPoolWorker-8] Chord callback for '76840678-80a3-4915-96b8-04c135ecb237' raised: ChordError("Dependency 0b0a3925-7ea8-4423-8e87-3a9f00d5c967 raised Exception('Fail!',)",)
Traceback (most recent call last):
File "~/celery/celery.venv/lib/python3.6/site-packages/celery/app/trace.py", line 412, in trace_task
R = retval = fun(*args, **kwargs)
File "~/celery/celery.venv/lib/python3.6/site-packages/celery/app/trace.py", line 704, in __protected_call__
return self.run(*args, **kwargs)
File "~/celery/task_
... (truncated) ...
Error Message (excerpt 2: Test 2, inner task fails with no chord error handling)
-------------
[2020-07-10 15:28:55,039: INFO/MainProcess] Received task: task_fail.simple_pass[5779ce6a-64af-40ea-9b26-8ff3bdad348d]
[2020-07-10 15:28:55,050: INFO/MainProcess] Received task: task_fail.simple_pass[39469dfa-401c-45f0-9314-5849f6e7ca5f]
[2020-07-10 15:28:55,051: WARNING/ForkPoolWorker-8] Passing
[2020-07-10 15:28:55,051: WARNING/ForkPoolWorker-1] Passing
[2020-07-10 15:28:55,058: INFO/ForkPoolWorker-8] Task task_fail.simple_pass[5779ce6a-64af-40ea-9b26-8ff3bdad348d] succeeded in 0.007474799989722669s: None
[2020-07-10 15:28:55,062: INFO/ForkPoolWorker-1] Task task_fail.simple_pass[39469dfa-401c-45f0-9314-5849f6e7ca5f] succeeded in 0.010612400015816092s: None
[2020-07-10 15:28:55,069: INFO/MainProcess] Received task: task_fail.simple_fail[2fead196-cfe2-4ce5-896f-303c2f9970e2]
[2020-07-10 15:28:55,073: WARNING/ForkPoolWorker-8] Failing!
[2020-07-10 15:28:55,081: ERROR/ForkPoolWorker-8] Task task_fail.simple_fail[2fead196-cfe2-4ce5-896f-303c2f9970e2] raised unexpected: Exception('Fail!',)
Traceback (most recent call last):
File "/home/crawjor/test/celery/celery.venv/lib/python3.6/site-packages/celery/app/trace.py", line 412, in trace_task
R = retval = fun(*args, **kwargs)
File "/home/crawjor/test/celery/celery.venv/lib/python3.6/site-packages/celery/app/trace.py", line 704, in __protected_call__
return self.run(*args, **kwargs)
File "/home/crawjor/test/celery/task_f
... (truncated) ...
Error Message (excerpt 3: Test 2, client-side traceback while waiting)
-------------
File "task_fail.py", line 40, in <module>
simple_pass.si().on_error(record_failure.s(test="test_2")))
File "~/celery/celery.venv/lib/python3.6/site-packages/celery/canvas.py", line 475, in __invert__
return self.apply_async().get()
File "~/celery/celery.venv/lib/python3.6/site-packages/celery/result.py", line 237, in get
on_message=on_message,
File "~/celery/celery.venv/lib/python3.6/site-packages/celery/backends/asynchronous.py", line 200, in wait_for_pending
for _ in self._wait_for_pending(result, **kwargs):
File "~/celery/celery.venv/lib/python3.6/site-packages/celery/backends/asynchronous.py", line 268, in _wait_for_pending
on_interval=on_interval):
File "~/celery/celery.venv/lib/python3.6/site-packages/celery/backends/asynchronous.py", line 55, in drain_events_until
yield self.wait_for(p, wait, timeout=interval)
File "~/celery/celery.venv/lib/python3.6/site-packages/celery/backends/asynchronous.py", line 64, in wait_for
wait(timeout=timeout)
File "~/celery/celery.venv/lib/python3.6/site-packages/celery/backends/redis.py", line 161, in drain_events
message = self._pubsub.get_message(timeout=timeout)
File "~/celery/celery.venv/lib/python3.6/site-packages/redis/client.py", line 3617, in get_message
response = self.parse_response(block=False, timeout=timeout)
File "~/celery/celery.venv/lib/python3.6/site-packages/redis/clie
... (truncated) ...
Minimal Reproduction
from celery import Celery

app = Celery('task_fail', backend='redis://localhost:6379', broker='redis://localhost:6379')

@app.task
def simple_pass():
    print("Passing")

@app.task
def simple_fail():
    print("Failing!")
    raise Exception('Fail!')

@app.task
def record_failure(*args, **kwargs):
    print(f'Fail with args={args}, kwargs={kwargs}')

if __name__ == "__main__":
    from celery import chord

    # Test 1: Passes - When the chain fails at the end, the failure logic on
    # the chord is executed.
    try:
        chain = simple_pass.si() | simple_fail.si()
        ~chord([simple_pass.si(), chain],
               simple_pass.si().on_error(record_failure.s(test="test_1")))
    except Exception:
        print("Test 1 worked")

    # Test 2: Fails - When the chain has an intermediate exception, the failure
    # logic on the chord is not executed & the client simply hangs.
    try:
        chain = simple_pass.si() | simple_fail.si() | simple_pass.si()
        ~chord([simple_pass.si(), chain],
               simple_pass.si().on_error(record_failure.s(test="test_2")))
    except Exception:
        print("Test 2 worked")
Environment
- Python: 3.6
What Broke
Chords did not trigger error handling when an inner task failed mid-chain: the errback was not called and the client hung waiting on the result.
Why It Broke
Error propagation and errback calling from chord and group children was not handled correctly
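The intended semantics can be illustrated broker-free in plain Python. This toy model is NOT Celery's implementation; it only shows the contract the fix restores: a failure anywhere in a chord child, including mid-chain, must reach the errback instead of leaving the caller hanging.

```python
# Toy, broker-free model of chord error semantics (NOT Celery's code).
# A "chain" is a list of callables; a "chord" runs all children, then the
# body -- or the errback if any child step raises, even mid-chain.

def run_chain(steps):
    result = None
    for step in steps:
        result = step()  # an exception here must escape to the chord
    return result

def run_chord(children, body, errback):
    results = []
    try:
        for chain in children:
            results.append(run_chain(chain))
    except Exception as exc:
        # The buggy behavior was equivalent to dropping this error and
        # blocking forever; the fixed behavior routes it to the errback.
        return ("errback", errback(exc))
    return ("body", body(results))

def passing():
    return "ok"

def failing():
    raise Exception("Fail!")

# Mirrors Test 2 from the reproduction: failure in the MIDDLE of a chain.
outcome = run_chord(
    children=[[passing], [passing, failing, passing]],
    body=lambda results: results,
    errback=lambda exc: str(exc),
)
print(outcome)  # -> ('errback', 'Fail!')
```

With the failing step removed, the same call returns the body result, which is what Test 1 already demonstrated on the real stack.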
Fix Options (Details)
Option A — Upgrade to fixed release (safe default, recommended)
pip install celery==5.1.0rc1
Use when you can deploy the upstream fix. It is usually lower-risk than long-lived workarounds.
Option D — Guard side-effects with OnceOnly (guardrail for duplicate external side-effects)
Mitigate duplicate external side-effects under retries/timeouts/agent loops by gating the operation before calling external systems.
- Place OnceOnly between your code/agent and real side-effects (Stripe, emails, CRM, APIs).
- Use a stable key per side-effect (e.g., customer_id + action + idempotency_key).
- Fail-safe: configure fail-open vs fail-closed based on blast radius and spend risk.
- This does NOT fix data corruption; it only prevents duplicate side-effects.
Example snippet (optional):
from onceonly import OnceOnly
import os

once = OnceOnly(api_key=os.environ["ONCEONLY_API_KEY"], fail_open=True)

def process_webhook(event_id):
    # Stable idempotency key per real side-effect.
    # Use a request id / job id / webhook delivery id / Stripe event id, etc.
    key = f"stripe:webhook:{event_id}"
    res = once.check_lock(key=key, ttl=3600)
    if res.duplicate:
        return {"status": "already_processed"}
    # Safe to execute the side-effect exactly once.
    # handle_event is your application's handler (not defined here).
    handle_event(event_id)
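On key construction: the snippet above assumes the OnceOnly client API as shown; the helper below is a stdlib-only sketch, independent of any library, of the "stable key per side-effect" bullet. The function name and components are illustrative, not part of any API. A deterministic key means retries of the same logical operation collide, while distinct operations never do.

```python
import hashlib

def side_effect_key(namespace, action, unique_id):
    """Build a stable idempotency key (illustrative helper, not a library
    API). Same logical side-effect -> same key on every retry; a new
    unique_id (job id, webhook delivery id, Stripe event id) -> new key."""
    raw = f"{namespace}:{action}:{unique_id}"
    # Hash to bound key length and strip user-controlled characters.
    return "se:" + hashlib.sha256(raw.encode("utf-8")).hexdigest()[:32]

print(side_effect_key("stripe", "charge", "evt_123"))
```

Truncating the digest to 32 hex characters keeps keys short; use the full digest if your key space is very large.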
Fix reference: https://github.com/celery/celery/pull/6746
First fixed release: 5.1.0rc1
Last verified: 2026-02-09. Validate in your environment.
When NOT to Use This Fix
- Do not use this fix if your application relies on the previous error handling behavior.
- Do not use this to hide logic bugs or data corruption. Use it to block duplicate external side-effects and enforce tool permissions/spend caps.
Verify Fix
Re-run the minimal reproduction on your broken version, then apply the fix and re-run.
Prevention
- Make timeouts explicit and test them (unit + integration) to avoid silent behavior changes.
- Instrument retries (attempt count + reason) and alert on spikes to catch dependency slowdowns.
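The second bullet can be sketched with a stdlib-only retry wrapper that makes every attempt observable. The decorator, its parameters, and the logger name are illustrative, not a Celery feature:

```python
import functools
import logging
import time

log = logging.getLogger("retries")

def with_retries(max_attempts=3, delay=0.0):
    """Retry wrapper that logs attempt count + failure reason for each try,
    so a metrics pipeline can alert on retry spikes. Illustrative only."""
    def deco(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception as exc:
                    log.warning("retry attempt=%d/%d fn=%s reason=%r",
                                attempt, max_attempts, fn.__name__, exc)
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the real error
                    time.sleep(delay)
        return wrapper
    return deco

calls = {"n": 0}

@with_retries(max_attempts=5)
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("dependency slow")  # simulated slow dependency
    return 42

value = flaky()
print(value, calls["n"])  # -> 42 3
```

In production you would emit the attempt count and reason as structured metrics rather than plain log lines, then alert when the retry rate for a dependency spikes.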
Version Compatibility Table
| Version | Status |
|---|---|
| < 5.1.0rc1 | Affected |
| 5.1.0rc1 | Fixed (first fixed release) |
| 5.3 | Recurrence reported in the Discussion above; unconfirmed upstream |
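To gate a deploy on the fixed release, the canonical tool is packaging.version.Version. If that dependency is unavailable, a minimal stdlib-only comparison that understands the X.Y.Z and X.Y.ZrcN forms in this table can look like the sketch below (the helper names are mine, not a Celery API):

```python
import re

FIXED = (5, 1, 0, 1)  # 5.1.0rc1, with rc1 encoded as pre-release number 1

def _parse(version):
    """Parse 'X.Y.Z' or 'X.Y.ZrcN' into a comparable tuple. A final release
    sorts after every one of its release candidates (rc slot = infinity)."""
    m = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)(?:rc(\d+))?", version)
    if not m:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch, rc = m.groups()
    return (int(major), int(minor), int(patch),
            int(rc) if rc is not None else float("inf"))

def has_chord_errback_fix(version):
    # True for 5.1.0rc1 and anything later.
    return _parse(version) >= FIXED

print(has_chord_errback_fix("5.0.5"), has_chord_errback_fix("5.1.0"))  # -> False True
```

This only checks the release version; it cannot detect backported patches or local forks, so still validate behavior with the reproduction above.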
Related Issues
No related fixes found.
Sources
We don’t republish the full GitHub discussion text. Use the links above for context.