
The Fix

pip install celery==5.1.0rc1

Based on closed celery/celery issue #6220 (fix PR #6746, linked below).

Production note: This usually shows up under retries/timeouts. Treat it as a side-effect risk until you can verify behavior with a canary + real traffic.

Fix commit excerpt:

@@ -22,6 +22,7 @@
 from celery import current_app, group, maybe_signature, states
 from celery._state import get_current_task
+from celery.app.task import Context
 from celery.exceptions import (BackendGetMetaError, BackendStoreError,
                                ChordError, ImproperlyConfigured,

Why This Fix Works in Production

  • Trigger: a chord whose header contains a chain that raises at an intermediate task. A failure at the final task of the chain was handled; an intermediate failure was not.
  • Mechanism: error propagation and errback calling from chord and group children was not handled correctly.
  • Why the fix works: PR #6746 fixes propagation of errors and errback calling from chord and group children, addressing issue #6220 (first fixed release: 5.1.0rc1).
Production impact:
  • If left unfixed, chord errbacks silently never fire on intermediate failures, and clients blocking on the chord result can hang indefinitely.

Why This Breaks in Prod

  • Originally reported under Python 3.6 in a real deployment; a 2023 comment reports the same symptom on Python 3.11.2 with Celery 5.3 (see Discussion).
  • Error propagation and errback calling from chord and group children was not handled correctly.
  • Surfaces as: ChordError("Dependency <task-id> raised Exception('Fail!',)") in worker logs, or a client that hangs on .get() with no error at all.

Proof / Evidence

  • GitHub issue: #6220
  • Fix PR: https://github.com/celery/celery/pull/6746
  • First fixed release: 5.1.0rc1
  • Reproduced locally: No (not executed)
  • Last verified: 2026-02-09
  • Confidence: 0.75
  • Did this fix it?: Yes (upstream fix exists)

Discussion

High-signal excerpts from the issue thread (symptoms, repros, edge-cases).

“Test case added in this PR: https://github.com/celery/celery/pull/6226”
— @msci-jordancrawford · 2020-07-13

“Just want to give a shout out and thank you to everyone who worked on fixing this! I know how hard it is to find…”
— @msci-jordancrawford · 2021-05-10

“This is still occurring - experiencing this issue right now on Python 3.11.2 with Celery 5.3”
— @mwodonnell · 2023-06-17

“> This is still occurring - experiencing this issue right now on Python 3.11.2 with Celery 5.3 help us to reproduce that please”
— @auvipy · 2023-06-18

Failure Signature (Search String)

  • Chord callback for '<group-id>' raised: ChordError("Dependency <task-id> raised Exception('Fail!',)")
  • In the hang case (intermediate chain failure) there may be no error line at all; the client simply blocks on .get().
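If you need to scan worker logs for this failure programmatically, here is a minimal sketch, assuming plain-text log files. The regex targets the ChordError line shown in the traces below; the helper name is ours.

```python
import re

# Matches the worker-side signature, e.g.
# ChordError("Dependency 0b0a3925-... raised Exception('Fail!',)",)
CHORD_ERROR = re.compile(r'ChordError\("Dependency (\S+) raised')

def find_failed_dependencies(log_text: str) -> list:
    """Return the task ids reported as failed chord dependencies."""
    return CHORD_ERROR.findall(log_text)
```

Note the caveat: in the hang variant of this bug, nothing matches, so absence of this signature does not prove absence of the problem.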

Error Message

Stack trace
error.txt
Error Message
-------------
[2020-07-10 15:28:52,940: INFO/MainProcess] Received task: task_fail.simple_pass[65241973-d598-4bbe-a0c8-1f86694c9f33]
[2020-07-10 15:28:52,945: WARNING/ForkPoolWorker-8] Passing
[2020-07-10 15:28:52,954: INFO/ForkPoolWorker-8] Task task_fail.simple_pass[65241973-d598-4bbe-a0c8-1f86694c9f33] succeeded in 0.008173699956387281s: None
[2020-07-10 15:28:52,967: INFO/MainProcess] Received task: task_fail.simple_pass[fc15c163-0616-4b87-94c3-2c1b225f2e0e]
[2020-07-10 15:28:52,972: WARNING/ForkPoolWorker-8] Passing
[2020-07-10 15:28:52,985: INFO/ForkPoolWorker-8] Task task_fail.simple_pass[fc15c163-0616-4b87-94c3-2c1b225f2e0e] succeeded in 0.013192899990826845s: None
[2020-07-10 15:28:52,985: INFO/MainProcess] Received task: task_fail.simple_fail[0b0a3925-7ea8-4423-8e87-3a9f00d5c967]
[2020-07-10 15:28:52,992: WARNING/ForkPoolWorker-8] Failing!
[2020-07-10 15:28:53,010: ERROR/ForkPoolWorker-8] Chord callback for '76840678-80a3-4915-96b8-04c135ecb237' raised: ChordError("Dependency 0b0a3925-7ea8-4423-8e87-3a9f00d5c967 raised Exception('Fail!',)",)
Traceback (most recent call last):
  File "~/celery/celery.venv/lib/python3.6/site-packages/celery/app/trace.py", line 412, in trace_task
    R = retval = fun(*args, **kwargs)
  File "~/celery/celery.venv/lib/python3.6/site-packages/celery/app/trace.py", line 704, in __protected_call__
    return self.run(*args, **kwargs)
  File "~/celery/task_ ... (truncated) ...
Stack trace
error.txt
Error Message
-------------
[2020-07-10 15:28:55,039: INFO/MainProcess] Received task: task_fail.simple_pass[5779ce6a-64af-40ea-9b26-8ff3bdad348d]
[2020-07-10 15:28:55,050: INFO/MainProcess] Received task: task_fail.simple_pass[39469dfa-401c-45f0-9314-5849f6e7ca5f]
[2020-07-10 15:28:55,051: WARNING/ForkPoolWorker-8] Passing
[2020-07-10 15:28:55,051: WARNING/ForkPoolWorker-1] Passing
[2020-07-10 15:28:55,058: INFO/ForkPoolWorker-8] Task task_fail.simple_pass[5779ce6a-64af-40ea-9b26-8ff3bdad348d] succeeded in 0.007474799989722669s: None
[2020-07-10 15:28:55,062: INFO/ForkPoolWorker-1] Task task_fail.simple_pass[39469dfa-401c-45f0-9314-5849f6e7ca5f] succeeded in 0.010612400015816092s: None
[2020-07-10 15:28:55,069: INFO/MainProcess] Received task: task_fail.simple_fail[2fead196-cfe2-4ce5-896f-303c2f9970e2]
[2020-07-10 15:28:55,073: WARNING/ForkPoolWorker-8] Failing!
[2020-07-10 15:28:55,081: ERROR/ForkPoolWorker-8] Task task_fail.simple_fail[2fead196-cfe2-4ce5-896f-303c2f9970e2] raised unexpected: Exception('Fail!',)
Traceback (most recent call last):
  File "/home/crawjor/test/celery/celery.venv/lib/python3.6/site-packages/celery/app/trace.py", line 412, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/home/crawjor/test/celery/celery.venv/lib/python3.6/site-packages/celery/app/trace.py", line 704, in __protected_call__
    return self.run(*args, **kwargs)
  File "/home/crawjor/test/celery/task_f ... (truncated) ...
Stack trace
error.txt
Error Message
-------------
  File "task_fail.py", line 40, in <module>
    simple_pass.si().on_error(record_failure.s(test="test_2")))
  File "~/celery/celery.venv/lib/python3.6/site-packages/celery/canvas.py", line 475, in __invert__
    return self.apply_async().get()
  File "~/celery/celery.venv/lib/python3.6/site-packages/celery/result.py", line 237, in get
    on_message=on_message,
  File "~/celery/celery.venv/lib/python3.6/site-packages/celery/backends/asynchronous.py", line 200, in wait_for_pending
    for _ in self._wait_for_pending(result, **kwargs):
  File "~/celery/celery.venv/lib/python3.6/site-packages/celery/backends/asynchronous.py", line 268, in _wait_for_pending
    on_interval=on_interval):
  File "~/celery/celery.venv/lib/python3.6/site-packages/celery/backends/asynchronous.py", line 55, in drain_events_until
    yield self.wait_for(p, wait, timeout=interval)
  File "~/celery/celery.venv/lib/python3.6/site-packages/celery/backends/asynchronous.py", line 64, in wait_for
    wait(timeout=timeout)
  File "~/celery/celery.venv/lib/python3.6/site-packages/celery/backends/redis.py", line 161, in drain_events
    message = self._pubsub.get_message(timeout=timeout)
  File "~/celery/celery.venv/lib/python3.6/site-packages/redis/client.py", line 3617, in get_message
    response = self.parse_response(block=False, timeout=timeout)
  File "~/celery/celery.venv/lib/python3.6/site-packages/redis/clie ... (truncated) ...

Minimal Reproduction

repro.py
# To reproduce: start Redis, run a worker (`celery -A repro worker --loglevel=INFO`),
# then execute this script (`python repro.py`).
from celery import Celery

app = Celery('task_fail',
             backend='redis://localhost:6379',
             broker='redis://localhost:6379')

@app.task
def simple_pass():
    print("Passing")

@app.task
def simple_fail():
    print("Failing!")
    raise Exception('Fail!')

@app.task
def record_failure(*args, **kwargs):
    print(f'Fail with args={args}, kwargs={kwargs}')

if __name__ == "__main__":
    from celery import chord

    # Test 1: Passes - When the chain fails at the end, the failure logic on
    # the chord is executed.
    try:
        chain = simple_pass.si() | simple_fail.si()
        ~chord([simple_pass.si(), chain],
               simple_pass.si().on_error(record_failure.s(test="test_1")))
    except Exception:
        print("Test 1 worked")

    # Test 2: Fails - When the chain has an intermediate exception, the failure
    # logic on the chord is not executed & the client simply hangs.
    try:
        chain = simple_pass.si() | simple_fail.si() | simple_pass.si()
        ~chord([simple_pass.si(), chain],
               simple_pass.si().on_error(record_failure.s(test="test_2")))
    except Exception:
        print("Test 2 worked")

Environment

  • Python: 3.6

What Broke

Chords did not trigger error handling when inner tasks failed mid-chain: the errback was never called, and the client waiting on the chord result could hang.

Why It Broke

Error propagation and errback calling from chord and group children was not handled correctly: a chord's errback fired when the final task of a child chain failed, but not when an intermediate task failed.
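As a toy model (this is not Celery's internals, just an illustration of the bug class), the behavior resembles a pipeline whose error hook is consulted only for the final step:

```python
# Toy model: an error hook that is only honored for the *last* step of a
# pipeline silently swallows intermediate failures -- mirroring Test 1
# (final-step failure, errback fires) vs Test 2 (mid-chain failure, it doesn't).
def run_chain(steps, on_error):
    for i, step in enumerate(steps):
        try:
            step()
        except Exception as exc:
            if i == len(steps) - 1:   # buggy: only the final step reports
                on_error(exc)
                return "failed-reported"
            return "failed-silently"  # intermediate failure: errback skipped
    return "ok"
```

The upstream fix makes error propagation reach the chord's errback regardless of where in the child chain the failure occurs.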

Fix Options (Details)

Option A — Upgrade to fixed release (safe default, recommended)

pip install celery==5.1.0rc1

When NOT to use: Do not use this fix if your application relies on the previous error handling behavior.

Use when you can deploy the upstream fix. It is usually lower-risk than long-lived workarounds.
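If you want a deploy-time guard that the running Celery includes the fix, here is a minimal sketch. The helper name and the naive version parsing are ours, not Celery's; production code might prefer `packaging.version` for full PEP 440 handling.

```python
# Hypothetical guard: the fix first shipped in 5.1.0rc1, so any version whose
# numeric release segment is >= (5, 1, 0) should include it. Naive parsing:
# pre-release ordering beyond that rc is deliberately ignored.
def has_chord_errback_fix(version: str) -> bool:
    release = version
    for sep in ("rc", "a", "b"):
        release = release.split(sep)[0]
    parts = tuple(int(p) for p in release.rstrip(".").split("."))
    return parts >= (5, 1, 0)
```

Typical use: call it at startup with `importlib.metadata.version("celery")` and fail fast if the deployed version predates the fix.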

Option D — Guard side-effects with OnceOnly (guardrail for side-effects)

Mitigate duplicate external side-effects under retries/timeouts/agent loops by gating the operation before calling external systems.

  • Place OnceOnly between your code/agent and real side-effects (Stripe, emails, CRM, APIs).
  • Use a stable key per side-effect (e.g., customer_id + action + idempotency_key).
  • Fail-safe: configure fail-open vs fail-closed based on blast radius and spend risk.
  • This does NOT fix data corruption; it only prevents duplicate side-effects.
onceonly.py

import os

from onceonly import OnceOnly

once = OnceOnly(api_key=os.environ["ONCEONLY_API_KEY"], fail_open=True)

def process_event(event_id):
    # Stable idempotency key per real side-effect.
    # Use a request id / job id / webhook delivery id / Stripe event id, etc.
    key = f"stripe:webhook:{event_id}"
    res = once.check_lock(key=key, ttl=3600)
    if res.duplicate:
        return {"status": "already_processed"}
    # Safe to execute the side-effect exactly once.
    handle_event(event_id)  # your application's handler

See OnceOnly SDK

When NOT to use: Do not use this to hide logic bugs or data corruption. Use it to block duplicate external side-effects and enforce tool permissions/spend caps.

Fix reference: https://github.com/celery/celery/pull/6746

First fixed release: 5.1.0rc1

Last verified: 2026-02-09. Validate in your environment.


When NOT to Use This Fix

  • Do not use this fix if your application relies on the previous error handling behavior.
  • Do not use this to hide logic bugs or data corruption. Use it to block duplicate external side-effects and enforce tool permissions/spend caps.

Verify Fix

verify
Re-run the minimal reproduction on your broken version, then apply the fix and re-run.


Prevention

  • Cover chord/group error paths in integration tests, including failures at intermediate tasks inside child chains, not just at the final task.
  • Make timeouts explicit on result .get() calls so a never-firing callback surfaces as an error instead of a hang.
  • Instrument retries (attempt count + reason) and alert on spikes to catch dependency slowdowns.
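The retry-instrumentation bullet can be sketched as a small decorator (illustrative only; Celery tasks would normally use the framework's own autoretry and logging hooks instead):

```python
import functools
import logging

logger = logging.getLogger("retries")

def instrument_retries(max_attempts: int = 3):
    """Retry a callable, logging attempt count and failure reason each time."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception as exc:
                    # Structured line that a dashboard can count and alert on.
                    logger.warning("retry fn=%s attempt=%d reason=%r",
                                   fn.__name__, attempt, exc)
                    if attempt == max_attempts:
                        raise
        return wrapper
    return decorator
```

A spike in these warning lines is an early signal that a downstream dependency is degrading, before hard failures appear.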

Version Compatibility Table

Version      Status
5.1.0rc1     Fixed

Related Issues

No related fixes found.

Sources

We don’t republish the full GitHub discussion text. Use the links above for context.