
The Fix

pip install celery==4.4.0rc5

Based on the closed celery/celery issue #5844 and its linked fix PR (#5843).

Production note: This shows up when late-acknowledged tasks (acks_late=True) fail or time out while acks_on_failure_or_timeout=False. Treat it as a risk until you have verified the behavior with a canary deployment under real traffic.

@@ -510,6 +510,10 @@ def on_failure(self, exc_info, send_failed_event=True, return_ok=False):
             elif ack:
                 self.acknowledge()
+            else:
+                # supporting the behaviour where a task failed and
+                # need to be removed from prefetched local queue
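To make the change concrete, here is a simplified model of the late-ack branch of on_failure before and after the fix. It is an illustrative sketch, not Celery's code: the function name, its parameters, and the string return values are hypothetical, and the reject_on_worker_lost check is collapsed into a single flag.

```python
def failure_action(acks_late, lost_worker_reject, acks_on_failure_or_timeout, fixed):
    """Hypothetical model of what happens to a failed task's message.

    Returns "already_acked", "reject", "ack", or "left_unacked".
    lost_worker_reject stands in for reject_on_worker_lost=True combined
    with a WorkerLostError.
    """
    if not acks_late:
        return "already_acked"  # early-ack tasks are acknowledged before they run
    if lost_worker_reject:
        return "reject"
    if acks_on_failure_or_timeout:
        return "ack"
    # The branch the diff adds: before the fix, nothing happened here.
    return "reject" if fixed else "left_unacked"


# The repro's settings are the combination whose outcome changes.
print(failure_action(True, False, False, fixed=False))  # left_unacked
print(failure_action(True, False, False, fixed=True))   # reject
```

In this model every other combination of flags behaves the same before and after the fix, which matches the diff only adding an else: branch.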

Why This Fix Works in Production

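  • Trigger: with the SQS broker, a worker stops consuming tasks after failures when tasks use acks_late=True together with acks_on_failure_or_timeout=False.
  • Mechanism: a failed task was neither acknowledged nor rejected, so its message stayed in the transport's prefetched local queue and kept occupying a prefetch slot. Once every slot was held by a failed message (a single slot with worker_prefetch_multiplier = 1 and one worker process), the worker fetched nothing more.
  • Why the fix works: Celery's Request.on_failure now rejects the message in that case, which removes it from the prefetched local queue and frees the slot, so the worker keeps consuming (first fixed release: 4.4.0rc5).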
Production impact:
  • If left unfixed, workers silently stop consuming once their prefetch slots are held by failed tasks, and the queue backlog grows, often with no traceback in the logs.
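The slot-exhaustion mechanism can be sketched with a toy worker, assuming prefetch accounting works as described above (a delivered message holds a slot until it is acked or rejected). ToyQoS and tasks_executed are hypothetical names; this is not kombu's QoS class, and it ignores SQS redelivering a message after its visibility timeout.

```python
from collections import deque


class ToyQoS:
    """Hypothetical prefetch accounting: a delivered message holds a slot
    until it is acked or rejected."""

    def __init__(self, prefetch_count):
        self.prefetch_count = prefetch_count
        self.unsettled = set()  # delivery tags not yet acked or rejected

    def can_consume(self):
        return len(self.unsettled) < self.prefetch_count

    def deliver(self, tag):
        self.unsettled.add(tag)

    def settle(self, tag):
        # Acknowledge and reject both release the slot.
        self.unsettled.discard(tag)


def tasks_executed(queued, prefetch_count, reject_failed):
    """Run a toy worker where every task fails (like 1/0 in the repro).

    reject_failed=False models the pre-fix behavior; True models the fix.
    Returns how many tasks the worker managed to execute.
    """
    queue = deque(f"msg-{i}" for i in range(queued))
    qos = ToyQoS(prefetch_count)
    executed = 0
    while queue and qos.can_consume():
        tag = queue.popleft()
        qos.deliver(tag)
        executed += 1  # the task runs and raises
        if reject_failed:
            qos.settle(tag)
        # Otherwise the failed message keeps its slot indefinitely.
    return executed


print(tasks_executed(3, prefetch_count=1, reject_failed=False))  # 1
print(tasks_executed(3, prefetch_count=1, reject_failed=True))   # 3
```

With a larger prefetch count the stall only arrives later: each failure permanently consumes one more slot.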

Why This Breaks in Prod

  • With acks_on_failure_or_timeout=False, Celery did not reject failed tasks, so they stayed in the SQS transport's prefetched local queue and the worker stopped consuming tasks
  • Production symptom (often without a traceback): SQS backend will stop consuming tasks after failures

Proof / Evidence

  • GitHub issue: #5844
  • Fix PR: https://github.com/celery/celery/pull/5843
  • First fixed release: 4.4.0rc5
  • Reproduced locally: No (not executed)
  • Last verified: 2026-02-09
  • Confidence: 0.85
  • Did this fix it?: Yes (upstream fix exists)

Discussion

High-signal excerpts from the issue thread (symptoms, repros, edge-cases).

“can you send a PR with a test? did you try celery==4.4.0rc4?”
@auvipy · 2019-11-21
“Tried with celery==4.4.0rc4, reproduced test case: https://github.com/galCohen88/kombu/pull/1/files”
@galCohen88 · 2019-11-21
“I would need a little guidance, as I'm not sure how I can access QoS attributes (https://github.com/celery/kombu/blob/master/kombu/transport/virtual/base.py#L182) from Task context”
@galCohen88 · 2019-11-21
“Figured it out PR: https://github.com/celery/celery/pull/5843”
@galCohen88 · 2019-11-22

Failure Signature (Search String)

  • SQS backend will stop consuming tasks after failures

Error Message

Signature-only (no traceback captured)
error.txt
SQS backend will stop consuming tasks after failures

Minimal Reproduction

repro.py
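from celery import Celery

# Assumption: AWS credentials are available to kombu's SQS transport
# (for example via environment variables); "sqs://" relies on them.
app = Celery("repro", broker="sqs://")
app.conf.worker_prefetch_multiplier = 1


# The report used an app-specific base class (BaseAsyncJobTask);
# the default Task class should be enough to reproduce.
@app.task(acks_late=True, acks_on_failure_or_timeout=False)
def dead_letter_q_task():
    return 1 / 0  # always fails with ZeroDivisionError


if __name__ == "__main__":
    # Publish only when run directly, not when the worker imports this module.
    for _ in range(3):
        dead_letter_q_task.delay()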

What Broke

Workers stopped consuming tasks after a failure, leading to a growing task backlog.

Why It Broke

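With acks_late=True and acks_on_failure_or_timeout=False, the worker neither acknowledged nor rejected a failed task. The message stayed in the SQS transport's prefetched local queue and counted against the prefetch limit, so once the limit was reached the worker stopped fetching new tasks.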

Fix Options (Details)

Option A: Upgrade to the fixed release (safe default, recommended)

pip install celery==4.4.0rc5

When NOT to use: Do not use this fix if your application relies on tasks being retried after failure.

Use when you can deploy the upstream fix (4.4.0rc5 or any later release). It is usually lower-risk than long-lived workarounds.

Fix reference: https://github.com/celery/celery/pull/5843

First fixed release: 4.4.0rc5

Last verified: 2026-02-09. Validate in your environment.


When NOT to Use This Fix

  • Do not use this fix if your application relies on tasks being retried after failure.

Verify Fix

verify
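Re-run the minimal reproduction on your broken version with a single worker process (for example: celery -A repro worker --concurrency=1), so the prefetch limit is one message; the worker should stop consuming after the first failure. Then apply the fix and re-run: all three tasks should be received and fail.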
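One way to compare the two runs is to count the distinct task ids the worker logged as received. This sketch assumes the default Celery 4.x worker log line "Received task: <name>[<id>]"; the regex and the received_task_ids helper are hypothetical, so adjust them if you use a custom log format.

```python
import re

# Assumed default worker log line, e.g.
# "[... INFO/MainProcess] Received task: repro.dead_letter_q_task[<task-id>]"
RECEIVED = re.compile(r"Received task: (?P<name>[^\s\[]+)\[(?P<id>[^\]]+)\]")


def received_task_ids(log_text, task_name):
    """Return the distinct task ids logged as received for task_name."""
    ids = set()
    for match in RECEIVED.finditer(log_text):
        if match.group("name") == task_name:
            ids.add(match.group("id"))
    return ids


# Usage: len(received_task_ids(open("worker.log").read(), "repro.dead_letter_q_task"))
```

With --concurrency=1 you would expect a single id on the broken version and all three after the fix. Ids are de-duplicated because SQS may deliver the same message again after its visibility timeout.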


Prevention

  • Make timeouts explicit and test them (unit + integration) to avoid silent behavior changes.
  • Instrument retries (attempt count + reason) and alert on spikes to catch dependency slowdowns.
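Because this failure mode is silent, a backlog check complements the bullets above. The sketch below is a rule over periodic samples of SQS's ApproximateNumberOfMessages attribute. The function names, the window size, and the sampling interval are hypothetical, and the sampler assumes boto3 is installed with AWS credentials and a region configured. A backlog that is not shrinking can also mean workers are simply under-provisioned, so treat a hit as a prompt to check worker logs rather than a diagnosis.

```python
def backlog_not_draining(visible_counts, window=3):
    """Hypothetical alert rule over periodic queue-depth samples (oldest first).

    True when the latest sample is non-zero and the backlog never decreased
    across the last `window` samples.
    """
    if len(visible_counts) < window:
        return False
    recent = visible_counts[-window:]
    never_decreased = all(later >= earlier for earlier, later in zip(recent, recent[1:]))
    return recent[-1] > 0 and never_decreased


def sample_visible_count(queue_url):
    """Read ApproximateNumberOfMessages for one SQS queue.

    Assumes boto3 is installed and AWS credentials/region are configured.
    """
    import boto3  # imported here so backlog_not_draining has no dependencies

    response = boto3.client("sqs").get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=["ApproximateNumberOfMessages"],
    )
    return int(response["Attributes"]["ApproximateNumberOfMessages"])
```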

Version Compatibility Table

Version     Status
4.4.0rc4    Affected (reproduced in the issue thread)
4.4.0rc5    Fixed

Related Issues

No related fixes found.

Sources

We don’t republish the full GitHub discussion text. Use the links above for context.