
The Fix

pip install celery==4.4.1

Based on closed celery/celery issue #5936 · PR/commit linked

Production note: This usually surfaces under retries and timeouts. Treat the change as a potential source of side effects until you have verified its behavior with a canary deployment under real traffic.

@@ -185,6 +185,8 @@ def __init__(self, host=None, port=None, db=None, password=None,
     socket_timeout = _get('redis_socket_timeout')
     socket_connect_timeout = _get('redis_socket_connect_timeout')
+    retry_on_timeout = _get('redis_retry_on_timeout')
+    socket_keepalive = _get('redis_socket_keepalive')

Why This Fix Works in Production

  • Trigger: A TimeoutError raised during subscription events on the Redis backend
  • Mechanism: The Redis backend configuration lacked parameters for `socket_keepalive` and `retry_on_timeout`
  • Why the fix works: The patch adds the `retry_on_timeout` and `socket_keepalive` parameters to the Redis connection configuration to improve connection reliability (first fixed release: 4.4.1).
Production impact:
  • If left unfixed, the same config can work elsewhere and fail only in production (environment differences), causing startup failures or partial feature outages.
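
The mechanism above can be illustrated with a small sketch. This is not Celery's actual backend code: `build_connparams` and the `settings` mapping are hypothetical names, and only the setting keys are taken from the diff. It shows how optional settings might be read with a `_get`-style lookup and forwarded as connection keyword arguments, with unset values left to the client library's defaults.

```python
# Illustrative only: a simplified stand-in for reading optional settings
# and forwarding them as Redis connection keyword arguments.
# `build_connparams` and `settings` are hypothetical, not Celery internals.

def build_connparams(settings):
    """Collect Redis connection options from a settings mapping."""
    _get = settings.get  # mirrors the `_get(...)` lookups shown in the diff

    candidates = {
        "socket_timeout": _get("redis_socket_timeout"),
        "socket_connect_timeout": _get("redis_socket_connect_timeout"),
        # The two options the patch adds:
        "retry_on_timeout": _get("redis_retry_on_timeout"),
        "socket_keepalive": _get("redis_socket_keepalive"),
    }
    # Drop unset options so the client library's own defaults apply.
    return {key: value for key, value in candidates.items() if value is not None}


params = build_connparams({
    "redis_socket_timeout": 30.0,
    "redis_retry_on_timeout": True,
    "redis_socket_keepalive": True,
})
print(params)
```

Before the patch, the last two keys had no corresponding lookup, so setting them had no way to reach the connection.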

Why This Breaks in Prod

  • The Redis backend configuration lacked parameters for socket_keepalive and retry_on_timeout
  • Production symptom (often without a traceback): TimeoutError during subscription events, making task execution unreliable

Proof / Evidence

  • GitHub issue: #5936
  • Fix PR: https://github.com/celery/celery/pull/5952
  • First fixed release: 4.4.1
  • Reproduced locally: No (not executed)
  • Last verified: 2026-02-09
  • Confidence: 0.85
  • Did this fix it?: Yes (upstream fix exists)

Discussion

High-signal excerpts from the issue thread (symptoms, repros, edge-cases).

“There are all sorts of Redis connection errors that could cause problems with Celery that we deal on a daily basis... Most typical ones are…”
@dejlek · 2020-01-28 · source
“Yep, it will definitely help in this case.”
@dejlek · 2020-01-30 · source
“> There are all sorts of Redis connection errors that could cause problems with Celery that we deal on a daily basis..”
@Seleznev-nvkz · 2020-01-29 · source
“> What about using result_backend_trasport_options? Thank you for your feedback”
@Seleznev-nvkz · 2020-02-05 · source

Failure Signature (Search String)

  • - [x] I have included all related issues and possible duplicate issues or possible duplicates to this issue as requested by the checklist above.

Error Message

Signature-only (no traceback captured)

What Broke

A TimeoutError was raised during subscription events, which made task execution unreliable.

Why It Broke

The Redis backend configuration lacked parameters for socket_keepalive and retry_on_timeout
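
Once on a release that includes the patch, the two settings can be switched on in the Celery configuration. The fragment below is a minimal sketch of a configuration module; the file name `celeryconfig.py`, the backend URL, and all numeric values are placeholder assumptions, and only the setting names come from the diff.

```python
# celeryconfig.py (sketch): plain module-level settings.
# Assumes Celery >= 4.4.1 with a Redis result backend; values are illustrative.

result_backend = "redis://localhost:6379/0"  # placeholder URL

# Timeout settings that already existed (context lines in the diff)
redis_socket_timeout = 30.0
redis_socket_connect_timeout = 5.0

# Settings read by the patched backend
redis_retry_on_timeout = True   # retry a command after a socket timeout
redis_socket_keepalive = True   # enable TCP keepalive on the connection
```

A module like this is typically loaded with `app.config_from_object("celeryconfig")`. Whether enabling both options suits a given deployment is something to confirm under real traffic.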

Fix Options (Details)

Option A: Upgrade to fixed release (safe default, recommended)

pip install celery==4.4.1

When NOT to use: Skip this fix if your Redis backend is not seeing timeouts or dropped connections; the added options only affect connection reliability.

Use when you can deploy the upstream fix. It is usually lower-risk than long-lived workarounds.

Fix reference: https://github.com/celery/celery/pull/5952

First fixed release: 4.4.1

Last verified: 2026-02-09. Validate in your environment.
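
To validate the upgrade in your environment, it can help to check the installed version against the first fixed release. The helper below is a hedged sketch: `parse_version` and `has_fix` are hypothetical names, and the parsing looks only at the leading numeric components, so a pre-release such as `4.4.1rc1` would be treated as fixed.

```python
import re

FIXED_IN = (4, 4, 1)  # first fixed release, as stated above


def parse_version(text):
    """Return the leading numeric components of a version string as a 3-tuple."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized version: {text!r}")
    # A missing patch component (e.g. "5.0") is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def has_fix(version_text):
    """True if the version is at or above the first fixed release."""
    return parse_version(version_text) >= FIXED_IN


for candidate in ("4.4.0", "4.4.1", "5.0"):
    print(candidate, has_fix(candidate))
```

In a real environment the version string could come from `importlib.metadata.version("celery")` (Python 3.8+) instead of the hard-coded candidates.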

When NOT to Use This Fix

  • Skip this fix if your Redis backend is not seeing timeouts or dropped connections; the added options only affect connection reliability.

Prevention

  • Make timeouts explicit and test them (unit + integration) to avoid silent behavior changes.
  • Instrument retries (attempt count + reason) and alert on spikes to catch dependency slowdowns.
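
The second prevention point can be sketched as a small retry wrapper that records the attempt count and the failure reason. This is a generic illustration, not part of Celery or the upstream fix: `call_with_retries`, `failure_reasons`, and the `flaky` demo function are hypothetical, and a production version would normally add backoff between attempts and export the counter to a metrics system.

```python
import logging
from collections import Counter

logger = logging.getLogger("retry_metrics")
failure_reasons = Counter()  # exception name -> number of failed attempts


def call_with_retries(func, max_attempts=3, retry_on=(TimeoutError,)):
    """Call func(), retrying on the given exceptions and recording each failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on as exc:
            reason = type(exc).__name__
            failure_reasons[reason] += 1
            logger.warning("attempt %d/%d failed: %s", attempt, max_attempts, reason)
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error


# Demo: a function that times out twice, then succeeds.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"


result = call_with_retries(flaky)
print(result, dict(failure_reasons))
```

Alerting on a spike in `failure_reasons["TimeoutError"]` is one way to notice a dependency slowdown before tasks start failing outright.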

Version Compatibility Table

Version    Status
4.4.1      Fixed

Sources

We don’t republish the full GitHub discussion text. Use the links above for context.