
The Fix

pip install celery==4.4.0rc5

Based on closed celery/celery issue #5358; the fixing PR/commit is linked below.

Production note: This usually shows up under retries/timeouts. Treat it as a side-effect risk until you can verify behavior with a canary + real traffic.

@@ -816,7 +816,7 @@ def _connection(self, url, userid=None, password=None,
             transport=transport or conf.broker_transport,
             ssl=self.either('broker_use_ssl', ssl),
-            heartbeat=heartbeat,
+            heartbeat=heartbeat or self.conf.broker_heartbeat,
             login_method=login_method or conf.broker_login_method,
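The one-line change above is a plain fallback: use the explicitly passed heartbeat if set, otherwise the app's configured value. A minimal sketch of that resolution pattern (simplified names, resolve_heartbeat is a hypothetical helper, not the actual Celery internals):

```python
def resolve_heartbeat(explicit, configured):
    # Before the fix, the explicit value (often None) was used as-is,
    # silently disabling heartbeats; the fix falls back to the config.
    # Note: with `or`, an explicit 0 also falls back to the config.
    return explicit or configured

resolve_heartbeat(None, 120)  # falls back to the configured broker_heartbeat
resolve_heartbeat(30, 120)    # an explicit value still wins
```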

Why This Fix Works in Production

  • Mechanism: the broker connection did not fall back to the broker_heartbeat setting from the app configuration, so heartbeats were silently disabled.
  • Why the fix works: with heartbeats negotiated again, client and broker exchange keepalive frames, so long-idle connections are no longer reset mid-wait (first fixed release: 4.4.0rc5).

Production impact:

  • If left unfixed, workloads that wait a long time on task results fail with connection resets, typically only under production-like traffic patterns rather than in quick local tests.

Why This Breaks in Prod

  • Reported under Python 3.7 (and also 3.6 in the thread) in real deployments, not just unit tests.
  • The broker connection was not using the heartbeat setting from the app configuration, so the broker eventually dropped connections it considered dead.
  • Surfaces as: ConnectionResetError (Errno 104) raised from the drain_events call chain while waiting on task results (see the stack traces below).
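The thread pins the symptom to Errno 104. A small helper for spotting that signature in your own error handling (is_peer_reset is illustrative, not part of any library):

```python
import errno

def is_peer_reset(exc):
    """True for a peer connection reset (ECONNRESET / Errno 104),
    the signature reported in celery/celery#5358."""
    return isinstance(exc, ConnectionResetError) or (
        getattr(exc, 'errno', None) == errno.ECONNRESET
    )

# e.g. wrap result collection:
# try:
#     result = task.get()
# except OSError as exc:
#     if is_peer_reset(exc):
#         ...  # broker closed the idle connection
#     raise
```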

Proof / Evidence

  • GitHub issue: #5358
  • Fix PR: https://github.com/celery/celery/pull/4148
  • First fixed release: 4.4.0rc5
  • Reproduced locally: No (not executed)
  • Last verified: 2026-02-09
  • Confidence: 0.85
  • Did this fix it?: Yes (upstream fix exists)
  • Own content ratio: 0.17

Discussion

High-signal excerpts from the issue thread (symptoms, repros, edge-cases).

“I'm also experiencing this on 4.3.0rc1 even though it has been said in issues that this should be resolved in 4.3.0. Using broker amqp://rabbitmq and…”
@marcuslind90 · 2019-02-25 · confirmation
“It is confusing to close issues that haven't been solved yet”
@preeth1 · 2021-03-05
“@thedrow i am trying to reproduce this issue, it throws Errno 104 after 360seconds in my case”
@last-partizan · 2019-03-15 · repro detail
“Just chiming in, but I see the same thing happening on 4.3.0, but on Python 3.6”
@Shookit · 2019-04-04

Error Message

Stack trace (error.txt):

...
2019-02-23T12:49:59: 66: 170.30147862434387
2019-02-23T12:49:59: 67: 170.30566883087158
Traceback (most recent call last):
  File "repro.py", line 27, in <module>
    result = task.get()
  File "/tmp/celery-repro/celery/celery/result.py", line 226, in get
    on_message=on_message,
  File "/tmp/celery-repro/celery/celery/backends/asynchronous.py", line 188, in wait_for_pending
    for _ in self._wait_for_pending(result, **kwargs):
  File "/tmp/celery-repro/celery/celery/backends/asynchronous.py", line 255, in _wait_for_pending
    on_interval=on_interval):
  File "/tmp/celery-repro/celery/celery/backends/asynchronous.py", line 56, in drain_events_until
    yield self.wait_for(p, wait, timeout=1)
  File "/tmp/celery-repro/celery/celery/backends/asynchronous.py", line 65, in wait_for
    wait(timeout=timeout)
  File "/tmp/celery-repro/celery/celery/backends/rpc.py", line 63, in drain_events
    return self._connection.drain_events(timeout=timeout)
  File "/tmp/celery-repro/.venv/lib/python3.7/site-packages/kombu/connection.py", line 315, in drain_events
    return self.transport.drain_events(self.connection, **kwargs)
  File "/tmp/celery-repro/.venv/lib/python3.7/site-packages/kombu/transport/pyamqp.py", line 103, in drain_events
    return connection.drain_events(**kwargs)
  File "/tmp/celery-repro/.venv/lib/python3.7/site-packages/amqp/connection.py", line 500, in drain_eve
... (truncated) ...
Stack trace (error.txt):

2019-02-27T20:25:14: 0: 10.03208875656128
2019-02-27T20:25:24: 1: 20.04633927345276
2019-02-27T20:25:34: 2: 30.0596764087677
2019-02-27T20:25:44: 3: 40.07441329956055
2019-02-27T20:25:54: 4: 50.089730739593506
2019-02-27T20:26:04: 5: 60.10718393325806
2019-02-27T20:26:14: 6: 70.12166166305542
2019-02-27T20:26:24: 7: 80.13692331314087
2019-02-27T20:26:34: 8: 90.14560532569885
2019-02-27T20:26:44: 9: 100.15549874305725
2019-02-27T20:26:54: 10: 110.1715259552002
2019-02-27T20:27:04: 11: 120.18532824516296
2019-02-27T20:27:14: 12: 130.1992917060852
2019-02-27T20:27:24: 13: 140.20705103874207
2019-02-27T20:27:34: 14: 150.21889424324036
2019-02-27T20:27:44: 15: 160.23034191131592
2019-02-27T20:27:54: 16: 170.23874020576477
Traceback (most recent call last):
  File "repro.py", line 27, in <module>
    result = task.get()
  File "/home/chris/PersonalDocuments/projects/celery/celery/celery/result.py", line 226, in get
    on_message=on_message,
  File "/home/chris/PersonalDocuments/projects/celery/celery/celery/backends/asynchronous.py", line 188, in wait_for_pending
    for _ in self._wait_for_pending(result, **kwargs):
  File "/home/chris/PersonalDocuments/projects/celery/celery/celery/backends/asynchronous.py", line 255, in _wait_for_pending
    on_interval=on_interval):
  File "/home/chris/PersonalDocuments/projects/celery/celery/celery/backends/asynchronous.py", line 56, in drain_e
... (truncated) ...
Stack trace (error.txt):

pipenv run python repro.py
2019-04-04T17:29:07: 0: 10.707868337631226
2019-04-04T17:29:17: 1: 21.063886642456055
Traceback (most recent call last):
  File "repro.py", line 31, in <module>
    result = ping.delay(i).get()
  File "/home/serg/work/git/celery-5358/.venv/lib/python3.6/site-packages/celery/result.py", line 226, in get
    on_message=on_message,
  File "/home/serg/work/git/celery-5358/.venv/lib/python3.6/site-packages/celery/backends/asynchronous.py", line 188, in wait_for_pending
    for _ in self._wait_for_pending(result, **kwargs):
  File "/home/serg/work/git/celery-5358/.venv/lib/python3.6/site-packages/celery/backends/asynchronous.py", line 255, in _wait_for_pending
    on_interval=on_interval):
  File "/home/serg/work/git/celery-5358/.venv/lib/python3.6/site-packages/celery/backends/asynchronous.py", line 56, in drain_events_until
    yield self.wait_for(p, wait, timeout=1)
  File "/home/serg/work/git/celery-5358/.venv/lib/python3.6/site-packages/celery/backends/asynchronous.py", line 65, in wait_for
    wait(timeout=timeout)
  File "/home/serg/work/git/celery-5358/.venv/lib/python3.6/site-packages/celery/backends/rpc.py", line 63, in drain_events
    return self._connection.drain_events(timeout=timeout)
  File "/home/serg/work/git/celery-5358/.venv/lib/python3.6/site-packages/kombu/connection.py", line 315, in drain_events
    return self.tra
... (truncated) ...

Minimal Reproduction

repro.py
import time
from datetime import datetime
from pathlib import Path

from celery import Celery

# not OK
app = Celery(Path(__file__).stem, backend='rpc://localhost', broker='pyamqp://')
# OK
#app = Celery(Path(__file__).stem, backend='rpc://localhost', broker='redis://')
# OK
#app = Celery(Path(__file__).stem, backend='redis://', broker='pyamqp://')

@app.task
def ping(v):
    time.sleep(10)
    return v

if __name__ == '__main__':
    start = time.time()
    tasks = [ping.delay(i) for i in range(100)]
    for task in tasks:
        result = task.get()
        now = time.time()
        timestamp = datetime.now().strftime('%Y-%m-%dT%H:%M:%S')
        print(f'{timestamp}: {result}: {now - start}')

Environment

  • Python: 3.7 (the issue thread also reports it on 3.6)

What Broke

Tasks fail with a connection reset error after waiting for a long time.

Why It Broke

The broker connection (used here by the rpc:// result backend to wait for results) did not fall back to the broker_heartbeat setting from the app configuration, so heartbeats were disabled and the broker eventually reset the idle connection.
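For reference, the setting in question lives in the app configuration. A hedged illustration (values are assumptions, not taken from the issue) of where it would normally be set; note that on affected versions this code path ignored it, so configuration alone is not a workaround:

```python
# celeryconfig.py -- illustrative settings only.
# On fixed releases (>= 4.4.0rc5) the broker connection falls back to
# broker_heartbeat; on affected versions it was silently dropped.
broker_url = 'pyamqp://'
result_backend = 'rpc://localhost'
broker_heartbeat = 120  # seconds between AMQP heartbeats (Celery's documented default)
```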

Fix Options (Details)

Option A — Upgrade to fixed release (safe default, recommended)

pip install celery==4.4.0rc5

When NOT to use: This fix is not applicable if the application relies on a different heartbeat configuration.

Use when you can deploy the upstream fix. It is usually lower-risk than long-lived workarounds.

Fix reference: https://github.com/celery/celery/pull/4148

First fixed release: 4.4.0rc5

Last verified: 2026-02-09. Validate in your environment.
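To confirm a deployment actually carries the fixed release, compare the installed version against 4.4.0rc5. Pre-release ordering trips up naive string comparison, so prefer packaging.version.Version in real code; the sketch below (has_fix and _parse are hypothetical helpers) handles just the X.Y.Z[rcN] shapes relevant here:

```python
import re

def _parse(v):
    """'4.4.0rc5' -> (4, 4, 0, 5); a final release sorts after any rc."""
    m = re.match(r'(\d+)\.(\d+)\.(\d+)(?:rc(\d+))?$', v)
    if not m:
        raise ValueError(f'unsupported version string: {v!r}')
    major, minor, patch, rc = m.groups()
    return (int(major), int(minor), int(patch),
            int(rc) if rc else float('inf'))

def has_fix(installed, first_fixed='4.4.0rc5'):
    return _parse(installed) >= _parse(first_fixed)
```

For example, has_fix('4.3.0') is false while has_fix('4.4.7') is true; importlib.metadata.version('celery') returns the installed version on Python 3.8+.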



Verify Fix

verify
Re-run repro.py on the broken version to confirm the connection reset, then apply the fix (pip install celery==4.4.0rc5) and re-run; the run should complete without resets.


Prevention

  • Add a soak test that keeps a task result pending longer than the broker's heartbeat/idle window, so silently disabled heartbeats surface before a release.
  • Monitor broker connections for resets (ConnectionResetError / Errno 104) and alert on spikes; set broker_heartbeat explicitly and verify it in staging.
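One way to encode the soak-test idea is a generic loop that is framework-agnostic (soak is a hypothetical helper; in practice fn would submit a task and wait on its result past the heartbeat window):

```python
import time

def soak(fn, duration_s, interval_s=0.0):
    """Call fn repeatedly for duration_s seconds; on the first failure,
    re-raise with a count of successful iterations to aid triage."""
    deadline = time.monotonic() + duration_s
    ok = 0
    while time.monotonic() < deadline:
        try:
            fn()
        except Exception as exc:
            raise RuntimeError(f'soak failed after {ok} successful calls') from exc
        ok += 1
        if interval_s:
            time.sleep(interval_s)
    return ok
```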

Version Compatibility Table

Version     Status
4.4.0rc5    Fixed

Related Issues

No related fixes found.

Sources

We don’t republish the full GitHub discussion text. Use the links above for context.