The Fix
pip install celery==4.4.0rc5
Based on the closed celery/celery issue #5358; the fixing PR and commit are linked below.
Production note: this bug typically surfaces as retries and timeouts on long waits for results. Treat any workaround as a side-effect risk until you can verify behavior with a canary under real traffic.
@@ -816,7 +816,7 @@ def _connection(self, url, userid=None, password=None,
             transport=transport or conf.broker_transport,
             ssl=self.either('broker_use_ssl', ssl),
-            heartbeat=heartbeat,
+            heartbeat=heartbeat or self.conf.broker_heartbeat,
             login_method=login_method or conf.broker_login_method,
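The one-line change is easy to misread, so here is a toy sketch of the behavioral difference (the names below are illustrative stand-ins, not celery's actual internals): with `heartbeat=heartbeat`, a caller that passes nothing silently disables heartbeats even when the app config enables them; the `or` fallback restores the configured value.

```python
# Toy sketch of the one-line fix. Conf, connect_old and connect_fixed
# are stand-ins for illustration only.

class Conf:
    broker_heartbeat = 120  # app-level setting (celery's documented default)

def connect_old(conf, heartbeat=None):
    # pre-fix behaviour: the configured value is ignored
    return {'heartbeat': heartbeat}

def connect_fixed(conf, heartbeat=None):
    # post-fix behaviour: an explicit argument wins, config is the fallback
    return {'heartbeat': heartbeat or conf.broker_heartbeat}

conf = Conf()
print(connect_old(conf))    # {'heartbeat': None} -> no AMQP heartbeats sent
print(connect_fixed(conf))  # {'heartbeat': 120}  -> broker keeps connection alive
```

One subtlety of the `x or y` pattern: an explicit `heartbeat=0` (heartbeats deliberately disabled) is falsy and would also fall through to the config value, which is worth keeping in mind if you disable heartbeats on purpose.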
Re-run the minimal reproduction on your broken version, then apply the fix and re-run.
Why This Fix Works in Production
- Mechanism: the broker connection was created with `heartbeat=None` whenever the caller did not pass one, silently ignoring the app-level `broker_heartbeat` setting, so no AMQP heartbeats were sent.
- Why the fix works: the patched code falls back to `self.conf.broker_heartbeat`, so the broker keeps the connection alive instead of closing it as idle and resetting it (first fixed release: 4.4.0rc5).
- If left unfixed, long waits on results (e.g. `task.get()` over the rpc:// backend) eventually fail with a connection reset, typically only under production-like timing rather than in fast unit tests.
Why This Breaks in Prod
- Reported on Python 3.6 and 3.7 in real deployments (not just unit tests).
- The broker connection was created without the heartbeat setting from the app configuration, so the broker eventually closed it as idle.
- Surfaces as: a connection reset (`Errno 104` in one report) raised from `drain_events` while blocking in `task.get()`, after minutes of waiting.
Proof / Evidence
- GitHub issue: #5358
- Fix PR: https://github.com/celery/celery/pull/4148
- First fixed release: 4.4.0rc5
- Reproduced locally: No (not executed)
- Last verified: 2026-02-09
- Confidence: 0.85
- Did this fix it?: Yes (upstream fix exists)
- Own content ratio: 0.17
Discussion
High-signal excerpts from the issue thread (symptoms, repros, edge-cases).
“I'm also experiencing this on 4.3.0rc1 even though it has been said in issues that this should be resolved in 4.3.0. Using broker amqp://rabbitmq and…”
“It is confusing to close issues that haven't been solved yet”
“@thedrow i am trying to reproduce this issue, it throws Errno 104 after 360seconds in my case”
“Just chiming in, but I see the same thing happening on 4.3.0, but on Python 3.6”
Stack trace 1
-------------
...
2019-02-23T12:49:59: 66: 170.30147862434387
2019-02-23T12:49:59: 67: 170.30566883087158
Traceback (most recent call last):
File "repro.py", line 27, in <module>
result = task.get()
File "/tmp/celery-repro/celery/celery/result.py", line 226, in get
on_message=on_message,
File "/tmp/celery-repro/celery/celery/backends/asynchronous.py", line 188, in wait_for_pending
for _ in self._wait_for_pending(result, **kwargs):
File "/tmp/celery-repro/celery/celery/backends/asynchronous.py", line 255, in _wait_for_pending
on_interval=on_interval):
File "/tmp/celery-repro/celery/celery/backends/asynchronous.py", line 56, in drain_events_until
yield self.wait_for(p, wait, timeout=1)
File "/tmp/celery-repro/celery/celery/backends/asynchronous.py", line 65, in wait_for
wait(timeout=timeout)
File "/tmp/celery-repro/celery/celery/backends/rpc.py", line 63, in drain_events
return self._connection.drain_events(timeout=timeout)
File "/tmp/celery-repro/.venv/lib/python3.7/site-packages/kombu/connection.py", line 315, in drain_events
return self.transport.drain_events(self.connection, **kwargs)
File "/tmp/celery-repro/.venv/lib/python3.7/site-packages/kombu/transport/pyamqp.py", line 103, in drain_events
return connection.drain_events(**kwargs)
File "/tmp/celery-repro/.venv/lib/python3.7/site-packages/amqp/connection.py", line 500, in drain_eve
... (truncated) ...
Stack trace 2
-------------
2019-02-27T20:25:14: 0: 10.03208875656128
2019-02-27T20:25:24: 1: 20.04633927345276
2019-02-27T20:25:34: 2: 30.0596764087677
2019-02-27T20:25:44: 3: 40.07441329956055
2019-02-27T20:25:54: 4: 50.089730739593506
2019-02-27T20:26:04: 5: 60.10718393325806
2019-02-27T20:26:14: 6: 70.12166166305542
2019-02-27T20:26:24: 7: 80.13692331314087
2019-02-27T20:26:34: 8: 90.14560532569885
2019-02-27T20:26:44: 9: 100.15549874305725
2019-02-27T20:26:54: 10: 110.1715259552002
2019-02-27T20:27:04: 11: 120.18532824516296
2019-02-27T20:27:14: 12: 130.1992917060852
2019-02-27T20:27:24: 13: 140.20705103874207
2019-02-27T20:27:34: 14: 150.21889424324036
2019-02-27T20:27:44: 15: 160.23034191131592
2019-02-27T20:27:54: 16: 170.23874020576477
Traceback (most recent call last):
File "repro.py", line 27, in <module>
result = task.get()
File "/home/chris/PersonalDocuments/projects/celery/celery/celery/result.py", line 226, in get
on_message=on_message,
File "/home/chris/PersonalDocuments/projects/celery/celery/celery/backends/asynchronous.py", line 188, in wait_for_pending
for _ in self._wait_for_pending(result, **kwargs):
File "/home/chris/PersonalDocuments/projects/celery/celery/celery/backends/asynchronous.py", line 255, in _wait_for_pending
on_interval=on_interval):
File "/home/chris/PersonalDocuments/projects/celery/celery/celery/backends/asynchronous.py", line 56, in drain_e
... (truncated) ...
Stack trace 3
-------------
pipenv run python repro.py
2019-04-04T17:29:07: 0: 10.707868337631226
2019-04-04T17:29:17: 1: 21.063886642456055
Traceback (most recent call last):
File "repro.py", line 31, in <module>
result = ping.delay(i).get()
File "/home/serg/work/git/celery-5358/.venv/lib/python3.6/site-packages/celery/result.py", line 226, in get
on_message=on_message,
File "/home/serg/work/git/celery-5358/.venv/lib/python3.6/site-packages/celery/backends/asynchronous.py", line 188, in wait_for_pending
for _ in self._wait_for_pending(result, **kwargs):
File "/home/serg/work/git/celery-5358/.venv/lib/python3.6/site-packages/celery/backends/asynchronous.py", line 255, in _wait_for_pending
on_interval=on_interval):
File "/home/serg/work/git/celery-5358/.venv/lib/python3.6/site-packages/celery/backends/asynchronous.py", line 56, in drain_events_until
yield self.wait_for(p, wait, timeout=1)
File "/home/serg/work/git/celery-5358/.venv/lib/python3.6/site-packages/celery/backends/asynchronous.py", line 65, in wait_for
wait(timeout=timeout)
File "/home/serg/work/git/celery-5358/.venv/lib/python3.6/site-packages/celery/backends/rpc.py", line 63, in drain_events
return self._connection.drain_events(timeout=timeout)
File "/home/serg/work/git/celery-5358/.venv/lib/python3.6/site-packages/kombu/connection.py", line 315, in drain_events
return self.tra
... (truncated) ...
Minimal Reproduction
import time
from datetime import datetime
from pathlib import Path

from celery import Celery

# not OK
app = Celery(Path(__file__).stem, backend='rpc://localhost', broker='pyamqp://')
# OK
#app = Celery(Path(__file__).stem, backend='rpc://localhost', broker='redis://')
# OK
#app = Celery(Path(__file__).stem, backend='redis://', broker='pyamqp://')


@app.task
def ping(v):
    time.sleep(10)
    return v


if __name__ == '__main__':
    start = time.time()
    tasks = [ping.delay(i) for i in range(100)]
    for task in tasks:
        result = task.get()
        now = time.time()
        timestamp = datetime.now().strftime('%Y-%m-%dT%H:%M:%S')
        print(f'{timestamp}: {result}: {now - start}')
Environment
- Python: 3.7 (also reported on 3.6 in the discussion)
What Broke
Waiting on task results over the rpc:// backend fails with a connection reset error after a long wait (a few minutes in the reports).
Why It Broke
The broker connection was created without the heartbeat setting from the app configuration: when no explicit heartbeat was passed, `broker_heartbeat` was ignored, no heartbeats were sent, and the broker eventually closed the idle connection.
Fix Options (Details)
Option A — Upgrade to a fixed release (safe default, recommended)
pip install celery==4.4.0rc5
Use when you can deploy the upstream fix. It is usually lower-risk than long-lived workarounds.
Fix reference: https://github.com/celery/celery/pull/4148
First fixed release: 4.4.0rc5
Last verified: 2026-02-09. Validate in your environment.
When NOT to Use This Fix
- If your deployment deliberately runs with heartbeats disabled (`broker_heartbeat = 0`), the patched fallback changes nothing for you. If you cannot deploy a 4.4.0 pre-release, switching the result backend away from rpc:// (the redis:// variants marked OK in the reproduction) avoids the failure without upgrading.
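For deployments that cannot take the pre-release, the "OK" variants in the minimal reproduction suggest a mitigation: move the result backend off rpc://. A sketch of a settings module (hypothetical file name, standard Celery 4.x lowercase setting keys; URLs are placeholders for your environment):

```python
# celeryconfig.py — hypothetical settings module. The minimal
# reproduction marks the redis:// result-backend variants as OK, so on
# an affected release moving the result backend off rpc:// sidesteps
# the resets without upgrading.

broker_url = 'pyamqp://guest@localhost//'
result_backend = 'redis://localhost:6379/0'  # instead of 'rpc://localhost'
```

Note that setting `broker_heartbeat` explicitly does not help on an affected release: the bug is precisely that this setting was ignored when building the connection.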
Verify Fix
Re-run the minimal reproduction on your broken version, then apply the fix and re-run.
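To make "did the fix take?" measurable, you can time how long a blocking wait survives before a reset. Below is a broker-free sketch of such a harness; in real use the callable would be `lambda: task.get()` from the reproduction, and `_simulated_reset` here only stands in for the failure.

```python
import time

def survives_for(fn):
    """Run fn() and report (elapsed_seconds, exception_or_None)."""
    start = time.time()
    try:
        fn()
        return time.time() - start, None
    except ConnectionResetError as exc:
        return time.time() - start, exc

def _simulated_reset():
    # Stand-in for the broker closing an idle, heartbeat-less connection.
    raise ConnectionResetError(104, 'Connection reset by peer')

elapsed_ok, err_ok = survives_for(lambda: time.sleep(0.01))
elapsed_bad, err_bad = survives_for(_simulated_reset)
print(err_ok, err_bad)  # None, then the simulated reset
```

On a broken install you would expect the reset after roughly the broker's heartbeat timeout window (the reports show failures after ~170 s and ~360 s); after upgrading, the same wait should complete normally.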
Prevention
- Add an integration test that keeps a result pending for longer than the broker heartbeat timeout and fails on a connection reset.
- Monitor broker logs for missed-heartbeat / idle-connection closes so a silently dropped heartbeat setting is caught before clients see resets.
Version Compatibility Table
| Version | Status |
|---|---|
| 4.3.0rc1 | Affected (reported) |
| 4.3.0 | Affected (reported) |
| 4.4.0rc5 | Fixed |
Related Issues
No related fixes found.
Sources
We don’t republish the full GitHub discussion text. Use the links above for context.