The Fix
pip install redis==7.1.0
Based on closed redis/redis-py issue #1789; the fix landed in PR #1895.
Production note: Most teams hit this during upgrades or environment changes. Roll out with a canary and smoke-test critical endpoints (health checks, cache-backed routes) before going to 100% of traffic.
@@ -604,7 +604,9 @@ def connect(self):
             return
         try:
-            sock = self._connect()
+            sock = self.retry.call_with_retry(
+                lambda: self._connect(), lambda error: self.disconnect(error)
... (truncated) ...
Why This Fix Works in Production
- Trigger: Clients configured with a connection pool cannot retry on transient network failures.
- Mechanism: The `retry` object is bound only to the connection, not to the client or the connection pool, so the client cannot retry without first retrieving a connection from the pool.
- Why the fix works: `connect()` now wraps `self._connect()` in `self.retry.call_with_retry(...)`, adding a retry mechanism on socket timeouts when connecting to the server (first fixed release: 7.1.0).
- If left unfixed, a configuration that works in development can still fail in production, where transient network failures are more common, causing startup failures or partial feature outages.
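The retry-on-connect pattern described above can be sketched without redis-py installed. This is a minimal illustration, not the library's implementation: `call_with_retry` here is a hypothetical stand-in that only mirrors the `(do, fail)` calling shape seen in the diff, and `FlakyServer`, the retry count, and the backoff delays are assumptions made for the example.

```python
import time


def call_with_retry(do, fail, retries=3, base_delay=0.05, sleep=time.sleep):
    """Hypothetical stand-in for a retry helper; not redis-py's own code.

    Calls do(); on a retryable error it calls fail(error) to clean up,
    waits with exponential backoff, and tries again up to `retries` times.
    """
    failures = 0
    while True:
        try:
            return do()
        except (ConnectionError, TimeoutError) as error:
            failures += 1
            fail(error)  # e.g. drop the half-open socket before retrying
            if failures > retries:
                raise
            sleep(base_delay * 2 ** (failures - 1))


class FlakyServer:
    """Simulates a server that refuses the first `refusals` connection attempts."""

    def __init__(self, refusals):
        self.refusals = refusals
        self.attempts = 0

    def connect(self):
        self.attempts += 1
        if self.attempts <= self.refusals:
            raise ConnectionRefusedError(111, "Connection refused")
        return "socket"


server = FlakyServer(refusals=2)
cleanups = []  # records every error passed to the fail callback
sock = call_with_retry(server.connect, cleanups.append, sleep=lambda _: None)
print(sock, server.attempts, len(cleanups))
```

Without the wrapper, the first refused attempt would propagate straight to the caller, which is the behaviour reported in the issue. With it, a short outage is absorbed as long as the server recovers within the retry budget.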
Why This Breaks in Prod
- Reported under Python 3.9.5 in a real deployment, not only in unit tests.
- Production symptom: clients running with a connection pool cannot retry on transient network failures, so requests fail with `Connection refused` (Error 111) instead of recovering.
Proof / Evidence
- GitHub issue: #1789
- Fix PR: https://github.com/redis/redis-py/pull/1895
- First fixed release: 7.1.0
- Reproduced locally: No (not executed)
- Last verified: 2026-02-07
- Confidence: 0.85
- Did this fix it?: Yes (upstream fix exists)
Discussion
High-signal excerpts from the issue thread (symptoms, repros, edge-cases).
“The fix was merged: https://github.com/redis/redis-py/pull/1895”
Failure Signature (Search String)
- [Bug] Clients running with connection pool configured cannot retry on transient network failures
- It seems that this occurs because the `retry` object is bound only to the connection rather than the client or connection pool so the client cannot currently retry without first retrieving the connection from the pool.
Error Message
django_redis.exceptions.ConnectionInterrupted: Redis ConnectionError: Error 111 connecting to redis-cache-master.fusion.svc.cluster.local:6379. Connection refused.
Minimal Reproduction
django_redis.exceptions.ConnectionInterrupted: Redis ConnectionError: Error 111 connecting to redis-cache-master.fusion.svc.cluster.local:6379. Connection refused.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.9/site-packages/django/utils/deprecation.py", line 116, in __call__
    response = self.process_request(request)
  File "/usr/local/lib/python3.9/site-packages/django/middleware/cache.py", line 145, in process_request
    cache_key = get_cache_key(request, self.key_prefix, 'GET', cache=self.cache)
  File "/usr/local/lib/python3.9/site-packages/django/utils/cache.py", line 362, in get_cache_key
    headerlist = cache.get(cache_key)
  File "/usr/local/lib/python3.9/site-packages/django_redis/cache.py", line 91, in get
    value = self._get(key, default, version, client)
  File "/usr/local/lib/python3.9/site-packages/django_redis/cache.py", line 38, in _decorator
    raise e.__cause__
  File "/usr/local/lib/python3.9/site-packages/django_redis/client/default.py", line 258, in get
    value = client.get(key)
  File "/usr/local/lib/python3.9/site-packages/redis/commands/
... (truncated) ...
Environment
- Python: 3.9.5
What Broke
Clients experience connection errors and cannot recover from transient network failures.
Fix Options (Details)
Option A: Upgrade to fixed release (safe default, recommended)
pip install redis==7.1.0
Use when you can deploy the upstream fix. It is usually lower-risk than long-lived workarounds.
Fix reference: https://github.com/redis/redis-py/pull/1895
First fixed release: 7.1.0
Last verified: 2026-02-07. Validate in your environment.
When NOT to Use This Fix
- Do not use this fix if the client is not configured to handle transient network failures.
Verify Fix
Re-run the minimal reproduction on your broken version, then apply the fix and re-run.
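Before re-running the reproduction, it can help to confirm which redis-py version the environment actually loads. The sketch below is one way to do that with the standard library only. The helper names (`parse_version`, `has_fix`, `installed_redis_version`) are hypothetical, and the `7.1.0` threshold is taken from this page rather than verified independently.

```python
from importlib import metadata


def parse_version(text):
    """Turn '7.1.0' into (7, 1, 0).

    Non-numeric suffixes such as 'rc1' are ignored, so a pre-release
    compares equal to its final release in this simplified check.
    """
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def has_fix(installed, first_fixed="7.1.0"):
    """True when the installed version is at or above the first fixed release."""
    return parse_version(installed) >= parse_version(first_fixed)


def installed_redis_version():
    """Return the installed redis-py version string, or None if it is absent."""
    try:
        return metadata.version("redis")
    except metadata.PackageNotFoundError:
        return None


version = installed_redis_version()
if version is None:
    print("redis-py is not installed in this environment")
else:
    print(f"redis-py {version}: fix present = {has_fix(version)}")
```

Run it inside the same virtualenv or container image that serves traffic, since a stale image is a common reason an upgrade appears not to take effect.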
Prevention
- Track Redis connection-error and timeout rates after deployments; alert on sustained increases.
- Add a long-running test that repeats the failing call path while the Redis connection is interrupted and asserts that the client recovers.
Version Compatibility Table
| Version | Status |
|---|---|
| 7.1.0 | Fixed |
Sources
We don’t republish the full GitHub discussion text. Use the links above for context.