
The Fix

pip install redis==7.1.1

Based on closed redis/redis-py issue #3203. Fix PR: https://github.com/redis/redis-py/pull/3863

Production note: This usually shows up under retries/timeouts. Treat it as a side-effect risk until you can verify behavior with a canary + real traffic.

@@ -296,7 +296,14 @@ def set_parser(self, parser_class: Type[BaseParser]) -> None:
     async def connect(self):
         """Connects to the Redis server if not already connected"""
-        await self.connect_check_health(check_health=True)
+        # try once the socket connect with the handshake, retry the whole
+        # connect/handshake flow based on retry policy

Why This Fix Works in Production

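  • Trigger: a Pub/Sub read fails inside call_with_retry (traceback frame: return do()) after the server closes the connection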
  • Mechanism: The retry functionality in the PubSub-client fails to handle OSError correctly during reconnection attempts
  • Why the fix works: Adds retries for both the socket connection and the initial connection handshake, improving the retry behavior for Pub/Sub connection issues. (first fixed release: 7.1.1).
Production impact:
  • If left unfixed, retry loops can amplify load and turn a small outage into a cascade (thundering herd).
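The mechanism above can be illustrated with a simplified model. This is a sketch under stated assumptions, not redis-py's code: `ServerClosed`, `call_with_retry`, and `make_flaky_reconnect` are hypothetical names, and the model assumes the retry helper only catches a fixed set of exception types. Under that assumption, an OSError raised during a reconnection attempt escapes the loop unless it is part of the retried set.

```python
class ServerClosed(Exception):
    """Hypothetical stand-in for a library-level connection error."""


def call_with_retry(do, retries, supported_errors):
    """Call do(); retry only on supported_errors, up to `retries` extra attempts."""
    failures = 0
    while True:
        try:
            return do()
        except supported_errors:
            failures += 1
            if failures > retries:
                raise


def make_flaky_reconnect():
    """Return a callable that fails with ServerClosed, then OSError, then succeeds."""
    errors = iter([ServerClosed("closed by server"), OSError("connection refused")])

    def reconnect_and_read():
        err = next(errors, None)
        if err is not None:
            raise err
        return "message"

    return reconnect_and_read


# Narrow policy: the OSError from the second attempt is not retried and escapes.
try:
    call_with_retry(make_flaky_reconnect(), retries=3, supported_errors=(ServerClosed,))
    narrow_result = "recovered"
except OSError:
    narrow_result = "crashed"

# Broad policy: OSError is retried as well, so the third attempt succeeds.
broad_result = call_with_retry(
    make_flaky_reconnect(), retries=3, supported_errors=(ServerClosed, OSError)
)
```

In this model the narrow policy ends in a crash while the broad policy recovers, which mirrors the reported symptom and the direction of the upstream fix without reproducing its implementation.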

Why This Breaks in Prod

  • Reported on Python 3.12.2; the stack trace from the issue shows Python 3.10 paths.
  • The retry functionality in the PubSub-client fails to handle OSError correctly during reconnection attempts.
  • Surfaces as: redis.exceptions.ConnectionError: Connection closed by server.

Proof / Evidence

  • GitHub issue: #3203
  • Fix PR: https://github.com/redis/redis-py/pull/3863
  • First fixed release: 7.1.1
  • Reproduced locally: No (not executed)
  • Last verified: 2026-02-08
  • Confidence: 0.80
  • Did this fix it?: Yes (upstream fix exists)

Discussion

High-signal excerpts from the issue thread (symptoms, repros, edge-cases).

“Btw @petyaslavova , I have a better explanation of the problem here”
@ManelCoutinhoSensei · 2025-08-08 · source
“Having the same issue with pipelines. Solved using this hack:”
@d33m00n · 2025-08-08 · source
“Hey, Based on your description, I believe I've encountered the same issue with the synchronous version (as you said)”
@ManelCoutinhoSensei · 2025-01-30 · repro detail · source

Failure Signature (Search String)

  • return do()

Error Message

Stack trace
error.txt
Traceback (most recent call last):
  File "/path/to/env/lib/python3.10/site-packages/redis/retry.py", line 62, in call_with_retry
    return do()
  File "/path/to/env/lib/python3.10/site-packages/redis/client.py", line 842, in <lambda>
    lambda: command(*args, **kwargs),
  File "/path/to/env/lib/python3.10/site-packages/redis/client.py", line 859, in try_read
    if not conn.can_read(timeout=timeout):
  File "/path/to/env/lib/python3.10/site-packages/redis/connection.py", line 600, in can_read
    return self._parser.can_read(timeout)
  File "/path/to/env/lib/python3.10/site-packages/redis/_parsers/hiredis.py", line 79, in can_read
    return self.read_from_socket(timeout=timeout, raise_on_timeout=False)
  File "/path/to/env/lib/python3.10/site-packages/redis/_parsers/hiredis.py", line 90, in read_from_socket
    raise ConnectionError(SERVER_CLOSED_CONNECTION_ERROR)
redis.exceptions.ConnectionError: Connection closed by server.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/path/to/env/lib/python3.10/site-packages/redis/connection.py", line 622, in read_response
    response = self._parser.read_response(disable_decoding=disable_decoding)
  File "/path/to/env/lib/python3.10/site-packages/redis/_parsers/hiredis.py", line 128, in read_response
    self.read_from_socket()
  File "/path/to/env/lib/python3.10/site-p ... (truncated) ...

Minimal Reproduction

repro.py
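import traceback

import redis

# Assumption: the original snippet does not show how redis_pub_sub is created.
redis_pub_sub = redis.Redis().pubsub()


def event_handler(msg):
    print(msg)


redis_pub_sub.psubscribe(**{"test": event_handler})


def exception_handler(ex, pubsub, thread):
    # Print the traceback of the exception raised in the listener thread.
    print(traceback.format_exc())


redis_pub_sub.run_in_thread(sleep_time=0.1, exception_handler=exception_handler)
# Close Redis connection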

Environment

  • Python: 3.12.2

What Broke

PubSub-client crashes and does not recover when Redis is restarted, causing outages.

Why It Broke

The retry functionality in the PubSub-client fails to handle OSError correctly during reconnection attempts

Fix Options (Details)

Option A: Upgrade to fixed release (safe default, recommended)

pip install redis==7.1.1

When NOT to use: This fix should not be used if the application cannot tolerate any connection errors.

Use when you can deploy the upstream fix. It is usually lower-risk than long-lived workarounds.

Option C: Workaround (temporary)

Wrap the Pub/Sub calls inside a retry call. This might introduce unexpected side effects.
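A minimal sketch of such a retry wrapper follows. It is not the code from the issue thread; `backoff_delay` and `run_with_retry` are hypothetical helpers written with the standard library only. The wrapper assumes the wrapped task is safe to run again from the start, and it uses capped exponential backoff with jitter so that many clients do not reconnect at the same moment.

```python
import random
import time


def backoff_delay(attempt, base=0.1, cap=5.0):
    """Capped exponential backoff with full jitter (attempt starts at 0)."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


def run_with_retry(task, max_attempts=5, retry_on=(OSError,), sleep=time.sleep):
    """Run task(); on a listed error wait and retry, re-raising after max_attempts."""
    for attempt in range(max_attempts):
        try:
            return task()
        except retry_on:
            # Give up on the last attempt so the caller still sees the failure.
            if attempt == max_attempts - 1:
                raise
            sleep(backoff_delay(attempt))
```

To use this with redis-py you would likely pass a task that subscribes and reads messages, and extend `retry_on` with `redis.exceptions.ConnectionError`. Because the whole task is re-run, any work it does before failing (for example re-subscribing) happens again, which is one source of the side effects mentioned above.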

When NOT to use: This fix should not be used if the application cannot tolerate any connection errors.

Use only if you cannot change versions today. Treat this as a stopgap and remove once upgraded.

Last verified: 2026-02-08. Validate in your environment.

When NOT to Use This Fix

  • This fix should not be used if the application cannot tolerate any connection errors.

Verify Fix

verify
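Re-run the minimal reproduction on the broken version and confirm that the Pub/Sub thread fails after Redis restarts. Then upgrade to 7.1.1 and re-run; the listener should reconnect instead of crashing.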
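Before re-running, it can help to confirm which version is actually installed in the environment under test. The sketch below uses the standard library's `importlib.metadata`; `parse_release`, `has_fix`, and `installed_redis_has_fix` are hypothetical helper names, and the parsing is deliberately simple (it ignores pre-release suffixes such as "rc1").

```python
from importlib.metadata import PackageNotFoundError, version

FIRST_FIXED = (7, 1, 1)  # first fixed release according to this page


def parse_release(text):
    """Parse the leading numeric 'X.Y.Z' part of a version string into ints."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    return tuple(parts)


def has_fix(installed):
    """Return True if the given version string is at or above FIRST_FIXED."""
    return parse_release(installed) >= FIRST_FIXED


def installed_redis_has_fix():
    """Check the 'redis' distribution installed in the current environment."""
    try:
        return has_fix(version("redis"))
    except PackageNotFoundError:
        return False
```

Calling `installed_redis_has_fix()` in the deployed environment (not only on a developer machine) guards against a pinned or cached older version silently remaining in place.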

Prevention

  • Make timeouts explicit and test them (unit + integration) to avoid silent behavior changes.
  • Instrument retries (attempt count + reason) and alert on spikes to catch dependency slowdowns.
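One way to instrument retries as suggested above is to count each retried failure and record its reason. This is a sketch with hypothetical names (`RetryStats`, `call_with_stats`), not a redis-py feature; in a real service the counters would probably be exported to a metrics system instead of kept in memory.

```python
from collections import Counter


class RetryStats:
    """In-memory retry counters: total retried failures and a per-reason tally."""

    def __init__(self):
        self.attempts = 0
        self.reasons = Counter()

    def record(self, error):
        self.attempts += 1
        self.reasons[type(error).__name__] += 1

    def spiking(self, threshold):
        """Return True once the number of recorded failures reaches threshold."""
        return self.attempts >= threshold


def call_with_stats(task, stats, max_attempts=3, retry_on=(OSError,)):
    """Run task(), recording every retried failure in stats before retrying."""
    for attempt in range(max_attempts):
        try:
            return task()
        except retry_on as error:
            stats.record(error)
            if attempt == max_attempts - 1:
                raise
```

An alert can then be driven by `stats.spiking(threshold)` or by the per-reason counts, so that a rise in one specific error type is visible before it turns into an outage.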

Version Compatibility Table

Version    Status
7.1.1      Fixed

Sources

We don’t republish the full GitHub discussion text. Use the links above for context.