Celery Memory Leak on Unhandled Exceptions (Fix)

The Fix

pip install celery==5.6.0

Based on closed celery/celery issue #8882. Fix PR: https://github.com/celery/celery/pull/9799

Production note: This usually shows up under retries/timeouts. Treat it as a side-effect risk until you can verify behavior with a canary + real traffic.

@@ -190,6 +190,7 @@ def handle_retry(self, task, req, store_errors=True, **kwargs):
         # and it's exc' attribute is the original exception raised (if any).
         type_, _, tb = sys.exc_info()
+        einfo = None
         try:
             reason = self.retval

Why This Fix Works in Production

Trigger: tasks that raise unhandled exceptions.
Mechanism: the traceback of an unhandled exception forms reference cycles, so the memory it holds is not released until the cycles are garbage collected.
Why the fix works: the upstream change fixes a critical memory leak in Celery's exception handling that caused significant memory growth when tasks raise unhandled exceptions (first fixed release: 5.6.0).
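
The mechanism can be illustrated outside Celery. The sketch below is not Celery's code; it only shows, under the assumption of CPython's reference counting, how keeping an exception in a frame local creates a cycle (frame, exception, traceback, frame) that pins everything else in that frame. `Payload` and `handle` are hypothetical names used for illustration.

```python
import gc
import weakref


class Payload:
    """Stand-in for a large object held by a frame local."""


def handle():
    payload = Payload()
    ref = weakref.ref(payload)
    try:
        raise RuntimeError("err")
    except RuntimeError as exc:
        # The frame holds `saved`, the exception holds its traceback,
        # and the traceback holds this frame: a reference cycle.
        saved = exc
    return ref


gc.disable()  # keep the cyclic collector from running on its own
try:
    ref = handle()
    alive_before = ref() is not None  # payload is pinned by the cycle
    gc.collect()                      # an explicit pass breaks the cycle
    alive_after = ref() is not None
finally:
    gc.enable()

print(alive_before, alive_after)
```

On CPython this should report that the payload survives the function return and is only freed by the explicit collection, which is the same shape of problem described in the issue.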

Why This Breaks in Prod

Shows up under Python 3.11 in real deployments (not just unit tests).
Tracebacks of unhandled exceptions form reference cycles, so the memory they hold is not garbage collected.
Production symptom (often without a traceback): worker memory keeps growing as tasks raise unhandled exceptions.

Proof / Evidence

GitHub issue: #8882
Fix PR: https://github.com/celery/celery/pull/9799
First fixed release: 5.6.0
Reproduced locally: No (not executed)
Last verified: 2026-02-09
Confidence: 0.85
Did this fix it?: Yes (upstream fix exists)

Discussion

High-signal excerpts from the issue thread (symptoms, repros, edge-cases).

“> Is seems that for now the solution is indeed to call gc.collect() directly or not to use nested exceptions. > > Anyway, hopefully this…”

@auvipy · 2025-04-03 · confirmation · source

“Recently I encountered the same problem - the worker's memory was not released upon an exception”

@shell-escape · 2025-04-02 · confirmation · source
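
The gc.collect() workaround mentioned in the thread could be sketched as a plain decorator. This is an assumption about how one might apply it, not something taken from the issue or from Celery's API; `collect_after_error` is a hypothetical helper, and it is a stopgap for versions before 5.6.0 rather than a replacement for the upgrade.

```python
import functools
import gc


def collect_after_error(func):
    """Hypothetical helper: run a full GC pass whenever func raises."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # The in-flight exception is still referenced at this point, so
            # this pass mainly reclaims cycles left by earlier failures.
            gc.collect()
            raise

    return wrapper


@collect_after_error
def bad():
    raise RuntimeError("err")


try:
    bad()
except RuntimeError as exc:
    outcome = f"re-raised: {exc}"

print(outcome)
```

A full collection on every failure has a CPU cost, so this trades throughput for bounded memory growth on failure-heavy workloads.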

Failure Signature (Search String)

Memory Leak on Unhandled Exceptions

Error Message

Signature-only (no traceback captured)

Minimal Reproduction

repro.py

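from celery import Celery

# Broker URL assumes a local RabbitMQ with the default guest account.
app = Celery('tasks', broker='pyamqp://guest@localhost//')


@app.task
def ok():
    """Control case: succeeds without raising."""


@app.task
def bad():
    """Leak case: raises an unhandled exception."""
    raise RuntimeError("err")


@app.task(bind=True)
def again(self):
    """Retry case: keeps retrying until max_retries is reached."""
    if self.request.retries < self.max_retries:
        raise self.retry(countdown=0.1)


if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument("task", choices=["ok", "bad", "again"])
    parser.add_argument("--count", type=int, default=1)
    args = parser.parse_args()

    # Run as a script, tasks are named after the app's main name, "tasks".
    task = app.tasks[f"tasks.{args.task}"]
    for _ in range(args.count):
        task.delay()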

Environment

Python: 3.11

What Broke

Worker memory usage grew significantly when tasks raised unhandled exceptions, eventually leading to crashes.

Why It Broke

Tracebacks of unhandled exceptions form reference cycles, so the memory they hold is not garbage collected.

Fix Options (Details)

Option A: Upgrade to fixed release (safe default, recommended)

pip install celery==5.6.0

When NOT to use: This fix is not applicable if the application does not handle exceptions properly.

Use when you can deploy the upstream fix. It is usually lower-risk than long-lived workarounds.

Option D: Guard side-effects with OnceOnly (guardrail for side-effects)

Mitigate duplicate external side-effects under retries/timeouts/agent loops by gating the operation before calling external systems.

Place OnceOnly between your code/agent and real side-effects (Stripe, emails, CRM, APIs).
Use a stable key per side-effect (e.g., customer_id + action + idempotency_key).
Fail-safe: configure fail-open vs fail-closed based on blast radius and spend risk.

onceonly.py

import os

from onceonly import OnceOnly

once = OnceOnly(api_key=os.environ["ONCEONLY_API_KEY"], fail_open=True)


def process_event(event_id):
    # Stable idempotency key per real side-effect.
    # Use a request id / job id / webhook delivery id / Stripe event id, etc.
    key = f"stripe:webhook:{event_id}"

    res = once.check_lock(key=key, ttl=3600)
    if res.duplicate:
        return {"status": "already_processed"}

    # Safe to execute the side-effect exactly once.
    # handle_event is your own handler; it is not defined in this snippet.
    handle_event(event_id)
    return {"status": "processed"}

When NOT to use: Do not use this to hide logic bugs or data corruption. Use it to block duplicate external side-effects and enforce tool permissions/spend caps.

Fix reference: https://github.com/celery/celery/pull/9799

First fixed release: 5.6.0

Last verified: 2026-02-09. Validate in your environment.

When NOT to Use This Fix

Option A (upgrade): not applicable if the application does not handle exceptions properly.
Option D (OnceOnly): do not use it to hide logic bugs or data corruption. Use it to block duplicate external side-effects and enforce tool permissions/spend caps.

Verify Fix

verify

Re-run the minimal reproduction on your broken version, then apply the fix and re-run.
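
Before re-running, it may help to confirm that the worker environment actually picked up the fixed release. The following is a minimal sketch using only the standard library; `parse_release` and `has_fix` are hypothetical helpers, and the comparison looks only at the leading major.minor.patch numbers, so a pre-release such as 5.6.0rc1 would be reported as fixed even though that is not confirmed here.

```python
import re
from importlib import metadata

FIRST_FIXED = (5, 6, 0)  # first fixed release named on this page


def parse_release(version):
    """Return the leading (major, minor, patch) of a version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return tuple(int(part) for part in match.groups())


def has_fix(version):
    """True if the release is at or above the first fixed release."""
    return parse_release(version) >= FIRST_FIXED


try:
    installed = metadata.version("celery")
except metadata.PackageNotFoundError:
    installed = None

if installed is None:
    print("celery is not installed in this environment")
else:
    print(f"celery {installed}: fix included = {has_fix(installed)}")
```

Run it with the same interpreter the worker uses; a different virtualenv can report a different version.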

Prevention

Track RSS + object counts after deployments; alert on monotonic growth and GC pressure.
Add a long-running test that repeats the failing call path and asserts stable memory.
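
A long-running memory test of the kind described above might look like the sketch below. It uses tracemalloc from the standard library and runs in-process, so it checks a call path rather than a live worker; `failing_call`, `traced_growth`, and the 64 KiB budget are assumptions to replace with your own code and threshold.

```python
import gc
import tracemalloc


def failing_call():
    """Stand-in for the failing call path; replace with your own code."""
    try:
        raise RuntimeError("err")
    except RuntimeError:
        pass


def traced_growth(func, warmup=100, rounds=1000):
    """Return the bytes of traced memory gained across `rounds` calls."""
    for _ in range(warmup):
        func()  # let caches and lazy imports settle first
    gc.collect()
    tracemalloc.start()
    try:
        before, _ = tracemalloc.get_traced_memory()
        for _ in range(rounds):
            func()
        gc.collect()
        after, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before


growth = traced_growth(failing_call)
assert growth < 64 * 1024, f"memory grew by {growth} bytes"
print(f"traced growth: {growth} bytes")
```

For a real worker, pair a test like this with RSS tracking in production, since tracemalloc only sees allocations made through Python's allocator in the current process.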

Version Compatibility Table

Version	Status
5.6.0	Fixed

Sources

We don’t republish the full GitHub discussion text. Use the links above for context.