The Fix
The Consul backend must correctly associate each request with its own response; the fix removes a race condition that corrupted results as they were saved.
Based on closed celery/celery issue #5605 (fix PR linked under Proof / Evidence).
Production note: This tends to surface only under concurrency. Reproduce with load tests and watch for lock contention/cancellation paths.
@@ -31,7 +31,6 @@ class ConsulBackend(KeyValueStoreBackend):
supports_autoexpire = True
- client = None
consistency = 'consistent'
path = None
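The deleted line matters: a class-level `client` attribute is shared by every instance of the backend, so one backend's client can silently replace another's. Below is a minimal illustration of class-level versus per-instance state; the names are illustrative, not Celery's actual code.

```python
class SharedClientBackend:
    # Class attribute: every instance sees the SAME object.
    client = None

    def connect(self, conn):
        # Assigning via the class mutates state for ALL instances.
        SharedClientBackend.client = conn


class PerInstanceBackend:
    def __init__(self):
        # Instance attribute: each backend owns its client.
        self.client = None

    def connect(self, conn):
        self.client = conn


a, b = SharedClientBackend(), SharedClientBackend()
a.connect("conn-A")
b.connect("conn-B")
print(a.client)  # "conn-B": b's connect clobbered a's client

c, d = PerInstanceBackend(), PerInstanceBackend()
c.connect("conn-C")
d.connect("conn-D")
print(c.client)  # "conn-C": isolated per instance
```

Removing the class-level default, as the diff above does, is the first step toward giving each backend (or each operation) its own connection.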
is actually *sometimes* getting a simple boolean response. What can return such a result? The Consul HTTP API that returns a boolean response is the "delete". The ["set" code](https://github.com/celery/celery/blob/f04c2cabce883450e4304a6dfbd16514bef60e73/celery/backends/consul.py#L90) and ["delete" code](https://github.com/celery/celery/blob/f04c2cabce883450e4304a6dfbd16514bef60e73/celery/backends/consul.py#L100) both appear to be called, and the evidence is clear that the various responses are getting mixed up. For example, I see instances where the code is attempting to get the ['ID'] of a boolean!
From the way the backend code is written with inline callbacks, the only way I can see this happening is if a given listener is threaded and the session context is getting confused, but before I plunge into the deep end and propose a fix for the Consul library handling the sessions, I am looking for confirmation that my suspicions match Celery usage of its backend.
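The suspicion above, per-request state stashed on a shared object and read back after another caller has overwritten it, can be modelled deterministically. All names here are invented; this is not python-consul's real code, only the shape of the failure.

```python
import threading

class NotThreadSafeClient:
    """Toy client that stores the 'current response' on the shared instance."""
    def __init__(self):
        self._last = None

    def request(self, response, barrier):
        self._last = response  # step 1: stash this caller's response
        barrier.wait()         # the other thread runs here...
        return self._last      # step 2: read it back, possibly clobbered

client = NotThreadSafeClient()
barrier = threading.Barrier(2)
results = {}

def worker(name, response):
    results[name] = client.request(response, barrier)

t1 = threading.Thread(target=worker, args=("session_create", {"ID": "abc"}))
t2 = threading.Thread(target=worker, args=("kv_delete", True))
t1.start(); t2.start(); t1.join(); t2.join()
print(results)
```

With the barrier forcing the interleaving, both callers read whichever response was stashed last, so the session-create caller can end up holding a bare `True`: exactly the mixed-up responses described above.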
Which backend are you using for the broker, i.e. RabbitMQ, Redis, SQS, something else?
RabbitMQ.
In that case every task that gets a result must define a unique response queue, which Celery handles internally. Is Consul reading from the queue for both expired results, or just using the result.get() method? I read somewhere here recently that you have to call .get() or .ignore() on every result promise object when using the results backend, otherwise it can lead to something possibly like you are describing. It can happen if in your code you only care about the results from some tasks.
For clarity, I'm just a user. I didn't write either the Consul result backend or the Python-consul package.
That said my code does wait for and check every result.
However, I don't understand what any of that has to do with the problem I described, or the clarification I am seeking.
All I need to check is whether Celery uses multiple threads when it invokes the backend to save results. The reason for asking is that I am seeing errors which can be explained if threads are in use, because the Python-consul package is not threadsafe.
@ShaheedHaque As I already stated above, if you are using the default prefork pool, which you have said you are, then Celery is invoking multiple processes and not multiple threads.
@thedrow Apologies, I did not consider this to fit any of the templates, and specifically I did not consider this a bug report against Celery. Here is the template anyway...
- [x] I have read the relevant section in the
[contribution guide](http://docs.celeryproject.org/en/latest/contributing.html#other-bugs)
on reporting bugs.
- [x] I have checked the [issues list](https://github.com/celery/celery/issues?q=is%3Aissue+label%3A%22Issue+Type%3A+Bug+Report%22+-label%3A%22Category%3A+Documentation%22)
for similar or identical bug reports.
- [x] I have checked the [pull requests list](https://github.com/celery/celery/pulls?q=is%3Apr+label%3A%22PR+Type%3A+Bugfix%22+-label%3A%22Category%3A+Documentation%22)
for existing proposed fixes.
- [x] I have checked the [commit log](https://github.com/celery/celery/commits/master)
to find out if the bug was already fixed in the master branch.
- [x] I have included all related issues and possible duplicate issues
in this issue (If there are none, check this box anyway).
## Mandatory Debugging Information
- [x] I have included the output of ``celery -A proj report`` in the issue.
- [ ] I have verified that the issue exists against the `master` branch of Celery.
- [x] I have included the contents of ``pip freeze`` in the issue.
- [x] I have
... (truncated) ...
Why This Fix Works in Production
- Trigger: Race condition in Consul result backend when saving a result
- Mechanism: The Consul backend does not cleanly associate responses from Consul with the outbound Celery request, leading to race conditions
- If left unfixed, failures are intermittent under concurrency and hard to reproduce; they surface as sporadic tracebacks while results are saved or fetched rather than as clean errors.
Why This Breaks in Prod
- Shows up in real deployments, not just unit tests; the tracebacks in this issue come from Python 3.7 and 3.8.
- The Consul backend does not cleanly associate responses from Consul with the outbound Celery request, leading to race conditions
- Surfaces as: `Traceback (most recent call last): File "/usr/local/lib/python3.7/dist-packages/celery/app/trace.py", line 449, in trace_task …` (full stack traces under Error Message)
Proof / Evidence
- GitHub issue: #5605
- Fix PR: https://github.com/celery/celery/pull/6823
- Reproduced locally: No (not executed)
- Last verified: 2026-02-11
- Confidence: 0.70
- Did this fix it?: Yes (upstream fix exists)
- Own content ratio: 0.25
Discussion
High-signal excerpts from the issue thread (symptoms, repros, edge-cases).
“I now believe this is an issue in Consul itself. I'll leave this open to report on progress.”
“Which backend are you using for the broker, ie: RabbitMQ, Redis, SQS, something else?”
“@ShaheedHaque As I already stated above, if you are using the default prefork pool, which you have said you are, then Celery is invoking multiple…”
“OK, I believe there *is* a bug in Celery here too”
Failure Signature (Search String)
- Race condition in Consul result backend when saving a result
Error Message
-------------
Stack trace 1 (while saving a result):
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/celery/app/trace.py", line 449, in trace_task
    uuid, retval, task_request, publish_result,
  File "/usr/local/lib/python3.7/dist-packages/celery/backends/base.py", line 149, in mark_as_done
    self.store_result(task_id, result, state, request=request)
  File "/usr/local/lib/python3.7/dist-packages/celery/backends/base.py", line 342, in store_result
    request=request, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/celery/backends/base.py", line 714, in _store_result
    self.set(self.get_key_for_task(task_id), self.encode(meta))
  File "/usr/local/lib/python3.7/dist-packages/celery/backends/consul.py", line 92, in set
    ttl=self.expires)
  File "/usr/local/lib/python3.7/dist-packages/consul/base.py", line 1781, in create
    data=data)
  File "/usr/local/lib/python3.7/dist-packages/consul/std.py", line 33, in put
    self.session.request('PUT', uri, body=data, headers=JSON_HEADER)))
  File "/usr/local/lib/python3.7/dist-packages/consul/base.py", line 234, in cb
    data = data['ID']
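The last frame is the tell: the callback in `consul/base.py` expects the JSON body of a session create (a dict with an `"ID"` key) but has been handed the bare boolean that Consul's KV put/delete endpoints return. The error class is easy to reproduce directly; the session ID below is made up.

```python
# What the session-create callback expects (illustrative value):
session_response = {"ID": "3f1aab-example"}
session_id = session_response["ID"]  # fine on a dict

# What it actually received: the boolean body of a KV put/delete.
crossed_response = True
try:
    crossed_response["ID"]
except TypeError as exc:
    print(exc)  # 'bool' object is not subscriptable
```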
Stack trace 2 (while fetching a result):
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/celery/backends/consul.py", line 64, in get
    _, data = self.client.kv.get(key)
  File "/usr/local/lib/python3.8/dist-packages/python_consul-1.1.0-py3.8.egg/consul/base.py", line 553, in get
    return self.agent.http.get(
  File "/usr/local/lib/python3.8/dist-packages/python_consul-1.1.0-py3.8.egg/consul/std.py", line 21, in get
    return callback(self.response(
  File "/usr/local/lib/python3.8/dist-packages/python_consul-1.1.0-py3.8.egg/consul/base.py", line 232, in cb
    if item.get(decode) is not None:
Stack trace 3 (while storing a result, after reading existing task meta):
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/celery/app/trace.py", line 479, in trace_task
    mark_as_done(
  File "/usr/local/lib/python3.8/dist-packages/celery/backends/base.py", line 158, in mark_as_done
    self.store_result(task_id, result, state, request=request)
  File "/usr/local/lib/python3.8/dist-packages/celery/backends/base.py", line 442, in store_result
    self._store_result(task_id, result, state, traceback,
  File "/usr/local/lib/python3.8/dist-packages/celery/backends/base.py", line 853, in _store_result
    current_meta = self._get_task_meta_for(task_id)
  File "/usr/local/lib/python3.8/dist-packages/celery/backends/base.py", line 871, in _get_task_meta_for
    meta = self.get(self.get_key_for_task(task_id))
  File "/usr/local/lib/python3.8/dist-packages/celery/backends/consul.py", line 64, in get
    _, data = self.client.kv.get(key)
  File "/usr/local/lib/python3.8/dist-packages/python_consul-1.1.0-py3.8.egg/consul/base.py", line 553, in get
    return self.agent.http.get(
  File "/usr/local/lib/python3.8/dist-packages/python_consul-1.1.0-py3.8.egg/consul/std.py", line 21, in get
    return callback(self.response(
  File "/usr/local/lib/python3.8/dist-packages/python_consul-1.1.0-py3.8.egg/consul/base.py", line 232, in cb
    if item.get(decode) is not None:
Minimal Reproduction
No standalone reproduction script was posted in the issue thread; the failures surfaced under an ordinary concurrent workload (prefork pool) using the Consul result backend.
Environment
- Python: 3.7 (later tracebacks in the thread are from 3.8)
What Broke
Results may be incorrectly saved or expired due to concurrent access, causing task failures.
Why It Broke
The Consul backend does not cleanly associate responses from Consul with the outbound Celery request, leading to race conditions
Fix Options (Details)
Option A — Apply the official fix
Upgrade to a Celery release containing the fix, which makes the Consul backend correctly associate requests with responses and so removes the race condition when saving results.
Option C — Temporary workaround
Avoid sharing a single Consul client across concurrent operations, e.g. by constructing a client per operation. As the reporter put it, "(on the basis that correctness trumps performance), I'd be happy to provide a PR."
Use only if you cannot change versions today. Treat this as a stopgap and remove once upgraded.
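One shape such a workaround can take is to stop sharing one client across concurrent operations and build a fresh one per call, trading connection reuse for correct request/response pairing. This is a sketch under stated assumptions: `client_factory` stands in for whatever constructs your `consul.Consul(...)` client, and none of these names come from Celery.

```python
from typing import Any, Callable

class FreshClientKV:
    """KV helper that builds a new client for every operation.

    Slower (no connection reuse), but immune to cross-talk on a
    shared, non-thread-safe client.
    """

    def __init__(self, client_factory: Callable[[], Any]):
        # e.g. client_factory = lambda: consul.Consul(host="localhost")
        self._factory = client_factory

    def put(self, key: str, value: bytes) -> bool:
        client = self._factory()  # one-off client for this call only
        return client.kv.put(key, value)

    def get(self, key: str):
        client = self._factory()
        _, data = client.kv.get(key)
        return data
```

Because each call gets its own client, no per-request state can leak between concurrent save and fetch operations; whether the extra connection setup cost is acceptable depends on your workload.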
Fix reference: https://github.com/celery/celery/pull/6823
Last verified: 2026-02-11. Validate in your environment.
When NOT to Use This Fix
- This fix is not suitable if the application requires a single connection for performance reasons.
Verify Fix
Re-run the minimal reproduction on your broken version, then apply the fix and re-run.
Did This Fix Work in Your Case?
Quick signal helps us prioritize which fixes to verify and improve.
Prevention
- Add a stress test that runs high-concurrency workloads and fails on thread dumps / blocked locks.
- Enable watchdog dumps in prod (faulthandler, thread dump endpoint) to capture deadlocks quickly.
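The stress-test bullet can be made concrete. The harness below is generic and hypothetical (nothing here is Celery or python-consul API): it calls a request function from many threads and fails if any caller ever receives a response tagged for a different request, which is exactly the cross-talk seen in this issue.

```python
import threading

def stress_cross_talk(request_fn, n_threads=8, n_iters=200):
    """Fail if request_fn ever returns a response for another request.

    request_fn(tag) is expected to echo its tag back; a thread-safety
    bug that crosses responses between callers shows up as a mismatch.
    """
    errors = []

    def worker(tid):
        for i in range(n_iters):
            tag = (tid, i)
            got = request_fn(tag)
            if got != tag:
                errors.append((tag, got))

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    assert not errors, f"cross-talk detected: {errors[:3]}"

# A well-behaved request function passes:
stress_cross_talk(lambda tag: tag)
```

Point `request_fn` at a thin wrapper that round-trips a request ID through your real backend client; a shared, non-thread-safe client tends to fail this under load, while per-operation clients pass.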
Related Issues
No related fixes found.
Sources
We don’t republish the full GitHub discussion text. Use the links above for context.