Choosing a Python task queue library in 2026
Python’s task queue ecosystem is in a pretty good place right now. There are more serious options than there used to be, and as async web frameworks have become common, background jobs are the next thing people usually need to figure out.
This post compares the Python task queue libraries worth considering in 2026: Celery, Dramatiq, FastStream, Taskiq, and Repid. The comparison covers broker support, async behavior, benchmark results, and the places where they differ.
Small disclosure before we get into it: Repid is my project. Take that bias for what it is; the goal is still a useful comparison.
TL;DR #
If you only want a shortlist, I would start here:
| If you need… | Start with… |
|---|---|
| Mature, biggest ecosystem, but can be slower | Celery |
| Stream-processing-style message handling | FastStream |
| High-throughput, asyncio-native, ease of integration | Repid |
Task queues: what problem do they solve? #
Let’s zoom out for a moment. What is the use case?
A common one is API latency. Your request handler is fast enough until it has to wait on something slow: an LLM call, a long database query, an email provider, payment processing, or any other work that does not have to finish before the user gets a response.
So there is work that can move out of the request path. This is where a task queue helps.
The queue decouples incoming API traffic from the slow part. The API no longer has to do the work immediately. It can schedule a task, return quickly, and let a separate worker deal with the rest.
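A minimal in-process sketch of that decoupling, using asyncio.Queue as a stand-in for a real broker. The names handle_request, worker, slow_work, and queue_demo are hypothetical. A real task queue keeps the queue in a separate broker so workers can run in other processes or on other machines.

```python
import asyncio


async def slow_work(user_id: int) -> str:
    # Stand-in for an LLM call, an email provider, and so on.
    await asyncio.sleep(0.05)
    return f"done for {user_id}"


async def handle_request(queue: asyncio.Queue, user_id: int) -> str:
    # The handler only schedules the task and returns right away.
    await queue.put(user_id)
    return "accepted"


async def worker(queue: asyncio.Queue, results: list) -> None:
    # A separate consumer picks tasks up and does the slow part.
    while True:
        user_id = await queue.get()
        results.append(await slow_work(user_id))
        queue.task_done()


async def queue_demo() -> tuple:
    queue: asyncio.Queue = asyncio.Queue()
    results: list = []
    consumer = asyncio.create_task(worker(queue, results))
    response = await handle_request(queue, 123)  # returns before the work runs
    processed_at_response = len(results)         # nothing processed yet
    await queue.join()                           # wait for the worker to finish
    consumer.cancel()
    return response, processed_at_response, results
```

The handler's response does not depend on how long slow_work takes, which is the whole point of moving the work out of the request path.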
If you want to learn more about various queue types, I recommend checking out a beautiful article by Sam Rose.
Task queues under load #
Great, so the API is fast again. Problem solved? Not quite.
Under load, the worker implementation starts to matter a lot. Here is a synchronous worker model, roughly what you get with Celery or Dramatiq in their usual form:
The worker runs a set of processes that perform the tasks. Once those processes are busy, messages start piling up in the queue. The longer each task takes, the more workers and processes you need to keep up.
The relationship is direct: higher latency means you need more parallel workers. But those tasks were moved out of the API exactly because they were slow. If most of the time is spent waiting on I/O, processing each task synchronously is usually the wrong tradeoff.
Asyncio helps in that specific case. A worker can start a slow operation, await it, and do other work while it waits. Parallelism is no longer tied directly to the number of OS processes. For I/O-heavy jobs, a single core can keep thousands of operations in flight.
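A small sketch of that effect, assuming asyncio.sleep is a fair stand-in for an I/O wait. The function names io_task and run_many are made up for this example.

```python
import asyncio
import time


async def io_task(delay: float) -> None:
    # asyncio.sleep stands in for a network call that mostly waits.
    await asyncio.sleep(delay)


async def run_many(count: int, delay: float) -> float:
    started = time.perf_counter()
    # All tasks wait concurrently on one event loop, in one OS process.
    await asyncio.gather(*(io_task(delay) for _ in range(count)))
    return time.perf_counter() - started


elapsed = asyncio.run(run_many(count=1000, delay=0.1))
print(f"1000 tasks of 0.1 s took {elapsed:.2f} s")
```

Run one after another, the same 1000 tasks would need about 100 seconds. On a single event loop the total stays close to the duration of one task, plus scheduling overhead.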
CPU-bound work #
CPU-bound work is different. If the task spends most of its time executing some computation, it’s not like asyncio can magically create more CPU. Async-first libraries still need process pools for this kind of workload.
There can still be a small benefit: an async dispatcher can await process-pool work without blocking its own loop. That won't show up in synthetic benchmarks, but it can matter in mixed workloads. The main benefit of async libraries is still handling many I/O-bound tasks concurrently.
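Here is a rough sketch of an async dispatcher handing CPU-bound work to a process pool through the standard library's loop.run_in_executor. The names hash_rounds, dispatch, and main are illustrative. This is not how any of the compared libraries implement it.

```python
import asyncio
import hashlib
from concurrent.futures import Executor, ProcessPoolExecutor


def hash_rounds(rounds: int) -> str:
    # CPU-bound work: repeatedly hash a small buffer.
    digest = b"seed"
    for _ in range(rounds):
        digest = hashlib.sha256(digest).digest()
    return digest.hex()


async def dispatch(executor: Executor, rounds: int) -> str:
    loop = asyncio.get_running_loop()
    # The event loop stays free for other coroutines while the pool computes.
    return await loop.run_in_executor(executor, hash_rounds, rounds)


async def main() -> None:
    with ProcessPoolExecutor(max_workers=2) as pool:
        results = await asyncio.gather(
            *(dispatch(pool, 100_000) for _ in range(4))
        )
    print(len(results), "digests computed")
```

To try it, call asyncio.run(main()) under an if __name__ == "__main__": guard, which process pools need on platforms that spawn worker processes. Throughput is still bounded by the number of cores. The gain is only that the dispatcher can keep serving I/O-bound tasks in the meantime.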
Contenders #
The comparison uses Python libraries that support multiple brokers, on their latest versions at the time of writing:
- Celery v5.6.3
- Dramatiq v2.1.0
- FastStream v0.6.7
- Taskiq v0.12.4
- Repid v2.1.2
Smaller single-broker libraries such as huey and arq are out of scope here. So are durable execution tools such as Temporal and DBOS. They can solve some of the same problems, but they are different categories of tool. This post is about general-purpose task queue libraries.
What the code looks like #
Before looking at throughput, it is worth looking at the overall framework shape. These examples are intentionally small and use RabbitMQ. I tested them against RabbitMQ 4.3.
Celery #
# celery_app.py
from celery import Celery

app = Celery(
    "celery_app",
    broker="amqp://guest:guest@localhost:5672//",
)

@app.task
def send_email(user_id: int) -> None:
    print(f"send email to {user_id}")

# producer.py
from celery_app import send_email

send_email.delay(123)
For a local single-process smoke test, run celery -A celery_app worker --pool=solo --loglevel=info.
Celery’s standard worker control path uses RabbitMQ features that are now deprecated and disabled by default, so in my local RabbitMQ container I had to set deprecated_features.permit.transient_nonexcl_queues = true.
Dramatiq #
# dramatiq_app.py
import dramatiq
from dramatiq.brokers.rabbitmq import RabbitmqBroker

broker = RabbitmqBroker(url="amqp://guest:guest@localhost:5672/")
dramatiq.set_broker(broker)

@dramatiq.actor(queue_name="send-email")
def send_email(user_id: int) -> None:
    print(f"send email to {user_id}")

# producer.py
from dramatiq_app import send_email

send_email.send(123)
Run the worker with dramatiq dramatiq_app --processes 1 --threads 1.
FastStream #
# faststream_app.py
from faststream import FastStream
from faststream.rabbit import RabbitBroker
from pydantic import BaseModel

broker = RabbitBroker("amqp://guest:guest@localhost:5672/")
app = FastStream(broker)

class SendEmail(BaseModel):
    user_id: int

@broker.subscriber("send-email")
async def send_email(message: SendEmail) -> None:
    print(f"send email to {message.user_id}")

# producer.py
import asyncio
from faststream_app import broker

async def main() -> None:
    async with broker:
        await broker.publish({"user_id": 123}, "send-email")

asyncio.run(main())
Run the worker with faststream run faststream_app:app. The CLI needs the faststream[cli] extra.
Taskiq #
# taskiq_app.py
from taskiq_aio_pika import AioPikaBroker

broker = AioPikaBroker("amqp://guest:guest@localhost:5672/")

@broker.task
async def send_email(user_id: int) -> None:
    print(f"send email to {user_id}")

# producer.py
import asyncio
from taskiq_app import broker, send_email

async def main() -> None:
    await broker.startup()
    await send_email.kiq(123)
    await broker.shutdown()

asyncio.run(main())
Run the worker with taskiq worker taskiq_app:broker --workers 1 --max-async-tasks 1.
Repid #
# repid_app.py
from repid import AmqpServer, Repid, Router

app = Repid()
app.servers.register_server(
    "default",
    AmqpServer("amqp://guest:guest@localhost:5672"),
    is_default=True,
)

router = Router()

@router.actor(channel="send-email")
async def send_email(user_id: int) -> None:
    print(f"send email to {user_id}")

app.include_router(router)

# producer.py
import asyncio
from repid_app import app

async def main() -> None:
    async with app.servers.default.connection():
        await app.send_message_json(
            channel="send-email",
            payload={"user_id": 123},
            headers={"topic": "send_email"},
        )

asyncio.run(main())

# worker.py
import asyncio
from repid_app import app

async def main() -> None:
    async with app.servers.default.connection():
        await app.run_worker()

asyncio.run(main())
Run the worker with python worker.py.
Repid talks to RabbitMQ over AMQP 1.0, while the others use AMQP 0.9.1. It also does not try to hide topology creation: it expects the user to create the topology.
Benchmarks #
The benchmark code is available at github.com/aleksul/repid-benchmarks.
Sidenote: I also measured latency, but it did not reveal anything interesting - in these runs it mostly followed task duration. We will focus on throughput here.
To test throughput in I/O-bound scenarios, let’s model a few cases with sleeps of different lengths:
- 0.01 sec - similar to a cache retrieval
- 0.1 sec - similar to a database call
- 0.5 sec and 1 sec - similar to an API call
- 5 sec - similar to an LLM response
I/O-bound, high-concurrency benchmark #
In this run, concurrency is not capped. Each connection can process and prefetch as many tasks as it can handle.
There is a noticeable drop-off and more variance at 0.01 seconds. The likely cause is that individual processes overfetch work even though the CPU is already maxed out, so the whole system becomes less efficient.
One caveat matters here: Celery and Dramatiq are synchronous frameworks, so this benchmark uses gevent (a “green threads” library) to make them useful for I/O-bound workloads. It gives them a big throughput boost. It also relies on monkey patching, so your mileage may vary when it comes to using it in a production app.
I/O-bound, limited-concurrency benchmark #
Next, let’s cap concurrency at 2000 tasks and add the non-green-thread versions of Celery and Dramatiq.
The theoretical maximum is 8 processes * 2000 concurrency * (1 / t sec).
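Plugging the sleep durations from this benchmark into that formula gives the ceiling for each case. A quick sketch, with constant and function names chosen only for this example:

```python
PROCESSES = 8
CONCURRENCY_PER_PROCESS = 2000


def theoretical_max(task_seconds: float) -> int:
    # Each concurrency slot completes 1 / t tasks per second.
    return round(PROCESSES * CONCURRENCY_PER_PROCESS / task_seconds)


for t in (0.01, 0.1, 0.5, 1, 5):
    print(f"{t} s -> {theoretical_max(t):,} tasks/s")
```

For 1-second tasks that is 16,000 tasks per second, and for 5-second tasks it is 3,200. No library can exceed those numbers in this setup, whatever its overhead.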
The non-green-thread variants, marked as no-GT, are barely visible on this graph. The longer the task takes, the worse they look.
The asyncio libraries also beat the green-thread variants in this setup and reach higher CPU utilization. That usually means better vertical scaling before you have to add more machines.
The concurrency limit helped too. There was less contention between processes, the load spread more evenly, and the short-sleep cases had less variance. Throughput improved overall.
I/O-bound, steady-rate benchmark #
This benchmark looks at a running system without mixing in warmup and end-of-benchmark drain.
The shape is mostly the same as in the previous benchmarks, which helps to confirm prior results. Warmup and drain do not seem to be hiding a completely different story. The async-first libraries keep their advantage on shorter I/O-heavy work. At the longest task duration, most of the non-Celery results bunch together because of the concurrency limit.
CPU-bound benchmark #
For the CPU-bound benchmark, each task runs a SHA-256 hash for enough iterations to hit the target time.
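A task of that kind could look roughly like the sketch below. This is an assumption about the general shape, not the code from the benchmark repository, which may calibrate a fixed iteration count instead of checking the clock. The name cpu_task is hypothetical.

```python
import hashlib
import time


def cpu_task(target_seconds: float) -> int:
    # Keep hashing until the target duration has elapsed.
    deadline = time.perf_counter() + target_seconds
    digest = b"seed"
    iterations = 0
    while time.perf_counter() < deadline:
        digest = hashlib.sha256(digest).digest()
        iterations += 1
    return iterations
```

Unlike a sleep, this keeps one core busy for the whole duration, so running more of these tasks concurrently than there are cores does not raise throughput.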
For CPU-bound throughput, the differences are much smaller. That is expected: once the CPU is saturated, the library has less room to matter.
Interpreting the benchmarks #
Repid is the fastest in these benchmarks. FastStream and Taskiq also do well because they are asyncio-native. Dramatiq is much faster than Celery in the I/O-heavy cases, but that mostly shows up when both are using gevent. In their standard sync form, both leave a lot of I/O performance on the table.
Feature comparison #
Broker support #
| Framework/Broker | RabbitMQ | Redis | NATS | Amazon SQS | GCP Pub/Sub | Kafka |
|---|---|---|---|---|---|---|
| Celery | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ |
| Dramatiq | ✅ | ✅ | ❌ | 🟠 | ❌ | ❌ |
| FastStream | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ |
| Taskiq | ✅ | ✅ | ✅ | 🟠 | ❌ | 🟠 |
| Repid | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
✅ - supported
🟠 - community-supported
❌ - not supported
Async and sync support #
Repid, FastStream, and Taskiq are async-first, but they can still execute sync code. Of the three, Repid lets you override the ThreadPool or ProcessPool per actor. Taskiq supports this per worker.
Celery and Dramatiq are sync-first. They can run async actors, but it usually requires workarounds or does not help much because the surrounding worker code is still sync.
Dependency injection #
Repid and FastStream provide dependency injection natively. Taskiq has an extension for it. Celery and Dramatiq do not provide dependency injection.
Result backend #
Taskiq, Celery, and Dramatiq provide result backends. Repid and FastStream leave result storage to the user.
AsyncAPI #
Repid and FastStream provide native AsyncAPI integration. The others do not support it.
Parsing #
Dramatiq requires its own JSON format. Celery, Taskiq, and FastStream require their own message format, but let you override the parsers. Repid does not require a specific format.
Repid, FastStream, and Taskiq parse the message body into arguments and support Pydantic validation. Celery and Dramatiq pass arguments as defined in the sent message.
Repid can parse headers into actor arguments, and optionally validate with Pydantic.
Ack modes #
Celery is often criticized for acknowledging tasks before execution by default. You can configure late acknowledgements, but the default still surprises people. FastStream, Dramatiq, and Taskiq can have different acknowledgement behavior depending on the broker, but generally provide sane defaults. Repid abstracts this across brokers and exposes acknowledgement policies.
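To see why the default matters, here is a toy simulation of a worker that crashes on its first attempt. It uses an in-memory deque instead of a real broker, and run_with_crash is a made-up name, so treat it as an illustration of the two modes rather than any library's behavior.

```python
from collections import deque


def run_with_crash(ack_late: bool) -> list:
    """Process one message with a worker that crashes on its first attempt."""
    queue = deque(["send-email:123"])
    unacked = []          # delivered to a worker but not yet acknowledged
    processed = []
    crashed_once = False

    while queue:
        message = queue.popleft()
        if ack_late:
            unacked.append(message)   # broker keeps it until the task finishes
        # With early ack, the broker has already forgotten the message here.

        if not crashed_once:
            crashed_once = True       # simulated worker crash mid-task
            queue.extend(unacked)     # broker redelivers unacked messages
            unacked.clear()
            continue

        processed.append(message)
        unacked.clear()               # late ack happens after success

    return processed
```

With ack_late=False the message is lost and the function returns an empty list. With ack_late=True it is redelivered and processed once. The price of late acknowledgement is at-least-once delivery, so tasks should be safe to run twice.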
Retries #
At the API level, Celery provides the most retry machinery out of the box. Dramatiq and Taskiq provide some bootstrapping, but the actual retry logic should be written through middleware. Repid and FastStream keep retry policy closer to broker behavior or application code.
That is not inherently worse. Brokers often have their own retry tools, and sometimes those are the right place to define the behavior. RabbitMQ has dead-letter exchanges and delayed-message patterns. GCP Pub/Sub lets you configure retry policy per subscription. If your retry strategy depends heavily on the broker, a library-level abstraction can either help or get in the way.
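For the application-code end of that spectrum, a retry helper can be quite small. The sketch below is a generic exponential backoff with jitter. The retry function and its parameters are hypothetical and do not come from any of the libraries discussed here.

```python
import asyncio
import random


async def retry(func, *, attempts: int = 3, base_delay: float = 0.1):
    """Call func until it succeeds or attempts run out."""
    for attempt in range(attempts):
        try:
            return await func()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller or the broker handle it
            # Exponential backoff plus up to 50% random jitter.
            delay = base_delay * (2 ** attempt)
            await asyncio.sleep(delay + random.uniform(0, delay / 2))
```

An actor body could wrap a flaky call with await retry(call_provider), where call_provider is your own coroutine function. Once the attempts run out, the exception propagates, and broker-level tools such as a dead-letter exchange can take over.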
Production behavior #
The details below are RabbitMQ-oriented, because the examples and benchmarks above use RabbitMQ. Kafka offsets, SQS visibility timeouts, and Pub/Sub ack deadlines move the failure modes around, but the same questions still matter.
Broker outage and publishing #
None of these libraries is a durable local outbox. If RabbitMQ is unavailable when a producer sends a message, the framework cannot make that message durable by itself. For business-critical events, the usual answer is an application-level outbox or another durable handoff before publishing.
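A minimal sketch of the outbox idea, using sqlite3 so it runs anywhere. The table layout and the function names create_schema, place_order, and relay are assumptions for illustration. The publish callable stands in for whatever your task queue library uses to send a message.

```python
import json
import sqlite3


def create_schema(db: sqlite3.Connection) -> None:
    db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
    db.execute(
        "CREATE TABLE outbox ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT, sent INTEGER DEFAULT 0)"
    )


def place_order(db: sqlite3.Connection, order_id: int, total: float) -> None:
    # One transaction: the order and its event are committed together or not at all.
    with db:
        db.execute("INSERT INTO orders VALUES (?, ?)", (order_id, total))
        db.execute(
            "INSERT INTO outbox (payload) VALUES (?)",
            (json.dumps({"order_id": order_id}),),
        )


def relay(db: sqlite3.Connection, publish) -> int:
    # Publish pending rows; mark a row sent only after publish returns.
    rows = db.execute(
        "SELECT id, payload FROM outbox WHERE sent = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        publish(json.loads(payload))  # raises if the broker is down
        with db:
            db.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

If the broker is down, relay raises and the row stays pending for the next run. A crash between publish and the UPDATE produces a duplicate message, so consumers still need to tolerate at-least-once delivery.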
The framework still affects failure handling. Celery retries publishing by default, but with a short default policy; RabbitMQ publisher confirms should be enabled if you need to detect broker-side drops caused by resource limits. Dramatiq has RabbitMQ publisher confirms too, but confirm_delivery=True is opt-in. FastStream and Taskiq use aio_pika.connect_robust for RabbitMQ connections. Repid’s AMQP implementation has reconnect/backoff logic and recreates managed sessions and links after reconnect.
Delays and scheduling #
For delayed work, the important question is where the delay is stored.
Celery’s eta and countdown tasks are fetched by workers immediately and kept in worker memory until they are due. Celery’s docs explicitly warn against using them for distant-future scheduling and recommend database-backed scheduling for longer delays.
Dramatiq stores delayed messages in broker-side delay queues first, then moves them into worker memory until their ETA. Its own guide warns that the broker is not a database and scheduled messages should be a small subset of all messages.
Taskiq has a scheduler, and taskiq-aio-pika supports delayed publishing through either RabbitMQ’s delayed-message exchange plugin or a TTL delay queue.
Repid and FastStream do not provide a built-in scheduler. For delayed work, use either broker-specific features or a dedicated scheduler such as APScheduler or Rocketry.
Shutdown during deploys #
Rolling deploys are common in production. The practical question is whether the worker stops taking new messages, how long it waits for running work, and what happens after that timeout.
Celery has the most detailed shutdown model: warm, soft, cold, and hard shutdown. TERM starts warm shutdown, and soft shutdown can add a bounded window before cold termination. The docs also call out an ETA-task edge case: if a worker only has reserved ETA tasks, soft shutdown on idle may be needed to reduce task-loss risk.
Dramatiq stops worker threads before consumers so broker heartbeats keep running while current tasks finish. It then stops consumers and requeues in-memory messages during shutdown. The worker shutdown timeout is configurable.
FastStream’s graceful_timeout controls how long the broker waits for already consumed messages before shutdown completes.
Taskiq handles SIGTERM/SIGINT by setting a shutdown event, waits for running tasks with wait_tasks_timeout, then shuts the broker down with shutdown_timeout.
Repid exposes this as graceful_shutdown_time on run_worker. Workers can also expose health-check and AsyncAPI servers, which helps Kubernetes stop routing work before the process exits.
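The pattern behind most of these timeouts is the same: stop consuming, give running tasks a bounded window, then cancel whatever is left. Here is a generic asyncio sketch of the waiting part. The names graceful_shutdown and shutdown_demo are hypothetical, and this is not the implementation of any library above.

```python
import asyncio


async def graceful_shutdown(tasks: set, timeout: float) -> tuple:
    """Wait up to `timeout` seconds for running tasks, then cancel the rest."""
    if not tasks:
        return 0, 0
    done, pending = await asyncio.wait(tasks, timeout=timeout)
    for task in pending:
        # Past the deadline. A real worker would typically leave these
        # messages unacknowledged so the broker can redeliver them.
        task.cancel()
    # Let cancelled tasks unwind before the process exits.
    await asyncio.gather(*pending, return_exceptions=True)
    return len(done), len(pending)


async def shutdown_demo() -> tuple:
    running = {
        asyncio.create_task(asyncio.sleep(0.01)),  # finishes within the window
        asyncio.create_task(asyncio.sleep(10)),    # exceeds the window
    }
    return await graceful_shutdown(running, timeout=0.2)
```

When tuning the timeout, keep it below the termination grace period of your orchestrator, otherwise the process is killed before the window ends.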
What to monitor #
The production dashboard I would want is mostly the same for all of them: publish failures, reconnects, ready queue depth, reserved or in-worker messages, delayed work, retry counts, dead-letter volume, task duration, and shutdown timeouts.
Celery gives you more of this out of the box through its ecosystem. Dramatiq has useful middlewares, including Prometheus metrics. FastStream and Repid provide service-style pieces such as health checks and AsyncAPI. With Taskiq, expect to assemble more from broker metrics, middleware, logs, and dashboards.
Closing thoughts #
Repid fits the case where raw I/O throughput, broker coverage, AsyncAPI, and configurable acknowledgements matter most. Those are exactly the problems it was built to solve.
If your mental model is closer to stream processing than classic background jobs, FastStream is probably the more natural fit. If you want the biggest ecosystem and the most examples, Celery is still a solid choice. Treat it as a mature sync-first tool rather than expecting it to feel like a modern async framework.
I’ve tried to keep this comparison as objective as possible, but since Repid is my project, I have one small favor to ask: if it looks useful to you, consider leaving Repid a star on GitHub. It helps more people find the project and helps me stay motivated to keep working on it.