Choosing a Python task queue library in 2026

Python’s task queue ecosystem is in a pretty good place right now. There are more serious options than there used to be, and as async web frameworks have become common, background jobs are the next thing people usually need to figure out.

This post compares the Python task queue libraries worth considering in 2026: Celery, Dramatiq, FastStream, Taskiq, and Repid. The comparison covers broker support, async behavior, benchmark results, and the places where they differ.

Small disclosure before we get into it: Repid is my project. Take that bias for what it is; the goal is still a useful comparison.

TL;DR #

If you only want a shortlist, I would start here:

If you need…	Start with…
Mature, biggest ecosystem, but can be slower	Celery
Stream-processing-style message handling	FastStream
High-throughput, asyncio-native, ease of integration	Repid

Task queues: what problem do they solve? #

Let’s zoom out for a moment. What is the use case?

A common one is API latency. Your request handler is fast enough until it has to wait on something slow: an LLM call, a long database query, an email provider, payment processing, or any other work that does not have to finish before the user gets a response.

Animated diagram showing a web request blocked by a slow external API call

So there is work that can move out of the request path. This is where a task queue helps.

The queue decouples incoming API traffic from the slow part. The API no longer has to do the work immediately. It can schedule a task, return quickly, and let a separate worker deal with the rest.

Animated diagram showing a web app enqueueing work and returning quickly while a worker calls the slow API

If you want to learn more about various queue types, I recommend checking out a beautiful article by Sam Rose.

Task queues under load #

Great, so the API is fast again. Problem solved? Not quite.

Under load, the worker implementation starts to matter a lot. Here is a synchronous worker model, roughly what you get with Celery or Dramatiq in their usual form:

Animated diagram showing a synchronous process pool becoming saturated while messages pile up in the queue

The worker runs a set of processes that perform the tasks. Once those processes are busy, messages start piling up in the queue. The longer each task takes, the more workers and processes you need to keep up.

The relationship is direct: higher latency means you need more parallel workers. But those tasks were moved out of the API exactly because they were slow. If most of the time is spent waiting on I/O, processing each task synchronously is usually the wrong tradeoff.
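To put rough numbers on that relationship, Little’s law says the average number of tasks in flight equals the arrival rate multiplied by the average time each task takes. Here is a minimal sketch of that arithmetic. The 200 tasks per second and the task durations are made-up inputs for illustration, not measurements, and required_concurrency is a hypothetical helper.

```python
import math


def required_concurrency(arrival_rate_per_sec: float, task_seconds: float) -> int:
    """Little's law: tasks in flight = arrival rate * time spent per task."""
    # round() trims float noise before rounding up to whole slots.
    return math.ceil(round(arrival_rate_per_sec * task_seconds, 9))


# Hypothetical load: 200 tasks arrive every second.
for task_seconds in (0.1, 1.0, 5.0):
    slots = required_concurrency(200, task_seconds)
    print(f"{task_seconds:>4} s per task -> {slots} tasks in flight")
```

In a synchronous worker, each task in flight occupies one process or thread, so the 5-second case needs on the order of a thousand of them just to keep up.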

Asyncio helps in that specific case. A worker can start a slow operation, await it, and do other work while it waits. Parallelism is no longer tied directly to the number of OS processes. For I/O-heavy jobs, a single core can keep thousands of operations in flight.

Animated diagram showing one async worker keeping multiple external API calls in flight
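A small standard-library sketch of the same idea: asyncio.sleep stands in for an awaited network call, and the 0.2-second duration and 1000-call batch are arbitrary choices for illustration. Run one after another, these calls would take about 200 seconds; awaited concurrently on a single thread, they should finish in roughly the time of one call.

```python
import asyncio
import time


async def slow_io_call(task_id: int) -> int:
    # asyncio.sleep stands in for an awaited network call.
    await asyncio.sleep(0.2)
    return task_id


async def run_batch(count: int) -> tuple[int, float]:
    """Start `count` calls at once and report how long the whole batch took."""
    started = time.perf_counter()
    results = await asyncio.gather(*(slow_io_call(i) for i in range(count)))
    return len(results), time.perf_counter() - started


completed, elapsed = asyncio.run(run_batch(1000))
print(f"{completed} calls finished in {elapsed:.2f} s on one thread")
```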

CPU-bound work #

CPU-bound work is different. If the task spends most of its time executing some computation, it’s not like asyncio can magically create more CPU. Async-first libraries still need process pools for this kind of workload.

There can still be a small benefit: an async dispatcher can await process-pool work without blocking its own loop. That benefit won’t be visible in synthetic benchmarks, but it can matter in mixed workloads. Still, the main benefit of async libraries is handling many I/O-bound tasks concurrently.

Animated diagram showing an async dispatcher feeding CPU-heavy work to a process pool

Contenders #

The comparison uses Python libraries that support multiple brokers, on their latest versions at the time of writing:

Celery v5.6.3
Dramatiq v2.1.0
FastStream v0.6.7
Taskiq v0.12.4
Repid v2.1.2

Smaller single-broker libraries such as huey and arq are out of scope here. So are durable execution tools such as Temporal and DBOS. They can solve some of the same problems, but they are different categories of tool. This post is about general-purpose task queue libraries.

What the code looks like #

Before looking at throughput, it is worth looking at the overall framework shape. These examples are intentionally small and use RabbitMQ. I tested them against RabbitMQ 4.3.

Celery #

# celery_app.py
from celery import Celery

app = Celery(
    "celery_app", 
    broker="amqp://guest:guest@localhost:5672//",
)


@app.task
def send_email(user_id: int) -> None:
    print(f"send email to {user_id}")

# producer.py
from celery_app import send_email

send_email.delay(123)

For a local single-process smoke test, run celery -A celery_app worker --pool=solo --loglevel=info.

Celery’s standard worker control path uses RabbitMQ features that are now deprecated and disabled by default, so in my local RabbitMQ container I had to set deprecated_features.permit.transient_nonexcl_queues = true.

Dramatiq #

# dramatiq_app.py
import dramatiq
from dramatiq.brokers.rabbitmq import RabbitmqBroker

broker = RabbitmqBroker(url="amqp://guest:guest@localhost:5672/")
dramatiq.set_broker(broker)


@dramatiq.actor(queue_name="send-email")
def send_email(user_id: int) -> None:
    print(f"send email to {user_id}")

# producer.py
from dramatiq_app import send_email

send_email.send(123)

Run the worker with dramatiq dramatiq_app --processes 1 --threads 1.

FastStream #

# faststream_app.py
from faststream import FastStream
from faststream.rabbit import RabbitBroker
from pydantic import BaseModel

broker = RabbitBroker("amqp://guest:guest@localhost:5672/")
app = FastStream(broker)


class SendEmail(BaseModel):
    user_id: int


@broker.subscriber("send-email")
async def send_email(message: SendEmail) -> None:
    print(f"send email to {message.user_id}")

# producer.py
import asyncio

from faststream_app import broker


async def main() -> None:
    async with broker:
        await broker.publish({"user_id": 123}, "send-email")


asyncio.run(main())

Run the worker with faststream run faststream_app:app. The CLI needs the faststream[cli] extra.

Taskiq #

# taskiq_app.py
from taskiq_aio_pika import AioPikaBroker

broker = AioPikaBroker("amqp://guest:guest@localhost:5672/")


@broker.task
async def send_email(user_id: int) -> None:
    print(f"send email to {user_id}")

# producer.py
import asyncio

from taskiq_app import broker, send_email


async def main() -> None:
    await broker.startup()
    await send_email.kiq(123)
    await broker.shutdown()


asyncio.run(main())

Run the worker with taskiq worker taskiq_app:broker --workers 1 --max-async-tasks 1.

Repid #

# repid_app.py
from repid import AmqpServer, Repid, Router

app = Repid()
app.servers.register_server(
    "default",
    AmqpServer("amqp://guest:guest@localhost:5672"),
    is_default=True,
)

router = Router()


@router.actor(channel="send-email")
async def send_email(user_id: int) -> None:
    print(f"send email to {user_id}")


app.include_router(router)

# producer.py
import asyncio

from repid_app import app


async def main() -> None:
    async with app.servers.default.connection():
        await app.send_message_json(
            channel="send-email",
            payload={"user_id": 123},
            headers={"topic": "send_email"},
        )


asyncio.run(main())

# worker.py
import asyncio

from repid_app import app


async def main() -> None:
    async with app.servers.default.connection():
        await app.run_worker()


asyncio.run(main())

Run the worker with python worker.py.

Repid talks to RabbitMQ over AMQP 1.0, while the others use AMQP 0.9.1. It also doesn’t try to hide topology creation: it expects the user to create the topology.

Benchmarks #

All benchmarks were run against RabbitMQ 4.3 on the same machine with an 8-core/16-thread processor.

The benchmark code is available at github.com/aleksul/repid-benchmarks.

Sidenote: I also measured latency, but it did not reveal anything interesting: in these runs it mostly followed task duration. We will focus on throughput here.

To test throughput in I/O-bound scenarios, let’s model a few cases with sleeps of different lengths:

0.01 sec - similar to a cache retrieval
0.1 sec - similar to a database call
0.5 sec and 1 sec - similar to an API call
5 sec - similar to an LLM response

I/O-bound, high-concurrency benchmark #

In this run, concurrency is not capped. Each connection can process and prefetch as many tasks as it can handle.

Benchmark chart, showing throughput of each library at high concurrency.

There is a noticeable drop-off and more variance at 0.01 seconds. The likely cause is that individual processes overfetch work even though the CPU is already maxed out, so the whole system becomes less efficient.

One caveat matters here: Celery and Dramatiq are synchronous frameworks, so this benchmark runs them with gevent, a library that provides cooperative “green threads”, to make them useful for I/O-bound workloads. It gives them a big throughput boost. It also relies on monkey patching, so your mileage may vary when it comes to using it in a production app.

I/O-bound, limited-concurrency benchmark #

Next, let’s cap concurrency at 2000 tasks and add the non-green-thread versions of Celery and Dramatiq.

Benchmark chart, showing throughput of each library at 2000 concurrency.

The theoretical maximum is 8 processes * 2000 concurrency * (1 / t sec).
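Plugging the benchmark’s sleep durations into that formula gives the ceiling for each case, from 1,600,000 tasks per second at 0.01 s down to 3,200 at 5 s. The sketch below only restates the arithmetic; theoretical_max is a helper name I made up for it.

```python
PROCESSES = 8
CONCURRENCY_PER_PROCESS = 2000


def theoretical_max(task_seconds: float) -> float:
    """Upper bound on tasks per second if every slot is busy all the time."""
    return PROCESSES * CONCURRENCY_PER_PROCESS / task_seconds


for task_seconds in (0.01, 0.1, 0.5, 1, 5):
    print(f"{task_seconds:>5} s -> {theoretical_max(task_seconds):>12,.0f} tasks/s")
```

Real results land below these numbers because fetching, parsing, and acknowledging messages all cost CPU time that the formula ignores.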

The non-green-thread variants, marked as no-GT, are barely visible on this graph. The longer the task takes, the worse they look.

The asyncio libraries also beat the green-thread variants in this setup and reach higher CPU utilization. That usually means better vertical scaling before you have to add more machines.

The concurrency limit helped too. There was less contention between processes, the load spread more evenly, and the short-sleep cases had less variance. Throughput improved overall.
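For readers who have not capped concurrency in asyncio before, a semaphore is one common way to do it inside a single process. This is a generic sketch, not how any of the benchmarked libraries implement their limit; run_with_limit, handle, and the limit of 50 are illustrative.

```python
import asyncio


async def run_with_limit(durations: list[float], limit: int) -> int:
    """Run one sleeping task per duration, never more than `limit` at once."""
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def handle(duration: float) -> None:
        nonlocal in_flight, peak
        async with semaphore:  # waits here once `limit` tasks are running
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(duration)  # stand-in for awaited I/O
            in_flight -= 1

    await asyncio.gather(*(handle(d) for d in durations))
    return peak


peak = asyncio.run(run_with_limit([0.01] * 500, limit=50))
print(f"peak concurrency: {peak}")
```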

I/O-bound, steady-rate benchmark #

This benchmark looks at a running system without mixing in warmup and end-of-benchmark drain.

Benchmark chart, showing throughput of each library in a “steady” scenario.

The shape is mostly the same as in the previous benchmarks, which helps to confirm prior results. Warmup and drain do not seem to be hiding a completely different story. The async-first libraries keep their advantage on shorter I/O-heavy work. At the longest task duration, most of the non-Celery results bunch together because of the concurrency limit.

CPU-bound benchmark #

For the CPU-bound benchmark, each task runs a SHA-256 hash for enough iterations to hit the target time.
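A task of that kind can be sketched with the standard library alone. This is my simplified illustration, not the code from the benchmark repository: burn_cpu is a hypothetical name, and it checks the clock between batches of hashes rather than using a pre-calibrated iteration count.

```python
import hashlib
import time


def burn_cpu(target_seconds: float) -> tuple[int, str]:
    """Re-hash a digest with SHA-256 until roughly `target_seconds` have passed."""
    deadline = time.perf_counter() + target_seconds
    digest = b"seed"
    iterations = 0
    while time.perf_counter() < deadline:
        # Check the clock once per batch so timing overhead stays small.
        for _ in range(1000):
            digest = hashlib.sha256(digest).digest()
        iterations += 1000
    return iterations, digest.hex()


iterations, final_digest = burn_cpu(0.1)
print(f"{iterations} hashes, final digest {final_digest[:16]}...")
```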

Benchmark chart, showing throughput of each library in a cpu-bound scenario.

For CPU-bound throughput, the differences are much smaller. That is expected: once the CPU is saturated, the library has less room to matter.

Interpreting the benchmarks #

Repid is the fastest in these benchmarks. FastStream and Taskiq also do well because they are asyncio-native. Dramatiq is much faster than Celery in the I/O-heavy cases, but that mostly shows up when both are using gevent. In their standard sync form, both leave a lot of I/O performance on the table.

Feature comparison #

Broker support #

Framework/Broker	RabbitMQ	Redis	NATS	Amazon SQS	GCP Pub/Sub	Kafka
Celery	✅	✅	❌	✅	✅	✅
Dramatiq	✅	✅	❌	🟠	❌	❌
FastStream	✅	✅	✅	❌	❌	✅
Taskiq	✅	✅	✅	🟠	❌	🟠
Repid	✅	✅	✅	✅	✅	✅

✅ - supported

🟠 - community-supported

❌ - not supported

There are many more brokers than this table can fit, so the list is not exhaustive. For example, FastStream supports MQTT, Taskiq supports ZeroMQ, and if you want to build on top of Postgres, you might want to try something like PgQueuer or Procrastinate.

Async and sync support #

Repid, FastStream, and Taskiq are async-first, but they can still execute sync code. Of the three, Repid lets you override the ThreadPool or ProcessPool per actor. Taskiq supports this per worker.

Celery and Dramatiq are sync-first. They can run async actors, but it usually requires workarounds or does not help much because the surrounding worker code is still sync.

Dependency injection #

Repid and FastStream provide dependency injection natively. Taskiq has an extension for it. Celery and Dramatiq do not provide dependency injection.

Result backend #

Taskiq, Celery, and Dramatiq provide result backends. Repid and FastStream leave result storage to the user.

AsyncAPI #

Repid and FastStream provide native AsyncAPI integration. The others do not support it.

FastStream also pioneered the try-it-out feature in AsyncAPI.

Parsing #

Dramatiq requires its own JSON format. Celery and Taskiq require their own message format, but let you override the parsers. FastStream and Repid do not require a specific format.

Repid, FastStream, and Taskiq parse the message body into arguments and support Pydantic validation. Celery and Dramatiq pass arguments as defined in the sent message.

Repid can parse headers into actor arguments, and optionally validate with Pydantic.

Ack modes #

Celery is often criticized for acknowledging tasks before execution by default. You can configure late acknowledgements, but the default still surprises people. All others can have different acknowledgement behavior depending on the broker, but generally provide sane defaults. Notably, FastStream and Repid abstract this across brokers and expose acknowledgement policies.
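Why the default matters is easiest to see with a toy model. The sketch below is an in-memory stand-in, not any library’s API: ToyBroker and crash_mid_task are hypothetical names, and the only broker behavior it assumes is that delivered but unacknowledged messages return to the queue when the consumer dies.

```python
from collections import deque


class ToyBroker:
    """In-memory stand-in for a broker that redelivers unacknowledged messages."""

    def __init__(self, messages: list[str]) -> None:
        self.ready = deque(messages)
        self.unacked: set[str] = set()

    def deliver(self) -> str:
        message = self.ready.popleft()
        self.unacked.add(message)
        return message

    def ack(self, message: str) -> None:
        self.unacked.discard(message)

    def consumer_died(self) -> None:
        # Anything delivered but never acknowledged goes back to the queue.
        self.ready.extend(sorted(self.unacked))
        self.unacked.clear()


def crash_mid_task(broker: ToyBroker, ack_early: bool) -> int:
    """Take one message, crash while processing it, return messages still queued."""
    message = broker.deliver()
    if ack_early:
        broker.ack(message)  # acknowledged before the work has run
    # ... the worker process dies here, before the task finishes ...
    broker.consumer_died()
    return len(broker.ready)


lost = crash_mid_task(ToyBroker(["job-1"]), ack_early=True)
kept = crash_mid_task(ToyBroker(["job-1"]), ack_early=False)
print("early ack, left in queue:", lost)
print("late ack, left in queue:", kept)
```

With early acknowledgement the crashed task is gone; with late acknowledgement it is redelivered. The flip side is that a redelivered task may run more than once, so it should be safe to repeat.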

Retries #

At the API level, Celery provides the most retry machinery out of the box. Dramatiq and Taskiq provide some bootstrapping, but the actual retry logic should be written through middleware. Repid and FastStream keep retry policy closer to broker behavior or application code.

That is not inherently worse. Brokers often have their own retry tools, and sometimes those are the right place to define the behavior. RabbitMQ has dead-letter exchanges and delayed-message patterns. GCP Pub/Sub lets you configure retry policy per subscription. If your retry strategy depends heavily on the broker, a library-level abstraction can either help or get in the way.
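When the retry policy does live in application code, it often looks like exponential backoff with jitter around the flaky call. A minimal sketch, assuming the transient failure surfaces as ConnectionError; retry_with_backoff and flaky_call are hypothetical names, and the delays are shortened for illustration.

```python
import asyncio
import random
from collections.abc import Awaitable, Callable


async def retry_with_backoff(
    operation: Callable[[], Awaitable[str]],
    *,
    attempts: int = 5,
    base_delay: float = 0.01,
    max_delay: float = 1.0,
) -> str:
    """Call `operation` until it succeeds or `attempts` tries are used up."""
    for attempt in range(1, attempts + 1):
        try:
            return await operation()
        except ConnectionError:
            if attempt == attempts:
                raise  # out of attempts: let the caller or a dead-letter path decide
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Random jitter keeps failing workers from retrying in lockstep.
            await asyncio.sleep(random.uniform(0, delay))
    raise RuntimeError("attempts must be at least 1")


calls = 0


async def flaky_call() -> str:
    """Simulated dependency that fails twice, then succeeds."""
    global calls
    calls += 1
    if calls < 3:
        raise ConnectionError("simulated transient failure")
    return "ok"


result = asyncio.run(retry_with_backoff(flaky_call))
print(result, "after", calls, "calls")
```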

Production behavior #

The details below are RabbitMQ-oriented, because the examples and benchmarks above use RabbitMQ, and my impression is that it’s the most common choice. Kafka offsets, SQS visibility timeouts, and Pub/Sub ack deadlines can have their own quirks, but the same questions still matter.

Broker outage and publishing #

None of these libraries is a durable local outbox. If RabbitMQ is unavailable when a producer sends a message, the framework cannot make that message durable by itself. For business-critical events, the usual answer is an application-level outbox or another durable handoff before publishing.
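An application-level outbox can be sketched with sqlite3 alone. Everything here is illustrative: the table layout, place_order, relay_outbox, and the two fake publish functions are assumptions of mine, and in a real system publish would be whichever producer call your framework provides.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total_cents INTEGER)")
db.execute(
    "CREATE TABLE outbox ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, channel TEXT, payload TEXT, "
    "sent INTEGER DEFAULT 0)"
)


def place_order(order_id: int, total_cents: int) -> None:
    # One transaction: the order and its pending message commit or roll back together.
    with db:
        db.execute("INSERT INTO orders VALUES (?, ?)", (order_id, total_cents))
        db.execute(
            "INSERT INTO outbox (channel, payload) VALUES (?, ?)",
            ("send-email", json.dumps({"order_id": order_id})),
        )


def relay_outbox(publish) -> int:
    """Publish unsent rows in order; mark a row sent only after publish succeeds."""
    rows = db.execute(
        "SELECT id, channel, payload FROM outbox WHERE sent = 0 ORDER BY id"
    ).fetchall()
    delivered = 0
    for row_id, channel, payload in rows:
        try:
            publish(channel, payload)
        except ConnectionError:
            break  # broker is down: leave the remaining rows for the next pass
        with db:
            db.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
        delivered += 1
    return delivered


def broker_down(channel: str, payload: str) -> None:
    raise ConnectionError("simulated broker outage")


published: list[tuple[str, str]] = []


def broker_up(channel: str, payload: str) -> None:
    published.append((channel, payload))


place_order(1, 4200)
first_pass = relay_outbox(broker_down)  # outage: nothing is marked as sent
second_pass = relay_outbox(broker_up)   # broker is back: the row goes out
print(first_pass, second_pass, published)
```

A crash between publishing and marking the row as sent would publish the same message again on the next pass, so this gives at-least-once delivery and consumers should tolerate duplicates.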

The framework still affects failure handling. Celery retries publishing by default, but with a short default policy; RabbitMQ publisher confirms should be enabled if you need to detect broker-side drops caused by resource limits. Dramatiq has RabbitMQ publisher confirms too, but confirm_delivery=True is opt-in. FastStream and Taskiq use aio_pika.connect_robust for RabbitMQ connections. Repid’s AMQP implementation has reconnect/backoff logic and recreates managed sessions and links after reconnect.

Delays and scheduling #

For delayed work, the important question is where the delay is stored.

Celery’s eta and countdown tasks are fetched by workers immediately and kept in worker memory until they are due. Celery’s docs explicitly warn against using them for distant-future scheduling and recommend database-backed scheduling for longer delays.

Dramatiq stores delayed messages in broker-side delay queues first, then moves them into worker memory until their ETA. Its own guide warns that the broker is not a database and scheduled messages should be a small subset of all messages.

Taskiq has a scheduler, and taskiq-aio-pika supports delayed publishing through either RabbitMQ’s delayed-message exchange plugin or a TTL delay queue.

Repid and FastStream do not provide a built-in scheduler. For delayed work, either use broker-specific features, or a dedicated scheduler like APScheduler or Rocketry.

Shutdown during deploys #

Rolling deploys are commonly used in production. The practical questions are whether the worker stops taking new messages, how long it waits for running work, and what happens after that timeout.

Celery has the most detailed shutdown model: warm, soft, cold, and hard shutdown. TERM starts warm shutdown, and soft shutdown can add a bounded window before cold termination. The docs also call out an ETA-task edge case: if a worker only has reserved ETA tasks, soft shutdown on idle may be needed to reduce task-loss risk.

Dramatiq stops worker threads before consumers so broker heartbeats keep running while current tasks finish. It then stops consumers and requeues in-memory messages during shutdown. The worker shutdown timeout is configurable.

FastStream’s graceful_timeout controls how long the broker waits for already consumed messages before shutdown completes.

Taskiq handles SIGTERM/SIGINT by setting a shutdown event, waits for running tasks with wait_tasks_timeout, then shuts the broker down with shutdown_timeout.

Repid exposes this as graceful_shutdown_time on run_worker.
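The common shape behind all of these settings is: stop consuming, wait a bounded time for in-flight work, then give up on whatever is left. Here is a generic asyncio sketch of that last part. It is not taken from any of the libraries above; drain is a hypothetical helper and the durations are arbitrary.

```python
import asyncio


async def drain(running: set[asyncio.Task], timeout: float) -> tuple[int, int]:
    """Wait up to `timeout` for in-flight tasks, then cancel whatever is left."""
    if not running:
        return 0, 0
    done, pending = await asyncio.wait(running, timeout=timeout)
    for task in pending:
        task.cancel()  # a real worker would requeue or reject these messages
    # Let the cancellations finish so no task is left dangling.
    await asyncio.gather(*pending, return_exceptions=True)
    return len(done), len(pending)


async def main() -> tuple[int, int]:
    # Assume consumption of new messages has already stopped at this point.
    running = {
        asyncio.create_task(asyncio.sleep(0.05)),  # finishes inside the window
        asyncio.create_task(asyncio.sleep(0.05)),  # finishes inside the window
        asyncio.create_task(asyncio.sleep(30)),    # exceeds the window
    }
    return await drain(running, timeout=0.5)


finished, cancelled = asyncio.run(main())
print(f"finished={finished} cancelled={cancelled}")
```

Whatever the library calls its timeout, it is worth keeping it shorter than the grace period your orchestrator allows before it kills the process.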

What to monitor #

The production dashboard I would want is mostly the same for all of them: publish failures, reconnects, ready queue depth, reserved or in-worker messages, delayed work, retry counts, dead-letter volume, task duration, and shutdown timeouts.

Celery gives you the most mature ecosystem: more often than not, other SDKs provide integration examples for it. Celery also has Flower, a companion dashboard for monitoring.

FastStream and Repid have built-in healthcheck endpoints and AsyncAPI serving.

Dramatiq, FastStream and Taskiq have bundled Prometheus exporter middlewares.

Last but not least, all frameworks support OpenTelemetry. Repid provides a cookbook example on how to do it yourself; the others provide a middleware, either as part of the library or as a separate package.

Closing thoughts #

Repid fits the case where raw I/O throughput, broker coverage, AsyncAPI, and configurable acknowledgements matter most. Those are exactly the problems it was built to solve.

If your mental model is closer to stream processing than classic background jobs, FastStream is probably the more natural fit. If you want the biggest ecosystem and the most examples, Celery is still a solid choice. Treat it as a mature sync-first tool rather than expecting it to feel like a modern async framework.

I’ve tried to keep this comparison as objective as possible, but since Repid is my project, I have one small favor to ask: if it looks useful to you, consider leaving Repid a star on GitHub. It helps more people find the project and helps me stay motivated to keep working on it.