Postgres as a Queue: When You Don't Need Kafka or RabbitMQ
Build a production-grade task queue entirely in PostgreSQL using SELECT FOR UPDATE SKIP LOCKED, LISTEN/NOTIFY, exponential backoff retries, and dead-letter handling. Covers PGMQ, Graphile Worker, River, and when a dedicated broker actually earns its keep.

You Probably Don't Need a Message Broker
Every time a team adds Kafka or RabbitMQ to process background jobs, they're signing up for another cluster to operate, another protocol to learn, and another failure mode to debug at 3 AM. Meanwhile, PostgreSQL -- the database already running their application -- ships with all the primitives needed to build a reliable task queue: row-level locking with SKIP LOCKED, asynchronous notifications with LISTEN/NOTIFY, and advisory locks for coordination. For the vast majority of workloads, a Postgres queue handles 1,000 to 10,000 jobs per second on modest hardware. That covers background emails, webhook deliveries, report generation, image processing pipelines, and most async workflows you'll encounter below the scale of a large social network.
This guide walks through building a production-grade task queue entirely in PostgreSQL, from the jobs table schema to retry logic with exponential backoff, dead-letter handling, and priority scheduling. We'll also cover when Postgres queues break down and a dedicated broker actually earns its keep.
What Is a PostgreSQL Queue?
Definition: A PostgreSQL queue is a pattern where a database table acts as a job queue, using row-level locking (`SELECT FOR UPDATE SKIP LOCKED`) to allow multiple workers to dequeue tasks concurrently without conflicts. Unlike dedicated message brokers, the queue lives inside your existing database, sharing its transactional guarantees and operational tooling.
The key insight is that a queue is just a table with a status column and a way to atomically claim rows. PostgreSQL's MVCC model and locking semantics give you effectively exactly-once processing, as long as the job's status change commits in the same transaction as the work it tracks. You don't need a separate system for this -- you need one SQL statement.
The Core Primitive: SELECT FOR UPDATE SKIP LOCKED
This single SQL clause is what makes Postgres queues viable. Introduced in PostgreSQL 9.5, SKIP LOCKED tells the query to skip any rows that are currently locked by another transaction instead of waiting for them. This turns a regular table into a concurrent work queue.
```sql
-- Dequeue a single job atomically
BEGIN;

SELECT id, payload, attempt
FROM jobs
WHERE status = 'pending'
  AND scheduled_at <= NOW()
ORDER BY priority DESC, created_at ASC
LIMIT 1
FOR UPDATE SKIP LOCKED;

-- Process the job in your application...

-- Mark it done, using the id returned by the SELECT above
UPDATE jobs
SET status = 'completed', completed_at = NOW()
WHERE id = $1;

COMMIT;
```
Without SKIP LOCKED, two workers trying to dequeue simultaneously would block each other -- one waits while the other finishes. With SKIP LOCKED, each worker instantly grabs a different unlocked row. No contention, no waiting, no duplicate processing.
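To make the skip-versus-block behavior concrete, here's a tiny in-memory analogue. This is illustration only -- the real locking happens inside Postgres -- and `claimNext`, the job objects, and the locked-id set are made up for this sketch:

```javascript
// In-memory analogue of SKIP LOCKED: each "worker" claims the first
// pending job nobody else holds, instead of waiting on a locked row.
function claimNext(jobs, lockedIds) {
  for (const job of jobs) {
    if (job.status === 'pending' && !lockedIds.has(job.id)) {
      lockedIds.add(job.id); // analogous to the row lock taken by FOR UPDATE
      return job;
    }
  }
  return null; // nothing unlocked and pending: idle instead of blocking
}

const jobs = [
  { id: 1, status: 'pending' },
  { id: 2, status: 'pending' },
  { id: 3, status: 'completed' },
];
const locked = new Set();
const a = claimNext(jobs, locked); // "worker A" claims job 1
const b = claimNext(jobs, locked); // "worker B" skips the held row, claims job 2
```

Two concurrent claims land on different rows, which is exactly the property that lets you scale workers horizontally without coordination.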
Building the Jobs Table
A production jobs table needs more than id and payload. Here's a schema that handles retries, priorities, scheduling, and dead-letter tracking:
```sql
CREATE TABLE jobs (
  id           BIGSERIAL PRIMARY KEY,
  queue        TEXT NOT NULL DEFAULT 'default',
  payload      JSONB NOT NULL,
  status       TEXT NOT NULL DEFAULT 'pending'
               CHECK (status IN ('pending', 'running', 'completed', 'failed', 'dead')),
  priority     SMALLINT NOT NULL DEFAULT 0,
  attempt      SMALLINT NOT NULL DEFAULT 0,
  max_attempts SMALLINT NOT NULL DEFAULT 5,
  scheduled_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  started_at   TIMESTAMPTZ,
  completed_at TIMESTAMPTZ,
  failed_at    TIMESTAMPTZ,
  last_error   TEXT,
  created_at   TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- The critical index: only pending jobs, ordered by priority and age
CREATE INDEX idx_jobs_dequeue
ON jobs (priority DESC, created_at ASC)
WHERE status = 'pending';

-- Index for finding running jobs (heartbeat checks, stuck job recovery)
CREATE INDEX idx_jobs_running
ON jobs (started_at)
WHERE status = 'running';

-- Partition by queue name if you run multiple logical queues
-- CREATE INDEX idx_jobs_queue ON jobs (queue) WHERE status = 'pending';
```
Pro tip: The partial index on `WHERE status = 'pending'` is essential. Without it, the dequeue query scans completed and failed rows too -- and those grow without bound. With the partial index, the planner only touches rows that are actually available for processing.
The Complete Worker Loop
Here's a worker implementation that handles claiming, processing, retrying, and dead-lettering in a single transaction:
```sql
-- Worker dequeue function
CREATE OR REPLACE FUNCTION dequeue_job(target_queue TEXT DEFAULT 'default')
RETURNS TABLE (
  job_id BIGINT,
  job_payload JSONB,
  job_attempt SMALLINT
) AS $$
BEGIN
  RETURN QUERY
  UPDATE jobs
  SET status = 'running',
      started_at = NOW(),
      attempt = attempt + 1
  WHERE id = (
    SELECT id FROM jobs
    WHERE queue = target_queue
      AND status = 'pending'
      AND scheduled_at <= NOW()
    ORDER BY priority DESC, created_at ASC
    LIMIT 1
    FOR UPDATE SKIP LOCKED
  )
  RETURNING id, payload, attempt;
END;
$$ LANGUAGE plpgsql;

-- Mark job as completed
CREATE OR REPLACE FUNCTION complete_job(job_id BIGINT)
RETURNS VOID AS $$
BEGIN
  UPDATE jobs
  SET status = 'completed', completed_at = NOW()
  WHERE id = job_id;
END;
$$ LANGUAGE plpgsql;

-- Mark job as failed with retry or dead-letter
CREATE OR REPLACE FUNCTION fail_job(job_id BIGINT, error_message TEXT)
RETURNS VOID AS $$
DECLARE
  job RECORD;
BEGIN
  SELECT attempt, max_attempts INTO job FROM jobs WHERE id = job_id;

  IF job.attempt >= job.max_attempts THEN
    -- Dead-letter: exhausted all retries
    UPDATE jobs
    SET status = 'dead',
        failed_at = NOW(),
        last_error = error_message
    WHERE id = job_id;
  ELSE
    -- Retry with exponential backoff: 2^attempt seconds
    UPDATE jobs
    SET status = 'pending',
        failed_at = NOW(),
        last_error = error_message,
        scheduled_at = NOW() + (POWER(2, job.attempt) || ' seconds')::INTERVAL
    WHERE id = job_id;
  END IF;
END;
$$ LANGUAGE plpgsql;
```
Retry with Exponential Backoff
Naive retries that fire immediately after failure cause thundering herd problems. Exponential backoff spaces retries out progressively:
| Attempt | Delay | Effect |
|---|---|---|
| 1 | 2 seconds | Near-instant recovery |
| 2 | 4 seconds | Brief pause |
| 3 | 8 seconds | Moderate backoff |
| 4 | 16 seconds | Gives downstream time to recover |
| 5 | 32 seconds | Final attempt before dead-letter |
The scheduled_at column is the trick. When a job fails, we set scheduled_at to a future timestamp. The dequeue query's WHERE scheduled_at <= NOW() clause naturally skips jobs that aren't ready for retry yet. No timers, no polling loops -- just a column and a WHERE clause.
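The same schedule can be sketched in worker code. This is an illustrative helper mirroring what `fail_job` computes for `scheduled_at`, not part of the SQL above; the `cap` parameter is an addition you may or may not want:

```javascript
// Backoff schedule from the table above: 2^attempt seconds, with an
// optional cap so late retries don't stretch into hours.
function backoffSeconds(attempt, cap = 3600) {
  return Math.min(2 ** attempt, cap);
}

// The JS equivalent of fail_job's scheduled_at assignment
function nextRetryAt(now, attempt) {
  return new Date(now.getTime() + backoffSeconds(attempt) * 1000);
}

console.log([1, 2, 3, 4, 5].map((a) => backoffSeconds(a))); // [2, 4, 8, 16, 32]
```

In practice many teams also add random jitter on top of the exponential delay so a burst of simultaneous failures doesn't retry in lockstep.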
Dead-Letter Handling
After exhausting all retry attempts, jobs move to status = 'dead'. These are jobs that need human attention -- a broken API endpoint, a malformed payload, a downstream service that's permanently unavailable. Don't silently discard them.
```sql
-- Query dead-letter jobs for investigation
SELECT id, queue, payload, last_error, attempt, created_at
FROM jobs
WHERE status = 'dead'
ORDER BY failed_at DESC
LIMIT 50;

-- Requeue a dead-letter job after fixing the underlying issue
UPDATE jobs
SET status = 'pending',
    attempt = 0,
    scheduled_at = NOW(),
    last_error = NULL,
    failed_at = NULL
WHERE id = 12345;
```
In production, set up an alert when dead-letter count exceeds a threshold. A dashboard showing dead-letter trends over time catches systemic failures before they become incidents.
LISTEN/NOTIFY: Avoiding Polling
Without notifications, workers must poll the jobs table on a loop. Polling every 100ms is wasteful; polling every 5 seconds adds latency. PostgreSQL's LISTEN/NOTIFY mechanism solves this by pushing a notification when new work arrives.
```sql
-- Create a trigger that fires on new job insertion
CREATE OR REPLACE FUNCTION notify_new_job()
RETURNS TRIGGER AS $$
BEGIN
  PERFORM pg_notify('new_job', NEW.queue);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER jobs_notify
AFTER INSERT ON jobs
FOR EACH ROW
EXECUTE FUNCTION notify_new_job();
```
```javascript
// Node.js worker using LISTEN/NOTIFY
const { Client, Pool } = require('pg');

// Pooled connections for dequeue queries; a dedicated client for LISTEN
const pool = new Pool({ connectionString: process.env.DATABASE_URL });
const listener = new Client({ connectionString: process.env.DATABASE_URL });

await listener.connect();
await listener.query('LISTEN new_job');

listener.on('notification', async (msg) => {
  // New job arrived -- dequeue it (msg.payload carries the queue name)
  const result = await pool.query('SELECT * FROM dequeue_job($1)', [msg.payload]);
  if (result.rows.length > 0) {
    await processJob(result.rows[0]); // your application's job handler
  }
});

// Fallback: poll every 30 seconds in case a notification was missed
setInterval(async () => {
  const result = await pool.query('SELECT * FROM dequeue_job($1)', ['default']);
  if (result.rows.length > 0) {
    await processJob(result.rows[0]);
  }
}, 30000);
```
Watch out: `LISTEN/NOTIFY` requires a persistent connection -- it does not work through connection poolers in transaction mode (PgBouncer, Supavisor). Dedicate a single long-lived connection for listening, and use your pooled connections for dequeue queries. Notifications are also not durable -- if no listener is connected when `NOTIFY` fires, the message is lost. That's why the fallback poll is important.
Advisory Locks for Coordination
Sometimes you need to ensure that only one worker processes a specific type of job at a time -- for example, a daily report generator that should never run concurrently. PostgreSQL advisory locks provide this without touching the jobs table.
```sql
-- Try to acquire an advisory lock (non-blocking)
SELECT pg_try_advisory_lock(hashtext('daily_report_generator'));
-- Returns true if acquired, false if another worker holds it

-- The lock is released when the session disconnects or you call:
SELECT pg_advisory_unlock(hashtext('daily_report_generator'));
```
Advisory locks are lightweight and don't create any table rows or WAL entries. They exist only in shared memory. Use them for singleton workers, rate limiting, or preventing duplicate processing of the same logical entity.
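The try-lock semantics are easy to model in plain code. The `AdvisoryLocks` class below is a hypothetical in-memory analogue for illustration -- real advisory locks live in Postgres shared memory and are scoped to a session, not a process:

```javascript
// In-memory analogue of pg_try_advisory_lock: non-blocking, at most
// one holder per key. Illustration only, not a distributed lock.
class AdvisoryLocks {
  constructor() {
    this.held = new Set();
  }
  tryLock(key) {
    if (this.held.has(key)) return false; // another "session" holds it
    this.held.add(key);
    return true;
  }
  unlock(key) {
    return this.held.delete(key); // true if we actually released something
  }
}

const locks = new AdvisoryLocks();
locks.tryLock('daily_report_generator'); // first worker wins
locks.tryLock('daily_report_generator'); // second worker backs off immediately
```

The non-blocking `tryLock` shape matters: a singleton worker that fails to acquire the lock should skip this cycle and try again later, not queue up behind the holder.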
Throughput: What to Expect
Benchmarks depend heavily on hardware, payload size, and processing time, but here are realistic numbers for a single PostgreSQL instance:
| Configuration | Enqueue/sec | Dequeue/sec | Notes |
|---|---|---|---|
| Single worker, 4-core VM | 5,000 | 1,000-2,000 | Bottleneck is single-threaded dequeue |
| 4 workers, 4-core VM | 5,000 | 3,000-5,000 | Near-linear scaling with workers |
| 8 workers, 8-core VM | 10,000 | 6,000-10,000 | Diminishing returns above core count |
| Batch dequeue (10 at a time) | 10,000 | 10,000-15,000 | Reduces round trips significantly |
Pro tip: Dequeue in batches to reduce round trips. Change `LIMIT 1` to `LIMIT 10` in the dequeue query and process multiple jobs per transaction. This alone can double throughput. Keep batch sizes reasonable (10-50) to avoid holding locks too long.
Libraries and Extensions
You don't have to build from scratch. Several battle-tested tools implement Postgres queues with more features than a hand-rolled solution.
| Tool | Language | Key Feature | Maturity |
|---|---|---|---|
| PGMQ | PostgreSQL extension | SQL-native API, visibility timeout, VT extension | Production-ready (Tembo) |
| Graphile Worker | Node.js / TypeScript | Cron jobs, batch processing, plugin system | Production-ready |
| River | Go | Type-safe jobs, periodic scheduling, unique jobs | Production-ready |
| Oban | Elixir | Telemetry, pruning, web UI (Oban Pro) | Production-ready |
| good_job | Ruby | ActiveJob backend, concurrency control | Production-ready |
| Procrastinate | Python | Django integration, async support | Stable |
PGMQ: The Extension Approach
PGMQ is a PostgreSQL extension developed by Tembo that provides a clean SQL API modeled after AWS SQS. It handles visibility timeouts, message archival, and metrics out of the box.
```sql
-- Create a queue
SELECT pgmq.create('email_notifications');

-- Enqueue a message
SELECT pgmq.send('email_notifications',
  '{"to": "user@example.com", "template": "welcome"}'::JSONB
);

-- Dequeue with a 30-second visibility timeout
SELECT * FROM pgmq.read('email_notifications', 30, 1);

-- Delete after successful processing
SELECT pgmq.delete('email_notifications', msg_id);

-- Archive instead of delete (for audit trails)
SELECT pgmq.archive('email_notifications', msg_id);
```
Graphile Worker: Node.js Integration
Graphile Worker is the strongest option for Node.js and TypeScript applications. It handles job scheduling, retries, cron tasks, and batch processing with minimal configuration.
```typescript
// Define a task
import { run, Task } from 'graphile-worker';

const sendEmail: Task = async (payload, helpers) => {
  const { to, template } = payload as { to: string; template: string };
  await emailService.send(to, template); // your application's mail client
  helpers.logger.info(`Email sent to ${to}`);
};

// Start the worker
await run({
  connectionString: process.env.DATABASE_URL,
  taskList: { sendEmail },
  concurrency: 10,
});

// Enqueue from your application
import { makeWorkerUtils } from 'graphile-worker';

const utils = await makeWorkerUtils({
  connectionString: process.env.DATABASE_URL,
});

await utils.addJob('sendEmail', {
  to: 'user@example.com',
  template: 'welcome',
}, {
  maxAttempts: 5,
  runAt: new Date(Date.now() + 60_000), // delay 1 minute
});
```
Postgres Queue vs. Kafka vs. RabbitMQ
The choice isn't binary -- each tool optimizes for different things. Here's an honest comparison:
| Capability | Postgres Queue | RabbitMQ | Kafka |
|---|---|---|---|
| Throughput ceiling | 1K-10K/sec | 10K-50K/sec | 100K-1M+/sec |
| Message replay | Manual (keep completed rows) | No (consumed = gone) | Yes (log retention) |
| Fan-out (pub/sub) | LISTEN/NOTIFY (ephemeral) | Exchanges + bindings | Consumer groups |
| Ordering guarantees | Per-queue (via ORDER BY) | Per-queue | Per-partition |
| Delivery semantics | Exactly-once (transactional) | At-least-once | At-least-once (exactly-once with EOS) |
| Operational overhead | None (your existing DB) | Moderate (Erlang cluster) | High (ZooKeeper/KRaft + brokers) |
| Transactional enqueue | Native (same transaction as your data) | Requires outbox pattern | Requires outbox pattern |
| Best for | Background jobs, task queues | Routing, RPC, work queues | Event streaming, log aggregation |
The biggest advantage of a Postgres queue isn't performance -- it's transactional enqueue. When you insert an order and enqueue a "send confirmation email" job in the same transaction, either both happen or neither does. With an external broker, you need the outbox pattern or accept the risk of an order being saved without the email job being enqueued.
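Here's what that looks like in practice. The sketch below uses a stand-in client that just records the statements it receives so the pattern is visible end to end; in a real app this would be a `pg` client checked out from the pool, and the table and payload shapes are illustrative:

```javascript
// Transactional enqueue: the order row and its email job commit
// together, or not at all.
async function placeOrderWithEmailJob(client, order) {
  await client.query('BEGIN');
  try {
    await client.query(
      'INSERT INTO orders (customer_id, total) VALUES ($1, $2)',
      [order.customerId, order.total],
    );
    await client.query(
      "INSERT INTO jobs (queue, payload) VALUES ('default', $1)",
      [JSON.stringify({ type: 'order_confirmation', orderId: order.id })],
    );
    await client.query('COMMIT'); // both rows become visible atomically
  } catch (err) {
    await client.query('ROLLBACK'); // neither the order nor the job survives
    throw err;
  }
}

// Stand-in client for illustration: records every statement issued
const issued = [];
const fakeClient = {
  query: async (sql, params) => {
    issued.push(sql);
  },
};
```

With an external broker, the equivalent guarantee requires an outbox table plus a relay process; here the jobs table *is* the outbox.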
When Postgres Queues Break Down
Postgres is not the right queue for every scenario. Here's when to reach for a dedicated system:
- Fan-out / pub-sub patterns -- If multiple consumers need to independently process the same message, you need a real pub/sub system. `LISTEN/NOTIFY` is ephemeral (no persistence) and `SKIP LOCKED` gives each message to exactly one consumer. RabbitMQ fanout exchanges or Kafka consumer groups handle this natively.
- Sustained throughput above 10K jobs/sec -- PostgreSQL can burst higher, but sustained high throughput creates WAL pressure, vacuum load, and index bloat on the jobs table. Kafka handles millions of messages per second because it's an append-only log with sequential I/O.
- Message replay -- If consumers need to reprocess historical messages (rebuilding a projection, replaying events), Kafka's log retention model is purpose-built for this. Keeping completed rows in a Postgres jobs table works but doesn't scale to billions of messages.
- Cross-service communication without a shared database -- If services own their own databases (as they should in a microservice architecture), a shared jobs table creates coupling. An external broker is the right integration point between independently deployed services.
- Complex routing logic -- RabbitMQ's exchange types (topic, headers, direct, fanout) provide routing capabilities that would require application-level logic with a Postgres queue.
Production Checklist
If you're deploying a Postgres queue to production, walk through this list:
- Partial index on pending jobs -- Without it, dequeue scans the entire table including completed rows.
- Heartbeat for running jobs -- Workers crash. Add a `heartbeat_at` column and a reaper process that requeues jobs with stale heartbeats (e.g., older than 5 minutes).
- Archival strategy -- Move completed and dead jobs to a separate archive table or delete them on a schedule. A jobs table with millions of completed rows hurts vacuum performance.
- Connection management -- Dedicate a connection for `LISTEN`. Use pooled connections for dequeue. Don't mix the two.
- Monitoring -- Track queue depth (pending count), processing latency (time from creation to completion), failure rate, and dead-letter count.
- Graceful shutdown -- On SIGTERM, finish the current job before exiting. Don't abandon running jobs.
- Batch dequeue -- Process multiple jobs per transaction to reduce round trips when throughput matters.
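The heartbeat item deserves a sketch, since it's the one most often skipped. The `heartbeat_at` column and 5-minute threshold below are assumptions from the checklist, not part of the schema shown earlier:

```javascript
// Stale-job reaper sketch: requeue running jobs whose worker stopped
// heartbeating. A scheduler would run this query every minute or so.
const REQUEUE_STALE_SQL = `
  UPDATE jobs
  SET status = 'pending', started_at = NULL
  WHERE status = 'running'
    AND heartbeat_at < NOW() - INTERVAL '5 minutes'
  RETURNING id;
`;

// Pure staleness predicate, shared between the reaper and its tests
function isStale(heartbeatAt, now, thresholdMs = 5 * 60 * 1000) {
  return now.getTime() - heartbeatAt.getTime() > thresholdMs;
}
```

Note the requeued job keeps its `attempt` count, so a job that repeatedly kills its worker still dead-letters after `max_attempts` instead of crash-looping forever.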
Frequently Asked Questions
Can PostgreSQL really replace Kafka or RabbitMQ?
For background job processing and task queues, yes. PostgreSQL handles 1,000 to 10,000 jobs per second reliably, which covers the majority of web application workloads. Where PostgreSQL falls short is pub/sub fan-out, message replay, sustained throughput above 10K/sec, and cross-service messaging in microservice architectures. If your use case is "process this thing in the background," Postgres is the right default.
What is SELECT FOR UPDATE SKIP LOCKED?
SELECT FOR UPDATE SKIP LOCKED is a PostgreSQL clause that locks selected rows for the duration of the transaction while silently skipping any rows already locked by other transactions. This prevents multiple workers from claiming the same job. Without SKIP LOCKED, workers would block each other waiting for locks. It was introduced in PostgreSQL 9.5 and is the foundation of every Postgres queue implementation.
How do I handle failed jobs in a Postgres queue?
Track the attempt count and a max_attempts limit on each job. When a job fails, increment the attempt counter and reschedule it with exponential backoff (scheduled_at = NOW() + 2^attempt seconds). Once attempts are exhausted, move the job to a "dead" status for manual investigation. Never silently discard failed jobs -- dead-letter tracking is how you catch systemic failures.
What throughput can I expect from a Postgres queue?
On a single PostgreSQL instance with 4-8 cores, expect 1,000 to 10,000 dequeues per second depending on worker count and batch size. Batch dequeuing (10-50 jobs per transaction) roughly doubles throughput. This is sufficient for the vast majority of applications. If you need sustained throughput above 10K/sec, consider Kafka or a dedicated queue.
Should I use LISTEN/NOTIFY or polling?
Use both. LISTEN/NOTIFY provides near-instant wake-up when new jobs arrive, eliminating the latency of polling intervals. But notifications are ephemeral -- if no listener is connected, they're lost. Add a fallback poll (every 15-30 seconds) to catch any missed notifications. Keep the LISTEN connection separate from your pooled connections since it requires a persistent session.
What is PGMQ and should I use it?
PGMQ is a PostgreSQL extension by Tembo that provides an SQS-like SQL API for queue operations. It handles visibility timeouts, message archival, and queue metrics out of the box. Use it if you want a proven implementation without writing the queue machinery yourself. If you need deep customization (custom retry logic, priority schemes, multi-tenant queues), building on raw SKIP LOCKED gives you more control.
How does Graphile Worker compare to building my own queue?
Graphile Worker provides a complete job processing framework for Node.js: cron scheduling, batch processing, job deduplication, and a plugin system. It uses SKIP LOCKED under the hood but handles all the edge cases -- stuck job recovery, graceful shutdown, migration management. For Node.js applications, Graphile Worker is almost always better than a hand-rolled queue. You get months of battle-tested edge case handling for free.
Written by
Abhishek Patel
Infrastructure engineer with 10+ years building production systems on AWS, GCP, and bare metal. Writes practical guides on cloud architecture, containers, networking, and Linux for developers who want to understand how things actually work under the hood.