I wasn't planning to write about payments. I was reading about Solid Queue — the database-backed job queue that ships as the default in Rails 8 — and it struck me as an almost perfect lens for a handful of JavaScript fundamentals that are easy to say you know and easy to get wrong. The whole pattern's correctness rests on timing, and in a worker that moves money, timing mistakes don't throw exceptions — they charge your customers twice.
So this is a tour of those fundamentals, built around one concrete, expensive bug. We'll cover, in order:
await doesn't mean what a naive worker assumes;
Solid Queue is the background-job system
37signals built for Basecamp and HEY, and that Rails 8 adopted as its default,
replacing Redis. Its premise is almost cheeky: you don't need a message broker at
all. Your database is the queue. Jobs are just rows in a table; workers poll
that table, claim a batch, run it, and mark it done. The lock that keeps two
workers off the same job is one SQL feature — FOR UPDATE SKIP LOCKED.
People reach for this on purpose, and for good reasons: no extra infrastructure to deploy and babysit, and — because the queue lives in the same database as your business data — you can enqueue a job in the same transaction that writes the data, with no "dual write" that can leave the two out of sync.
The traditional alternative is a dedicated broker — BullMQ on Redis, RabbitMQ, SQS, Kafka — which delivers each job to a worker and handles concurrency, locking and retries for you. That's the crucial difference, and the reason this article exists: a broker hands you delivery semantics; a database does not. The moment you choose "database as a queue," you inherit the two things the broker used to do for free — making sure a job runs once, and making sure two workers never grab the same one.
One clarification before we go further, because it matters: Solid Queue itself
is Ruby — it's a Rails library, you won't npm install it. But the pattern it
popularized isn't Rails-specific; it's just SQL. Node has its own production-grade
libraries built on the exact same FOR UPDATE SKIP LOCKED idea —
pg-boss and
Graphile Worker, both Postgres-backed job
queues — and if you want a broker instead, BullMQ
is the Redis one. In real life you'd reach for one of those. Here we deliberately
build the pattern by hand in NestJS and Drizzle, because the whole point is to
see the two correctness problems a library would otherwise handle for you, and
exactly where a misunderstanding of the event loop turns them into double charges.
Because in Node, whether you get those two things right comes down almost entirely to how well you understand the event loop. So let's start there.
Both bugs below are timing bugs, so it pays to get three words straight first:
macrotask, microtask, and process.nextTick.
Node runs your code in a loop. Each turn it runs a chunk of JavaScript, then —
before moving on — drains two queues in a fixed order: first everything queued
with process.nextTick, then the microtask queue (callbacks from
Promise.then and the continuations that resume after an await). Only then
does it pick up the next macrotask — a setTimeout / setInterval callback
or an I/O event.
console.log('1: sync');
setTimeout(() => console.log('5: timer — macrotask'), 0);
Promise.resolve().then(() => console.log('4: promise — microtask'));
process.nextTick(() => console.log('3: nextTick'));
console.log('2: sync');
It runs top to bottom only for the two plain console.logs; everything else is
deferred and comes back in queue order. The console prints, one line at a time:
1: sync
2: sync
3: nextTick
4: promise — microtask
5: timer — macrotask
Read it top to bottom: the order things run (1, 2, 3, 4, 5) is nothing like the
order you wrote them (1, 5, 4, 3, 2). Sync code first, then the process.nextTick
queue, then microtasks (promises / await), and — dead last — the timer.
There's a call stack that runs your synchronous code right now, and three queues that hold deferred work. The event loop only looks at them when the stack is empty, and it drains them in a strict priority:
%%{init: {'theme':'dark'}}%%
flowchart TD
Stack["Call Stack — runs sync code now"]
Stack -. "schedule work" .-> Q
subgraph Q["Deferred work, drained in this priority"]
direction TB
NTQ["1 · process.nextTick queue"]
MQ["2 · microtask queue (Promise.then / await)"]
MacroQ["3 · macrotask queue (setTimeout / setInterval / I/O)"]
NTQ --> MQ --> MacroQ
end
Q -. "event loop (stack empty)" .-> Stack
Every turn the loop empties the stack, then drains the whole nextTick queue,
then the whole microtask queue (re-checking nextTick in between, because
microtasks can schedule more), and only then runs exactly one macrotask —
after which it drains the micro-queues again. One timer per turn; all the promises
in between.
Flat examples are easy. Real code nests — a setTimeout that resolves a promise,
a promise callback that schedules a nextTick. Now the order is not obvious. Try
to predict this before reading on:
console.log('A');
setTimeout(() => {
  console.log('B');
  Promise.resolve().then(() => console.log('C')); // microtask, born inside a macrotask
}, 0);
Promise.resolve().then(() => {
  console.log('D');
  setTimeout(() => console.log('E'), 0); // macrotask, born inside a microtask
  process.nextTick(() => console.log('F')); // nextTick, born inside a microtask
});
process.nextTick(() => console.log('G'));
console.log('H');
Node prints:
A
H
G
D
F
B
C
E
Queue by queue:
%%{init: {'theme':'dark'}}%%
flowchart LR
A["A · sync"] --> H["H · sync"]
H --> G["G · nextTick"]
G --> D["D · microtask"]
D --> F["F · nextTick (born inside D)"]
F --> B["B · macrotask 1"]
B --> C["C · microtask (born inside B)"]
C --> E["E · macrotask 2"]
A/H are synchronous. Before any timer, Node drains nextTick (G) then
microtasks (D) — and because D itself queued a nextTick (F), that F also
runs before any macrotask. Only now does the first timer fire (B), whose promise
(C) is drained before the second timer (E). That D → F → B ordering — a
macrotask stuck waiting behind a microtask's nextTick — is precisely the kind of
thing that's invisible in review and obvious only once it has double-charged
someone.
Here's the one consequence that matters: await does not pause a timer. When
your interval callback hits await gateway.charge(), the callback suspends and
hands control back to the event loop; its continuation is queued as a microtask
only once the charge promise settles. The interval, meanwhile, is a macrotask on
its own clock. The next tick fires on schedule whether or not the previous one
finished awaiting:
%%{init: {'theme':'dark'}}%%
sequenceDiagram
participant EL as Event loop timer
participant A as Interval tick 1
participant B as Interval tick 2
EL->>A: fire at t=0
Note over A: await charge() suspends, continuation parked as a microtask
EL->>B: fire at t=1s, timer does not wait for tick 1
Note over B: selects the SAME pending rows
Note over A,B: both call charge() on one payment, so it is charged twice
So a setInterval / @Interval whose body awaits anything slow is, by
construction, allowed to run on top of itself. Now watch what that does to a
payment worker.
Here's the worker, written the idiomatic NestJS way — a scheduled method from
@nestjs/schedule. It uses an ORM, it's typed, it passes code review, and it
double-charges customers in production:
@Injectable()
export class PaymentWorker {
  // @nestjs/schedule: the idiomatic way to poll on a timer in Nest.
  @Interval(1000)
  async processPending() {
    const jobs = await this.db
      .select()
      .from(payments)
      .where(eq(payments.status, 'pending'));
    for (const job of jobs) {
      await this.gateway.charge(job);
      await this.db
        .update(payments)
        .set({ status: 'done' })
        .where(eq(payments.id, job.id));
    }
  }
}
Nothing about it screams "bug" — and that's the point. The failure isn't in the
part you're looking at. It's in when @Interval fires (the timer we just
watched overlap itself) and in the gap between the select and the update.
(@Cron('* * * * * *') and a raw setInterval(async () => …, 1000) are the same
thing in different clothes.) This isn't a story about careless people — it's about
the traps that catch the ones who chose "database as a queue" on purpose and
wrote the most natural code for it. There are exactly two, and the lifecycle below
shows where each one strikes.
Every payment is one row whose status moves through a small state machine. The
label on each arrow is the function that drives the transition:
%%{init: {'theme':'dark'}}%%
stateDiagram-v2
[*] --> pending: POST /payments — insertPending()
pending --> processing: claimBatch() — FOR UPDATE SKIP LOCKED
processing --> done: charge() ok — markDone()
processing --> pending: charge() fails — scheduleRetry() (expBackoff)
processing --> failed: attempts exhausted — markFailed()
done --> [*]
failed --> [*]
Keep this picture in mind — the rest of the article is just making each arrow correct under concurrency.
First, a step back: why is there a worker at all? Why not charge the customer
directly when they hit POST /checkout?
Because the payment gateway is slow and unreliable. It takes seconds, it times
out, it fails intermittently, and your own process can crash mid-call. If you
await gateway.charge() inside the request, a timeout can fire after the
charge already went through, and you have no idea whether you charged or not.
There's no safe place to retry, and a burst of checkouts becomes a burst of
gateway calls.
So you split intent from execution — the outbox pattern. The request only
persists a pending row, atomically, and returns:
@Post()
@HttpCode(202) // accepted: queued, not charged yet
async create(@Body() dto: CreatePaymentDto): Promise<PaymentResponseDto> {
  const payment = await this.service.createPayment(dto);
  return PaymentResponseDto.fromEntity(payment);
}
Through the layers, the write path is short and finishes inside the request (one lookup, at most one insert), with no gateway call in sight:
%%{init: {'theme':'dark'}}%%
sequenceDiagram
autonumber
actor Client
participant Ctrl as PaymentsController
participant Svc as PaymentsService
participant Repo as PaymentsRepository
participant DB as Postgres
Client->>Ctrl: POST /payments
Ctrl->>Svc: createPayment(dto)
Svc->>Repo: findByIdempotencyKey(key)
Repo->>DB: SELECT by idempotency_key
alt key already used
DB-->>Svc: existing payment (no duplicate)
else new payment
Svc->>Repo: insertPending(data)
Repo->>DB: INSERT status = pending
end
Ctrl-->>Client: 202 Accepted (queued, not charged)
If that commit succeeds, the charge will happen — reliably, out of band, by the worker. Which brings us back to the worker, and its two bugs.
@Interval(1000) — like the setInterval it wraps — fires processPending
every second regardless of whether the previous run finished. (@Cron is no
better: it fires on the schedule, not when your handler is done.) Timers are
macrotasks; the awaits inside your handler schedule microtasks — none of that
holds back the next tick.
So if a batch takes longer than the interval — and charging a batch of payments,
each a network round-trip to the gateway, easily blows past one second — the next
tick fires while the previous one is still running. Now two runs are processing
the same pending rows at the same time. Each calls charge(). The customer
pays twice.
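The overlap is easy to reproduce without a database or a gateway. The sketch below is an illustration only: `sleep` and `measureOverlap` are hypothetical helpers, and the sleep stands in for a slow `charge()` call. It starts an interval whose async body takes longer than the interval and records how many runs were ever in flight at once.

```typescript
// Stand-in for a slow gateway call: resolves after `ms` milliseconds.
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Runs an interval whose async body outlasts the interval itself and
// returns the highest number of runs that were in flight at the same time.
async function measureOverlap(
  intervalMs: number,
  workMs: number,
  ticks: number,
): Promise<number> {
  let active = 0;
  let maxActive = 0;
  let started = 0;

  await new Promise<void>((done) => {
    const timer = setInterval(async () => {
      started += 1;
      if (started === ticks) clearInterval(timer); // no further ticks
      active += 1;
      maxActive = Math.max(maxActive, active);
      await sleep(workMs); // the interval does not wait for this
      active -= 1;
      if (started === ticks && active === 0) done();
    }, intervalMs);
  });

  return maxActive;
}

measureOverlap(10, 50, 5).then((max) => {
  console.log(`max concurrent runs: ${max}`);
});
```

Any result above 1 means two runs of the same handler were alive together, which is the condition under which both would select the same pending rows.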
The fix is to never schedule the next run until the current one has finished —
setTimeout recursion instead of setInterval:
private async tick(): Promise<void> {
  if (this.stopping || this.running || !this.task) return;
  this.running = true;
  const startedAt = Date.now();
  this.inFlight = (async () => {
    try {
      await this.task!();
    } catch (err) {
      this.logger.error('scheduled task failed', err as Error);
    }
  })();
  await this.inFlight;
  this.running = false;
  const elapsed = Date.now() - startedAt;
  // schedule the NEXT run only after this one settled, minus elapsed time so
  // the cadence doesn't drift.
  this.scheduleNext(Math.max(0, this.intervalMs - elapsed));
}
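The tick() method above leans on pieces that aren't shown: something has to arm the timer, and something has to stop it. Here is a minimal, self-contained sketch of how those pieces could fit together. The class name `NonOverlappingScheduler` and the bodies of `start`, `scheduleNext` and `stop` are assumptions for illustration, not the project's actual code.

```typescript
type Task = () => Promise<void>;

// Hypothetical minimal scheduler: at most one timer armed, at most one run alive.
class NonOverlappingScheduler {
  private timer: ReturnType<typeof setTimeout> | undefined;
  private inFlight: Promise<void> = Promise.resolve();
  private stopping = false;

  constructor(
    private readonly task: Task,
    private readonly intervalMs: number,
  ) {}

  start(): void {
    this.stopping = false;
    this.scheduleNext(0);
  }

  // Arms exactly one timer. It is only called after the previous run settled.
  private scheduleNext(delayMs: number): void {
    if (this.stopping) return;
    this.timer = setTimeout(() => {
      this.inFlight = this.tick();
    }, delayMs);
  }

  private async tick(): Promise<void> {
    const startedAt = Date.now();
    try {
      await this.task();
    } catch (err) {
      console.error('scheduled task failed', err);
    }
    const elapsed = Date.now() - startedAt;
    this.scheduleNext(Math.max(0, this.intervalMs - elapsed));
  }

  // Cancels the pending timer, then waits for the run in progress (if any).
  async stop(): Promise<void> {
    this.stopping = true;
    if (this.timer !== undefined) clearTimeout(this.timer);
    await this.inFlight;
  }
}
```

Because the next timer is armed from inside tick(), after the task has settled, a task that takes three seconds on a one-second interval simply runs back to back instead of stacking up.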
No overlap, no drift. But that alone is not enough — because the moment you run more than one worker instance (and you will, for throughput and availability), they'll still grab the same rows. The real fix is in the claim.
select then update is not atomic
Reading the pending rows and then marking them processing are two separate
statements. Between them, another worker runs the same select and gets the same
rows. The ORM happily handed you two statements; it never promised they'd be
atomic.
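The gap can be modelled in memory. This sketch is a simulation, not database code: the array stands in for the payments table, `roundTrip` stands in for network latency, and all names are hypothetical. The naive claim reads, waits, then writes; the atomic claim reads and writes in one uninterrupted step, which is the role a single locking statement plays in Postgres.

```typescript
type Status = 'pending' | 'processing';
interface Row { id: number; status: Status }

// Yields to the event loop, standing in for a database round-trip.
const roundTrip = (): Promise<void> => new Promise((r) => setTimeout(r, 0));

const makeTable = (n: number): Row[] =>
  Array.from({ length: n }, (_, i) => ({ id: i + 1, status: 'pending' as Status }));

// Naive claim: "select", wait for the result, then "update".
async function naiveClaim(table: Row[]): Promise<number[]> {
  const picked = table.filter((r) => r.status === 'pending'); // statement 1
  await roundTrip(); // another worker can run its own select here
  for (const r of picked) r.status = 'processing'; // statement 2
  return picked.map((r) => r.id);
}

// Atomic claim: pick and mark in one step with no await in between.
async function atomicClaim(table: Row[], limit: number): Promise<number[]> {
  await roundTrip(); // latency before the statement runs
  const picked = table.filter((r) => r.status === 'pending').slice(0, limit);
  for (const r of picked) r.status = 'processing';
  return picked.map((r) => r.id);
}

// Counts ids that ended up claimed by both workers.
const overlap = (a: number[], b: number[]): number =>
  a.filter((id) => b.includes(id)).length;

async function compareClaims(): Promise<{ naive: number; atomic: number }> {
  const t1 = makeTable(4);
  const [n1, n2] = await Promise.all([naiveClaim(t1), naiveClaim(t1)]);
  const t2 = makeTable(4);
  const [a1, a2] = await Promise.all([atomicClaim(t2, 2), atomicClaim(t2, 2)]);
  return { naive: overlap(n1, n2), atomic: overlap(a1, a2) };
}

compareClaims().then((r) => console.log(r)); // { naive: 4, atomic: 0 }
```

In the simulation the atomic step is safe only because JavaScript runs it without interruption; across separate processes that guarantee has to come from the database, which is where row locks come in.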
What you want is to claim rows: select and lock them in a single statement,
skipping any row another worker already holds. That's exactly what Postgres'
FOR UPDATE SKIP LOCKED does — and it's the line that makes this whole thing
work:
UPDATE payments
SET status = 'processing', updated_at = now()
WHERE id IN (
  SELECT id FROM payments
  WHERE status = 'pending' AND next_run_at <= now()
  ORDER BY created_at
  FOR UPDATE SKIP LOCKED -- lock these rows, skip ones already locked
  LIMIT $1
)
RETURNING *;
One statement. Atomic. Run ten workers and each gets a different batch — no double-processing, no blocking each other waiting on locks. This is the heart of "Postgres as a queue".
%%{init: {'theme':'dark'}}%%
flowchart LR
subgraph T["payments (status = pending)"]
r1["row 1"]
r2["row 2"]
r3["row 3"]
r4["row 4"]
end
A["Worker A — claimBatch()"] -->|locks| r1
A -->|locks| r2
B["Worker B — claimBatch()"] -->|SKIP LOCKED → skips r1,r2| r3
B -->|locks| r4
Worker B doesn't wait on the rows A already holds — it skips them and takes the next free ones. Add a third worker and it just takes rows 5–6. That's horizontal scale with zero coordination code.
Here's the catch the title hinted at: the everyday ORM query-builder calls won't
write this statement for you. Prisma's updateMany returns a count, not the rows,
so it can't claim-and-fetch in one go. Drizzle is closer (its Postgres select
builder can add a row-locking clause), but the single UPDATE ... RETURNING wrapped
around a locking subquery is still simplest to express as raw SQL. In the
repository it lives behind one method:
async claimBatch(limit: number): Promise<Payment[]> {
  const result = await this.db.execute(sql`
    UPDATE payments SET status = 'processing', updated_at = now()
    WHERE id IN (
      SELECT id FROM payments
      WHERE status = 'pending' AND next_run_at <= now()
      ORDER BY created_at
      FOR UPDATE SKIP LOCKED
      LIMIT ${limit}
    )
    RETURNING *;
  `);
  return (result.rows as Record<string, unknown>[]).map((r) => this.mapRow(r));
}
The ORM is great for the other 95% of your data access. For the queue claim, you need to know the SQL it can't write for you.
Putting the scheduler, the claim and the gateway together, one tick looks like this — every box is a real function from the project:
%%{init: {'theme':'dark'}}%%
flowchart TD
A["ReliableScheduler.tick()"] --> B["PaymentProcessorService.processBatch()"]
B --> C["repo.claimBatch(limit) — FOR UPDATE SKIP LOCKED"]
C --> D{rows claimed?}
D -- no --> Z["scheduleNext() — setTimeout, no overlap"]
D -- yes --> E["runWithConcurrency(limit, rows, processOne)"]
E --> F["processOne(p) → gateway.charge(p)"]
F --> G{charge ok?}
G -- yes --> H["repo.markDone(id, chargeId)"]
G -- no --> I{attempts < maxAttempts?}
I -- yes --> J["repo.scheduleRetry(id, expBackoff(attempts))"]
I -- no --> K["repo.markFailed(id)"]
H --> Z
J --> Z
K --> Z
Z --> A
The loop closes on itself through scheduleNext() — never through setInterval
— so a slow batch delays the next tick instead of stacking a second one on top
of it.
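The flowchart names runWithConcurrency(limit, rows, processOne) without showing it. One plausible shape for such a helper is a fixed set of "lanes" that each pull the next item until the batch is empty. The body below is an assumption about how it might be written, not the project's implementation.

```typescript
// Runs `worker` over `items`, keeping at most `limit` calls in flight.
async function runWithConcurrency<T>(
  limit: number,
  items: readonly T[],
  worker: (item: T) => Promise<void>,
): Promise<void> {
  let next = 0; // index of the next unclaimed item, shared by all lanes

  // Each lane pulls items until none are left. Taking `next++` needs no lock
  // because the read-and-increment happens in one synchronous step.
  const lane = async (): Promise<void> => {
    while (next < items.length) {
      const item = items[next++] as T;
      await worker(item);
    }
  };

  const lanes = Array.from({ length: Math.min(limit, items.length) }, () => lane());
  await Promise.all(lanes);
}
```

This version assumes the worker handles its own failures, as processOne does in the flowchart by routing errors to a retry or a failed state. If a worker call rejected, Promise.all would reject while the other lanes kept running.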
Atomic claim and a non-overlapping scheduler get you correctness. Production needs three more things, and they're cheap once the foundation is right.
Idempotency. Even with all of the above, you want charging to be safe to
repeat — a crash after charge() but before markDone() should not charge
again. So the gateway is idempotent on a key, and POST /payments dedups on it:
async createPayment(dto: CreatePaymentDto): Promise<Payment> {
  const existing = await this.repo.findByIdempotencyKey(dto.idempotencyKey);
  if (existing) return existing; // same key never creates a second charge
  return this.repo.insertPending({ /* … */ });
}
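To make "idempotent on a key" concrete, here is an in-memory stand-in for such a gateway. `FakeGateway`, its fields and the `ch_` id format are all hypothetical; a real gateway does this bookkeeping on its own side.

```typescript
interface ChargeRequest { idempotencyKey: string; amount: number }
interface ChargeResult { chargeId: string }

// In-memory stand-in for a gateway that is idempotent on a key.
class FakeGateway {
  private readonly byKey = new Map<string, ChargeResult>();
  public chargesMade = 0;

  async charge(req: ChargeRequest): Promise<ChargeResult> {
    const seen = this.byKey.get(req.idempotencyKey);
    if (seen) return seen; // replay: same result, no new money movement
    this.chargesMade += 1;
    const result: ChargeResult = { chargeId: `ch_${this.chargesMade}` };
    this.byKey.set(req.idempotencyKey, result);
    return result;
  }
}
```

With a gateway like this, a worker that crashes after charge() and before markDone() can safely call charge() again on restart: the second call returns the first call's result. Note that createPayment above is itself a read followed by a write, so in a real schema a unique constraint on the idempotency key column would be the usual backstop against two simultaneous requests both passing the lookup.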
Retry with backoff. Transient gateway failures shouldn't be terminal. On
error, increment attempts and reschedule into the future; give up only after
maxAttempts:
const nextRunAt = new Date(Date.now() + expBackoff(p.attempts));
await this.repo.scheduleRetry(p.id, message, nextRunAt);
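The expBackoff helper isn't shown. A common shape is a delay that doubles per attempt up to a ceiling; the version below is a sketch under that assumption, and the 1-second base and 60-second cap are illustrative values, not the project's settings.

```typescript
// Delay in milliseconds before the next retry: base * 2^attempts, capped.
function expBackoff(attempts: number, baseMs = 1000, capMs = 60000): number {
  const exponent = Math.max(0, attempts);
  return Math.min(capMs, baseMs * 2 ** exponent);
}

// The arrow wrapper matters: passing expBackoff directly to map would feed
// the array index in as baseMs.
const delays = [0, 1, 2, 3, 10].map((n) => expBackoff(n));
console.log(delays); // values: 1000, 2000, 4000, 8000, 60000
```

The cap keeps a job with many failures from being pushed hours into the future. Adding a small random jitter to each delay is a frequent refinement when many jobs fail at the same moment, since it spreads their retries out.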
Graceful shutdown. On SIGTERM, stop pulling new work and let the in-flight
batch finish — don't kill a charge halfway:
async onApplicationShutdown(signal?: string): Promise<void> {
  await this.scheduler.stop(); // clearTimeout + await the in-flight run
}
(That last one only works if you call app.enableShutdownHooks() in main.ts —
easy to forget, and then your "graceful" shutdown isn't.)
"Use your database as a queue" sounds like the kind of shortcut you'd be embarrassed to admit in an interview. It isn't. The pattern is old, boring, and runs at serious scale — and the people who used to argue against it are the ones building it now.
It's the default in Rails 8. Solid Queue,
built by 37signals (Basecamp, HEY), keeps jobs in your SQL database and takes
Redis out of the stack. On their own engineering blog they report running
millions of jobs a day on HEY through it, using exactly
SELECT ... FOR UPDATE SKIP LOCKED to "fetch and lock jobs without locking other
workers." A framework that tens of thousands of companies ship on made this the
recommended way to run background jobs.
(Introducing Solid Queue ·
Rails 8: No PaaS Required)
A payments company ran payment processing on it. GoCardless
— bank-to-bank payments, an API in Stripe's space — built, open-sourced and ran
in production Que, a Postgres-backed job queue. Their own repo states they
used it internally, their engineers gave conference talks on "Postgres in
Production at GoCardless," and one of them documented running the critical
daily pipeline that batches payments for submission to the banks — around
200,000 payments on peak days — on top of it.
(gocardless/que ·
a GoCardless engineer's account)
(Que predates SKIP LOCKED and uses Postgres advisory locks — a sibling
primitive; the same "the database is the queue" idea.)
The skeptics became the maintainers. Brandur Leach (ex-Stripe) had
publicly argued against Postgres queues — then built
River, a Postgres queue for Go that handles tens
of thousands of jobs per second with full ACID guarantees. What changed his
mind was one feature: SKIP LOCKED, added in Postgres 9.5 (2016). Whatever
objection you're feeling is, almost word for word, the one he retracted.
The whole category is this pattern — Solid Queue (Rails), River (Go), Oban (Elixir), Que (Ruby), pg-boss and graphile-worker (Node) — and the lock mode is documented right in the PostgreSQL manual. You're not being clever. You're using a primitive boring enough that a framework made it the default and a payments company trusted it with money.
Everything above is the pattern from the inside, so you can see exactly where the event loop bites. But you already saw the list of names — Solid Queue, River, Oban, pg-boss, graphile-worker. Those exist precisely so you don't write the scheduler, the claim, the backoff and the shutdown drain yourself. In Node the database-backed options are pg-boss and Graphile Worker; the Redis broker is BullMQ. For anything that moves money, reach for one of them.
Here's the same service on pg-boss. The HTTP handler still only persists the intent — it just hands the job to the library instead of leaving a worker to poll for it:
// PgBossEnqueuer — called right after the pending row is written.
// singletonKey makes the enqueue itself idempotent.
await this.boss.send('charge-payment', { paymentId }, { singletonKey: paymentId });
And the worker stops being a scheduler at all. You declare the queue's retry
policy once, then register a handler — pg-boss claims jobs with SKIP LOCKED,
runs them N at a time, and reschedules failures with backoff:
await boss.createQueue('charge-payment', {
  retryLimit: 5, retryDelay: 1, retryBackoff: true, // retries + exponential backoff, for free
});
await boss.work('charge-payment', { localConcurrency: 5 }, async ([job]) => {
  const payment = await repo.findById(job.data.paymentId);
  if (!payment || payment.status === 'done') return; // idempotent: never charge twice
  const { chargeId } = await gateway.charge(payment); // throw -> pg-boss retries it
  await repo.markDone(payment.id, chargeId);
});
Notice what's gone: no setInterval, no setTimeout recursion, no "is the
previous tick still running?". There's no timer to overlap, so Bug #1 can't
happen; pg-boss does the atomic claim, so Bug #2 can't happen. Both bugs
that cost real money were consequences of hand-writing the loop — so the surest
fix is to not hand-write it.
The repo runs this exact path under WORKER_MODE=pgboss, alongside the
hand-rolled reliable worker and the buggy naive one, so you can switch among
all three and compare:
docker run --rm -p 3000:3000 -e WORKER_MODE=pgboss payments-queue
So why build it by hand at all? Because "just use pg-boss" is only reassuring if
you know what it's doing for you. Now you do: it's the SKIP LOCKED claim and
the no-overlap scheduling from the sections above, wrapped in a library that
already got the event-loop details right.
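No Redis, no RabbitMQ, no SQS. The queue is just a table, and SKIP LOCKED gives
it real queue semantics. The payoff is the one from the start of the article: no
extra infrastructure to deploy and babysit, and jobs enqueued in the same
transaction that writes the business data.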
You'd reach for a dedicated broker when you outgrow it: very high throughput (Kafka), fan-out / pub-sub (RabbitMQ), or managed cross-service queues (SQS). For a payment queue at most companies' scale, Postgres is plenty.
The repo ships an all-in-one Docker image — Postgres, Node, the app and the
worker in one container — so every step below is reproducible with curl.
(Prefer clicking? Swagger UI is at http://localhost:3000/docs.)
docker build -t payments-queue .
docker run --rm -p 3000:3000 payments-queue # reliable worker (the default)
1. Queue a payment. The response is 202 Accepted with a pending payment —
nothing has been charged yet, it's only in the queue. Copy the id.
curl -s -X POST http://localhost:3000/payments \
-H 'Content-Type: application/json' \
-d '{"amount":1999,"currency":"USD","customerId":"cus_1","idempotencyKey":"order-1"}'
2. Watch the worker finish it. Within a second or two the status goes
pending → processing → done, externalChargeId fills in, and the field that
matters reads "duplicateCharges": 0 — charged exactly once.
curl -s http://localhost:3000/payments/<id>
3. Prove idempotency. Re-POST the same idempotencyKey — the same id
comes back, no second payment is created.
curl -s -X POST http://localhost:3000/payments \
-H 'Content-Type: application/json' \
-d '{"amount":1999,"currency":"USD","customerId":"cus_1","idempotencyKey":"order-1"}'
4. Now break it on purpose. Restart in naive mode with a short poll interval
so the @Interval ticks are guaranteed to overlap, then fire a burst:
docker run --rm -p 3000:3000 \
-e WORKER_MODE=naive -e POLL_INTERVAL_MS=300 \
payments-queue
for i in $(seq 1 15); do
curl -s -X POST http://localhost:3000/payments \
-H 'Content-Type: application/json' \
-d "{\"amount\":1000,\"currency\":\"USD\",\"customerId\":\"c$i\",\"idempotencyKey\":\"naive-$i\"}" &
done; wait
5. See the double charge — with a single GET. List every payment and look at
duplicateCharges:
curl -s http://localhost:3000/payments
Several rows now report "duplicateCharges": 1 or more — the queue called the
gateway again on payments it had already charged. The container logs fill
with ⚠️ DOUBLE-CHARGE attempt. Switch back to the default (reliable) worker,
repeat the burst, and every duplicateCharges stays 0 — even when a payment
was retried after a transient failure (chargeAttempts may be 2, but
duplicateCharges is still 0; a retry is not a double charge).
The lesson isn't "Postgres has a cool lock mode." It's that the bug was never in
the code you were reading — it was in when @Interval runs and in the gap
between two ORM calls. Concurrency bugs hide in the timing, not the syntax.
Want to see the code from this tutorial in action? PULL the complete working example from my GitHub repository!
© 2024 PullStackDeveloper. All rights reserved.