Fernando

Hi, my name is Fernando and I'm a senior software engineer. Welcome to my personal website!

setInterval Will Double-Charge Your Customers: A Reliable Payment Queue with Postgres

Published 1 day ago • 21 min read
The naive background worker everyone writes (setInterval plus an ORM query) has two bugs that nobody catches in review and that cost real money. Here's why, and the reliable queue that replaces it.


Introduction

I wasn't planning to write about payments. I was reading about Solid Queue — the database-backed job queue that ships as the default in Rails 8 — and it struck me as an almost perfect lens for a handful of JavaScript fundamentals that are easy to say you know and easy to get wrong. The whole pattern's correctness rests on timing, and in a worker that moves money, timing mistakes don't throw exceptions — they charge your customers twice.

So this is a tour of those fundamentals, built around one concrete, expensive bug. We'll cover, in order:

  • what Solid Queue is, and why "the database is the queue" is a real pattern;
  • microtasks, macrotasks and the event loop — and why await doesn't mean what a naive worker assumes;
  • how a wrong implementation turns all of that into double charges — which we'll then fix, run, and break on purpose;
  • and what you'd actually ship — the same queue handed to pg-boss, where the timing traps disappear because you never wrote the loop.

What is Solid Queue?

Solid Queue is the background-job system 37signals built for Basecamp and HEY, and that Rails 8 adopted as its default Active Job backend, so a new app no longer needs Redis just to run background jobs. Its premise is almost cheeky: you don't need a message broker at all. Your database is the queue. Jobs are just rows in a table; workers poll that table, claim a batch, run it, and mark it done. The lock that keeps two workers off the same job is one SQL feature: FOR UPDATE SKIP LOCKED.

People reach for this on purpose, and for good reasons: no extra infrastructure to deploy and babysit, and — because the queue lives in the same database as your business data — you can enqueue a job in the same transaction that writes the data, with no "dual write" that can leave the two out of sync.

The traditional alternative is a dedicated broker — BullMQ on Redis, RabbitMQ, SQS, Kafka — which delivers each job to a worker and handles concurrency, locking and retries for you. That's the crucial difference, and the reason this article exists: a broker hands you delivery semantics; a database does not. The moment you choose "database as a queue," you inherit the two things the broker used to do for free — making sure a job runs once, and making sure two workers never grab the same one.

One clarification before we go further, because it matters: Solid Queue itself is Ruby. It's a Rails library; you won't npm install it. But the pattern it popularized isn't Rails-specific; it's just SQL. Node has its own production-grade libraries built on the exact same FOR UPDATE SKIP LOCKED idea (pg-boss and Graphile Worker, both Postgres-backed job queues), and if you want a broker instead, BullMQ is the Redis one. In real life you'd reach for one of those. Here we deliberately build the pattern by hand in NestJS and Drizzle, because the whole point is to see the two correctness problems a library would otherwise hide from you, and exactly where a misunderstanding of the event loop turns them into double charges.

Because in Node, whether you get those two things right comes down almost entirely to how well you understand the event loop. So let's start there.

Microtasks, macrotasks, and the event loop

Both bugs below are timing bugs, so it pays to get three words straight first: macrotask, microtask, and process.nextTick.

Node runs your code in a loop. Each turn it runs a chunk of JavaScript, then — before moving on — drains two queues in a fixed order: first everything queued with process.nextTick, then the microtask queue (callbacks from Promise.then and the continuations that resume after an await). Only then does it pick up the next macrotask — a setTimeout / setInterval callback or an I/O event.

console.log('1: sync');
setTimeout(() => console.log('5: timer    — macrotask'), 0);
Promise.resolve().then(() => console.log('4: promise  — microtask'));
process.nextTick(() => console.log('3: nextTick'));
console.log('2: sync');

It runs top to bottom only for the two plain console.logs; everything else is deferred and comes back in queue order. The console prints, one line at a time:

1: sync
2: sync
3: nextTick
4: promise  — microtask
5: timer    — macrotask

Read it top to bottom: the order things run (1, 2, 3, 4, 5) is nothing like the order you wrote them (1, 5, 4, 3, 2). Sync code first, then the process.nextTick queue, then microtasks (promises / await), and — dead last — the timer.

There's a call stack that runs your synchronous code right now, and three queues that hold deferred work. The event loop only looks at them when the stack is empty, and it drains them in a strict priority:

%%{init: {'theme':'dark'}}%%
flowchart TD
  Stack["Call Stack — runs sync code now"]
  Stack -. "schedule work" .-> Q
  subgraph Q["Deferred work, drained in this priority"]
    direction TB
    NTQ["1 · process.nextTick queue"]
    MQ["2 · microtask queue (Promise.then / await)"]
    MacroQ["3 · macrotask queue (setTimeout / setInterval / I/O)"]
    NTQ --> MQ --> MacroQ
  end
  Q -. "event loop (stack empty)" .-> Stack

Every turn the loop empties the stack, then drains the whole nextTick queue, then the whole microtask queue (re-checking nextTick in between, because microtasks can schedule more), and only then runs exactly one macrotask — after which it drains the micro-queues again. One timer per turn; all the promises in between.

When tasks nest, the output is genuinely tricky

Flat examples are easy. Real code nests — a setTimeout that resolves a promise, a promise callback that schedules a nextTick. Now the order is not obvious. Try to predict this before reading on:

console.log('A');
setTimeout(() => {
  console.log('B');
  Promise.resolve().then(() => console.log('C')); // microtask, born inside a macrotask
}, 0);
Promise.resolve().then(() => {
  console.log('D');
  setTimeout(() => console.log('E'), 0);          // macrotask, born inside a microtask
  process.nextTick(() => console.log('F'));       // nextTick, born inside a microtask
});
process.nextTick(() => console.log('G'));
console.log('H');

Node prints:

A
H
G
D
F
B
C
E

Queue by queue:

%%{init: {'theme':'dark'}}%%
flowchart LR
  A["A · sync"] --> H["H · sync"]
  H --> G["G · nextTick"]
  G --> D["D · microtask"]
  D --> F["F · nextTick (born inside D)"]
  F --> B["B · macrotask 1"]
  B --> C["C · microtask (born inside B)"]
  C --> E["E · macrotask 2"]

A/H are synchronous. Before any timer, Node drains nextTick (G) then microtasks (D) — and because D itself queued a nextTick (F), that F also runs before any macrotask. Only now does the first timer fire (B), whose promise (C) is drained before the second timer (E). That D → F → B ordering — a macrotask stuck waiting behind a microtask's nextTick — is precisely the kind of thing that's invisible in review and obvious only once it has double-charged someone.

Here's the one consequence that matters: await does not pause a timer. When your interval callback hits await gateway.charge(), it suspends, and its continuation is queued as a microtask only once the charge promise settles. The interval, meanwhile, is a macrotask on its own clock. The next tick fires on schedule whether or not the previous one finished awaiting:

%%{init: {'theme':'dark'}}%%
sequenceDiagram
  participant EL as Event loop timer
  participant A as Interval tick 1
  participant B as Interval tick 2
  EL->>A: fire at t=0
  Note over A: await charge() suspends, continuation runs as a microtask once charge() settles
  EL->>B: fire at t=1s, timer does not wait for tick 1
  Note over B: selects the SAME pending rows
  Note over A,B: both call charge() on one payment, so it is charged twice
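
The overlap is easy to reproduce without a gateway. Here is a minimal sketch, assuming a hypothetical sleep() helper in place of the slow charge call and made-up timings (100 ms interval, 350 ms of work); none of these names come from the project:

```typescript
// Hypothetical stand-in for a slow gateway call: resolves after `ms` milliseconds.
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Runs an async handler on setInterval and reports the highest number of
// handler invocations that were in flight at the same moment.
async function measureOverlap(intervalMs: number, workMs: number): Promise<number> {
  let active = 0;
  let maxActive = 0;

  const timer = setInterval(async () => {
    active += 1;
    maxActive = Math.max(maxActive, active);
    await sleep(workMs); // the timer keeps firing while this is pending
    active -= 1;
  }, intervalMs);

  await sleep(intervalMs * 6); // let several ticks fire
  clearInterval(timer);
  await sleep(workMs);         // let in-flight handlers finish
  return maxActive;
}

// Anything above 1 means the handler ran on top of itself.
measureOverlap(100, 350).then((max) => console.log(`max overlapping runs: ${max}`));
```

The exact number depends on timer precision, but whenever the work takes longer than the interval it will be greater than 1.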

So a setInterval / @Interval whose body awaits anything slow is, by construction, allowed to run on top of itself. Now watch what that does to a payment worker.

How a wrong implementation double-charges

Here's the worker, written the idiomatic NestJS way — a scheduled method from @nestjs/schedule. It uses an ORM, it's typed, it passes code review, and it double-charges customers in production:

@Injectable()
export class PaymentWorker {
  // @nestjs/schedule — the idiomatic way to poll on a timer in Nest.
  @Interval(1000)
  async processPending() {
    const jobs = await this.db
      .select()
      .from(payments)
      .where(eq(payments.status, 'pending'));

    for (const job of jobs) {
      await this.gateway.charge(job);
      await this.db
        .update(payments)
        .set({ status: 'done' })
        .where(eq(payments.id, job.id));
    }
  }
}

Nothing about it screams "bug" — and that's the point. The failure isn't in the part you're looking at. It's in when @Interval fires (the timer we just watched overlap itself) and in the gap between the select and the update. (@Cron('* * * * * *') and a raw setInterval(async () => …, 1000) are the same thing in different clothes.) This isn't a story about careless people — it's about the traps that catch the ones who chose "database as a queue" on purpose and wrote the most natural code for it. There are exactly two, and the lifecycle below shows where each one strikes.

The payment lifecycle at a glance

Every payment is one row whose status moves through a small state machine. The label on each arrow is the function that drives the transition:

%%{init: {'theme':'dark'}}%%
stateDiagram-v2
  [*] --> pending: POST /payments — insertPending()
  pending --> processing: claimBatch() — FOR UPDATE SKIP LOCKED
  processing --> done: charge() ok — markDone()
  processing --> pending: charge() fails — scheduleRetry() (expBackoff)
  processing --> failed: attempts exhausted — markFailed()
  done --> [*]
  failed --> [*]

Keep this picture in mind — the rest of the article is just making each arrow correct under concurrency.
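
For readers who prefer code to diagrams, the same state machine can be sketched as a pure function. This is an illustration, not the project's code: the PaymentEvent names are invented here, and it assumes attempts counts the charge attempts made so far, including the one that just failed.

```typescript
type PaymentStatus = 'pending' | 'processing' | 'done' | 'failed';
type PaymentEvent = 'claim' | 'chargeOk' | 'chargeFailed'; // hypothetical event names

// Returns the next status for a payment, or throws on an arrow the diagram does not have.
function transition(
  current: PaymentStatus,
  event: PaymentEvent,
  attempts: number,
  maxAttempts: number,
): PaymentStatus {
  if (current === 'pending' && event === 'claim') return 'processing';
  if (current === 'processing' && event === 'chargeOk') return 'done';
  if (current === 'processing' && event === 'chargeFailed') {
    // Retry while attempts remain, otherwise give up for good.
    return attempts < maxAttempts ? 'pending' : 'failed';
  }
  throw new Error(`illegal transition: ${current} + ${event}`);
}

console.log(transition('pending', 'claim', 0, 3));           // processing
console.log(transition('processing', 'chargeFailed', 1, 3)); // pending
console.log(transition('processing', 'chargeFailed', 3, 3)); // failed
```

Note that done and failed have no outgoing arrows: a payment that reached either state can never be claimed again, which is half of what "charged exactly once" means.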

Why you don't charge inside the request

First, a step back: why is there a worker at all? Why not charge the customer directly when they hit POST /checkout?

Because the payment gateway is slow and unreliable. It takes seconds, it times out, it fails intermittently, and your own process can crash mid-call. If you await gateway.charge() inside the request, a timeout can fire after the charge already went through, and you have no idea whether you charged or not. There's no safe place to retry, and a burst of checkouts becomes a burst of gateway calls.

So you split intent from execution — the outbox pattern. The request only persists a pending row, atomically, and returns:

@Post()
@HttpCode(202) // accepted: queued, not charged yet
async create(@Body() dto: CreatePaymentDto): Promise<PaymentResponseDto> {
  const payment = await this.service.createPayment(dto);
  return PaymentResponseDto.fromEntity(payment);
}

Through the layers, the write path is short and touches only your own database, with no gateway call in sight:

%%{init: {'theme':'dark'}}%%
sequenceDiagram
  autonumber
  actor Client
  participant Ctrl as PaymentsController
  participant Svc as PaymentsService
  participant Repo as PaymentsRepository
  participant DB as Postgres
  Client->>Ctrl: POST /payments
  Ctrl->>Svc: createPayment(dto)
  Svc->>Repo: findByIdempotencyKey(key)
  Repo->>DB: SELECT by idempotency_key
  alt key already used
    DB-->>Svc: existing payment (no duplicate)
  else new payment
    Svc->>Repo: insertPending(data)
    Repo->>DB: INSERT status = pending
  end
  Ctrl-->>Client: 202 Accepted (queued, not charged)

If that commit succeeds, the charge will happen — reliably, out of band, by the worker. Which brings us back to the worker, and its two bugs.

Bug #1 — the timer overlaps itself

@Interval(1000) — like the setInterval it wraps — fires processPending every second regardless of whether the previous run finished. (@Cron is no better: it fires on the schedule, not when your handler is done.) Timers are macrotasks; the awaits inside your handler schedule microtasks — none of that holds back the next tick.

So if a batch takes longer than the interval — and charging a batch of payments, each a network round-trip to the gateway, easily blows past one second — the next tick fires while the previous one is still running. Now two runs are processing the same pending rows at the same time. Each calls charge(). The customer pays twice.

The fix is to never schedule the next run until the current one has finished — setTimeout recursion instead of setInterval:

private async tick(): Promise<void> {
  if (this.stopping || this.running || !this.task) return;
  this.running = true;
  const startedAt = Date.now();

  this.inFlight = (async () => {
    try {
      await this.task!();
    } catch (err) {
      this.logger.error('scheduled task failed', err as Error);
    }
  })();
  await this.inFlight;

  this.running = false;
  const elapsed = Date.now() - startedAt;
  // schedule the NEXT run only after this one settled, minus elapsed time so
  // the cadence doesn't drift.
  this.scheduleNext(Math.max(0, this.intervalMs - elapsed));
}
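
The snippet leans on members it doesn't show (scheduleNext, the stop flag, the in-flight promise). A self-contained sketch of how such a loop could be wired follows. The class name ReliableLoop and every body here are assumptions made for illustration, not the project's ReliableScheduler:

```typescript
type Task = () => Promise<void>;

// Hypothetical scheduler: setTimeout recursion, one run at a time.
class ReliableLoop {
  private timer: ReturnType<typeof setTimeout> | null = null;
  private inFlight: Promise<void> = Promise.resolve();
  private stopping = false;
  private readonly task: Task;
  private readonly intervalMs: number;

  constructor(task: Task, intervalMs: number) {
    this.task = task;
    this.intervalMs = intervalMs;
  }

  start(): void {
    this.stopping = false;
    this.scheduleNext(0);
  }

  // Cancel the pending timer, then wait for the run in progress (if any).
  async stop(): Promise<void> {
    this.stopping = true;
    if (this.timer !== null) clearTimeout(this.timer);
    this.timer = null;
    await this.inFlight;
  }

  private scheduleNext(delayMs: number): void {
    if (this.stopping) return; // never re-arm during shutdown
    this.timer = setTimeout(() => {
      this.inFlight = this.runOnce();
    }, delayMs);
  }

  private async runOnce(): Promise<void> {
    const startedAt = Date.now();
    try {
      await this.task();
    } catch (err) {
      console.error('scheduled task failed', err); // a failed run must not kill the loop
    }
    // Re-arm only after the run settled; subtract elapsed time to hold the cadence.
    this.scheduleNext(Math.max(0, this.intervalMs - (Date.now() - startedAt)));
  }
}
```

Because the next timer is armed only from inside runOnce, a second run cannot start while the first is still awaiting. In a Nest provider you would presumably call start() from onModuleInit and await stop() from onApplicationShutdown.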

No overlap, no drift. But that alone is not enough — because the moment you run more than one worker instance (and you will, for throughput and availability), they'll still grab the same rows. The real fix is in the claim.

Bug #2 — select then update is not atomic

Reading the pending rows and then updating their status are two separate statements. Between them, another worker can run the same select and get the same rows. The ORM happily handed you two statements; it never promised they'd be atomic.

What you want is to claim rows: select and lock them in a single statement, skipping any row another worker already holds. That's exactly what Postgres' FOR UPDATE SKIP LOCKED does — and it's the line that makes this whole thing work:

UPDATE payments
SET status = 'processing', updated_at = now()
WHERE id IN (
  SELECT id FROM payments
  WHERE status = 'pending' AND next_run_at <= now()
  ORDER BY created_at
  FOR UPDATE SKIP LOCKED   -- lock these rows, skip ones already locked
  LIMIT $1
)
RETURNING *;

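One statement. Atomic. Run ten workers and each gets a different batch: no double-processing, and no blocking each other waiting on locks. The row locks themselves last only until the statement's transaction commits; from then on it is status = 'processing' that keeps other workers away, because their subquery only selects pending rows. This is the heart of "Postgres as a queue".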

%%{init: {'theme':'dark'}}%%
flowchart LR
  subgraph T["payments (status = pending)"]
    r1["row 1"]
    r2["row 2"]
    r3["row 3"]
    r4["row 4"]
  end
  A["Worker A — claimBatch()"] -->|locks| r1
  A -->|locks| r2
  B["Worker B — claimBatch()"] -->|SKIP LOCKED → skips r1,r2| r3
  B -->|locks| r4

Worker B doesn't wait on the rows A already holds — it skips them and takes the next free ones. Add a third worker and it just takes rows 5–6. That's horizontal scale with zero coordination code.
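
The difference between "select" and "claim" can be modeled in a few lines. The sketch below is an in-memory model under stated assumptions, not a Postgres client: the Row type and both functions are hypothetical, and claimBatch is atomic here only because JavaScript runs it without interruption, whereas in Postgres the row locks provide that guarantee.

```typescript
type Row = { id: number; status: 'pending' | 'processing' };

// Naive read: returns every pending row and changes nothing, like a plain SELECT.
function selectPending(table: Row[]): Row[] {
  return table.filter((r) => r.status === 'pending');
}

// Model of the atomic claim: find up to `limit` pending rows and flip them to
// 'processing' in one uninterruptible step, like UPDATE ... RETURNING.
function claimBatch(table: Row[], limit: number): Row[] {
  const claimed = table.filter((r) => r.status === 'pending').slice(0, limit);
  for (const row of claimed) row.status = 'processing';
  return claimed;
}

const makeTable = (): Row[] =>
  [1, 2, 3, 4].map((id) => ({ id, status: 'pending' as const }));

// Two workers that only SELECT both see all four rows.
const naiveTable = makeTable();
const naiveA = selectPending(naiveTable).map((r) => r.id);
const naiveB = selectPending(naiveTable).map((r) => r.id);
console.log(naiveA, naiveB); // [ 1, 2, 3, 4 ] [ 1, 2, 3, 4 ]

// Two workers that claim get disjoint batches.
const claimTable = makeTable();
const claimA = claimBatch(claimTable, 2).map((r) => r.id);
const claimB = claimBatch(claimTable, 2).map((r) => r.id);
console.log(claimA, claimB); // [ 1, 2 ] [ 3, 4 ]
```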

Your ORM can't express this

Here's the catch the intro hinted at: the claim doesn't fit comfortably in an ORM query builder. Prisma's updateMany returns a count, not the rows, so you can't claim-and-fetch in one go. Drizzle is closer (its Postgres select builder has a .for('update', { skipLocked: true }) clause), but this project still drops to raw SQL so the whole claim stays one statement you can read exactly as Postgres sees it. In the repository it lives behind one method:

async claimBatch(limit: number): Promise<Payment[]> {
  const result = await this.db.execute(sql`
    UPDATE payments SET status = 'processing', updated_at = now()
    WHERE id IN (
      SELECT id FROM payments
      WHERE status = 'pending' AND next_run_at <= now()
      ORDER BY created_at
      FOR UPDATE SKIP LOCKED
      LIMIT ${limit}
    )
    RETURNING *;
  `);
  return (result.rows as Record<string, unknown>[]).map((r) => this.mapRow(r));
}

The ORM is great for the other 95% of your data access. For the queue claim, you need to know the SQL it can't write for you.

The worker loop, end to end

Putting the scheduler, the claim and the gateway together, one tick looks like this — every box is a real function from the project:

%%{init: {'theme':'dark'}}%%
flowchart TD
  A["ReliableScheduler.tick()"] --> B["PaymentProcessorService.processBatch()"]
  B --> C["repo.claimBatch(limit) — FOR UPDATE SKIP LOCKED"]
  C --> D{rows claimed?}
  D -- no --> Z["scheduleNext() — setTimeout, no overlap"]
  D -- yes --> E["runWithConcurrency(limit, rows, processOne)"]
  E --> F["processOne(p) → gateway.charge(p)"]
  F --> G{charge ok?}
  G -- yes --> H["repo.markDone(id, chargeId)"]
  G -- no --> I{attempts < maxAttempts?}
  I -- yes --> J["repo.scheduleRetry(id, expBackoff(attempts))"]
  I -- no --> K["repo.markFailed(id)"]
  H --> Z
  J --> Z
  K --> Z
  Z --> A

The loop closes on itself through scheduleNext() — never through setInterval — so a slow batch delays the next tick instead of stacking a second one on top of it.
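
The diagram names runWithConcurrency(limit, rows, processOne) without showing it. One plausible shape is a small pool of "lanes" that each pull the next item until the list is empty. Only the call signature comes from the diagram; the body below is an assumption, and it expects the worker to handle its own errors, as processOne does with retries.

```typescript
// Hypothetical implementation of a bounded-concurrency runner.
async function runWithConcurrency<T>(
  limit: number,
  items: readonly T[],
  worker: (item: T) => Promise<void>,
): Promise<void> {
  let next = 0; // index of the next unprocessed item, shared by all lanes

  // Each lane pulls the next item until none are left.
  const lane = async (): Promise<void> => {
    while (next < items.length) {
      const item = items[next] as T;
      next += 1;
      await worker(item);
    }
  };

  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
}

// Usage: five fake charges, at most two in flight at a time.
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));
const processed: number[] = [];

runWithConcurrency(2, [1, 2, 3, 4, 5], async (n) => {
  await sleep(20); // stand-in for gateway.charge()
  processed.push(n);
}).then(() => console.log(processed.length)); // 5
```

Reading and incrementing next is safe without a lock because there is no await between the two, so no other lane can run in that gap.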

The rest of "reliable"

Atomic claim and a non-overlapping scheduler get you correctness. Production needs three more things, and they're cheap once the foundation is right.

Idempotency. Even with all of the above, you want charging to be safe to repeat — a crash after charge() but before markDone() should not charge again. So the gateway is idempotent on a key, and POST /payments dedups on it:

async createPayment(dto: CreatePaymentDto): Promise<Payment> {
  const existing = await this.repo.findByIdempotencyKey(dto.idempotencyKey);
  if (existing) return existing; // same key never creates a second charge
  return this.repo.insertPending({ /* … */ });
}
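One caveat worth sketching: find-then-insert is itself a check-then-act sequence, the same shape as Bug #2. The article does not show the table definition, so the repository may well be protected by a unique index on the idempotency key. The model below assumes there is none, to show what two simultaneous requests with the same key would do, and contrasts it with a single-step insert. FakePaymentsRepo and insertOrGet are hypothetical names; insertOrGet models what INSERT ... ON CONFLICT DO NOTHING on a unique idempotency_key column would give you.

```typescript
type Payment = { id: number; idempotencyKey: string };

// In-memory stand-in for the payments table; every method awaits once to
// imitate a database round-trip.
class FakePaymentsRepo {
  private rows: Payment[] = [];
  private roundTrip = (): Promise<void> =>
    new Promise((resolve) => setTimeout(resolve, 0));

  async findByIdempotencyKey(key: string): Promise<Payment | undefined> {
    await this.roundTrip();
    return this.rows.find((p) => p.idempotencyKey === key);
  }

  async insertPending(key: string): Promise<Payment> {
    await this.roundTrip();
    const row = { id: this.rows.length + 1, idempotencyKey: key };
    this.rows.push(row);
    return row;
  }

  // Check and write happen in one step inside the "database".
  async insertOrGet(key: string): Promise<Payment> {
    await this.roundTrip();
    const existing = this.rows.find((p) => p.idempotencyKey === key);
    if (existing) return existing;
    const row = { id: this.rows.length + 1, idempotencyKey: key };
    this.rows.push(row);
    return row;
  }

  count(): number {
    return this.rows.length;
  }
}

async function compare(): Promise<{ checkThenInsert: number; atomic: number }> {
  const a = new FakePaymentsRepo();
  const checkThenInsert = async (key: string) =>
    (await a.findByIdempotencyKey(key)) ?? a.insertPending(key);
  await Promise.all([checkThenInsert('order-1'), checkThenInsert('order-1')]);

  const b = new FakePaymentsRepo();
  await Promise.all([b.insertOrGet('order-1'), b.insertOrGet('order-1')]);

  return { checkThenInsert: a.count(), atomic: b.count() };
}

compare().then((result) => console.log(result)); // { checkThenInsert: 2, atomic: 1 }
```

If the real table has a unique index, the second insert fails instead of creating a duplicate, and the service only needs to catch that error and return the existing row.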

Retry with backoff. Transient gateway failures shouldn't be terminal. On error, increment attempts and reschedule into the future; give up only after maxAttempts:

const nextRunAt = new Date(Date.now() + expBackoff(p.attempts));
await this.repo.scheduleRetry(p.id, message, nextRunAt);
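
The article calls expBackoff without showing it. A typical version doubles a base delay per attempt and caps it. The sketch below is one possible shape, not the project's function: the defaults (1 s base, 5 minute cap), the withJitter helper, and the assumption that attempts starts at 1 for the first failure are all invented for illustration.

```typescript
// Hypothetical defaults: 1 s base delay, doubling per attempt, capped at 5 minutes.
function expBackoff(attempts: number, baseMs = 1_000, capMs = 300_000): number {
  const exponent = Math.max(0, attempts - 1); // first failure waits baseMs
  return Math.min(capMs, baseMs * 2 ** exponent);
}

// Optional jitter: pick a delay between half and all of the computed backoff,
// so payments that failed together do not all retry at the same instant.
function withJitter(delayMs: number, random: () => number = Math.random): number {
  return Math.round(delayMs / 2 + random() * (delayMs / 2));
}

console.log([1, 2, 3, 4].map((n) => expBackoff(n))); // [ 1000, 2000, 4000, 8000 ]
console.log(expBackoff(30));                         // 300000 (capped)
```

The cap matters as much as the doubling: without it, a payment on its tenth retry would wait for hours before the worker looked at it again.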

Graceful shutdown. On SIGTERM, stop pulling new work and let the in-flight batch finish — don't kill a charge halfway:

async onApplicationShutdown(signal?: string): Promise<void> {
  await this.scheduler.stop(); // clearTimeout + await the in-flight run
}

(That last one only works if you call app.enableShutdownHooks() in main.ts — easy to forget, and then your "graceful" shutdown isn't.)

This isn't a hack — it's the default

"Use your database as a queue" sounds like the kind of shortcut you'd be embarrassed to admit in an interview. It isn't. The pattern is old, boring, and runs at serious scale — and the people who used to argue against it are the ones building it now.

It's the default in Rails 8. Solid Queue, built by 37signals (Basecamp, HEY), keeps jobs in your SQL database and takes Redis out of the stack. On their own engineering blog they report running millions of jobs a day on HEY through it, using exactly SELECT ... FOR UPDATE SKIP LOCKED to "fetch and lock jobs without locking other workers." A framework that tens of thousands of companies ship on made this the recommended way to run background jobs. (Introducing Solid Queue · Rails 8: No PaaS Required)

A payments company ran payment processing on it. GoCardless (bank-to-bank payments, an API in Stripe's space) ran Que, a Postgres-backed job queue, in production and published its own fork of it as open source. Their own repo states they used it internally, their engineers gave conference talks on "Postgres in Production at GoCardless," and one of them documented running the critical daily pipeline that batches payments for submission to the banks (around 200,000 payments on peak days) on top of it. (gocardless/que · a GoCardless engineer's account) (Que predates SKIP LOCKED and uses Postgres advisory locks: a sibling primitive, and the same "the database is the queue" idea.)

The skeptics became the maintainers. Brandur Leach (ex-Stripe) had publicly argued against Postgres queues — then built River, a Postgres queue for Go that handles tens of thousands of jobs per second with full ACID guarantees. What changed his mind was one feature: SKIP LOCKED, added in Postgres 9.5 (2016). Whatever objection you're feeling is, almost word for word, the one he retracted.

The whole category is this pattern — Solid Queue (Rails), River (Go), Oban (Elixir), Que (Ruby), pg-boss and graphile-worker (Node) — and the lock mode is documented right in the PostgreSQL manual. You're not being clever. You're using a primitive boring enough that a framework made it the default and a payments company trusted it with money.

In production, don't hand-roll it — use pg-boss

Everything above is the pattern from the inside, so you can see exactly where the event loop bites. But you already saw the list of names — Solid Queue, River, Oban, pg-boss, graphile-worker. Those exist precisely so you don't write the scheduler, the claim, the backoff and the shutdown drain yourself. In Node the database-backed options are pg-boss and Graphile Worker; the Redis broker is BullMQ. For anything that moves money, reach for one of them.

Here's the same service on pg-boss. The HTTP handler still only persists the intent — it just hands the job to the library instead of leaving a worker to poll for it:

// PgBossEnqueuer — called right after the pending row is written.
// singletonKey makes the enqueue itself idempotent.
await this.boss.send('charge-payment', { paymentId }, { singletonKey: paymentId });

And the worker stops being a scheduler at all. You declare the queue's retry policy once, then register a handler — pg-boss claims jobs with SKIP LOCKED, runs them N at a time, and reschedules failures with backoff:

await boss.createQueue('charge-payment', {
  retryLimit: 5, retryDelay: 1, retryBackoff: true, // retries + exponential backoff, for free
});

await boss.work('charge-payment', { localConcurrency: 5 }, async ([job]) => {
  const payment = await repo.findById(job.data.paymentId);
  if (!payment || payment.status === 'done') return;  // idempotent: never charge twice
  const { chargeId } = await gateway.charge(payment); // throw -> pg-boss retries it
  await repo.markDone(payment.id, chargeId);
});

Notice what's gone: no setInterval, no setTimeout recursion, no "is the previous tick still running?". There's no timer to overlap, so Bug #1 can't happen; pg-boss does the atomic claim, so Bug #2 can't happen. Both bugs that cost real money were consequences of hand-writing the loop — so the surest fix is to not hand-write it.

The repo runs this exact path under WORKER_MODE=pgboss, alongside the hand-rolled reliable worker and the buggy naive one, so you can switch among all three and compare:

docker run --rm -p 3000:3000 -e WORKER_MODE=pgboss payments-queue

So why build it by hand at all? Because "just use pg-boss" is only reassuring if you know what it's doing for you. Now you do: it's the SKIP LOCKED claim and the no-overlap scheduling from the sections above, wrapped in a library that already got the event-loop details right.

Why Postgres, and when not to

No Redis, no RabbitMQ, no SQS. The queue is just a table, and SKIP LOCKED gives it real queue semantics. The payoff:

  • Zero extra infrastructure to run and monitor.
  • Transactional consistency for free — the intent is written in the same database as your business data, so there's no "dual write" to keep in sync.

You'd reach for a dedicated broker when you outgrow it: very high throughput (Kafka), fan-out / pub-sub (RabbitMQ), or managed cross-service queues (SQS). For a payment queue at most companies' scale, Postgres is plenty.

Run it yourself — copy-paste

The repo ships an all-in-one Docker image — Postgres, Node, the app and the worker in one container — so every step below is reproducible with curl. (Prefer clicking? Swagger UI is at http://localhost:3000/docs.)

docker build -t payments-queue .
docker run --rm -p 3000:3000 payments-queue   # reliable worker (the default)

1. Queue a payment. The response is 202 Accepted with a pending payment — nothing has been charged yet, it's only in the queue. Copy the id.

curl -s -X POST http://localhost:3000/payments \
  -H 'Content-Type: application/json' \
  -d '{"amount":1999,"currency":"USD","customerId":"cus_1","idempotencyKey":"order-1"}'

2. Watch the worker finish it. Within a second or two the status goes pending → processing → done, externalChargeId fills in, and the field that matters reads "duplicateCharges": 0 — charged exactly once.

curl -s http://localhost:3000/payments/<id>

3. Prove idempotency. Re-POST the same idempotencyKey — the same id comes back, no second payment is created.

curl -s -X POST http://localhost:3000/payments \
  -H 'Content-Type: application/json' \
  -d '{"amount":1999,"currency":"USD","customerId":"cus_1","idempotencyKey":"order-1"}'

4. Now break it on purpose. Restart in naive mode with a short poll interval so the @Interval ticks are guaranteed to overlap, then fire a burst:

# Terminal 1: the naive worker, polling every 300 ms (runs in the foreground)
docker run --rm -p 3000:3000 \
  -e WORKER_MODE=naive -e POLL_INTERVAL_MS=300 \
  payments-queue

# Terminal 2: fire 15 payments at once
for i in $(seq 1 15); do
  curl -s -X POST http://localhost:3000/payments \
    -H 'Content-Type: application/json' \
    -d "{\"amount\":1000,\"currency\":\"USD\",\"customerId\":\"c$i\",\"idempotencyKey\":\"naive-$i\"}" &
done; wait

5. See the double charge — with a single GET. List every payment and look at duplicateCharges:

curl -s http://localhost:3000/payments

Several rows now report "duplicateCharges": 1 or more — the queue called the gateway again on payments it had already charged. The container logs fill with ⚠️ DOUBLE-CHARGE attempt. Switch back to the default (reliable) worker, repeat the burst, and every duplicateCharges stays 0 — even when a payment was retried after a transient failure (chargeAttempts may be 2, but duplicateCharges is still 0; a retry is not a double charge).

The lesson isn't "Postgres has a cool lock mode." It's that the bug was never in the code you were reading — it was in when @Interval runs and in the gap between two ORM calls. Concurrency bugs hide in the timing, not the syntax.

Get the Working Code

Want to see the code from this tutorial in action? PULL the complete working example from my GitHub repository!


© 2024 PullStackDeveloper. All rights reserved.