POS Systems for High-Volume Venues: Lessons Learned from Burst Loads and Offline Resilience
The first time we pushed a new POS release into a 42,000-seat arena during a sold-out event, the system handled ~180 transactions per minute for about eleven minutes. Then the first goal was scored in the pre-show friendly and beer sales roughly tripled in under ninety seconds. Within three minutes we were seeing 620–680 tx/min sustained, with peaks touching 820. Nothing in our load tests had prepared us for that shape of traffic.
Burst Load Realities
Most venue operators don’t care about your p95 latency under constant 200 tx/min. They care whether the tills stay responsive when 8,000 people simultaneously decide they want a drink during half-time or after a touchdown. That creates traffic shapes that look like this in practice:
- Quiet periods: 40–120 tx/min
- Innings / quarters / sets: 180–350 tx/min
- Halftime / intermission / encore rush: 550–950 tx/min (sometimes higher)
We learned the hard way that horizontal scaling alone is not enough if your database or message broker chokes on the sudden connection spike. The winning pattern turned out to be:
- Thick client-side optimistic UI + local queue
- Exponential backoff + jitter on every network call
- Per-lane circuit breakers that fall back to offline mode independently
- Aggregated background reconciliation rather than per-transaction sync
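The backoff half of that pattern is small enough to sketch. This is an illustrative version using full jitter; `withBackoff` and the attempt/delay constants are assumptions for the example, not our production values:

```typescript
// Retry a network call with exponential backoff plus full jitter.
// Illustrative sketch; caps and base delay are example values.
async function withBackoff<T>(
  call: () => Promise<T>,
  maxAttempts = 6,
  baseMs = 200,
  capMs = 10_000,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      if (attempt + 1 >= maxAttempts) throw err; // give up, surface the error
      // Full jitter: sleep a random duration in [0, min(cap, base * 2^attempt)).
      const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
      const delay = Math.random() * ceiling;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

The jitter matters as much as the exponent: without it, every lane that dropped offline at the same moment retries at the same moment, recreating the spike that caused the outage.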
In one particularly ugly incident, we had a network hiccup right as gates opened. About 22 lanes went offline for 7 minutes. Because we had already moved to local SQLite queuing with idempotency keys, those lanes kept selling the entire time. When connectivity returned, the reconciliation engine deduplicated ~1,400 transactions automatically. No double-charging, no lost revenue.
The Offline Resilience Trade-Offs
Offline-first sounds great until you realize that true offline operation means eventually accepting some forms of inconsistency. We chose the following compromises:
- Price changes do not propagate during outages (stale prices are displayed and honored)
- Inventory is soft-reserved locally and only hard-checked on sync
- Void / refund operations are queued with strict causal ordering per lane
- Global stock exhaustion is communicated via a “low stock” flag that is advisory only
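The strict causal ordering rule for queued voids and refunds reduces to a simple invariant: an operation is only released for sync once every earlier sequence number from the same lane has been acknowledged. A minimal sketch (the `LaneOrderedQueue` name and shape are hypothetical):

```typescript
// Per-lane causal ordering: release operation n+1 only after n is acked.
interface QueuedOp {
  laneId: string;
  sequence: number; // monotonic per lane
  kind: "sale" | "void" | "refund";
}

class LaneOrderedQueue {
  // Highest contiguously-acknowledged sequence number per lane.
  private acked = new Map<string, number>();

  canRelease(op: QueuedOp): boolean {
    const high = this.acked.get(op.laneId) ?? 0;
    return op.sequence === high + 1; // strictly next-in-line only
  }

  ack(op: QueuedOp): void {
    this.acked.set(op.laneId, op.sequence);
  }
}
```

Ordering is per lane, not global: a refund on lane 7 never waits on a stalled sale from lane 12.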
This decision tree came from real conversations with venue GMs: “Would you rather sell the last three beers twice and comp them later, or have the register freeze and create a line of 80 angry fans?” The answer was consistent across properties.
Idempotency as the Foundation
Every transaction carries a client-generated UUID + lane ID + incrementing sequence number. On the server we store seen idempotency keys for 72 hours. That simple table has prevented countless duplicates during flaky LTE handoffs and Wi-Fi roaming failures.
```typescript
interface TransactionEnvelope {
  idempotencyKey: string; // UUID v7 preferred
  laneId: string;
  sequence: number;       // monotonic per lane
  payload: SignedTransaction;
  clientTimestamp: string;
  schemaVersion: number;
}
```
The server rejects anything with a duplicate key unless the payload SHA-256 matches exactly (allowing safe retries). Mismatches trigger manual review alerts because they usually indicate a serious client bug.
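A minimal sketch of that server-side check, assuming an in-memory `Map` in place of the real 72-hour table (the `Verdict` type and function name are illustrative):

```typescript
import { createHash } from "node:crypto";

type Verdict = "accept" | "duplicate_safe_retry" | "mismatch_alert";

// idempotencyKey -> SHA-256 of the payload we first accepted under it.
const seen = new Map<string, string>();

function checkIdempotency(key: string, payload: string): Verdict {
  const hash = createHash("sha256").update(payload).digest("hex");
  const prior = seen.get(key);
  if (prior === undefined) {
    seen.set(key, hash);
    return "accept";
  }
  // Same key, same bytes: a client retry we can safely acknowledge.
  if (prior === hash) return "duplicate_safe_retry";
  // Same key, different payload: almost certainly a client bug; alert a human.
  return "mismatch_alert";
}
```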
Hardware and Connectivity Constraints
Many venues still run on 2.4 GHz-only Wi-Fi with 80+ devices per AP. Handheld terminals frequently drop to 1–2 Mbps effective throughput. We learned to keep payload sizes tiny:
- Initial transaction POST: < 1.2 KB
- Background sync batch: < 40 KB (batched 50–200 tx)
- Price & inventory deltas: delta-encoded, usually < 800 bytes
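The delta encoding for prices amounts to shipping only the SKUs whose value changed since the client's last snapshot. A hypothetical sketch (`priceDelta` is not our actual helper, and a real version would also have to encode removed SKUs):

```typescript
// SKU -> price in cents.
type PriceMap = Record<string, number>;

// Return only the entries of `current` that differ from `previous`,
// including SKUs that are new in `current`.
function priceDelta(previous: PriceMap, current: PriceMap): PriceMap {
  const delta: PriceMap = {};
  for (const [sku, price] of Object.entries(current)) {
    if (previous[sku] !== price) delta[sku] = price;
  }
  return delta;
}
```

On a menu of a few hundred SKUs where only a handful change per push, this is what keeps the payload under a kilobyte.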
We also stopped trying to push rich images or deep product hierarchies to the client in real time. Static assets live on the device; only SKUs, prices, and stock counters change frequently.
What We Would Do Differently Next Time
If I were starting over today I would push harder for:
- Local-first CRDTs for inventory counters (we’re only now migrating to them)
- Per-lane Redis streams as a secondary durable queue before SQLite
- Stronger chaos testing that simulates 60–80% packet loss instead of just full partitions
- Earlier investment in lane-level metrics (we were blind to per-lane offline duration for far too long)
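The CRDT direction in the first bullet can be sketched as a per-lane grow-only counter of units sold, merged by taking the per-lane maximum. This is a minimal illustration of the idea, not our migration code:

```typescript
// laneId -> units sold on that lane (monotonically increasing).
type SoldCounter = Record<string, number>;

// G-counter merge: per-lane max. Commutative, associative, idempotent,
// so lanes can sync in any order, any number of times.
function merge(a: SoldCounter, b: SoldCounter): SoldCounter {
  const out: SoldCounter = { ...a };
  for (const [lane, n] of Object.entries(b)) {
    out[lane] = Math.max(out[lane] ?? 0, n);
  }
  return out;
}

function stockOnHand(initial: number, merged: SoldCounter): number {
  const sold = Object.values(merged).reduce((sum, n) => sum + n, 0);
  // May go negative: overselling is detected after the fact, not prevented.
  return initial - sold;
}
```

Note the deliberate asymmetry with the trade-offs above: the counter converges to an accurate sold total, but it cannot stop two offline lanes from selling the same last unit. That is the availability-first choice again.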
The biggest lesson isn’t technical. It’s that operators will forgive strange-looking inventory numbers for a few minutes far more readily than they will forgive a queue at the stand during a break. Build for availability first, then consistency.