
Inside the TX24 Matching Engine: Engineering an Exchange in Go at 300,000 TPS

TX24 Editorial·04 May 2026·9 min read

Most cryptocurrency exchanges did not start as exchanges. They started as websites with an order book bolted on, then they grew, and the order book was patched until it could absorb whatever volume the front door let through. The result is a generation of platforms that work well enough on a normal day and visibly buckle on the days that matter — listings, liquidations, macro prints, ETF approvals. The institutions we serve at TX24 have learned, expensively, to read those signals: a venue that widens spreads under stress is a venue you cannot rely on for execution.

We started TX24 from the opposite premise. The exchange is the matching engine. Everything else — the API gateways, the risk overlays, the custody integration with GK8, the user experience — exists to feed and protect that engine. This article walks through how we built it, the trade-offs we made, and why a 5.6 ms median latency at 300,000 transactions per second is not a vanity metric but the prerequisite for the kind of business we want to run.

Why Go, and why now

When we sat down to choose the language for the engine, the conventional wisdom was C++ or Rust. Both deliver predictable latency profiles and tight memory control. We considered both, prototyped in both, and chose Go. The decision was not driven by ideology. It was driven by the observation that an exchange is not a single hot loop — it is a fleet of cooperating systems, and the bottleneck in production is rarely raw arithmetic. It is concurrency, garbage-collection jitter, deployment cadence, and the speed at which a small team can ship a fix when something is wrong at 03:00.

Go's runtime gives us cheap goroutines, a GC that has improved every release for the last decade, a standard library that takes networking seriously, and a build system that produces a single static binary. That last property matters more than people admit: when an incident happens, we want to know exactly which binary is running on every node, with no library-version skew, no JVM tuning surface, no "it works on staging" surprises. Go gives us that for free.

The trade-off is the garbage collector. We treated GC pauses as a first-class engineering problem from day one. The hot path of the matching engine allocates almost nothing — order objects come from sync.Pool, the order book is a flat array indexed by price level, and the audit trail is written to a pre-allocated ring buffer that a separate goroutine drains to disk. In production we observe sub-millisecond GC pauses on the matching nodes, and the 99.9th-percentile latency stays inside our SLO even during scheduled maintenance windows.
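A minimal sketch of that allocation-free hot path, for illustration only. The `Order` fields, `acquireOrder`, `releaseOrder`, and `AuditRing` are hypothetical names, not the engine's actual types, and the ring here is single-threaded: the drain goroutine and the synchronization it would need are left out.

```go
package main

import (
	"fmt"
	"sync"
)

// Order is a hypothetical order record; the field set is an assumption.
type Order struct {
	ID    uint64
	Price int64 // price in integer ticks
	Qty   int64
}

// orderPool recycles Order values so the hot path does not allocate per order.
var orderPool = sync.Pool{
	New: func() any { return new(Order) },
}

// acquireOrder takes an Order from the pool and fills it in.
func acquireOrder(id uint64, price, qty int64) *Order {
	o := orderPool.Get().(*Order)
	o.ID, o.Price, o.Qty = id, price, qty
	return o
}

// releaseOrder clears the Order and returns it to the pool.
func releaseOrder(o *Order) {
	*o = Order{} // zero the fields so stale data never leaks into a reused order
	orderPool.Put(o)
}

// AuditRing is a fixed-capacity ring buffer whose storage is allocated once.
// Single-threaded sketch: a real producer/consumer pair needs synchronization.
type AuditRing struct {
	buf        []Order
	head, tail uint64 // head: next slot to read, tail: next slot to write
}

func NewAuditRing(capacity int) *AuditRing {
	return &AuditRing{buf: make([]Order, capacity)}
}

// Push copies an entry into the ring and reports false when the ring is full.
func (r *AuditRing) Push(o Order) bool {
	if r.tail-r.head == uint64(len(r.buf)) {
		return false
	}
	r.buf[r.tail%uint64(len(r.buf))] = o
	r.tail++
	return true
}

// Pop removes and returns the oldest entry, if any.
func (r *AuditRing) Pop() (Order, bool) {
	if r.head == r.tail {
		return Order{}, false
	}
	o := r.buf[r.head%uint64(len(r.buf))]
	r.head++
	return o, true
}

func main() {
	ring := NewAuditRing(4)
	o := acquireOrder(1, 10050, 3)
	ring.Push(*o) // the audit entry is a copy, so the order can be recycled
	releaseOrder(o)
	entry, ok := ring.Pop()
	fmt.Println(entry.ID, entry.Price, entry.Qty, ok)
}
```

Copying the entry into the ring by value is what lets the order go back to the pool immediately; a ring of pointers would keep pooled objects alive and defeat the purpose.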

The order book: flat, hot, and boring

Order books are a solved problem in theory and a source of subtle bugs in practice. Tree-based implementations have elegant Big-O properties and disappointing cache behavior. We chose a price-level array indexed by an integer representation of price, with a doubly-linked list of orders at each level. Inserts and cancels are O(1) in the common case. Walks across price levels are predictable in cost and friendly to the CPU prefetcher.
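The structure described above might look roughly like this. It is a sketch under assumptions, not the production book: the names (`Book`, `Level`, `Insert`, `Cancel`) are hypothetical, it models one side of the book only, and it uses a map for the order-ID lookup and allocates each order directly, where a real hot path would pool them.

```go
package main

import "fmt"

// Order is one resting order, linked into the FIFO queue of its price level.
type Order struct {
	ID         uint64
	Qty        int64
	level      *Level
	prev, next *Order
}

// Level is the queue of resting orders at a single price, oldest first.
type Level struct {
	head, tail *Order
	TotalQty   int64
}

// Book is one side of a book: a flat array of levels indexed by price tick.
type Book struct {
	MinTick int64
	Levels  []Level
	byID    map[uint64]*Order
}

func NewBook(minTick, maxTick int64) *Book {
	return &Book{
		MinTick: minTick,
		Levels:  make([]Level, maxTick-minTick+1),
		byID:    make(map[uint64]*Order),
	}
}

// Insert appends an order to the tail of its price level in O(1).
func (b *Book) Insert(id uint64, priceTick, qty int64) bool {
	idx := priceTick - b.MinTick
	if idx < 0 || idx >= int64(len(b.Levels)) {
		return false // price outside the array's range
	}
	if _, dup := b.byID[id]; dup {
		return false
	}
	lv := &b.Levels[idx]
	o := &Order{ID: id, Qty: qty, level: lv, prev: lv.tail}
	if lv.tail != nil {
		lv.tail.next = o
	} else {
		lv.head = o
	}
	lv.tail = o
	lv.TotalQty += qty
	b.byID[id] = o
	return true
}

// Cancel unlinks an order from its level in O(1), preserving queue order.
func (b *Book) Cancel(id uint64) bool {
	o, ok := b.byID[id]
	if !ok {
		return false
	}
	lv := o.level
	if o.prev != nil {
		o.prev.next = o.next
	} else {
		lv.head = o.next
	}
	if o.next != nil {
		o.next.prev = o.prev
	} else {
		lv.tail = o.prev
	}
	lv.TotalQty -= o.Qty
	delete(b.byID, id)
	return true
}

func main() {
	b := NewBook(10000, 10100)
	b.Insert(1, 10050, 5)
	b.Insert(2, 10050, 7)
	b.Insert(3, 10050, 2)
	b.Cancel(2) // the middle order leaves; 1 keeps priority over 3
	lv := &b.Levels[10050-b.MinTick]
	fmt.Println(lv.head.ID, lv.tail.ID, lv.TotalQty)
}
```

The doubly-linked list is what makes the cancel O(1): an order in the middle of a queue can be unlinked without walking the level, and every other order keeps its queue position.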

The boring part is deliberate. A matching engine should be the part of the system you stop thinking about. When we onboard a new market — a new pair, a new asset class, a new perpetual — we do not redesign the engine. We instantiate another book with the same primitives. The boring engine is what lets us move fast everywhere else.

Concurrency without contention

The naïve way to scale an exchange is to put a mutex around the order book and add nodes. That stops scaling at one mutex. The way we scale is shared-nothing per market: each trading pair owns its own goroutine, its own book, its own sequence number generator, and its own append-only log. Cross-market operations — risk checks, margin calculations, fee computation — happen on adjacent goroutines that consume from the trade stream.
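A minimal sketch of the shared-nothing layout, assuming one inbound channel per market. `Event`, `runMarket`, and `Process` are hypothetical names, and the string payload stands in for a real order message. Each market's state (here, only its sequence counter) is touched by exactly one goroutine, so no mutex is needed.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is an order event stamped by the market that owns it.
type Event struct {
	Market  string
	Seq     uint64
	Payload string
}

// runMarket is the only goroutine that touches this market's state.
func runMarket(name string, in <-chan string, out chan<- Event, wg *sync.WaitGroup) {
	defer wg.Done()
	var seq uint64 // per-market sequence generator, never shared
	for payload := range in {
		seq++
		out <- Event{Market: name, Seq: seq, Payload: payload}
	}
}

// Process starts one goroutine per market and collects their event streams.
func Process(inputs map[string][]string) map[string][]Event {
	out := make(chan Event)
	var wg sync.WaitGroup
	for name, payloads := range inputs {
		in := make(chan string, len(payloads))
		for _, p := range payloads {
			in <- p
		}
		close(in)
		wg.Add(1)
		go runMarket(name, in, out, &wg)
	}
	go func() {
		wg.Wait()
		close(out) // all markets are done; end the consumer loop
	}()
	result := make(map[string][]Event)
	for ev := range out {
		result[ev.Market] = append(result[ev.Market], ev)
	}
	return result
}

func main() {
	res := Process(map[string][]string{
		"BTC-USD": {"new", "cancel", "new"},
		"ETH-USD": {"new"},
	})
	for _, ev := range res["BTC-USD"] {
		fmt.Println(ev.Market, ev.Seq, ev.Payload)
	}
}
```

Events from different markets may interleave in any order on the shared output channel, but each market's own events always arrive in sequence, which is the only ordering a per-market log depends on.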

This is not novel. It is the single-writer idea behind the LMAX Disruptor pattern, the actor model, and staged event-driven architecture: the same principle wearing different hats for decades. What is novel, in our domain, is the discipline to apply it consistently. Many crypto exchanges advertise "high performance" while their settlement layer takes a database lock on a global ledger table. We do not. The settlement layer reads from per-market trade logs and updates balances through a CRDT-style merge process that never blocks the engine.
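One way to read "CRDT-style merge" is a ledger whose updates commute across markets and are idempotent on replay. The sketch below is that reading only, not the actual settlement code: `Delta`, `Ledger`, and `Apply` are hypothetical, and it assumes each market's log is consumed in sequence order.

```go
package main

import "fmt"

// Delta is one balance change derived from a per-market trade log entry.
type Delta struct {
	Market  string
	Seq     uint64 // position in that market's log
	Account string
	Amount  int64 // signed change in minor units
}

// Ledger merges deltas from many market logs without a global lock on the engine side.
type Ledger struct {
	applied map[string]uint64 // market -> highest sequence applied so far
	balance map[string]int64  // account -> balance
}

func NewLedger() *Ledger {
	return &Ledger{applied: map[string]uint64{}, balance: map[string]int64{}}
}

// Apply folds in one delta. Re-delivered entries are ignored, so replaying
// a log segment is harmless, and additions from different markets commute.
func (l *Ledger) Apply(d Delta) bool {
	if d.Seq <= l.applied[d.Market] {
		return false // already applied: duplicate or replay
	}
	l.applied[d.Market] = d.Seq
	l.balance[d.Account] += d.Amount
	return true
}

func (l *Ledger) Balance(account string) int64 { return l.balance[account] }

func main() {
	btc := []Delta{{"BTC-USD", 1, "alice", -100}, {"BTC-USD", 2, "alice", 40}}
	eth := []Delta{{"ETH-USD", 1, "alice", 25}}

	a := NewLedger() // BTC log first, then ETH
	for _, d := range append(append([]Delta{}, btc...), eth...) {
		a.Apply(d)
	}

	b := NewLedger() // interleaved, with one duplicate delivery
	for _, d := range []Delta{eth[0], btc[0], btc[0], btc[1]} {
		b.Apply(d)
	}

	fmt.Println(a.Balance("alice"), b.Balance("alice"))
}
```

Both ledgers end at the same balance regardless of how the two market logs interleave, which is the property that lets settlement trail the engine instead of sitting in its critical path.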

What 300,000 TPS actually means

Throughput numbers from exchanges are notoriously hard to compare. Some count messages, some count trades, some count order events including cancels and modifies, some report peak burst capacity, some report sustained capacity. We report sustained capacity at the matching layer: 300,000 order events per second, sustained, on a single market, on commodity hardware (current-generation AMD EPYC, 64 cores, NVMe-backed write-ahead log). Across the venue we have many markets running in parallel, so the platform-level number is considerably higher, but we publish the per-market number because that is what determines whether your strategy clears or queues during a stress event.

Median latency, measured from the moment an order arrives at the gateway to the moment a trade or rejection is acknowledged on the wire, is 5.6 ms. The 99th percentile sits below 12 ms. These numbers are measured in production, not in a lab, and they include every layer a real order traverses: TLS termination, authentication, risk check, matching, settlement entry, audit log fsync, and acknowledgement.
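The two headline numbers measure different things, and a little arithmetic shows how they fit together. The sketch below is a back-of-the-envelope calculation, not a measurement: it assumes a single goroutine does the matching for a market, and it uses the median latency as a stand-in for the mean in Little's law, which is only an approximation. `PerEventBudget` and `InFlight` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"time"
)

// PerEventBudget is the average time one worker can spend on each event
// while sustaining the given rate.
func PerEventBudget(eventsPerSec int64) time.Duration {
	return time.Second / time.Duration(eventsPerSec)
}

// InFlight estimates how many events are inside the pipeline at once,
// using Little's law: in-flight = arrival rate x time in system.
func InFlight(eventsPerSec int64, latency time.Duration) int64 {
	return eventsPerSec * latency.Microseconds() / 1_000_000
}

func main() {
	fmt.Println(PerEventBudget(300_000))                 // matching budget per event
	fmt.Println(InFlight(300_000, 5600*time.Microsecond)) // events in the pipeline
}
```

At 300,000 events per second, a single matching goroutine has roughly 3.3 µs per event, while an end-to-end latency of 5.6 ms implies on the order of 1,680 events in flight across the gateway, risk, matching, and audit stages at any moment. Throughput is set by the slowest stage; latency is the sum of all of them.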

Determinism over speed

If we were optimizing only for speed, we could go faster. We could move authentication to a separate path, batch the audit log instead of fsyncing each entry, or skip the risk check for whitelisted accounts. We do none of these things. Determinism — the property that the same sequence of inputs always produces the same sequence of outputs — is non-negotiable for an institutional venue. It is what allows us to replay history, to reconcile with custodians, to prove to a regulator that a contested trade happened the way the tape says it happened.

Determinism shapes the design at every layer. Sequence numbers are assigned at ingress, not at matching. The audit log is the source of truth, not the database. Replays of the log on a clean node produce a byte-identical book. We test this. Every release ships with a 24-hour replay against the previous day's tape; if the resulting book differs in a single field, the release does not go out.
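A replay check of this kind can be sketched as: rebuild state from the log on a clean instance, serialize it in a canonical order, and compare digests. The `LogEntry` shape and the quantity-per-price book are simplifying assumptions, and `Replay` and `Digest` are hypothetical names; a real book carries far more state.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"sort"
)

// LogEntry is a simplified audit-log record: a signed quantity change at a price.
type LogEntry struct {
	Seq   uint64
	Price int64
	Delta int64
}

// Replay rebuilds resting quantity per price level from the log alone.
func Replay(log []LogEntry) map[int64]int64 {
	book := make(map[int64]int64)
	for _, e := range log {
		book[e.Price] += e.Delta
		if book[e.Price] == 0 {
			delete(book, e.Price) // empty levels leave no trace in the state
		}
	}
	return book
}

// Digest hashes the book in ascending price order. Go randomizes map
// iteration, so sorting is required for a byte-identical serialization.
func Digest(book map[int64]int64) [32]byte {
	prices := make([]int64, 0, len(book))
	for p := range book {
		prices = append(prices, p)
	}
	sort.Slice(prices, func(i, j int) bool { return prices[i] < prices[j] })

	buf := make([]byte, 0, 16*len(prices))
	var tmp [8]byte
	for _, p := range prices {
		binary.BigEndian.PutUint64(tmp[:], uint64(p))
		buf = append(buf, tmp[:]...)
		binary.BigEndian.PutUint64(tmp[:], uint64(book[p]))
		buf = append(buf, tmp[:]...)
	}
	return sha256.Sum256(buf)
}

func main() {
	tape := []LogEntry{
		{Seq: 1, Price: 10050, Delta: 5},
		{Seq: 2, Price: 10051, Delta: 3},
		{Seq: 3, Price: 10050, Delta: -5},
	}
	reference := Digest(Replay(tape))
	candidate := Digest(Replay(tape)) // e.g. the new release replaying the same tape
	fmt.Println(reference == candidate)
}
```

The canonical ordering is the important detail: any nondeterministic iteration, timestamp, or pointer value that leaks into the serialized state would make two correct replays disagree.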

What this enables for our clients

Engineering choices are interesting only in proportion to the user outcomes they unlock. The reason we obsess over the engine is that institutional clients do four things that retail clients do not: they post very large orders, they cancel and replace very frequently, they care about microstructure (queue position, partial fill behavior, self-trade prevention), and they care about post-trade evidence.

Large orders need a book that can absorb them without leaking information. High cancel-replace ratios need a venue that handles cancels at the same priority as inserts and does not throttle clients who behave correctly. Microstructure-aware traders need an engine that documents its rules, applies them consistently, and exposes the data to verify them. Post-trade evidence needs an audit chain that survives scrutiny, including from regulators we do not yet know about.

The TX24 engine was built for those four behaviors. Everything else — the front-end, the AI agent, the social trading layer, the RWA marketplace — sits on top of that foundation. Without the foundation, the rest is a feature list. With it, it is an exchange.

Self-trade prevention, fee tiers, and the parts nobody sees

The visible work of an exchange is the order book. The invisible work — and the work that determines whether sophisticated traders trust the venue — happens around it. Self-trade prevention is one of those invisible features. A market maker running multiple strategies on the same account does not want their own bids matching their own asks; even if it would not be a regulatory issue, it generates noise, distorts internal P&L attribution, and burns fees. We implement self-trade prevention at the matching layer with three configurable behaviors per account: cancel-newest, cancel-oldest, and cancel-both. The behavior is selected at the account level, applied deterministically inside the engine, and logged with the same audit guarantees as any other action.
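The three behaviors can be expressed as a small pure function. This sketch assumes the usual convention that "newest" means the incoming order and "oldest" means the resting one; the source does not spell that out, and `STPMode`, `Order`, and `ResolveSelfTrade` are hypothetical names.

```go
package main

import "fmt"

// STPMode is the self-trade prevention behavior configured for an account.
type STPMode int

const (
	CancelNewest STPMode = iota // cancel the incoming order
	CancelOldest                // cancel the resting order
	CancelBoth                  // cancel both orders
)

// Order carries only what the self-trade check needs.
type Order struct {
	ID      uint64
	Account string
}

// ResolveSelfTrade returns the IDs to cancel when an incoming order would
// match a resting order. It returns nil when the accounts differ, meaning
// the match may proceed.
func ResolveSelfTrade(mode STPMode, resting, incoming Order) []uint64 {
	if resting.Account != incoming.Account {
		return nil
	}
	switch mode {
	case CancelOldest:
		return []uint64{resting.ID}
	case CancelBoth:
		return []uint64{resting.ID, incoming.ID}
	default: // CancelNewest, and the safe fallback for unknown modes
		return []uint64{incoming.ID}
	}
}

func main() {
	resting := Order{ID: 10, Account: "mm-1"}
	incoming := Order{ID: 11, Account: "mm-1"}
	fmt.Println(ResolveSelfTrade(CancelNewest, resting, incoming))
	fmt.Println(ResolveSelfTrade(CancelOldest, resting, incoming))
	fmt.Println(ResolveSelfTrade(CancelBoth, resting, incoming))
}
```

Because the function depends only on its arguments, the same inputs always cancel the same orders, which is what allows the decision to be replayed from the log.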

Fee tiers, similarly, are computed inside the engine, not in a downstream batch process. A taker fill applies the right tier the moment it happens, based on the account's rolling 30-day volume, and the resulting fee is recorded atomically with the trade. This sounds like a small thing. It becomes a large thing the first time a client tries to reconcile their internal P&L against a venue that adjusts fees retroactively at the end of the day; the operational cost of explaining mismatches to a back-office team is real, and we eliminate it by making the fee deterministic at the moment of execution.
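In code, "deterministic at the moment of execution" means the fee is a pure function of the fill and the account's volume at that instant, stored on the trade record itself. The tier thresholds and rates below are invented for illustration and are not TX24's schedule; `Tier`, `TakerBps`, and `ApplyTakerFill` are hypothetical names.

```go
package main

import "fmt"

// Tier maps a minimum rolling 30-day volume to a taker fee in basis points.
type Tier struct {
	MinVolume int64 // minor currency units, e.g. cents
	TakerBps  int64
}

// tiers is a made-up schedule, sorted by ascending MinVolume.
var tiers = []Tier{
	{MinVolume: 0, TakerBps: 10},
	{MinVolume: 1_000_000, TakerBps: 8},
	{MinVolume: 50_000_000, TakerBps: 5},
}

// TakerBps returns the rate of the highest tier the volume qualifies for.
func TakerBps(vol30d int64) int64 {
	bps := tiers[0].TakerBps
	for _, t := range tiers {
		if vol30d >= t.MinVolume {
			bps = t.TakerBps
		}
	}
	return bps
}

// Fill is a trade record that carries its own fee.
type Fill struct {
	Notional int64
	FeeBps   int64
	Fee      int64
}

// ApplyTakerFill computes the fee with integer arithmetic (truncating) and
// records it together with the fill, so there is nothing to adjust later.
func ApplyTakerFill(vol30d, notional int64) Fill {
	bps := TakerBps(vol30d)
	return Fill{Notional: notional, FeeBps: bps, Fee: notional * bps / 10_000}
}

func main() {
	f := ApplyTakerFill(2_000_000, 500_000)
	fmt.Println(f.FeeBps, f.Fee)
}
```

Integer basis-point arithmetic avoids floating-point drift between the venue's numbers and a client's reconciliation; the rounding rule (truncation here) would need to be part of the published spec.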

Other invisible features fall into the same category. Iceberg orders that disclose only a fraction of size at a time, post-only orders that are rejected if they would cross, time-in-force semantics (IOC, FOK, GTD), and pegged orders that track the best bid or ask — all of these are implemented as first-class primitives in the engine, with documented behavior under stress conditions. The documentation is the product. A trader who wants to know what happens to a post-only order in a thin book during a 2% candle does not have to guess; the spec is published, and the engine behaves the way the spec says.
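Two of these primitives reduce to short, checkable rules. The sketch below follows the common definitions (post-only is rejected if it would trade immediately; IOC fills what it can and cancels the rest; FOK fills completely or not at all) rather than the published TX24 spec, and all names are hypothetical.

```go
package main

import "fmt"

// Side is the direction of an order.
type Side int

const (
	Buy Side = iota
	Sell
)

// TopOfBook is the best bid and ask; the Has flags mark an empty side.
type TopOfBook struct {
	BestBid, BestAsk int64
	HasBid, HasAsk   bool
}

// AcceptPostOnly reports whether a post-only order may rest on the book.
// It is rejected if it would cross: a buy at or above the best ask, or a
// sell at or below the best bid.
func AcceptPostOnly(side Side, price int64, top TopOfBook) bool {
	if side == Buy {
		return !top.HasAsk || price < top.BestAsk
	}
	return !top.HasBid || price > top.BestBid
}

// TIF is a time-in-force for orders that must execute immediately.
type TIF int

const (
	IOC TIF = iota // immediate-or-cancel: partial fills allowed
	FOK            // fill-or-kill: all or nothing
)

// ImmediateFill returns how much of an order executes, given the quantity
// available at acceptable prices. Any remainder is cancelled, not rested.
func ImmediateFill(tif TIF, want, available int64) int64 {
	if available >= want {
		return want
	}
	if tif == FOK {
		return 0
	}
	return available
}

func main() {
	top := TopOfBook{BestBid: 10049, BestAsk: 10051, HasBid: true, HasAsk: true}
	fmt.Println(AcceptPostOnly(Buy, 10050, top)) // rests inside the spread
	fmt.Println(AcceptPostOnly(Buy, 10051, top)) // would cross the ask
	fmt.Println(ImmediateFill(IOC, 10, 4), ImmediateFill(FOK, 10, 4))
}
```

The thin-book case falls out of the empty-side check: with no opposing quote there is nothing to cross, so a post-only order is accepted at any price.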

Observability: what we measure and why

An exchange that cannot see what is happening inside its own engine cannot improve it, and cannot diagnose incidents quickly. We treat observability as an engineering deliverable on the same priority tier as correctness. Every order event emits structured telemetry — ingress timestamp, gateway latency, risk-check latency, matching latency, audit-log fsync latency, egress timestamp — and the resulting time series feed dashboards that the engineering team watches in real time. Alerting is configured against percentile breaches, not averages, because averages hide the failure modes that matter for institutional traders.
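A small example shows why percentile alerting catches what an average hides. The samples are synthetic, the 12 ms threshold is borrowed from the 99th-percentile figure quoted earlier purely for illustration, and `Percentile` (nearest-rank method), `Mean`, and `demoSamples` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"time"
)

// Percentile returns the p-th percentile (0 < p <= 100) by the nearest-rank method.
func Percentile(samples []time.Duration, p float64) time.Duration {
	if len(samples) == 0 {
		return 0
	}
	s := append([]time.Duration(nil), samples...) // sort a copy, not the input
	sort.Slice(s, func(i, j int) bool { return s[i] < s[j] })
	rank := int(math.Ceil(p * float64(len(s)) / 100))
	if rank < 1 {
		rank = 1
	}
	return s[rank-1]
}

// Mean returns the arithmetic mean of the samples.
func Mean(samples []time.Duration) time.Duration {
	if len(samples) == 0 {
		return 0
	}
	var sum time.Duration
	for _, d := range samples {
		sum += d
	}
	return sum / time.Duration(len(samples))
}

// demoSamples builds synthetic data: 98 fast requests and 2 slow ones.
func demoSamples() []time.Duration {
	s := make([]time.Duration, 0, 100)
	for i := 0; i < 98; i++ {
		s = append(s, 5*time.Millisecond)
	}
	return append(s, 200*time.Millisecond, 200*time.Millisecond)
}

func main() {
	samples := demoSamples()
	limit := 12 * time.Millisecond // illustrative alert threshold
	fmt.Println("mean:", Mean(samples), "breach:", Mean(samples) > limit)
	fmt.Println("p99: ", Percentile(samples, 99), "breach:", Percentile(samples, 99) > limit)
}
```

With two slow requests in a hundred, the mean stays under the threshold while the 99th percentile is far above it: an alert on the average would stay silent during exactly the kind of tail event an institutional client notices.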

We also publish a subset of these metrics externally, in a status page that exposes per-market median latency and uptime over rolling windows. Doing this publicly is uncomfortable in the way that publishing any production number is uncomfortable: it commits us to a standard. We accept that commitment because the alternative — a venue that refuses to publish operational metrics — is a venue that asks its clients to take performance on faith. Institutional clients do not take performance on faith. They take it on telemetry.

What is next

We are not done. The next generation of the engine is moving toward sub-millisecond median latency through a kernel-bypass networking path on the colocation pods, and toward stronger formal verification of the matching rules. We are also extending the engine's capability surface around derivatives — funding rate calculation moving inside the engine rather than in an adjacent service, cross-margin computation evaluated atomically with order acceptance, and a mark-price oracle architecture that derives a single venue-internal mark from multiple independent feeds. None of these changes will alter what TX24 is — a regulated, institutional venue with a bias toward boring engineering. They will change how much headroom we have when the next stress event arrives, and on this venue, the next stress event is always the one we are preparing for.
