Achieving Consistent Low Latency on an Exchange

Low latency is indispensable for algorithmic trading and for many other market participants: the lower, the better. However, achieving it is not enough. Latency must also be consistent, and that is a much harder engineering problem.

We here at Devexperts recently built an exchange from the ground up, so we know a thing or two about setting up state-of-the-art network stacks. Along the way we gained first-hand experience with the factors that impact latency, and in this article we’ll talk about them.

What I Talk About When I Talk About Low Latency

“Low latency” is something of a hype term: everyone strives for it, and it’s often used as a marketing buzzword, but what does it actually mean? Before we discuss it, let’s specify what we mean by “low”. “Low” typically means “sub-millisecond”, so we’ll stick to this definition.

The lower the latency, the better it is for algorithmic trading and market makers. For retail clients, however, latency is less important.

It makes sense to first understand how latency is measured. To compare different technologies, we must compare apples to apples. Every exchange has some core “internal” latency (e.g., measured in the matching engine from the moment the order is received until the execution is sent). However, this figure might not include the additional network hops between components and network infrastructure in the exchange datacenter, the conversion of the data to the output protocol, or the trip back to the client. The client’s connectivity to the exchange also matters, and that’s where most of the latency can hide. So the most important order latency value is the one measured from the moment the order is sent until the corresponding report is received.

It’s also advisable not to rely on the “average” and to describe latency with percentiles instead. We can have a really good average latency even if a significant number of orders are executed within an unacceptable time frame. It’s always important to set the latency goal in quantiles, for example “99% of all orders should be executed in less than 100 microseconds”.
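As a minimal sketch of why a percentile goal says more than an average, the snippet below compares the two on a made-up sample. The latency values, the `percentile` helper, and the nearest-rank method it uses are illustrative assumptions, not measurements or tooling from a real exchange.

```python
import math
import statistics


def percentile(samples, q):
    """Nearest-rank percentile: the smallest sample with at least q% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical round-trip latencies in microseconds: 97 fast orders and 3 slow ones.
latencies_us = [80] * 97 + [5000] * 3

mean_us = statistics.mean(latencies_us)
p50_us = percentile(latencies_us, 50)
p99_us = percentile(latencies_us, 99)

# The goal quoted above: 99% of orders executed in less than 100 microseconds.
goal_met = p99_us < 100

print(f"mean={mean_us} us, p50={p50_us} us, p99={p99_us} us, goal met: {goal_met}")
```

In this sample the median looks excellent and the mean still looks tolerable, yet the 99th percentile shows that the stated goal is missed by a wide margin.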

First but Not Foremost: Eliminate Jitter

Achieving consistent latency is much more technologically complex than having some good average numbers.

The OS, the hardware and networking stack, the virtual machine of your selected programming language, garbage collection, and so on can all cause jitter. Eliminating jitter requires a very careful and sophisticated approach to programming and tuning.

Businesses also prefer higher but consistent latency to a lower but uneven one. To be successful, a trading algorithm should be able to predict the impact of latency. And this is only possible if there are no latency spikes.

To better understand jitter, consider this hypothetical scenario: suppose we have a steady order rate of 500 orders per second, and 99% of these orders are executed within 100 microseconds, with the remaining one percent executed in around 10 milliseconds (a hundred times worse). The average latency would be 199 microseconds, which still looks respectable. But an 8-hour trading session at this rate contains 14.4 million orders, so about 144 thousand of them would have unacceptable latency.
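The arithmetic behind this scenario can be checked with a few lines. This is only a worked calculation using the numbers from the paragraph above; the constant names are ours.

```python
ORDERS_PER_SECOND = 500
SESSION_HOURS = 8

FAST_PERCENT, FAST_LATENCY_US = 99, 100     # 99% of orders take 100 microseconds
SLOW_PERCENT, SLOW_LATENCY_US = 1, 10_000   # 1% of orders take 10 milliseconds

# Weighted average latency across both groups, in microseconds.
average_us = (FAST_PERCENT * FAST_LATENCY_US + SLOW_PERCENT * SLOW_LATENCY_US) / 100

# Orders in one session, and how many of them land in the slow group.
total_orders = ORDERS_PER_SECOND * SESSION_HOURS * 3600
slow_orders = total_orders * SLOW_PERCENT // 100

print(average_us, total_orders, slow_orders)
```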

Designing the Network Protocol

Network protocol design is among the factors that heavily affect latency. Exchanges usually expose two different protocol sets: one for trading and another for market data. Latency matters in both: you should be able to send an order as fast as possible and always see the most up-to-date market picture so that your algorithms can react.

Again, this is less important for retail customers. Classic research indicates that humans can only perceive latency of 13 ms or more (as in video games or movies). In trading, it also takes considerable time, often seconds, to gauge changing market conditions, make decisions, and click UI buttons. That’s why we don’t usually see retail-oriented exchanges (say, cryptocurrency venues) or brokers offering any high-performance protocols. On those platforms, usability, simplicity, and ease of integration are much more important than latency.

The Challenges of Low-Latency FIX Implementation 

The FIX protocol is the ‘lingua franca’ of modern exchange connectivity. It has undergone several revisions since its original design in 1992 and remains the most widely adopted protocol in the industry, the ubiquitous workhorse of financial integration. But building a truly low-latency FIX implementation is challenging for a couple of reasons:

1. “Standard” FIX protocol is usually text-based. Text is inefficient on the wire, and it’s much slower to parse than a purpose-built binary representation. Luckily, more and more exchanges have started adopting binary FIX implementations or FIX-like binary protocols (for example, based on SBE). CME’s iLink is one example of a protocol that represents messages in a compact and efficient form.

2. FIX protocol is usually TCP-based. TCP is a general-purpose network protocol that delivers packets in order and retransmits lost ones, providing a reliable stream of data between two connected endpoints. However, the protocol requires careful tuning; otherwise, it may add latency in the range of tens of milliseconds (in case of packet loss, for example). Issues with TCP tuning are one of the main reasons exchanges employ proprietary UDP-based transports for market data distribution.
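To illustrate the first point, here is a rough sketch that encodes the same simplified order twice: once as tag=value text and once in a fixed binary layout. The binary layout, the field values, and the helper names are invented for this example. It is not the iLink or SBE schema, and the text message omits the standard FIX header and trailer fields.

```python
import struct

SOH = "\x01"  # field delimiter used by the tag=value encoding

# Simplified new-order message (MsgType, ClOrdID, Symbol, Side, OrderQty, Price).
text_msg = SOH.join(["35=D", "11=ORD1", "55=ESZ4", "54=1", "38=10", "44=4500.25"]) + SOH


def parse_tag_value(msg):
    """Scan the whole string, split it on delimiters, and convert each tag to an int."""
    fields = {}
    for pair in msg.split(SOH):
        if pair:
            tag, _, value = pair.partition("=")
            fields[int(tag)] = value
    return fields


# Hypothetical fixed layout, little-endian, no padding:
# 8-byte order id, 8-byte symbol, 1-byte side, 4-byte quantity, 8-byte price in 1/100 units.
ORDER_LAYOUT = struct.Struct("<8s8sBIq")

binary_msg = ORDER_LAYOUT.pack(b"ORD1", b"ESZ4", 1, 10, 450025)


def parse_binary(buf):
    """Read every field from a known offset; no scanning or text-to-number conversion."""
    order_id, symbol, side, qty, price_hundredths = ORDER_LAYOUT.unpack(buf)
    return order_id.rstrip(b"\0"), symbol.rstrip(b"\0"), side, qty, price_hundredths


print(len(text_msg.encode("ascii")), "bytes as text,", len(binary_msg), "bytes as binary")
print(parse_tag_value(text_msg))
print(parse_binary(binary_msg))
```

The binary parser knows where every field lives in advance, which is the property that makes binary encodings cheaper to decode than delimiter-separated text.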

To achieve the lowest possible latency in market data dissemination, exchanges opt to use UDP for their market data protocols. Unlike TCP, UDP doesn’t guarantee that network packets are delivered or that they’re delivered in order. But, being a much simpler protocol, it’s also faster. UDP also offers some unique capabilities such as multicast distribution, where the traffic is replicated to all parties by the network equipment hardware. This allows for some very efficient market data protocols (take CME’s MDP 3.0 or Nasdaq’s ITCH as an example). One technique that helps these protocols stay fast is the ‘arbitration’ approach: consumers listen to several identical channels simultaneously and use the first copy of each message they see, regardless of the source channel. This helps mask temporary network, hardware, or OS hiccups on any single channel.
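The idea of listening to several identical channels can be sketched as follows. This is a simplified illustration that assumes every message carries a sequence number starting at 1 and increasing by 1; the `FeedArbiter` class and the channel names "A" and "B" are hypothetical, and gap handling is deliberately left out.

```python
class FeedArbiter:
    """Accept each sequence number once, from whichever channel delivers it first."""

    def __init__(self, first_seq=1):
        self.next_seq = first_seq

    def accept(self, seq):
        """Return True if this copy is new, False if another channel already delivered it."""
        if seq < self.next_seq:
            return False
        # A real consumer would also have to react to seq > next_seq (a gap);
        # this sketch simply moves forward.
        self.next_seq = seq + 1
        return True


# Interleaved arrivals from two identical channels: (channel, sequence number, message).
arrivals = [
    ("A", 1, "bid 100"),
    ("B", 1, "bid 100"),
    ("B", 2, "ask 101"),
    ("A", 2, "ask 101"),
    ("A", 3, "trade 100"),
]

arbiter = FeedArbiter()
winners = [(channel, seq) for channel, seq, _ in arrivals if arbiter.accept(seq)]
print(winners)
```

Each message is processed once, and the consumer's latency for any given message is the latency of whichever channel happened to be faster at that moment.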

The Limitations of Low-Level Market Data Protocols

Low-level market data protocols are not without caveats.

1. There is no “gold standard” for how the data should be distributed, so the implementations are not compatible. Every connecting party has to build a separate implementation for each exchange, which limits adoption.

2. To overcome UDP limitations and retain performance, the protocols employ some very sophisticated algorithms, and it’s the responsibility of the consumer to properly handle every situation that might occur in such a protocol, such as lost, duplicated, or reordered packets. It requires very careful engineering, but the benefits can be huge.
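As a small illustration of the second point, the sketch below shows one piece of the consumer-side work: detecting gaps and restoring order from sequence numbers. The `SequencedReceiver` class is hypothetical and assumes a single stream numbered from 1. How the missing data is actually recovered differs from venue to venue and is not shown here.

```python
class SequencedReceiver:
    """Deliver payloads in sequence order, buffering packets that arrive early."""

    def __init__(self, first_seq=1):
        self.expected = first_seq
        self.pending = {}  # sequence number -> payload, for packets ahead of a gap

    def on_packet(self, seq, payload):
        """Return the payloads that can now be delivered in order (possibly none)."""
        if seq < self.expected or seq in self.pending:
            return []  # duplicate of something already delivered or buffered
        self.pending[seq] = payload
        ready = []
        while self.expected in self.pending:
            ready.append(self.pending.pop(self.expected))
            self.expected += 1
        return ready

    def missing(self):
        """Sequence numbers that must be recovered before buffered data can flow."""
        if not self.pending:
            return []
        return [s for s in range(self.expected, max(self.pending)) if s not in self.pending]


receiver = SequencedReceiver()
first = receiver.on_packet(1, "a")         # in order, delivered immediately
early = receiver.on_packet(3, "c")         # arrives before packet 2, so it is held back
gap = receiver.missing()                   # the consumer now knows packet 2 is missing
after_fill = receiver.on_packet(2, "b")    # the gap is filled and the buffer is released
print(first, early, gap, after_fill)
```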

A future opportunity for exchanges could be to offer two market data protocols: a low-level protocol for the consumers who need it (market makers or algorithmic trading shops) and a simpler, higher-level TCP-based protocol for retail-oriented consumers, which would drive adoption.

Last but Not Least: Exchange Colocation and Network Stack

People often mention exchange colocation as the only way to achieve the lowest latency possible. Indeed, if your infrastructure is close to the exchange servers (preferably in the same datacenter) and your connectivity and network equipment are superior, you might achieve much better latency than the competition. In the low-latency world, even the distance from the exchange server within the same datacenter might matter. Sometimes an exchange will roll out more performant hardware just for a single “important” connection partner, giving that partner a performance advantage. However, we see a growing demand for “equidistance”, and some venues in the market declare that all connecting parties receive the same quality of service by design. Such venues may even embed throttling in their protocols, deliberately adding latency so that everyone is on an equal footing.

An important way of decreasing latency is to use a state-of-the-art network stack. The systems at exchanges and brokers are usually distributed, with multiple independent components connected by a network, so network speed is crucial. Vendors such as Mellanox and SolarFlare offer high-performance networking equipment. Their adapters usually provide DMA capabilities (meaning the data received from the wire is immediately put into shared memory for the consumer process to read, involving no copies) and may include their own heavily optimized software network stacks that bypass the OS implementation completely. Such adapters can indeed provide end-to-end network latency of 1-2 microseconds for a network hop, but working with them requires careful tuning and OS configuration.

Recap

We hope we have provided some insight into how to achieve latency that is not only low but also consistent. If you also have experience in decreasing latency or want to talk about building exchanges, write to us! In the meantime, check out some of our recent work.