Achieving Consistent Low Latency on an Exchange
Low latency is indispensable for algorithmic trading and many other market participants: the lower, the better. However, achieving it is not enough. Latency must also be consistent, which is technologically harder.
Here at Devexperts, we built an exchange from the ground up, so we know a thing or two about setting up state-of-the-art network stacks. We accumulated first-hand experience with all factors that impact latency, and we’ll discuss them in this article.
What I talk about when I talk about low latency
“Low latency” is something of a hype term: everyone strives for it, and it’s often used as a marketing buzzword. But what does it actually mean? Before we discuss it, let’s specify what we mean by “low.” It typically means “sub-millisecond,” so we’ll stick to this definition.
The lower the latency, the better for algorithmic trading and market makers. However, latency is less important for retail clients.
It makes sense to understand how latency is measured first. To compare different technologies, we must compare apples to apples and oranges to oranges. Every exchange has some core “internal” latency (e.g., measured in the matching engine from when the order is received until its execution).
However, this might not include additional network hops between different components and network infrastructure in the exchange data center, converting the data to the output protocol and sending it back to the client. The client’s connectivity to the exchange also matters.
That’s where most of the latency can hide. So, the most critical order latency value is measured from when the order is sent until the corresponding report is received.
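As a minimal sketch of that client-side measurement, the snippet below stamps an order just before it is sent and again when the matching report arrives. The class and method names are hypothetical, and a real setup would more likely rely on hardware or wire-level timestamps than on an application clock.

```python
import time
from typing import Callable, Dict, List


class RoundTripTimer:
    """Tracks order-to-report latency as seen by the client (illustrative only)."""

    def __init__(self, clock_ns: Callable[[], int] = time.perf_counter_ns) -> None:
        self._clock_ns = clock_ns            # monotonic clock, injectable for tests
        self._sent_at: Dict[str, int] = {}   # order id -> send timestamp in ns
        self.latencies_ns: List[int] = []    # collected round-trip samples

    def on_order_sent(self, order_id: str) -> None:
        # Stamp the order right before it is handed to the network layer.
        self._sent_at[order_id] = self._clock_ns()

    def on_report_received(self, order_id: str) -> int:
        # Stamp the report as soon as it has been read from the wire.
        latency = self._clock_ns() - self._sent_at.pop(order_id)
        self.latencies_ns.append(latency)
        return latency
```

The collected samples cover every hop between the client and the matching engine, not only the exchange's internal latency.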
Avoiding the term “average” and sticking with statistical data and percentiles is also advisable. We can have an excellent average latency even if a significant number of orders are executed within an unacceptable time frame. It’s always important to set the latency goal in quantiles. For example, “99% of all orders should be executed in less than 100 microseconds.”
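To illustrate how a quantile goal can be checked, here is a small sketch using the nearest-rank percentile definition. The function names and the sample data are made up for illustration. Production systems often use histogram-based estimators rather than sorting raw samples.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def meets_goal(samples: Sequence[float], p: float, limit_us: float) -> bool:
    """True if p% of the samples completed in less than limit_us microseconds."""
    return percentile(samples, p) < limit_us


# Hypothetical latencies in microseconds: 98 fast orders and 2 very slow ones.
samples = [50.0] * 98 + [2000.0] * 2
mean_us = sum(samples) / len(samples)  # looks fine as an average
p99_us = percentile(samples, 99)       # exposes the slow tail
```

In this made-up data set the mean stays below 100 microseconds, yet the 99th percentile lands on one of the slow orders, so the goal “99% under 100 microseconds” is not met.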
Real-world example: Building a high-frequency trading platform for a prop firm
One scenario that demands low and consistent latency is high-frequency trading (HFT), where a few microseconds can mean significant profit or loss. At Devexperts, we tackled an HFT case for one of our clients, India’s top proprietary trading firm.
The client needed a high-frequency trading platform to execute strategies with sub-microsecond precision, and they turned to us for a solution that could handle the speed and complexity of the market. The challenge? Develop a platform capable of running strategies like futures-to-futures arbitrage and cash-futures arbitrage at speeds that leave no room for error. Devexperts delivered a highly customized system that leveraged advanced optimizations, from fine-tuning the TCP stack to eliminating jitter, all while colocating with the National Stock Exchange (NSE) to minimize latency.
The results spoke for themselves. The client received an ultra-low-latency HFT solution that not only matched the performance of their biggest competitors but also solidified their position as one of the top three stock traders in India. By engineering a platform built for speed and reliability, Devexperts helped the firm gain a competitive edge in an industry where microseconds matter, all while ensuring full compliance with NSE regulations.
First but not foremost, eliminate jitter
Achieving consistent latency is much more technologically complex than having good average numbers. The OS, hardware, networking stack, the virtual machine of your selected programming language, garbage collection, etc., can all cause jitter. Eliminating jitter requires a cautious and sophisticated approach to programming and tuning.
Businesses also prefer higher but consistent latency to a lower but uneven one. For the sake of efficiency, a trading algorithm should be able to predict the impact of latency. And this is only possible if there are no latency spikes.
To better understand jitter, consider this hypothetical scenario: suppose we have a steady order rate of 500 orders per second, and 99% of these orders are executed within 100 microseconds, with the remaining one percent executed in around 10 milliseconds (a hundred times worse). The average latency would be 199 microseconds, but within an 8-hour trading session, about 144 thousand of the 14.4 million orders would have unacceptable latency.
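The arithmetic behind this scenario can be reproduced in a few lines. The function name is ours, and the fast and slow latencies are treated as exact values (100 microseconds and 10 milliseconds) for simplicity.

```python
def jitter_summary(orders_per_second: int, session_hours: int,
                   fast_us: int, slow_us: int, slow_percent: int):
    """Returns (total orders, slow orders, mean latency in microseconds)."""
    total_orders = orders_per_second * 3600 * session_hours
    slow_orders = total_orders * slow_percent // 100
    fast_orders = total_orders - slow_orders
    mean_us = (fast_orders * fast_us + slow_orders * slow_us) / total_orders
    return total_orders, slow_orders, mean_us


# 500 orders/s for 8 hours; 99% at 100 us, 1% at 10 ms.
total, slow, mean_us = jitter_summary(500, 8, fast_us=100, slow_us=10_000, slow_percent=1)
```

The mean works out to 0.99 × 100 + 0.01 × 10,000 = 199 microseconds, while the slow tail alone accounts for 144,000 orders per session.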
Designing the network protocol
Network protocol design is among the factors that most heavily affect latency. Exchanges usually expose two different protocol sets: one for trading and another for market data. Latency matters in both: you should be able to send an order as fast as possible and always see the most up-to-date market picture so that your algorithms can react. Again, this is less important for retail customers. Classic research indicates that humans can only perceive latency of 13 ms or more (as in video games or movies). In trading, it also takes considerable time, often seconds, to gauge changing market conditions, make decisions, and click UI buttons. That’s why we don’t usually see retail-oriented exchanges (say, cryptocurrency venues) or brokers offering any high-performance protocols. On those platforms, usability, simplicity, and ease of integration are much more important than latency.
The challenges of low-latency FIX implementation
The FIX protocol is the ‘lingua franca’ of modern exchange connectivity. It has undergone several revisions since its original design in 1992, and this veteran protocol remains the ubiquitous workhorse of financial integration. But building a genuinely low-latency FIX implementation is challenging for a couple of reasons:
1. The “standard” FIX protocol is usually text-based. Text is not very efficient on the wire and is much slower to parse than a dedicated binary representation. Luckily, more and more exchanges have adopted binary FIX implementations or FIX-like protocols (for example, based on SBE). We can take CME’s iLink as an example of a compact and efficient message format.
2. The FIX protocol is usually TCP-based. TCP is a universal network protocol that orders packets and retransmits lost ones, providing a reliable data stream between two connected endpoints. However, the protocol requires careful tuning. Otherwise, it may add latency in the range of tens of milliseconds (in case of a packet loss, for example). Issues with TCP tuning are one of the main reasons exchanges employ proprietary UDP-based transports for market data distribution.
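The first point can be illustrated with a rough comparison of the same order encoded as FIX tag=value text and as a fixed binary layout. The tags used (35, 11, 55, 54, 38, 44) are standard FIX tags, but the binary layout below is a hypothetical one invented for this sketch. It is not iLink or any real SBE schema, and the text form leaves out the FIX header and checksum trailer.

```python
import struct

SOH = "\x01"  # FIX tag=value field delimiter

# Hypothetical fixed layout (NOT a real exchange schema), little-endian, no padding:
# ClOrdID u64 | symbol 8 bytes | side u8 | quantity u32 | price in 1/10000 units i64
BINARY_ORDER = struct.Struct("<Q8sBIq")


def encode_text(cl_ord_id: int, symbol: str, side: int, qty: int, price_ticks: int) -> bytes:
    # Body fields only; a real FIX message also carries a header and a trailer.
    fields = [
        ("35", "D"),                            # MsgType = NewOrderSingle
        ("11", str(cl_ord_id)),                 # ClOrdID
        ("55", symbol),                         # Symbol
        ("54", str(side)),                      # Side (1 = buy, 2 = sell)
        ("38", str(qty)),                       # OrderQty
        ("44", f"{price_ticks / 10_000:.4f}"),  # Price
    ]
    return "".join(f"{tag}={value}{SOH}" for tag, value in fields).encode("ascii")


def decode_text(data: bytes) -> dict:
    # Every field has to be located, split, and later converted from ASCII.
    pairs = (field.split("=", 1) for field in data.decode("ascii").split(SOH) if field)
    return {tag: value for tag, value in pairs}


def encode_binary(cl_ord_id: int, symbol: str, side: int, qty: int, price_ticks: int) -> bytes:
    return BINARY_ORDER.pack(cl_ord_id, symbol.encode("ascii"), side, qty, price_ticks)


def decode_binary(data: bytes) -> tuple:
    # Fields sit at fixed offsets, so decoding is a single unpack call.
    cl_ord_id, symbol, side, qty, price_ticks = BINARY_ORDER.unpack(data)
    return cl_ord_id, symbol.rstrip(b"\x00").decode("ascii"), side, qty, price_ticks


order = (1234567890, "ESZ5", 1, 100, 45_002_500)
text_msg = encode_text(*order)
binary_msg = encode_binary(*order)
```

With this layout the binary message is 29 bytes, while the text body alone is noticeably longer and needs a scan over every delimiter to decode.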
To achieve the lowest possible latency in market data dissemination, exchanges use UDP for their market data protocols. Unlike TCP, UDP doesn’t guarantee that network packets are delivered or that they’re delivered in order. But, being a much simpler protocol, it’s also faster. UDP also offers some unique capabilities, such as multicast distribution, where the traffic is replicated by the network equipment to all parties. This allows for some very efficient market data protocols (take CME’s MDP 3.0 or Nasdaq’s ITCH as an example). One technique that helps explain why these protocols are so fast is the ‘arbitration’ approach: consumers might listen to several identical channels simultaneously and use the first copy of each message they see, regardless of the source channel. This helps avoid temporary network, hardware, or OS hiccups.
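A minimal sketch of that first-copy-wins idea is shown below. It assumes every message carries a sequence number that is identical on all channels, which is how such feeds are typically organized. The class name and packet shape are hypothetical, and ordering and recovery are left to the consumer.

```python
from typing import Optional, Set


class FeedArbiter:
    """Passes through the first copy of each sequence number seen on any channel."""

    def __init__(self, first_seq: int = 1) -> None:
        self._next_seq = first_seq          # everything below this was delivered
        self._seen_ahead: Set[int] = set()  # delivered numbers above a gap

    def on_packet(self, seq: int, payload: bytes) -> Optional[bytes]:
        if seq < self._next_seq or seq in self._seen_ahead:
            return None  # a copy from another channel already won
        self._seen_ahead.add(seq)
        # Advance the low-water mark over any contiguous run to keep the set small.
        while self._next_seq in self._seen_ahead:
            self._seen_ahead.remove(self._next_seq)
            self._next_seq += 1
        return payload


arbiter = FeedArbiter()
# (channel, sequence number, payload); channel A delivers packet 3 late.
packets = [("A", 1, b"m1"), ("B", 1, b"m1"), ("B", 2, b"m2"), ("A", 2, b"m2"),
           ("A", 4, b"m4"), ("B", 3, b"m3"), ("B", 4, b"m4"), ("A", 3, b"m3")]
delivered = [p for _, seq, data in packets
             if (p := arbiter.on_packet(seq, data)) is not None]
```

Each message is delivered exactly once, from whichever channel supplied it first, so a hiccup on one channel is masked as long as the other one keeps up.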
The limitations of low-level market data protocols
Low-level market data protocols are not without caveats.
1. There is no “gold standard” for data distribution, so the implementations are incompatible. This requires all the connecting parties to roll out their implementations for each exchange, which limits adoption.
2. To overcome UDP limitations and retain performance, the protocols employ some very sophisticated algorithms. It’s the consumer’s responsibility to properly handle all the possible situations that might occur in such a protocol. It requires meticulous engineering, but the benefits can be huge.
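One of the situations a consumer has to handle is packets arriving out of order or not at all. The sketch below buffers early packets, releases them in sequence order, and reports which sequence numbers are still missing. The names are hypothetical, and how missing data is actually recovered (for example via a retransmission or snapshot service) varies by venue and is not modeled here.

```python
from typing import Dict, List


class GapAwareSequencer:
    """Releases payloads in sequence order and exposes the current gaps."""

    def __init__(self, first_seq: int = 1) -> None:
        self._next_seq = first_seq
        self._pending: Dict[int, bytes] = {}  # packets that arrived early

    def on_packet(self, seq: int, payload: bytes) -> List[bytes]:
        """Returns the payloads that became deliverable, in order."""
        if seq < self._next_seq or seq in self._pending:
            return []  # duplicate or stale packet
        self._pending[seq] = payload
        ready: List[bytes] = []
        while self._next_seq in self._pending:
            ready.append(self._pending.pop(self._next_seq))
            self._next_seq += 1
        return ready

    def missing(self) -> List[int]:
        """Sequence numbers that block delivery and would need recovery."""
        if not self._pending:
            return []
        return [s for s in range(self._next_seq, max(self._pending))
                if s not in self._pending]
```

A real consumer would also put a time limit on how long it waits for a gap to fill before requesting recovery, since waiting indefinitely reintroduces the latency the protocol was meant to avoid.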
A future opportunity for the exchanges could be in offering two market data protocols: a low-level protocol for those consumers who need it (market-makers or algorithmic trading shops) and a simple higher-level TCP-based protocol for retail-oriented consumers (thus driving adoption).
Regulatory compliance and industry standards
Speed is critical, but it’s not the only factor that matters. Regulatory compliance and adherence to industry standards are just as essential for ensuring long-term success. At Devexperts, we don’t just focus on low latency and system performance; we guarantee that every solution meets the strict regulatory requirements of the markets in which it operates. Whether it’s complying with National Stock Exchange (NSE) regulations in India or adhering to global standards for financial data handling and security, our approach is meticulous.
With updates targeting MiFID II requirements, we ensure that our solutions not only meet the speed demands of high-frequency trading but also uphold strict regulatory standards for transparency, data management, and reporting. By integrating MiFID II-compliant updates, we enable financial institutions to meet evolving regulatory challenges without sacrificing agility, positioning them to thrive in a market where compliance and competitiveness go hand in hand.
Last but not least: exchange colocation and network stack
People often mention exchange colocation as the only way to achieve the lowest latency possible. Indeed, if your infrastructure is close to the exchange servers (preferably in the same data center) and your connectivity and network equipment are superior, you might achieve much better latency than the competition. Even the distance from the exchange server within the same data center might matter in the low-latency world. Sometimes an exchange can roll out more performant hardware just for a single “important” connection partner, yielding performance benefits. However, we see a growing demand for “equidistance,” and some venues declare that all connecting parties receive the same quality of service by design. Such venues may even embed throttling in their protocols, deliberately adding latency so that everyone is on an equal footing.
An important way of decreasing latency is to use a state-of-the-art network stack. Systems at exchanges and brokers are usually distributed, with multiple independent components connected over a network, so network speed is crucial. Vendors such as Mellanox and Solarflare offer high-performance networking equipment. Their adapters usually provide DMA capabilities (meaning the data received from the wire is placed directly into memory shared with the consumer process, with no intermediate copies). They may also include their own heavily optimized software network stacks that bypass the OS implementation completely. Such adapters may indeed provide end-to-end network latency of 1-2 microseconds per network hop, but working with them requires careful tuning and OS configuration.
Recap
We hope we have provided insight into how to achieve low and consistent latency. If you have experience decreasing latency or want to discuss building exchanges, write to us! In the meantime, check out some of our recent work.