Last night on a radio show I heard trader Basil Oleynik comment on the failure at the MICEX-RTS exchange: “Technical glitches happen at precisely those important moments when conditions on foreign markets are changing.” Let me start with a disclaimer. Although the RTS Stock Exchange and MICEX were our customers before their merger and remain so after it, we [Devexperts] have nothing to do with the code of their trading systems, so I can’t say anything about the technical details of this particular crash (and even if we had worked on that code, the usual NDA in such cases would still keep me from commenting). But I can talk about technical failures in general in more detail.

The trading systems of stock exchanges, brokers, and traders are most heavily loaded at exactly such moments of abrupt market change. People panic, and automated systems are programmed to react to certain events, which sets off a chain reaction among other automated systems. The result is a snowball effect that, even when it does not cause crashes, can still have a non-obvious impact on the market. One well-known incident of this kind, the so-called Flash Crash, occurred on May 6, 2010. Accenture fell from $38 per share to a penny in less than a minute and then recovered its value. Trade data for that period shows that trades at one cent actually took place. Notably, the incident was neither caused by nor led to any failures of U.S. exchange systems, but a number of trades made during that period were canceled.
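The snowball effect can be illustrated with a deliberately simplified Python model. It assumes hypothetical automated sell rules, each of which fires when the price falls to its trigger level, and it assumes that every triggered sale lowers the price by a fixed amount. Prices are integer cents to avoid floating-point noise. The function `cascade` and its parameters are invented for this sketch and do not describe any real trading system.

```python
def cascade(price, triggers, impact):
    """Toy feedback model: each automated sell rule fires when the price
    falls to or below its trigger level, and its sale lowers the price by
    `impact`, possibly reaching the next trigger. All values are in cents."""
    pending = sorted(triggers, reverse=True)  # highest trigger level is hit first
    fired = []
    while pending and price <= pending[0]:
        level = pending.pop(0)
        fired.append(level)
        price -= impact  # the automated sale itself moves the market lower
    return price, fired


# Hypothetical trigger levels of five independent automated systems, in cents.
triggers = [3790, 3760, 3730, 3700, 3650]

# An initial shock takes the price from 3800 to 3780.
print(cascade(3780, triggers, impact=0))   # (3780, [3790])
print(cascade(3780, triggers, impact=35))  # (3605, [3790, 3760, 3730, 3700, 3650])
```

With no price impact only the first rule fires. With a modest impact per sale, each sale pushes the price through the next trigger level, so all five fire and the price ends at $36.05, even though the initial shock was only 20 cents.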

A system can be tested under heavy load (a stress test) to make sure that it can withstand it. But this is just one type of software testing. In a nontrivial system it is simply impossible to identify all errors in advance. Reducing the number of remaining bugs takes a lot of resources and a combination of different types of testing, since each type of test tends to be good at finding some kinds of bugs but does little to catch others.

Increased load makes the remaining bugs more likely to manifest themselves, especially today, when any minimally complex system consists of multiple processes or threads that interact with each other via messages or shared memory. Such a system has a huge number of possible execution histories, depending on the order in which operations in its different parts are executed. Sometimes a stress test reproduces an error only after weeks of running, while a small difference in the load pattern during actual operation is enough to cause a crash early on.
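To make the point about execution histories concrete, here is a small Python sketch under a toy assumption: two hypothetical threads each perform a non-atomic increment of a shared counter (read the value, then write the value read plus one). The helpers `interleavings` and `run` are invented for illustration. The code enumerates every interleaving of the two threads' steps that preserves each thread's own order and counts how many of them lose an update.

```python
from itertools import combinations
from math import comb


def interleavings(a, b):
    """Yield every merge of step lists a and b that keeps each list's own order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        taken = set(slots)  # positions occupied by steps of a
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in taken else next(ib) for i in range(n)]


def run(schedule):
    """Execute (thread, op) steps against a shared counter that starts at 0."""
    counter = 0
    seen = {}  # the value each thread read into its private copy
    for thread, op in schedule:
        if op == "read":
            seen[thread] = counter
        else:  # "write": store the private copy plus one
            counter = seen[thread] + 1
    return counter


# Two hypothetical threads, each doing a non-atomic increment.
thread_a = [("A", "read"), ("A", "write")]
thread_b = [("B", "read"), ("B", "write")]

schedules = list(interleavings(thread_a, thread_b))
results = [run(s) for s in schedules]
lost = sum(1 for r in results if r != 2)  # correct result of two increments is 2

print(len(schedules), lost)  # 6 4
print(comb(20, 10))          # 184756 interleavings for two 10-step threads
```

Even in this two-step toy, four of the six schedules lose an update, and the count grows combinatorially: two threads of ten steps each already have C(20, 10) = 184,756 possible interleavings. A stress test exercises only some of them, which is one reason a faulty schedule can take weeks to show up in testing yet appear quickly under a slightly different real-world load.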

Very high-quality software is difficult and very expensive to make. International trading platforms, brokers, and traders, of course, have large turnovers and profits. They can afford to invest huge sums in building their own software systems and ensuring their quality. The Russian financial market still has its future ahead of it, and occasional technical glitches are inevitable growing pains.