Over the past 60 years, information technology has evolved in innovation cycles of roughly a decade each [1]. A new cycle has started, centered on Data and Artificial Intelligence / Machine Learning, and it is rapidly gaining momentum. We expect this cycle to last longer than 10 years and to have a more profound impact on all aspects of our work and lives.
This new evolution has two tenets:
- Data (or Big Data): It is the ability to capture and store vast amounts of structured and unstructured data in a cost-efficient way. This data will be reused in traditional forms (reporting, analytics, use in systems), but more importantly it is the fuel for new classes of machine learning agents.
- Machine Learning [2]: It is the range of techniques designed to extract patterns from big data and to implicitly program behaviors and workflows based on these patterns.
Pricing data in FX etrading is private, so no two clients have the same view of the market, and two clients will rarely be trading with the same liquidity pools. Furthermore, there is no accepted benchmark rate (a ‘Universal’ tape) against which to analyze a client’s own data. Clients therefore cannot rely on public data and must capture and store the pricing data themselves.
With 50 currency pairs being streamed by 10 liquidity providers (LPs) and 2 ECNs, a customer can expect to see 20,000-60,000 updates per second, which can represent 12+Tb of data per year.
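As a rough sanity check on these figures, the sketch below converts an update rate into updates per year, and into the storage per update that a 12 TB yearly total would imply. The 24-hour, 5-day trading week, the 52 trading weeks and the reading of "12+Tb" as 12 decimal terabytes are assumptions made for illustration, not figures from the text. The function names are hypothetical.

```python
# Assumption: FX prices stream 24 hours a day, 5 days a week.
SECONDS_PER_TRADING_WEEK = 5 * 24 * 60 * 60
# Assumption: 52 trading weeks, ignoring holidays.
TRADING_WEEKS_PER_YEAR = 52
# Assumption: "12+Tb" is read as 12 decimal terabytes.
TWELVE_TB_IN_BYTES = 12e12


def updates_per_year(updates_per_second):
    """Total number of price updates captured in one year."""
    return updates_per_second * SECONDS_PER_TRADING_WEEK * TRADING_WEEKS_PER_YEAR


def implied_bytes_per_update(stored_bytes_per_year, updates_per_second):
    """Average stored size of one update implied by a yearly storage total."""
    return stored_bytes_per_year / updates_per_year(updates_per_second)


if __name__ == "__main__":
    for rate in (20_000, 60_000):
        per_update = implied_bytes_per_update(TWELVE_TB_IN_BYTES, rate)
        print(f"{rate} updates/s: {updates_per_year(rate):.3e} updates/year, "
              f"{per_update:.1f} bytes per stored update")
```

Under these assumptions, 20,000 updates per second is roughly 449 billion updates per year, so a 12 TB yearly footprint corresponds to about 27 bytes per stored update (about 9 bytes at 60,000 updates per second). That is only plausible with compact, compressed storage.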
This collected data can be used for:
- Real-time and end-of-day analytics (see below for some examples), which do not require tick-by-tick data points to be representative.
- Post-mortem analysis of issues, behavior or crashes, which requires as granular a data set as possible in order to detect the actual actions of clients and market participants.
- Finally, fuelling current and future Machine Learning agents.
To make effective use of this market data, other transactional, execution and algorithmic data must be aggregated with it to give a holistic view of FX trading. Platforms must therefore be capable of combining different streaming data sets to generate a single unified view of activity.
With this amount of data, and a different level of granularity for each use case, the economics of storage and processing become incredibly challenging. As a result, the following multi-layered architecture is emerging:
- Data Ingestion: This is the data capture, which in this case is mostly market data. Ingestion must happen as early as possible and is designed to avoid any direct impact on the FX trading system. At this stage, little processing is performed on the data, to minimize cost while handling large volumes. The fully captured market data and all transactional data are then stored in databases that offer high compression and efficient data retrieval, such as time-series databases [3][4].
- Real-time Data: This data set is the same as above, but at a lower granularity. In fact, capturing all data at this layer would not provide much more intelligence, but would require significantly more processing power and could make downstream analytical tools inoperative during high-load events. Today, many newer technologies make it possible to build visualization and analytics rapidly. The same tools will be used for end-of-day analytics and reporting.
- Data rehydration: This is the retrieval of data for ad-hoc event analysis, e.g. analysis of a flash crash or of a specific set of trades. More importantly, this mechanism is used for Machine Learning training.
This multi-layered architecture is the backbone of heterogeneous and economically viable data management.
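The three layers can be illustrated with a minimal in-memory sketch. It assumes one possible way to lower granularity, namely conflation, which keeps only the last quote per currency pair and LP in each time bucket. The text does not prescribe a method. The `Quote` fields, the function names and the 100 ms bucket are hypothetical, and a plain list stands in for the time-series store.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Quote:
    ts_ms: int   # capture timestamp in milliseconds
    pair: str    # currency pair, e.g. "EUR/USD"
    lp: str      # liquidity provider identifier
    bid: float
    ask: float


def conflate(quotes: Iterable[Quote], bucket_ms: int) -> List[Quote]:
    """Real-time layer: keep the last quote per (pair, lp) in each time bucket.

    Assumes the quotes are supplied in timestamp order.
    """
    last: Dict[Tuple[int, str, str], Quote] = {}
    for q in quotes:
        last[(q.ts_ms // bucket_ms, q.pair, q.lp)] = q
    return sorted(last.values(), key=lambda q: (q.ts_ms, q.pair, q.lp))


def rehydrate(store: Iterable[Quote], start_ms: int, end_ms: int) -> List[Quote]:
    """Rehydration: return full-granularity quotes with start_ms <= ts < end_ms."""
    return [q for q in store if start_ms <= q.ts_ms < end_ms]


if __name__ == "__main__":
    # Ingestion layer: every tick is kept as captured.
    store = [
        Quote(0, "EUR/USD", "LP1", 1.1000, 1.1002),
        Quote(40, "EUR/USD", "LP1", 1.1001, 1.1003),
        Quote(90, "EUR/USD", "LP1", 1.1001, 1.1004),
        Quote(120, "EUR/USD", "LP1", 1.1002, 1.1004),
    ]
    print([q.ts_ms for q in conflate(store, bucket_ms=100)])
    print([q.ts_ms for q in rehydrate(store, 30, 100)])
```

The conflated view is what a dashboard would consume, while `rehydrate` returns every tick in a chosen window for post-mortem analysis or model training.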
Finally, this architecture can leverage all-cloud or hybrid (cloud/in-house) setups. It can also benefit from reusing the plethora of open source data management, data storage and data analytics tools.
Real-time analytics: a necessity for a real-time business
A range of analytics has become the ‘norm’ for real-time assessment of Liquidity and Markets. However, just as no two clients see the same data, no two clients have the same understanding of the calculations that drive these metrics.
Current industry Metrics:
The prevalent metrics revolve around simple measures of order or market behavior, including:
- Arrival Price / Shortfall / Slippage / Improvement: The change of price over time from a defined point in the order lifecycle to the fully completed trade.
- Hit Ratios: A measure of the likelihood of a trade being executed.
- Algo Benchmark: A comparison between a trade / series of trades and the predicted outcome of an algo against market data.
- Market Impact: The movement of the market subsequent to any trade in a defined time window.
- Spread: A measurement of the difference between prices over time and in different market conditions.
As the complexity and availability of data have evolved, TCA (Transaction Cost Analysis) and Benchmarking have tried to keep up with the ever-increasing complexity of defining good and bad flows and behaviors. New concepts have been introduced, such as:
- Symmetry: Many analytics revolve around symmetry of behavior. The most common is the symmetry of Last Look: whether trades are accepted or rejected in the same way in market conditions that favor the LP and in those that penalize it. However, symmetry extends far beyond this. For example, an LP may hold messages informing clients of ‘Rejects’ for longer than it holds information on ‘Fills’.
- Cost of Rejects: The loss resulting from a trade rejected by an LP. This is a complex measure that considers when the LP rejects the trade, any alternate pricing it sends that could have been executed on, and/or the final fill price following the rejection.
- Last Look & Hold Times: An analysis of the time observed between placing the trade and receiving a fill or reject, the conditions that influence it, and the impact on executions and prices.
- LP analytics: Real-time analysis of LP toxicity and selection, covering hold times, latencies, spread analytics, decay and pricing models.
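Because no two clients share the same understanding of these calculations, it helps to pin definitions down in code. The sketch below gives one possible definition each of slippage, hit ratio and hold-time asymmetry. The record layout, the sign convention (positive slippage is a cost) and the function names are assumptions for illustration, not industry-standard formulas.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass(frozen=True)
class Execution:
    side: str             # "buy" or "sell"
    arrival_price: float  # price at the chosen reference point in the order lifecycle
    fill_price: float     # achieved price; ignored when the order was rejected
    filled: bool          # True for a fill, False for a reject
    hold_ms: float        # time between sending the order and receiving the fill/reject


def slippage_bps(e: Execution) -> float:
    """Slippage against the arrival price in basis points; positive means a cost."""
    sign = 1.0 if e.side == "buy" else -1.0
    return sign * (e.fill_price - e.arrival_price) / e.arrival_price * 1e4


def hit_ratio(executions: List[Execution]) -> float:
    """Fraction of orders that were filled."""
    return sum(e.filled for e in executions) / len(executions)


def hold_time_asymmetry_ms(executions: List[Execution]) -> Optional[float]:
    """Mean hold time on rejects minus mean hold time on fills.

    Returns None when either group is empty.
    """
    fills = [e.hold_ms for e in executions if e.filled]
    rejects = [e.hold_ms for e in executions if not e.filled]
    if not fills or not rejects:
        return None
    return mean(rejects) - mean(fills)
```

A positive asymmetry would indicate that the LP takes longer to communicate rejects than fills, which is the kind of behavior the Symmetry bullet describes.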
Why TCA has Failed and the future of ML-driven analysis:
The explosion of TCA products and companies highlights the financial industry's intense focus on innovation to bring transparency to every complex and fragmented market. Despite all the advances in metrics and measures, the reality is that TCA has largely failed the industry.
A deeper look at the complexity of making a trading decision shows that TCA only scratches the surface of how to predict and improve trading in such complex markets (where some bad players are present). While the ability to monitor the behavior of an LP and general market trends has steadily improved, up to the most recent real-time analytical tools, it is not easily translated into actionable decisions within most FX etrading systems.
Fundamentally, TCA and analytics are today tactical tools that are disconnected from the actual trading systems.
Machine learning and e-Trading
Machine Learning (ML) and data-driven software are big leaps in computing. More critically, ML brings implicit programming to the fore: new programs can be created without the need to code (or even explicitly know) all of a machine's behavior.
FX etrading is at an early stage of adopting ML. The current applications of ML in this domain are:
- Analytics of big data for both takers and makers: The behavior of LPs, such as pricing/spread management in volatile markets or Last Look/Rejects, forms patterns that are well suited to ML techniques. Similarly, for a liquidity provider, client trading behavior (for instance, detecting ‘bad behavior’) can be a good candidate.
- Automated trading: This is the automation of certain trading workflows and of the logic of execution algorithms, such as a Smart Order Router (SOR). One simple example is using ML to predict short-term volatility and adjust the offsets of a peg algorithm accordingly.
What ML provides in the FX trading context is a low-cost and efficient mechanism to extract data patterns and implicitly find new trading workflows. It means that a firm can monitor hundreds to tens of thousands of patterns or behaviors, with automated decisions built into the FX trading system.
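The peg-offset example can be sketched as follows. An exponentially weighted moving average (EWMA) of squared returns stands in here for the ML volatility prediction, which is a deliberate simplification and not the method the text describes. The decay factor, the linear offset rule and all parameter names are hypothetical.

```python
import math
from typing import Sequence


def ewma_volatility(mids: Sequence[float], decay: float = 0.94) -> float:
    """Stand-in predictor: EWMA of squared log returns, as a standard deviation."""
    if len(mids) < 2:
        raise ValueError("need at least two mid prices")
    variance = 0.0
    for prev, cur in zip(mids, mids[1:]):
        r = math.log(cur / prev)
        variance = decay * variance + (1.0 - decay) * r * r
    return math.sqrt(variance)


def peg_offset_bps(predicted_vol: float, base_offset_bps: float,
                   sensitivity: float, max_offset_bps: float) -> float:
    """Widen the peg offset linearly with predicted volatility, up to a cap.

    predicted_vol is a per-tick return volatility (e.g. 0.0001 = 1 bp).
    """
    vol_bps = predicted_vol * 1e4
    return min(max_offset_bps, base_offset_bps + sensitivity * vol_bps)


if __name__ == "__main__":
    calm = [1.1000, 1.1000, 1.1001, 1.1000]
    busy = [1.1000, 1.1010, 1.0990, 1.1015]
    for name, mids in (("calm", calm), ("busy", busy)):
        vol = ewma_volatility(mids)
        print(name, round(peg_offset_bps(vol, 0.5, 2.0, 5.0), 3))
```

In a real system the `ewma_volatility` call would be replaced by the output of a trained model. The point of the sketch is only that the algorithm takes the prediction as a dynamic parameter.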
Change in paradigm
One consequence of incorporating ML into FX trading technology is the need to move to a technology which is open and can be reconfigured in real time, taking dynamic parameters into its decision-making. For instance, an ML agent can detect that an LP has a high Last Look/Reject rate (despite showing better prices). A SOR could then use this information in its decision-making process to favour an LP with a lower Reject rate (even though its price is slightly worse). In this case, the cost of rejects, weighted by their probability, may be higher than the more visible price improvement.
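A minimal sketch of this SOR decision, for a buy order, ranks LPs by a reject-adjusted price instead of the quoted price. The expected-cost formula (quoted ask plus reject probability times reject cost), the field names and the numbers are illustrative assumptions. In practice the reject probability and cost would be supplied by the ML agent.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class LpQuote:
    lp: str
    ask: float          # quoted price for a buy order
    reject_prob: float  # predicted probability of a Last Look reject (assumed ML output)
    reject_cost: float  # expected price deterioration after a reject, in price units


def effective_ask(q: LpQuote) -> float:
    """Quoted ask plus the probability-weighted cost of being rejected."""
    return q.ask + q.reject_prob * q.reject_cost


def choose_lp(quotes: Iterable[LpQuote]) -> LpQuote:
    """Pick the LP with the lowest reject-adjusted price for a buy order."""
    return min(quotes, key=effective_ask)


if __name__ == "__main__":
    quotes = [
        LpQuote("LP_A", ask=1.10000, reject_prob=0.30, reject_cost=0.00020),
        LpQuote("LP_B", ask=1.10002, reject_prob=0.02, reject_cost=0.00020),
    ]
    print("best quoted price:", min(quotes, key=lambda q: q.ask).lp)
    print("best reject-adjusted price:", choose_lp(quotes).lp)
```

Here LP_A shows the better price, but its reject-adjusted price is about 1.10006 against about 1.100024 for LP_B, so the router favours LP_B.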
We foresee a more radical evolution, which we call the Data-driven Intelligent Architecture. This New Architecture will then be constructed around:
- A set of price and liquidity Predictive Agents (which today would be the rudimentary pre-trade TCA), designed to predict the outcome of the execution. The prediction deals simultaneously with price optimization (timing and impact) and liquidity optimization (selection of pools and levels of liquidity).
- A set of Execution Agents, which are the execution methods used, including the algorithmic trading / SOR capabilities.
- A set of Data Agents for calculation and data presentation, which work in a feedback loop. They aim to quickly update the predictive models as new data points become available. To be clear, a “dirty” but fast and frequently updated prediction is far better than a superior but slow model.
The whole process is tied together by Actors, which can be human users, or other machines working in conjunction with human actors, taking meta-decisions on the path of the execution.
The New Architecture proposed above is possible because Machine Learning removes the current, expensive need to explicitly build and update every predictive model. Instead, the new ML techniques can implicitly update models that are first trained on big data sets. The trained models are then tightly integrated into the execution software to provide new predictions, and can also be re-trained with each execution.
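The feedback loop of re-training with each execution can be illustrated with a deliberately "dirty" but fast model: a per-LP reject-probability estimate updated with an exponentially weighted average after every fill or reject. This is a stand-in for a real ML model. The class name, the smoothing factor `alpha` and the prior are hypothetical.

```python
from collections import defaultdict


class RejectRateAgent:
    """Tracks a per-LP reject probability that is updated on every execution."""

    def __init__(self, alpha: float = 0.05, prior: float = 0.0) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        # Each LP starts at the prior until executions are observed.
        self._estimate = defaultdict(lambda: prior)

    def update(self, lp: str, rejected: bool) -> None:
        """Move the estimate a fraction alpha towards the latest outcome."""
        outcome = 1.0 if rejected else 0.0
        old = self._estimate[lp]
        self._estimate[lp] = old + self.alpha * (outcome - old)

    def predict(self, lp: str) -> float:
        """Current reject-probability estimate for the LP."""
        return self._estimate[lp]


if __name__ == "__main__":
    agent = RejectRateAgent(alpha=0.5)
    for rejected in (True, True, False):
        agent.update("LP_A", rejected)
    print(agent.predict("LP_A"))
```

Each update is a constant-time operation, so the prediction can be refreshed on every execution and fed straight back into the execution logic.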
FX etrading has entered a new phase of innovation which is based on the radical technology changes driven by Artificial Intelligence and more specifically Machine Learning. FX etrading systems have gained in speed and capacity but most of the data is unused today. Machine learning provides a low cost and highly scalable set of technologies to use the huge amount of data to find patterns and implicitly program our systems.
The proposed new Data-driven Intelligent Architecture, which will emerge in the next few years, requires in addition that the FX etrading system be tightly integrated with the ML agents, so that it can reprogram itself with new real-time predictions. More importantly, the system is the provider of data to the ML agents for their training and future evolution.
The current discussions on AI and ML are either overly negative, with ML viewed as ‘killing’ jobs, or overly optimistic, with ML seen as the panacea for all unsolved problems. Both assertions are false. FX etrading has already automated a great number of routine jobs. The trading floors of yesteryear, with hundreds of traders on the phone, have been replaced by fewer, more silent traders manning exceptions in systems that process, in milliseconds to microseconds, tens of thousands (and more) of quotes and orders each second. In reality, ML will bring back some intelligence to systems which for a long time aimed only to become faster, but not always smarter. ML will allow near-real-time predictions and analysis which will drive systems and decisions. It means that the silent trader will now have a better understanding of what the system, and the overall market, is doing. Any company not investing in this area today risks becoming uncompetitive in the near future.
The other side of the coin is that ML is today, and for the foreseeable future (10-15 years), a pattern extraction mechanism: the natural evolution of IT and data. The human brain is still far more versatile in unknown situations and fuzzy logic (the 50-50 choice), and this will remain the case for a long time.
1. The previous IT cycle, from 2007 to 2017, was the Mobile age.
2. Refer to Quod Financial Whitepapers on Machine Learning - ‘Machine learning & AI for Trading’ and ‘The future of TCA and Machine learning’ - www.quodfinancial.com/white-papers-downloads/
3. A time-series database is a database whose principal data index is time.
4. The recommendation for time-series databases is based on the fact that most trading activity is ordered by time, and analysis/machine learning is best suited to a time logic for ordering the different events.