Recently, I worked on market microstructure simulation to backtest a HFT strategy. In this post, I will show you how to build a simple but effective simulator. Hope you find it helpful.
In the past, I worked on strategies whose inputs are based on OHLC candles, but developing microstructure based strategies requires microstructure level simulation: post LOs to historical orderbook states, replay subsequent events and get execution info.
The first simulator: Poisson Process
Avellaneda-Stoikov’s model of orderflow intensity is used to estimate probability of execution. It doesn’t simulate microstructure but works for backtests. The model is based on market dynamics and requires 2 parameters that can be estimated using tick data.
The idea is: arrival rate of MOs matching posted depth at time iis modeled by a Poisson Process with intensity
where is the current fill probability at best price, the arrival rate of MO at time and is probability that the size of the MO would be greater than the size of all LOs of price less than combined. To see how it is derived you need to look at bottom left of page 220.
Note: the parameters and are estimated for each of 10 minute segments in backtest period.
The binary is sampled from a Bernoulli distribution: where is the strategy-dependent maximum time interval before cancellation.
It’s very fast to compute - it takes ~20 seconds to backtest a month’s data using 32 threads on a Threadripper 1950X. However, some LOs had questionable fills in some scenarios I visualized. So I decided to build a real orderbook microstructure simulator.
The second simulator: SimBook
I implemented a market simulator called SimBook based on a flowchart from Robert Almgren’s slides which was very helpful(thank you!).
The idea is to build an orderbook matching engine that follows a set of pessimistic exchange matching rules.
Since I couldn’t find the original paper “Combining historical data with a market simulator for testing algorithmic trading” (presumably because it was a term paper for a course hence never published), it was hard to understand some of the rationale behind these design decisions but it was mostly self-explanatory.
For my implementation, I relaxed some assumptions but they don’t affect realism too much. For example, simulated orders are allowed to establish new price levels as long as they are worse than bba. I also implemented some heuristics for maximum order size and latency restrictions which makes it slightly more realistic.
I was able to achieve great results with this approach. Here are some example executions from my backtests:
The downside of this approach is: market impact not being taken into account. At this time, my strategies are dumb so I don’t need more sophisticated simulators just yet.
The third simulation: Queue-Reactive Model
This one I did not implement but explored in depth. I got inspired to develop a signal that uses Kalman filter to estimate the implicit spread. I will explain this in a separate blog post.