Update: Nov. 17 2017: All algorithms on this page have been ported to Rust.

In this short post I will demonstrate how to use the order book. I am giving out visualization algorithms for free.

Visualization should lead to truth and understanding. There are different ways of visualizing the order book. We will start with the simplest one.

This is the supply-demand curve.

Order book on Bittrex:

As you can see in the simplest case, the order book is nothing but the transpose of the supply-demand curve zoomed in. This is a simple visualization - most traders can only catch a fleeting glimpse. Its utility is limited to spotting walls at any instant.

Now let’s work our way up towards better visualization.

Establish DB Connection

First, establish a connection to the database and retrieve orderbook updates over 1 hour.

The following is a trick to “proxy/forward” a db connection using ssh. Let’s say the database(dburl) is configured to only accept connections from secure-server, then it’s used to simply forward the port from localhost to dburl.

ssh rhan@secure-server -CNL localhost:5432:dburl:5432

import psycopg2
import sys
from time import time
from pprint import pprint
import matplotlib.pyplot as plt
import numpy as np
from math import floor, ceil
import datetime
import pandas as pd
from matplotlib.colors import LinearSegmentedColormap
import copy

conn_string = "host='localhost' dbname='bittrex' user='rhan' password='[REDACTED]'"
conn = psycopg2.connect(conn_string)
cursor = conn.cursor()

Configure matplotlib

plt.rcParams["font.family"] = "Ubuntu Mono"
plt.grid(False)
plt.axis('on')
plt.style.use('dark_background')

Retrieve Data

We only need the following columns:

  1. ts: Timestamp of the order received by client.

  2. seq: Sequence number to re-order the events received.

  3. size: Order size

  4. price: Price of order

  5. is_bid: Is the order a buy or sell

  6. is_trade: Is it a market order or limit order

Although the exchange may send other fields such as trade id, update type(create, delete, partial fill) and some exchange-specific order types, the above is the minimum set of fields to reconstruct an order book.

h = int(time()) # ! do not change
h_ago = h - 7200

cursor.execute("""
    SELECT ts, seq, size, price, is_bid, is_trade
      FROM orderbook_btc_neo
     WHERE ts > {} AND ts < {}
  ORDER BY seq ASC;
""".format(h_ago, h))
result = cursor.fetchall()
conn.commit()
print (result[-1][0] - result[0][0]) / 60, "minutes"
119.984233332 minutes
events = pd.DataFrame.from_records(result,
                                       columns=["ts", "seq", "size", "price", "is_bid", "is_trade"],
                                       index="seq")

Now let us plot price distributions

prices = np.array(events["price"])
plt.hist(prices)
plt.title("Price distribution")
plt.show()

png

As you can see, most of the liquidity aggregates in 1 bin. So let’s “zoom in” on this bin.

def reject_outliers(data, m = 2.):
    d = np.abs(data - np.median(data))
    mdev = np.median(d)
    s = d/mdev if mdev else 0.
    return data[s<m]

rejected = reject_outliers(prices, m=4)
plt.hist(rejected)
plt.title("Outliers rejected")
plt.show()

png

Internally, plt.hist uses np.histogram which we will later use to get a list of boundaries to rebin the ticks.

Also trivial is the percentage of buy/sell market orders. These orders are considered “aggressive” as in crossing the spread. The plot below is not weighted by order size.

is_bids = events[events["is_trade"]]["is_bid"]
total_events_cnt = is_bids.size
bids = np.sum(is_bids)
asks = total_events_cnt - bids

plt.pie([bids, asks])
plt.legend(["bids", "asks"])
plt.title("bid/ask")
plt.show()

png

Split Events into Separate Categories

We split events into three categories:

  • limit order creation
  • limit order cancellation
  • market order

This is done by comparing the previous liquidity to the new liquidity at a given price level .

cols = ["ts", "seq", "size", "price", "is_bid", "is_trade"]

cancelled = []
created = []
current_level = {}

for seq, (ts, size, price, is_bid, is_trade) in events.sort_index().iterrows():
    if not is_trade:
        prev = current_level[price] if price in current_level else 0
        if (size == 0 or size <= prev):
            cancelled.append((ts, seq, prev - size, price, is_bid, is_trade))
        elif (size > prev):
            created.append((ts, seq, size - prev, price, is_bid, is_trade))
        else: # size == prev
            raise Exception("Impossible")

    current_level[price] = size

cancelled = pd.DataFrame.from_records(cancelled, columns=cols, index="seq")
created =   pd.DataFrame.from_records(created,   columns=cols, index="seq")
trades = events[events['is_trade']]

# sanity check
assert len(cancelled) + len(created) + len(trades) == len(events)

Visualize Individual Orders

Visualizing order cancellation/creation is the lowest level one can go about visualization. The x-axis is time, y-axis is size of order. It provides insights into the activities of individual market participants.

import datetime as dt
import matplotlib.dates as md

def plotVolumeMap(df, volFrom=None, volTo=None, log_scale=True):
    if volFrom:
        df = df[df["size"] >= volFrom]
    if volTo:
        df = df[df["size"] <= volTo]
        
    colors = map(lambda b: '#ffff00' if b else '#00ffff', df["is_bid"])
    
    fig = plt.figure(figsize=(24, 18))
    ax = fig.add_subplot(111)
    if log_scale:
        ax.set_yscale('log')

    plt.scatter(df["ts"], df["size"], c=colors, s=5)
    plt.legend(["bid"])
    plt.show()
plotVolumeMap(cancelled, volFrom=100, volTo=200, log_scale=True)

png

plotVolumeMap(cancelled, volFrom=1, volTo=30, log_scale=True)

png

By filtering out events within a volume range, it is possible to isolate what are most likely individual order placement strategies.

I don’t understand how this bot works. If you do please let me know.

Rebinning Events

Now we convert events into deltas that have a start time and end time. In the process, rebin the events along the time and ticks axis so more liquidity aggregates on each tick.

This way, we can plot how the order book evolves over time.

def to_updates(events):
    tick_bins_cnt = 2000
    step_bins_cnt = 2000
    
    sizes, boundaries = np.histogram(rejected, tick_bins_cnt)
    def into_tick_bin(price):
        for (s, b) in zip(boundaries, boundaries[1:]):
            if b > price > s:
                return s
        return False

    min_ts = result[0][0]
    max_ts = result[-1][0]
    step_thresholds = range(int(floor(min_ts)), int(ceil(max_ts)), int(floor((max_ts - min_ts)/(step_bins_cnt))))
    def into_step_bin(time):
        for (s, b) in zip(step_thresholds, step_thresholds[1:]):
            if b > time > s:
                return b
        return False
        
    updates = {}
    for row in result:
        ts, seq, size, price, is_bid, is_trade = row
        price = into_tick_bin(price)
        time = into_step_bin(ts)
        if not price or not time:
            continue
        if price not in updates:
            updates[price] = {}
        if time not in updates[price]:
            updates[price][time] = 0
        updates[price][time] += size;
    return updates
updates = to_updates(events) # expensive
def plot_price_levels(updates, zorder=0, max_threshold=100, min_threshold=0.5):    
    ys = []
    xmins = []
    xmaxs = []
    colors = []

    for price, vdict in updates.items():
        vtuples = vdict.items()
        vtuples = sorted(vtuples, key=lambda tup: tup[0])
        for (t1, s1), (t2, s2) in zip(vtuples, vtuples[1:]): # bigram
            xmins.append(t1)
            xmaxs.append(t2)
            ys.append(price)
            if s1 < min_threshold:
                colors.append((0, 0, 0))
            elif s1 > max_threshold:
                colors.append((0, 1, 1))
            else:
                colors.append((0, s1/max_threshold, s1/max_threshold))
    plt.hlines(ys, xmins, xmaxs, color=colors, lw=3, alpha=1, zorder=zorder)
#     plt.colorbar()    
plt.figure(figsize=(24, 18))
plot_price_levels(updates, max_threshold=100, min_threshold=10)
plt.show()

png

This visualization technique is from the famous Nanex Research.

An interesting strategy emerges using this visualization:

Note the orders sitting far from the market are pegged to be x basis points from the inside.

Reconstructing Order Book

We can also reconstruct order book to get order book depth at each instant. With the shape of the order book at each time step, we can plot how the best bid and ask change over time.

The algorithm is very straightforward. Order book updates are tracked by keeping a temp copy of the limit order book and store each updated “temp” version into a dictionary indexed by timestamps.

def get_ob():
    most_recent_orderbook = {"bids": {}, "asks": {}}
    orderbook = {}
    for seq, e in events.iterrows():
        if e.is_trade:
            continue
        if e.ts not in orderbook:
            for side, sidedicts in most_recent_orderbook.items():
                for price, size in sidedicts.items():
                    if size == 0:
                        del sidedicts[price]
            most_recent_orderbook["bids" if e.is_bid else "asks"][e.price] = e["size"]
            orderbook[e.ts] = copy.deepcopy(most_recent_orderbook)        
    return orderbook
def best_ba(orderbook):
    best_bids_asks = []

    for ts, ob in orderbook.items():
        try:
            best_bid = max(ob["bids"].keys())
        except: # sometimes L in max(L) is []
            continue
        try:
            best_ask = min(ob["asks"].keys())
        except:
            continue
        best_bids_asks.append((ts, best_bid, best_ask))

    best_bids_asks = pd.DataFrame.from_records(best_bids_asks, columns=["ts", "best_bid", "best_ask"], index="ts").sort_index()
    return best_bids_asks
def plot_best_ba(best_ba_df):
    bhys = []    # bid - horizontal - ys
    bhxmins = [] # bid - horizontal - xmins
    bhxmaxs = [] # ...
    bvxs = []
    bvymins = []
    bvymaxs = []
    ahys = []
    ahxmins = []
    ahxmaxs = []
    avxs = []
    avymins = []
    avymaxs = []

    bba_tuple = best_ba_df.to_records()
    for (ts1, b1, a1), (ts2, b2, a2) in zip(bba_tuple, bba_tuple[1:]): # bigram
        bhys.append(b1)
        bhxmins.append(ts1)
        bhxmaxs.append(ts2)
        bvxs.append(ts2)
        bvymins.append(b1)
        bvymaxs.append(b2)
        ahys.append(a1)
        ahxmins.append(ts1)
        ahxmaxs.append(ts2)
        avxs.append(ts2)
        avymins.append(a1)
        avymaxs.append(a2)

    plt.hlines(bhys, bhxmins, bhxmaxs, color="green", lw=3, alpha=1)
    plt.vlines(bvxs, bvymins, bvymaxs, color="green", lw=3, alpha=1)
    plt.hlines(ahys, ahxmins, ahxmaxs, color="red", lw=3, alpha=1)
    plt.vlines(avxs, avymins, avymaxs, color="red", lw=3, alpha=1)
def plot_trades(trades, size=1, zorder=10):
    trades_colors = map(lambda is_bid: "#00ff00" if is_bid else "#ff0000", trades.is_bid)
    plt.scatter(trades["ts"], trades["price"], s=trades["size"]*size, color=trades_colors, zorder=zorder)
ob = get_ob() # expensive
best_ba_df = best_ba(ob) # expensive
plt.figure(figsize=(24, 18))

plot_best_ba(best_ba_df)
plot_trades(trades, size=0.5, zorder=10)
plot_price_levels(updates, zorder=0, max_threshold=60, min_threshold=1)
plt.ylim([0.00529, 0.005425])

plt.show()

png

def plot_events(df, trades_df=None, min_price=None, max_price=None, log_scale=True):
    if min_price:
        df = df[(df["price"] > min_price)]
        if trades_df is not None:
            trades_df = trades_df[(trades_df["price"] > min_price)]
    if max_price:
        df =  df[(df["price"] < max_price)]
        if trades_df is not None:
            trades_df =  trades_df[(trades_df["price"] < max_price)]
    
    fig = plt.figure(figsize=(24, 18))
    ax = fig.add_subplot(111)
    
    if trades_df is not None:
        plt.title("Trades And Cancellation")
        plt.legend(["Trades", "Cancellation"])
        plt.scatter(trades_df["ts"], trades_df["price"], s=trades_df["size"], color="#00ffff")
        plt.scatter(df["ts"], df["price"], s=df["size"]/30, color="#ffff00")
    else:
        plt.scatter(df["ts"], df["price"], s=df["size"], color="#ffff00")
        plt.legend("Cancelled")
        plt.title("Cancellation")
    if log_scale:
        ax.set_yscale('log')
    
plot_events(cancelled, trades_df=trades, min_price=0.00499, max_price=0.00529, log_scale=False)
plt.show()

png

def plot_ob(bidask, bps=.25):
    # bps: basis points

    best_bid = max(bidask["bids"].keys())
    best_ask = min(bidask["asks"].keys())
    worst_bid = best_bid * (1 - bps)
    worst_ask = best_bid * (1 + bps)
    filtered_bids = sorted(filter(lambda (k,v): k >= worst_bid, bidask['bids'].items()), key=lambda x:-x[0])
    filtered_asks = sorted(filter(lambda (k,v): k <= worst_ask, bidask['asks'].items()), key=lambda x:+x[0])

    bsizeacc = 0
    bhys = []    # bid - horizontal - ys
    bhxmins = [] # bid - horizontal - xmins
    bhxmaxs = [] # ...
    bvxs = []
    bvymins = []
    bvymaxs = []
    asizeacc = 0
    ahys = []
    ahxmins = []
    ahxmaxs = []
    avxs = []
    avymins = []
    avymaxs = []
    
    for (p1, s1), (p2, s2) in zip(filtered_bids, filtered_bids[1:]):
        bvymins.append(bsizeacc)
        if bsizeacc == 0:
            bsizeacc += s1
        bhys.append(bsizeacc)
        bhxmins.append(p2)
        bhxmaxs.append(p1)
        bvxs.append(p2)
        bsizeacc += s2
        bvymaxs.append(bsizeacc)
    
    for (p1, s1), (p2, s2) in zip(filtered_asks, filtered_asks[1:]):
        avymins.append(asizeacc)
        if asizeacc == 0:
            asizeacc += s1
        ahys.append(asizeacc)
        ahxmins.append(p1)
        ahxmaxs.append(p2)
        avxs.append(p2)
        asizeacc += s2
        avymaxs.append(asizeacc)
        
    plt.hlines(bhys, bhxmins, bhxmaxs, color="green")
    plt.vlines(bvxs, bvymins, bvymaxs, color="green")
    plt.hlines(ahys, ahxmins, ahxmaxs, color="red")
    plt.vlines(avxs, avymins, avymaxs, color="red")
    
# d_ts = max(ob.keys())
# d_ob = ob[d_ts]
plt.figure(figsize=(10,4))
plot_ob(d_ob, bps=.05)
plt.ylim([0, 17500])
plt.show()

png

cursor.close()
conn.close()

This is a demonstration of some visualization algorithms using matplotlib. As you can see, they are fairly straightforward to implement.

In the future I plan on porting such visualization to react-stockcharts so I can interactively explore data. The algorithms are already here, what is left to be done is changing matplotlib calls to d3. If you would like to collaborate with me on this, please email me.