Visualizing Order Book Data (Full Book)

Note

ToDo: write an explanation

import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

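Download the full order book (incremental L2) CSV from Tardis.dev and load it. Only the first 100,000 rows are read, and the result is cached as a pickle file.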

df = pd.read_csv("coinbase_incremental_book_L2_2019-07-01_BTC-USD.csv.gz", nrows=100000)
df.to_pickle("coinbase_incremental_book_L2_2019-07-01_BTC-USD.pickle", protocol=4)
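
As a minimal sketch of the load-and-cache step above, the following writes a tiny gzipped CSV with the same column layout and reads back only the first rows. The file names and row values are made up for illustration; only `read_csv` with `nrows` and the `to_pickle`/`read_pickle` round trip mirror the original code.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample rows using the same columns as the downloaded CSV.
sample = pd.DataFrame(
    {
        "exchange": ["coinbase"] * 4,
        "symbol": ["BTC-USD"] * 4,
        "timestamp": [1561939200923000, 1561939200924000,
                      1561939200926000, 1561939200926000],
        "local_timestamp": [1561939201444699, 1561939201444749,
                            1561939201444836, 1561939201444908],
        "is_snapshot": [False] * 4,
        "side": ["bid", "bid", "ask", "bid"],
        "price": [10751.56, 10751.56, 10775.89, 10733.75],
        "amount": [5.345, 1.0, 0.0, 0.0],
    }
)

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "sample_book.csv.gz"
    pickle_path = Path(tmp) / "sample_book.pickle"

    # Compression is inferred from the ".gz" suffix on both write and read.
    sample.to_csv(csv_path, index=False)

    # nrows limits how many data rows are parsed, which bounds memory use.
    head_df = pd.read_csv(csv_path, nrows=2)

    # Cache the parsed frame so later runs can skip CSV parsing.
    head_df.to_pickle(pickle_path, protocol=4)
    cached_df = pd.read_pickle(pickle_path)

print(len(head_df), cached_df.equals(head_df))
```

Reading with `nrows` keeps the example small; the full daily file is much larger, which is presumably why the original limits it to 100,000 rows.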

Note

In practice the data is obtained with the code above, but here it is loaded from a pickle file that was saved earlier.

df = pd.read_pickle("coinbase_incremental_book_L2_2019-07-01_BTC-USD.pickle")
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100000 entries, 0 to 99999
Data columns (total 8 columns):
 #   Column           Non-Null Count   Dtype  
---  ------           --------------   -----  
 0   exchange         100000 non-null  object 
 1   symbol           100000 non-null  object 
 2   timestamp        100000 non-null  int64  
 3   local_timestamp  100000 non-null  int64  
 4   is_snapshot      100000 non-null  bool   
 5   side             100000 non-null  object 
 6   price            100000 non-null  float64
 7   amount           100000 non-null  float64
dtypes: bool(1), float64(2), int64(2), object(3)
memory usage: 5.4+ MB
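# Convert microsecond epoch integers to datetimes.
# Plain column assignment replaces the int64 column so the dtype actually changes.
df["timestamp"] = pd.to_datetime(df["timestamp"], unit="us")
df["local_timestamp"] = pd.to_datetime(df["local_timestamp"], unit="us")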
df.sort_values("timestamp", inplace=True)
df.set_index("timestamp", drop=False, inplace=True)
df.index.name = None
# Treat amount == 0 as a missing value
df.loc[df.loc[:, "amount"] == 0, "amount"] = np.nan
df.head()
                         exchange   symbol                timestamp             local_timestamp  is_snapshot  side     price  amount
2019-07-01 00:00:00.923  coinbase  BTC-USD  2019-07-01 00:00:00.923  2019-07-01 00:00:01.444699        False   bid  10751.56   5.345
2019-07-01 00:00:00.924  coinbase  BTC-USD  2019-07-01 00:00:00.924  2019-07-01 00:00:01.444749        False   bid  10751.56   1.000
2019-07-01 00:00:00.926  coinbase  BTC-USD  2019-07-01 00:00:00.926  2019-07-01 00:00:01.444836        False   ask  10775.89     NaN
2019-07-01 00:00:00.926  coinbase  BTC-USD  2019-07-01 00:00:00.926  2019-07-01 00:00:01.444908        False   bid  10733.75     NaN
2019-07-01 00:00:00.926  coinbase  BTC-USD  2019-07-01 00:00:00.926  2019-07-01 00:00:01.444922        False   bid  10735.06     NaN
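
In the Tardis.dev incremental L2 format, as far as I understand it, `amount` is the new absolute size at that price rather than a delta, and an `amount` of 0 means the price level has been removed. That is why zeros are treated as missing above. A minimal sketch of replaying such updates into a book state, using a hypothetical helper `apply_updates` and made-up rows:

```python
import math


def apply_updates(updates):
    """Replay (side, price, amount) updates into {"bid": {...}, "ask": {...}}.

    Assumption: amount is the new absolute size at that price, and a size of
    0 (or NaN, after the replacement above) means the level was removed.
    """
    book = {"bid": {}, "ask": {}}
    for side, price, amount in updates:
        if amount == 0 or math.isnan(amount):
            # Remove the level; ignore removals of levels we never saw.
            book[side].pop(price, None)
        else:
            book[side][price] = amount
    return book


# Made-up updates for illustration only.
updates = [
    ("bid", 10751.56, 5.345),
    ("bid", 10751.56, 1.000),         # overwrites the previous size
    ("ask", 10775.89, 2.0),
    ("ask", 10775.89, float("nan")),  # removes the ask level
    ("bid", 10733.75, 0.5),
]

book = apply_updates(updates)
best_bid = max(book["bid"])
print(book["bid"], book["ask"], best_bid)
```

The scatter plot later in this page does not rebuild the book this way; it plots every update row directly, so this sketch only illustrates what the rows mean.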

Checking for outliers

df.describe()
              price        amount
count  1.000000e+05  7.220700e+04
mean   1.334097e+05  3.626727e+00
std    2.695268e+07  1.442468e+02
min    1.000000e-02  4.000000e-08
25%    1.068040e+04  1.367835e-01
50%    1.072972e+04  7.000000e-01
75%    1.075993e+04  1.431781e+00
max    8.385506e+09  2.961931e+04
px.box(df, x="price")
px.line(df.loc[:, "price"].resample("10s").mean())
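
The `describe()` output already hints at the problem: the mean price is more than ten times the median, and the maximum is in the billions. A small sketch with made-up prices shows why comparing mean and median is a quick outlier check:

```python
import pandas as pd

# Made-up prices: four plausible quotes plus one extreme far-from-market level.
prices = pd.Series([10680.0, 10730.0, 10760.0, 10750.0, 8_000_000_000.0])

mean_price = prices.mean()
median_price = prices.median()

# A single extreme value drags the mean by orders of magnitude,
# while the median stays near the typical price.
print(f"mean={mean_price:.2f} median={median_price:.2f}")
```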

The early part of the data contains outliers, so drop everything before 00:02.

cut_df = df.loc["2019-07-01 00:02":, :]
px.line(cut_df, x="timestamp", y="price")
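
The cut above relies on partial-string slicing of a sorted `DatetimeIndex`. A minimal sketch on a made-up frame, to show which rows such a slice keeps:

```python
import pandas as pd

# Made-up frame with one row every 30 seconds starting at midnight.
index = pd.date_range("2019-07-01 00:00:00", periods=8, freq="30s")
demo = pd.DataFrame({"price": range(8)}, index=index)

# A start label given to minute precision resolves to the start of that
# minute, so this keeps rows from 00:02:00 onward.
cut = demo.loc["2019-07-01 00:02":, :]

print(cut.index.min(), len(cut))
```

This only works as expected because the index is sorted, which is why the preprocessing calls `sort_values("timestamp")` before `set_index`.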

Compute quantiles of the price (the 0.5th and 99th percentiles) and exclude the outliers that fall outside them.

lower, upper = cut_df.loc[:, "price"].quantile((0.005, 0.99))
filtered_df = cut_df.loc[
    (cut_df.loc[:, "price"] > lower) & (cut_df.loc[:, "price"] < upper)
]
px.box(filtered_df, x="price")
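
The quantile cut can be wrapped in a small reusable function. This is a sketch under the same thresholds as above; the helper name `filter_by_quantile` and the sample prices are made up for illustration.

```python
import pandas as pd


def filter_by_quantile(frame, column, lower_q=0.005, upper_q=0.99):
    """Keep rows whose `column` lies strictly between two quantiles."""
    lower, upper = frame[column].quantile([lower_q, upper_q])
    mask = (frame[column] > lower) & (frame[column] < upper)
    return frame.loc[mask]


# Made-up prices: mostly near 10,700, plus one tiny and one huge stale level.
demo = pd.DataFrame(
    {"price": [0.01] + [10700.0 + i for i in range(198)] + [8.0e9]}
)
kept = filter_by_quantile(demo, "price")

print(len(demo), len(kept), kept["price"].min(), kept["price"].max())
```

Because the bounds are quantiles of the data itself, this always trims roughly the same fraction of rows, whether or not they are really outliers. The asymmetric thresholds (0.5% below, 1% above) follow the original code.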
df.describe()  # note: this summarizes the unfiltered df; use filtered_df.describe() to inspect the filtered data
              price        amount
count  1.000000e+05  7.220700e+04
mean   1.334097e+05  3.626727e+00
std    2.695268e+07  1.442468e+02
min    1.000000e-02  4.000000e-08
25%    1.068040e+04  1.367835e-01
50%    1.072972e+04  7.000000e-01
75%    1.075993e+04  1.431781e+00
max    8.385506e+09  2.961931e+04

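Visualize the result as a scatter plot. Each marker is one book update, colored by the log of its amount (negated for asks).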

side = filtered_df.groupby("side")
bid, ask = side.get_group("bid"), side.get_group("ask")

fig = go.Figure(
    [
        go.Scattergl(
            mode="markers",
            x=bid.loc[:, "timestamp"],
            y=bid.loc[:, "price"],
            name="bid",
            marker={
                "color": np.log(bid.loc[:, "amount"]),
                # "colorscale": "Blues",
            },
        ),
        go.Scattergl(
            mode="markers",
            x=ask.loc[:, "timestamp"],
            y=ask.loc[:, "price"],
            name="ask",
            marker={
                "color": -np.log(ask.loc[:, "amount"]),
                # "colorscale": "Reds",
            },
        ),
    ],
    layout=go.Layout(
        title="exchange: coinbase, instrument: btc-usd", width=1400, height=600
    ),
)
fig.show()
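
The marker colors in the figure come from `np.log(amount)`, negated for asks. The sketch below, on made-up rows, shows how that encoding behaves and one way to deal with removed levels (NaN amounts), which otherwise get an undefined color. Dropping them is an assumption here, not something the original figure does.

```python
import numpy as np
import pandas as pd

# Made-up levels; NaN marks a removed level, as in the preprocessing above.
levels = pd.DataFrame(
    {
        "side": ["bid", "bid", "ask", "ask"],
        "amount": [0.01, 100.0, 0.01, np.nan],
    }
)

# Drop removed levels so every remaining marker has a defined color.
visible = levels.dropna(subset=["amount"])

# Log scaling compresses the range: 0.01 and 100 differ by a factor of
# 10,000, but their natural logs are only about -4.6 and +4.6.
log_amount = np.log(visible["amount"])

# Same sign convention as the figure: bids keep the log, asks are negated.
sign = np.where(visible["side"] == "bid", 1.0, -1.0)
color = sign * log_amount

print(color.round(2).tolist())
```

The resulting values can be passed as `marker["color"]`. Note that with this convention a small ask and a large bid map to the same color value, so separate colorscales per side (as in the commented-out lines of the figure code) may be easier to read.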