{
"cells": [
{
"cell_type": "markdown",
"id": "58967242-acce-4597-96e3-76cffafa8df0",
"metadata": {},
"source": [
"# OrderbookのDataFrameを整然データに変換"
]
},
{
"cell_type": "markdown",
"id": "f7cc24ea-8b8e-43e4-b04c-33f31bcaf3af",
"metadata": {},
"source": [
"ここでは [Public Data API From CryptoChassis](https://github.com/crypto-chassis/cryptochassis-data-api-docs) から、Orderbook(depth=10)のデータを取得します。"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "2529d224-7baa-4a21-b4a8-704148e477b6",
"metadata": {},
"outputs": [],
"source": [
"import gzip\n",
"import io\n",
"import json\n",
"from urllib import request\n",
"\n",
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"id": "00cabca4-3434-486f-894d-eecc249a45dd",
"metadata": {},
"source": [
"```python\n",
"url = \"https://api.cryptochassis.com/v1/market-depth/coinbase/btc-usd?startTime=1594166400&depth=10\"\n",
"\n",
"with request.urlopen(url) as res:\n",
" json_data = json.loads(res.read().decode())\n",
"\n",
"json_data\n",
"```\n",
"\n",
"```json\n",
"{'urls': [{'startTime': {'seconds': 1594166400,\n",
" 'iso': '2020-07-08T00:00:00.000Z'},\n",
" 'endTime': {'seconds': 1594252800, 'iso': '2020-07-09T00:00:00.000Z'},\n",
" 'url': 'https://marketdata-e0323a9039add2978bf5b49550572c7c.s3.amazonaws.com/market_depth/coinbase/btc_usd/1594166400.csv.gz?AWSAccessKeyId=ASIATPNB7YZIRYRUYH6E&Expires=1647314231&Signature=15BugAnUze5vX9Xp4d%2ByxR5Dmjw%3D&x-amz-security-token=IQoJb3JpZ2luX2VjEPD%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FwEaCXVzLWVhc3QtMSJIMEYCIQC9757DDmjrKc%2Ff37ZukleooFJnSjX5yNP60fqvjsnZKAIhAKXxh%2Fb8GX941U13FecZJMSN9UZKrjJUAVVPq8RWURXDKvoDCGkQAhoMMjM5MjQ3Mjc5Njk3Igx6%2BHbD4P%2B%2FvMwGNXMq1wNLsVB7ZHoDj9XV%2BNveAYPCD7AJC2IiwjLz9Ai6295MQ7g%2FENH4FzM%2FLiP0h6pkRZ0lhjXvyFAQ%2FhY6pwDrkIn11IbO6JbrN4zcYvv6GcdRghTXrVr7LFQaXd%2Foqrp5DRj5n9jVPeENClWMjh11y1ry%2BA7MHgDXKTU6g03K2uwUBgmD7Q%2BR%2BP46EgZP3SfOTKRokZEooMmHwg9QMQ0M6DYUzUh8JLZMpDWX9IbsqnOAv%2B%2BHD7OFiUAzMdpjWynLbM5ar%2Fd0JV8P8mRyQoEm63VdXAGWF4g0bWcUwhXcy%2BRBQIeHhqvfH0cBaJCd0G5103bYsOtgV1H%2FDOWv2lUK4jNo06f29FO50OpPOUcl69G1zjSiCwrYNgOBHrsCyA3EM9xRySZG85oeFgt6ka11cYLRAPVexAz9dYq%2B1bJCptk6uaeoglRLEVm%2FTXRRpoZpwSj0IDRdRJmDSSzlm%2Fe5mb6Vxl%2Ft7J%2BCA7kHwaWgA3XWta9wvlOPN%2FR6rt8ZS3ir3EDeAsRo9IY6KXTW1X00Qgsz4LX%2F16bDwWjVMFXfnv715AdgYKUAc5%2BeqyGGUNfpyb7N%2FAIVQEZvqBH0gvFCbggbZvcgDO%2FQHFFPhfVXUR9wrl6Em8ku%2FvAwwqi%2FkQY6pAEV0v%2B9yGcGGUT%2BWVfEcVRuUUuNdM4QrrOfxZXhdgEGzv9SRMCbRNjxmpnXYJAY54gXRb9Ymbqi0mm75V1x54sHc8b7ccfzQaOF37HNWdkRy2LQ%2B77ou4IOgyGP0Mf5MHUP%2FQxff3MLsrmzwTUnHj5ferevzVtozKeH%2Be%2BPtsqQ3nU%2BlosLtUljiYnwzeR1%2Fz4CbSmAU%2Fdp%2FaMjU%2F0NkpwxP5LknQ%3D%3D'}],\n",
" 'expiration': '300 seconds'}\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "694ca221-b576-4923-ba26-9b15df3306bc",
"metadata": {},
"source": [
"APIから得られたURLをもとに、gzipファイルを展開し、CSVファイルを `read_csv` 関数でDataFrameに読み込みます。\n",
"\n",
"```python\n",
"with request.urlopen(json_data[\"urls\"][0][\"url\"]) as res:\n",
" gz_data = res.read()\n",
"\n",
"with gzip.open(io.BytesIO(gz_data), \"rt\") as f:\n",
" csv_data = f.read()\n",
"\n",
"raw_df = pd.read_csv(io.StringIO(csv_data))\n",
"del json_data, gz_data, csv_data\n",
"raw_df.columns = \"timestamp\", \"bid\", \"ask\"\n",
"```\n",
"\n",
"```{attention}\n",
"実際には上記のコードで取得しますが、ここでは取得済みのCSVファイルから読み込みます。\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "a79d3986-b302-4a26-a558-f838d04c86e1",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" timestamp | \n",
" bid | \n",
" ask | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1594245605 | \n",
" 9433.24_2.30558256|9433.07_1.6680282|9432.49_0... | \n",
" 9433.25_0.2|9433.63_0.159|9433.68_0.06|9434_0.... | \n",
"
\n",
" \n",
" 1 | \n",
" 1594245606 | \n",
" 9433.24_2.30558256|9433.07_1.6680282|9432.24_1... | \n",
" 9433.25_0.2|9433.63_0.159|9433.67_0.06|9434_0.... | \n",
"
\n",
" \n",
" 2 | \n",
" 1594245607 | \n",
" 9433.24_2.32361076|9433.07_1.65|9432.24_1.65|9... | \n",
" 9433.25_0.17401468|9433.63_0.159|9433.67_0.06|... | \n",
"
\n",
" \n",
" 3 | \n",
" 1594245608 | \n",
" 9433.61_0.60504644|9433.51_1.15223697|9433.25_... | \n",
" 9433.62_0.2|9433.63_0.159|9433.66_0.06|9434_0.... | \n",
"
\n",
" \n",
" 4 | \n",
" 1594245609 | \n",
" 9433.51_1.17026517|9433.25_1.65|9433.24_1.65|9... | \n",
" 9433.52_0.637005|9433.61_0.10947951|9433.62_0.... | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" timestamp bid \\\n",
"0 1594245605 9433.24_2.30558256|9433.07_1.6680282|9432.49_0... \n",
"1 1594245606 9433.24_2.30558256|9433.07_1.6680282|9432.24_1... \n",
"2 1594245607 9433.24_2.32361076|9433.07_1.65|9432.24_1.65|9... \n",
"3 1594245608 9433.61_0.60504644|9433.51_1.15223697|9433.25_... \n",
"4 1594245609 9433.51_1.17026517|9433.25_1.65|9433.24_1.65|9... \n",
"\n",
" ask \n",
"0 9433.25_0.2|9433.63_0.159|9433.68_0.06|9434_0.... \n",
"1 9433.25_0.2|9433.63_0.159|9433.67_0.06|9434_0.... \n",
"2 9433.25_0.17401468|9433.63_0.159|9433.67_0.06|... \n",
"3 9433.62_0.2|9433.63_0.159|9433.66_0.06|9434_0.... \n",
"4 9433.52_0.637005|9433.61_0.10947951|9433.62_0.... "
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"raw_df = pd.read_csv(\"btcusd_2020-07-08.zip\")\n",
"raw_df.head()"
]
},
{
"cell_type": "markdown",
"id": "aeff76b6-9340-454f-bd2d-2a1d81fa04a5",
"metadata": {},
"source": [
"読み込んだデータは雑然データのため、整然データに整形します。\n",
"\n",
"```{hint}\n",
"整然データとは何か: https://id.fnshr.info/2017/01/09/tidy-data-intro/\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "e0334891-67bb-4605-9282-cf17258a177f",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" price | \n",
" size | \n",
" timestamp | \n",
" side | \n",
"
\n",
" \n",
" \n",
" \n",
" 2020-07-08 22:00:05 | \n",
" 9433.24 | \n",
" 2.305583 | \n",
" 2020-07-08 22:00:05 | \n",
" bid | \n",
"
\n",
" \n",
" 2020-07-08 22:00:05 | \n",
" 9434.32 | \n",
" 1.000000 | \n",
" 2020-07-08 22:00:05 | \n",
" ask | \n",
"
\n",
" \n",
" 2020-07-08 22:00:05 | \n",
" 9434.35 | \n",
" 0.530100 | \n",
" 2020-07-08 22:00:05 | \n",
" ask | \n",
"
\n",
" \n",
" 2020-07-08 22:00:05 | \n",
" 9434.90 | \n",
" 0.015848 | \n",
" 2020-07-08 22:00:05 | \n",
" ask | \n",
"
\n",
" \n",
" 2020-07-08 22:00:05 | \n",
" 9435.00 | \n",
" 0.050000 | \n",
" 2020-07-08 22:00:05 | \n",
" ask | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" price size timestamp side\n",
"2020-07-08 22:00:05 9433.24 2.305583 2020-07-08 22:00:05 bid\n",
"2020-07-08 22:00:05 9434.32 1.000000 2020-07-08 22:00:05 ask\n",
"2020-07-08 22:00:05 9434.35 0.530100 2020-07-08 22:00:05 ask\n",
"2020-07-08 22:00:05 9434.90 0.015848 2020-07-08 22:00:05 ask\n",
"2020-07-08 22:00:05 9435.00 0.050000 2020-07-08 22:00:05 ask"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def format_dataframe_by_side(df, side):\n",
" new_df = (\n",
" df.loc[:, side]\n",
" .str.split(\"|\")\n",
" .explode()\n",
" .str.split(\"_\", expand=True)\n",
" .rename({0: \"price\", 1: \"size\"}, axis=1)\n",
" )\n",
" new_df = new_df.astype(float)\n",
" new_df.loc[:, \"timestamp\"] = pd.to_datetime(raw_df.loc[:, \"timestamp\"], unit=\"s\")\n",
" new_df.loc[:, \"side\"] = side\n",
" new_df.set_index(\"timestamp\", drop=False, inplace=True)\n",
" new_df.index.name = None\n",
" return new_df\n",
"\n",
"\n",
"def format_dataframe(df):\n",
" new_df = pd.concat(\n",
" [format_dataframe_by_side(df, \"bid\"), format_dataframe_by_side(df, \"ask\")]\n",
" )\n",
" return new_df.sort_index()\n",
"\n",
"\n",
"df = format_dataframe(raw_df)\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"id": "b38eb8b2-1dea-4a60-9d06-cdabe235903a",
"metadata": {},
"source": [
"ここからは関数の内容を分解して解説します。"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "5b840a25-cb11-4f2a-80be-dd622e098166",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" timestamp | \n",
" bid | \n",
" ask | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1594245605 | \n",
" 9433.24_2.30558256|9433.07_1.6680282|9432.49_0... | \n",
" 9433.25_0.2|9433.63_0.159|9433.68_0.06|9434_0.... | \n",
"
\n",
" \n",
" 1 | \n",
" 1594245606 | \n",
" 9433.24_2.30558256|9433.07_1.6680282|9432.24_1... | \n",
" 9433.25_0.2|9433.63_0.159|9433.67_0.06|9434_0.... | \n",
"
\n",
" \n",
" 2 | \n",
" 1594245607 | \n",
" 9433.24_2.32361076|9433.07_1.65|9432.24_1.65|9... | \n",
" 9433.25_0.17401468|9433.63_0.159|9433.67_0.06|... | \n",
"
\n",
" \n",
" 3 | \n",
" 1594245608 | \n",
" 9433.61_0.60504644|9433.51_1.15223697|9433.25_... | \n",
" 9433.62_0.2|9433.63_0.159|9433.66_0.06|9434_0.... | \n",
"
\n",
" \n",
" 4 | \n",
" 1594245609 | \n",
" 9433.51_1.17026517|9433.25_1.65|9433.24_1.65|9... | \n",
" 9433.52_0.637005|9433.61_0.10947951|9433.62_0.... | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 7063 | \n",
" 1594252795 | \n",
" 9435.27_16.94086454|9435.26_0.13966131|9435.05... | \n",
" 9435.28_8.05085315|9435.3_0.110163|9439.5_0.11... | \n",
"
\n",
" \n",
" 7064 | \n",
" 1594252796 | \n",
" 9435.27_16.94086454|9435.26_0.13966131|9435.03... | \n",
" 9435.28_7.80012384|9435.3_0.110163|9439.5_0.11... | \n",
"
\n",
" \n",
" 7065 | \n",
" 1594252797 | \n",
" 9435.27_16.94086454|9435.26_0.13966131|9435.04... | \n",
" 9435.28_7.79200253|9435.3_0.110163|9439.41_0.0... | \n",
"
\n",
" \n",
" 7066 | \n",
" 1594252798 | \n",
" 9435.27_17.05242522|9435.26_0.13966131|9435.03... | \n",
" 9435.28_7.79200253|9435.3_0.110163|9439.4_0.11... | \n",
"
\n",
" \n",
" 7067 | \n",
" 1594252799 | \n",
" 9435.27_17.05242522|9435.26_0.13966131|9435.04... | \n",
" 9435.28_7.00936353|9435.3_0.110163|9439.4_0.05... | \n",
"
\n",
" \n",
"
\n",
"
7068 rows × 3 columns
\n",
"
"
],
"text/plain": [
" timestamp bid \\\n",
"0 1594245605 9433.24_2.30558256|9433.07_1.6680282|9432.49_0... \n",
"1 1594245606 9433.24_2.30558256|9433.07_1.6680282|9432.24_1... \n",
"2 1594245607 9433.24_2.32361076|9433.07_1.65|9432.24_1.65|9... \n",
"3 1594245608 9433.61_0.60504644|9433.51_1.15223697|9433.25_... \n",
"4 1594245609 9433.51_1.17026517|9433.25_1.65|9433.24_1.65|9... \n",
"... ... ... \n",
"7063 1594252795 9435.27_16.94086454|9435.26_0.13966131|9435.05... \n",
"7064 1594252796 9435.27_16.94086454|9435.26_0.13966131|9435.03... \n",
"7065 1594252797 9435.27_16.94086454|9435.26_0.13966131|9435.04... \n",
"7066 1594252798 9435.27_17.05242522|9435.26_0.13966131|9435.03... \n",
"7067 1594252799 9435.27_17.05242522|9435.26_0.13966131|9435.04... \n",
"\n",
" ask \n",
"0 9433.25_0.2|9433.63_0.159|9433.68_0.06|9434_0.... \n",
"1 9433.25_0.2|9433.63_0.159|9433.67_0.06|9434_0.... \n",
"2 9433.25_0.17401468|9433.63_0.159|9433.67_0.06|... \n",
"3 9433.62_0.2|9433.63_0.159|9433.66_0.06|9434_0.... \n",
"4 9433.52_0.637005|9433.61_0.10947951|9433.62_0.... \n",
"... ... \n",
"7063 9435.28_8.05085315|9435.3_0.110163|9439.5_0.11... \n",
"7064 9435.28_7.80012384|9435.3_0.110163|9439.5_0.11... \n",
"7065 9435.28_7.79200253|9435.3_0.110163|9439.41_0.0... \n",
"7066 9435.28_7.79200253|9435.3_0.110163|9439.4_0.11... \n",
"7067 9435.28_7.00936353|9435.3_0.110163|9439.4_0.05... \n",
"\n",
"[7068 rows x 3 columns]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"raw_df"
]
},
{
"cell_type": "markdown",
"id": "c1f41b22-9ef0-4a24-8306-1ab78e24b31c",
"metadata": {},
"source": [
"「bid」列、「ask」列は板情報が `|` の文字列で連結されています。これを `str.split` メソッドで分割します。"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "dfd51790-9ef1-402d-8fc7-de4131c6f598",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 [9433.24_2.30558256, 9433.07_1.6680282, 9432.4...\n",
"1 [9433.24_2.30558256, 9433.07_1.6680282, 9432.2...\n",
"2 [9433.24_2.32361076, 9433.07_1.65, 9432.24_1.6...\n",
"3 [9433.61_0.60504644, 9433.51_1.15223697, 9433....\n",
"4 [9433.51_1.17026517, 9433.25_1.65, 9433.24_1.6...\n",
"Name: bid, dtype: object"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"split_ser = raw_df.loc[:, \"bid\"].str.split(\"|\")\n",
"split_ser.head()"
]
},
{
"cell_type": "markdown",
"id": "02682c91-7dbb-46b7-b30f-46404cad271b",
"metadata": {},
"source": [
"Seriesの各要素がリスト型になっていることが確認できます。"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "8f250ebf-c6ac-4316-b3fd-7b24de90bf0f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['9433.24_2.30558256',\n",
" '9433.07_1.6680282',\n",
" '9432.49_0.04',\n",
" '9432.24_1.65',\n",
" '9432.16_1.65',\n",
" '9432.15_1.65',\n",
" '9432.09_0.25112245',\n",
" '9432_0.60011883',\n",
" '9431.6_1.585',\n",
" '9430.88_1.586']"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"split_ser[0]"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "5c790931-ff9a-4111-87ee-ba4e960ac084",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"list"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(split_ser[0])"
]
},
{
"cell_type": "markdown",
"id": "ea288793-6f0c-419e-83b3-1db014b4faaa",
"metadata": {},
"source": [
"`explode` メソッドは各要素のオブジェクトを行に展開します。\n",
"\n",
"```{tip}\n",
"`explode` メソッドは整然データを生成する際に便利です\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "aad17ebc-159c-4dcb-a5de-fa96f15cb77d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 9433.24_2.30558256\n",
"0 9433.07_1.6680282\n",
"0 9432.49_0.04\n",
"0 9432.24_1.65\n",
"0 9432.16_1.65\n",
" ... \n",
"7067 9435_0.0477\n",
"7067 9434.65_0.06359115\n",
"7067 9434.64_2\n",
"7067 9434.39_0.555\n",
"7067 9434.25_0.203\n",
"Name: bid, Length: 70680, dtype: object"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"explode_ser = split_ser.explode()\n",
"explode_ser"
]
},
{
"cell_type": "markdown",
"id": "11eb7caf-a64c-45c6-b8f4-eba96a5286f6",
"metadata": {},
"source": [
"`explode_ser` の各要素は `価格_サイズ(枚数)` の形式になっています。これを `str.split` メソッドで分割します。\n",
"\n",
"```{tip}\n",
"`str.split` メソッドの引数 `expand=True` にすることで、分割されたデータがDataFrameの列として展開されます\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "977627fd-37af-4d68-a5e5-2f2a891ad171",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" 0 | \n",
" 1 | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 9433.24 | \n",
" 2.30558256 | \n",
"
\n",
" \n",
" 0 | \n",
" 9433.07 | \n",
" 1.6680282 | \n",
"
\n",
" \n",
" 0 | \n",
" 9432.49 | \n",
" 0.04 | \n",
"
\n",
" \n",
" 0 | \n",
" 9432.24 | \n",
" 1.65 | \n",
"
\n",
" \n",
" 0 | \n",
" 9432.16 | \n",
" 1.65 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 7067 | \n",
" 9435 | \n",
" 0.0477 | \n",
"
\n",
" \n",
" 7067 | \n",
" 9434.65 | \n",
" 0.06359115 | \n",
"
\n",
" \n",
" 7067 | \n",
" 9434.64 | \n",
" 2 | \n",
"
\n",
" \n",
" 7067 | \n",
" 9434.39 | \n",
" 0.555 | \n",
"
\n",
" \n",
" 7067 | \n",
" 9434.25 | \n",
" 0.203 | \n",
"
\n",
" \n",
"
\n",
"
70680 rows × 2 columns
\n",
"
"
],
"text/plain": [
" 0 1\n",
"0 9433.24 2.30558256\n",
"0 9433.07 1.6680282\n",
"0 9432.49 0.04\n",
"0 9432.24 1.65\n",
"0 9432.16 1.65\n",
"... ... ...\n",
"7067 9435 0.0477\n",
"7067 9434.65 0.06359115\n",
"7067 9434.64 2\n",
"7067 9434.39 0.555\n",
"7067 9434.25 0.203\n",
"\n",
"[70680 rows x 2 columns]"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"bid_df = explode_ser.str.split(\"_\", expand=True)\n",
"bid_df"
]
},
{
"cell_type": "markdown",
"id": "a2086880-e32d-4c60-a0fa-5396a2ca9fe2",
"metadata": {},
"source": [
"列名を変更します。"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "7180e2e2-c448-450f-84d5-0ab3053e3a3d",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" price | \n",
" size | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 9433.24 | \n",
" 2.30558256 | \n",
"
\n",
" \n",
" 0 | \n",
" 9433.07 | \n",
" 1.6680282 | \n",
"
\n",
" \n",
" 0 | \n",
" 9432.49 | \n",
" 0.04 | \n",
"
\n",
" \n",
" 0 | \n",
" 9432.24 | \n",
" 1.65 | \n",
"
\n",
" \n",
" 0 | \n",
" 9432.16 | \n",
" 1.65 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 7067 | \n",
" 9435 | \n",
" 0.0477 | \n",
"
\n",
" \n",
" 7067 | \n",
" 9434.65 | \n",
" 0.06359115 | \n",
"
\n",
" \n",
" 7067 | \n",
" 9434.64 | \n",
" 2 | \n",
"
\n",
" \n",
" 7067 | \n",
" 9434.39 | \n",
" 0.555 | \n",
"
\n",
" \n",
" 7067 | \n",
" 9434.25 | \n",
" 0.203 | \n",
"
\n",
" \n",
"
\n",
"
70680 rows × 2 columns
\n",
"
"
],
"text/plain": [
" price size\n",
"0 9433.24 2.30558256\n",
"0 9433.07 1.6680282\n",
"0 9432.49 0.04\n",
"0 9432.24 1.65\n",
"0 9432.16 1.65\n",
"... ... ...\n",
"7067 9435 0.0477\n",
"7067 9434.65 0.06359115\n",
"7067 9434.64 2\n",
"7067 9434.39 0.555\n",
"7067 9434.25 0.203\n",
"\n",
"[70680 rows x 2 columns]"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"bid_df = bid_df.rename({0: \"price\", 1: \"size\"}, axis=1)\n",
"bid_df"
]
},
{
"cell_type": "markdown",
"id": "f28ed9e5-5a34-44ee-bc8e-5eb9c7657ae8",
"metadata": {},
"source": [
"データ型をfloat型に変換します。"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "b43a0718-3cdd-41a3-9edc-fd73fc91dca9",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"price float64\n",
"size float64\n",
"dtype: object"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"bid_df = bid_df.astype(float)\n",
"bid_df.dtypes"
]
},
{
"cell_type": "markdown",
"id": "b1912b15-e957-4e16-a234-5ca2971a161a",
"metadata": {},
"source": [
"`raw_df` の「timestamp」列をdatetime型に変換し、 `new_df` の「timestamp」列として追加します。"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "194ddc03-dff0-4ac2-8ae1-88e86236c3a7",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"price float64\n",
"size float64\n",
"timestamp datetime64[ns]\n",
"dtype: object"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"bid_df.loc[:, \"timestamp\"] = pd.to_datetime(raw_df.loc[:, \"timestamp\"], unit=\"s\")\n",
"bid_df.dtypes"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "070dd4a4-a3fa-4bce-a238-7afc9a8f61ef",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" price | \n",
" size | \n",
" timestamp | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 9433.24 | \n",
" 2.305583 | \n",
" 2020-07-08 22:00:05 | \n",
"
\n",
" \n",
" 0 | \n",
" 9433.07 | \n",
" 1.668028 | \n",
" 2020-07-08 22:00:05 | \n",
"
\n",
" \n",
" 0 | \n",
" 9432.49 | \n",
" 0.040000 | \n",
" 2020-07-08 22:00:05 | \n",
"
\n",
" \n",
" 0 | \n",
" 9432.24 | \n",
" 1.650000 | \n",
" 2020-07-08 22:00:05 | \n",
"
\n",
" \n",
" 0 | \n",
" 9432.16 | \n",
" 1.650000 | \n",
" 2020-07-08 22:00:05 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" price size timestamp\n",
"0 9433.24 2.305583 2020-07-08 22:00:05\n",
"0 9433.07 1.668028 2020-07-08 22:00:05\n",
"0 9432.49 0.040000 2020-07-08 22:00:05\n",
"0 9432.24 1.650000 2020-07-08 22:00:05\n",
"0 9432.16 1.650000 2020-07-08 22:00:05"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"bid_df.head()"
]
},
{
"cell_type": "markdown",
"id": "faf85bfa-a446-45d6-8d82-3a1c850b9fe0",
"metadata": {},
"source": [
"「side」列を追加し、値に `\"bid\"` を入力します。"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "5013408b-cfbd-4c56-8e55-8111f49a33a2",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" price | \n",
" size | \n",
" timestamp | \n",
" side | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 9433.24 | \n",
" 2.305583 | \n",
" 2020-07-08 22:00:05 | \n",
" bid | \n",
"
\n",
" \n",
" 0 | \n",
" 9433.07 | \n",
" 1.668028 | \n",
" 2020-07-08 22:00:05 | \n",
" bid | \n",
"
\n",
" \n",
" 0 | \n",
" 9432.49 | \n",
" 0.040000 | \n",
" 2020-07-08 22:00:05 | \n",
" bid | \n",
"
\n",
" \n",
" 0 | \n",
" 9432.24 | \n",
" 1.650000 | \n",
" 2020-07-08 22:00:05 | \n",
" bid | \n",
"
\n",
" \n",
" 0 | \n",
" 9432.16 | \n",
" 1.650000 | \n",
" 2020-07-08 22:00:05 | \n",
" bid | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" price size timestamp side\n",
"0 9433.24 2.305583 2020-07-08 22:00:05 bid\n",
"0 9433.07 1.668028 2020-07-08 22:00:05 bid\n",
"0 9432.49 0.040000 2020-07-08 22:00:05 bid\n",
"0 9432.24 1.650000 2020-07-08 22:00:05 bid\n",
"0 9432.16 1.650000 2020-07-08 22:00:05 bid"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"bid_df.loc[:, \"side\"] = \"bid\"\n",
"bid_df.head()"
]
},
{
"cell_type": "markdown",
"id": "d8f3326c-2ea0-4588-867b-08244b48b7f8",
"metadata": {},
"source": [
"`set_index` メソッドで「timestamp」列をインデックスにします。\n",
"\n",
"```{tip}\n",
"`set_index` メソッドの引数 `drop=False` では、インデックス対象列を列としてそのまま残します\n",
"```\n",
"\n",
"```{tip}\n",
"`bid_df.index.name` 属性はインデックスの名前が格納されており、この属性を操作してインデックス名を変更できます\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "8e3662fc-a4e6-4a6b-9b9f-4945ba53d99d",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" price | \n",
" size | \n",
" timestamp | \n",
" side | \n",
"
\n",
" \n",
" \n",
" \n",
" 2020-07-08 22:00:05 | \n",
" 9433.24 | \n",
" 2.305583 | \n",
" 2020-07-08 22:00:05 | \n",
" bid | \n",
"
\n",
" \n",
" 2020-07-08 22:00:05 | \n",
" 9433.07 | \n",
" 1.668028 | \n",
" 2020-07-08 22:00:05 | \n",
" bid | \n",
"
\n",
" \n",
" 2020-07-08 22:00:05 | \n",
" 9432.49 | \n",
" 0.040000 | \n",
" 2020-07-08 22:00:05 | \n",
" bid | \n",
"
\n",
" \n",
" 2020-07-08 22:00:05 | \n",
" 9432.24 | \n",
" 1.650000 | \n",
" 2020-07-08 22:00:05 | \n",
" bid | \n",
"
\n",
" \n",
" 2020-07-08 22:00:05 | \n",
" 9432.16 | \n",
" 1.650000 | \n",
" 2020-07-08 22:00:05 | \n",
" bid | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" price size timestamp side\n",
"2020-07-08 22:00:05 9433.24 2.305583 2020-07-08 22:00:05 bid\n",
"2020-07-08 22:00:05 9433.07 1.668028 2020-07-08 22:00:05 bid\n",
"2020-07-08 22:00:05 9432.49 0.040000 2020-07-08 22:00:05 bid\n",
"2020-07-08 22:00:05 9432.24 1.650000 2020-07-08 22:00:05 bid\n",
"2020-07-08 22:00:05 9432.16 1.650000 2020-07-08 22:00:05 bid"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"bid_df.set_index(\"timestamp\", drop=False, inplace=True)\n",
"bid_df.index.name = None\n",
"bid_df.head()"
]
},
{
"cell_type": "markdown",
"id": "89ae928c-24fa-4909-98a8-3b28196b313f",
"metadata": {},
"source": [
"ask側で同様の処理をします。"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "87928c69-fd33-447f-8374-be6c0c3ab4c4",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" price | \n",
" size | \n",
" timestamp | \n",
" side | \n",
"
\n",
" \n",
" \n",
" \n",
" 2020-07-08 22:00:05 | \n",
" 9433.25 | \n",
" 0.200 | \n",
" 2020-07-08 22:00:05 | \n",
" ask | \n",
"
\n",
" \n",
" 2020-07-08 22:00:05 | \n",
" 9433.63 | \n",
" 0.159 | \n",
" 2020-07-08 22:00:05 | \n",
" ask | \n",
"
\n",
" \n",
" 2020-07-08 22:00:05 | \n",
" 9433.68 | \n",
" 0.060 | \n",
" 2020-07-08 22:00:05 | \n",
" ask | \n",
"
\n",
" \n",
" 2020-07-08 22:00:05 | \n",
" 9434.00 | \n",
" 0.750 | \n",
" 2020-07-08 22:00:05 | \n",
" ask | \n",
"
\n",
" \n",
" 2020-07-08 22:00:05 | \n",
" 9434.32 | \n",
" 1.000 | \n",
" 2020-07-08 22:00:05 | \n",
" ask | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" price size timestamp side\n",
"2020-07-08 22:00:05 9433.25 0.200 2020-07-08 22:00:05 ask\n",
"2020-07-08 22:00:05 9433.63 0.159 2020-07-08 22:00:05 ask\n",
"2020-07-08 22:00:05 9433.68 0.060 2020-07-08 22:00:05 ask\n",
"2020-07-08 22:00:05 9434.00 0.750 2020-07-08 22:00:05 ask\n",
"2020-07-08 22:00:05 9434.32 1.000 2020-07-08 22:00:05 ask"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ask_df = (\n",
" raw_df.loc[:, \"ask\"]\n",
" .str.split(\"|\")\n",
" .explode()\n",
" .str.split(\"_\", expand=True)\n",
" .rename({0: \"price\", 1: \"size\"}, axis=1)\n",
")\n",
"ask_df = ask_df.astype(float)\n",
"ask_df.loc[:, \"timestamp\"] = pd.to_datetime(raw_df.loc[:, \"timestamp\"], unit=\"s\")\n",
"ask_df.loc[:, \"side\"] = \"ask\"\n",
"ask_df.set_index(\"timestamp\", drop=False, inplace=True)\n",
"ask_df.index.name = None\n",
"ask_df.head()"
]
},
{
"cell_type": "markdown",
"id": "e6d9a83f-2a6f-48f0-8d14-935ebd24680b",
"metadata": {},
"source": [
"`concat` 関数で `bid_df` と `ask_df` を連結します。"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "bfd77126-1ba8-4ccf-8e1e-c6518bd79317",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" price | \n",
" size | \n",
" timestamp | \n",
" side | \n",
"
\n",
" \n",
" \n",
" \n",
" 2020-07-08 22:00:05 | \n",
" 9433.24 | \n",
" 2.305583 | \n",
" 2020-07-08 22:00:05 | \n",
" bid | \n",
"
\n",
" \n",
" 2020-07-08 22:00:05 | \n",
" 9433.07 | \n",
" 1.668028 | \n",
" 2020-07-08 22:00:05 | \n",
" bid | \n",
"
\n",
" \n",
" 2020-07-08 22:00:05 | \n",
" 9432.49 | \n",
" 0.040000 | \n",
" 2020-07-08 22:00:05 | \n",
" bid | \n",
"
\n",
" \n",
" 2020-07-08 22:00:05 | \n",
" 9432.24 | \n",
" 1.650000 | \n",
" 2020-07-08 22:00:05 | \n",
" bid | \n",
"
\n",
" \n",
" 2020-07-08 22:00:05 | \n",
" 9432.16 | \n",
" 1.650000 | \n",
" 2020-07-08 22:00:05 | \n",
" bid | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" price size timestamp side\n",
"2020-07-08 22:00:05 9433.24 2.305583 2020-07-08 22:00:05 bid\n",
"2020-07-08 22:00:05 9433.07 1.668028 2020-07-08 22:00:05 bid\n",
"2020-07-08 22:00:05 9432.49 0.040000 2020-07-08 22:00:05 bid\n",
"2020-07-08 22:00:05 9432.24 1.650000 2020-07-08 22:00:05 bid\n",
"2020-07-08 22:00:05 9432.16 1.650000 2020-07-08 22:00:05 bid"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"new_df = pd.concat([bid_df, ask_df])\n",
"new_df.head()"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "ad9323ec-f69f-4394-876e-c30a5bfb1429",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" price | \n",
" size | \n",
" timestamp | \n",
" side | \n",
"
\n",
" \n",
" \n",
" \n",
" 2020-07-08 23:59:59 | \n",
" 9439.90 | \n",
" 0.500000 | \n",
" 2020-07-08 23:59:59 | \n",
" ask | \n",
"
\n",
" \n",
" 2020-07-08 23:59:59 | \n",
" 9440.31 | \n",
" 0.100000 | \n",
" 2020-07-08 23:59:59 | \n",
" ask | \n",
"
\n",
" \n",
" 2020-07-08 23:59:59 | \n",
" 9440.33 | \n",
" 1.500000 | \n",
" 2020-07-08 23:59:59 | \n",
" ask | \n",
"
\n",
" \n",
" 2020-07-08 23:59:59 | \n",
" 9440.61 | \n",
" 1.197847 | \n",
" 2020-07-08 23:59:59 | \n",
" ask | \n",
"
\n",
" \n",
" 2020-07-08 23:59:59 | \n",
" 9440.63 | \n",
" 0.393151 | \n",
" 2020-07-08 23:59:59 | \n",
" ask | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" price size timestamp side\n",
"2020-07-08 23:59:59 9439.90 0.500000 2020-07-08 23:59:59 ask\n",
"2020-07-08 23:59:59 9440.31 0.100000 2020-07-08 23:59:59 ask\n",
"2020-07-08 23:59:59 9440.33 1.500000 2020-07-08 23:59:59 ask\n",
"2020-07-08 23:59:59 9440.61 1.197847 2020-07-08 23:59:59 ask\n",
"2020-07-08 23:59:59 9440.63 0.393151 2020-07-08 23:59:59 ask"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"new_df.tail()"
]
},
{
"cell_type": "markdown",
"id": "a4740589-5857-4891-bab0-276122c34758",
"metadata": {},
"source": [
"インデックスの降順にソートします。"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "4902178d-c41d-4ada-949c-6014da0392e3",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" price | \n",
" size | \n",
" timestamp | \n",
" side | \n",
"
\n",
" \n",
" \n",
" \n",
" 2020-07-08 22:00:05 | \n",
" 9433.24 | \n",
" 2.305583 | \n",
" 2020-07-08 22:00:05 | \n",
" bid | \n",
"
\n",
" \n",
" 2020-07-08 22:00:05 | \n",
" 9434.32 | \n",
" 1.000000 | \n",
" 2020-07-08 22:00:05 | \n",
" ask | \n",
"
\n",
" \n",
" 2020-07-08 22:00:05 | \n",
" 9434.35 | \n",
" 0.530100 | \n",
" 2020-07-08 22:00:05 | \n",
" ask | \n",
"
\n",
" \n",
" 2020-07-08 22:00:05 | \n",
" 9434.90 | \n",
" 0.015848 | \n",
" 2020-07-08 22:00:05 | \n",
" ask | \n",
"
\n",
" \n",
" 2020-07-08 22:00:05 | \n",
" 9435.00 | \n",
" 0.050000 | \n",
" 2020-07-08 22:00:05 | \n",
" ask | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 2020-07-08 23:59:59 | \n",
" 9435.04 | \n",
" 0.050000 | \n",
" 2020-07-08 23:59:59 | \n",
" bid | \n",
"
\n",
" \n",
" 2020-07-08 23:59:59 | \n",
" 9435.26 | \n",
" 0.139661 | \n",
" 2020-07-08 23:59:59 | \n",
" bid | \n",
"
\n",
" \n",
" 2020-07-08 23:59:59 | \n",
" 9435.27 | \n",
" 17.052425 | \n",
" 2020-07-08 23:59:59 | \n",
" bid | \n",
"
\n",
" \n",
" 2020-07-08 23:59:59 | \n",
" 9440.61 | \n",
" 1.197847 | \n",
" 2020-07-08 23:59:59 | \n",
" ask | \n",
"
\n",
" \n",
" 2020-07-08 23:59:59 | \n",
" 9440.63 | \n",
" 0.393151 | \n",
" 2020-07-08 23:59:59 | \n",
" ask | \n",
"
\n",
" \n",
"
\n",
"
141360 rows × 4 columns
\n",
"
"
],
"text/plain": [
" price size timestamp side\n",
"2020-07-08 22:00:05 9433.24 2.305583 2020-07-08 22:00:05 bid\n",
"2020-07-08 22:00:05 9434.32 1.000000 2020-07-08 22:00:05 ask\n",
"2020-07-08 22:00:05 9434.35 0.530100 2020-07-08 22:00:05 ask\n",
"2020-07-08 22:00:05 9434.90 0.015848 2020-07-08 22:00:05 ask\n",
"2020-07-08 22:00:05 9435.00 0.050000 2020-07-08 22:00:05 ask\n",
"... ... ... ... ...\n",
"2020-07-08 23:59:59 9435.04 0.050000 2020-07-08 23:59:59 bid\n",
"2020-07-08 23:59:59 9435.26 0.139661 2020-07-08 23:59:59 bid\n",
"2020-07-08 23:59:59 9435.27 17.052425 2020-07-08 23:59:59 bid\n",
"2020-07-08 23:59:59 9440.61 1.197847 2020-07-08 23:59:59 ask\n",
"2020-07-08 23:59:59 9440.63 0.393151 2020-07-08 23:59:59 ask\n",
"\n",
"[141360 rows x 4 columns]"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"new_df.sort_index()"
]
},
{
"cell_type": "markdown",
"id": "2183936a-bb8b-4ad2-b8bb-7fd426631675",
"metadata": {},
"source": [
"DataFrameを `to_pickle` メソッドで直列化し、ほかのNotebookでも利用できるようにします。"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "048fdbdb-67b7-45ee-a837-cc06d5b4c9e2",
"metadata": {},
"outputs": [],
"source": [
"df.to_pickle(\"btcusd_2020-07-08.pickle\", protocol=4)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}