{ "cells": [ { "cell_type": "markdown", "id": "58967242-acce-4597-96e3-76cffafa8df0", "metadata": {}, "source": [ "# OrderbookのDataFrameを整然データに変換" ] }, { "cell_type": "markdown", "id": "f7cc24ea-8b8e-43e4-b04c-33f31bcaf3af", "metadata": {}, "source": [ "ここでは [Public Data API From CryptoChassis](https://github.com/crypto-chassis/cryptochassis-data-api-docs) から、Orderbook(depth=10)のデータを取得します。" ] }, { "cell_type": "code", "execution_count": 1, "id": "2529d224-7baa-4a21-b4a8-704148e477b6", "metadata": {}, "outputs": [], "source": [ "import gzip\n", "import io\n", "import json\n", "from urllib import request\n", "\n", "import pandas as pd" ] }, { "cell_type": "markdown", "id": "00cabca4-3434-486f-894d-eecc249a45dd", "metadata": {}, "source": [ "```python\n", "url = \"https://api.cryptochassis.com/v1/market-depth/coinbase/btc-usd?startTime=1594166400&depth=10\"\n", "\n", "with request.urlopen(url) as res:\n", " json_data = json.loads(res.read().decode())\n", "\n", "json_data\n", "```\n", "\n", "```json\n", "{'urls': [{'startTime': {'seconds': 1594166400,\n", " 'iso': '2020-07-08T00:00:00.000Z'},\n", " 'endTime': {'seconds': 1594252800, 'iso': '2020-07-09T00:00:00.000Z'},\n", " 'url': 'https://marketdata-e0323a9039add2978bf5b49550572c7c.s3.amazonaws.com/market_depth/coinbase/btc_usd/1594166400.csv.gz?AWSAccessKeyId=ASIATPNB7YZIRYRUYH6E&Expires=1647314231&Signature=15BugAnUze5vX9Xp4d%2ByxR5Dmjw%3D&x-amz-security-token=IQoJb3JpZ2luX2VjEPD%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FwEaCXVzLWVhc3QtMSJIMEYCIQC9757DDmjrKc%2Ff37ZukleooFJnSjX5yNP60fqvjsnZKAIhAKXxh%2Fb8GX941U13FecZJMSN9UZKrjJUAVVPq8RWURXDKvoDCGkQAhoMMjM5MjQ3Mjc5Njk3Igx6%2BHbD4P%2B%2FvMwGNXMq1wNLsVB7ZHoDj9XV%2BNveAYPCD7AJC2IiwjLz9Ai6295MQ7g%2FENH4FzM%2FLiP0h6pkRZ0lhjXvyFAQ%2FhY6pwDrkIn11IbO6JbrN4zcYvv6GcdRghTXrVr7LFQaXd%2Foqrp5DRj5n9jVPeENClWMjh11y1ry%2BA7MHgDXKTU6g03K2uwUBgmD7Q%2BR%2BP46EgZP3SfOTKRokZEooMmHwg9QMQ0M6DYUzUh8JLZMpDWX9IbsqnOAv%2B%2BHD7OFiUAzMdpjWynLbM5ar%2Fd0JV8P8mRyQoEm63VdXAGWF4g0bWcUwhXcy%2BRBQIeHhqvfH0cBaJCd0G5103bYsOtgV1H%2FDOWv2lUK4jNo06f29FO50OpPOUcl69G1zjSiCwrYNgOBHrsCyA3EM9xRySZG85oeFgt6ka11cYLRAPVexAz9dYq%2B1bJCptk6uaeoglRLEVm%2FTXRRpoZpwSj0IDRdRJmDSSzlm%2Fe5mb6Vxl%2Ft7J%2BCA7kHwaWgA3XWta9wvlOPN%2FR6rt8ZS3ir3EDeAsRo9IY6KXTW1X00Qgsz4LX%2F16bDwWjVMFXfnv715AdgYKUAc5%2BeqyGGUNfpyb7N%2FAIVQEZvqBH0gvFCbggbZvcgDO%2FQHFFPhfVXUR9wrl6Em8ku%2FvAwwqi%2FkQY6pAEV0v%2B9yGcGGUT%2BWVfEcVRuUUuNdM4QrrOfxZXhdgEGzv9SRMCbRNjxmpnXYJAY54gXRb9Ymbqi0mm75V1x54sHc8b7ccfzQaOF37HNWdkRy2LQ%2B77ou4IOgyGP0Mf5MHUP%2FQxff3MLsrmzwTUnHj5ferevzVtozKeH%2Be%2BPtsqQ3nU%2BlosLtUljiYnwzeR1%2Fz4CbSmAU%2Fdp%2FaMjU%2F0NkpwxP5LknQ%3D%3D'}],\n", " 'expiration': '300 seconds'}\n", "```" ] }, { "cell_type": "markdown", "id": "694ca221-b576-4923-ba26-9b15df3306bc", "metadata": {}, "source": [ "APIから得られたURLをもとに、gzipファイルを展開し、CSVファイルを `read_csv` 関数でDataFrameに読み込みます。\n", "\n", "```python\n", "with request.urlopen(json_data[\"urls\"][0][\"url\"]) as res:\n", " gz_data = res.read()\n", "\n", "with gzip.open(io.BytesIO(gz_data), \"rt\") as f:\n", " csv_data = f.read()\n", "\n", "raw_df = pd.read_csv(io.StringIO(csv_data))\n", "del json_data, gz_data, csv_data\n", "raw_df.columns = \"timestamp\", \"bid\", \"ask\"\n", "```\n", "\n", "```{attention}\n", "実際には上記のコードで取得しますが、ここでは取得済みのCSVファイルから読み込みます。\n", "```" ] }, { "cell_type": "code", "execution_count": 2, "id": "a79d3986-b302-4a26-a558-f838d04c86e1", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
timestampbidask
015942456059433.24_2.30558256|9433.07_1.6680282|9432.49_0...9433.25_0.2|9433.63_0.159|9433.68_0.06|9434_0....
115942456069433.24_2.30558256|9433.07_1.6680282|9432.24_1...9433.25_0.2|9433.63_0.159|9433.67_0.06|9434_0....
215942456079433.24_2.32361076|9433.07_1.65|9432.24_1.65|9...9433.25_0.17401468|9433.63_0.159|9433.67_0.06|...
315942456089433.61_0.60504644|9433.51_1.15223697|9433.25_...9433.62_0.2|9433.63_0.159|9433.66_0.06|9434_0....
415942456099433.51_1.17026517|9433.25_1.65|9433.24_1.65|9...9433.52_0.637005|9433.61_0.10947951|9433.62_0....
\n", "
" ], "text/plain": [ " timestamp bid \\\n", "0 1594245605 9433.24_2.30558256|9433.07_1.6680282|9432.49_0... \n", "1 1594245606 9433.24_2.30558256|9433.07_1.6680282|9432.24_1... \n", "2 1594245607 9433.24_2.32361076|9433.07_1.65|9432.24_1.65|9... \n", "3 1594245608 9433.61_0.60504644|9433.51_1.15223697|9433.25_... \n", "4 1594245609 9433.51_1.17026517|9433.25_1.65|9433.24_1.65|9... \n", "\n", " ask \n", "0 9433.25_0.2|9433.63_0.159|9433.68_0.06|9434_0.... \n", "1 9433.25_0.2|9433.63_0.159|9433.67_0.06|9434_0.... \n", "2 9433.25_0.17401468|9433.63_0.159|9433.67_0.06|... \n", "3 9433.62_0.2|9433.63_0.159|9433.66_0.06|9434_0.... \n", "4 9433.52_0.637005|9433.61_0.10947951|9433.62_0.... " ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "raw_df = pd.read_csv(\"btcusd_2020-07-08.zip\")\n", "raw_df.head()" ] }, { "cell_type": "markdown", "id": "aeff76b6-9340-454f-bd2d-2a1d81fa04a5", "metadata": {}, "source": [ "読み込んだデータは雑然データのため、整然データに整形します。\n", "\n", "```{hint}\n", "整然データとは何か: https://id.fnshr.info/2017/01/09/tidy-data-intro/\n", "```" ] }, { "cell_type": "code", "execution_count": 3, "id": "e0334891-67bb-4605-9282-cf17258a177f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
pricesizetimestampside
2020-07-08 22:00:059433.242.3055832020-07-08 22:00:05bid
2020-07-08 22:00:059434.321.0000002020-07-08 22:00:05ask
2020-07-08 22:00:059434.350.5301002020-07-08 22:00:05ask
2020-07-08 22:00:059434.900.0158482020-07-08 22:00:05ask
2020-07-08 22:00:059435.000.0500002020-07-08 22:00:05ask
\n", "
" ], "text/plain": [ " price size timestamp side\n", "2020-07-08 22:00:05 9433.24 2.305583 2020-07-08 22:00:05 bid\n", "2020-07-08 22:00:05 9434.32 1.000000 2020-07-08 22:00:05 ask\n", "2020-07-08 22:00:05 9434.35 0.530100 2020-07-08 22:00:05 ask\n", "2020-07-08 22:00:05 9434.90 0.015848 2020-07-08 22:00:05 ask\n", "2020-07-08 22:00:05 9435.00 0.050000 2020-07-08 22:00:05 ask" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def format_dataframe_by_side(df, side):\n", " new_df = (\n", " df.loc[:, side]\n", " .str.split(\"|\")\n", " .explode()\n", " .str.split(\"_\", expand=True)\n", " .rename({0: \"price\", 1: \"size\"}, axis=1)\n", " )\n", " new_df = new_df.astype(float)\n", " new_df.loc[:, \"timestamp\"] = pd.to_datetime(raw_df.loc[:, \"timestamp\"], unit=\"s\")\n", " new_df.loc[:, \"side\"] = side\n", " new_df.set_index(\"timestamp\", drop=False, inplace=True)\n", " new_df.index.name = None\n", " return new_df\n", "\n", "\n", "def format_dataframe(df):\n", " new_df = pd.concat(\n", " [format_dataframe_by_side(df, \"bid\"), format_dataframe_by_side(df, \"ask\")]\n", " )\n", " return new_df.sort_index()\n", "\n", "\n", "df = format_dataframe(raw_df)\n", "df.head()" ] }, { "cell_type": "markdown", "id": "b38eb8b2-1dea-4a60-9d06-cdabe235903a", "metadata": {}, "source": [ "ここからは関数の内容を分解して解説します。" ] }, { "cell_type": "code", "execution_count": 4, "id": "5b840a25-cb11-4f2a-80be-dd622e098166", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
timestampbidask
015942456059433.24_2.30558256|9433.07_1.6680282|9432.49_0...9433.25_0.2|9433.63_0.159|9433.68_0.06|9434_0....
115942456069433.24_2.30558256|9433.07_1.6680282|9432.24_1...9433.25_0.2|9433.63_0.159|9433.67_0.06|9434_0....
215942456079433.24_2.32361076|9433.07_1.65|9432.24_1.65|9...9433.25_0.17401468|9433.63_0.159|9433.67_0.06|...
315942456089433.61_0.60504644|9433.51_1.15223697|9433.25_...9433.62_0.2|9433.63_0.159|9433.66_0.06|9434_0....
415942456099433.51_1.17026517|9433.25_1.65|9433.24_1.65|9...9433.52_0.637005|9433.61_0.10947951|9433.62_0....
............
706315942527959435.27_16.94086454|9435.26_0.13966131|9435.05...9435.28_8.05085315|9435.3_0.110163|9439.5_0.11...
706415942527969435.27_16.94086454|9435.26_0.13966131|9435.03...9435.28_7.80012384|9435.3_0.110163|9439.5_0.11...
706515942527979435.27_16.94086454|9435.26_0.13966131|9435.04...9435.28_7.79200253|9435.3_0.110163|9439.41_0.0...
706615942527989435.27_17.05242522|9435.26_0.13966131|9435.03...9435.28_7.79200253|9435.3_0.110163|9439.4_0.11...
706715942527999435.27_17.05242522|9435.26_0.13966131|9435.04...9435.28_7.00936353|9435.3_0.110163|9439.4_0.05...
\n", "

7068 rows × 3 columns

\n", "
" ], "text/plain": [ " timestamp bid \\\n", "0 1594245605 9433.24_2.30558256|9433.07_1.6680282|9432.49_0... \n", "1 1594245606 9433.24_2.30558256|9433.07_1.6680282|9432.24_1... \n", "2 1594245607 9433.24_2.32361076|9433.07_1.65|9432.24_1.65|9... \n", "3 1594245608 9433.61_0.60504644|9433.51_1.15223697|9433.25_... \n", "4 1594245609 9433.51_1.17026517|9433.25_1.65|9433.24_1.65|9... \n", "... ... ... \n", "7063 1594252795 9435.27_16.94086454|9435.26_0.13966131|9435.05... \n", "7064 1594252796 9435.27_16.94086454|9435.26_0.13966131|9435.03... \n", "7065 1594252797 9435.27_16.94086454|9435.26_0.13966131|9435.04... \n", "7066 1594252798 9435.27_17.05242522|9435.26_0.13966131|9435.03... \n", "7067 1594252799 9435.27_17.05242522|9435.26_0.13966131|9435.04... \n", "\n", " ask \n", "0 9433.25_0.2|9433.63_0.159|9433.68_0.06|9434_0.... \n", "1 9433.25_0.2|9433.63_0.159|9433.67_0.06|9434_0.... \n", "2 9433.25_0.17401468|9433.63_0.159|9433.67_0.06|... \n", "3 9433.62_0.2|9433.63_0.159|9433.66_0.06|9434_0.... \n", "4 9433.52_0.637005|9433.61_0.10947951|9433.62_0.... \n", "... ... \n", "7063 9435.28_8.05085315|9435.3_0.110163|9439.5_0.11... \n", "7064 9435.28_7.80012384|9435.3_0.110163|9439.5_0.11... \n", "7065 9435.28_7.79200253|9435.3_0.110163|9439.41_0.0... \n", "7066 9435.28_7.79200253|9435.3_0.110163|9439.4_0.11... \n", "7067 9435.28_7.00936353|9435.3_0.110163|9439.4_0.05... \n", "\n", "[7068 rows x 3 columns]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "raw_df" ] }, { "cell_type": "markdown", "id": "c1f41b22-9ef0-4a24-8306-1ab78e24b31c", "metadata": {}, "source": [ "「bid」列、「ask」列は板情報が `|` の文字列で連結されています。これを `str.split` メソッドで分割します。" ] }, { "cell_type": "code", "execution_count": 5, "id": "dfd51790-9ef1-402d-8fc7-de4131c6f598", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 [9433.24_2.30558256, 9433.07_1.6680282, 9432.4...\n", "1 [9433.24_2.30558256, 9433.07_1.6680282, 9432.2...\n", "2 [9433.24_2.32361076, 9433.07_1.65, 9432.24_1.6...\n", "3 [9433.61_0.60504644, 9433.51_1.15223697, 9433....\n", "4 [9433.51_1.17026517, 9433.25_1.65, 9433.24_1.6...\n", "Name: bid, dtype: object" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "split_ser = raw_df.loc[:, \"bid\"].str.split(\"|\")\n", "split_ser.head()" ] }, { "cell_type": "markdown", "id": "02682c91-7dbb-46b7-b30f-46404cad271b", "metadata": {}, "source": [ "Seriesの各要素がリスト型になっていることが確認できます。" ] }, { "cell_type": "code", "execution_count": 6, "id": "8f250ebf-c6ac-4316-b3fd-7b24de90bf0f", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['9433.24_2.30558256',\n", " '9433.07_1.6680282',\n", " '9432.49_0.04',\n", " '9432.24_1.65',\n", " '9432.16_1.65',\n", " '9432.15_1.65',\n", " '9432.09_0.25112245',\n", " '9432_0.60011883',\n", " '9431.6_1.585',\n", " '9430.88_1.586']" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "split_ser[0]" ] }, { "cell_type": "code", "execution_count": 7, "id": "5c790931-ff9a-4111-87ee-ba4e960ac084", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "list" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(split_ser[0])" ] }, { "cell_type": "markdown", "id": "ea288793-6f0c-419e-83b3-1db014b4faaa", "metadata": {}, "source": [ "`explode` メソッドは各要素のオブジェクトを行に展開します。\n", "\n", "```{tip}\n", "`explode` メソッドは整然データを生成する際に便利です\n", "```" ] }, { "cell_type": "code", "execution_count": 8, "id": "aad17ebc-159c-4dcb-a5de-fa96f15cb77d", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 9433.24_2.30558256\n", "0 9433.07_1.6680282\n", "0 9432.49_0.04\n", "0 9432.24_1.65\n", "0 9432.16_1.65\n", " ... \n", "7067 9435_0.0477\n", "7067 9434.65_0.06359115\n", "7067 9434.64_2\n", "7067 9434.39_0.555\n", "7067 9434.25_0.203\n", "Name: bid, Length: 70680, dtype: object" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "explode_ser = split_ser.explode()\n", "explode_ser" ] }, { "cell_type": "markdown", "id": "11eb7caf-a64c-45c6-b8f4-eba96a5286f6", "metadata": {}, "source": [ "`explode_ser` の各要素は `価格_サイズ(枚数)` の形式になっています。これを `str.split` メソッドで分割します。\n", "\n", "```{tip}\n", "`str.split` メソッドの引数 `expand=True` にすることで、分割されたデータがDataFrameの列として展開されます\n", "```" ] }, { "cell_type": "code", "execution_count": 9, "id": "977627fd-37af-4d68-a5e5-2f2a891ad171", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
01
09433.242.30558256
09433.071.6680282
09432.490.04
09432.241.65
09432.161.65
.........
706794350.0477
70679434.650.06359115
70679434.642
70679434.390.555
70679434.250.203
\n", "

70680 rows × 2 columns

\n", "
" ], "text/plain": [ " 0 1\n", "0 9433.24 2.30558256\n", "0 9433.07 1.6680282\n", "0 9432.49 0.04\n", "0 9432.24 1.65\n", "0 9432.16 1.65\n", "... ... ...\n", "7067 9435 0.0477\n", "7067 9434.65 0.06359115\n", "7067 9434.64 2\n", "7067 9434.39 0.555\n", "7067 9434.25 0.203\n", "\n", "[70680 rows x 2 columns]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bid_df = explode_ser.str.split(\"_\", expand=True)\n", "bid_df" ] }, { "cell_type": "markdown", "id": "a2086880-e32d-4c60-a0fa-5396a2ca9fe2", "metadata": {}, "source": [ "列名を変更します。" ] }, { "cell_type": "code", "execution_count": 10, "id": "7180e2e2-c448-450f-84d5-0ab3053e3a3d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
pricesize
09433.242.30558256
09433.071.6680282
09432.490.04
09432.241.65
09432.161.65
.........
706794350.0477
70679434.650.06359115
70679434.642
70679434.390.555
70679434.250.203
\n", "

70680 rows × 2 columns

\n", "
" ], "text/plain": [ " price size\n", "0 9433.24 2.30558256\n", "0 9433.07 1.6680282\n", "0 9432.49 0.04\n", "0 9432.24 1.65\n", "0 9432.16 1.65\n", "... ... ...\n", "7067 9435 0.0477\n", "7067 9434.65 0.06359115\n", "7067 9434.64 2\n", "7067 9434.39 0.555\n", "7067 9434.25 0.203\n", "\n", "[70680 rows x 2 columns]" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bid_df = bid_df.rename({0: \"price\", 1: \"size\"}, axis=1)\n", "bid_df" ] }, { "cell_type": "markdown", "id": "f28ed9e5-5a34-44ee-bc8e-5eb9c7657ae8", "metadata": {}, "source": [ "データ型をfloat型に変換します。" ] }, { "cell_type": "code", "execution_count": 11, "id": "b43a0718-3cdd-41a3-9edc-fd73fc91dca9", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "price float64\n", "size float64\n", "dtype: object" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bid_df = bid_df.astype(float)\n", "bid_df.dtypes" ] }, { "cell_type": "markdown", "id": "b1912b15-e957-4e16-a234-5ca2971a161a", "metadata": {}, "source": [ "`raw_df` の「timestamp」列をdatetime型に変換し、 `new_df` の「timestamp」列として追加します。" ] }, { "cell_type": "code", "execution_count": 12, "id": "194ddc03-dff0-4ac2-8ae1-88e86236c3a7", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "price float64\n", "size float64\n", "timestamp datetime64[ns]\n", "dtype: object" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bid_df.loc[:, \"timestamp\"] = pd.to_datetime(raw_df.loc[:, \"timestamp\"], unit=\"s\")\n", "bid_df.dtypes" ] }, { "cell_type": "code", "execution_count": 13, "id": "070dd4a4-a3fa-4bce-a238-7afc9a8f61ef", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
pricesizetimestamp
09433.242.3055832020-07-08 22:00:05
09433.071.6680282020-07-08 22:00:05
09432.490.0400002020-07-08 22:00:05
09432.241.6500002020-07-08 22:00:05
09432.161.6500002020-07-08 22:00:05
\n", "
" ], "text/plain": [ " price size timestamp\n", "0 9433.24 2.305583 2020-07-08 22:00:05\n", "0 9433.07 1.668028 2020-07-08 22:00:05\n", "0 9432.49 0.040000 2020-07-08 22:00:05\n", "0 9432.24 1.650000 2020-07-08 22:00:05\n", "0 9432.16 1.650000 2020-07-08 22:00:05" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bid_df.head()" ] }, { "cell_type": "markdown", "id": "faf85bfa-a446-45d6-8d82-3a1c850b9fe0", "metadata": {}, "source": [ "「side」列を追加し、値に `\"bid\"` を入力します。" ] }, { "cell_type": "code", "execution_count": 14, "id": "5013408b-cfbd-4c56-8e55-8111f49a33a2", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
pricesizetimestampside
09433.242.3055832020-07-08 22:00:05bid
09433.071.6680282020-07-08 22:00:05bid
09432.490.0400002020-07-08 22:00:05bid
09432.241.6500002020-07-08 22:00:05bid
09432.161.6500002020-07-08 22:00:05bid
\n", "
" ], "text/plain": [ " price size timestamp side\n", "0 9433.24 2.305583 2020-07-08 22:00:05 bid\n", "0 9433.07 1.668028 2020-07-08 22:00:05 bid\n", "0 9432.49 0.040000 2020-07-08 22:00:05 bid\n", "0 9432.24 1.650000 2020-07-08 22:00:05 bid\n", "0 9432.16 1.650000 2020-07-08 22:00:05 bid" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bid_df.loc[:, \"side\"] = \"bid\"\n", "bid_df.head()" ] }, { "cell_type": "markdown", "id": "d8f3326c-2ea0-4588-867b-08244b48b7f8", "metadata": {}, "source": [ "`set_index` メソッドで「timestamp」列をインデックスにします。\n", "\n", "```{tip}\n", "`set_index` メソッドの引数 `drop=False` では、インデックス対象列を列としてそのまま残します\n", "```\n", "\n", "```{tip}\n", "`bid_df.index.name` 属性はインデックスの名前が格納されており、この属性を操作してインデックス名を変更できます\n", "```" ] }, { "cell_type": "code", "execution_count": 15, "id": "8e3662fc-a4e6-4a6b-9b9f-4945ba53d99d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
pricesizetimestampside
2020-07-08 22:00:059433.242.3055832020-07-08 22:00:05bid
2020-07-08 22:00:059433.071.6680282020-07-08 22:00:05bid
2020-07-08 22:00:059432.490.0400002020-07-08 22:00:05bid
2020-07-08 22:00:059432.241.6500002020-07-08 22:00:05bid
2020-07-08 22:00:059432.161.6500002020-07-08 22:00:05bid
\n", "
" ], "text/plain": [ " price size timestamp side\n", "2020-07-08 22:00:05 9433.24 2.305583 2020-07-08 22:00:05 bid\n", "2020-07-08 22:00:05 9433.07 1.668028 2020-07-08 22:00:05 bid\n", "2020-07-08 22:00:05 9432.49 0.040000 2020-07-08 22:00:05 bid\n", "2020-07-08 22:00:05 9432.24 1.650000 2020-07-08 22:00:05 bid\n", "2020-07-08 22:00:05 9432.16 1.650000 2020-07-08 22:00:05 bid" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bid_df.set_index(\"timestamp\", drop=False, inplace=True)\n", "bid_df.index.name = None\n", "bid_df.head()" ] }, { "cell_type": "markdown", "id": "89ae928c-24fa-4909-98a8-3b28196b313f", "metadata": {}, "source": [ "ask側で同様の処理をします。" ] }, { "cell_type": "code", "execution_count": 16, "id": "87928c69-fd33-447f-8374-be6c0c3ab4c4", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
pricesizetimestampside
2020-07-08 22:00:059433.250.2002020-07-08 22:00:05ask
2020-07-08 22:00:059433.630.1592020-07-08 22:00:05ask
2020-07-08 22:00:059433.680.0602020-07-08 22:00:05ask
2020-07-08 22:00:059434.000.7502020-07-08 22:00:05ask
2020-07-08 22:00:059434.321.0002020-07-08 22:00:05ask
\n", "
" ], "text/plain": [ " price size timestamp side\n", "2020-07-08 22:00:05 9433.25 0.200 2020-07-08 22:00:05 ask\n", "2020-07-08 22:00:05 9433.63 0.159 2020-07-08 22:00:05 ask\n", "2020-07-08 22:00:05 9433.68 0.060 2020-07-08 22:00:05 ask\n", "2020-07-08 22:00:05 9434.00 0.750 2020-07-08 22:00:05 ask\n", "2020-07-08 22:00:05 9434.32 1.000 2020-07-08 22:00:05 ask" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ask_df = (\n", " raw_df.loc[:, \"ask\"]\n", " .str.split(\"|\")\n", " .explode()\n", " .str.split(\"_\", expand=True)\n", " .rename({0: \"price\", 1: \"size\"}, axis=1)\n", ")\n", "ask_df = ask_df.astype(float)\n", "ask_df.loc[:, \"timestamp\"] = pd.to_datetime(raw_df.loc[:, \"timestamp\"], unit=\"s\")\n", "ask_df.loc[:, \"side\"] = \"ask\"\n", "ask_df.set_index(\"timestamp\", drop=False, inplace=True)\n", "ask_df.index.name = None\n", "ask_df.head()" ] }, { "cell_type": "markdown", "id": "e6d9a83f-2a6f-48f0-8d14-935ebd24680b", "metadata": {}, "source": [ "`concat` 関数で `bid_df` と `ask_df` を連結します。" ] }, { "cell_type": "code", "execution_count": 17, "id": "bfd77126-1ba8-4ccf-8e1e-c6518bd79317", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
pricesizetimestampside
2020-07-08 22:00:059433.242.3055832020-07-08 22:00:05bid
2020-07-08 22:00:059433.071.6680282020-07-08 22:00:05bid
2020-07-08 22:00:059432.490.0400002020-07-08 22:00:05bid
2020-07-08 22:00:059432.241.6500002020-07-08 22:00:05bid
2020-07-08 22:00:059432.161.6500002020-07-08 22:00:05bid
\n", "
" ], "text/plain": [ " price size timestamp side\n", "2020-07-08 22:00:05 9433.24 2.305583 2020-07-08 22:00:05 bid\n", "2020-07-08 22:00:05 9433.07 1.668028 2020-07-08 22:00:05 bid\n", "2020-07-08 22:00:05 9432.49 0.040000 2020-07-08 22:00:05 bid\n", "2020-07-08 22:00:05 9432.24 1.650000 2020-07-08 22:00:05 bid\n", "2020-07-08 22:00:05 9432.16 1.650000 2020-07-08 22:00:05 bid" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "new_df = pd.concat([bid_df, ask_df])\n", "new_df.head()" ] }, { "cell_type": "code", "execution_count": 18, "id": "ad9323ec-f69f-4394-876e-c30a5bfb1429", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
pricesizetimestampside
2020-07-08 23:59:599439.900.5000002020-07-08 23:59:59ask
2020-07-08 23:59:599440.310.1000002020-07-08 23:59:59ask
2020-07-08 23:59:599440.331.5000002020-07-08 23:59:59ask
2020-07-08 23:59:599440.611.1978472020-07-08 23:59:59ask
2020-07-08 23:59:599440.630.3931512020-07-08 23:59:59ask
\n", "
" ], "text/plain": [ " price size timestamp side\n", "2020-07-08 23:59:59 9439.90 0.500000 2020-07-08 23:59:59 ask\n", "2020-07-08 23:59:59 9440.31 0.100000 2020-07-08 23:59:59 ask\n", "2020-07-08 23:59:59 9440.33 1.500000 2020-07-08 23:59:59 ask\n", "2020-07-08 23:59:59 9440.61 1.197847 2020-07-08 23:59:59 ask\n", "2020-07-08 23:59:59 9440.63 0.393151 2020-07-08 23:59:59 ask" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "new_df.tail()" ] }, { "cell_type": "markdown", "id": "a4740589-5857-4891-bab0-276122c34758", "metadata": {}, "source": [ "インデックスの降順にソートします。" ] }, { "cell_type": "code", "execution_count": 19, "id": "4902178d-c41d-4ada-949c-6014da0392e3", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
pricesizetimestampside
2020-07-08 22:00:059433.242.3055832020-07-08 22:00:05bid
2020-07-08 22:00:059434.321.0000002020-07-08 22:00:05ask
2020-07-08 22:00:059434.350.5301002020-07-08 22:00:05ask
2020-07-08 22:00:059434.900.0158482020-07-08 22:00:05ask
2020-07-08 22:00:059435.000.0500002020-07-08 22:00:05ask
...............
2020-07-08 23:59:599435.040.0500002020-07-08 23:59:59bid
2020-07-08 23:59:599435.260.1396612020-07-08 23:59:59bid
2020-07-08 23:59:599435.2717.0524252020-07-08 23:59:59bid
2020-07-08 23:59:599440.611.1978472020-07-08 23:59:59ask
2020-07-08 23:59:599440.630.3931512020-07-08 23:59:59ask
\n", "

141360 rows × 4 columns

\n", "
" ], "text/plain": [ " price size timestamp side\n", "2020-07-08 22:00:05 9433.24 2.305583 2020-07-08 22:00:05 bid\n", "2020-07-08 22:00:05 9434.32 1.000000 2020-07-08 22:00:05 ask\n", "2020-07-08 22:00:05 9434.35 0.530100 2020-07-08 22:00:05 ask\n", "2020-07-08 22:00:05 9434.90 0.015848 2020-07-08 22:00:05 ask\n", "2020-07-08 22:00:05 9435.00 0.050000 2020-07-08 22:00:05 ask\n", "... ... ... ... ...\n", "2020-07-08 23:59:59 9435.04 0.050000 2020-07-08 23:59:59 bid\n", "2020-07-08 23:59:59 9435.26 0.139661 2020-07-08 23:59:59 bid\n", "2020-07-08 23:59:59 9435.27 17.052425 2020-07-08 23:59:59 bid\n", "2020-07-08 23:59:59 9440.61 1.197847 2020-07-08 23:59:59 ask\n", "2020-07-08 23:59:59 9440.63 0.393151 2020-07-08 23:59:59 ask\n", "\n", "[141360 rows x 4 columns]" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "new_df.sort_index()" ] }, { "cell_type": "markdown", "id": "2183936a-bb8b-4ad2-b8bb-7fd426631675", "metadata": {}, "source": [ "DataFrameを `to_pickle` メソッドで直列化し、ほかのNotebookでも利用できるようにします。" ] }, { "cell_type": "code", "execution_count": 20, "id": "048fdbdb-67b7-45ee-a837-cc06d5b4c9e2", "metadata": {}, "outputs": [], "source": [ "df.to_pickle(\"btcusd_2020-07-08.pickle\", protocol=4)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.6" } }, "nbformat": 4, "nbformat_minor": 5 }