{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Groupby と Resample \n", "\n", "- 参照\n", " - [Group by: split-apply-combine — pandas 1.4.1 documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html#group-by-split-apply-combine)\n", " - [Resampling — pandas 1.4.1 documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#resampling)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Groupbyとは\n", "\n", "1. 1つのデータを複数のグループに分割する(Splitting)\n", "1. 分割した各データに関数を適用して値を得る (Applying)\n", "1. 2で得た値をデータに一つにまとめる (Combining)\n" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "\n", "df = pd.DataFrame(\n", " {\n", " \"date\": pd.date_range(start=\"2000-1-1 0:0:0\", periods=9, freq=\"H\"),\n", " \"class\": np.array([\"A\", \"B\", \"C\"]).repeat(3),\n", " \"value A\": np.arange(1,10),\n", " \"value B\": np.arange(1,10) * 100,\n", " }\n", ")\n", "# データ確認\n", "df" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# class 毎にデータを分割\n", "df_grouped = df.groupby(by=\"class\")\n", "df_grouped\n" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "# グループ化した各データの \"value A\" カラムに max 関数を適用し、一つのデータにまとめる\n", "mx = df_grouped[[\"value A\", \"value B\"]].max()\n" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
value Avalue B
class
A3300
B6600
C9900
\n", "
" ], "text/plain": [ " value A value B\n", "class \n", "A 3 300\n", "B 6 600\n", "C 9 900" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# まとめたデータを確認\n", "mx" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{tip} \n", "- `by=` に渡す column 名は、複数指定可。その場合はリストで渡す。\n", "- 適用できるメソッド一覧\n", " - [GroupBy Function application — pandas 1.4.1 documentation](https://pandas.pydata.org/docs/reference/groupby.html#function-application)\n", "- 複数適用させたい場合は、 `agg` もしくは `aggregate` メソッドを使う\n", " - [pandas.core.groupby.SeriesGroupBy.aggregate — pandas 1.4.1 documentation](https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.SeriesGroupBy.aggregate.html)\n", "- 自作の関数を使いたい場合は、`apply` メソッドを使う\n", " - [pandas.core.groupby.GroupBy.apply — pandas 1.4.1 documentation](https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.GroupBy.apply.html)\n", "```\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 分割されたデータを確認したい\n", "\n", "1. groupby で得られた `groupby object` をリストやループに入れる。ただし巨大なデータは時間がかかる\n", "1. `groupby object` のメソッドである、 `.get_group()` を使う\n" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[('A',\n", " date class value A value B\n", " 0 2000-01-01 00:00:00 A 1 100\n", " 1 2000-01-01 01:00:00 A 2 200\n", " 2 2000-01-01 02:00:00 A 3 300),\n", " ('B',\n", " date class value A value B\n", " 3 2000-01-01 03:00:00 B 4 400\n", " 4 2000-01-01 04:00:00 B 5 500\n", " 5 2000-01-01 05:00:00 B 6 600),\n", " ('C',\n", " date class value A value B\n", " 6 2000-01-01 06:00:00 C 7 700\n", " 7 2000-01-01 07:00:00 C 8 800\n", " 8 2000-01-01 08:00:00 C 9 900)]" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(df_grouped)" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " date class value A value B\n", "0 2000-01-01 00:00:00 A 1 100\n", "1 2000-01-01 01:00:00 A 2 200\n", "2 2000-01-01 02:00:00 A 3 300\n", " date class value A value B\n", "3 2000-01-01 03:00:00 B 4 400\n", "4 2000-01-01 04:00:00 B 5 500\n", "5 2000-01-01 05:00:00 B 6 600\n", " date class value A value B\n", "6 2000-01-01 06:00:00 C 7 700\n", "7 2000-01-01 07:00:00 C 8 800\n", "8 2000-01-01 08:00:00 C 9 900\n" ] } ], "source": [ "for k, df in df_grouped:\n", " print(df)" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
dateclassvalue Avalue B
02000-01-01 00:00:00A1100
12000-01-01 01:00:00A2200
22000-01-01 02:00:00A3300
\n", "
" ], "text/plain": [ " date class value A value B\n", "0 2000-01-01 00:00:00 A 1 100\n", "1 2000-01-01 01:00:00 A 2 200\n", "2 2000-01-01 02:00:00 A 3 300" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_grouped.get_group(\"A\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Resample とは\n", "\n", "- \"時間\"で Groupby すること\n", "- `.groupby()` メソッドではなく `.resample()` メソッドを使う\n", "- `groupby()` との違いとして、以下3つをまずは抑えてください。\n", " 1. Datetimeindex や Periodindex といった、**時間を表すindexを持つデータにしか使えない**\n", " 2. どの時間の単位でデータを分割するかを指定する。その際に渡す文字列を \"Frequency String\" と呼ぶ。\n", " - 参照:[Frequency String](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#dateoffset-objects)\n", " 3. `label` オプション\n", " - どちらのエッジでラベルをつけるかを指定" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
classvalue Avalue B
date
2000-01-01 00:00:00A1100
2000-01-01 01:00:00A2200
2000-01-01 02:00:00A3300
2000-01-01 03:00:00B4400
2000-01-01 04:00:00B5500
2000-01-01 05:00:00B6600
2000-01-01 06:00:00C7700
2000-01-01 07:00:00C8800
2000-01-01 08:00:00C9900
\n", "
" ], "text/plain": [ " class value A value B\n", "date \n", "2000-01-01 00:00:00 A 1 100\n", "2000-01-01 01:00:00 A 2 200\n", "2000-01-01 02:00:00 A 3 300\n", "2000-01-01 03:00:00 B 4 400\n", "2000-01-01 04:00:00 B 5 500\n", "2000-01-01 05:00:00 B 6 600\n", "2000-01-01 06:00:00 C 7 700\n", "2000-01-01 07:00:00 C 8 800\n", "2000-01-01 08:00:00 C 9 900" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 1. 時間を表す index を持つデータに対してのみ使用可\n", "\n", "df = pd.DataFrame(\n", " {\n", " \"date\": pd.date_range(start=\"2000-1-1 0:0:0\", periods=9, freq=\"H\"),\n", " \"class\": np.array([\"A\", \"B\", \"C\"]).repeat(3),\n", " \"value A\": np.arange(1,10),\n", " \"value B\": np.arange(1,10) * 100,\n", " }\n", ")\n", "\n", "# date コラムを、このデータのインデックスに設定して上書き\n", "df.set_index(\"date\", inplace=True)\n", "df" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "DatetimeIndex(['2000-01-01 00:00:00', '2000-01-01 01:00:00',\n", " '2000-01-01 02:00:00', '2000-01-01 03:00:00',\n", " '2000-01-01 04:00:00', '2000-01-01 05:00:00',\n", " '2000-01-01 06:00:00', '2000-01-01 07:00:00',\n", " '2000-01-01 08:00:00'],\n", " dtype='datetime64[ns]', name='date', freq=None)" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.index" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 2. Frequency String\n", "# resample() メソッドは、Resampler object を返す\n", "df_resampled = df.resample(\"2H\")\n", "df_resampled\n" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "date\n", "2000-01-01 00:00:00 2\n", "2000-01-01 02:00:00 4\n", "2000-01-01 04:00:00 6\n", "2000-01-01 06:00:00 8\n", "2000-01-01 08:00:00 9\n", "Freq: 2H, Name: value A, dtype: int64" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# あとはgroupby と同様\n", "# 2時間毎に、Value Aの max を取得\n", "df_resampled[\"value A\"].max()" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "date\n", "2000-01-01 02:00:00 2\n", "2000-01-01 04:00:00 4\n", "2000-01-01 06:00:00 6\n", "2000-01-01 08:00:00 8\n", "2000-01-01 10:00:00 9\n", "Freq: 2H, Name: value A, dtype: int64" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 3. label オプション\n", "# label = 'right' を指定すると、bin の最後のエッジがラベルになる\n", "df.resample(\"2H\", label = \"right\")[\"value A\"].max()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{tip} \n", "- 曜日毎にグループ化したい場合は resample ではなく groupby \n", " - `index.strftime(\"%w\")` で曜日番号、もしくは `index.strftime(\"%a\")` で曜日文字列を得て、カラムに追加。そのカラムで groupby する\n", " - `.resample(\"w\")` は 週\n", "\n", "- 各国の営業日や、マーケットのオープン時間などを知りたい場合\n", " - [pandas-market-calendars · PyPI](https://pypi.org/project/pandas-market-calendars/)\n", " - [Calendar Status — pandas_market_calendars 3.4 documentation](https://pandas-market-calendars.readthedocs.io/en/latest/calendars.html)\n", "```\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 金融データのResampling\n", "\n", "\n", "### 取引データからOHLCVを作成\n", "1. 取引データ取得\n", "1. Datetimeindex を持つ DataFrame に変換\n", "1. 時間単位を指定してResampler Objectを作成\n", "1. `.ohlc()` メソッドを適用\n", "1. `.sum()` メソッドを適用(出来高)\n", "1. `.count()` メソッドを適用(取引回数)\n", "1. 表示" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [], "source": [ "import asyncio\n", "\n", "import nest_asyncio\n", "import pandas as pd\n", "import plotly.graph_objects as go\n", "import pybotters\n", "from IPython.display import HTML\n", "\n", "nest_asyncio.apply()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 1. pybotters 経由で FTX の取引データを取得" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[{'id': 3254047916,\n", " 'price': 38526.0,\n", " 'size': 0.1464,\n", " 'side': 'buy',\n", " 'liquidation': False,\n", " 'time': '2022-02-01T14:59:59.690649+00:00'},\n", " {'id': 3254047915,\n", " 'price': 38526.0,\n", " 'size': 0.0536,\n", " 'side': 'buy',\n", " 'liquidation': False,\n", " 'time': '2022-02-01T14:59:59.690649+00:00'},\n", " {'id': 3254047888,\n", " 'price': 38524.0,\n", " 'size': 0.0047,\n", " 'side': 'sell',\n", " 'liquidation': False,\n", " 'time': '2022-02-01T14:59:59.288778+00:00'}]" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "async def get_trades(market_name, start_time, end_time):\n", " async with pybotters.Client(\n", " apis={\"ftx\": [\"\", \"\"]}, base_url=\"https://ftx.com/api\"\n", " ) as client:\n", " res = await client.get(\n", " f\"/markets/{market_name}/trades\",\n", " params={\n", " \"start_time\": start_time,\n", " \"end_time\": end_time,\n", " },\n", " )\n", " return await res.json()\n", "\n", "# 取得したデータを確認\n", "data = asyncio.run(get_trades(\"BTC-PERP\", 1643641200, 1643727600))\n", "data[\"result\"][:3]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 2. DataFrame の作成\n", "- `time` コラムを datetimeindex に持つDataFrameを作成" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
idpricesizesideliquidationtime
0325404791638526.00.1464buyFalse2022-02-01T14:59:59.690649+00:00
1325404791538526.00.0536buyFalse2022-02-01T14:59:59.690649+00:00
2325404788838524.00.0047sellFalse2022-02-01T14:59:59.288778+00:00
3325404787038524.00.1210sellFalse2022-02-01T14:59:59.078000+00:00
4325404785538524.00.0127sellFalse2022-02-01T14:59:58.777561+00:00
.....................
4995325399837938475.00.1500sellFalse2022-02-01T14:53:19.423013+00:00
4996325399837738475.00.0080sellFalse2022-02-01T14:53:19.370232+00:00
4997325399837638475.00.2960sellFalse2022-02-01T14:53:19.370232+00:00
4998325399837538475.00.6960sellFalse2022-02-01T14:53:19.370232+00:00
4999325399837438476.00.0014buyFalse2022-02-01T14:53:19.339001+00:00
\n", "

5000 rows × 6 columns

\n", "
" ], "text/plain": [ " id price size side liquidation \\\n", "0 3254047916 38526.0 0.1464 buy False \n", "1 3254047915 38526.0 0.0536 buy False \n", "2 3254047888 38524.0 0.0047 sell False \n", "3 3254047870 38524.0 0.1210 sell False \n", "4 3254047855 38524.0 0.0127 sell False \n", "... ... ... ... ... ... \n", "4995 3253998379 38475.0 0.1500 sell False \n", "4996 3253998377 38475.0 0.0080 sell False \n", "4997 3253998376 38475.0 0.2960 sell False \n", "4998 3253998375 38475.0 0.6960 sell False \n", "4999 3253998374 38476.0 0.0014 buy False \n", "\n", " time \n", "0 2022-02-01T14:59:59.690649+00:00 \n", "1 2022-02-01T14:59:59.690649+00:00 \n", "2 2022-02-01T14:59:59.288778+00:00 \n", "3 2022-02-01T14:59:59.078000+00:00 \n", "4 2022-02-01T14:59:58.777561+00:00 \n", "... ... \n", "4995 2022-02-01T14:53:19.423013+00:00 \n", "4996 2022-02-01T14:53:19.370232+00:00 \n", "4997 2022-02-01T14:53:19.370232+00:00 \n", "4998 2022-02-01T14:53:19.370232+00:00 \n", "4999 2022-02-01T14:53:19.339001+00:00 \n", "\n", "[5000 rows x 6 columns]" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame(data[\"result\"])\n", "df" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "id int64\n", "price float64\n", "size float64\n", "side object\n", "liquidation bool\n", "time object\n", "dtype: object" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.dtypes\n" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
idpricesizesideliquidation
time
2022-02-01 14:53:19.339001+00:00325399837438476.00.0014buyFalse
2022-02-01 14:53:19.370232+00:00325399837738475.00.0080sellFalse
2022-02-01 14:53:19.370232+00:00325399837538475.00.6960sellFalse
2022-02-01 14:53:19.370232+00:00325399837638475.00.2960sellFalse
2022-02-01 14:53:19.423013+00:00325399837938475.00.1500sellFalse
..................
2022-02-01 14:59:58.777561+00:00325404785538524.00.0127sellFalse
2022-02-01 14:59:59.078000+00:00325404787038524.00.1210sellFalse
2022-02-01 14:59:59.288778+00:00325404788838524.00.0047sellFalse
2022-02-01 14:59:59.690649+00:00325404791538526.00.0536buyFalse
2022-02-01 14:59:59.690649+00:00325404791638526.00.1464buyFalse
\n", "

5000 rows × 5 columns

\n", "
" ], "text/plain": [ " id price size side \\\n", "time \n", "2022-02-01 14:53:19.339001+00:00 3253998374 38476.0 0.0014 buy \n", "2022-02-01 14:53:19.370232+00:00 3253998377 38475.0 0.0080 sell \n", "2022-02-01 14:53:19.370232+00:00 3253998375 38475.0 0.6960 sell \n", "2022-02-01 14:53:19.370232+00:00 3253998376 38475.0 0.2960 sell \n", "2022-02-01 14:53:19.423013+00:00 3253998379 38475.0 0.1500 sell \n", "... ... ... ... ... \n", "2022-02-01 14:59:58.777561+00:00 3254047855 38524.0 0.0127 sell \n", "2022-02-01 14:59:59.078000+00:00 3254047870 38524.0 0.1210 sell \n", "2022-02-01 14:59:59.288778+00:00 3254047888 38524.0 0.0047 sell \n", "2022-02-01 14:59:59.690649+00:00 3254047915 38526.0 0.0536 buy \n", "2022-02-01 14:59:59.690649+00:00 3254047916 38526.0 0.1464 buy \n", "\n", " liquidation \n", "time \n", "2022-02-01 14:53:19.339001+00:00 False \n", "2022-02-01 14:53:19.370232+00:00 False \n", "2022-02-01 14:53:19.370232+00:00 False \n", "2022-02-01 14:53:19.370232+00:00 False \n", "2022-02-01 14:53:19.423013+00:00 False \n", "... ... \n", "2022-02-01 14:59:58.777561+00:00 False \n", "2022-02-01 14:59:59.078000+00:00 False \n", "2022-02-01 14:59:59.288778+00:00 False \n", "2022-02-01 14:59:59.690649+00:00 False \n", "2022-02-01 14:59:59.690649+00:00 False \n", "\n", "[5000 rows x 5 columns]" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# time を datetime 型に変更し、この dataframe の index として設定\n", "df = pd.DataFrame(data[\"result\"])\n", "df[\"time\"] = pd.to_datetime(df[\"time\"])\n", "df.set_index(\"time\", inplace=True)\n", "df.sort_index(inplace=True)\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3. resampler object を作成\n", "\n", "例:1分足で作成" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rule = \"1min\"\n", "df_resampled = df.resample(rule, label=\"right\")\n", "df_resampled\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 4. `.ohlc()` メソッドを適用\n", "- resampler オブジェクトのメソッドとして .ohlc() が用意されている\n", " - [pandas.core.resample.Resampler.ohlc — pandas 1.4.1 documentation](https://pandas.pydata.org/docs/reference/api/pandas.core.resample.Resampler.ohlc.html)\n", "- `price` データを OHLC 計算に使えば良い\n" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
openhighlowclose
time
2022-02-01 14:54:00+00:0038476.038538.038452.038516.0
2022-02-01 14:55:00+00:0038517.038520.038383.038395.0
2022-02-01 14:56:00+00:0038395.038461.038374.038417.0
2022-02-01 14:57:00+00:0038416.038417.038336.038395.0
2022-02-01 14:58:00+00:0038395.038478.038394.038476.0
2022-02-01 14:59:00+00:0038472.038473.038407.038473.0
2022-02-01 15:00:00+00:0038473.038550.038473.038526.0
\n", "
" ], "text/plain": [ " open high low close\n", "time \n", "2022-02-01 14:54:00+00:00 38476.0 38538.0 38452.0 38516.0\n", "2022-02-01 14:55:00+00:00 38517.0 38520.0 38383.0 38395.0\n", "2022-02-01 14:56:00+00:00 38395.0 38461.0 38374.0 38417.0\n", "2022-02-01 14:57:00+00:00 38416.0 38417.0 38336.0 38395.0\n", "2022-02-01 14:58:00+00:00 38395.0 38478.0 38394.0 38476.0\n", "2022-02-01 14:59:00+00:00 38472.0 38473.0 38407.0 38473.0\n", "2022-02-01 15:00:00+00:00 38473.0 38550.0 38473.0 38526.0" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_ohlc = df_resampled[\"price\"].ohlc()\n", "df_ohlc" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
openhighlowclose
time
2022-02-01 14:54:00+00:0038476.038538.038452.038516.0
2022-02-01 14:55:00+00:0038517.038520.038383.038395.0
2022-02-01 14:56:00+00:0038395.038461.038374.038417.0
2022-02-01 14:57:00+00:0038416.038417.038336.038395.0
2022-02-01 14:58:00+00:0038395.038478.038394.038476.0
2022-02-01 14:59:00+00:0038472.038473.038407.038473.0
2022-02-01 15:00:00+00:0038473.038550.038473.038526.0
\n", "
" ], "text/plain": [ " open high low close\n", "time \n", "2022-02-01 14:54:00+00:00 38476.0 38538.0 38452.0 38516.0\n", "2022-02-01 14:55:00+00:00 38517.0 38520.0 38383.0 38395.0\n", "2022-02-01 14:56:00+00:00 38395.0 38461.0 38374.0 38417.0\n", "2022-02-01 14:57:00+00:00 38416.0 38417.0 38336.0 38395.0\n", "2022-02-01 14:58:00+00:00 38395.0 38478.0 38394.0 38476.0\n", "2022-02-01 14:59:00+00:00 38472.0 38473.0 38407.0 38473.0\n", "2022-02-01 15:00:00+00:00 38473.0 38550.0 38473.0 38526.0" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# ohlc メソッドを使わずに、OHLCを作るには\n", "\n", "df_ohlc_2 = pd.DataFrame(\n", " {\n", " \"open\": df_resampled[\"price\"].first(),\n", " \"high\": df_resampled[\"price\"].max(),\n", " \"low\": df_resampled[\"price\"].min(),\n", " \"close\": df_resampled[\"price\"].last(),\n", " }\n", ")\n", "\n", "df_ohlc_2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 5. `.sum()` メソッドを適用(出来高)\n", "\n", "- size を合計して出来高を出す" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
openhighlowclosevolume
time
2022-02-01 14:54:00+00:0038476.038538.038452.038516.089.7768
2022-02-01 14:55:00+00:0038517.038520.038383.038395.0201.3913
2022-02-01 14:56:00+00:0038395.038461.038374.038417.0257.0133
2022-02-01 14:57:00+00:0038416.038417.038336.038395.0213.7127
2022-02-01 14:58:00+00:0038395.038478.038394.038476.0177.0822
2022-02-01 14:59:00+00:0038472.038473.038407.038473.094.0939
2022-02-01 15:00:00+00:0038473.038550.038473.038526.0123.6707
\n", "
" ], "text/plain": [ " open high low close volume\n", "time \n", "2022-02-01 14:54:00+00:00 38476.0 38538.0 38452.0 38516.0 89.7768\n", "2022-02-01 14:55:00+00:00 38517.0 38520.0 38383.0 38395.0 201.3913\n", "2022-02-01 14:56:00+00:00 38395.0 38461.0 38374.0 38417.0 257.0133\n", "2022-02-01 14:57:00+00:00 38416.0 38417.0 38336.0 38395.0 213.7127\n", "2022-02-01 14:58:00+00:00 38395.0 38478.0 38394.0 38476.0 177.0822\n", "2022-02-01 14:59:00+00:00 38472.0 38473.0 38407.0 38473.0 94.0939\n", "2022-02-01 15:00:00+00:00 38473.0 38550.0 38473.0 38526.0 123.6707" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_ohlc[\"volume\"] = df_resampled[\"size\"].sum()\n", "df_ohlc\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 6. `.count()` メソッドを適用(取引回数)\n", "\n", "- `id` を数えて取引回数を出す。\n", "- 行数のカウントなので `id` 以外でもよい" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
openhighlowclosevolumecount
time
2022-02-01 14:54:00+00:0038476.038538.038452.038516.089.7768410
2022-02-01 14:55:00+00:0038517.038520.038383.038395.0201.3913739
2022-02-01 14:56:00+00:0038395.038461.038374.038417.0257.01331097
2022-02-01 14:57:00+00:0038416.038417.038336.038395.0213.7127939
2022-02-01 14:58:00+00:0038395.038478.038394.038476.0177.0822763
2022-02-01 14:59:00+00:0038472.038473.038407.038473.094.0939451
2022-02-01 15:00:00+00:0038473.038550.038473.038526.0123.6707601
\n", "
" ], "text/plain": [ " open high low close volume count\n", "time \n", "2022-02-01 14:54:00+00:00 38476.0 38538.0 38452.0 38516.0 89.7768 410\n", "2022-02-01 14:55:00+00:00 38517.0 38520.0 38383.0 38395.0 201.3913 739\n", "2022-02-01 14:56:00+00:00 38395.0 38461.0 38374.0 38417.0 257.0133 1097\n", "2022-02-01 14:57:00+00:00 38416.0 38417.0 38336.0 38395.0 213.7127 939\n", "2022-02-01 14:58:00+00:00 38395.0 38478.0 38394.0 38476.0 177.0822 763\n", "2022-02-01 14:59:00+00:00 38472.0 38473.0 38407.0 38473.0 94.0939 451\n", "2022-02-01 15:00:00+00:00 38473.0 38550.0 38473.0 38526.0 123.6707 601" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_ohlc[\"count\"] = df_resampled[\"id\"].count()\n", "df_ohlc" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 関数化\n", "\n" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [], "source": [ "def generate_ohlcv(df_resampled):\n", " df_ohlc = df_resampled[\"price\"].ohlc()\n", " df_ohlc[\"volume\"] = df_resampled[\"size\"].sum()\n", " df_ohlc[\"count\"] = df_resampled[\"id\"].count()\n", " return df_ohlc\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### OHLCV を sell と buy で分けて作成\n", "- FTX から取得したデータに `side` があるので、`side` で groupby して、resample する" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [], "source": [ "rule = \"1min\"\n", "df_buy_resampled = df.groupby(\"side\").get_group(\"buy\").resample(rule, label=\"right\")\n", "df_sell_resampled = df.groupby(\"side\").get_group(\"sell\").resample(rule, label=\"right\")\n", "df_buy = generate_ohlcv(df_buy_resampled)\n", "df_sell = generate_ohlcv(df_sell_resampled)" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [], "source": [ "# コラム名をリネーム\n", "df_buy.rename(columns={c:f\"{c}_buy\" for c in df_buy.columns}, inplace=True)\n", "df_sell.rename(columns={c:f\"{c}_sell\" for c in df_sell.columns}, inplace=True)\n" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
open_buyhigh_buylow_buyclose_buyvolume_buycount_buyopen_sellhigh_selllow_sellclose_sellvolume_sellcount_sell
time
2022-02-01 14:54:00+00:0038476.038536.038453.038517.029.243115938475.038538.038452.038516.060.5337251
2022-02-01 14:55:00+00:0038517.038517.038392.038395.062.048222438520.038520.038383.038391.0139.3431515
2022-02-01 14:56:00+00:0038395.038461.038375.038417.0130.645950538394.038460.038374.038409.0126.3674592
2022-02-01 14:57:00+00:0038407.038417.038340.038395.089.194342438416.038416.038336.038394.0124.5184515
2022-02-01 14:58:00+00:0038395.038478.038395.038478.0123.879447738394.038477.038394.038476.053.2028286
2022-02-01 14:59:00+00:0038472.038473.038411.038473.051.761023838471.038471.038407.038467.042.3329213
2022-02-01 15:00:00+00:0038473.038550.038473.038526.069.728739138479.038549.038479.038524.053.9420210
\n", "
" ], "text/plain": [ " open_buy high_buy low_buy close_buy volume_buy \\\n", "time \n", "2022-02-01 14:54:00+00:00 38476.0 38536.0 38453.0 38517.0 29.2431 \n", "2022-02-01 14:55:00+00:00 38517.0 38517.0 38392.0 38395.0 62.0482 \n", "2022-02-01 14:56:00+00:00 38395.0 38461.0 38375.0 38417.0 130.6459 \n", "2022-02-01 14:57:00+00:00 38407.0 38417.0 38340.0 38395.0 89.1943 \n", "2022-02-01 14:58:00+00:00 38395.0 38478.0 38395.0 38478.0 123.8794 \n", "2022-02-01 14:59:00+00:00 38472.0 38473.0 38411.0 38473.0 51.7610 \n", "2022-02-01 15:00:00+00:00 38473.0 38550.0 38473.0 38526.0 69.7287 \n", "\n", " count_buy open_sell high_sell low_sell \\\n", "time \n", "2022-02-01 14:54:00+00:00 159 38475.0 38538.0 38452.0 \n", "2022-02-01 14:55:00+00:00 224 38520.0 38520.0 38383.0 \n", "2022-02-01 14:56:00+00:00 505 38394.0 38460.0 38374.0 \n", "2022-02-01 14:57:00+00:00 424 38416.0 38416.0 38336.0 \n", "2022-02-01 14:58:00+00:00 477 38394.0 38477.0 38394.0 \n", "2022-02-01 14:59:00+00:00 238 38471.0 38471.0 38407.0 \n", "2022-02-01 15:00:00+00:00 391 38479.0 38549.0 38479.0 \n", "\n", " close_sell volume_sell count_sell \n", "time \n", "2022-02-01 14:54:00+00:00 38516.0 60.5337 251 \n", "2022-02-01 14:55:00+00:00 38391.0 139.3431 515 \n", "2022-02-01 14:56:00+00:00 38409.0 126.3674 592 \n", "2022-02-01 14:57:00+00:00 38394.0 124.5184 515 \n", "2022-02-01 14:58:00+00:00 38476.0 53.2028 286 \n", "2022-02-01 14:59:00+00:00 38467.0 42.3329 213 \n", "2022-02-01 15:00:00+00:00 38524.0 53.9420 210 " ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# DataFrame のConcat\n", "pd.concat([df_buy,df_sell], axis=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## アップサンプリングとダウンサンプリング" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- ダウンサンプリング :高頻度から低頻度へ(毎日→毎月)\n", "- アップサンプリング :低頻度から高頻度へ(毎週→毎日)\n", "- 今日話した内容は全てダウンサンプリング。\n", "- アップサンプリングしたい場合も同様に可。データがない場合は NaNが返る。\n" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "time\n", "2022-02-01 14:54:00+00:00 38516.0\n", "2022-02-01 14:54:30+00:00 NaN\n", "2022-02-01 14:55:00+00:00 38395.0\n", "2022-02-01 14:55:30+00:00 NaN\n", "2022-02-01 14:56:00+00:00 38417.0\n", "2022-02-01 14:56:30+00:00 NaN\n", "2022-02-01 14:57:00+00:00 38395.0\n", "2022-02-01 14:57:30+00:00 NaN\n", "2022-02-01 14:58:00+00:00 38476.0\n", "2022-02-01 14:58:30+00:00 NaN\n", "2022-02-01 14:59:00+00:00 38473.0\n", "2022-02-01 14:59:30+00:00 NaN\n", "2022-02-01 15:00:00+00:00 38526.0\n", "Freq: 30S, Name: close, dtype: float64" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 例:1分足で作った df_ohlc の close データを使って 30秒の max を得る\n", "df_ohlc.resample(\"30s\")[\"close\"].max()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.3" } }, "nbformat": 4, "nbformat_minor": 4 }