{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"raw_mimetype": "text/markdown",
"tags": []
},
"source": [
"# 整然データ\n",
"\n",
"pandasでデータを扱う際には整然データであることが望ましいです。本節では整然データの概要および、DataFrameを整然データに整形する方法を紹介します。\n",
"\n",
"## 整然データとは\n",
"\n",
"整然データ (Tidy Data)とは次の要件を満たすデータです。\n",
"\n",
"1. 各変数が1つの列で構成されている\n",
"2. 各観測が1つの行で構成されている\n",
"3. 観測単位が1つの表で構成されている\n",
"\n",
"> Wickham, Hadley (20 February 2013). \"Tidy Data\" https://www.jstatsoft.org/article/view/v059i10 Tidy Data in Python https://www.jeannicholashould.com/tidy-data-in-python.html\n",
"\n",
"雑然データ(Messy Data)とは整然データの要件を満たさないデータです。整然データと雑然データの例として、1月1日から1月3日の各都市の平均気温データを扱います。\n",
"\n",
"\n",
"\n",
"次のように、雑然データでは1つの列に日付と気温の2つの変数で構成されているのに対して、整然データでは各列が1つの変数で構成されているのが確認できます。\n",
"\n",
"\n",
"\n",
"次のように、雑然データでは各行が複数の日付で構成(観測単位が複数)されているのに対して、整然データでは各列が1つの観測で構成されているのが確認できます。\n",
"\n",
""
]
},
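{
"cell_type": "markdown",
"metadata": {},
"source": [
"The three requirements can be made concrete with a small hand-built example. This is a minimal sketch, not part of the original walkthrough: the names `messy` and `tidy` are hypothetical, the dates are kept as plain strings for brevity, and only two cities and two days are included."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Messy (wide) layout: the dates live in the column headers, so each row\n",
"# mixes several observations.\n",
"messy = pd.DataFrame(\n",
"    {\n",
"        \"Location\": [\"Tokyo\", \"Osaka\"],\n",
"        \"2020-01-01\": [5.8, 6.8],\n",
"        \"2020-01-02\": [5.7, 6.7],\n",
"    }\n",
")\n",
"\n",
"# Tidy (long) layout: one column per variable, one row per observation.\n",
"tidy = pd.DataFrame(\n",
"    {\n",
"        \"Location\": [\"Tokyo\", \"Osaka\", \"Tokyo\", \"Osaka\"],\n",
"        \"Date\": [\"2020-01-01\", \"2020-01-01\", \"2020-01-02\", \"2020-01-02\"],\n",
"        \"Temperature\": [5.8, 6.8, 5.7, 6.7],\n",
"    }\n",
")\n",
"\n",
"# Both layouts hold the same four readings; only the shape differs.\n",
"print(messy.shape)  # (2, 3)\n",
"print(tidy.shape)  # (4, 3)"
]
},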
{
"cell_type": "markdown",
"metadata": {
"raw_mimetype": "text/restructuredtext"
},
"source": [
"## 雑然データから整然データへの変換\n",
"\n",
"ここでは次のような雑然データを整然データに変換する手順を解説します。"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Location | \n",
" 2020-01-01 00:00:00 | \n",
" 2020-01-02 00:00:00 | \n",
" 2020-01-03 00:00:00 | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Tokyo | \n",
" 5.8 | \n",
" 5.7 | \n",
" 5.6 | \n",
"
\n",
" \n",
" 1 | \n",
" Osaka | \n",
" 6.8 | \n",
" 6.7 | \n",
" 6.6 | \n",
"
\n",
" \n",
" 2 | \n",
" Nagoya | \n",
" 5.1 | \n",
" 5.1 | \n",
" 5.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Location 2020-01-01 00:00:00 2020-01-02 00:00:00 2020-01-03 00:00:00\n",
"0 Tokyo 5.8 5.7 5.6\n",
"1 Osaka 6.8 6.7 6.6\n",
"2 Nagoya 5.1 5.1 5.0"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import datetime\n",
"\n",
"import pandas as pd\n",
"\n",
"wide_df = pd.DataFrame(\n",
" [\n",
" [\"Tokyo\", 5.8, 5.7, 5.6],\n",
" [\"Osaka\", 6.8, 6.7, 6.6],\n",
" [\"Nagoya\", 5.1, 5.1, 5.0],\n",
" ],\n",
" columns=[\"Location\"] + pd.date_range(\"2020-01-01\", periods=3).tolist(),\n",
")\n",
"wide_df"
]
},
{
"cell_type": "markdown",
"metadata": {
"raw_mimetype": "text/restructuredtext"
},
"source": [
"DataFrameの ``melt`` メソッドを実行すると、整然データへの変換が行えます。次のような引数を渡します。\n",
"\n",
"- id_vars: 基本となる列名を指定\n",
"- var_name: 変数となる列名を指定\n",
"- value_name: 値となる列名を指定"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Location | \n",
" Date | \n",
" Temperature | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Tokyo | \n",
" 2020-01-01 | \n",
" 5.8 | \n",
"
\n",
" \n",
" 1 | \n",
" Osaka | \n",
" 2020-01-01 | \n",
" 6.8 | \n",
"
\n",
" \n",
" 2 | \n",
" Nagoya | \n",
" 2020-01-01 | \n",
" 5.1 | \n",
"
\n",
" \n",
" 3 | \n",
" Tokyo | \n",
" 2020-01-02 | \n",
" 5.7 | \n",
"
\n",
" \n",
" 4 | \n",
" Osaka | \n",
" 2020-01-02 | \n",
" 6.7 | \n",
"
\n",
" \n",
" 5 | \n",
" Nagoya | \n",
" 2020-01-02 | \n",
" 5.1 | \n",
"
\n",
" \n",
" 6 | \n",
" Tokyo | \n",
" 2020-01-03 | \n",
" 5.6 | \n",
"
\n",
" \n",
" 7 | \n",
" Osaka | \n",
" 2020-01-03 | \n",
" 6.6 | \n",
"
\n",
" \n",
" 8 | \n",
" Nagoya | \n",
" 2020-01-03 | \n",
" 5.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Location Date Temperature\n",
"0 Tokyo 2020-01-01 5.8\n",
"1 Osaka 2020-01-01 6.8\n",
"2 Nagoya 2020-01-01 5.1\n",
"3 Tokyo 2020-01-02 5.7\n",
"4 Osaka 2020-01-02 6.7\n",
"5 Nagoya 2020-01-02 5.1\n",
"6 Tokyo 2020-01-03 5.6\n",
"7 Osaka 2020-01-03 6.6\n",
"8 Nagoya 2020-01-03 5.0"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tidy_df = wide_df.melt(id_vars=\"Location\", var_name=\"Date\", value_name=\"Temperature\")\n",
"tidy_df"
]
},
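{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to check that ``melt`` only reshaped the table is to reverse it. The sketch below rebuilds the wide table so that it runs on its own, then uses ``pivot`` to restore the wide layout and compares the values. The name ``restored`` is an assumption introduced for this example."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Rebuild the wide table from this section so the sketch runs on its own.\n",
"wide_df = pd.DataFrame(\n",
"    [\n",
"        [\"Tokyo\", 5.8, 5.7, 5.6],\n",
"        [\"Osaka\", 6.8, 6.7, 6.6],\n",
"        [\"Nagoya\", 5.1, 5.1, 5.0],\n",
"    ],\n",
"    columns=[\"Location\"] + pd.date_range(\"2020-01-01\", periods=3).tolist(),\n",
")\n",
"tidy_df = wide_df.melt(id_vars=\"Location\", var_name=\"Date\", value_name=\"Temperature\")\n",
"\n",
"# melt yields one row per (Location, Date) pair: 3 cities x 3 days.\n",
"print(len(tidy_df))  # 9\n",
"\n",
"# pivot is the inverse reshaping; it sorts the index, so reorder the rows\n",
"# to match wide_df before comparing.\n",
"restored = tidy_df.pivot(index=\"Location\", columns=\"Date\", values=\"Temperature\")\n",
"restored = restored.loc[wide_df[\"Location\"]]\n",
"\n",
"# The restored values match the original temperature columns.\n",
"print((restored.to_numpy() == wide_df.drop(columns=\"Location\").to_numpy()).all())  # True"
]
},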
{
"cell_type": "markdown",
"metadata": {
"raw_mimetype": "text/restructuredtext"
},
"source": [
"## 整然データを可視化\n",
"\n",
"``tidy_df`` は整然データのDataFrameであるため、Plotly Expressに渡せます。次のコードでは整然データに変換したDataFrameをPlotly Expressを用いて棒グラフに描画しています。"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.plotly.v1+json": {
"config": {
"plotlyServerURL": "https://plot.ly"
},
"data": [
{
"alignmentgroup": "True",
"hovertemplate": "Location=Tokyo
Date=%{x}
Temperature=%{y}",
"legendgroup": "Tokyo",
"marker": {
"color": "#636efa",
"pattern": {
"shape": ""
}
},
"name": "Tokyo",
"offsetgroup": "Tokyo",
"orientation": "v",
"showlegend": true,
"textposition": "auto",
"type": "bar",
"x": [
"2020-01-01T00:00:00",
"2020-01-02T00:00:00",
"2020-01-03T00:00:00"
],
"xaxis": "x",
"y": [
5.8,
5.7,
5.6
],
"yaxis": "y"
},
{
"alignmentgroup": "True",
"hovertemplate": "Location=Osaka
Date=%{x}
Temperature=%{y}",
"legendgroup": "Osaka",
"marker": {
"color": "#EF553B",
"pattern": {
"shape": ""
}
},
"name": "Osaka",
"offsetgroup": "Osaka",
"orientation": "v",
"showlegend": true,
"textposition": "auto",
"type": "bar",
"x": [
"2020-01-01T00:00:00",
"2020-01-02T00:00:00",
"2020-01-03T00:00:00"
],
"xaxis": "x",
"y": [
6.8,
6.7,
6.6
],
"yaxis": "y"
},
{
"alignmentgroup": "True",
"hovertemplate": "Location=Nagoya
Date=%{x}
Temperature=%{y}",
"legendgroup": "Nagoya",
"marker": {
"color": "#00cc96",
"pattern": {
"shape": ""
}
},
"name": "Nagoya",
"offsetgroup": "Nagoya",
"orientation": "v",
"showlegend": true,
"textposition": "auto",
"type": "bar",
"x": [
"2020-01-01T00:00:00",
"2020-01-02T00:00:00",
"2020-01-03T00:00:00"
],
"xaxis": "x",
"y": [
5.1,
5.1,
5
],
"yaxis": "y"
}
],
"layout": {
"autosize": true,
"barmode": "group",
"legend": {
"title": {
"text": "Location"
},
"tracegroupgap": 0
},
"margin": {
"t": 60
},
"template": {
"data": {
"bar": [
{
"error_x": {
"color": "#2a3f5f"
},
"error_y": {
"color": "#2a3f5f"
},
"marker": {
"line": {
"color": "#E5ECF6",
"width": 0.5
},
"pattern": {
"fillmode": "overlay",
"size": 10,
"solidity": 0.2
}
},
"type": "bar"
}
],
"barpolar": [
{
"marker": {
"line": {
"color": "#E5ECF6",
"width": 0.5
},
"pattern": {
"fillmode": "overlay",
"size": 10,
"solidity": 0.2
}
},
"type": "barpolar"
}
],
"carpet": [
{
"aaxis": {
"endlinecolor": "#2a3f5f",
"gridcolor": "white",
"linecolor": "white",
"minorgridcolor": "white",
"startlinecolor": "#2a3f5f"
},
"baxis": {
"endlinecolor": "#2a3f5f",
"gridcolor": "white",
"linecolor": "white",
"minorgridcolor": "white",
"startlinecolor": "#2a3f5f"
},
"type": "carpet"
}
],
"choropleth": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"type": "choropleth"
}
],
"contour": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"colorscale": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
],
"type": "contour"
}
],
"contourcarpet": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"type": "contourcarpet"
}
],
"heatmap": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"colorscale": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
],
"type": "heatmap"
}
],
"heatmapgl": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"colorscale": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
],
"type": "heatmapgl"
}
],
"histogram": [
{
"marker": {
"pattern": {
"fillmode": "overlay",
"size": 10,
"solidity": 0.2
}
},
"type": "histogram"
}
],
"histogram2d": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"colorscale": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
],
"type": "histogram2d"
}
],
"histogram2dcontour": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"colorscale": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
],
"type": "histogram2dcontour"
}
],
"mesh3d": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"type": "mesh3d"
}
],
"parcoords": [
{
"line": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "parcoords"
}
],
"pie": [
{
"automargin": true,
"type": "pie"
}
],
"scatter": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scatter"
}
],
"scatter3d": [
{
"line": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scatter3d"
}
],
"scattercarpet": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scattercarpet"
}
],
"scattergeo": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scattergeo"
}
],
"scattergl": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scattergl"
}
],
"scattermapbox": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scattermapbox"
}
],
"scatterpolar": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scatterpolar"
}
],
"scatterpolargl": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scatterpolargl"
}
],
"scatterternary": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scatterternary"
}
],
"surface": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"colorscale": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
],
"type": "surface"
}
],
"table": [
{
"cells": {
"fill": {
"color": "#EBF0F8"
},
"line": {
"color": "white"
}
},
"header": {
"fill": {
"color": "#C8D4E3"
},
"line": {
"color": "white"
}
},
"type": "table"
}
]
},
"layout": {
"annotationdefaults": {
"arrowcolor": "#2a3f5f",
"arrowhead": 0,
"arrowwidth": 1
},
"autotypenumbers": "strict",
"coloraxis": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"colorscale": {
"diverging": [
[
0,
"#8e0152"
],
[
0.1,
"#c51b7d"
],
[
0.2,
"#de77ae"
],
[
0.3,
"#f1b6da"
],
[
0.4,
"#fde0ef"
],
[
0.5,
"#f7f7f7"
],
[
0.6,
"#e6f5d0"
],
[
0.7,
"#b8e186"
],
[
0.8,
"#7fbc41"
],
[
0.9,
"#4d9221"
],
[
1,
"#276419"
]
],
"sequential": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
],
"sequentialminus": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
]
},
"colorway": [
"#636efa",
"#EF553B",
"#00cc96",
"#ab63fa",
"#FFA15A",
"#19d3f3",
"#FF6692",
"#B6E880",
"#FF97FF",
"#FECB52"
],
"font": {
"color": "#2a3f5f"
},
"geo": {
"bgcolor": "white",
"lakecolor": "white",
"landcolor": "#E5ECF6",
"showlakes": true,
"showland": true,
"subunitcolor": "white"
},
"hoverlabel": {
"align": "left"
},
"hovermode": "closest",
"mapbox": {
"style": "light"
},
"paper_bgcolor": "white",
"plot_bgcolor": "#E5ECF6",
"polar": {
"angularaxis": {
"gridcolor": "white",
"linecolor": "white",
"ticks": ""
},
"bgcolor": "#E5ECF6",
"radialaxis": {
"gridcolor": "white",
"linecolor": "white",
"ticks": ""
}
},
"scene": {
"xaxis": {
"backgroundcolor": "#E5ECF6",
"gridcolor": "white",
"gridwidth": 2,
"linecolor": "white",
"showbackground": true,
"ticks": "",
"zerolinecolor": "white"
},
"yaxis": {
"backgroundcolor": "#E5ECF6",
"gridcolor": "white",
"gridwidth": 2,
"linecolor": "white",
"showbackground": true,
"ticks": "",
"zerolinecolor": "white"
},
"zaxis": {
"backgroundcolor": "#E5ECF6",
"gridcolor": "white",
"gridwidth": 2,
"linecolor": "white",
"showbackground": true,
"ticks": "",
"zerolinecolor": "white"
}
},
"shapedefaults": {
"line": {
"color": "#2a3f5f"
}
},
"ternary": {
"aaxis": {
"gridcolor": "white",
"linecolor": "white",
"ticks": ""
},
"baxis": {
"gridcolor": "white",
"linecolor": "white",
"ticks": ""
},
"bgcolor": "#E5ECF6",
"caxis": {
"gridcolor": "white",
"linecolor": "white",
"ticks": ""
}
},
"title": {
"x": 0.05
},
"xaxis": {
"automargin": true,
"gridcolor": "white",
"linecolor": "white",
"ticks": "",
"title": {
"standoff": 15
},
"zerolinecolor": "white",
"zerolinewidth": 2
},
"yaxis": {
"automargin": true,
"gridcolor": "white",
"linecolor": "white",
"ticks": "",
"title": {
"standoff": 15
},
"zerolinecolor": "white",
"zerolinewidth": 2
}
}
},
"xaxis": {
"anchor": "y",
"autorange": true,
"domain": [
0,
1
],
"range": [
"2019-12-31 12:00",
"2020-01-03 12:00"
],
"title": {
"text": "Date"
},
"type": "date"
},
"yaxis": {
"anchor": "x",
"autorange": true,
"domain": [
0,
1
],
"range": [
0,
7.157894736842105
],
"title": {
"text": "Temperature"
},
"type": "linear"
}
}
}
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import plotly.express as px\n",
"\n",
"px.bar(tidy_df, x=\"Date\", y=\"Temperature\", color=\"Location\", barmode=\"group\").show()"
]
}
],
"metadata": {
"celltoolbar": "Raw Cell Format",
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}