[Python] Data manipulation with pandas(4)

1 minute read

Pandas

Creating and Visualizing DataFrames
- .plot()
  - kind=”bar”/ “line”/ “scatter”
- .hist()
- .legend()
Missing data
- .isna()
Read & write dataframe
- pd.read_csv()
- .to_csv()

# import data
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

iris = sns.load_dataset("iris")

.plot()

kind=”bar”

petal_len_by_species = iris.groupby("species")[["petal_length"]].mean()
petal_len_by_species.plot(kind="bar")
plt.show()

kind = “line”

sp_wd_by_sp_len = iris.groupby("sepal_length")[["sepal_width"]].mean()
sp_wd_by_sp_len.plot(kind="line")
plt.show()

kind = “scatter”

iris.plot(x="sepal_length",
          y="sepal_width",
          kind="scatter",
          title="Sepal length x Sepal width")

plt.show()

.hist()

pandas Series로 subsetting 한 경우 <AxesSubplot:>이 되어 2개의 그래프가 하나의 그래프로 표현될 수 있음

iris[iris.species=="setosa"]['sepal_length'].hist(alpha=0.5, bins=20)
iris[iris.species=="versicolor"]['sepal_length'].hist(alpha=0.5, bins=20)
plt.legend(['setosa sepal length','versicolor sepal length'])
plt.show()

.isna()

.any()
.sum()

# column별 missing value 존재 유무 확인
iris.isna().any()

# column별 missing value 개수 확인
iris.isna().sum()

sepal_length    0
sepal_width     0
petal_length    0
petal_width     0
species         0
dtype: int64

.dropna()

결측값이 존재하는 row 삭제

iris.dropna().head()

	sepal_length	sepal_width	petal_length	petal_width	species
0	5.1	3.5	1.4	0.2	setosa
1	4.9	3.0	1.4	0.2	setosa
2	4.7	3.2	1.3	0.2	setosa
3	4.6	3.1	1.5	0.2	setosa
4	5.0	3.6	1.4	0.2	setosa

.fillna()

결측값 채워넣기

iris.fillna(0).head()

	sepal_length	sepal_width	petal_length	petal_width	species
0	5.1	3.5	1.4	0.2	setosa
1	4.9	3.0	1.4	0.2	setosa
2	4.7	3.2	1.3	0.2	setosa
3	4.6	3.1	1.5	0.2	setosa
4	5.0	3.6	1.4	0.2	setosa

Creating dataframes

1) from a dictionary of lists
2) from a list of dictionaries

# Create a dictionary of lists with new data
class_dict = {
  "class": ["A", "B"],
  "number_of_students": [40, 38],
  "Teacher": ["Kwon", "Park"]
}

pd.DataFrame(class_dict)

	class	number_of_students	Teacher
0	A	40	Kwon
1	B	38	Park

class_list = [
    {"class": "A", "number_of_students": 40, "Teacher":"Kwon"},
    {"class": "B", "number_of_students": 38, "Teacher":"Park"},
]

pd.DataFrame(class_list)

	class	number_of_students	Teacher
0	A	40	Kwon
1	B	38	Park

pd.read_csv() & .to_csv()

tp = pd.read_csv("temperatures.csv")
tp.head()

	Unnamed: 0	date	city	country	avg_temp_c
0	0	2000-01-01	Abidjan	Côte D'Ivoire	27.293
1	1	2000-02-01	Abidjan	Côte D'Ivoire	27.685
2	2	2000-03-01	Abidjan	Côte D'Ivoire	29.061
3	3	2000-04-01	Abidjan	Côte D'Ivoire	28.162
4	4	2000-05-01	Abidjan	Côte D'Ivoire	27.547

tp.to_csv("name_of_file.csv")

Share on

Twitter Facebook LinkedIn

ZSU

[Python] Data manipulation with pandas(4)

Pandas

.plot()

.hist()

.isna()

.dropna()

.fillna()

Creating dataframes

pd.read_csv() & .to_csv()

Share on

You may also enjoy

[Software Engineering at Google] CH8 Style Guides and Rules

[Software Engineering at Google] CH7 Measuring Engineering Productivity

[Software Engineering at Google] CH6 Leading at Scale

[Software Engineering at Google] CH5 How to Lead a Team