3. Drawing graphs with matplotlib

Commonly used libraries

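matplotlib: the most fundamental visualization library. It can produce almost any kind of plot, but the API is complex.
- References: official documentation, WikiDocs
seaborn: a higher-level visualization library built on top of matplotlib that focuses on statistical graphics. It is simpler to use.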

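Installation

# check whether each library is already installed
pip list | grep matplotlib
pip list | grep seaborn
# if nothing is printed, install with: pip install matplotlib seaborn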

matplotlib

Data definition

import matplotlib.pyplot as plt
%matplotlib inline

%matplotlib inline: an IPython magic command for rich output (results such as figures, sound, and animation). Running it in Jupyter Notebook makes graphs render directly below the cell.
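
The snippets in this section use x and y without defining them. A minimal sketch of sample data, assuming NumPy is available (the names x and y, the range, and the sine curve are illustrative choices, not part of the original notes):

```python
import numpy as np

# 50 evenly spaced points on [0, 10]; the range is an arbitrary example
x = np.linspace(0, 10, 50)
# any function of x works as y data; a sine curve is used here
y = np.sin(x)

print(x.shape, y.shape)
```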

Drawing axes (figure, add_subplot)

fig = plt.figure()  # create the canvas (Figure object); a size can also be passed, e.g. figsize=(8, 4)
# draw axes on the fig object with the add_subplot method
# ax = fig.add_subplot(nrows, ncols, index)
ax1 = fig.add_subplot(2,3,1)
ax2 = fig.add_subplot(2,3,4)
ax3 = fig.add_subplot(2,3,5)
ax4 = fig.add_subplot(2,3,6)
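
In add_subplot(nrows, ncols, index), the index counts cells left to right and top to bottom, starting at 1. The sketch below labels each of the four axes with its index so the layout is visible. It assumes a script context, hence the non-interactive Agg backend, which can be dropped in Jupyter; the dictionary name axes_by_index is illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in Jupyter
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(9, 5))
axes_by_index = {}
for index in (1, 4, 5, 6):
    ax = fig.add_subplot(2, 3, index)
    # write the index in the middle of each subplot
    ax.text(0.5, 0.5, f"index {index}", ha="center", va="center")
    axes_by_index[index] = ax

# index 4 is the first cell of the second row (row 1, column 0 when counting from 0)
spec = axes_by_index[4].get_subplotspec()
print(spec.rowspan.start, spec.colspan.start)
```

Indices 2 and 3 are simply left empty, so the top row shows one plot and the bottom row shows three.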

plot (linestyle, marker, color)

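# plt.plot() replaces the two-step figure + add_subplot process
# (it creates a figure and axes implicitly when none exist)
# plt.plot(x_data, y_data, linestyle=..., marker=..., color=..., etc.)

# linestyle (long name and short code are interchangeable)
plt.plot(x, y, linestyle='solid')    # same as '-'
plt.plot(x, y, linestyle='dashed')   # same as '--'
plt.plot(x, y, linestyle='dashdot')  # same as '-.'
plt.plot(x, y, linestyle='dotted')   # same as ':'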

# color
plt.plot(x, y, 'b')  # blue
plt.plot(x, y, 'g')  # green
plt.plot(x, y, 'r')  # red
plt.plot(x, y, 'c')  # cyan
plt.plot(x, y, 'm')  # magenta
plt.plot(x, y, 'y')  # yellow
plt.plot(x, y, 'k')  # black
plt.plot(x, y, 'w')  # white

# linestyle + color
plt.plot(x, x + 0, '-g')   # solid green
plt.plot(x, x + 1, '--c')  # dashed cyan
plt.plot(x, x + 2, '-.k')  # dashdot black
plt.plot(x, x + 3, ':r')   # dotted red
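
The heading mentions marker, which the snippets above do not show. A minimal sketch combining marker, linestyle, and color, first as keywords and then as a single format string. The sample x is an illustrative assumption, and the Agg backend is only there so the code also runs as a script.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in Jupyter
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 20)

fig, ax = plt.subplots()
# keyword form: each property is spelled out
(line_a,) = ax.plot(x, x + 0, linestyle="dotted", marker="o", color="r")
# format-string form: circle marker, dashed line, black
(line_b,) = ax.plot(x, x + 1, "o--k")

print(line_a.get_linestyle(), line_a.get_marker(), line_a.get_color())
print(line_b.get_linestyle(), line_b.get_marker(), line_b.get_color())
```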

pandas plot method arguments (work well in combination with matplotlib)

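label: legend label for the graph
ax: matplotlib subplot (Axes) object to draw on
style: style string passed to matplotlib, such as 'ko--'
alpha: transparency (0 to 1)
kind: type of graph: line, kde (kernel density estimate), bar (vertical bars), barh (horizontal bars)
logy: use a log scale on the y-axis
use_index: whether to use the object's index as tick labels
rot: rotation of the tick labels (0 to 360)
xticks, yticks: values to use for the x-axis and y-axis ticks
xlim, ylim: x-axis and y-axis limits
grid: whether to show the axis grid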
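
A minimal sketch of several of these arguments on a Series, assuming pandas is installed. The data is a random walk made up for illustration, and the names rng and series are not from the original notes.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# made-up data: a random walk indexed by 0, 10, ..., 190
series = pd.Series(rng.standard_normal(20).cumsum(), index=np.arange(0, 200, 10))

fig, ax = plt.subplots()
series.plot(
    ax=ax,                # draw on an existing matplotlib Axes
    label="random walk",  # legend entry
    style="ko--",         # black, circle markers, dashed line
    alpha=0.6,            # partly transparent
    rot=45,               # rotate tick labels
    xlim=(0, 190),        # x-axis limits
    grid=True,            # show the grid
)
ax.legend()
```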

Additional plot method arguments when the pandas data is a DataFrame

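subplots: draw each DataFrame column in its own subplot.
sharex: when subplots=True, share the same x-axis and link the axis range and ticks.
sharey: when subplots=True, share the same y-axis.
figsize: size of the graph (tuple).
title: title of the graph (string).
sort_columns: draw the columns in alphabetical order (deprecated in pandas 1.5 and removed in 2.0).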
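
A minimal sketch of the DataFrame-specific arguments, again on made-up random data (the column names a, b, c and the variable names are illustrative assumptions):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in Jupyter
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
frame = pd.DataFrame(
    rng.standard_normal((30, 3)).cumsum(axis=0),
    columns=["a", "b", "c"],
)

# one subplot per column, all sharing the x-axis
axes = frame.plot(subplots=True, sharex=True, figsize=(6, 6), title="Random walks")
print(len(axes))
```

With subplots=True the call returns an array of Axes, one per column, so each subplot can still be adjusted individually afterwards.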

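Drawing a graph (bar, label, title)

ax1.bar(data1, data2)  # bar graph with data1 on the x-axis and data2 as the bar heights

# plt.* functions act on the current axes (the most recently created one);
# to target ax1 specifically, use ax1.set_xlabel(...), ax1.set_ylabel(...), ax1.set_title(...)
plt.xlabel('x-axis label')
plt.ylabel('y-axis label')
plt.title("Graph title")

# set the range of the x and y axes, e.g. plt.xlim(0, 10);
# called with no arguments, these only return the current limits
plt.xlim()
plt.ylim()

ax1.annotate('text', xy=(x0, y0))  # draw annotations (text, arrows, etc.) at the point (x0, y0)
plt.grid()  # draw grid lines
plt.savefig('figure.png')  # save to a file (example filename); call before plt.show()
plt.show()  # display the graph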
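
The calls above can be put together into one runnable sketch. The categories and values are made up, and the figure is saved to an in-memory buffer rather than a file so that nothing is written to disk; passing a filename such as "bar.png" to savefig would write a file instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in Jupyter
import matplotlib.pyplot as plt

labels = ["A", "B", "C"]  # made-up categories
values = [3, 7, 5]        # made-up bar heights

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(labels, values)
ax.set_xlabel("category")
ax.set_ylabel("value")
ax.set_title("Bar chart example")
ax.set_ylim(0, 10)
# annotate takes the text, the point to mark (xy), and the text position (xytext);
# categorical x values sit at positions 0, 1, 2, so x=1 is bar "B"
ax.annotate("max", xy=(1, 7), xytext=(1.6, 9), arrowprops={"arrowstyle": "->"})
ax.grid(axis="y")

buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```

Using the Axes methods (ax.set_xlabel and so on) instead of plt.xlabel makes it explicit which subplot is being modified.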

https://matplotlib.org/stable/gallery/showcase/anatomy.html

seaborn

load_dataset()

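- Example datasets can be loaded easily through the API (they are downloaded from the online seaborn-data repository, so an internet connection is needed the first time)

- default directory: ~/seaborn-data/ (a seaborn-data folder is created automatically in the home directory and the data is stored inside it; newer seaborn versions may use a different cache location, which sns.get_data_home() reports)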

import seaborn as sns

df = sns.load_dataset("tips")  # named df because the later snippets refer to the tips data as df

https://csshark.tistory.com/56


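To draw several graphs at once

iris = sns.load_dataset("iris")  # sample data used in the linked post
fig, ax = plt.subplots(ncols=2, nrows=2, figsize=(20, 20))

# distplot is deprecated in recent seaborn versions; histplot(..., kde=True) is the closest replacement
sns.histplot(iris['sepal_length'], kde=True, ax=ax[0, 0])
sns.histplot(iris['sepal_width'], kde=True, ax=ax[0, 1])
sns.histplot(iris['petal_length'], kde=True, ax=ax[1, 0])
sns.histplot(iris['petal_width'], kde=True, ax=ax[1, 1])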
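
plt.subplots returns the figure and a NumPy array of Axes, which is why ax[0, 0] style indexing works. A minimal sketch of the same 2x2 layout, using plain matplotlib histograms on made-up random samples instead of seaborn and the iris data (the sample names w, x, y, z are illustrative):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in Jupyter
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# four made-up samples with different centers
samples = {name: rng.normal(loc, 1.0, 200) for name, loc in [("w", 0), ("x", 2), ("y", 4), ("z", 6)]}

fig, ax = plt.subplots(nrows=2, ncols=2, figsize=(8, 8))
# ax is a 2x2 array of Axes; ax.flat walks it row by row
for axis, (name, values) in zip(ax.flat, samples.items()):
    axis.hist(values, bins=20)
    axis.set_title(name)
fig.tight_layout()
```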

Bar graph

Categorical data is usually summarized numerically with bar graphs (horizontal, vertical, stacked, or grouped).

■ Using pandas and matplotlib

1. groupby method: 'tip' can be analyzed by the values of 'sex'

grouped = df['tip'].groupby(df['sex'])
grouped.mean() # mean tip by sex
grouped.size() # number of records (tip count) by sex

2. Plot the mean tip amount by sex as a bar graph

import numpy as np
import matplotlib.pyplot as plt

sex = dict(grouped.mean()) # convert the mean data into a dictionary

x = list(sex.keys())
y = list(sex.values())

plt.bar(x = x, height = y)
plt.ylabel('tip[USD]')
plt.title('Tip by Sex')
plt.show()
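
To make the groupby step concrete, here is a minimal sketch on a four-row DataFrame. The rows are made up and only mimic the 'sex' and 'tip' columns of the tips dataset; the name toy is illustrative. It also shows Series.plot(kind="bar") as a shorter route to the same chart, skipping the dictionary conversion.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in Jupyter
import pandas as pd

# made-up rows shaped like the tips dataset
toy = pd.DataFrame({
    "sex": ["Male", "Female", "Male", "Female"],
    "tip": [3.0, 2.0, 5.0, 4.0],
})

grouped = toy["tip"].groupby(toy["sex"])
means = grouped.mean()  # Female: (2 + 4) / 2, Male: (3 + 5) / 2
sizes = grouped.size()  # number of rows per group

# the Series returned by mean() can be plotted directly
ax = means.plot(kind="bar", rot=0)
ax.set_ylabel("tip [USD]")
ax.set_title("Tip by Sex")
print(means.to_dict(), sizes.to_dict())
```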

■ Using seaborn and matplotlib (simpler)

# shows the mean tip by sex directly as a graph
# combining it with matplotlib allows many more options
plt.figure(figsize=(10,6)) # set the canvas size
sns.barplot(data=df, x='sex', y='tip')  # barplot = bar graph
plt.ylim(0, 4) # set the range of y values
plt.title('Tip by sex') # set the graph title

# violin plot, well suited to categorical data
sns.violinplot(data=df, x='day', y='tip', palette="ch:.25")
# catplot (jitter spreads the points sideways as well; jitter=False turns it off)
sns.catplot(x='time', y='tip', jitter=False, data=df)
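
sns.barplot does the grouping and averaging internally, which is why no groupby is needed. A minimal sketch that checks this on a made-up four-row DataFrame (the rows and the name toy are illustrative, and seaborn is assumed to be installed):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in Jupyter
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# made-up rows shaped like the tips dataset
toy = pd.DataFrame({
    "sex": ["Male", "Female", "Male", "Female"],
    "tip": [3.0, 2.0, 5.0, 4.0],
})

fig, ax = plt.subplots(figsize=(5, 4))
sns.barplot(data=toy, x="sex", y="tip", ax=ax)
ax.set_ylim(0, 6)

# each bar's height is the mean tip of its group
heights = sorted(bar.get_height() for bar in ax.patches)
print(heights)
```

The thin lines seaborn draws on top of the bars are error bars, which matplotlib's plt.bar does not add by default.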

[Figure: barplot, violinplot, catplot]

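Scatter plot

Line graphs and scatter plots are usually the best choice for numerical data

sns.scatterplot(data=df, x='total_bill', y='tip')

# add the relationship with day as well (palette only takes effect together with hue)
sns.scatterplot(data=df, x='total_bill', y='tip', hue='day', palette="ch:r=-.2,d=.3_r")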
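
To see what hue does, the following sketch reproduces the idea with plain matplotlib: one scatter call per category, each getting its own color and legend entry. This is a stand-in for illustration, not how seaborn is implemented, and the DataFrame is made up to mimic the column names of the tips dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# made-up rows shaped like the tips dataset
toy = pd.DataFrame({
    "total_bill": [10.0, 20.0, 15.0, 30.0, 25.0, 12.0],
    "tip": [1.5, 3.0, 2.0, 5.0, 4.0, 2.0],
    "day": ["Thur", "Fri", "Thur", "Sat", "Sat", "Fri"],
})

fig, ax = plt.subplots()
# groupby yields one sub-DataFrame per day; each is drawn in the next default color
for day, group in toy.groupby("day"):
    ax.scatter(group["total_bill"], group["tip"], label=day)
ax.set_xlabel("total_bill")
ax.set_ylabel("tip")
ax.legend(title="day")
```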

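Line graph

# pyplot
plt.plot(x, np.sin(x), 'o')  # 'o' draws markers only; use 'o-' to join them with a line

# seaborn
# pass the x and y expressions directly
sns.lineplot(x=x, y=np.sin(x))
# when a DataFrame is passed as data, only the column names for the axes need to be set
sns.pointplot(data=data, x='x_column', y='y_column')
# adding a hue argument draws a separate line for each category of that column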
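
A minimal sketch of the difference between the 'o' and 'o-' format strings in plt.plot, with x defined explicitly (the range and number of points are illustrative assumptions):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in Jupyter
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 30)

fig, ax = plt.subplots()
# 'o': circle markers only, no connecting line
(markers_only,) = ax.plot(x, np.sin(x), "o", label="'o'")
# 'o-': circle markers joined by a solid line
(with_line,) = ax.plot(x, np.cos(x), "o-", label="'o-'")
ax.legend()

print(markers_only.get_linestyle(), with_line.get_linestyle())
```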

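Histogram

# create the data
df['tip_pct'] = df['tip']/df['total_bill']*100
# create the figure object
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
# bins: number of intervals the x-axis is divided into
# ax.hist returns the counts, the bin edges, and the bar patches
counts, bin_edges, patches = ax.hist(df['tip_pct'], bins=50, density=False)
ax.xaxis.set_ticks_position('bottom')
ax.yaxis.set_ticks_position('left')
# add labels and a title
# the x-axis holds the tip percentage and the y-axis the number of records in each bin
plt.xlabel('tip_pct (%)')
plt.ylabel('count')
ax.set_title('tip per total_bill')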
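
A small sketch of what bins and density control, on four made-up bills and tips (the numbers are illustrative, not from the tips dataset). With density=False the bar heights are counts; with density=True they are rescaled so that the total bar area is 1.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in Jupyter
import matplotlib.pyplot as plt
import numpy as np

total_bill = np.array([10.0, 20.0, 40.0, 50.0])  # made-up bills
tip = np.array([1.0, 3.0, 4.0, 9.0])             # made-up tips
tip_pct = tip / total_bill * 100                 # roughly 10, 15, 10, 18

fig, (ax_count, ax_density) = plt.subplots(ncols=2, figsize=(8, 3))
# two bins: the range from min to max is split into two equal intervals
counts, bin_edges, _ = ax_count.hist(tip_pct, bins=2, density=False)
density, _, _ = ax_density.hist(tip_pct, bins=2, density=True)

# area under the density histogram: sum of height * bin width
area = float((density * np.diff(bin_edges)).sum())
print(counts, area)
```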

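heatmap

Represents a large amount of data as colors that correspond to the numeric values

* The data must be pivoted first (reshaped so that chosen columns become the row and column axes)

flights = sns.load_dataset("flights")
pivot = flights.pivot(index='year', columns='month', values='passengers')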

# linewidths: the gap between heatmap cells is 0.2
# annot=True: write the data value inside each cell
# fmt="d": format the data values as integers
sns.heatmap(pivot, linewidths=.2, annot=True, fmt="d")

sns.heatmap(pivot, linewidths=.2, annot=True, fmt="d",cmap = "YlGnBu")

[Figure: heatmap with the default colormap, and heatmap with cmap="YlGnBu"]
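
The pivot step turns long-form rows (one row per year and month) into a year-by-month grid, which is the shape sns.heatmap expects. A minimal sketch on four illustrative rows that mimic the column names of the flights dataset (the name long_form is an assumption):

```python
import pandas as pd

# illustrative long-form rows: one row per (year, month) pair
long_form = pd.DataFrame({
    "year": [1949, 1949, 1950, 1950],
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "passengers": [112, 118, 115, 126],
})

# rows become years, columns become months, cells hold passenger counts
wide = long_form.pivot(index="year", columns="month", values="passengers")
print(wide)
```

Here month is a plain string column, so the pivoted columns come out in alphabetical order (Feb before Jan). If month is stored as an ordered categorical column, the category order is used instead.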
