210726_Pivot Tables 2

import numpy as np
import pandas as pd
from pandas import DataFrame, Series
import matplotlib.pyplot as plt

tipdf = pd.read_csv('../data/tips.csv')
tipdf

# Add rows of NaN values by assigning to new index labels

tipdf.loc['25020'] = np.nan
tipdf.loc['25021'] = np.nan
tipdf.tail()
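
The assignment above relies on `.loc` creating a row when the label does not exist yet. A minimal sketch of that behavior on a hypothetical toy frame (the column `x` and the label `'new'` are made up for illustration, not taken from tips.csv):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with an integer column
toy = pd.DataFrame({"x": [1, 2]})

# Assigning to a label that does not exist appends a new row
toy.loc["new"] = np.nan

print(toy)
print(toy.dtypes)  # the integer column is upcast so it can hold NaN
```

Note that the new label is a string while the existing labels are integers, so the index ends up with mixed types, just as `'25020'` and `'25021'` sit next to the integer labels of `tipdf`.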

tipdf.ndim
np.ndim(tipdf)

==================
2

tipdf.shape
np.shape(tipdf)

======================
(247, 7)

tipdf.info()

=======================

<class 'pandas.core.frame.DataFrame'>
Index: 247 entries, 0 to 25021
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   total_bill  245 non-null    float64
 1   tip         244 non-null    float64
 2   sex         244 non-null    object 
 3   smoker      244 non-null    object 
 4   day         244 non-null    object 
 5   time        244 non-null    object 
 6   size        244 non-null    float64
dtypes: float64(3), object(4)
memory usage: 15.4+ KB

# Display the three rows with the largest tips

tipdf.sort_values('tip',ascending=False).head(3)

# def get_tippc(df):
#     return (df['tip']/df['total_bill']*100).round(2) # round to 2 decimal places

def get_tippc(df):
    return round(df['tip']/df['total_bill']*100, 2) # round to 2 decimal places, same result as above

tipdf['tip_pct']=get_tippc(tipdf)
tipdf
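
To check the tip-percentage calculation by hand, here is a small sketch on hypothetical toy values (not rows from tips.csv). It also shows that a row with missing values simply yields NaN rather than raising an error:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the last row mimics an all-NaN row
toy = pd.DataFrame({
    "total_bill": [10.0, 20.0, 40.0, np.nan],
    "tip": [1.5, 3.0, 10.0, np.nan],
})

def get_tippc(df):
    # tip as a percentage of the bill, rounded to 2 decimal places
    return (df["tip"] / df["total_bill"] * 100).round(2)

toy["tip_pct"] = get_tippc(toy)
print(toy)
```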

# Sort by sex and tip percentage in descending order, then take the top 3

tipdf.sort_values(['sex','tip_pct'], ascending=[False,False]).head(3)

tipdf.sort_values(['tip_pct','sex'], ascending=False)
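
The order of the sort keys matters: the first column is the primary key and the second only breaks ties. A minimal sketch with hypothetical toy values that makes the difference between the two calls above visible:

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    "sex": ["Male", "Female", "Male", "Female"],
    "tip_pct": [10.0, 25.0, 18.0, 12.0],
})

# Primary key sex (descending), ties broken by tip_pct (descending)
by_sex_then_pct = toy.sort_values(["sex", "tip_pct"], ascending=[False, False])

# Primary key tip_pct (descending); a single bool applies to every key
by_pct_then_sex = toy.sort_values(["tip_pct", "sex"], ascending=False)

print(by_sex_then_pct)
print(by_pct_then_sex)
```

In the first result all Male rows come before the Female rows; in the second the rows are ordered purely by tip percentage because no two values tie.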

# Split the Male rows out into tipdf_man and check its shape

tipdf_man = tipdf[tipdf['sex'].isin(['Male'])]
tipdf_man.shape

=======================
(157, 8)

# Split the Female rows out into tipdf_female and check its shape

tipdf_female = tipdf[tipdf['sex']=='Female']
tipdf_female.shape

===================================
(87, 8)

# Simply concatenate tipdf_man and tipdf_female into tip_all
# keys: tipdf_man -> 'Man', tipdf_female -> 'Woman'

tip_all = pd.concat([tipdf_man, tipdf_female], keys = ['Man','Woman'])
tip_all
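
Passing `keys` to `pd.concat` adds an outer index level, so each source frame can be pulled back out by its key. A small sketch on hypothetical toy frames (`men` and `women` are stand-ins, not the real `tipdf_man` and `tipdf_female`):

```python
import pandas as pd

# Hypothetical stand-ins for the two split frames
men = pd.DataFrame({"tip": [1.0, 2.0]}, index=[0, 2])
women = pd.DataFrame({"tip": [3.0]}, index=[1])

both = pd.concat([men, women], keys=["Man", "Woman"])

print(both.index.nlevels)  # outer level = key, inner level = original label
print(both.loc["Woman"])   # select one group by its key
print(both.loc[("Man", 2), "tip"])  # select a single cell by (key, label)
```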

tipdf

# Count the missing values in each column

tipdf.isna().sum(axis=0)  # axis=0 is the default: sum down the rows, one result per column

======================================

total_bill    2
tip           3
sex           3
smoker        3
day           3
time          3
size          3
tip_pct       3
dtype: int64

# Count the missing values in each row

tipdf.isna().sum(axis=1)  # axis=1: sum across the columns, one result per row

===================================

0        0
1        0
2        0
3        0
4        0
        ..
242      0
243      0
244      7
25020    8
25021    8
Length: 247, dtype: int64
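
The `axis` argument is easy to mix up, so here is a minimal sketch on a hypothetical two-column frame: `axis=0` collapses the rows and returns one count per column, while `axis=1` collapses the columns and returns one count per row.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    "a": [1.0, np.nan, np.nan],
    "b": [1.0, 2.0, np.nan],
})

per_column = toy.isna().sum(axis=0)  # indexed by column name
per_row = toy.isna().sum(axis=1)     # indexed by row label

print(per_column)
print(per_row)
```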

# Drop the rows that contain missing data ... dropna()

tipdf.dropna(inplace=True)  # dropna() works along axis=0 by default, i.e. it drops rows
tipdf
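
By default `dropna()` removes a row if any value in it is missing. As a sketch on hypothetical toy data, the `how` and `subset` parameters loosen or narrow that rule, which can matter for a row like label 244 above where only `total_bill` is present:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: a complete row, a partial row, an all-NaN row
toy = pd.DataFrame({
    "total_bill": [10.0, 7.0, np.nan],
    "tip": [1.0, np.nan, np.nan],
})

any_missing = toy.dropna()             # drop rows with at least one NaN
all_missing = toy.dropna(how="all")    # drop only rows that are entirely NaN
need_tip = toy.dropna(subset=["tip"])  # only look at the 'tip' column

print(any_missing)
print(all_missing)
print(need_tip)
```

None of these calls modify `toy`, since `inplace` is left at its default of `False`.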

tipdf['tip_pct'].plot(kind='hist', bins=50)  # histogram to check the frequency distribution
plt.show()

HYERI_PLACE
