[Pandas] Sorting and Ranking

2018. 5. 6. 23:31

Sorting and Ranking

1. Sorting

Sorting needs a criterion, for example row index order or column index order.

import pandas as pd
from pandas import Series, DataFrame
import numpy as np

df = DataFrame(np.random.randn(4,3),
              columns=list('bde'),
              index=['seoul', 'busan', 'daegu', 'incheon'])

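# Format each value as a string with two decimal places
# (named fmt so the built-in format() is not shadowed).
# The result holds strings, which is why the dtype is object.
fmt = lambda x: '%.2f' % x
df['e'].map(fmt)

seoul       0.35
busan       0.08
daegu       0.89
incheon    -0.44
Name: e, dtype: object

s1 = df['e'].map(fmt)
s1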

seoul       0.35
busan       0.08
daegu       0.89
incheon    -0.44
Name: e, dtype: object

1) Sorting by row index

Sort a Series object by its index

s1.sort_index()

busan       0.08
daegu       0.89
incheon    -0.44
seoul       0.35
Name: e, dtype: object

df2 = DataFrame(np.arange(8).reshape(2,4),
               index=['three', 'one'],
               columns=['d', 'a', 'b', 'c'])
df2

	d	a	b	c
three	0	1	2	3
one	4	5	6	7

Sort a DataFrame object by its index

df2.sort_index()

	d	a	b	c
one	4	5	6	7
three	0	1	2	3
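The examples above display the result of sort_index without assigning it. A minimal sketch of what that implies: sort_index returns a new, sorted object and leaves the original as it was. The names cities and sorted_cities and the data are illustrative assumptions, not from the original post.

```python
import pandas as pd

# Hypothetical example data; the labels are deliberately out of order.
cities = pd.Series([3, 1, 2], index=['seoul', 'busan', 'daegu'])

# sort_index returns a new Series sorted by label.
sorted_cities = cities.sort_index()

print(list(sorted_cities.index))   # ['busan', 'daegu', 'seoul']
print(list(cities.index))          # ['seoul', 'busan', 'daegu'] (unchanged)
```

To keep the sorted result, assign it to a name as done here.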

2) Sorting by column index

Sort a DataFrame object by its column labels: specify axis=1

df2.sort_index(axis=1)

	a	b	c	d
three	1	2	3	0
one	5	6	7	4
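As a small sketch of the axis argument: axis=1 and axis='columns' refer to the same axis, and each column's values move together with its label. The names by_number and by_name are illustrative.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame(np.arange(8).reshape(2, 4),
                   index=['three', 'one'],
                   columns=['d', 'a', 'b', 'c'])

by_number = df2.sort_index(axis=1)         # axis given as an integer
by_name = df2.sort_index(axis='columns')   # same axis, given by name

print(by_number.equals(by_name))           # True
print(list(by_number.columns))             # ['a', 'b', 'c', 'd']
```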

3) Sorting in descending order

By default, data is sorted in ascending order.
To sort in descending order, pass ascending=False.

df2.sort_index(axis=1, ascending=False)

	d	c	b	a
three	0	3	2	1
one	4	7	6	5
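ascending=False works on the row index as well. The sketch below sorts the rows and then the columns in descending order by chaining two calls; rows_desc and both_desc are illustrative names.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame(np.arange(8).reshape(2, 4),
                   index=['three', 'one'],
                   columns=['d', 'a', 'b', 'c'])

# Descending by row label, then descending by column label.
rows_desc = df2.sort_index(ascending=False)
both_desc = rows_desc.sort_index(axis=1, ascending=False)

print(list(both_desc.index))     # ['three', 'one']
print(list(both_desc.columns))   # ['d', 'c', 'b', 'a']
```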

4) Sorting an object by its values

To sort an object by its values, use the sort_values method. It sorts in ascending order by default.

obj = Series([4, 7, -3, 1])
obj

0    4
1    7
2   -3
3    1
dtype: int64

obj.sort_values()

2   -3
3    1
0    4
1    7
dtype: int64

By default, NaN values are placed at the end when sorting.

obj2 = Series([4, np.nan, 8, np.nan, -10, 2])
obj2

0     4.0
1     NaN
2     8.0
3     NaN
4   -10.0
5     2.0
dtype: float64

obj2.sort_values()

4   -10.0
5     2.0
0     4.0
2     8.0
1     NaN
3     NaN
dtype: float64
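The NaN placement can be changed with the na_position parameter of sort_values. A minimal sketch using the same data; nan_first and desc are illustrative names.

```python
import numpy as np
import pandas as pd

obj2 = pd.Series([4, np.nan, 8, np.nan, -10, 2])

# na_position='first' moves the NaN entries to the front.
nan_first = obj2.sort_values(na_position='first')
print(nan_first.isna().tolist())   # [True, True, False, False, False, False]

# With the default na_position='last', NaN stays at the end
# even when the sort order is descending.
desc = obj2.sort_values(ascending=False)
print(desc.iloc[:4].tolist())      # [8.0, 4.0, 2.0, -10.0]
```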

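When this post was written, creating a DataFrame from a dictionary sorted the columns by label automatically. Since pandas 0.23 on Python 3.6 or later, the dictionary's insertion order is kept instead, so the columns argument is passed here to get the order shown below.

frame = DataFrame({'b':[4,7,-5,2],
                  'a':[0,1,0,1]},
                  columns=['a', 'b'])
frame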

	a	b
0	0	4
1	1	7
2	0	-5
3	1	2
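Since the column order at creation time depends on the pandas version, one version-independent way to get alphabetically ordered columns is to sort the column labels explicitly. A minimal sketch; frame_sorted is an illustrative name.

```python
import pandas as pd

frame = pd.DataFrame({'b': [4, 7, -5, 2],
                      'a': [0, 1, 0, 1]})

# Sort the column labels explicitly instead of relying on
# how the constructor orders dictionary keys.
frame_sorted = frame.sort_index(axis=1)

print(list(frame_sorted.columns))   # ['a', 'b']
```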

5) Sorting by the values of a specified column

Pass the name of the column to sort by to the by parameter of sort_values.

frame.sort_values(by='b')

	a	b
2	0	-5
3	1	2
0	0	4
1	1	7

To sort by several columns in sequence, pass a list of column names. Compare sorting by 'a' alone with sorting by 'a' and then 'b':

frame.sort_values(by='a')

	a	b
0	0	4
2	0	-5
1	1	7
3	1	2

frame.sort_values(by=['a', 'b'])

	a	b
2	0	-5
0	0	4
3	1	2
1	1	7
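When sorting by several columns, ascending can also be a list with one flag per column. The sketch below sorts 'a' ascending and breaks ties by 'b' descending; mixed is an illustrative name.

```python
import pandas as pd

frame = pd.DataFrame({'a': [0, 1, 0, 1],
                      'b': [4, 7, -5, 2]})

# 'a' ascending; within equal 'a', 'b' descending.
mixed = frame.sort_values(by=['a', 'b'], ascending=[True, False])

print(mixed.index.tolist())   # [0, 2, 1, 3]
print(mixed['b'].tolist())    # [4, -5, 7, 2]
```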

2. Ranking

The method for assigning ranks is rank().
By default, values are ranked in ascending order.
To rank in descending order, use ascending=False.

obj3 = Series([7, -2, 7, 4, 2, 0, 4])
obj3

0    7
1   -2
2    7
3    4
4    2
5    0
6    4
dtype: int64

obj3.sort_values()

1   -2
5    0
4    2
3    4
6    4
0    7
2    7
dtype: int64

1) Ranking (ascending)

Values are ranked in ascending order. When values are tied, each one receives the average of the ranks the tied values occupy (method='average', the default).

obj3.rank()

0    6.5
1    1.0
2    6.5
3    4.5
4    3.0
5    2.0
6    4.5
dtype: float64
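To see where values such as 4.5 and 6.5 come from, the averaged rank can be rebuilt by hand: number the sorted positions from 1, then average those positions within each group of equal values. This is only an illustrative sketch, not how pandas computes ranks internally; positions and manual_average are hypothetical names.

```python
import pandas as pd

obj3 = pd.Series([7, -2, 7, 4, 2, 0, 4])

# Sort the values and number the sorted positions 1..n.
sorted_vals = obj3.sort_values()
positions = pd.Series(range(1, len(obj3) + 1), index=sorted_vals.index)

# Average the positions within each group of equal values,
# then restore the original row order.
manual_average = positions.groupby(sorted_vals).transform('mean').sort_index()

print(manual_average.tolist())                        # [6.5, 1.0, 6.5, 4.5, 3.0, 2.0, 4.5]
print(manual_average.tolist() == obj3.rank().tolist())  # True
```

The two 4s sit at positions 4 and 5, so each gets (4 + 5) / 2 = 4.5; the two 7s sit at positions 6 and 7, so each gets 6.5.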

method='first' breaks ties by the order in which the values appear in the data, so tied values are not given an averaged rank.

obj3.rank(method='first')

0    6.0
1    1.0
2    7.0
3    4.0
4    3.0
5    2.0
6    5.0
dtype: float64

2) Ranking in descending order

obj3

0    7
1   -2
2    7
3    4
4    2
5    0
6    4
dtype: int64

obj3.sort_values(ascending=False)

2    7
0    7
6    4
3    4
4    2
5    0
1   -2
dtype: int64

obj3.rank(ascending=False)

0    1.5
1    7.0
2    1.5
3    3.5
4    5.0
5    6.0
6    3.5
dtype: float64

obj3.rank(ascending=False, method='first')

0    1.0
1    7.0
2    2.0
3    3.0
4    5.0
5    6.0
6    4.0
dtype: float64
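One way to check the descending ranks: for numeric data, ranking with ascending=False gives the same result as ranking the negated values in ascending order. A minimal sketch with the default method; desc_rank and negated_rank are illustrative names.

```python
import pandas as pd

obj3 = pd.Series([7, -2, 7, 4, 2, 0, 4])

desc_rank = obj3.rank(ascending=False)   # largest value gets rank 1
negated_rank = (-obj3).rank()            # ascending rank of the negated values

print(desc_rank.tolist())                # [1.5, 7.0, 1.5, 3.5, 5.0, 6.0, 3.5]
print(desc_rank.equals(negated_rank))    # True
```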

3) Ranking tied values as a group

obj3

0    7
1   -2
2    7
3    4
4    2
5    0
6    4
dtype: int64

When several values are tied, method='max' gives every tied value the largest rank in the group.

obj3.rank(ascending=False, method='max')

0    2.0
1    7.0
2    2.0
3    4.0
4    5.0
5    6.0
6    4.0
dtype: float64

When several values are tied, method='min' gives every tied value the smallest rank in the group.

obj3.rank(ascending=False, method='min')

0    1.0
1    7.0
2    1.0
3    3.0
4    5.0
5    6.0
6    3.0
dtype: float64
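The tie-handling methods used in this post are easier to compare side by side. The sketch below collects the descending ranks for each method into one DataFrame; methods and comparison are illustrative names.

```python
import pandas as pd

obj3 = pd.Series([7, -2, 7, 4, 2, 0, 4])

# One column of descending ranks per tie-handling method.
methods = ['average', 'min', 'max', 'first']
comparison = pd.DataFrame(
    {m: obj3.rank(ascending=False, method=m) for m in methods}
)
comparison.insert(0, 'value', obj3)   # show the original values first

print(comparison)
```

Only the rows holding the tied values (7 and 4) differ between the columns; the untied values get the same rank under every method.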

Attribution, NonCommercial, NoDerivatives
