Learning the Pandas Package (2): Statistical Functions

This article summarizes the statistical functions most commonly used in Pandas.

1. Common functions

The commonly used statistical functions are listed below:

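Function	Description
count()	Counts the non-null values.
sum()	Sum.
mean()	Mean.
median()	Median.
mode()	Mode (the most frequent value or values).
std()	Standard deviation (the sample standard deviation by default, ddof=1).
min()	Minimum.
max()	Maximum.
abs()	Absolute value.
prod()	Product of all values.
cumsum()	Cumulative sum. axis=0 accumulates down the rows of each column; axis=1 accumulates across the columns of each row.
cumprod()	Cumulative product. axis=0 accumulates down the rows of each column; axis=1 accumulates across the columns of each row.
corr()	Correlation coefficient between series or columns, ranging from -1 to 1. The closer the absolute value is to 1, the stronger the linear relationship; the sign gives its direction.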
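
The cumulative and correlation functions in the table are not demonstrated later in this article, so here is a minimal sketch of them. The frame `df` and its columns `x`, `y`, `z` are hypothetical sample data chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data: y rises with x, z falls as x rises.
df = pd.DataFrame({'x': [1, 2, 3, 4],
                   'y': [2, 4, 6, 8],
                   'z': [4, 3, 2, 1]})

# With the default axis=0, cumsum/cumprod accumulate down each column.
running_sum = df['x'].cumsum()     # 1, 3, 6, 10
running_prod = df['x'].cumprod()   # 1, 2, 6, 24

# With axis=1 they accumulate from left to right within each row.
row_running_sum = df.cumsum(axis=1)

# corr() returns the pairwise Pearson correlation matrix by default.
corr = df.corr()
print(running_sum.tolist(), running_prod.tolist())
print(row_running_sum)
print(corr)
```

In this sample, `x` and `y` have a correlation of 1 and `x` and `z` a correlation of -1. Both are equally strong linear relationships, which is why the absolute value, not the raw value, indicates strength.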

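In a DataFrame, aggregation methods take an axis parameter that sets the direction of the calculation; it defaults to axis=0.
axis=0 (or "index"): computes down the rows, giving one result per column;
axis=1 (or "columns"): computes across the columns, giving one result per row.
[Figure: the axis=0 and axis=1 directions of a DataFrame]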
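
The two axis directions can be sketched on a small numeric frame. The frame `nums` is a hypothetical example that reuses the age and score values from the next section.

```python
import pandas as pd

# Hypothetical numeric-only frame, used only to illustrate the axis argument.
nums = pd.DataFrame({'age': [14, 15, 13, 17],
                     'score': [89, 78, 99, 100]})

# axis=0 is the same as axis='index': one result per column.
col_totals = nums.sum(axis=0)

# axis=1 is the same as axis='columns': one result per row.
row_totals = nums.sum(axis='columns')

print(col_totals)
print(row_totals)
```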

2. sum(): summation

Create a DataFrame object.

> import pandas as pd
> data = {'name':pd.Series(['a','b','c','d']),
		  'age':pd.Series([14,15,13,17]),
		  'score':pd.Series([89,78,99,100])}
> dd = pd.DataFrame(data)
> dd
  name  age  score
0    a   14     89
1    b   15     78
2    c   13     99
3    d   17    100

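By default, sum() uses axis=0 and adds the values vertically, down each column.
For a string column this does not raise an exception; the strings are concatenated instead.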

> dd.sum()
name     abcd
age        59
score     366
dtype: object

# Sum vertically, down each column
> dd.sum(axis=0)
name     abcd
age        59
score     366
dtype: object

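# Sum horizontally, across each row. numeric_only=True skips the string column 'name';
# without it, recent pandas versions (2.0 and later) raise a TypeError when adding str and int.
> dd.sum(axis=1, numeric_only=True)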
0    103
1     93
2    112
3    117
dtype: int64
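
When a frame mixes string and numeric columns, the non-numeric columns can be left out of the reduction in more than one way. The sketch below rebuilds the same `dd` frame from plain lists and shows two options; which one reads better is a matter of taste.

```python
import pandas as pd

dd = pd.DataFrame({'name': ['a', 'b', 'c', 'd'],
                   'age': [14, 15, 13, 17],
                   'score': [89, 78, 99, 100]})

# Option 1: ask the reduction itself to skip non-numeric columns.
totals = dd.sum(numeric_only=True)

# Option 2: select the numeric columns first, then reduce as usual.
numeric = dd.select_dtypes(include='number')
row_totals = numeric.sum(axis=1)

print(totals)
print(row_totals)
```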

3. mean(): mean

# numeric_only=True leaves out the string column 'name', which has no mean.
> dd.mean(numeric_only=True)
age      14.75
score    91.50
dtype: float64
> dd.mean(axis=0, numeric_only=True)
age      14.75
score    91.50
dtype: float64
> dd.mean(axis=1, numeric_only=True)
0    51.5
1    46.5
2    56.0
3    58.5
dtype: float64

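4. std(): standard deviation

The standard deviation measures how widely the data is spread around the mean. pandas computes the sample standard deviation (ddof=1) by default.

> dd.std(numeric_only=True)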
age       1.707825
score    10.279429
dtype: float64

> dd.std(axis=0, numeric_only=True)
age       1.707825
score    10.279429
dtype: float64
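
The figures above are sample standard deviations, which divide by n - 1. As a quick check, the sketch below recomputes the age value by hand and compares it with the population version (ddof=0). The variable names are illustrative only.

```python
import math
import pandas as pd

age = pd.Series([14, 15, 13, 17])

sample_std = age.std()             # default ddof=1: divide by n - 1
population_std = age.std(ddof=0)   # divide by n

# Manual check of the sample formula: sqrt(sum((x - mean)^2) / (n - 1)).
mean = age.mean()
manual = math.sqrt(((age - mean) ** 2).sum() / (len(age) - 1))

print(sample_std, manual, population_std)
```

The sample value matches the 1.707825 shown for age above; the population value is smaller because it divides by 4 instead of 3.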

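5. Summary statistics

describe() reports the count, mean, standard deviation, minimum, quartiles (25%, 50%, 75%), and maximum of each numeric column.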

> dd.describe()
             age       score
count   4.000000    4.000000
mean   14.750000   91.500000
std     1.707825   10.279429
min    13.000000   78.000000
25%    13.750000   86.250000
50%    14.500000   94.000000
75%    15.500000   99.250000
max    17.000000  100.000000
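
describe() lists the quartiles but not the interquartile range (IQR) itself. A minimal sketch of deriving it from the same score values with quantile() follows; the variable names are illustrative.

```python
import pandas as pd

score = pd.Series([89, 78, 99, 100])

q1 = score.quantile(0.25)   # the 25% row of describe()
q3 = score.quantile(0.75)   # the 75% row of describe()
iqr = q3 - q1               # interquartile range

print(q1, q3, iqr)
```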

The include parameter selects which columns are summarized, by data type. Its values are:
object: summarize the string columns;
number: summarize the numeric columns;
all: summarize every column.

> dd.describe(include=['object'])
       name
count     4
unique    4
top       d
freq      1
> dd.describe(include='all')
       name        age       score
count     4   4.000000    4.000000
unique    4        NaN         NaN
top       d        NaN         NaN
freq      1        NaN         NaN
mean    NaN  14.750000   91.500000
std     NaN   1.707825   10.279429
min     NaN  13.000000   78.000000
25%     NaN  13.750000   86.250000
50%     NaN  14.500000   94.000000
75%     NaN  15.500000   99.250000
max     NaN  17.000000  100.000000