[挖坑]Numpy，Pandas

参考
Numpy
Pandas
报错
- 'int' object is not callable
- ` Use a.any() or a.all()`
其他
- 极值点

仅记录使用到的，后续使用再补充

import numpy as np
import pandas as pd

参考

NumPy 教程

Numpy

注意

复制矩阵一定要用.copy(),不然就是复制的地址,容易改变原有数据

>>> a=np.arange(10)
>>> b=a[2:4]
>>> b[:]=0.0
>>> a
array([0, 1, 0, 0, 4, 5, 6, 7, 8, 9])

numpy的初始类型给定, 给数组的元素赋值,会自动转为原始的类型

>>> a=np.array([1,2])
>>> a[0]+=1.1
>>> a
array([2, 2])
>>> a[0]=a[0]+2.123
>>> a
array([4, 2])
>>> a=a+2.213
>>> a
array([6.213, 4.213])

创建矩阵

np.array()

>>> np.array([1,2,3,4])
array([1, 2, 3, 4])
>>> np.array([[1,2],[3,4]])
array([[1, 2],
       [3, 4]])
>>> b=np.array([[1,2],[3,4]])
>>> b[0,1] #b[m,n,l], m,n,l层数依次向内
2

np.array([],后面还有很多属性)，略

未初始化的数组`numpy.empty(shape, dtype = float, order = 'C')`

np.empty([维度] [,dtype=类型缺省float] )

>>> np.empty([3,2],dtype=float)
array([[1., 0.],
       [2., 0.],
       [3., 0.]])

全0数组`numpy.zeros(shape, dtype = float, order = 'C')`

全1数组`numpy.ones(shape, dtype = None, order = 'C')`

等差(累加)数组`numpy.arange(start, stop, step, dtype)`

注意终止值不包含

参数	描述
`start`	起始值，默认为`0`
`stop`	终止值（不包含）
`step`	步长，默认为`1`
`dtype`	返回`ndarray`的数据类型，如果没有提供，则会使用输入数据的类型。

>>> np.arange(0,1,0.5)
array([0. , 0.5])

等差(均分)数组`np.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)`

参数	描述
`start`	序列的起始值
`stop`	序列的终止值，默认包含，如果`endpoint`为`true`，该值包含于数列中
`num`	要生成的等步长的样本数量，默认为`50`
`endpoint`	该值为 `true` 时，数列中中包含`stop`值，反之不包含，默认是True。
`retstep`	如果为 True 时，生成的数组中会显示间距，反之不显示。
`dtype`	`ndarray` 的数据类型

>>> np.linspace(0,1,3)
array([0. , 0.5, 1. ])

等比数组`np.logspace(start, stop, num=50, endpoint=True, base=10.0, dtype=None)`

产生[base**start,...,base**end]类型的数组,python中用**表示幂级数

>>> np.logspace(3,5,4,base=2)
array([ 8.        , 12.69920842, 20.1587368 , 32.        ])
#[2^3,...,2^5]

参数	描述
`start`	序列的起始值为：base ** start
`stop`	序列的终止值为：base ** stop。如果`endpoint`为`true`，该值包含于数列中
`num`	要生成的等步长的样本数量，默认为`50`
`endpoint`	该值为 `true` 时，数列中中包含`stop`值，反之不包含，默认是True。
`base`	对数 log 的底数。
`dtype`	`ndarray` 的数据类型

随机数矩阵

>>> np.random.rand(2,3)
array([[0.58772337, 0.89550384, 0.64660628],
       [0.50886221, 0.34201769, 0.09582977]])
>>> np.random.randint(1,5,(2,3))
array([[4, 4, 1],
       [3, 3, 1]])

矩阵属性

属性	说明
ndarray.ndim	数组维度, 秩，即轴的数量或维度的数量 `np.zeros([1,2,3]).ndim == 3`
ndarray.shape	数组各个维度长度，对于矩阵，n 行 m 列 `np.zeros([1,2,3]).shape == (1, 2, 3)`
ndarray.size	数组元素的总个数，相当于 .shape 中 n*m 的值 `np.zeros([1,2,3]).size == 6`
ndarray.dtype	ndarray 对象的元素类型
ndarray.itemsize	ndarray 对象中每个元素的大小，以字节为单位
ndarray.flags	ndarray 对象的内存信息
ndarray.real	ndarray元素的实部
ndarray.imag	ndarray 元素的虚部
ndarray.data	包含实际数组元素的缓冲区，由于一般通过数组的索引获取元素，所以通常不需要使用这个属性。

统计信息

最大 a.max(), 最大值位置a.argmax()

>>> a
array([[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]])
>>> a.max()
9
>>> a.max(axis=0)
array([7, 8, 9])
>>> a.max(axis=1)
array([3, 6, 9])
>>> a.argmax()
8
>>> a.argmax(axis=1)
array([2, 2, 2])

最小np.min(),同上
平均值a.mean()同上
```
>>> a.mean(axis=0)
array([4., 5., 6.])
```
方差a.var(),同上
标准差a.std(), 同上
求和a.sum(), 同上

累计求和a.cumsum(), 同上

>>> a.cumsum()
array([ 1,  3,  6, 10, 15, 21, 28, 36, 45])

中值np.median(a)

>>> np.median(a)
5.0
>>> np.median(a,axis=0)
array([4., 5., 6.])

操作矩阵

提取矩阵

a[m],a[m,:],-1代表最后一个索引，-2倒数第二个索引
a[a>4]
a[start:end:step]

多个条件a[(a > 3) & (a < 9) & ( a == 6 )]

>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> a[-1]
9
>>> a[[0,-1]]
array([0, 9])
>>> a[0:2]
array([0, 1])
>>> a[a<3]
array([0, 1, 2])
>>> a<3
array([ True,  True,  True, False, False, False, False, False, False,
     False])
>>> a[0:10:2]
array([0, 2, 4, 6, 8])
>>> a[[0,9]]
array([0, 9])
>>> a.resize([2,5])
>>> a
array([[0, 1, 2, 3, 4],
     [5, 6, 7, 8, 9]])
>>> a[0,:]
array([0, 1, 2, 3, 4])
>>> a[(a > 3) & (a < 9) & ( a == 6 )]
array([6])

反向排序`起始:终止:间隔`

>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> a[-2:4:-1]
array([8, 7, 6, 5])

注意到上面的终止4其实对应编号5,和python正向提取时一样的奇怪

修改形状reshape

np.reshape(数组,[维度])，不修改数组本省
数组.reshape([维度])，不修改数组本身
np.resize(数组,[维度]), 修改数组本身，新维度>旧维度，补0,反之，只取前几个元素
数组.resize([维度]), 修改数组本身，修改后元素数要与修改前相同

注意

>>> np.reshape(a,[2,5])
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
>>> a.reshape([2,5])
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
>>> 
>>> a
array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
>>> a.resize([2,5])
>>> a
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
>>> np.resize(a,[1,4])
array([[0, 1, 2, 3]])

np.where()的坑

 np.where(condition, [x, y])

示例

>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> a>2
array([False, False, False,  True,  True,  True,  True,  True,  True,
        True])
>>> np.where(a>2 ,True,0)
array([0, 0, 0, 1, 1, 1, 1, 1, 1, 1])
>>> np.where(a>2)
(array([3, 4, 5, 6, 7, 8, 9]),)
#默认返回的是符合条件的位置
>>> a
array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])
>>> np.where(a>12)
(array([3, 4, 5, 6, 7, 8, 9]),)
>>> np.where((a>12) & (a<15) )
(array([3, 4]),)
#所以也可以这样提取元素
>>> a[np.where((a>12) & (a<15) ) ]
array([13, 14])

提取元素后,矩阵就变成一维的了,没有了之前的结构

两个where是什么鬼东西

>>> np.where( a > 12 )
(array([3, 4, 5, 6, 7, 8, 9]),)
>>> np.where(a < 15)
(array([0, 1, 2, 3, 4]),)
>>> np.where( a > 12 ) and np.where(a < 15)
(array([0, 1, 2, 3, 4]),)

where不同维度的坑,高纬度需要a[:,np.where(condition)[0]],不然提取后矩阵维度就变了

>>> a=np.zeros([5,3,2,4])
>>> a.shape
(5, 3, 2, 4)
>>> b=np.arange(3)
>>> select=np.where( (b>0)&(b<2))
>>> select
(array([1]),)
#直接在a下标中用select,矩阵的维度就变了
>>> a[:,select].shape
(5, 1, 1, 2, 4)
#用select[0]就可以
>>> a[:,select[0]].shape
(5, 1, 2, 4)

展开数组ravel

均不改变数组本身

np.ravel(a)

a.ravel()

>>> a.ravel()
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

合并数组np.append

>>> np.append(a,a)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

可用于扩展矩阵

>>> b
array([], dtype=float64)
>>> b=np.append(b,"hello")
>>> b
array(['hello'], dtype='<U32')

更丰富的合并

Python中numpy数组的拼接、合并

>>> a
array(［0, 1, 2],
       [3, 4, 5],
       [6, 7, 8］)
>>> b = a*2
>>> b
array(［ 0, 2, 4],
       [ 6, 8, 10],
       [12, 14, 16］)
#水平合并
>>> np.hstack((a,b))
array(［ 0, 1, 2, 0, 2, 4],
       [ 3, 4, 5, 6, 8, 10],
       [ 6, 7, 8, 12, 14, 16］)

>>> np.concatenate((a,b),axis=1)
array(［ 0, 1, 2, 0, 2, 4],
       [ 3, 4, 5, 6, 8, 10],
       [ 6, 7, 8, 12, 14, 16］)
#垂直合并
>>> np.vstack((a,b))
array(［ 0, 1, 2],
       [ 3, 4, 5],
       [ 6, 7, 8],
       [ 0, 2, 4],
       [ 6, 8, 10],
       [12, 14, 16］)

>>> np.concatenate((a,b),axis=0)
array(［ 0, 1, 2],
       [ 3, 4, 5],
       [ 6, 7, 8],
       [ 0, 2, 4],
       [ 6, 8, 10],
       [12, 14, 16］)

删除矩阵指定纬度

>>> a
array([[1, 2, 3],
       [3, 4, 5]])
>>> np.delete(a,0,axis=0)
array([[3, 4, 5]])
>>> a
array([[1, 2, 3],
       [3, 4, 5]])
>>> np.delete(a,0,axis=1)
array([[2, 3],
       [4, 5]])

其他略

函数	元素及描述
`resize`	返回指定形状的新数组
`append`	将值添加到数组末尾
`insert`	沿指定轴将值插入到指定下标之前
`delete`	删掉某个轴的子数组，并返回删除后的新数组
`unique`	清除相同的元素 `a=np.unique(a)`

删除数组中特定条件的值,如删除值

>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> np.delete(a,a==5)
array([0, 1, 2, 3, 4, 6, 7, 8, 9])

数学函数

转载自NumPy 数学函数

三角函数（Trigonometric）

Function	Describe
sin(x[, out])	正弦值
cos(x[, out])	余弦值
tan(x[, out])	正切值
arcsin(x[, out])	反正弦
arccos(x[, out])	反余弦
arctan(x[, out])	反正切
hypot(x1, x2[, out])	求直角三角形斜边
arctan2(x1, x2[, out])	Element-wise arc tangent of x1/x2 choosing the quadrant correctly.
degrees(x[, out])	弧度求角度
radians(x[, out])	角度求弧度
unwrap(p[, discont, axis])	Unwrap by changing deltas between values to 2*pi complement.
deg2rad(x[, out])	角度求弧度
rad2deg(x[, out])	弧度求角度

双曲线函数（Hyperbolic）

Function	Describe
sinh(x[, out])	双曲线正弦
cosh(x[, out])	双曲线余弦
tanh(x[, out])	双曲线正切
arcsinh(x[, out])	反双曲线正弦
arccosh(x[, out])	反双曲线余弦
arctanh(x[, out])	反双曲线正切

四舍五入（Rounding）

Function	Describe
around(a[, decimals, out])	以给定的小数位进行四舍五入
round_(a[, decimals, out])	以给定的小数位进行四舍五入，等同于around
rint(x[, out])	四舍五入到整数
fix(x[, y])	向0取整，正数向下取整，负数向上取整
floor(x[, out])	向下取整
ceil(x[, out])	向上取整
trunc(x[, out])	取整数部分

和、积、差异（Sums、Products、Differences）

Function	Describe
prod(a[, axis, dtype, out, keepdims])	求积
sum(a[, axis, dtype, out, keepdims])	求和
nanprod(a[, axis, dtype, out, keepdims])	求积，缺省值为1
nansum(a[, axis, dtype, out, keepdims])	求和，缺省值为0
cumprod(a[, axis, dtype, out])	累积
cumsum(a[, axis, dtype, out])	累和
nancumprod(a[, axis, dtype, out])	累积，缺省为1
nancumsum(a[, axis, dtype, out])	累和，缺省为0
diff(a[, n, axis])	out[n] = a[n+1] - a[n]
ediff1d(ary[, to_end, to_begin])	The differences between consecutive elements of an array.
gradient(f, varargs, *kwargs)	Return the gradient of an N-dimensional array.
cross(a, b[, axisa, axisb, axisc, axis])	Return the cross product of two (arrays of) vectors.
trapz(y[, x, dx, axis])	Integrate along the given axis using the composite trapezoidal rule.

指数、对数函数（Exponents&Logarithm）

Function	Describe
exp(x[, out])	指数
expm1(x[, out])	exp(x) - 1
exp2(x[, out])	2**x
log(x[, out])	对数
log10(x[, out])	以10为底对数
log2(x[, out])	以2为底对数.
log1p(x[, out])	exp(x) - 1
logaddexp(x1, x2[, out])	Logarithm of the sum of exponentiations of the inputs.
logaddexp2(x1, x2[, out])	Logarithm of the sum of exponentiations of the inputs in base-2.

算术运算

Function	Describe
add(x1, x2[, out])	加法
reciprocal(x[, out])	倒数
negative(x[, out])	负数
multiply(x1, x2[, out])	乘法
divide(x1, x2[, out])	除法
power(x1, x2[, out])	幂运算
subtract(x1, x2[, out])	减法
true_divide(x1, x2[, out])	真除法 /
floor_divide(x1, x2[, out])	向下取整除法 //
fmod(x1, x2[, out])	求余
mod(x1, x2[, out])	求余，余数为正
modf(x[, out1, out2])	分别返回整数和余数
remainder(x1, x2[, out])	和mod相同

其他（Miscellaneous）

Function	Describe
convolve(a, v[, mode])	Returns the discrete, linear convolution of two one-dimensional sequences.
clip(a, a_min, a_max[, out])	求某一范围的值
sqrt(x[, out])	开平方 `dist_i=np.sqrt(dist_i)`
cbrt(x[, out])	开立方
square(x[, out])	求平方, `dist_i=np.square(coor_i[:,:,:,:]-coor_i[0,:,:,:]).sum(axis=1)`
absolute(x[, out])	绝对值
fabs(x[, out])	绝对值
sign(x[, out])	标记数字的正负零
maximum(x1, x2[, out])	求最大值
minimum(x1, x2[, out])	求最小值
fmax(x1, x2[, out])	求最大值
fmin(x1, x2[, out])	求最小值
nan_to_num(x)	替换空值
real_if_close(a[, tol])	If complex input returns a real array if complex parts are close to zero.
interp(x, xp, fp[, left, right, period])	One-dimensional linear interpolation.

还有点积，线性代数等数学函数

其他使用

排序

np.argsort进行从小到大排序，输出排序索引，加符号从大到小排序

>>> a
array([2, 3, 4, 5, 6])
>>> np.argsort(-a)
array([4, 3, 2, 1, 0])
>>> a[np.argsort(-a)]
array([6, 5, 4, 3, 2])

按纬度排序

>>> data
array([[16.        ,  0.25770879],
       [ 2.        ,  0.07467212],
       [ 8.        ,  0.20363782],
       [32.        ,  0.28113183],
       [24.        ,  0.25770879],
       [ 1.        ,  0.03683032],
       [ 4.        ,  0.1340857 ]])
>>> np.sort(data,0)
array([[ 1.        ,  0.03683032],
       [ 2.        ,  0.07467212],
       [ 4.        ,  0.1340857 ],
       [ 8.        ,  0.20363782],
       [16.        ,  0.25770879],
       [24.        ,  0.25770879],
       [32.        ,  0.28113183]])

数学函数

数值积分

    over=" sum: %6.3e"%( np.trapz(y[:,i],x) ) #积分

叉乘

$a\times b$

>>> a=np.array([1, 2, 3])
>>> b=a.copy()
>>> np.cross(a,b)
array([0, 0, 0])
>>> b=np.array([3,2,1])
>>> np.cross(a,b)
array([-4,  8, -4])

一维点乘,二维矩阵乘法

np.dot(a,b)

对应位置相乘

np.multiply(a,b)
#数组和矩阵对应位置相乘，输出与相乘数组/矩阵的大小一致
#直接a*b不就行了

报错

>>> a=np.array([-1, 1, 1])
>>> 10**a[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Integers to negative integer powers are not allowed.
>>> 10**int(a[0])
0.1

类型转换

浮点转整数

#先取整, 再转换
   kindex =np.rint(data[:,8]).astype(int) 
   ist    =np.rint(data[:9] ).astype(int) #轨道编号
   idim   =np.rint(data[:10]).astype(int) 

Pandas

易百教程-Pandas

数据类型

一维：系列(Series)
二维：数据帧(DataFrame)
三维：面板(Panel)

Series构成DataFrame，DataFrame构成Panel

Series

创建`pd.Series(数组)`

pandas.Series( data, index, dtype, copy)

编号	参数	描述
1	`data`	数据采取各种形式，如：`ndarray`，`list`，`constants`
2	`index`	索引值必须是唯一的和散列的，与数据的长度相同。默认`np.arange(n)`如果没有索引被传递。
3	`dtype`	`dtype`用于数据类型。如果没有，将推断数据类型
4	`copy`	复制数据，默认为`false`。

示例

data=pd.Series([1,2,3])
data=pd.Series(np.arange(10))

提取数值data.values

>>> data.values
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

提取指定元素data.get(N)

>>> data.get(0)
0

提取首尾head,tail

>>> data.head(2)
0    0
1    1
dtype: int64
>>> data.tail(2)
8    8
9    9
dtype: int64

DataFrame

创建

pandas.DataFrame( data, index, columns, dtype, copy)

编号	参数	描述
1	`data`	数据采取各种形式，如:`ndarray`，`series`，`map`，`lists`，`dict`，`constant`和另一个`DataFrame`。
2	`index`	对于行标签，要用于结果帧的索引是可选缺省值`np.arrange(n)`，如果没有传递索引值。
3	`columns`	对于列标签，可选的默认语法是 - `np.arange(n)`。这只有在没有索引传递的情况下才是这样。
4	`dtype`	每列的数据类型。
5	`copy`	如果默认值为`False`，则此命令(或任何它)用于复制数据。

示例

>>> a
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
>>> pd.DataFrame(a)
   0  1  2  3  4
0  0  1  2  3  4
1  5  6  7  8  9

提取数值

>>> data=pd.DataFrame(a)
>>> data.values
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
>>> data.head(1)
   0  1  2  3  4
0  0  1  2  3  4
>>> 

读写文件

pd.read_table读取文本数据作为DataFrame类型

alldata=pd.read_table(inputfile,
                    skiprows=4,#跳过n行
                    sep=' ', #间隔
                    header=None
                    )
data=alldata.values #使用values提取数值作为Numpy类型
data=data[:,3:]#前三列是NaN跳过

空格不一致时

pd.read_table(inputfile,sep="\s+")

excel

conda install xlrd

报错

`'int' object is not callable`

多加了括号

>>> a[a>5].size
5
>>> a[a>5].size()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'int' object is not callable

` Use a.any() or a.all()`

  File "./spectrum.gpaw.py", line 41, in readdat
    data=data[ np.where( ( data[:,0] > xlim[0] ) and ( data[:,0] < xlim[1] ) )]
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

改为

    data=data[ np.where( ( data[:,0] > xlim[0] ) & ( data[:,0] < xlim[1] ) )]

其他

极值点

       time=xdata[np.where((xdata >= fftlimit[0]) & (xdata <= fftlimit[1]))]
       tem=ydata[:,x][np.where((xdata >= fftlimit[0]) & (xdata <= fftlimit[1]))]
       import scipy.signal as signal
       pos=signal.argrelextrema(tem,np.greater)
       p1=0
       for p in pos[0]:
           if time[p]-time[p1] < 0: #10fs内只统计一次
               p1=p
               continue
           else:
               axs.text(time[p],tem[p]+1,str(round(time[p],2)),color=colorxyz[x],size=30)
               p1=p

本文首发于我的博客@cndaqiang.
本博客所有文章除特别声明外，均采用 CC BY-SA 4.0 协议，转载请注明出处！

[挖坑]Numpy，Pandas

参考

Numpy

注意

创建矩阵

np.array()

未初始化的数组numpy.empty(shape, dtype = float, order = 'C')

全0数组numpy.zeros(shape, dtype = float, order = 'C')

全1数组numpy.ones(shape, dtype = None, order = 'C')

等差(累加)数组numpy.arange(start, stop, step, dtype)

等差(均分)数组np.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)

等比数组np.logspace(start, stop, num=50, endpoint=True, base=10.0, dtype=None)

随机数矩阵

矩阵属性

统计信息

操作矩阵

提取矩阵

反向排序起始:终止:间隔

修改形状reshape

展开数组ravel

合并数组np.append

更丰富的合并

删除矩阵指定纬度

其他 略

数学函数

三角函数（Trigonometric）

双曲线函数（Hyperbolic）

四舍五入（Rounding）

和、积、差异（Sums、Products、Differences）

指数、对数函数（Exponents&Logarithm）

算术运算

其他（Miscellaneous）

其他使用

排序

按纬度排序

数学函数

数值积分

叉乘

一维点乘,二维矩阵乘法

对应位置相乘

报错

类型转换

Pandas

数据类型

Series

创建pd.Series(数组)

DataFrame

创建

读写文件

pd.read_table读取文本数据作为DataFrame类型

excel

报错

'int' object is not callable

` Use a.any() or a.all()`

其他

极值点

未初始化的数组`numpy.empty(shape, dtype = float, order = 'C')`

全0数组`numpy.zeros(shape, dtype = float, order = 'C')`

全1数组`numpy.ones(shape, dtype = None, order = 'C')`

等差(累加)数组`numpy.arange(start, stop, step, dtype)`

等差(均分)数组`np.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)`

等比数组`np.logspace(start, stop, num=50, endpoint=True, base=10.0, dtype=None)`

反向排序`起始:终止:间隔`

其他略

创建`pd.Series(数组)`

`'int' object is not callable`