NumPy notes

Date: 2019-11-18
Tags: numpy, notes
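An array obtained by slicing (indexing with a range) is a view of the original array. It shares the same data buffer as the original, so a change made through either one is visible in the other.
>>> b = a[3:7] # slicing produces a new array b; b and a share the same data buffer
>>> b
array([101, 4, 5, 6])
>>> b[2] = -10 # set the element of b at index 2 to -10
>>> b
array([101, 4, -10, 6])
>>> a # the element of a at index 5 is changed to -10 as well
array([ 0, 1, 100, 101, 4, -10, 6, 7, 8, 9])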
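Whether two arrays share a buffer can also be checked directly instead of inferred from side effects. The following is a minimal sketch; the starting values of a are an assumption reconstructed from the output above, and dup is a name introduced here for illustration.

```python
import numpy as np

# Starting values assumed from the session output above.
a = np.array([0, 1, 100, 101, 4, 5, 6, 7, 8, 9])

view = a[3:7]         # slicing returns a view onto a's buffer
dup = a[3:7].copy()   # an explicit copy owns its own data

view[2] = -10         # writes through to a[5]
dup[0] = 999          # does not touch a

shares_view = np.shares_memory(a, view)
shares_dup = np.shares_memory(a, dup)
print(shares_view, shares_dup)
print(a)
```

When a slice must be modified without affecting the original, calling .copy() on it is the usual way to break the link.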
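When an integer sequence is used to index an array, each element of the sequence is used as an index. The sequence can be a list or an array. An array obtained with an integer sequence as index does not share its data buffer with the original array.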
>>> x = np.arange(10,1,-1)
>>> x
array([10, 9, 8, 7, 6, 5, 4, 3, 2])
>>> x[[3, 3, 1, 8]] # take the 4 elements of x at indices 3, 3, 1, 8 and build a new array from them
array([7, 7, 9, 2])
>>> b = x[np.array([3,3,-3,8])] # indices may be negative
>>> b[2] = 100
>>> b
array([7, 7, 100, 2])
>>> x # b and x do not share data, so the values in x are unchanged
array([10, 9, 8, 7, 6, 5, 4, 3, 2])
>>> x[[3,5,1]] = -1, -2, -3 # an integer sequence index can also be used to assign elements
>>> x
array([10, -3, 8, -1, 6, -2, 4, 3, 2])
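When a boolean array b is used as an index into an array x, the result collects every element of x whose corresponding entry in b is True. An array obtained with a boolean array as index does not share its data buffer with the original array.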
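The notes give no session for boolean indexing, so here is a small sketch of it. The array values and the names mask, picked, big and x_copy are made up for illustration.

```python
import numpy as np

x = np.arange(5, 0, -1)                                # array([5, 4, 3, 2, 1])
mask = np.array([True, False, True, False, False])

picked = x[mask]       # elements of x where mask is True
picked[0] = -1         # picked is a copy, so x is not modified

big = x[x > 2]         # a comparison yields a boolean array usable as an index

x_copy = x.copy()
x_copy[x_copy > 2] = 0  # a boolean index can also be used for assignment
print(picked, big, x_copy)
```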
# Multidimensional arrays
>>> a = np.arange(0, 60, 10).reshape(-1, 1) + np.arange(0, 6)
>>> a
array([[ 0, 1, 2, 3, 4, 5],
[10, 11, 12, 13, 14, 15],
[20, 21, 22, 23, 24, 25],
[30, 31, 32, 33, 34, 35],
[40, 41, 42, 43, 44, 45],
[50, 51, 52, 53, 54, 55]])
>>> a[3:,[3,5]]
array([[33, 35],
       [43, 45],
       [53, 55]])
# Structured arrays
>>> persontype = np.dtype({
'names':['name', 'age', 'weight'],
'formats':['S32','i', 'f']})
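# S32 : a 32-byte string type; every element of the structure must have a fixed size, so the string length has to be specified
# i : 32-bit integer type, equivalent to np.int32
# f : 32-bit single-precision floating-point type, equivalent to np.float32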
>>> persontype
dtype([('name', 'S32'), ('age', '<i4'), ('weight', '<f4')])
>>> a = np.array([("Zhang",32,75.5),("Wang",24,65.2)],
dtype=persontype)
>>> a
array([('Zhang', 32, 75.5), ('Wang', 24, 65.19999694824219)],
      dtype=[('name', 'S32'), ('age', '<i4'), ('weight', '<f4')])
>>> a["name"]
array(['Zhang', 'Wang'],
      dtype='|S32')
>>> a[['name','age']]
array([('Zhang', 32), ('Wang', 24)],
      dtype=[('name', 'S32'), ('age', '<i4')])
>>> a['age']+200
array([232, 224])
>>> a['name'][0]='cao'
>>> a
array([('cao', 32, 75.5), ('Wang', 24, 65.19999694824219)],
      dtype=[('name', 'S32'), ('age', '<i4'), ('weight', '<f4')])
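The session above appears to come from Python 2. Under Python 3 an S32 field holds bytes, so the names would display as b'Zhang'. Assuming text names are wanted, one option is a Unicode field (U32) instead. The dtype person_u below is a hypothetical variant of persontype, not something from the original notes.

```python
import numpy as np

# Hypothetical variant of persontype with a Unicode name field.
person_u = np.dtype([("name", "U32"), ("age", "i4"), ("weight", "f4")])
people = np.array([("Zhang", 32, 75.5), ("Wang", 24, 65.2)], dtype=person_u)

ages_plus = people["age"] + 200   # arithmetic on one field yields a new array
people["name"][0] = "cao"         # writing through the field changes people
first_name = people[0]["name"]
print(ages_plus, first_name)
```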
# ufunc operations
>>> x = np.linspace(0, 2*np.pi, 10) # arithmetic sequence
>>> np.logspace(0, 2, 20) # geometric sequence: 20 elements from 1 (10^0) to 100 (10^2)
array([ 1. , 1.27427499, 1.62377674, 2.06913808,
2.6366509 , 3.35981829, 4.2813324 , 5.45559478,
6.95192796, 8.8586679 , 11.28837892, 14.38449888,
18.32980711, 23.35721469, 29.76351442, 37.92690191,
48.32930239, 61.58482111, 78.47599704, 100. ])

>>> x = np.linspace(0, 20, 11)
>>> x
array([ 0., 2., 4., 6., 8., 10., 12., 14., 16., 18., 20.])
>>> len(x)
11
>>> y=np.sin(x)
>>> z=np.sqrt(x)
>>> y
array([ 0. , 0.90929743, -0.7568025 , -0.2794155 , 0.98935825,
       -0.54402111, -0.53657292, 0.99060736, -0.28790332, -0.75098725,
        0.91294525])
>>> z
array([ 0. , 1.41421356, 2. , 2.44948974, 2.82842712,
        3.16227766, 3.46410162, 3.74165739, 4. , 4.24264069,
        4.47213595])
>>> x
array([ 0., 2., 4., 6., 8., 10., 12., 14., 16., 18., 20.])
>>> np.sin(x,x) # to write the result of sin directly back into x, pass the array to be overwritten as the second argument of the ufunc
array([ 0. , 0.90929743, -0.7568025 , -0.2794155 , 0.98935825,
       -0.54402111, -0.53657292, 0.99060736, -0.28790332, -0.75098725,
        0.91294525])
>>> x
array([ 0. , 0.90929743, -0.7568025 , -0.2794155 , 0.98935825,
       -0.54402111, -0.53657292, 0.99060736, -0.28790332, -0.75098725,
        0.91294525])
>>> np.abs(x)
array([ 0. , 0.90929743, 0.7568025 , 0.2794155 , 0.98935825,
        0.54402111, 0.53657292, 0.99060736, 0.28790332, 0.75098725,
        0.91294525])
>>> x
array([ 0. , 0.90929743, -0.7568025 , -0.2794155 , 0.98935825,
       -0.54402111, -0.53657292, 0.99060736, -0.28790332, -0.75098725,
        0.91294525])
>>> np.abs(x,x) # likewise, the result overwrites x
array([ 0. , 0.90929743, 0.7568025 , 0.2794155 , 0.98935825,
        0.54402111, 0.53657292, 0.99060736, 0.28790332, 0.75098725,
        0.91294525])
>>> x
array([ 0. , 0.90929743, 0.7568025 , 0.2794155 , 0.98935825,
        0.54402111, 0.53657292, 0.99060736, 0.28790332, 0.75098725,
        0.91294525])
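NumPy ufuncs compute on whole arrays at once, so np.sin(x) is faster than calling math.sin() on each element in a Python for loop.
For a single value, however, np.sin(0.5) is slower than math.sin(0.5). NumPy is best thought of as a tool for batch operations.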
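The speed claims can be checked with timeit. This is a rough sketch only: the array size and repeat counts are arbitrary choices, loop_sin is a helper introduced here, and timings depend on the machine, so no figures are quoted.

```python
import math
import timeit

import numpy as np

x = np.linspace(0, 2 * np.pi, 100_000)

def loop_sin(values):
    """Call math.sin once per element in a Python loop."""
    return [math.sin(v) for v in values]

# Both approaches compute the same values.
same = np.allclose(np.sin(x), loop_sin(x))

# Whole-array case: Python loop versus one ufunc call.
t_loop = timeit.timeit(lambda: loop_sin(x), number=3)
t_ufunc = timeit.timeit(lambda: np.sin(x), number=3)

# Single-value case: math.sin versus np.sin on a scalar.
t_scalar_math = timeit.timeit(lambda: math.sin(0.5), number=100_000)
t_scalar_np = timeit.timeit(lambda: np.sin(0.5), number=100_000)

print(same, t_loop, t_ufunc, t_scalar_math, t_scalar_np)
```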
>>> a = np.arange(0,4)
>>> b = np.arange(1,5)
>>> a+b
array([1, 3, 5, 7])
>>> np.add(a,b)
array([1, 3, 5, 7])
>>> np.add(a,b,c)

Traceback (most recent call last):
  File "<pyshell#139>", line 1, in <module>
    np.add(a,b,c)
ValueError: operands could not be broadcast together with shapes (4) (4) (100)
>>> np.add(a,b,a) # the result overwrites a
array([1, 3, 5, 7])
>>> a
array([1, 3, 5, 7])
>>> a=[1,2,3,4]
>>> b=[2,3,4,5]
>>> a+b # + on built-in Python lists concatenates them
[1, 2, 3, 4, 2, 3, 4, 5]
>>> np.add(a,b)
array([3, 5, 7, 9])
# Operators
y = x1 + x2: add(x1, x2 [, y])
y = x1 - x2: subtract(x1, x2 [, y])
y = x1 * x2: multiply(x1, x2 [, y])
y = x1 / x2: divide(x1, x2 [, y]); under Python 2 (without from __future__ import division) this is integer division when both arrays hold integers
y = x1 / x2: true_divide(x1, x2 [, y]); always returns the exact quotient (this is what / does in Python 3)
y = x1 // x2: floor_divide(x1, x2 [, y]); always rounds the quotient down
y = -x: negative(x [, y])
y = x1**x2: power(x1, x2 [, y])
y = x1 % x2: remainder(x1, x2 [, y]), mod(x1, x2 [, y])
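A short sketch of the operator and ufunc correspondence under Python 3, including the optional output argument y. The input values and variable names are arbitrary.

```python
import numpy as np

x1 = np.array([7, 8, 9])
x2 = np.array([2, 2, 2])

true_q = x1 / x2      # true division, same values as np.true_divide(x1, x2)
floor_q = x1 // x2    # same values as np.floor_divide(x1, x2)
rem = x1 % x2         # same values as np.remainder(x1, x2)

# The optional third argument receives the result instead of a new array.
out = np.empty(3, dtype=x1.dtype)
np.multiply(x1, x2, out)
print(true_q, floor_q, rem, out)
```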

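2.2.1 Broadcasting
When a ufunc is applied to two arrays, it operates on their corresponding elements, so it requires the two arrays to have the same shape. If the shapes differ, the following broadcasting rules are applied:
1. All input arrays are aligned to the array with the longest shape; shorter shapes are padded with 1s at the front.
2. The shape of the output array is the maximum, along each axis, of the input shapes.
3. An input array can take part in the computation only if each of its axes either has the same length as the corresponding output axis or has length 1; otherwise an error is raised.
4. When an axis of an input array has length 1, the first (and only) set of values along that axis is used for every position along it.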
>>> a = np.arange(0, 60, 10).reshape(-1, 1)
>>> a
array([[ 0], [10], [20], [30], [40], [50]])
>>> a.shape
(6, 1)
>>> b = np.arange(0, 5)
>>> b
array([0, 1, 2, 3, 4])
>>> b.shape
(5,)
>>> c = a + b
>>> c
array([[ 0, 1, 2, 3, 4],
[10, 11, 12, 13, 14],
[20, 21, 22, 23, 24],
[30, 31, 32, 33, 34],
[40, 41, 42, 43, 44],
[50, 51, 52, 53, 54]])
>>> c.shape
(6, 5)
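The four rules can be traced by hand on the same a and b. This is a sketch using np.broadcast and np.broadcast_to; the names b2, a_full, b_full and the failing example are introduced here for illustration.

```python
import numpy as np

a = np.arange(0, 60, 10).reshape(-1, 1)   # shape (6, 1)
b = np.arange(0, 5)                       # shape (5,)

# Rule 1: b's shape is padded on the left with 1, giving (1, 5).
b2 = b.reshape(1, 5)

# Rule 2: the output shape is the per-axis maximum, here (6, 5).
out_shape = np.broadcast(a, b).shape

# Rule 4: length-1 axes are repeated; broadcast_to shows the expanded operands.
a_full = np.broadcast_to(a, out_shape)
b_full = np.broadcast_to(b2, out_shape)
same = np.array_equal(a + b, a_full + b_full)

# Rule 3: axis lengths that neither match nor equal 1 raise ValueError.
try:
    np.arange(3) + np.arange(4)
    failed = False
except ValueError:
    failed = True

print(out_shape, same, failed)
```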

# Matrices
>>> a = np.matrix([[1,2,3],[5,5,6],[7,9,9]])
>>> a**-1 # inverse matrix
matrix([[-0.6 , 0.6 , -0.2 ],
        [-0.2 , -0.8 , 0.6 ],
        [ 0.66666667, 0.33333333, -0.33333333]])
>>> a*a**-1
matrix([[ 1.00000000e+00, 0.00000000e+00, -5.55111512e-17],
        [ 4.44089210e-16, 1.00000000e+00, -1.11022302e-16],
        [ 4.44089210e-16, 0.00000000e+00, 1.00000000e+00]])
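The NumPy documentation no longer recommends np.matrix for new code. Assuming plain ndarrays are acceptable, the same inverse can be sketched with np.linalg.inv and the @ operator; a_inv, identity and ok are illustrative names.

```python
import numpy as np

a = np.array([[1, 2, 3], [5, 5, 6], [7, 9, 9]])

a_inv = np.linalg.inv(a)   # inverse of a regular 2-D array
identity = a @ a_inv       # @ is matrix multiplication for ndarrays

# Floating-point rounding leaves tiny off-diagonal values, so compare with a tolerance.
ok = np.allclose(identity, np.eye(3))
print(a_inv)
print(ok)
```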
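# Writing and reading files (with tofile/fromfile the shape is lost and the data comes back one-dimensional)
tofile writes the data of an array to a file in binary format. The output carries no format information (neither dtype nor shape), so when reading it back with numpy.fromfile you have to supply that information yourself: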
>>> a = np.arange(0,12)
>>> a.shape = 3,4
>>> a
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>> a.tofile("a.bin")
>>> b = np.fromfile("a.bin", dtype=np.float64) # read the data as float64 (the np.float alias was removed in NumPy 1.24)
>>> b # the values read are wrong
array([ 2.12199579e-314, 6.36598737e-314, 1.06099790e-313,
1.48539705e-313, 1.90979621e-313, 2.33419537e-313])
>>> a.dtype # check the dtype of a (int32 here; the default integer dtype depends on the platform)
dtype('int32')
>>> b = np.fromfile("a.bin", dtype=np.int32) # read the data as int32
>>> b # the data is one-dimensional
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
>>> b.shape = 3, 4 # give b the same shape as a
>>> b # now it is correct
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
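>>> a.tofile("d:\\a1.bin", sep='#') # with sep given, the array is written as text, with '#' as the separator
>>> np.save("d:\\a.npy", a) # binary .npy format
>>> c = np.load("d:\\a.npy") # the shape is preserved and no dtype has to be specified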
>>> c
array([[ 0, 1, 2, 3],
       [ 4, 5, 6, 7],
       [ 8, 9, 10, 11]])
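The two round trips can be compared side by side. The sketch below writes into a temporary directory rather than d:\, and fixes the dtype to int32 so that it behaves the same on every platform; both choices are assumptions made for the example.

```python
import os
import tempfile

import numpy as np

a = np.arange(12, dtype=np.int32).reshape(3, 4)

with tempfile.TemporaryDirectory() as tmp:
    raw_path = os.path.join(tmp, "a.bin")
    npy_path = os.path.join(tmp, "a.npy")

    # Raw bytes: dtype and shape must be supplied again when reading.
    a.tofile(raw_path)
    raw = np.fromfile(raw_path, dtype=a.dtype).reshape(a.shape)

    # .npy format: dtype and shape are stored in the file header.
    np.save(npy_path, a)
    loaded = np.load(npy_path)

print(raw.shape, loaded.shape, loaded.dtype)
```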
# np.savez(): store several arrays in one file
>>> a = np.array([[1,2,3],[4,5,6]])
>>> b = np.arange(0, 1.0, 0.1)
>>> c = np.sin(b)
>>> np.savez("result.npz", a, b, sin_array = c)
>>> r = np.load("result.npz")
>>> r["arr_0"] # array a
array([[1, 2, 3],
[4, 5, 6]])
>>> r["arr_1"] # array b
array([ 0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
>>> r["sin_array"] # array c
array([ 0. , 0.09983342, 0.19866933, 0.29552021, 0.38941834,
0.47942554, 0.56464247, 0.64421769, 0.71735609, 0.78332691])
If you open result.npz with an archive tool, you will find three files in it: arr_0.npy, arr_1.npy and sin_array.npy, which hold the contents of arrays a, b and c respectively.
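The names stored in an .npz archive can also be listed from Python through the files attribute of the loaded object. This sketch writes to an in-memory buffer instead of result.npz, which is an assumption made to keep the example free of side effects.

```python
import io

import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.arange(0, 1.0, 0.1)

buf = io.BytesIO()
np.savez(buf, a, b, sin_array=np.sin(b))   # positional arrays become arr_0, arr_1
buf.seek(0)

with np.load(buf) as r:
    names = sorted(r.files)   # names of the arrays stored in the archive
    a_back = r["arr_0"]
    sin_back = r["sin_array"]

print(names)
```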
# Reading and writing text files
numpy.savetxt and numpy.loadtxt can read and write 1-D and 2-D arrays:
>>> a = np.arange(0,12,0.5).reshape(4,-1)
>>> np.savetxt("a.txt", a) # by default the data is saved in '%.18e' format, separated by spaces
>>> np.loadtxt("a.txt")
array([[ 0. , 0.5, 1. , 1.5, 2. , 2.5],
[ 3. , 3.5, 4. , 4.5, 5. , 5.5],
[ 6. , 6.5, 7. , 7.5, 8. , 8.5],
[ 9. , 9.5, 10. , 10.5, 11. , 11.5]])
>>> np.savetxt("a.txt", a, fmt="%d", delimiter=",") # save as integers instead, separated by commas
>>> np.loadtxt("a.txt",delimiter=",") # the comma delimiter must also be given when reading
array([[ 0., 0., 1., 1., 2., 2.],
[ 3., 3., 4., 4., 5., 5.],
[ 6., 6., 7., 7., 8., 8.],
[ 9., 9., 10., 10., 11., 11.]])

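The examples in this section all pass file names, but an already opened file object can be passed as well. For load and save, for example, using a file object makes it possible to store several arrays in a single npy file: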
>>> a = np.arange(8)
>>> b = np.add.accumulate(a)
>>> c = a + b
>>> f = open("result.npy", "wb") # open() replaces the Python 2 file() builtin
>>> np.save(f, a) # save a, b and c into the file object f, in order
>>> np.save(f, b)
>>> np.save(f, c)
>>> f.close()
>>> f = open("result.npy", "rb")
>>> np.load(f) # read the contents back from the file object f, in order
array([0, 1, 2, 3, 4, 5, 6, 7])
>>> np.load(f)
array([ 0, 1, 3, 6, 10, 15, 21, 28])
>>> np.load(f)
array([ 0, 2, 5, 9, 14, 20, 27, 35])
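The same pattern can be written with with-blocks so the file is closed even if an error occurs. This is a sketch that uses a temporary directory instead of the working directory; the list name restored is introduced here.

```python
import os
import tempfile

import numpy as np

a = np.arange(8)
b = np.add.accumulate(a)
c = a + b

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "result.npy")

    # Append three arrays to the same file, one after another.
    with open(path, "wb") as f:
        for arr in (a, b, c):
            np.save(f, arr)

    # Each np.load call reads the next array from the current file position.
    with open(path, "rb") as f:
        restored = [np.load(f) for _ in range(3)]

print(restored)
```

The reader has to know how many arrays were written; when names matter more than order, np.savez is usually the more convenient choice.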