Selecting a column by its name
import pandas as pd
import numpy as np

dates = pd.date_range("2017-01-08", periods=6)
data = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=["A", "B", "C", "D"])
print("data:")
print(data)
# Select column A
print("Column A:")
print(data["A"])

Output (the dtype shown depends on the platform and may be int64 instead of int32):

data:
             A   B   C   D
2017-01-08   0   1   2   3
2017-01-09   4   5   6   7
2017-01-10   8   9  10  11
2017-01-11  12  13  14  15
2017-01-12  16  17  18  19
2017-01-13  20  21  22  23
Column A:
2017-01-08     0
2017-01-09     4
2017-01-10     8
2017-01-11    12
2017-01-12    16
2017-01-13    20
Freq: D, Name: A, dtype: int32
The same column can also be selected with dot notation:
print(data.A)
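This is equivalent to data["A"]. Dot notation only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute or method.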
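To make the limits of dot notation concrete, here is a minimal sketch. The DataFrame `df` and its column names are hypothetical and not part of the example above: one column is named `count`, which collides with the `DataFrame.count` method, and another has a space in its name. In both cases only bracket notation reaches the column.

```python
import pandas as pd

# Hypothetical frame used only for illustration
df = pd.DataFrame({"A": [1, 2], "count": [3, 4], "unit price": [5.0, 6.0]})

# Attribute access works for a plain identifier such as "A"
same = df.A.equals(df["A"])

# "count" is also a DataFrame method, so df.count is the method, not the column
count_attr_is_method = callable(df.count)
count_column = df["count"]

# A name containing a space cannot be written as an attribute at all
unit_price = df["unit price"]

print(same)                    # True
print(count_attr_is_method)    # True
print(count_column.tolist())   # [3, 4]
```

Bracket notation is therefore the safer default when column names come from external data.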
Rows can be selected by slicing the DataFrame with integer positions:

import pandas as pd
import numpy as np

dates = pd.date_range("2017-01-08", periods=6)
data = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=["A", "B", "C", "D"])
print("data:")
print(data)
# The slice 0:3 covers positions 0, 1 and 2; position 3 is excluded
print("Rows at positions 0 to 2:")
print(data[0:3])

Output:

data:
             A   B   C   D
2017-01-08   0   1   2   3
2017-01-09   4   5   6   7
2017-01-10   8   9  10  11
2017-01-11  12  13  14  15
2017-01-12  16  17  18  19
2017-01-13  20  21  22  23
Rows at positions 0 to 2:
            A  B   C   D
2017-01-08  0  1   2   3
2017-01-09  4  5   6   7
2017-01-10  8  9  10  11
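Rows can also be selected by a range of index labels.
For example, the following selects the rows from 2017-01-10 through 2017-01-12 (with labels, both endpoints are included):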
import pandas as pd
import numpy as np

dates = pd.date_range("2017-01-08", periods=6)
data = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=["A", "B", "C", "D"])
print("data:")
print(data)
print("Rows selected by index label:")
print(data["2017-01-10":"2017-01-12"])

Output:

data:
             A   B   C   D
2017-01-08   0   1   2   3
2017-01-09   4   5   6   7
2017-01-10   8   9  10  11
2017-01-11  12  13  14  15
2017-01-12  16  17  18  19
2017-01-13  20  21  22  23
Rows selected by index label:
             A   B   C   D
2017-01-10   8   9  10  11
2017-01-11  12  13  14  15
2017-01-12  16  17  18  19
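The two slicing styles treat the end of the range differently, which is easy to overlook. A small sketch on the same `data` frame: the positional slice stops before its end position, while the label slice includes its end label.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2017-01-08", periods=6)
data = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=["A", "B", "C", "D"])

by_position = data[0:3]                      # positions 0, 1, 2; position 3 is excluded
by_label = data["2017-01-10":"2017-01-12"]   # both end dates are included

print(by_position.index[-1])   # 2017-01-10 00:00:00
print(by_label.index[-1])      # 2017-01-12 00:00:00
```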
Using loc to select a range of rows by label:
import pandas as pd
import numpy as np

dates = pd.date_range("2017-01-08", periods=6)
data = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=["A", "B", "C", "D"])
print("data:")
print(data)
print("Rows selected by index label:")
print(data.loc["2017-01-10":"2017-01-12"])

Output:

data:
             A   B   C   D
2017-01-08   0   1   2   3
2017-01-09   4   5   6   7
2017-01-10   8   9  10  11
2017-01-11  12  13  14  15
2017-01-12  16  17  18  19
2017-01-13  20  21  22  23
Rows selected by index label:
             A   B   C   D
2017-01-10   8   9  10  11
2017-01-11  12  13  14  15
2017-01-12  16  17  18  19
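The shape of what loc returns depends on how the row is requested. As a brief sketch on the same `data` frame (the use of `pd.Timestamp` for the label is a choice made here, not something from the example above): a single label gives a Series, while a one-element list gives a one-row DataFrame.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2017-01-08", periods=6)
data = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=["A", "B", "C", "D"])

label = pd.Timestamp("2017-01-10")

row = data.loc[label]        # single label: a Series indexed by column name
frame = data.loc[[label]]    # list with one label: a DataFrame with one row

print(type(row).__name__, row.tolist())   # Series [8, 9, 10, 11]
print(type(frame).__name__, frame.shape)  # DataFrame (1, 4)
```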
loc can also select by column. For example, to select columns B and C:
import pandas as pd
import numpy as np

dates = pd.date_range("2017-01-08", periods=6)
data = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=["A", "B", "C", "D"])
print("data:")
print(data)
# ":" keeps every row; the list names the columns to keep
print("Two selected columns:")
print(data.loc[:, ["B", "C"]])

Output:

data:
             A   B   C   D
2017-01-08   0   1   2   3
2017-01-09   4   5   6   7
2017-01-10   8   9  10  11
2017-01-11  12  13  14  15
2017-01-12  16  17  18  19
2017-01-13  20  21  22  23
Two selected columns:
             B   C
2017-01-08   1   2
2017-01-09   5   6
2017-01-10   9  10
2017-01-11  13  14
2017-01-12  17  18
2017-01-13  21  22
To select only certain columns from certain rows, a small change to the previous example is enough:
import pandas as pd
import numpy as np

dates = pd.date_range("2017-01-08", periods=6)
data = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=["A", "B", "C", "D"])
print("data:")
print(data)
print("Selected rows and columns:")
print(data.loc["2017-01-09":"2017-01-12", ["B", "C"]])

Output:

data:
             A   B   C   D
2017-01-08   0   1   2   3
2017-01-09   4   5   6   7
2017-01-10   8   9  10  11
2017-01-11  12  13  14  15
2017-01-12  16  17  18  19
2017-01-13  20  21  22  23
Selected rows and columns:
             B   C
2017-01-09   5   6
2017-01-10   9  10
2017-01-11  13  14
2017-01-12  17  18
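Selection by integer position is done with iloc. Positions are zero-based, so the following selects the value at row position 3 (the fourth row) and column position 1 (the second column):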
import pandas as pd
import numpy as np

dates = pd.date_range("2017-01-08", periods=6)
data = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=["A", "B", "C", "D"])
print("data:")
print(data)
print("Row position 3, column position 1:")
print(data.iloc[3, 1])

Output:

data:
             A   B   C   D
2017-01-08   0   1   2   3
2017-01-09   4   5   6   7
2017-01-10   8   9  10  11
2017-01-11  12  13  14  15
2017-01-12  16  17  18  19
2017-01-13  20  21  22  23
Row position 3, column position 1:
13
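The same cell can be reached in several ways. The sketch below compares iloc with the scalar accessors `iat` and `at` and with loc; `iat` and `at` are not used elsewhere in this article and are shown only for comparison.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2017-01-08", periods=6)
data = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=["A", "B", "C", "D"])

label = pd.Timestamp("2017-01-11")   # the index label at row position 3

by_iloc = data.iloc[3, 1]     # position-based
by_iat = data.iat[3, 1]       # position-based, scalar only
by_loc = data.loc[label, "B"] # label-based
by_at = data.at[label, "B"]   # label-based, scalar only

print(by_iloc, by_iat, by_loc, by_at)   # 13 13 13 13
```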
Slices also work in iloc. For example, to select the column at position 1 for every row from position 3 onward:
import pandas as pd
import numpy as np

dates = pd.date_range("2017-01-08", periods=6)
data = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=["A", "B", "C", "D"])
print("data:")
print(data)
print("Column position 1, from row position 3 onward:")
print(data.iloc[3:, 1])

Output:

data:
             A   B   C   D
2017-01-08   0   1   2   3
2017-01-09   4   5   6   7
2017-01-10   8   9  10  11
2017-01-11  12  13  14  15
2017-01-12  16  17  18  19
2017-01-13  20  21  22  23
Column position 1, from row position 3 onward:
2017-01-11    13
2017-01-12    17
2017-01-13    21
Freq: D, Name: B, dtype: int32
Individual, non-adjacent rows can also be selected by passing a list of positions, for example:
import pandas as pd
import numpy as np

dates = pd.date_range("2017-01-08", periods=6)
data = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=["A", "B", "C", "D"])
print("data:")
print(data)
# The column slice 1:3 covers positions 1 and 2; position 3 is excluded
print("Row positions 1, 3, 5 and column positions 1 to 2:")
print(data.iloc[[1, 3, 5], 1:3])

Output:

data:
             A   B   C   D
2017-01-08   0   1   2   3
2017-01-09   4   5   6   7
2017-01-10   8   9  10  11
2017-01-11  12  13  14  15
2017-01-12  16  17  18  19
2017-01-13  20  21  22  23
Row positions 1, 3, 5 and column positions 1 to 2:
             B   C
2017-01-09   5   6
2017-01-11  13  14
2017-01-13  21  22
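iloc follows ordinary Python slicing rules, which the sketch below illustrates on the same `data` frame: the end of a slice is excluded, negative positions count from the end, and a step can be given.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2017-01-08", periods=6)
data = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=["A", "B", "C", "D"])

subset = data.iloc[[1, 3, 5], 1:3]   # column positions 1 and 2 only
last_cell = data.iloc[-1, -1]        # last row, last column
every_other = data.iloc[::2, 0]      # rows 0, 2, 4 of the first column

print(subset.columns.tolist())   # ['B', 'C']
print(last_cell)                 # 23
print(every_other.tolist())      # [0, 8, 16]
```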
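The two styles can also be mixed, for example filtering rows by integer position and columns by label. Older pandas versions provided ix for this, but it was deprecated in pandas 0.20.0 and removed in 1.0. With current versions, convert the row positions to index labels and use loc:

import pandas as pd
import numpy as np

dates = pd.date_range("2017-01-08", periods=6)
data = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=["A", "B", "C", "D"])
print("data:")
print(data)
# data.index[[1, 3, 5]] turns the row positions into their index labels
print("Row positions 1, 3, 5 and columns A and C:")
print(data.loc[data.index[[1, 3, 5]], ["A", "C"]])

Output:

data:
             A   B   C   D
2017-01-08   0   1   2   3
2017-01-09   4   5   6   7
2017-01-10   8   9  10  11
2017-01-11  12  13  14  15
2017-01-12  16  17  18  19
2017-01-13  20  21  22  23
Row positions 1, 3, 5 and columns A and C:
             A   C
2017-01-09   4   6
2017-01-11  12  14
2017-01-13  20  22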
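Because ix is not available in pandas 1.0 and later, mixed position and label selection has to go through loc or iloc. The sketch below shows two ways to express it on the same `data` frame; the variable names `rows` and `cols` are introduced here for illustration. Either the row positions are converted to labels for loc, or the column labels are converted to positions for iloc.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2017-01-08", periods=6)
data = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=["A", "B", "C", "D"])

rows = [1, 3, 5]      # row positions
cols = ["A", "C"]     # column labels

# Option 1: positions -> labels, then label-based selection
via_loc = data.loc[data.index[rows], cols]

# Option 2: labels -> positions, then position-based selection
via_iloc = data.iloc[rows, data.columns.get_indexer(cols)]

print(via_loc.equals(via_iloc))   # True
```

The iloc variant also behaves sensibly when the index contains duplicate labels, since it never looks rows up by label.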
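Rows can also be filtered by a condition, similar to a SQL clause of the form WHERE column < xxx.
For example, to select the rows where column A is less than 8: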
import pandas as pd
import numpy as np

dates = pd.date_range("2017-01-08", periods=6)
data = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=["A", "B", "C", "D"])
print("data:")
print(data)
print("Rows filtered by the values in one column:")
print(data[data.A < 8])

Output:

data:
             A   B   C   D
2017-01-08   0   1   2   3
2017-01-09   4   5   6   7
2017-01-10   8   9  10  11
2017-01-11  12  13  14  15
2017-01-12  16  17  18  19
2017-01-13  20  21  22  23
Rows filtered by the values in one column:
            A  B  C  D
2017-01-08  0  1  2  3
2017-01-09  4  5  6  7
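It may help to look at the intermediate value that makes this filter work. In the sketch below, the comparison produces a boolean Series (named `mask` here for illustration) with one entry per row, and indexing with it keeps only the rows marked True.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2017-01-08", periods=6)
data = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=["A", "B", "C", "D"])

mask = data["A"] < 8      # boolean Series aligned with the rows of data
filtered = data[mask]     # keeps the rows where mask is True

print(mask.tolist())      # [True, True, False, False, False, False]
print(len(filtered))      # 2
```

The same mask can be passed to loc, as in `data.loc[mask]`, which gives the same rows.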
To combine conditions, for example WHERE A < 8 AND B < 5, apply the filters one after the other:
import pandas as pd
import numpy as np

dates = pd.date_range("2017-01-08", periods=6)
data = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=["A", "B", "C", "D"])
print("data:")
print(data)
print("Rows filtered by the values in two columns:")
# Apply the first condition, then the second condition to the result
filtered = data[data.A < 8]
print(filtered[filtered.B < 5])

Output:

data:
             A   B   C   D
2017-01-08   0   1   2   3
2017-01-09   4   5   6   7
2017-01-10   8   9  10  11
2017-01-11  12  13  14  15
2017-01-12  16  17  18  19
2017-01-13  20  21  22  23
Rows filtered by the values in two columns:
            A  B  C  D
2017-01-08  0  1  2  3
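Two conditions can also be combined in a single step. The sketch below shows two alternatives to the two-step filter: joining boolean masks with `&`, and `DataFrame.query`, which takes the condition as a string. Each comparison needs its own parentheses because `&` binds more tightly than `<`.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2017-01-08", periods=6)
data = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=["A", "B", "C", "D"])

# Element-wise AND of two boolean masks
via_mask = data[(data["A"] < 8) & (data["B"] < 5)]

# The same condition written as a query string
via_query = data.query("A < 8 and B < 5")

print(via_mask.equals(via_query))   # True
print(via_mask["A"].tolist())       # [0]
```

For an OR condition, `|` takes the place of `&` in the mask version and `or` takes the place of `and` in the query string.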