原文地址python
SQL 是用于访问和处理数据库的标准的计算机语言。 经常使用于数据库管理系统(RDBMS)中, 这类数据库包括 MySQL、SQL Server、Oracle等。mysql
Pandas 是一个开源的,为 Python 提供高性能的,数据结构以及数据分析工具。sql
在熟练地使用 SQL 的同时,为知足一些的业务需求,经常须要咱们将数据提取后,再对数据进行统计分析,那应该如何使用 Pandas 达到和 SQL 同样的效果呢?数据库
下面的速查表将会逐一使用 Pandas 对常见的 SQL 语句进行映射。数据结构
本篇将解构下面的 SQL 查询句式, 使用 Pandas 进行实现工具
SQL 查询句式post
SELECT DISTINCT [字段]
FROM [表] JOIN [bin] ON [链接条件]
WHERE [过滤条件]
GROUP BY [字段]
HAVING [条件]
ORDER BY [字段] DESC
LIMIT [个数] OFFSET [个数]
复制代码
import pandas as pd
import pymysql
复制代码
conn = pymysql.connect(host='127.0.0.1',
user='root',
password='12345678',
db='test_db')
df = pd.read_sql(sql="select * from student", con=conn)
复制代码
数据预览性能
df
复制代码
id | name | age | sex | city | money | |
---|---|---|---|---|---|---|
0 | 1 | 赵雷 | 1990-01-01 | 男 | 北京 | 20.0 |
1 | 2 | 钱电 | 1990-12-21 | 男 | 天津 | 30.0 |
2 | 3 | 孙风 | 1990-12-20 | 男 | 成都 | 2.0 |
3 | 4 | 李云 | 1990-12-06 | 男 | 北京 | 100.0 |
4 | 5 | 周梅 | 1991-12-01 | 女 | 成都 | 50.0 |
5 | 6 | 吴兰 | 1992-01-01 | 女 | 北京 | 3.0 |
6 | 7 | 郑竹 | 1989-01-01 | 女 | 成都 | 200.0 |
7 | 8 | 张三 | 2017-12-20 | 女 | 天津 | 20.0 |
8 | 9 | 李四 | 2017-12-25 | 女 | 西安 | 35.0 |
9 | 10 | 李四 | 2012-06-06 | 女 | 北京 | 40.0 |
10 | 11 | 赵六 | 2013-06-13 | 女 | 成都 | 5.0 |
11 | 12 | 孙七 | 2014-06-01 | 女 | 天津 | 210.0 |
SQL测试
SELECT * FROM student
SELECT id, name, sex FROM student
复制代码
Pandasspa
df
df[['id','name','sex']]
复制代码
SQL
SELECT DISTINCT name FROM student
复制代码
Pandas
df['name'].unique()
复制代码
SQL
SELECT COUNT(*) FROM student
SELECT SUM(money) FROM student
SELECT id, MAX(money) FROM student
SELECT id, MIN(money) FROM student
SELECT AVG(money) FROM student
复制代码
Pandas
df['id'].count()
df['money'].sum()
df[df['money'] == df['money'].max()]
df[df['money'] == df['money'].min()]
df['money'].mean()
复制代码
描述性统计数据
In [1]: df['money'].describe()
Out[1]: count 12.000000
mean 59.583333
std 72.963825
min 2.000000
25% 16.250000
50% 32.500000
75% 62.500000
max 210.000000
Name: money, dtype: float64
复制代码
例子: =
SQL
SELECT * FROM student WHERE sex = '男'
复制代码
Pandas
df[df['sex'] == ('男')]
复制代码
例子: in & not in
SQL
SELECT * FROM student WHERE id IN (2,4,6,8,10)
SELECT * FROM student WHERE id NOT IN (2,4,6,8,10)
复制代码
Pandas
df[df['id'].isin((2,4,6,8))]
df[~df['id'].isin((2,4,6,8))]
复制代码
多个条件
SQL
SELECT * FROM student WHERE sex = '男' and id IN (2,4,6,8,10)
复制代码
Pandas
df[(df['sex'] == ('男')) & (df['id'].isin((2,4,6,8)))]
复制代码
SQL
SELECT * FROM student ORDER BY id DESC LIMIT 3
SELECT * FROM student ORDER BY id DESC LIMIT 3 OFFSET 2
复制代码
Pandas
df.sort_values('id',ascending=False).head(3)
df.nlargest(2 + 3, 'id').tail(3)
复制代码
SQL
SELECT * FROM student WHERE sex = '男' LIMIT 3
SELECT id, name, sex FROM student WHERE sex ='男' LIMIT 3
复制代码
Pandas
df[df['sex'] == ('男')].head(3)
df[df['sex'] == ('男')][['id','name','sex']].head(3)
复制代码
SQL
SELECT * FROM student ORDER BY age
SELECT * FROM student ORDER BY age DESC
复制代码
Pandas
df.sort_values('age')
df.sort_values('age', ascending=False)
复制代码
GROYP BY & COUNT
SQL
SELECT city, COUNT(*) FROM student GROUP BY city
复制代码
Pandas
df.groupby(['city']).size().to_frame('size').reset_index()
复制代码
GROYP BY & SUM
SQL
SELECT city, SUM(money) FROM student GROUP BY city
复制代码
Pandas
df.groupby(['city'])['money'].agg('sum').reset_index()
复制代码
GROUP BY 单字段
SQL
SELECT city, COUNT(*) FROM student GROUP BY sex ORDER BY city
复制代码
Pandas
df.groupby(['city']).size().to_frame('size').reset_index().sort_values('city')
复制代码
GROUP BY 多字段
SQL
SELECT city, sex, COUNT(*) FROM student GROUP BY city, sex ORDER BY city
复制代码
Pandas
df.groupby(['city','sex']).size().to_frame('size').reset_index().sort_values('city')
复制代码
SQL
SELECT city, COUNT(*) FROM student GROUP BY city HAVING count(*) > 3
复制代码
Pandas
df.groupby('city').filter(lambda x:len(x)>3).groupby('city').size().to_frame('size').reset_index()
复制代码
系列文章