Development QQ group: 941879291
SQLflow is developed in Python and uses Spark as the underlying distributed computing engine. Through a unified set of configuration files, it covers batch processing, stream computing, and REST service development.
Home page:
<div align="center">
<img src="https://upload-images.jianshu...; alt="SQLflow Logo" width="500px"></img>
</div>
Results page:
<div align="center">
<img src="https://upload-images.jianshu...; alt="SQLflow Logo" width="500px"></img>
</div>
SQLflow is developed in Python. By writing SQL you can operate a distributed cluster: data processing, machine learning and deep learning model training, model deployment, distributed crawling, data visualization, and more.
Python 3.6
git clone https://github.com/lqkweb/sql...
pip install -r requirements.txt
python manage.py
Home page: http://127.0.0.1:5000
Script page: http://127.0.0.1:5000/script
Single-SQL page: http://127.0.0.1:5000/sql
[Note: 1. Download Apache Spark and set the SPARK_HOME path in manage.py to its location. 2. Put data.csv in the sqlflow/data directory.]
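The SPARK_HOME setting presumably just points at the unpacked Spark distribution; the exact variable name inside manage.py is not shown in this document, so the following is only a hedged sketch with a made-up path:

```python
import os

# Hypothetical value: adjust to wherever you unpacked Apache Spark.
SPARK_HOME = "/opt/spark"

# Exposing the path via the environment is one common way for Spark's
# launcher scripts to find the installation before the app starts.
os.environ["SPARK_HOME"] = SPARK_HOME
print(os.environ["SPARK_HOME"])
```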
On the script page (http://127.0.0.1:5000/script), enter select * from A limit 3; or select * from A limit 3 as B; to create temporary table A or B.
Create temporary table A:
select * from A limit 3;
Create temporary table B:
select * from A limit 3 as B;
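The as B clause behaves like materialising the query result under a new table name. Spark itself is not needed to see the idea; here is a minimal stand-in using Python's built-in sqlite3, with invented sample data in place of data.csv:

```python
import sqlite3

# Stand-in for the Spark-backed table A (sample rows are invented here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO A VALUES (?, ?)",
                 [(1, "x"), (2, "y"), (3, "z"), (4, "w")])

# 'select * from A limit 3 as B' roughly corresponds to creating a new
# temporary table B from the query result:
conn.execute("CREATE TEMP TABLE B AS SELECT * FROM A LIMIT 3")

rows = conn.execute("SELECT * FROM B").fetchall()
print(rows)
```

B now holds its own copy of the first three rows of A and can be queried independently.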
Open the single-SQL page (http://127.0.0.1:5000/sql); you can then use any Spark SQL syntax on tables A and B:
desc A
select * from A limit 2
select * from B limit 2
[Note] "as B" effectively creates a temporary table named B.
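As a rough illustration of those three queries using Python's built-in sqlite3 rather than Spark (table contents invented; sqlite has no desc, so PRAGMA table_info stands in for it):

```python
import sqlite3

# Invented sample data standing in for the Spark tables A and B.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO A VALUES (?, ?)",
                 [(1, "x"), (2, "y"), (3, "z")])
conn.execute("CREATE TEMP TABLE B AS SELECT * FROM A LIMIT 3")

# The three queries from the page above, run one after another:
schema = conn.execute("PRAGMA table_info(A)").fetchall()  # sqlite's 'desc A'
a_rows = conn.execute("SELECT * FROM A LIMIT 2").fetchall()
b_rows = conn.execute("SELECT * FROM B LIMIT 2").fetchall()
print(schema, a_rows, b_rows)
```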
A simple demo of driving a Spark cluster with SQL. Simple, isn't it?
[Ref] Spark SQL docs: https://spark.apache.org/docs...