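To analyze very large volumes of data, we needed an open-source distributed computing project. We had mostly been using Hive, but Hive jobs are ultimately compiled into MapReduce (MR) jobs, and MR reads its input from disk and writes intermediate results back to disk, which is slow. We therefore looked for an open-source project that computes in memory. Presto is a distributed, in-memory computing framework open-sourced by Facebook.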
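Advantages of Presto
1. Based on standard ANSI SQL, so anyone with a SQL background can get started quickly
2. Simple to install and deploy
3. Computes in memory and does not depend on MR, so it is much faster than Hive
4. Data sources are decoupled from the engine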
Installation and usage references:
https://prestodb.io/
http://prestodb-china.com/docs/current/index.html
Installation
Unpack the archive and edit the core configuration files:
etc/node.properties holds the settings specific to each node
node.environment=production
node.id=datanode4
node.data-dir=/data/presto
etc/config.properties holds the server settings
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=9999
query.max-memory=4GB
query.max-memory-per-node=1GB
discovery-server.enabled=true
discovery.uri=http://datanode4:9999
exchange.http-client.request-timeout=120s
etc/catalog/hive.properties configures the Hive connector
connector.name=hive-hadoop2
hive.metastore.uri=thrift://datanode2:9083
hive.allow-drop-table=true
hive.config.resources=/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
Start the server:
bin/launcher start
Web UI: http://datanode4:9999/
Usage
Presto uses Hive's metadata, so first create the Hive database:
create database if not exists monitor location '/apps/hive/warehouse/monitor';
Create the Hive table:
use monitor;
create external table if not exists monitor.url_monitor_report (
  product STRING,
  url STRING,
  span INT,
  ymd STRING,
  hms STRING,
  succeed INT
)
Partitioned by (p_ymd STRING, p_hour STRING, p_minute STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
Location '/apps/hive/warehouse/monitor/url_monitor_report';
At this point the corresponding HDFS directory already exists.
Add the partitions:
alter table monitor.url_monitor_report add if not exists
partition (p_ymd='2016-06-23',p_hour='00',p_minute='00') location '/apps/hive/warehouse/monitor/url_monitor_report/2016-06-23/00/00'
...... // remaining partitions omitted
;
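Since the partition list is elided above, here is a minimal sketch of how those statements could be generated rather than typed by hand. It emits one ALTER TABLE statement per partition. The class name PartitionDdl, its method names, and the stepMinutes parameter are hypothetical; the source does not say how fine-grained the minute partitions are, so the one-minute step used in main is an assumption.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: builds ADD PARTITION statements for url_monitor_report. */
class PartitionDdl {
    static final String TABLE = "monitor.url_monitor_report";
    static final String BASE = "/apps/hive/warehouse/monitor/url_monitor_report";

    /** One statement for a single (day, hour, minute) partition. */
    static String addPartition(LocalDate day, int hour, int minute) {
        String hh = String.format("%02d", hour);
        String mm = String.format("%02d", minute);
        return String.format(
            "alter table %s add if not exists partition "
                + "(p_ymd='%s',p_hour='%s',p_minute='%s') location '%s/%s/%s/%s'",
            TABLE, day, hh, mm, BASE, day, hh, mm);
    }

    /** All statements for one day; stepMinutes is an assumed partition granularity. */
    static List<String> forDay(LocalDate day, int stepMinutes) {
        if (stepMinutes <= 0) {
            throw new IllegalArgumentException("stepMinutes must be positive");
        }
        List<String> ddl = new ArrayList<>();
        for (int hour = 0; hour < 24; hour++) {
            for (int minute = 0; minute < 60; minute += stepMinutes) {
                ddl.add(addPartition(day, hour, minute));
            }
        }
        return ddl;
    }

    public static void main(String[] args) {
        // Print the statements so they can be piped into the Hive CLI.
        for (String stmt : forDay(LocalDate.parse("2016-06-23"), 1)) {
            System.out.println(stmt + ";");
        }
    }
}
```

The output can be saved to a file and run through Hive. Because every statement uses "if not exists", re-running it for the same day should be harmless.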
Data can then be written directly into files under the corresponding partition directory.
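As a rough illustration of what such a file contains, the sketch below formats one record as a tab-delimited line and computes the partition directory it belongs in. The class name ReportRow, its methods, and the sample values in main (including the hms format) are hypothetical and not taken from the original setup. Actually uploading the file to HDFS is left out.

```java
/** Hypothetical helper mirroring the url_monitor_report layout. */
class ReportRow {
    static final String BASE = "/apps/hive/warehouse/monitor/url_monitor_report";

    /** Joins the six table columns with tabs, matching FIELDS TERMINATED BY '\t'. */
    static String toLine(String product, String url, int span,
                         String ymd, String hms, int succeed) {
        return String.join("\t", product, url, Integer.toString(span),
                ymd, hms, Integer.toString(succeed));
    }

    /** Directory that the ALTER TABLE statement registered for a partition. */
    static String partitionDir(String ymd, String hour, String minute) {
        return BASE + "/" + ymd + "/" + hour + "/" + minute;
    }

    public static void main(String[] args) {
        // Sample values are made up for illustration only.
        System.out.println(partitionDir("2016-06-23", "00", "00"));
        System.out.println(toLine("shop", "http://example.com/ping", 35,
                "2016-06-23", "00:00:12", 1));
    }
}
```

The partition columns (p_ymd, p_hour, p_minute) are not written into the file; Hive takes them from the partition metadata. A field value that itself contains a tab would break this format, so such values would need to be cleaned first.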
1. Command-line usage:
/opt/presto/bin/presto --server 172.172.178.72:9999 --catalog hive --schema monitor
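(The presto command is the presto-cli executable jar, presto-cli-*-executable.jar, renamed to presto and made executable with chmod +x. For details, see the really-executable-jar-maven-plugin plugin in presto-cli's pom.xml.)
presto:monitor> select * from monitor.url_monitor_report where p_ymd>='2016-06-23' and p_ymd<='2016-06-23';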
2. Using JDBC:
<dependency>
  <groupId>com.facebook.presto</groupId>
  <artifactId>presto-jdbc</artifactId>
  <version>0.144.1</version>
</dependency>
Code:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PrestoJdbcExample {
    public static void main(String[] args) throws SQLException {
        String sql = "select distinct(url) from monitor.url_monitor_report"
                + " where p_ymd>='2016-06-23' and p_ymd<='2016-06-23'";
        // URL format: jdbc:presto://host:port/catalog/schema
        // try-with-resources closes the result set, statement and connection even on error
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:presto://172.172.178.72:9999/hive/monitor", "hive", "hive");
             Statement stmt = conn.createStatement();
             ResultSet result = stmt.executeQuery(sql)) {
            while (result.next()) {
                System.out.println(result.getString("url"));
            }
        }
    }
}