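Presto: Getting Started, Installation, and Usage

To analyze massive amounts of data, we needed an open-source distributed computing project. We had mostly used Hive, but every Hive job is ultimately compiled into MapReduce (MR) jobs, and MR reads its input from disk and writes intermediate results back to disk, which is slow. We therefore looked for an open-source project that computes in memory. Presto, open-sourced by Facebook, is an in-memory distributed computing framework.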

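Advantages of Presto

1. Based on standard ANSI SQL, so anyone with SQL experience can pick it up quickly

2. Simple to install and deploy

3. Computes in memory and does not depend on MR, so it is much faster than Hive

4. Decoupled from data sources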

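Installation and usage references:

https://prestodb.io/

http://prestodb-china.com/docs/current/index.html

Installation

Unpack the distribution, then edit the core configuration files:

etc/node.properties: settings specific to each node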

node.environment=production
node.id=datanode4
node.data-dir=/data/presto
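
Each node in the cluster needs its own etc/node.properties: node.environment must be the same on every node, while node.id must be unique per node. The sketch below renders the file for several hosts, reusing the hostname as node.id the way the example above does. The class name NodeProperties and the second hostname datanode5 are hypothetical and only serve the illustration.

```java
public class NodeProperties {
    // Renders the three node.properties lines for one node.
    static String render(String environment, String nodeId, String dataDir) {
        return "node.environment=" + environment + "\n"
             + "node.id=" + nodeId + "\n"
             + "node.data-dir=" + dataDir + "\n";
    }

    public static void main(String[] args) {
        // datanode5 is a hypothetical second host, not taken from the article.
        String[] hosts = {"datanode4", "datanode5"};
        for (String host : hosts) {
            System.out.println("# etc/node.properties on " + host);
            // Same environment everywhere, hostname as the unique node.id.
            System.out.print(render("production", host, "/data/presto"));
            System.out.println();
        }
    }
}
```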

etc/config.properties: server settings

coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=9999
query.max-memory=4GB
query.max-memory-per-node=1GB
discovery-server.enabled=true
discovery.uri=http://datanode4:9999
exchange.http-client.request-timeout=120s
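
A few of these settings have to agree with each other. With the embedded discovery server enabled on the coordinator, discovery.uri should point at the coordinator's own HTTP port, and it is reasonable to expect query.max-memory-per-node not to exceed query.max-memory. The following is a minimal sketch of such a sanity check, not part of Presto itself. The class ConfigCheck and its methods are hypothetical, and the size parser only handles the kB/MB/GB/TB suffixes.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class ConfigCheck {
    // Converts a size such as "4GB" or "512MB" to bytes (1024-based units).
    static long toBytes(String size) {
        String s = size.trim();
        String[] units = {"kB", "MB", "GB", "TB"};
        for (int i = 0; i < units.length; i++) {
            if (s.endsWith(units[i])) {
                double value = Double.parseDouble(s.substring(0, s.length() - 2).trim());
                return (long) (value * Math.pow(1024, i + 1));
            }
        }
        throw new IllegalArgumentException("Unsupported size: " + size);
    }

    // Parses properties-file text into a Properties object.
    static Properties parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    // Returns a list of human-readable problems; an empty list means the checks passed.
    static List<String> check(Properties p) {
        List<String> problems = new ArrayList<>();
        long total = toBytes(p.getProperty("query.max-memory"));
        long perNode = toBytes(p.getProperty("query.max-memory-per-node"));
        if (perNode > total) {
            problems.add("query.max-memory-per-node exceeds query.max-memory");
        }
        boolean coordinator = Boolean.parseBoolean(p.getProperty("coordinator"));
        boolean embeddedDiscovery = Boolean.parseBoolean(p.getProperty("discovery-server.enabled"));
        if (coordinator && embeddedDiscovery) {
            int httpPort = Integer.parseInt(p.getProperty("http-server.http.port"));
            int discoveryPort = URI.create(p.getProperty("discovery.uri")).getPort();
            if (httpPort != discoveryPort) {
                problems.add("discovery.uri port differs from http-server.http.port");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
            "coordinator=true",
            "http-server.http.port=9999",
            "query.max-memory=4GB",
            "query.max-memory-per-node=1GB",
            "discovery-server.enabled=true",
            "discovery.uri=http://datanode4:9999");
        System.out.println(check(parse(sample)));
    }
}
```

Run against the values shown above, the check reports no problems.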

etc/catalog/hive.properties: the Hive connector

connector.name=hive-hadoop2
hive.metastore.uri=thrift://datanode2:9083
hive.allow-drop-table=true
hive.config.resources=/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml

Start the server:

bin/launcher start

Web UI: http://datanode4:9999/
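
Besides opening the web UI, a script can check that the server is answering by requesting the /v1/info REST endpoint on the same port. The sketch below is one way to do that from Java, under the assumption that an HTTP 200 response means the server is up. The class name CoordinatorCheck and the 3-second timeouts are arbitrary choices for illustration.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class CoordinatorCheck {
    // Builds the URL of the server's info endpoint.
    static String infoUrl(String host, int port) {
        return "http://" + host + ":" + port + "/v1/info";
    }

    // Returns true if the endpoint answers with HTTP 200, false on any I/O error.
    static boolean isUp(String host, int port) {
        try {
            HttpURLConnection conn =
                (HttpURLConnection) URI.create(infoUrl(host, port)).toURL().openConnection();
            conn.setConnectTimeout(3000);
            conn.setReadTimeout(3000);
            conn.setRequestMethod("GET");
            int status = conn.getResponseCode();
            conn.disconnect();
            return status == 200;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.out.println("usage: CoordinatorCheck <host> <port>");
            return;
        }
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        System.out.println(infoUrl(host, port) + " up: " + isUp(host, port));
    }
}
```

For the setup above it would be invoked with the arguments datanode4 and 9999.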

Usage

Using the Hive metastore, create a Hive database:

create database if not exists monitor location '/apps/hive/warehouse/monitor';

Create a Hive table:

use monitor;
create external table if not exists monitor.url_monitor_report (
  product STRING,
  url     STRING,
  span    INT,
  ymd     STRING,
  hms     STRING,
  succeed INT
)
partitioned by (p_ymd STRING, p_hour STRING, p_minute STRING)
row format delimited fields terminated by '\t'
location '/apps/hive/warehouse/monitor/url_monitor_report'
;

At this point the corresponding HDFS directory already exists.

Add partitions:

alter table monitor.url_monitor_report add if not exists
partition (p_ymd='2016-06-23',p_hour='00',p_minute='00')  location '/apps/hive/warehouse/monitor/url_monitor_report/2016-06-23/00/00'
......// omitted
;
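
Writing a partition clause by hand for every hour and minute gets tedious, so the statements can be generated. The sketch below emits one alter table statement per partition, following the p_ymd/p_hour/p_minute layout and location pattern shown above. The class PartitionDdl is hypothetical, and the minute granularity is left as a parameter because the article does not say how finely the table is partitioned.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class PartitionDdl {
    static final String TABLE = "monitor.url_monitor_report";
    static final String BASE = "/apps/hive/warehouse/monitor/url_monitor_report";

    // Builds the statement that registers one (day, hour, minute) partition.
    static String addPartition(String ymd, int hour, int minute) {
        String hh = String.format(Locale.ROOT, "%02d", hour);
        String mm = String.format(Locale.ROOT, "%02d", minute);
        return "alter table " + TABLE + " add if not exists partition (p_ymd='" + ymd
             + "',p_hour='" + hh + "',p_minute='" + mm + "') location '"
             + BASE + "/" + ymd + "/" + hh + "/" + mm + "';";
    }

    // Generates statements for a whole day, one partition every stepMinutes minutes.
    static List<String> forDay(String ymd, int stepMinutes) {
        if (stepMinutes <= 0 || stepMinutes > 60) {
            throw new IllegalArgumentException("stepMinutes must be between 1 and 60");
        }
        List<String> statements = new ArrayList<>();
        for (int hour = 0; hour < 24; hour++) {
            for (int minute = 0; minute < 60; minute += stepMinutes) {
                statements.add(addPartition(ymd, hour, minute));
            }
        }
        return statements;
    }

    public static void main(String[] args) {
        // A 5-minute step is an assumption chosen only for this example.
        for (String statement : forDay("2016-06-23", 5)) {
            System.out.println(statement);
        }
    }
}
```

The output can be saved to a file and run through the Hive client. Hive also accepts several partition clauses in a single alter table statement, as in the example above; one statement per partition is used here only to keep the generator simple.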

Data only needs to be written directly into files under the corresponding partition directories.

1. From the command line:
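
As a rough sketch of what such a data file contains: each line holds the six non-partition columns separated by tabs, in the order declared in the table, while the partition values come from the directory rather than from the file. The class RowFormat and the sample values, including the formats used for ymd and hms, are assumptions for illustration. Actually writing the line to HDFS is left out.

```java
public class RowFormat {
    static final String BASE = "/apps/hive/warehouse/monitor/url_monitor_report";

    // Directory that backs the partition (p_ymd, p_hour, p_minute).
    static String partitionDir(String ymd, String hour, String minute) {
        return BASE + "/" + ymd + "/" + hour + "/" + minute;
    }

    // One tab-delimited record: product, url, span, ymd, hms, succeed.
    static String formatRow(String product, String url, int span,
                            String ymd, String hms, int succeed) {
        return String.join("\t", product, url, String.valueOf(span),
                           ymd, hms, String.valueOf(succeed));
    }

    public static void main(String[] args) {
        // All sample values below are hypothetical.
        System.out.println(partitionDir("2016-06-23", "00", "00"));
        System.out.println(formatRow("search", "http://example.com/ping", 120,
                                     "2016-06-23", "00:00:05", 1));
    }
}
```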

/opt/presto/bin/presto --server 172.172.178.72:9999 --catalog hive --schema monitor

(The presto command here is the presto-cli executable jar, renamed to presto and made executable with chmod. For details, see the really-executable-jar-maven-plugin entry in the pom.xml of presto-cli.)

presto:monitor> select * from monitor.url_monitor_report where p_ymd>='2016-06-23' and p_ymd<='2016-06-23';

2. Via JDBC:

<dependency>
	<groupId>com.facebook.presto</groupId>
	<artifactId>presto-jdbc</artifactId>
	<version>0.144.1</version>
</dependency>

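Code:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PrestoJdbcExample {
	public static void main(String[] args) throws SQLException {
		// Filter on the p_ymd partition column so only one day's partitions are scanned.
		String sql = "select distinct(url) from monitor.url_monitor_report where p_ymd>='2016-06-23' and p_ymd<='2016-06-23'";
		// URL format: jdbc:presto://host:port/catalog/schema
		// try-with-resources closes the result set, statement, and connection even if the query fails.
		try (Connection conn = DriverManager.getConnection("jdbc:presto://172.172.178.72:9999/hive/monitor", "hive", "hive");
				Statement stmt = conn.createStatement();
				ResultSet result = stmt.executeQuery(sql)) {
			while (result.next()) {
				System.out.println(result.getString("url"));
			}
		}
	}
}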
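
The same JDBC pattern works for aggregate queries. Since the date range usually comes from outside the program, one option is to build the SQL text in a helper that validates the dates first. The sketch below is an illustration only: the class ReportQuery is hypothetical, and reading span as a duration and succeed as a 0/1 flag is an assumption about the table's semantics.

```java
import java.time.LocalDate;

public class ReportQuery {
    // Builds a per-URL summary query for an inclusive date range.
    // LocalDate.parse rejects anything that is not a yyyy-MM-dd date,
    // so arbitrary text cannot end up inside the SQL string.
    static String perUrlSummary(String from, String to) {
        LocalDate start = LocalDate.parse(from);
        LocalDate end = LocalDate.parse(to);
        if (end.isBefore(start)) {
            throw new IllegalArgumentException("end date is before start date");
        }
        return "select url, count(*) as requests, avg(span) as avg_span, sum(succeed) as succeeded"
             + " from monitor.url_monitor_report"
             + " where p_ymd>='" + start + "' and p_ymd<='" + end + "'"
             + " group by url";
    }

    public static void main(String[] args) {
        System.out.println(perUrlSummary("2016-06-23", "2016-06-23"));
    }
}
```

The returned string can be passed to stmt.executeQuery in place of the fixed query in the example above.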