1. 简介php
Hive是基于Hadoop的一个数据仓库工具,能够将结构化的数据文件映射为一张数据库表,并提供完整的sql查询功能,能够将sql语句转换为MapReduce任务进行运行。 其优势是学习成本低,能够经过类SQL语句快速实现简单的MapReduce统计,没必要开发专门的MapReduce应用,十分适合数据仓库的统计分析。html
Hive与HBase的整合功能的实现是利用二者自己对外的API接口互相进行通讯,相互通讯主要是依靠hive_hbase-handler.jar工具类, 大体意思如图所示:java
2. Hive项目介绍node
Hive配置文件介绍
•hive-site.xml hive的配置文件
•hive-env.sh hive的运行环境文件
•hive-default.xml.template 默认模板
•hive-env.sh.template hive-env.sh默认配置
•hive-exec-log4j.properties.template exec默认配置
• hive-log4j.properties.template log默认配置
hive-site.xml
< property>
<name>javax.jdo.option.ConnectionURL</name> <value>jdbc:MySQL://localhost:3306/hive?createData baseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<description>username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>test</value>
<description>password to use against metastore database</description>
</property>
mysql
hive-env.sh
•配置Hive的配置文件路径
•export HIVE_CONF_DIR= your path
•配置Hadoop的安装路径
•HADOOP_HOME=your hadoop homelinux
咱们按数据元的存储方式不一样安装。sql
3. 使用Derby数据库安装shell
Hadoop集群配置:http://blog.csdn.net/hguisu/article/details/723739数据库
hbase安装配置:http://blog.csdn.net/hguisu/article/details/7244413apache
hive目前最新的版本是0.12,咱们先从http://mirror.bit.edu.cn/apache/hive/hive-0.12.0/ 上下载hive-0.12.0.tar.gz,可是请注意,此版本基因而基于hadoop1.3和hbase0.94的(若是安装hadoop2.X ,咱们须要修改相应的内容)
tar zxvf hive-0.12.0.tar.gz
cd hive-0.12.0
拷贝protobuf.**.jar和zookeeper-3.4.5.jar到hive/lib下。
<?xml version="1.0"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <configuration> <!-- Hive Execution Parameters --> <property> <name>hive.exec.reducers.bytes.per.reducer</name> <value>1000000000</value> <description>size per reducer.The default is 1G, i.e if the input size is 10G, it will use 10 reducers.</description> </property> <property> <name>hive.exec.reducers.max</name> <value>999</value> <description>max number of reducers will be used. If the one specified in the configuration parameter mapred.reduce.tasks is negative, hive will use this one as the max number of reducers when automatically determine number of reducers.</description> </property> <property> <name>hive.exec.scratchdir</name> <value>/hive/scratchdir</value> <description>Scratch space for Hive jobs</description> </property> <property> <name>hive.exec.local.scratchdir</name> <value>/tmp/${user.name}</value> <description>Local scratch space for Hive jobs</description> </property> <property> <name>javax.jdo.option.ConnectionURL</name> <value>jdbc:derby:;databaseName=metastore_db;create=true</value> <description>JDBC connect string for a JDBC metastore</description> </property> <property> <name>javax.jdo.option.ConnectionDriverName</name> <value>org.apache.derby.jdbc.EmbeddedDriver</value> <description>Driver class name for a JDBC metastore</description> </property> <property> <name>javax.jdo.PersistenceManagerFactoryClass</name> <value>org.datanucleus.api.jdo.JDOPersistenceManagerFactory</value> <description>class implementing the jdo persistence</description> </property> <property> <name>javax.jdo.option.DetachAllOnCommit</name> <value>true</value> <description>detaches all objects from session so that they can be used after transaction is committed</description> </property> <property> <name>javax.jdo.option.ConnectionUserName</name> <value>APP</value> <description>username to use against metastore database</description> </property> <property> <name>javax.jdo.option.ConnectionPassword</name> <value>mine</value> <description>password to use against metastore database</description> </property> <property> <name>hive.metastore.warehouse.dir</name> <value>/hive/warehousedir</value> <description>location of default database for the warehouse</description> </property> <property> <name>hive.aux.jars.path</name> <value> file:///home/hadoop/hive-0.12.0/lib/hive-ant-0.13.0-SNAPSHOT.jar, file:///home/hadoop/hive-0.12.0/lib/protobuf-java-2.4.1.jar, file:///home/hadoop/hive-0.12.0/lib/hbase-client-0.96.0-hadoop2.jar, file:///home/hadoop/hive-0.12.0/lib/hbase-common-0.96.0-hadoop2.jar, file:///home/hadoop/hive-0.12.0/lib/zookeeper-3.4.5.jar, file:///home/hadoop/hive-0.12.0/lib/guava-11.0.2.jar </value> </property>
1).单节点启动
#bin/hive -hiveconf hbase.master=master:490001
2) 集群启动:
#bin/hive -hiveconf hbase.zookeeper.quorum=node1,node2,node3
如何hive-site.xml文件中没有配置hive.aux.jars.path,则能够按照以下方式启动。
bin/hive --auxpath /usr/local/hive/lib/hive-hbase-handler-
0.96
.
0
.jar, /usr/local/hive/lib/hbase-
0.96
.jar, /usr/local/hive/lib/zookeeper-
3.3
.
2
.jar -hiveconf hbase.zookeeper.quorum=node1,node2,node3
注:使用derby存储方式时,运行hive会在当前目录生成一个derby文件和一个metastore_db目录。这种存储方式的弊端是在同一个目录下同时只能有一个hive客户端能使用数据库,不然报错。
4. 使用MYSQL数据库的方式安装
安装MySQL
• Ubuntu 采用apt-get安装
• sudo apt-get install mysql-server
• 创建数据库hive
• create database hivemeta
• 建立hive用户,并受权
• grant all on hive.* to hive@'%' identified by 'hive';
• flush privileges;
咱们直接修改hive-site.xml就能够啦。
<?xml version="1.0"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <configuration> <property> <name>hive.metastore.warehouse.dir</name> <value>/hive/warehousedir</value> </property> <property> <name>hive.metastore.local</name> <value>false</value> </property> <property> <name>hive.metastore.uris</name> <value>thrift://192.168.1.214:9083</value> </property> </configuration>
<?xml version="1.0"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <configuration> <property> <name>hive.exec.scratchdir</name> <value>/hive/scratchdir</value> <description>Scratch space for Hive jobs</description> </property> <property> <name>hive.exec.local.scratchdir</name> <value>/tmp/${user.name}</value> <description>Local scratch space for Hive jobs</description> </property> <property> <name>javax.jdo.option.ConnectionURL</name> <value>jdbc:mysql://192.168.1.214:3306/hiveMeta?createDatabaseIfNotExist=true</value> <description>JDBC connect string for a JDBC metastore</description> </property> <property> <name>javax.jdo.option.ConnectionDriverName</name> <value>com.mysql.jdbc.Driver</value> <description>Driver class name for a JDBC metastore</description> </property> <property> <name>javax.jdo.option.ConnectionUserName</name> <value>hive</value> <description>username to use against metastore database</description> </property> <property> <name>javax.jdo.option.ConnectionPassword</name> <value>hive</value> <description>password to use against metastore database</description> </property> <property> <name>hive.metastore.warehouse.dir</name> <value>/hive/warehousedir</value> <description>location of default database for the warehouse</description> </property> <property> <name>hive.aux.jars.path</name> <value> file:///home/hadoop/hive-0.12.0/lib/hive-ant-0.13.0-SNAPSHOT.jar, file:///home/hadoop/hive-0.12.0/lib/protobuf-java-2.4.1.jar, file:///home/hadoop/hive-0.12.0/lib/hbase-client-0.96.0-hadoop2.jar, file:///home/hadoop/hive-0.12.0/lib/hbase-common-0.96.0-hadoop2.jar, file:///home/hadoop/hive-0.12.0/lib/zookeeper-3.4.5.jar, file:///home/hadoop/hive-0.12.0/lib/guava-11.0.2.jar </value> <property> <name>hive.metastore.uris</name> <value>thrift://192.168.1.214:9083</value> </property> </property>
4. 与Hbase整合
以前咱们测试建立表的都是建立本地表,非hbase对应表。如今咱们整合回到hbase。
CREATE TABLE hbase_table_1(key int, value string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val") TBLPROPERTIES ("hbase.table.name" = "xyz");
hbase.table.name 定义在hbase的table名称
hbase.columns.mapping 定义在hbase的列族
在hbase 下也能看到,两边新增数据都能实时看到。
能够登陆Hbase去查看数据了
#bin/hbase shell
hbase(main):001:0> describe 'xyz'
hbase(main):002:0> scan 'xyz'
hbase(main):003:0> put 'xyz','100','cf1:val','www.360buy.com'
这时在Hive中能够看到刚才在Hbase中插入的数据了。
使用sql导入hbase_table_1:
hive> INSERT OVERWRITE TABLE hbase_table_1 SELECT * FROM pokes WHERE foo=86;
使用CREATE EXTERNAL TABLE:
CREATE EXTERNAL TABLE hbase_table_2(key int, value string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = "cf1:val") TBLPROPERTIES("hbase.table.name" = "some_existing_table");
内容参考:http://wiki.apache.org/hadoop/Hive/HBaseIntegration
5. 问题
bin/hive 执行show tables 报错:
Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
若是是使用Derby数据库的安装方式,查看
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/hive/warehousedir</value>
<description>location of default database for the warehouse</description>
</property>
配置是否正确,
或者
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:derby:;databaseName=metastore_db;create=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
是否有权限访问。
若是配置了mysql的Metastore方式,检查的权限:
bin/hive -hiveconf hive.root.logger=DEBUG,console
而后show tables 就会看到ava.sql.SQLException: Access denied for user 'hive'@'××××8' (using password: YES) 之类从错误消息。
执行
CREATE TABLE hbase_table_1(key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "xyz");
报错:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:org.apache.hadoop.hbase.MasterNotRunningException: Retried 10 times
出现这个错误的缘由是引入的hbase包和hive自带的hive包冲突,删除hive/lib下的 hbase-0.94.×××.jar, OK了。
同时也要移走hive-0.12**.jar 包。
执行
hive>select uid from user limit 100;
Java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
解决:修改$HIVE_HOME/conf/hive-env.sh文件,加入
export HADOOP_HOME=hadoop的安装目录
5. 经过thrift访问hive(使用php作客户端)
使用php链接hive的条件:
wget http://mirror.bjtu.edu.cn/apache//thrift/0.9.1/thrift-0.9.1.tar.gz
tar -xzf thrift-0.9.1.tar.gz
若是是源码编译的,首先要使用./boostrap.sh建立文件./configure ,咱们这下载的tar包,自带有configure文件了。((能够查阅README文件))
If you are building from the first time out of the source repository, you will
need to generate the configure scripts. (This is not necessary if you
downloaded a tarball.) From the top directory, do:
./bootstrap.sh
./configure
1 须要安装thrift 安装步骤
# ./configure --without-ruby
不要使用ruby,
make ; make install
若是没有安装libevent libevent-devel的应该先安装这两个依赖库yum -y install libevent libevent-devel
其实Thrift就是使用来生成客户端和服务器端代码的。在这里没用到。
安装好后启动hive thrift
# ./hive --service hiveserver >/dev/null 2>/dev/null &
查看hiveserver默认端口是否打开10000 若是打开表示成功,在官网wiki有介绍文章:https://cwiki.apache.org/confluence/display/Hive/HiveServer
HiveServer is an optional service that allows a remote client to submit requests to Hive, using a variety of programming languages, and retrieve results. HiveServer is built on Apache ThriftTM(http://thrift.apache.org/), therefore it is sometimes called the Thrift server although this can lead to confusion because a newer service named HiveServer2 is also built on Thrift.
Thrift's interface definition language (IDL) file for HiveServer is hive_service.thrift
, which is installed in $HIVE_HOME/service/if/
.
Once Hive has been built using steps in Getting Started, the Thrift server can be started by running the following:
$ build
/dist/bin/hive
--service hiveserver --help
usage: hiveserver
-h,--help Print help information
--hiveconf <property=value> Use value
for
given property
--maxWorkerThreads <arg> maximum number of worker threads,
default:2147483647
--minWorkerThreads <arg> minimum number of worker threads,
default:100
-p <port> Hive Server port number, default:10000
-
v
,--verbose Verbose mode
$ bin
/hive
--service hiveserver
|
下载php客户端包:
其实hive-0.12包中自带的php lib,经测试,该包报php语法错误。命名空间的名称居然是空的。
我上传php客户端包:http://download.csdn.net/detail/hguisu/6913673(源下载http://download.csdn.net/detail/jiedushi/3409880)
php链接hive客户端代码
<?php // php链接hive thrift依赖包路径 ini_set('display_errors', 1); error_reporting(E_ALL); $GLOBALS['THRIFT_ROOT'] = dirname(__FILE__). "/"; // load the required files for connecting to Hive require_once $GLOBALS['THRIFT_ROOT'] . 'packages/hive_service/ThriftHive.php'; require_once $GLOBALS['THRIFT_ROOT'] . 'transport/TSocket.php'; require_once $GLOBALS['THRIFT_ROOT'] . 'protocol/TBinaryProtocol.php'; // Set up the transport/protocol/client $transport = new TSocket('192.168.1.214', 10000); $protocol = new TBinaryProtocol($transport); //$protocol = new TBinaryProtocolAccelerated($transport); $client = new ThriftHiveClient($protocol); $transport->open(); // run queries, metadata calls etc $client->execute('show tables'); var_dump($client->fetchAll()); $transport->close(); ?>
打开浏览器浏览http://localhost/Thrift/test.php就能够看到查询结果了