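What is Hive?
Open-sourced by Facebook, originally to compute statistics over massive volumes of structured log data;
An ETL (Extraction-Transformation-Loading) tool;
A data warehouse built on top of Hadoop;
Computation runs on MapReduce (MR), and data is stored in HDFS;
Hive defines an SQL-like query language, HQL;
HQL is similar to SQL, but not identical;
Typically used for offline (batch) data processing, executed as MapReduce jobs;
It can be thought of as a translator from HQL into MR jobs.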
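To illustrate what "translating HQL into MR" means, here is a minimal single-process Python sketch of the map, shuffle, and reduce phases that a query such as `SELECT page, COUNT(*) FROM logs GROUP BY page` conceptually compiles to. The table `logs`, the column `page`, and the sample rows are hypothetical; real Hive plans are produced by its compiler and optimizer and run distributed on Hadoop.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(rows: Iterable[dict]) -> List[Tuple[str, int]]:
    # Map: emit (group key, 1) for every input row
    return [(row["page"], 1) for row in rows]


def shuffle_phase(pairs: List[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: bring together all values emitted for the same key
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    # Reduce: aggregate each key's values, like COUNT(*)
    return {key: sum(values) for key, values in groups.items()}


def group_by_count(rows: Iterable[dict]) -> Dict[str, int]:
    # Conceptual equivalent of: SELECT page, COUNT(*) FROM logs GROUP BY page
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    logs = [{"page": "/home"}, {"page": "/cart"}, {"page": "/home"}]
    print(group_by_count(logs))  # {'/home': 2, '/cart': 1}
```

The point of the sketch is the division of labor: the map side only tags rows with a key, and all aggregation happens after rows with the same key have been gathered in one place.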
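Typical Hive use cases
Log analysis
Computing a website's pv (page views) and uv (unique visitors) over a time period
Multi-dimensional data analysis
Many internet companies, including Baidu and Taobao, use Hive for log analysis
Other scenarios
Offline analysis of massive structured data
Low-cost data analysis (without writing MR code directly)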
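As a concrete example of the pv/uv statistic: pv counts every page-view record in the time window, while uv counts distinct visitors. In HQL this is typically a query along the lines of `SELECT COUNT(*) AS pv, COUNT(DISTINCT user_id) AS uv FROM access_log WHERE ...`; the Python sketch below computes the same two numbers over an in-memory list. The table `access_log`, the `(user_id, ts)` record shape, and identifying a visitor by `user_id` (rather than by IP or cookie) are assumptions for illustration.

```python
from datetime import datetime
from typing import Iterable, Tuple


def pv_uv(records: Iterable[Tuple[str, datetime]],
          start: datetime, end: datetime) -> Tuple[int, int]:
    """Return (pv, uv) for hypothetical (user_id, ts) records with start <= ts < end."""
    pv = 0
    visitors = set()
    for user_id, ts in records:
        if start <= ts < end:
            pv += 1                # every hit counts as a page view
            visitors.add(user_id)  # each visitor is counted once for uv
    return pv, len(visitors)


if __name__ == "__main__":
    log = [
        ("u1", datetime(2016, 9, 1, 10, 0)),
        ("u2", datetime(2016, 9, 1, 11, 0)),
        ("u1", datetime(2016, 9, 1, 12, 0)),
        ("u3", datetime(2016, 9, 2, 9, 0)),  # falls outside the window below
    ]
    print(pv_uv(log, datetime(2016, 9, 1), datetime(2016, 9, 2)))  # (3, 2)
```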
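Why use Hive?
Simple and easy to pick up
Provides the SQL-like query language HQL;
Compute and scaling capacity designed for very large datasets
MR as the compute engine, HDFS as the storage system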
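Hive components
User interfaces
CLI, JDBC/ODBC, and a WebUI
Metadata store (metastore)
Stored by default in the bundled Derby database; in production it is usually switched to MySQL
Driver
Interpreter, compiler, optimizer, and executor
Hadoop
MapReduce for computation, HDFS for storage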
Hive deployment architecture: lab environment
Data types (the list keeps growing...)
Data Definition Language (DDL)
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
  (col_name data_type, ...)
  [PARTITIONED BY (col_name data_type, ...)]
  [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
  [SKEWED BY (col_name, col_name, ...) ON ((col_value, col_value, ...), ...) [STORED AS DIRECTORIES]]
  [
    [ROW FORMAT row_format] [STORED AS file_format]
  ]
  [LOCATION hdfs_path]
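One practical effect of `PARTITIONED BY` is the on-disk layout: each partition of a managed table is stored as a subdirectory named `col=value` under the table's directory, one level per partition column. The sketch below builds such a path. It assumes the default warehouse directory `/user/hive/warehouse` (configurable through `hive.metastore.warehouse.dir`), and it ignores the escaping Hive applies to special characters in partition values. The table and partition names are hypothetical.

```python
from typing import List, Tuple

# Assumed default value of hive.metastore.warehouse.dir
WAREHOUSE_DIR = "/user/hive/warehouse"


def partition_path(table: str, partitions: List[Tuple[str, str]],
                   database: str = "default") -> str:
    # Tables in the default database sit directly under the warehouse dir;
    # tables in other databases sit under a <db>.db subdirectory.
    if database == "default":
        base = WAREHOUSE_DIR
    else:
        base = f"{WAREHOUSE_DIR}/{database}.db"
    # One col=value level per partition column, in PARTITIONED BY order.
    # Real Hive also escapes special characters in values; omitted here.
    parts = "/".join(f"{col}={val}" for col, val in partitions)
    return f"{base}/{table}/{parts}" if parts else f"{base}/{table}"


if __name__ == "__main__":
    print(partition_path("access_log", [("dt", "2016-09-01"), ("hour", "10")]))
    # /user/hive/warehouse/access_log/dt=2016-09-01/hour=10
```

This layout is why filtering on a partition column is cheap: Hive can skip whole directories instead of scanning every file.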
1: Download
http://archive.apache.org/dist/hive
2: Extract the archive
3: Configure Hive's environment variables
Add the following to the current user's .bashrc:
export HIVE_HOME=/home/hadoop/bd/apache-hive-2.1.0-bin
4: Configure hive-env.sh in the conf directory of the Hive installation
Create it by copying hive-env.sh.template to hive-env.sh
Set the following:
# Set HADOOP_HOME to point to a specific hadoop install directory
HADOOP_HOME=/home/hadoop/bd/hadoop-2.7.3
# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=/home/hadoop/bd/apache-hive-2.1.0-bin/conf
# Folder containing extra libraries required for hive compilation/execution can be controlled by:
export HIVE_AUX_JARS_PATH=/home/hadoop/bd/apache-hive-2.1.0-bin/lib
5: Change where Hive writes its log files
cp hive-log4j2.properties.template hive-log4j2.properties
Then edit the log directory with vi:
property.hive.log.dir = /home/hadoop/bd/apache-hive-2.1.0-bin/logs
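6: Start the Hadoop cluster
7: Initialize the bundled Derby database as Hive's metastore
From the bin directory, you can first run ./schematool --help to list schematool's options
Run ./schematool -dbType derby -initSchema to initialize Derby as the metastore database. Embedded Derby creates its metastore_db directory in the current working directory, so run schematool and hive from the same directory.
8: Run the hive command in the bin directory to enter the Hive CLI
./hive
If no errors appear, Hive has been installed successfully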
1: Create a table
create table table_name
Create a table with a specified field delimiter:
create table teacher (id int, name string) row format delimited fields terminated by '\t';
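To see what `row format delimited fields terminated by '\t'` means for the underlying text file, the sketch below mimics how one line of the teacher table's data maps onto the `(id int, name string)` columns: the line is split on tab and each field is converted to its declared type. It is a simplified model of Hive's default text SerDe (LazySimpleSerDe), assuming that a value which is not a valid 32-bit integer, or a missing trailing field, reads as NULL (None here) and that extra fields are ignored; it does not model escaping or the `\N` null marker.

```python
import re
from typing import Optional, Tuple

INT_RE = re.compile(r"[+-]?[0-9]+")
INT_MIN, INT_MAX = -2**31, 2**31 - 1  # Hive INT is a 32-bit signed integer


def to_hive_int(raw: Optional[str]) -> Optional[int]:
    # Anything that is not a plain, in-range integer becomes NULL (None)
    if raw is None or not INT_RE.fullmatch(raw):
        return None
    value = int(raw)
    return value if INT_MIN <= value <= INT_MAX else None


def parse_teacher_line(line: str) -> Tuple[Optional[int], Optional[str]]:
    # Split one line of the data file on the table's field delimiter '\t'
    fields = line.rstrip("\n").split("\t")
    # Missing trailing fields read as NULL; extra fields are ignored
    raw_name = fields[1] if len(fields) > 1 else None
    return to_hive_int(fields[0]), raw_name


if __name__ == "__main__":
    for line in ["1\tzhangsan\n", "2\tlisi\textra\n", "abc\twangwu\n", "3\n"]:
        print(parse_teacher_line(line))
    # (1, 'zhangsan')
    # (2, 'lisi')
    # (None, 'wangwu')
    # (3, None)
```

In practice this is why a file loaded with the wrong delimiter often shows up as rows full of NULLs rather than as a load error.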
Part 2: Switch the metastore to MySQL
1: Copy hive-default.xml.template to hive-site.xml
cp hive-default.xml.template hive-site.xml
2: Delete all the existing configuration in hive-site.xml
and add our own settings:
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hm02:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123</value>
  </property>
</configuration>
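3: Copy the MySQL JDBC driver jar into the lib directory of the Hive installation
4: Grant MySQL privileges and initialize the metastore
1) If this host and user were already granted privileges earlier, there is no need to grant them again; otherwise grant them (see the Sqoop chapter):
(grant all privileges on *.* to root@'hostname' identified by 'password')
Run it after switching to the mysql database with use mysql. This is MySQL 5.x syntax; MySQL 8.0 no longer accepts IDENTIFIED BY in GRANT, so there the user must first be created with CREATE USER.
2) Command to initialize the metastore:
./schematool -dbType mysql -initSchema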
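5: Notes on using MySQL as the metastore
1) Information about the tables created in Hive is stored in the metastore's TBLS table
2) A table's column information is stored in the metastore's COLUMNS_V2 table
3) A table's location on HDFS is stored in the metastore's SDS table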
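These three metastore tables are linked by ID columns: TBLS.SD_ID points to a storage descriptor row in SDS (which holds LOCATION), and SDS.CD_ID points to the column descriptor whose columns are listed in COLUMNS_V2, ordered by INTEGER_IDX. The sketch below recreates a heavily simplified subset of these tables in an in-memory SQLite database and runs the kind of join you could run against the MySQL metastore to list a table's location and columns. Only a few of the real columns are included, the real schema has many more tables (for example DBS for databases), and the sample rows, including the HDFS URI `hdfs://namenode:9000`, are hypothetical.

```python
import sqlite3
from typing import List, Tuple

SCHEMA = """
-- Simplified subset of the Hive metastore tables (real ones have more columns)
CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, TBL_NAME TEXT, SD_ID INTEGER);
CREATE TABLE SDS (SD_ID INTEGER PRIMARY KEY, CD_ID INTEGER, LOCATION TEXT);
CREATE TABLE COLUMNS_V2 (CD_ID INTEGER, COLUMN_NAME TEXT, TYPE_NAME TEXT, INTEGER_IDX INTEGER);
"""

QUERY = """
SELECT t.TBL_NAME, s.LOCATION, c.COLUMN_NAME, c.TYPE_NAME
FROM TBLS t
JOIN SDS s ON t.SD_ID = s.SD_ID
JOIN COLUMNS_V2 c ON s.CD_ID = c.CD_ID
WHERE t.TBL_NAME = ?
ORDER BY c.INTEGER_IDX
"""


def build_demo_db() -> sqlite3.Connection:
    # Hypothetical rows mirroring the teacher table created earlier
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO TBLS VALUES (1, 'teacher', 10)")
    conn.execute("INSERT INTO SDS VALUES (10, 100, "
                 "'hdfs://namenode:9000/user/hive/warehouse/teacher')")
    conn.executemany("INSERT INTO COLUMNS_V2 VALUES (100, ?, ?, ?)",
                     [("name", "string", 1), ("id", "int", 0)])
    return conn


def describe_table(conn: sqlite3.Connection, name: str) -> List[Tuple[str, str, str, str]]:
    # Follow TBLS -> SDS (location) -> COLUMNS_V2 (columns in declared order)
    return conn.execute(QUERY, (name,)).fetchall()


if __name__ == "__main__":
    for row in describe_table(build_demo_db(), "teacher"):
        print(row)
```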
hive-site.xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements. See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License. You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
-->
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hm02:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123</value>
  </property>
</configuration>
hive-site-back.xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements. See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License. You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
-->
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hm:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123</value>
  </property>
</configuration>
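Because the switch to MySQL depends on the four `javax.jdo.option.*` properties in hive-site.xml, it can help to check what a given file actually sets before running schematool. The Python sketch below reads a Hadoop-style `<configuration>`/`<property>`/`<name>`/`<value>` file with the standard-library ElementTree and reports which of those four settings are missing or empty. The function names and the idea of a pre-flight check are illustrative assumptions, not part of Hive.

```python
import sys
import xml.etree.ElementTree as ET
from typing import Dict, List

# The four JDBC settings the MySQL metastore configuration relies on
REQUIRED = [
    "javax.jdo.option.ConnectionURL",
    "javax.jdo.option.ConnectionDriverName",
    "javax.jdo.option.ConnectionUserName",
    "javax.jdo.option.ConnectionPassword",
]


def parse_conf(root: ET.Element) -> Dict[str, str]:
    # Collect <property><name>...</name><value>...</value></property> pairs
    props: Dict[str, str] = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def load_conf_file(path: str) -> Dict[str, str]:
    # ET.parse reads the file itself, honoring the XML encoding declaration
    return parse_conf(ET.parse(path).getroot())


def missing_metastore_settings(props: Dict[str, str]) -> List[str]:
    # Required settings that are absent or have an empty value
    return [key for key in REQUIRED if not props.get(key)]


if __name__ == "__main__":
    conf = load_conf_file(sys.argv[1] if len(sys.argv) > 1 else "hive-site.xml")
    print("ConnectionURL:", conf.get("javax.jdo.option.ConnectionURL"))
    print("missing:", missing_metastore_settings(conf))
```

Usage, assuming the script is saved as check_conf.py in the conf directory: `python check_conf.py hive-site.xml`.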