1. Creating a database
create database database_name;
2. Using a database
use database_name;
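A minimal sketch of these two steps together, assuming a hypothetical database named pvdb; the if not exists clause makes the statement safe to re-run:
create database if not exists pvdb;   -- create only if it is missing
show databases;                       -- verify pvdb appears in the list
use pvdb;                             -- make it the current database
select current_database();            -- confirm the switch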
3. Creating tables
Internal (managed) table: the table directory is laid out according to Hive's conventions, under the Hive warehouse directory /user/hive/warehouse.
create table t_pv_log(ip string, url string, access_time string) row format delimited fields terminated by ',';
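To see where Hive placed the table, describe formatted prints the table's metadata; for a managed table in the default database, the Location field should point under the warehouse directory:
describe formatted t_pv_log;   -- Location: .../user/hive/warehouse/t_pv_log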
External table: the table directory is specified by the user.
Create a directory on HDFS:
hadoop fs -mkdir -p /pvlog/2017-09-16
Prepare test data:
192.168.33.1,http://sina.com/a,2017-09-16 12:52:01
192.168.33.2,http://sina.com/a,2017-09-16 12:51:01
192.168.33.1,http://sina.com/a,2017-09-16 12:50:01
192.168.33.2,http://sina.com/b,2017-09-16 12:49:01
192.168.33.1,http://sina.com/b,2017-09-16 12:48:01
192.168.33.4,http://sina.com/a,2017-09-16 12:47:01
192.168.33.3,http://sina.com/a,2017-09-16 12:46:01
192.168.33.2,http://sina.com/b,2017-09-16 12:45:01
192.168.33.2,http://sina.com/a,2017-09-16 12:44:01
192.168.33.1,http://sina.com/a,2017-09-16 13:43:01
Upload the data to /pvlog/2017-09-16 on HDFS:
hadoop fs -put ./pv.log /pvlog/2017-09-16
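A quick check that the file landed where expected before wiring a table to it:
hadoop fs -ls /pvlog/2017-09-16   # should list pv.log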
Create the external table:
create external table t_pv_log(ip string, url string, access_time string) row format delimited fields terminated by ',' location '/pvlog/2017-09-16';
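Because the table directory already contains pv.log, the rows are queryable immediately, with no separate load step:
select count(*) from t_pv_log;          -- 10 rows for the sample file above
select ip, url from t_pv_log limit 3;   -- spot-check a few records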
Differences between internal and external tables:
Dropping an internal table deletes the table metadata and the data together.
Dropping an external table deletes only the table metadata; the data files remain on HDFS.
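A short sketch of the external-table behavior, using the paths from the example above:
drop table t_pv_log;              -- only the table metadata is removed
hadoop fs -ls /pvlog/2017-09-16   # pv.log is still listed
Had t_pv_log been an internal table, the same drop would also have deleted its directory under /user/hive/warehouse.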
4. Partitioned tables
The essence of a partitioned table: partition subdirectories are created for the data files inside the table directory, so that at query time the MR job only has to process the data in the relevant partition subdirectories, narrowing the range of data read.
For example, a website produces page-view records every day. These records should all go into one table, but often we only need to analyze a single day's records.
In that case the table can be created as a partitioned table, and each day's data is loaded into its own partition, as illustrated below.
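Concretely, for the table built below, the table directory on HDFS ends up with one subdirectory per day, roughly like this (paths assume the default database):
/user/hive/warehouse/t_pv_log/day=20170915/pv.log.15
/user/hive/warehouse/t_pv_log/day=20170916/pv.log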
Prepare the data:
192.168.33.1,http://sina.com/a,2017-09-16 12:52:01
192.168.33.2,http://sina.com/a,2017-09-16 12:51:01
192.168.33.1,http://sina.com/a,2017-09-16 12:50:01
192.168.33.2,http://sina.com/b,2017-09-16 12:49:01
192.168.33.1,http://sina.com/b,2017-09-15 12:48:01
192.168.33.4,http://sina.com/a,2017-09-15 12:47:01
192.168.33.3,http://sina.com/a,2017-09-15 12:46:01
192.168.33.2,http://sina.com/b,2017-09-15 12:45:01
192.168.33.2,http://sina.com/a,2017-09-15 12:44:01
192.168.33.1,http://sina.com/a,2017-09-15 13:43:01
Create the partitioned table:
create table t_pv_log(ip string, url string, access_time string) partitioned by (day string) row format delimited fields terminated by ',';
Load the data into the newly created table:
load data local inpath '/usr/local/hivetest/pv.log.15' into table t_pv_log partition(day='20170915');
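A matching load for the 16th; this assumes the 10-row pv.log file from section 3 is also available under /usr/local/hivetest (local path assumed). show partitions then lists both partitions:
load data local inpath '/usr/local/hivetest/pv.log' into table t_pv_log partition(day='20170916');
show partitions t_pv_log;   -- day=20170915, day=20170916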
Query the data via the partition field:
0: jdbc:hive2://hadoop00:10000> select * from t_pv_log where day ='20170916';
+---------------+--------------------+-----------------------+---------------+--+
| t_pv_log.ip   | t_pv_log.url       | t_pv_log.access_time  | t_pv_log.day  |
+---------------+--------------------+-----------------------+---------------+--+
| 192.168.33.1  | http://sina.com/a  | 2017-09-16 12:52:01   | 20170916      |
| 192.168.33.2  | http://sina.com/a  | 2017-09-16 12:51:01   | 20170916      |
| 192.168.33.1  | http://sina.com/a  | 2017-09-16 12:50:01   | 20170916      |
| 192.168.33.2  | http://sina.com/b  | 2017-09-16 12:49:01   | 20170916      |
| 192.168.33.1  | http://sina.com/b  | 2017-09-16 12:48:01   | 20170916      |
| 192.168.33.4  | http://sina.com/a  | 2017-09-16 12:47:01   | 20170916      |
| 192.168.33.3  | http://sina.com/a  | 2017-09-16 12:46:01   | 20170916      |
| 192.168.33.2  | http://sina.com/b  | 2017-09-16 12:45:01   | 20170916      |
| 192.168.33.2  | http://sina.com/a  | 2017-09-16 12:44:01   | 20170916      |
| 192.168.33.1  | http://sina.com/a  | 2017-09-16 13:43:01   | 20170916      |
+---------------+--------------------+-----------------------+---------------+--+
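The partition field behaves like an ordinary column in queries, so it also works in filters and aggregations, while Hive reads only the matching partition directories:
select day, count(*) as pv from t_pv_log group by day;   -- page views per day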
5. Importing files
Method 1:
Manually place the file into the table directory with HDFS commands.
Method 2: in Hive's interactive shell, use a Hive command to import local data into the table directory:
load data local inpath '/usr/local/data/' into table `order`;   -- order is a reserved word in HiveQL, so the table name must be backquoted
Method 3: use a Hive command to import a data file from HDFS into the table directory:
load data inpath 'access.log' into table t_access partition(day='20170916');
Note the difference between importing a local file and importing an HDFS file:
Importing a local file into a table: the file is copied into the table directory.
Importing an HDFS file into a table: the file is moved into the table directory, as the sketch below demonstrates.
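A quick way to observe the move behavior, assuming a hypothetical file /data/access.log already on HDFS:
hadoop fs -ls /data/access.log   # present before the load
load data inpath '/data/access.log' into table t_access partition(day='20170916');
hadoop fs -ls /data/access.log   # no longer there: it was moved into the table directory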