hive简单hql整理

时间 2019-11-24

标签 hive 简单 hql 整理栏目 Hadoop 繁體版

原文原文链接

DDL操做：增删改数据库表和数据库(hive中ddl操做是能够操做数据库的)node

DML操做：增删改数据数据库

HIVE中特别的字段集合类型：数组

Strutc(first String,last String): 由first 和last 组成一个字段session

Map(key,value,key,value...)：由key value 组成字段，须要指定哪一个是key 哪一个是valueapp

Array(value String,value String,value String)：由平级的字段来组成 spa

安装好hive后 (安装hive 过程不详述)输入 hive 进入hive：rest

进入hive后默认进入default数据库orm

use mydb;--使用本身建立的数据库ip

set hive.cli.print.currend.db=true;设置hive命令后显示数据库名ci

HIVE数据库相关操做

建立数据库

create database if not exists mydb

location '/my/databases'--指定数据文件存放位置。一般该位置是由hive.metastore.warehouse.dir 指定。也能够临时指定

comment 'my databases'--描述个人数据库

with dbproperties('creator' = 'pang','date' = '2015-4-15');--建立信息

查看数据库列表

show databases in mydb; --也能够show databases like 'my.*'展出以my开头的 .* 是匹配项。in mydb 是只列出mybd下的表

查看数据库信息

describe database extended mydb; --若不加extended 则只能看的到comment

删除数据库

drop database if exists financials cascade;删除数据库 cascade级联删除，若是数据库里面有表有数据须要经过级联删除才能删掉。

修改数据库

alter database mydb set dbproperties('edited-by','pang');--修改数据库信息

HIVE表相关操做

建立外部表

creat external table mytable(name String,age int)--建立外部表，用处是除了hive外的程序部件也可使用外部表

建立表：

create table mytable(name String,age int,friends Array(String),deduction Map(String,float),address Strutc(street:String ,city:String,state:String,zip:int))

row format delimited--每一行之间用分隔符来替代

fields terminated by '\001'--域之间用 \001 来分割即 ^A

collection items terminated by '\002'--数组之间元素用 \002 来分割即 ^B

map keys terminated by '\003'--key 和 value之间用 \003 来分割即 ^C

lines terminated by '\n'--行结束用回车

stored as textfile;--以textfile 格式来存储这个数据文件.

这些特定字段分割都是缺省分隔。若是不指定建立语句下方的分隔规则，则使用该分隔规则。

复制建立新表

cteate table if not exists mydb.mytable1 like mytable;--使用mydb. 能够指定数据库

查看表描述

describe extended mydb.mytable; --mydb.mytable.name这样既可查看某个字段的描述

更改表名

alter table mytables rename to tables;

更改列名列类型

alter table mytable change columns hms hours_minutes_seconds int after severity;

添加列

alter table mytable add columns(

app_name String comment 'application name',

session_id long

)

hive不支持行级别的insert、delete、update。将数据放入表中的惟一办法是批量载入。

加载本地文件到表中对应的分区下

load data local inpath '${env:home}/california-employees' overwrite into table mytable partition(country='us')

从另一张表查询出数据插入到表中

insert overwrite TABLE table partition (country='us') select * from mytables

建立表的同时插入数据

creat table mytable as select name,age from students where age='18'

修改某列信息

alter table mytable replace columns(

message String comment 'the rest of the message'

);

HIVE表分区

建立分区表

create table mytable(name String,age int,friends Array(String),deduction Map(String,float),address Strutc(street:String ,city:String,state:String,zip:int))

partitioned by(country String);--以国家为分区。分区关键字不必定在表字段上体现。分区表的存储会变成一个子目录里面的一系列文件

分区表查询存在一个strict模式(严格模式)概念。

set hive.mapred.mode=strict;

select e.name,e.salary from mytable e limit 100; 会报错

FAILED:ERRORin semantic analysis:No partition predicate found for alias "e" table "mytable"

set hive.mapred.mode=nonstrict; 则能够查询

可是：既然存在分区，则应该尽可能使用分区来查询，不用条件乱查效率低下。

列出分区

show partitions mytable partition (country='us');列出美国分区下的表

增长分区

alter table mutables add if not exists

partition(year = 2015 ,mouth =4 ,day =1) location '2015/4/1'

partition(year = 2015 ,mouth =4 ,day =2) location '2015/4/2'

partition(year = 2015 ,mouth =4 ,day =3) location '2015/4/3'...;

能够将一根非分区表按条件批量插入到分区表中

FROM mytable mt

insert overwrite TABLE table partition (country = 'us') select * where mt.cnty = 'us'

insert overwrite TABLE table partition (country = 'china') select * where mt.cnty = 'china'

insert overwrite TABLE table partition (country = 'japan') select * where mt.cnty = 'japan';

以上分区方法是否是太累赘？没办法hive就是这么笨拙。不过也不是没有办法：动态分区插入

hive> set hive.exec.dynamic.partition=true; 动态分区插入功能开启

hive> set hive.exec.dynamic.partition.mode=nonstrict; 全部的分区均可以动态插入

hive> set hive.exec.max.dynamic.partitions.oernode=1000;某一个表最大分区数(可选设置)

而后使用动态分区插入语句:

hive>insert overwrite Table mytable

>partition(country)

>select ...,cty,st

>from table;

HIVE数据导出

hive的数据文件是明文文本，能够直接拷贝导出。

若是须要改动数据格式，也可使用insert overwrite

insert overwrite local directory '/tmp/pang'

select name,salary,address from mytable where name like '%王%'

手动整理可能有打错的单词不要太认真！！hive更新较快语句持续更新！！