Using Hive array, map, struct, and other data types

Date: 2019-11-22

Tags: hive, array, map, struct, data types. Category: Hadoop


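Traditional databases validate data when it is written (schema on write); Hive validates data when it is read (schema on read).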

describe extended h5_gif;  shows detailed information about the table

describe formatted h5_gif;  shows the same detailed information in a more readable layout

Table kinds: ordinary (managed) tables, partitioned tables, and external tables (created with the EXTERNAL keyword).
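
As a minimal sketch of the external-table case, the statements below create a partitioned external table over files that already live in HDFS. The table name access_log_ext, its columns, and the paths under /data/demo are hypothetical and chosen only for illustration.

```sql
-- Hypothetical external table; Hive records the metadata but does not own the files.
CREATE EXTERNAL TABLE IF NOT EXISTS access_log_ext (
  ip      STRING,
  request STRING
)
PARTITIONED BY (p_day STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/data/demo/access_log';   -- hypothetical HDFS directory

-- Register an existing directory as one partition of the table.
ALTER TABLE access_log_ext ADD IF NOT EXISTS
PARTITION (p_day = '2019-11-22')
LOCATION '/data/demo/access_log/2019-11-22';

-- Dropping an external table removes only the metadata; the files stay in HDFS.
-- DROP TABLE access_log_ext;
```

With a managed table, by contrast, DROP TABLE also deletes the underlying data.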

set hive.mapred.mode=strict;  rejects queries on a partitioned table that do not filter on a partition column
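
The effect of strict mode can be sketched as follows. The table page_views_demo and its columns are hypothetical, made up for this illustration.

```sql
-- Hypothetical partitioned table, used only for illustration.
CREATE TABLE IF NOT EXISTS page_views_demo (
  url     STRING,
  visitor STRING
)
PARTITIONED BY (dt STRING);

SET hive.mapred.mode=strict;

-- Rejected in strict mode: there is no predicate on the partition column dt,
-- so the query would have to scan every partition.
-- SELECT url FROM page_views_demo;

-- Accepted: the WHERE clause restricts the scan to a single partition.
SELECT url
FROM page_views_demo
WHERE dt = '2019-11-22';
```

Strict mode also rejects ORDER BY without LIMIT and Cartesian-product joins, which is worth keeping in mind when the setting is left on for a whole session.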

show partitions nginx_log;  lists all partitions of a table

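Table creation example
-- Note: in Hive 1.2.0 and later, USER is a reserved keyword,
-- so the table name may need to be quoted with backticks or renamed.
CREATE TABLE user(
name string,
info struct<name:STRING, age:INT>,
string      string
)
PARTITIONED BY(p_hour STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ':'
LINES TERMINATED BY '\n'
STORED AS RCFILE;
-- Use STORED AS TEXTFILE instead when plain-text files are loaded with LOAD DATA.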

load data local inpath '/root/java/testhive/user.log' overwrite into table user partition(p_hour="02");

select * from user where p_hour="02";
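
The example above only uses a struct column. As a hedged sketch of how all three complex types are declared and read, the following uses a hypothetical table person_demo; the table, its columns, and the sample line are assumptions made for illustration, not part of the original notes.

```sql
-- Hypothetical table with one column of each complex type.
CREATE TABLE IF NOT EXISTS person_demo (
  name    STRING,
  hobbies ARRAY<STRING>,
  scores  MAP<STRING, INT>,
  info    STRUCT<city:STRING, age:INT>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'              -- separates top-level columns
COLLECTION ITEMS TERMINATED BY ','     -- separates array items, map entries, struct fields
MAP KEYS TERMINATED BY ':'             -- separates a map key from its value
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;

-- A matching input line (fields separated by a tab character) would look like:
-- alice<TAB>reading,chess<TAB>math:90,art:85<TAB>Beijing,30

SELECT
  name,
  hobbies[0]     AS first_hobby,   -- array: zero-based index
  size(hobbies)  AS hobby_count,   -- number of elements in the array
  scores['math'] AS math_score,    -- map: lookup by key
  info.city      AS city,          -- struct: dot notation
  info.age       AS age
FROM person_demo;
```

An index past the end of the array or a key missing from the map yields NULL rather than an error.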

./hive -S -e "select * from user where p_hour='02'";  The -S (silent) flag suppresses messages such as "OK" and "Time taken".

set hive.cli.print.header=true;  prints the column names as a header row in query output

ORDER BY, SORT BY, DISTRIBUTE BY, CLUSTER BY

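ORDER BY performs a global sort over the input, so it runs with a single reducer (multiple reducers cannot guarantee a total order). On large data sets this takes a long time.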

SORT BY sorts the rows within each reducer, so each reducer's output is ordered, but the overall result is not globally ordered.

DISTRIBUTE BY sends rows to different reducers according to the specified columns; rows with the same values go to the same reducer.

CLUSTER BY combines the behavior of DISTRIBUTE BY and SORT BY on the same columns.

However, CLUSTER BY always sorts in ascending order; it does not accept ASC or DESC.
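
The four clauses can be compared side by side in a short sketch. The table sales_demo and its columns region and amount are hypothetical.

```sql
-- Global sort: one reducer produces a single totally ordered result.
-- (LIMIT is included because strict mode rejects ORDER BY without it.)
SELECT region, amount
FROM sales_demo
ORDER BY amount DESC
LIMIT 100;

-- Per-reducer sort: each reducer's output is ordered, the whole result is not.
SELECT region, amount
FROM sales_demo
SORT BY amount DESC;

-- Route all rows of one region to the same reducer, then sort inside it.
-- Here the sort direction can still be chosen per column.
SELECT region, amount
FROM sales_demo
DISTRIBUTE BY region
SORT BY region ASC, amount DESC;

-- Shorthand for DISTRIBUTE BY region SORT BY region (ascending only).
SELECT region, amount
FROM sales_demo
CLUSTER BY region;
```

When a descending order is needed, writing DISTRIBUTE BY and SORT BY explicitly is the usual way around the CLUSTER BY limitation.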

To convert a floating-point number to an integer, use round() or floor() rather than cast, so that the rounding behavior is explicit.
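
A small sketch of the difference, assuming a Hive version that allows SELECT without a FROM clause (0.13 or later); on older versions, run the same expressions against any one-row table.

```sql
SELECT
  CAST(2.7 AS INT)         AS cast_pos,    -- drops the fraction (truncates toward zero)
  CAST(-2.7 AS INT)        AS cast_neg,    -- also truncates toward zero
  floor(2.7)               AS floor_pos,   -- largest integer not greater than the value
  floor(-2.7)              AS floor_neg,   -- rounds toward negative infinity
  round(2.7)               AS round_pos,   -- rounds to the nearest whole number
  CAST(round(2.7) AS INT)  AS round_as_int -- round first, then convert the type
;
```

Note that round() on a DOUBLE returns a DOUBLE, so a final cast is still needed when the column must have an integer type; floor() already returns BIGINT.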

Sampling is usually done with rand() or with bucket sampling (TABLESAMPLE).
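
Both approaches can be sketched as follows. The table events_demo and the column user_id are hypothetical, and the sample sizes are arbitrary.

```sql
-- Row-level random sample: keeps each row with a probability of roughly 1%.
SELECT *
FROM events_demo
WHERE rand() < 0.01;

-- Fixed-size random sample: shuffle rows across reducers, then take 1000.
SELECT *
FROM events_demo
DISTRIBUTE BY rand()
SORT BY rand()
LIMIT 1000;

-- Bucket sampling on a random value: roughly one tenth of the rows.
SELECT *
FROM events_demo TABLESAMPLE(BUCKET 1 OUT OF 10 ON rand()) s;

-- Bucket sampling on a column: rows with the same user_id are kept or dropped together.
SELECT *
FROM events_demo TABLESAMPLE(BUCKET 1 OUT OF 10 ON user_id) s;
```

Sampling on rand() gives a different sample on every run, while sampling on a column is repeatable. If the table is already bucketed on that column, Hive can read only the matching bucket files instead of scanning the whole table.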
