Setting Variables in Hive

Date: 2019-11-10

Tags: hive, setting variables | Category: Hadoop


Variables can be set from the command line with hive --define, --hivevar, or --hiveconf, or inside a session with the set command.

1. The hivevar namespace

User-defined variables:

hive -d name=zhangsan
hive --define name=zhangsan
hive -d a=1 -d b=2

-d/--define has exactly the same effect as --hivevar:

hive --hivevar a=1 --hivevar b=2
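To illustrate the equivalence, here is a toy Python model in which the three spellings are registered as aliases of a single option, so every KEY=VALUE pair lands in the same hivevar map. The function parse_hive_cli and the use of argparse are assumptions for illustration only; the real hive launcher parses its options with its own code.

```python
import argparse


def parse_hive_cli(argv):
    """Toy model: collect -d/--define/--hivevar and --hiveconf KEY=VALUE pairs.

    Not Hive's real parser; it only shows that the three hivevar spellings
    feed one and the same map.
    """
    parser = argparse.ArgumentParser(prog="hive", allow_abbrev=False)
    # Three aliases, one destination: every value lands in the hivevar map.
    parser.add_argument("-d", "--define", "--hivevar", dest="hivevar",
                        action="append", default=[], metavar="KEY=VALUE")
    parser.add_argument("--hiveconf", dest="hiveconf",
                        action="append", default=[], metavar="KEY=VALUE")
    args = parser.parse_args(argv)

    def to_dict(pairs):
        result = {}
        for pair in pairs:
            key, _, value = pair.partition("=")  # split on the first '=' only
            result[key] = value
        return result

    return to_dict(args.hivevar), to_dict(args.hiveconf)


if __name__ == "__main__":
    hivevar, hiveconf = parse_hive_cli(
        ["-d", "a=1", "--define", "b=2", "--hivevar", "c=3",
         "--hiveconf", "hive.cli.print.header=true"])
    print(hivevar)   # {'a': '1', 'b': '2', 'c': '3'}
    print(hiveconf)  # {'hive.cli.print.header': 'true'}
```

Splitting on the first "=" only keeps values such as hive.root.logger=INFO,console intact.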

When referencing a variable in the hivevar namespace, the hivevar: prefix can be included or omitted:

set name;
set name=zhangsan;
set hivevar:name;
set hivevar:name=zhangsan;

In HiveQL code, reference a variable with ${}; here too the hivevar: prefix can be included or omitted:

create table ${a} (${b} int);
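A small Python model makes the substitution rule concrete. It is a sketch, not Hive's implementation: the regular expression is modeled on the shape of Hive's ${...} syntax as we understand it, substitute_hivevars is a hypothetical name, and real Hive also re-expands nested references (up to hive.variable.substitute.depth), which this single-pass version does not.

```python
import re

# Simplified model (not Hive's code): a reference looks like ${name},
# where name contains no '}', '$' or space.
VAR_REF = re.compile(r"\$\{([^}$ ]+)\}")
HIVEVAR_PREFIX = "hivevar:"


def substitute_hivevars(sql: str, hivevars: dict) -> str:
    """Replace ${name} and ${hivevar:name} with values from `hivevars`.

    Unknown references are left in the text unchanged; only one pass is made.
    """
    def replace(match):
        name = match.group(1)
        if name.startswith(HIVEVAR_PREFIX):
            name = name[len(HIVEVAR_PREFIX):]  # the prefix is optional
        return hivevars.get(name, match.group(0))

    return VAR_REF.sub(replace, sql)


if __name__ == "__main__":
    hivevars = {"a": "t1", "b": "id"}  # as if started with: hive -d a=t1 -d b=id
    print(substitute_hivevars("create table ${a} (${hivevar:b} int);", hivevars))
    # create table t1 (id int);
```

Both ${b} and ${hivevar:b} resolve to the same entry, which is the "prefix is optional" rule above.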

2. The hiveconf namespace

Hive configuration properties. Values given here override those in hive-site.xml (and hive-default.xml):

hive --hiveconf hive.cli.print.current.db=true --hiveconf hive.cli.print.header=true

hive --hiveconf hive.root.logger=INFO,console

Specify the warehouse directory at startup so that each user gets a separate directory ($USER is expanded by the shell):

hive --hiveconf hive.metastore.warehouse.dir=/hive/$USER

When referencing a variable in the hiveconf namespace, the hiveconf: prefix can be included or omitted:

set hive.cli.print.header;
set hive.cli.print.header=false;
set hiveconf:hive.cli.print.header;
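The override order can be pictured as layered lookups. This Python sketch uses collections.ChainMap, which returns the value from the first layer that has the key. The layer order it encodes (set in the session, then --hiveconf, then hive-site.xml, then hive-default.xml) is the precedence Hive documents for its configuration sources; session_config is a hypothetical helper and all property values are made up.

```python
from collections import ChainMap


def session_config(hive_default, hive_site, hiveconf_args, set_commands):
    """Layer the configuration sources; a lookup hits the first layer with the key.

    Highest precedence first: set in the session, --hiveconf,
    hive-site.xml, hive-default.xml.
    """
    return ChainMap(set_commands, hiveconf_args, hive_site, hive_default)


# Hypothetical values, for illustration only.
hive_default = {"hive.cli.print.header": "false", "hive.cli.print.current.db": "false"}
hive_site = {"hive.cli.print.current.db": "true"}
hiveconf_args = {"hive.cli.print.header": "true"}  # hive --hiveconf hive.cli.print.header=true
set_commands = {}

conf = session_config(hive_default, hive_site, hiveconf_args, set_commands)
print(conf["hive.cli.print.header"])      # true  (from --hiveconf)
set_commands["hive.cli.print.header"] = "false"  # set hive.cli.print.header=false;
print(conf["hive.cli.print.header"])      # false (set wins)
print(conf["hive.cli.print.current.db"])  # true  (from hive-site.xml)
```

ChainMap keeps references to the underlying dicts, so a later set is visible immediately without rebuilding the view.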

3. The system namespace

JVM system properties. Besides reading them, you can also assign them in the CLI with set system:name=value;.

The system: prefix is required when referencing them:

set system:user.name;

create table ${system:user.name} (a int);

4. The env namespace

Shell environment variables. They are read-only, and the env: prefix is required when referencing them:

set env:USER;
set env:HADOOP_HOME;

create table ${env:USER} (${env:USER} string);
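The "prefix is required" rule for system: and env: can be modeled the same way. In this Python sketch, lookup and substitute are hypothetical names, and the two dictionaries stand in for the JVM's system properties and the shell environment. A bare name is deliberately not resolved here; in Hive it would fall through to the hivevar namespace, which this sketch leaves out.

```python
import os
import re
from typing import Mapping, Optional

# Same ${name} shape as before (a simplified model, not Hive's code).
VAR_REF = re.compile(r"\$\{([^}$ ]+)\}")


def lookup(name: str, jvm_props: Mapping[str, str],
           environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Resolve system:/env: references; names without a prefix return None."""
    if name.startswith("system:"):
        return jvm_props.get(name[len("system:"):])
    if name.startswith("env:"):
        return environ.get(name[len("env:"):])
    return None


def substitute(sql: str, jvm_props: Mapping[str, str],
               environ: Mapping[str, str] = os.environ) -> str:
    def replace(match):
        value = lookup(match.group(1), jvm_props, environ)
        return match.group(0) if value is None else value
    return VAR_REF.sub(replace, sql)


if __name__ == "__main__":
    props = {"user.name": "alice"}  # stand-in for the JVM system properties
    env = {"USER": "alice"}         # stand-in for the shell environment
    print(substitute("create table ${system:user.name} (a int);", props, env))
    # create table alice (a int);
    print(substitute("create table ${USER} (a int);", props, env))
    # create table ${USER} (a int);   (no prefix, so not read from env)
```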

Appendix: Commonly used settings

Print log output to the console during the session:

hive --hiveconf hive.root.logger=DEBUG,console

Alternatively, change the hive.root.logger property in $HIVE_HOME/conf/hive-log4j.properties; setting it with the set command does not work.

Show the current database in the prompt:

set hive.cli.print.current.db=true;

Show column names in query results:

set hive.cli.print.header=true;

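Enable bucketing before inserting data into a bucketed table (this applies to Hive 1.x; Hive 2.0 removed hive.enforce.bucketing and always enforces bucketing):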

create table t1 (id int) clustered by (id) into 4 buckets;
set hive.enforce.bucketing=true;
insert into table t1 select * from t2;

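With this setting, Hive automatically sets the number of reducers to the table's bucket count when inserting into a bucketed table. Otherwise you have to set the number of reducers yourself, with set mapreduce.job.reduces=N (N being the number of buckets in the table definition) or the older mapred.reduce.tasks, and then add a cluster by clause on the bucketing column to the select statement.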
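Why the reducer count has to match the bucket count: each bucket is one file, and each reducer writes one file. The Python sketch below assigns rows to buckets with what we understand to be Hive's original (bucketing version 1) rule for an INT column, (hash & Integer.MAX_VALUE) % numBuckets, where the hash of an int is the value itself. Newer Hive versions can use a different hash, so treat this as an illustration rather than a specification; bucket_for_int and rows_per_bucket_file are hypothetical names.

```python
INT_MAX = 0x7FFFFFFF  # Java's Integer.MAX_VALUE


def bucket_for_int(value: int, num_buckets: int) -> int:
    """Bucket id for a 32-bit INT bucketing column (legacy scheme, assumed).

    Masking with INT_MAX clears the sign bit, like Java's
    (hash & Integer.MAX_VALUE), so negative ids still map to a valid bucket.
    """
    return (value & INT_MAX) % num_buckets


def rows_per_bucket_file(ids, num_buckets):
    """With one reducer per bucket, reducer i writes exactly the rows of bucket i."""
    files = [[] for _ in range(num_buckets)]
    for value in ids:
        files[bucket_for_int(value, num_buckets)].append(value)
    return files


if __name__ == "__main__":
    print(rows_per_bucket_file([0, 1, 2, 3, 4, 5, -1], 4))
    # [[0, 4], [1, 5], [2], [3, -1]]
```

If the job ran with a different number of reducers, some reducer would receive rows from several buckets (or none), and the one-file-per-bucket layout that bucketed reads rely on would break.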

Dynamic partitioning:

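-- enable dynamic partitioning
set hive.exec.dynamic.partition=true;
-- strict: at least one static partition column is required; nonstrict: no restriction
set hive.exec.dynamic.partition.mode=nonstrict;
-- each mapper or reducer may create at most 100 partitions
set hive.exec.max.dynamic.partitions.pernode=100;
-- maximum total number of dynamic partitions that may be created
set hive.exec.max.dynamic.partitions=1000;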

from t insert overwrite table p partition(country, dt) select ... country, dt;

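While this query runs, the number of partitions a single mapper writes is not bounded, so it can exceed hive.exec.max.dynamic.partitions.pernode. The fix is to distribute the rows by the partition columns, changing the SQL above to: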

from t insert overwrite table p partition(country, dt) select ... country, dt distribute by country, dt;
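A toy Python model shows why distribute by helps with the per-node limit. Without it, each mapper writes every partition that appears in its input split; with it, all rows of one partition go to a single reducer. The helper names are hypothetical, the input is made up, and the round-robin assignment of keys to reducers is a stand-in for the hash partitioning Hive actually uses.

```python
from collections import defaultdict


def max_partitions_map_only(rows, num_mappers):
    """Map-only plan: each mapper reads a contiguous split and opens a writer
    for every distinct (country, dt) it sees."""
    split = -(-len(rows) // num_mappers)  # ceiling division
    return max(len(set(rows[i:i + split])) for i in range(0, len(rows), split))


def max_partitions_distributed(rows, num_reducers):
    """DISTRIBUTE BY country, dt: all rows of one partition go to one reducer.

    Keys are dealt round-robin here as a stand-in for hashing.
    """
    per_reducer = defaultdict(set)
    for index, key in enumerate(sorted(set(rows))):
        per_reducer[index % num_reducers].add(key)
    return max(len(keys) for keys in per_reducer.values())


if __name__ == "__main__":
    countries, dates = ["cn", "de", "jp", "us"], ["d1", "d2", "d3"]
    # 3 copies of every (country, dt) pair, interleaved so every split sees them all.
    rows = [(c, d) for _ in range(3) for c in countries for d in dates]
    print(max_partitions_map_only(rows, 3))     # 12
    print(max_partitions_distributed(rows, 3))  # 4
```

With a hypothetical pernode limit of 10, the map-only plan would fail (12 partitions in one mapper) while the distributed plan stays within it (4 per reducer).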

Execution mode for Hive operations:

set hive.mapred.mode=strict;

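strict: refuses to run risky operations that could launch huge MapReduce jobs, for example Cartesian products, queries on partitioned tables without a partition filter, comparisons of bigint with string or with double, and ORDER BY without LIMIT.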

nonstrict: no restrictions.

Compress intermediate MapReduce data:

set hive.exec.compress.intermediate=true;

-- codec for intermediate data; the default is org.apache.hadoop.io.compress.DefaultCodec
set mapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;

Compress the final MapReduce output:

set hive.exec.compress.output=true;

-- codec for output data; GZip gives a better compression ratio, but gzip files cannot be split by MapReduce
set mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec;

-- if the output is a SequenceFile, use block-level compression
set mapreduce.output.fileoutputformat.compress.type=BLOCK;
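The GZip remark above is about splittability. An input split can begin at any byte offset, but a gzip stream can only be decoded from its beginning, so one gzip file ends up being read by one mapper. This Python sketch uses only the standard library; decodes_from is a hypothetical helper, and Hadoop's own split logic is not modeled.

```python
import gzip
import zlib


def decodes_from(data: bytes, offset: int) -> bool:
    """True if a gzip stream can be decoded when reading starts at `offset`.

    Starting anywhere but the header at offset 0 fails, because the
    compressed stream has no point where a decoder could pick it up.
    """
    try:
        gzip.decompress(data[offset:])
        return True
    except (OSError, EOFError, zlib.error):
        return False


if __name__ == "__main__":
    blob = gzip.compress(b"id,country,dt\n" * 1000)
    print(decodes_from(blob, 0))   # True: reading from the start works
    print(decodes_from(blob, 10))  # False: skipping the 10-byte header fails
```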

Enable archiving of partitions:

set hive.archive.enabled=true;
