1.建立视图,视图是个虚表
create view v1 as select a.id aid,a.name,b.id bid,b.orderno from customers a left outer join default.ttt b on a.id=b.cid;//建立视图字段不要重复,若是有重复须要b.id bid进行重复命名
2.查看视图
show tables;
3.子查询,将查询出来的结果再进行一次过滤
select aid from v1 ;
4.笛卡尔积:两个表之间不加任何成分作链接,express
map端链接:
$hive>set hive.auto.convert.join=true; //设置自动转换链接,默认机试true
//使用mapjoin链接按时实现map链接
$hive> select /+mapjoin(customers)/ a.,b. from customers aleft outer join orders b on a.id = b.cid;
调优
-----------------------------
1.使用explain查看查询计划
OK
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1apache
STAGE PLANS:
Stage: Stage-1
Map Reduce
Map Operator Tree:
TableScan
alias: customers
Statistics: Num rows: 1 Data size: 38 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
Statistics: Num rows: 1 Data size: 38 Basic stats: COMPLETE Column stats: COMPLETE
Group By Operator
aggregations: count()
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
sort order:
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
value expressions: _col0 (type: bigint)
Reduce Operator Tree:
Group By Operator
aggregations: count(VALUE._col0)
mode: mergepartial
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDeoop
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink优化
二、设置limit优化策略,避免所有查询 hive>set hive.limit.optimize.enable=true; 设置本地模式运行 $hive>set mapred.job.tracker=local; //设置为自动本地模式 $hive>set hive.exec.mode.local.auto=true; //并行执行 $hive>set hive.exec.parallel=true; //设置并行查询,只是执行不存在依赖关系的阶段 //严格模式:至少指定一个静态分区,不能都是动态的 $hive>set hive.mapred.mode =strict //1.严格模式必定要指定分区表,不然不让查询 //2.order by时必须使用limit子句 //不容许笛卡尔积查询 //设置MR的数量