Running TPC-H on Impala

1. Install git and download the tpc-h-impala scripts

[root@ip-172-31-34-31 ~]# yum install git

[root@ip-172-31-34-31 ~]# git clone https://github.com/kj-ki/tpc-h-impala

[root@ip-172-31-34-31 ~]# cd tpc-h-impala/

[root@ip-172-31-34-31 tpc-h-impala]# ls
benchmark.conf confs data README.md tpch_benchmark.sh tpch_hive tpch_impala tpch_prepare

2. Move the data generated by the TPC-H dbgen tool into the expected directory
[root@ip-172-31-34-31 data]# mv /root/tpch_2_17_0/data10g/*.tbl /root/tpc-h-impala/data
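
Before loading, it can help to confirm that dbgen produced all eight TPC-H table files. A minimal sketch, assuming the data directory used in this step; the function name check_tbl_files is made up for illustration and is not part of the tpc-h-impala repo:

```shell
# Hypothetical helper: report which of the eight TPC-H .tbl files are missing or empty.
# The function body runs in a subshell so its variables do not leak into the caller.
check_tbl_files() (
    dir="$1"
    missing=0
    for t in customer lineitem nation orders part partsupp region supplier; do
        if [ ! -s "$dir/$t.tbl" ]; then
            echo "missing or empty: $dir/$t.tbl"
            missing=$((missing + 1))
        fi
    done
    # Exit status is 0 only when nothing is missing.
    [ "$missing" -eq 0 ]
)

# Usage (path taken from this step):
# check_tbl_files /root/tpc-h-impala/data && echo "all tables present"
```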

3. Adjust the tpc-h-impala scripts

Because of permission issues, edit the tpch_prepare_data.sh script. Change the first line to the following:
sudo -u hdfs /usr/bin/hadoop fs -mkdir /tpch/
and add one more line:
sudo -u hdfs /usr/bin/hadoop fs -chown root /tpch
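
The likely reason both changes are needed is that the HDFS root directory is typically owned by the hdfs superuser: root cannot create /tpch itself, and once the directory exists root still needs to own it in order to write data there. The two commands can be sketched as one small function. The RUN variable is a hypothetical dry-run switch added for illustration, not something from the tpc-h-impala repo:

```shell
# Sketch only: create an HDFS directory as the hdfs superuser, then hand it to another user.
# Set RUN=echo to print the commands instead of executing them (hypothetical dry-run switch).
prepare_hdfs_dir() {
    hdfs_dir="$1"
    owner="$2"
    ${RUN:-} sudo -u hdfs /usr/bin/hadoop fs -mkdir "$hdfs_dir" &&
    ${RUN:-} sudo -u hdfs /usr/bin/hadoop fs -chown "$owner" "$hdfs_dir"
}

# Dry run with the values from this step (prints the two commands, executes nothing):
RUN=echo prepare_hdfs_dir /tpch root
```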

4. Run the tpch_prepare_data.sh script to copy the data from the local filesystem to HDFS

[root@ip-172-31-34-31 data]# ./tpch_prepare_data.sh

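5. Adjust the tpch_benchmark.sh script
During the run, tables are created in Hive. For these tables to be visible to Impala, invalidate metadata must be run, so add the following line before the statement that runs the Impala query: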

$IMPALA_CMD -q 'invalidate metadata' 2>&1

 

#!/usr/bin/env bash

# set up configurations
source benchmark.conf;

if [ -e "$LOG_FILE" ]; then
        timestamp=$(date "+%F-%R" --reference="$LOG_FILE")
        backupFile="$LOG_FILE.$timestamp"
        mv "$LOG_FILE" "$LOG_DIR/$backupFile"
fi

echo ""
echo "***********************************************"
echo "*          TPC-H benchmark on Impala          *"
echo "***********************************************"
echo "                                               "
echo "See $LOG_FILE for more details of query errors."
echo ""

trial=0
while [ $trial -lt $NUM_OF_TRIALS ]; do
        trial=$((trial + 1))
        echo "Executing Trial #$trial of $NUM_OF_TRIALS trial(s)..."

        for query in ${TPCH_QUERIES_ALL[@]}; do
                echo "Running query: $query" | tee -a $LOG_FILE

                echo "Running Hive prepare query: $query" >> $LOG_FILE
                $TIME_CMD $HIVE_CMD -f $BASE_DIR/tpch_prepare/${query}.hive 2>&1 | tee -a $LOG_FILE | grep '^Time:'
                returncode=${PIPESTATUS[0]}
                if [ $returncode -ne 0 ]; then
                        echo "ABOVE QUERY FAILED:$returncode"
                fi

                # If you want to use old beta, enable below.
                #$TIME_CMD $IMPALA_CMD -q 'refresh' 2>&1 | tee -a $LOG_FILE | grep '^Time:'
                #returncode=${PIPESTATUS[0]}
                #if [ $returncode -ne 0 ]; then
                #       echo "ABOVE QUERY FAILED:$returncode"
                #fi

                echo "Running Impala query: $query" >> $LOG_FILE
                # Added line: make the tables just created in Hive visible to Impala
                $IMPALA_CMD -q 'invalidate metadata' 2>&1
                $TIME_CMD $IMPALA_CMD --query_file=$BASE_DIR/tpch_impala/${query}.impala 2>&1 | tee -a $LOG_FILE | grep '^Time:'
                returncode=${PIPESTATUS[0]}
                if [ $returncode -ne 0 ]; then
                        echo "ABOVE QUERY FAILED:$returncode"
                fi

                #echo "Running Hive query: $query" >> $LOG_FILE
                #$TIME_CMD $HIVE_CMD -f $BASE_DIR/tpch_hive/${query}.hive 2>&1 | tee -a $LOG_FILE | grep '^Time:'
                #returncode=${PIPESTATUS[0]}
                #if [ $returncode -ne 0 ]; then
                #       echo "ABOVE QUERY FAILED:$returncode"
                #fi
        done

done # TRIAL
echo "***********************************************"
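
The script reads ${PIPESTATUS[0]} instead of $? because $? after a pipeline reports the last command (grep here), not the Hive or Impala client at the front. A minimal bash sketch of the same pattern follows; run_and_log is a hypothetical helper name, not something defined in the repo, and it relies on the bash-only PIPESTATUS array:

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the pipeline used in tpch_benchmark.sh (bash only).
run_and_log() {
    # Declare rc up front: a later `local` would itself overwrite PIPESTATUS.
    local log_file="$1" rc
    shift
    # Append everything to the log, but show only the timing line on the terminal.
    "$@" 2>&1 | tee -a "$log_file" | grep '^Time:'
    rc=${PIPESTATUS[0]}   # exit status of "$@", not of tee or grep
    if [ "$rc" -ne 0 ]; then
        echo "ABOVE QUERY FAILED:$rc"
    fi
    return "$rc"
}

# Usage (variables as defined in benchmark.conf):
# run_and_log "$LOG_FILE" $TIME_CMD $IMPALA_CMD -q 'invalidate metadata'
```

Routing the added invalidate metadata call through a helper like this would also record its output in the log, which the bare line in the script does not do.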

 

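6. Edit the configuration file benchmark.conf so that it points to the correct Impala master:

This change is needed because no impala-daemon is configured on the cluster node where impala-shell runs.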
# impala
IMPALA_CMD="/usr/bin/impala-shell --impalad=172.31.25.244:21000"
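
To confirm the address before a long run, a trivial query can be sent through the same command. A sketch, assuming only the --impalad and -q options of impala-shell that already appear in this post; the helper name check_impalad and the split into host and port variables are illustrative, not part of benchmark.conf:

```shell
# Values from benchmark.conf above; the host/port split is only for illustration.
IMPALAD_HOST="172.31.25.244"
IMPALAD_PORT="21000"
IMPALA_CMD="/usr/bin/impala-shell --impalad=${IMPALAD_HOST}:${IMPALAD_PORT}"

# Hypothetical helper: succeeds only if a trivial query runs against the configured impalad.
check_impalad() {
    # $IMPALA_CMD is intentionally unquoted so it splits into command and arguments.
    $IMPALA_CMD -q 'select 1' >/dev/null 2>&1
}

# Usage:
# check_impalad || echo "cannot reach impalad at ${IMPALAD_HOST}:${IMPALAD_PORT}"
```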

 

7. MR, Hive, Impala
Note: MapReduce (MR) must be started before running Impala and Hive.

8. Run the benchmark script

[root@ip-172-31-34-31 tpc-h-impala]# pwd
/root/tpc-h-impala
[root@ip-172-31-34-31 tpc-h-impala]# ./tpch_benchmark.sh
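
The pwd check matters because tpch_benchmark.sh loads its settings with a relative path (source benchmark.conf), so it has to be started from the repository directory. A sketch of a wrapper that enforces this; run_benchmark is a hypothetical name used only for illustration:

```shell
# Hypothetical wrapper: run the benchmark from its own directory, whatever the caller's cwd is.
run_benchmark() {
    repo_dir="$1"
    if [ ! -f "$repo_dir/benchmark.conf" ] || [ ! -f "$repo_dir/tpch_benchmark.sh" ]; then
        echo "not a tpc-h-impala checkout: $repo_dir" >&2
        return 1
    fi
    # The subshell keeps the caller's working directory unchanged.
    ( cd "$repo_dir" && ./tpch_benchmark.sh )
}

# Usage:
# run_benchmark /root/tpc-h-impala
```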
