The component layout is as follows:
172.16.57.75 bd-ops-test-75 mysql-server
172.16.57.77 bd-ops-test-77 HiveServer2 HiveMetaStore
Install Hive on node 77:
# yum install hive hive-metastore hive-server2 hive-jdbc hive-hbase -y
On the other nodes you can install the client:
# yum install hive hive-server2 hive-jdbc hive-hbase -y
Install MySQL via yum:
# yum install mysql mysql-devel mysql-server mysql-libs -y
Configure the database to start on boot and start it:
# chkconfig mysqld on
# service mysqld start
Install the JDBC driver:
# yum install mysql-connector-java
# ln -s /usr/share/java/mysql-connector-java.jar /usr/lib/hive/lib/mysql-connector-java.jar
Set the initial MySQL root password to bigdata:
# mysqladmin -uroot password 'bigdata'
Log in to the database and run the following:
CREATE DATABASE metastore;
USE metastore;
SOURCE /usr/lib/hive/scripts/metastore/upgrade/mysql/hive-schema-1.1.0.mysql.sql;
CREATE USER 'hive'@'localhost' IDENTIFIED BY 'hive';
GRANT ALL PRIVILEGES ON metastore.* TO 'hive'@'localhost';
GRANT ALL PRIVILEGES ON metastore.* TO 'hive'@'%';
FLUSH PRIVILEGES;
Note: the user created here is hive with password hive; change these to suit your needs.
Modify the following in the hive-site.xml file:
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://172.16.57.75:3306/metastore?useUnicode=true&amp;characterEncoding=UTF-8</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
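Before starting the metastore, it is worth checking that the hive user can actually reach MySQL on 172.16.57.75 with these credentials. A minimal check from node 77, assuming the mysql client is installed there:

# mysql -h 172.16.57.75 -uhive -phive metastore -e 'SHOW TABLES;'

If the grants above are in place, this lists the metastore schema tables created by the SOURCE script.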
Modify /etc/hadoop/conf/hadoop-env.sh and add the HADOOP_MAPRED_HOME environment variable; without it, you will hit an UNKNOWN RPC TYPE exception when running MapReduce on YARN:
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
Create the Hive data warehouse directory /user/hive/warehouse in HDFS. It is recommended to set its permissions to 1777 so that all users can create and access tables, but cannot delete tables that do not belong to them. (User home directories live under /user, e.g. /user/root for the root user.) /tmp must be world-writable. Create the directories and set permissions:
# sudo -u hdfs hadoop fs -mkdir /user/hive
# sudo -u hdfs hadoop fs -chown hive /user/hive
# sudo -u hdfs hadoop fs -mkdir /user/hive/warehouse
# sudo -u hdfs hadoop fs -chmod 1777 /user/hive/warehouse
# sudo -u hdfs hadoop fs -chown hive /user/hive/warehouse
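To confirm that the ownership and the sticky bit took effect, list the directory; the warehouse entry should show owner hive and mode drwxrwxrwt, which corresponds to 1777:

# sudo -u hdfs hadoop fs -ls /user/hive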
Modify hive-env.sh to set the JDK environment variable:
# vim /etc/hive/conf/hive-env.sh
export JAVA_HOME=/opt/programs/jdk1.7.0_67
Start hive-metastore and hive-server2:
# service hive-metastore start
# service hive-server2 start
Test that Hive works:
$ hive -e 'create table t(id int);'
$ hive -e 'select * from t limit 2;'
$ hive -e 'select id from t;'
Access Beeline:
$ beeline
beeline> !connect jdbc:hive2://localhost:10000;
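Beeline can also be used non-interactively. A minimal sketch, assuming HiveServer2 is listening on the default port 10000 with no authentication configured:

$ beeline -u jdbc:hive2://localhost:10000 -e 'show tables;'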
To integrate Hive with HBase, first install hive-hbase:
# yum install hive-hbase -y
If you are using CDH 4, run the following commands in the Hive shell to add the jars:
$ ADD JAR /usr/lib/hive/lib/zookeeper.jar;
$ ADD JAR /usr/lib/hive/lib/hbase.jar;
$ ADD JAR /usr/lib/hive/lib/hive-hbase-handler-<hive_version>.jar;
# The guava jar version depends on what is actually installed.
$ ADD JAR /usr/lib/hive/lib/guava-11.0.2.jar;
If you are using CDH 5, run the following commands in the Hive shell to add the jars:
ADD JAR /usr/lib/hive/lib/zookeeper.jar;
ADD JAR /usr/lib/hive/lib/hive-hbase-handler.jar;
ADD JAR /usr/lib/hbase/lib/guava-12.0.1.jar;
ADD JAR /usr/lib/hbase/hbase-client.jar;
ADD JAR /usr/lib/hbase/hbase-common.jar;
ADD JAR /usr/lib/hbase/hbase-hadoop-compat.jar;
ADD JAR /usr/lib/hbase/hbase-hadoop2-compat.jar;
ADD JAR /usr/lib/hbase/hbase-protocol.jar;
ADD JAR /usr/lib/hbase/hbase-server.jar;
Instead of ADD JAR, you can configure these jars with the hive.aux.jars.path property in hive-site.xml, or set export HIVE_AUX_JARS_PATH= in hive-env.sh.
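As an illustration, a hive-env.sh sketch for the CDH 5 case; HIVE_AUX_JARS_PATH takes a comma-separated jar list here, and the exact paths and versions should match your installation:

export HIVE_AUX_JARS_PATH=/usr/lib/hive/lib/hive-hbase-handler.jar,/usr/lib/hbase/hbase-client.jar,/usr/lib/hbase/hbase-common.jar,/usr/lib/hbase/hbase-server.jar,/usr/lib/hbase/lib/guava-12.0.1.jar

Once the jars are configured, a quick way to verify the integration is to create an HBase-backed table; the table and column names below are only examples:

$ hive -e "CREATE TABLE hbase_test(key int, value string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:val');"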
Like Hive, Impala can interact directly with HDFS and HBase. The difference is that Hive and other MapReduce-based frameworks are suited to long-running batch jobs, such as batch extract-transform-load (ETL) jobs, whereas Impala is mainly used for real-time queries.
The component layout is as follows:
172.16.57.74 bd-ops-test-74 impala-state-store impala-catalog impala-server
172.16.57.75 bd-ops-test-75 impala-server
172.16.57.76 bd-ops-test-76 impala-server
172.16.57.77 bd-ops-test-77 impala-server
Install on node 74:
yum install impala-state-store impala-catalog impala-server -y
Install on nodes 75, 76, and 77:
yum install impala-server -y
Check the installation paths:
# find / -name impala
/var/run/impala
/var/lib/alternatives/impala
/var/log/impala
/usr/lib/impala
/etc/alternatives/impala
/etc/default/impala
/etc/impala
The impalad configuration directory is specified by the IMPALA_CONF_DIR environment variable and defaults to /usr/lib/impala/conf. Impala's default settings are in /etc/default/impala; modify IMPALA_CATALOG_SERVICE_HOST and IMPALA_STATE_STORE_HOST in that file:
IMPALA_CATALOG_SERVICE_HOST=bd-ops-test-74
IMPALA_STATE_STORE_HOST=bd-ops-test-74
IMPALA_STATE_STORE_PORT=24000
IMPALA_BACKEND_PORT=22000
IMPALA_LOG_DIR=/var/log/impala

IMPALA_CATALOG_ARGS=" -log_dir=${IMPALA_LOG_DIR} -sentry_config=/etc/impala/conf/sentry-site.xml"
IMPALA_STATE_STORE_ARGS=" -log_dir=${IMPALA_LOG_DIR} -state_store_port=${IMPALA_STATE_STORE_PORT}"
IMPALA_SERVER_ARGS=" \
    -log_dir=${IMPALA_LOG_DIR} \
    -use_local_tz_for_unix_timestamp_conversions=true \
    -convert_legacy_hive_parquet_utc_timestamps=true \
    -catalog_service_host=${IMPALA_CATALOG_SERVICE_HOST} \
    -state_store_port=${IMPALA_STATE_STORE_PORT} \
    -use_statestore \
    -state_store_host=${IMPALA_STATE_STORE_HOST} \
    -be_port=${IMPALA_BACKEND_PORT} \
    -server_name=server1 \
    -sentry_config=/etc/impala/conf/sentry-site.xml"

ENABLE_CORE_DUMPS=false

# LIBHDFS_OPTS=-Djava.library.path=/usr/lib/impala/lib
# MYSQL_CONNECTOR_JAR=/usr/share/java/mysql-connector-java.jar
# IMPALA_BIN=/usr/lib/impala/sbin
# IMPALA_HOME=/usr/lib/impala
# HIVE_HOME=/usr/lib/hive
# HBASE_HOME=/usr/lib/hbase
# IMPALA_CONF_DIR=/etc/impala/conf
# HADOOP_CONF_DIR=/etc/impala/conf
# HIVE_CONF_DIR=/etc/impala/conf
# HBASE_CONF_DIR=/etc/impala/conf
To set the maximum amount of memory Impala may use, append -mem_limit=70% to the IMPALA_SERVER_ARGS value above.
If you need to set the maximum number of requests per queue, append -default_pool_max_requests=-1 to IMPALA_SERVER_ARGS as well; this parameter limits the number of requests per queue, and -1 means no limit. A sketch of the result is shown below.
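With both options appended, the tail of the IMPALA_SERVER_ARGS value in /etc/default/impala would look roughly like this (earlier flags elided and unchanged):

IMPALA_SERVER_ARGS=" \
    ... \
    -server_name=server1 \
    -sentry_config=/etc/impala/conf/sentry-site.xml \
    -mem_limit=70% \
    -default_pool_max_requests=-1"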
On node 74, create symlinks for hive-site.xml, core-site.xml, and hdfs-site.xml in the /etc/impala/conf directory, then add the following to hdfs-site.xml:
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <name>dfs.domain.socket.path</name>
  <value>/var/run/hadoop-hdfs/dn._PORT</value>
</property>
<property>
  <name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
  <value>true</value>
</property>
Sync the files above to the other nodes.
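How you sync them is up to you; a minimal scp sketch from node 74, using the hostnames from the layout above:

# for h in bd-ops-test-75 bd-ops-test-76 bd-ops-test-77; do scp /etc/impala/conf/{hive-site.xml,core-site.xml,hdfs-site.xml} $h:/etc/impala/conf/; done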
Create /var/run/hadoop-hdfs on every node:
# mkdir -p /var/run/hadoop-hdfs
The Impala installation creates a user and group named impala; do not delete them.
If you want Impala to work with YARN and Llama, add the impala user to the hdfs group.
When Impala executes DROP TABLE it needs to move files into the HDFS trash, so you need to create an HDFS directory /user/impala and make it writable by the impala user. Likewise, Impala needs to read data under the Hive warehouse, so the impala user must be added to the hive group (see the sketch below).
Impala cannot run as the root user, because root is not allowed to perform direct reads.
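A sketch of the corresponding group changes; this is only needed if the impala user is not already in these groups (compare the groups check further below):

# usermod -a -G hdfs,hive impala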
Create the impala user's home directory and set its ownership:
sudo -u hdfs hadoop fs -mkdir /user/impala
sudo -u hdfs hadoop fs -chown impala /user/impala
Check which groups the impala user belongs to:
# groups impala
impala : impala hadoop hdfs hive
As shown above, the impala user belongs to the impala, hadoop, hdfs, and hive groups.
Start the services on node 74:
# service impala-state-store start
# service impala-catalog start
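The impala-server daemons installed on nodes 75, 76, and 77 would presumably be started the same way on each of those nodes:

# service impala-server start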
Use impala-shell to start the Impala shell, connect to node 74, and refresh the metadata:
# impala-shell
Starting Impala Shell without Kerberos authentication
Connected to bd-dev-hadoop-70:21000
Server version: impalad version 2.3.0-cdh5.5.1 RELEASE (build 73bf5bc5afbb47aa7eab06cfbf6023ba8cb74f3c)
***********************************************************************************
Welcome to the Impala shell. Copyright (c) 2015 Cloudera, Inc. All rights reserved.
(Impala Shell v2.3.0-cdh5.5.1 (73bf5bc) built on Wed Dec 2 10:39:33 PST 2015)

After running a query, type SUMMARY to see a summary of where time was spent.
***********************************************************************************
[bd-dev-hadoop-70:21000] > invalidate metadata;
After creating tables in Hive, run the INVALIDATE METADATA statement the first time you start impala-shell so that Impala recognizes the newly created tables (in Impala 1.2 and later, you only need to run INVALIDATE METADATA on one node, not on every Impala node).
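INVALIDATE METADATA can also be issued without entering the shell, using the -i and -q options listed in the help output below:

# impala-shell -i bd-ops-test-74:21000 -q 'invalidate metadata;'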
You can also pass other options; to see which options are available:
# impala-shell -h
Usage: impala_shell.py [options]

Options:
  -h, --help            show this help message and exit
  -i IMPALAD, --impalad=IMPALAD
                        <host:port> of impalad to connect to [default: bd-dev-hadoop-70:21000]
  -q QUERY, --query=QUERY
                        Execute a query without the shell [default: none]
  -f QUERY_FILE, --query_file=QUERY_FILE
                        Execute the queries in the query file, delimited by ; [default: none]
  -k, --kerberos        Connect to a kerberized impalad [default: False]
  -o OUTPUT_FILE, --output_file=OUTPUT_FILE
                        If set, query results are written to the given file. Results from multiple
                        semicolon-terminated queries will be appended to the same file [default: none]
  -B, --delimited       Output rows in delimited mode [default: False]
  --print_header        Print column names in delimited mode when pretty-printed. [default: False]
  --output_delimiter=OUTPUT_DELIMITER
                        Field delimiter to use for output in delimited mode [default: \t]
  -s KERBEROS_SERVICE_NAME, --kerberos_service_name=KERBEROS_SERVICE_NAME
                        Service name of a kerberized impalad [default: impala]
  -V, --verbose         Verbose output [default: True]
  -p, --show_profiles   Always display query profiles after execution [default: False]
  --quiet               Disable verbose output [default: False]
  -v, --version         Print version information [default: False]
  -c, --ignore_query_failure
                        Continue on query failure [default: False]
  -r, --refresh_after_connect
                        Refresh Impala catalog after connecting [default: False]
  -d DEFAULT_DB, --database=DEFAULT_DB
                        Issues a use database command on startup [default: none]
  -l, --ldap            Use LDAP to authenticate with Impala. Impala must be configured to allow
                        LDAP authentication. [default: False]
  -u USER, --user=USER  User to authenticate with. [default: root]
  --ssl                 Connect to Impala via SSL-secured connection [default: False]
  --ca_cert=CA_CERT     Full path to certificate file used to authenticate Impala's SSL certificate.
                        May either be a copy of Impala's certificate (for self-signed certs) or the
                        certificate of a trusted third-party CA. If not set, but SSL is enabled, the
                        shell will NOT verify Impala's server certificate [default: none]
  --config_file=CONFIG_FILE
                        Specify the configuration file to load options. File must have case-sensitive
                        '[impala]' header. Specifying this option within a config file will have no
                        effect. Only specify this as a option in the commandline. [default: /root/.impalarc]
  --live_summary        Print a query summary every 1s while the query is running. [default: False]
  --live_progress       Print a query progress every 1s while the query is running. [default: False]
  --auth_creds_ok_in_clear
                        If set, LDAP authentication may be used with an insecure connection to Impala.
                        WARNING: Authentication credentials will therefore be sent unencrypted, and
                        may be vulnerable to attack. [default: none]
Export data with Impala:
impala-shell -i '172.16.57.74:21000' -r -q "select * from test" -B --output_delimiter="\t" -o result.txt