For installing Kerberos and configuring Kerberos authentication for HDFS, see HDFS配置kerberos认证.
For installing Kerberos and configuring Kerberos authentication for YARN, see YARN配置kerberos认证.
For installing Kerberos and configuring Kerberos authentication for Hive, see Hive配置kerberos认证.
Please finish configuring Kerberos authentication for HDFS, YARN, and Hive first, then come back to configure Impala's Kerberos integration!
Following 使用yum安装CDH Hadoop集群, install a Hadoop cluster of three nodes. Each node's IP, hostname, and deployed components are assigned as follows:
192.168.56.121 cdh1 NameNode、Hive、ResourceManager、HBase、impala-state-store、impala-catalog、Kerberos Server
192.168.56.122 cdh2 DataNode、SSNameNode、NodeManager、HBase、impala-server
192.168.56.123 cdh3 DataNode、HBase、NodeManager、impala-server
Note: use lowercase hostnames; otherwise you will run into errors when integrating Kerberos.
Run the following commands on every node:
$ yum install python-devel openssl-devel python-pip cyrus-sasl cyrus-sasl-gssapi cyrus-sasl-devel -y
$ pip-python install ssl
On the cdh1 node, i.e. the KDC server node, run:
$ cd /var/kerberos/krb5kdc/
kadmin.local -q "addprinc -randkey impala/cdh1@JAVACHEN.COM"
kadmin.local -q "addprinc -randkey impala/cdh2@JAVACHEN.COM"
kadmin.local -q "addprinc -randkey impala/cdh3@JAVACHEN.COM"
kadmin.local -q "xst -k impala-unmerge.keytab impala/cdh1@JAVACHEN.COM"
kadmin.local -q "xst -k impala-unmerge.keytab impala/cdh2@JAVACHEN.COM"
kadmin.local -q "xst -k impala-unmerge.keytab impala/cdh3@JAVACHEN.COM"
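The six kadmin.local invocations above follow one pattern per host, so they can be generated with a loop. This is a dry-run sketch that only prints the commands (realm and host list as in this cluster); review the output before running it on the KDC:

```shell
# Dry run: print the kadmin.local commands for each Impala host.
REALM=JAVACHEN.COM
for host in cdh1 cdh2 cdh3; do
  echo "kadmin.local -q \"addprinc -randkey impala/${host}@${REALM}\""
  echo "kadmin.local -q \"xst -k impala-unmerge.keytab impala/${host}@${REALM}\""
done
```

To execute for real, pipe the output to `sh` on the KDC host.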
Additionally, if you use HAProxy for load balancing (see the official document Using Impala through a Proxy for High Availability), you also need to generate proxy.keytab:
$ cd /var/kerberos/krb5kdc/
# proxy is the host where HAProxy is installed
kadmin.local -q "addprinc -randkey impala/proxy@JAVACHEN.COM"
kadmin.local -q "xst -k proxy.keytab impala/proxy@JAVACHEN.COM"
Merge proxy.keytab and impala-unmerge.keytab into impala.keytab:
$ ktutil
ktutil: rkt proxy.keytab
ktutil: rkt impala-unmerge.keytab
ktutil: wkt impala.keytab
ktutil: quit
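Since ktutil reads commands from standard input, the interactive session above can also be scripted. The sketch below only prints the command stream; on the KDC host you would pipe it into ktutil:

```shell
# Print the ktutil commands that merge the two keytabs into impala.keytab.
# Run for real with:  <this script> | ktutil
printf 'rkt proxy.keytab\nrkt impala-unmerge.keytab\nwkt impala.keytab\nquit\n'
```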
Copy the impala.keytab file (and proxy.keytab, if you generated it) to the /etc/impala/conf directory on each node:
$ scp impala.keytab cdh1:/etc/impala/conf
$ scp impala.keytab cdh2:/etc/impala/conf
$ scp impala.keytab cdh3:/etc/impala/conf
Then set permissions by running the following on cdh1, cdh2, and cdh3 respectively:
$ ssh cdh1 "cd /etc/impala/conf/;chown impala:hadoop *.keytab ;chmod 400 *.keytab"
$ ssh cdh2 "cd /etc/impala/conf/;chown impala:hadoop *.keytab ;chmod 400 *.keytab"
$ ssh cdh3 "cd /etc/impala/conf/;chown impala:hadoop *.keytab ;chmod 400 *.keytab"
Because a keytab is effectively a permanent credential that requires no password (it becomes invalid if the principal's password is changed in the KDC), any user with read access to the file can impersonate the principals it contains when accessing Hadoop. The keytab file must therefore be readable only by its owner (mode 0400).
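The owner-only requirement can be verified mechanically. A minimal sketch using a throwaway file in place of the real keytab (stat -c '%a' is GNU coreutils syntax):

```shell
# Create a stand-in keytab file, restrict it, and confirm the mode is exactly 400.
tmp=$(mktemp)
chmod 400 "$tmp"
mode=$(stat -c '%a' "$tmp")
echo "$mode"    # prints 400
rm -f "$tmp"
```

On the real cluster, check /etc/impala/conf/impala.keytab instead of "$tmp".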
Edit /etc/default/impala on the cdh1 node and add the following flags to IMPALA_CATALOG_ARGS, IMPALA_SERVER_ARGS, and IMPALA_STATE_STORE_ARGS:
-kerberos_reinit_interval=60
-principal=impala/_HOST@JAVACHEN.COM
-keytab_file=/etc/impala/conf/impala.keytab
If HAProxy is used (for HAProxy configuration see Hive使用HAProxy配置HA), the IMPALA_SERVER_ARGS flags need to change to the following (proxy is the HAProxy host's name; here HAProxy is installed on the cdh1 node):
-kerberos_reinit_interval=60
-be_principal=impala/_HOST@JAVACHEN.COM
-principal=impala/proxy@JAVACHEN.COM
-keytab_file=/etc/impala/conf/impala.keytab
Add to IMPALA_CATALOG_ARGS:
-state_store_host=${IMPALA_STATE_STORE_HOST} \
Sync the modified file to the other nodes. The final /etc/default/impala is shown below; to avoid any uppercase in the hostname, a hostname variable is substituted for _HOST:
IMPALA_CATALOG_SERVICE_HOST=cdh1
IMPALA_STATE_STORE_HOST=cdh1
IMPALA_STATE_STORE_PORT=24000
IMPALA_BACKEND_PORT=22000
IMPALA_LOG_DIR=/var/log/impala
IMPALA_MEM_DEF=$(free -m |awk 'NR==2{print $2-5120}')
hostname=`hostname -f |tr "[:upper:]" "[:lower:]"`
IMPALA_CATALOG_ARGS=" -log_dir=${IMPALA_LOG_DIR} -state_store_host=${IMPALA_STATE_STORE_HOST} \
-kerberos_reinit_interval=60\
-principal=impala/${hostname}@JAVACHEN.COM \
-keytab_file=/etc/impala/conf/impala.keytab
"
IMPALA_STATE_STORE_ARGS=" -log_dir=${IMPALA_LOG_DIR} -state_store_port=${IMPALA_STATE_STORE_PORT}\
-statestore_subscriber_timeout_seconds=15 \
-kerberos_reinit_interval=60 \
-principal=impala/${hostname}@JAVACHEN.COM \
-keytab_file=/etc/impala/conf/impala.keytab
"
IMPALA_SERVER_ARGS=" \
-log_dir=${IMPALA_LOG_DIR} \
-catalog_service_host=${IMPALA_CATALOG_SERVICE_HOST} \
-state_store_port=${IMPALA_STATE_STORE_PORT} \
-use_statestore \
-state_store_host=${IMPALA_STATE_STORE_HOST} \
-be_port=${IMPALA_BACKEND_PORT} \
-kerberos_reinit_interval=60 \
-be_principal=impala/${hostname}@JAVACHEN.COM \
-principal=impala/cdh1@JAVACHEN.COM \
-keytab_file=/etc/impala/conf/impala.keytab \
-mem_limit=${IMPALA_MEM_DEF}m
"
ENABLE_CORE_DUMPS=false
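The hostname line in the file above is what keeps the principal lowercase; it can be checked in isolation (the mixed-case name here is just an example):

```shell
# tr maps any uppercase letters in the FQDN to lowercase, so the
# resulting principal matches the lowercase-hostname requirement.
echo "CDH1.JavaChen.COM" | tr "[:upper:]" "[:lower:]"
# prints: cdh1.javachen.com
```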
Sync the modified file to the other nodes, cdh2 and cdh3:
$ scp /etc/default/impala cdh2:/etc/default/impala
$ scp /etc/default/impala cdh3:/etc/default/impala
Update the files under the Impala configuration directory and sync them to the other nodes:
cp /etc/hadoop/conf/core-site.xml /etc/impala/conf/
cp /etc/hadoop/conf/hdfs-site.xml /etc/impala/conf/
cp /etc/hive/conf/hive-site.xml /etc/impala/conf/
scp -r /etc/impala/conf cdh2:/etc/impala
scp -r /etc/impala/conf cdh3:/etc/impala
impala-state-store is started as the impala user, so on cdh1 first obtain a ticket for the impala user, then start the service:
$ kinit -k -t /etc/impala/conf/impala.keytab impala/cdh1@JAVACHEN.COM
$ service impala-state-store start
Then check the log to confirm it started successfully.
$ tailf /var/log/impala/statestored.INFO
impala-catalog is started as the impala user, so on cdh1 first obtain a ticket for the impala user, then start the service:
$ kinit -k -t /etc/impala/conf/impala.keytab impala/cdh1@JAVACHEN.COM
$ service impala-catalog start
Then check the log to confirm it started successfully.
$ tailf /var/log/impala/catalogd.INFO
impala-server is started as the impala user, so on cdh1 first obtain a ticket for the impala user, then start the service:
$ kinit -k -t /etc/impala/conf/impala.keytab impala/cdh1@JAVACHEN.COM
$ service impala-server start
Then check the log to confirm it started successfully.
$ tailf /var/log/impala/impalad.INFO
With Kerberos enabled, impala-shell must be run with the -k flag:
$ impala-shell -k
Starting Impala Shell using Kerberos authentication
Using service name 'impala'
Connected to cdh1:21000
Server version: impalad version 1.3.1-cdh4 RELEASE (build 907481bf45b248a7bb3bb077d54831a71f484e5f)
Welcome to the Impala shell. Press TAB twice to see a list of available commands.
Copyright (c) 2012 Cloudera, Inc. All rights reserved.
(Shell build version: Impala Shell v1.3.1-cdh4 (907481b) built on Wed Apr 30 14:23:48 PDT 2014)
[cdh1:21000] >
[cdh1:21000] > show tables;
Query: show tables
+------+
| name |
+------+
| a |
| b |
| c |
| d |
+------+
Returned 4 row(s) in 0.08s
If the following exception occurs:
[cdh1:21000] > select * from test limit 10;
Query: select * from test limit 10
ERROR: AnalysisException: Failed to load metadata for table: default.test
CAUSED BY: TableLoadingException: Failed to load metadata for table: test
CAUSED BY: TTransportException: java.net.SocketTimeoutException: Read timed out
CAUSED BY: SocketTimeoutException: Read timed out
then add the following parameter to hive-site.xml:
<property>
<name>hive.metastore.client.socket.timeout</name>
<value>3600</value>
</property>
This part documents the process of configuring Sentry integration for Impala and Hive on a CDH 5.2 Hadoop cluster, including installing and configuring Sentry and testing the integration with Impala and Hive.
Using Sentry to manage cluster permissions requires Kerberos to be configured on the cluster first.
For configuring Kerberos and LDAP on a Hadoop cluster, see the following posts on this blog:
Sentry will be installed on the three-node Hadoop cluster; each node's IP, hostname, and deployed components are assigned as follows:
192.168.56.121 cdh1 NameNode、Hive、ResourceManager、HBase、impala-state-store、impala-catalog、Kerberos Server、sentry-store
192.168.56.122 cdh2 DataNode、SSNameNode、NodeManager、HBase、impala-server
192.168.56.123 cdh3 DataNode、HBase、NodeManager、impala-server
Sentry can be used in two modes: file-based storage (SimpleFileProviderBackend) and database-backed storage (SimpleDbProviderBackend). With file-based storage only the sentry package is required; otherwise sentry-store must be installed as well.
Install the sentry-store service on the cdh1 node:
yum install sentry sentry-store -y
Edit Sentry's configuration file /etc/sentry/conf/sentry-store-site.xml; the configuration below follows the example configuration in the Sentry source code:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
<property>
<name>sentry.service.admin.group</name>
<value>impala,hive,hue</value>
</property>
<property>
<name>sentry.service.allow.connect</name>
<value>impala,hive,hue</value>
</property>
<property>
<name>sentry.verify.schema.version</name>
<value>true</value>
</property>
<property>
<name>sentry.service.server.rpc-address</name>
<value>cdh1</value>
</property>
<property>
<name>sentry.service.server.rpc-port</name>
<value>8038</value>
</property>
<property>
<name>sentry.store.jdbc.url</name>
<value>jdbc:postgresql://cdh1/sentry</value>
</property>
<property>
<name>sentry.store.jdbc.driver</name>
<value>org.postgresql.Driver</value>
</property>
<property>
<name>sentry.store.jdbc.user</name>
<value>sentry</value>
</property>
<property>
<name>sentry.store.jdbc.password</name>
<value>redhat</value>
</property>
<property>
<name>sentry.hive.server</name>
<value>server1</value>
</property>
<property>
<name>sentry.store.group.mapping</name>
<value>org.apache.sentry.provider.common.HadoopGroupMappingService</value>
</property>
</configuration>
Create the database (see Hadoop自动化安装shell脚本 for reference):
yum install postgresql-server postgresql-jdbc -y
ln -s /usr/share/java/postgresql-jdbc.jar /usr/lib/hive/lib/postgresql-jdbc.jar
ln -s /usr/share/java/postgresql-jdbc.jar /usr/lib/sentry/lib/postgresql-jdbc.jar
su -c "cd ; /usr/bin/pg_ctl start -w -m fast -D /var/lib/pgsql/data" postgres
su -c "cd ; /usr/bin/psql --command \"create user sentry with password 'redhat'; \" " postgres
su -c "cd ; /usr/bin/psql --command \"CREATE DATABASE sentry owner=sentry;\" " postgres
su -c "cd ; /usr/bin/psql --command \"GRANT ALL privileges ON DATABASE sentry TO sentry;\" " postgres
su -c "cd ; /usr/bin/psql -U sentry -d sentry -f /usr/lib/sentry/scripts/sentrystore/upgrade/sentry-postgres-1.4.0-cdh5.sql" postgres
su -c "cd ; /usr/bin/pg_ctl restart -w -m fast -D /var/lib/pgsql/data" postgres
/var/lib/pgsql/data/pg_hba.conf contains the following:
# TYPE  DATABASE    USER        CIDR-ADDRESS          METHOD
# "local" is for Unix domain socket connections only
local   all         all                               md5
# IPv4 local connections:
#host   all         all         0.0.0.0/0             trust
host    all         all         127.0.0.1/32          md5
# IPv6 local connections:
#host   all         all         ::1/128               md5
If Kerberos is enabled on the cluster, generate a principal for the Sentry service on this node and export it to a keytab:
$ cd /etc/sentry/conf
kadmin.local -q "addprinc -randkey sentry/cdh1@JAVACHEN.COM"
kadmin.local -q "xst -k sentry.keytab sentry/cdh1@JAVACHEN.COM"
chown sentry:hadoop sentry.keytab ; chmod 400 *.keytab
Then add the following to /etc/sentry/conf/sentry-store-site.xml:
<property>
<name>sentry.service.security.mode</name>
<value>kerberos</value>
</property>
<property>
<name>sentry.service.server.principal</name>
<value>sentry/cdh1@JAVACHEN.COM</value>
</property>
<property>
<name>sentry.service.server.keytab</name>
<value>/etc/sentry/conf/sentry.keytab</value>
</property>
Prepare the test data, following Securing Impala for analysts:
$ cat /tmp/events.csv
10.1.2.3,US,android,createNote
10.200.88.99,FR,windows,updateNote
10.1.2.3,US,android,updateNote
10.200.88.77,FR,ios,createNote
10.1.4.5,US,windows,updateTag
$ hive -S
hive> create database sensitive;
hive> create table sensitive.events (
ip STRING, country STRING, client STRING, action STRING
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
hive> load data local inpath '/tmp/events.csv' overwrite into table sensitive.events;
hive> create database filtered;
hive> create view filtered.events as select country, client, action from sensitive.events;
hive> create view filtered.events_usonly as
select * from filtered.events where country = 'US';
Using Sentry imposes the following requirements:
1. Change the permissions of /user/hive/warehouse:
hdfs dfs -chmod -R 770 /user/hive/warehouse
hdfs dfs -chown -R hive:hive /user/hive/warehouse
2. Edit hive-site.xml to turn off HiveServer2 impersonation.
3. Make sure min.user.id=0 in taskcontroller.cfg.
Edit hive-site.xml and add the following:
<property>
<name>hive.security.authorization.task.factory</name>
<value>org.apache.sentry.binding.hive.SentryHiveAuthorizationTaskFactoryImpl</value>
</property>
<property>
<name>hive.server2.session.hook</name>
<value>org.apache.sentry.binding.hive.HiveAuthzBindingSessionHook</value>
</property>
<property>
<name>hive.sentry.conf.url</name>
<value>file:///etc/hive/conf/sentry-site.xml</value>
</property>
Create sentry-site.xml in the /etc/hive/conf/ directory:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
<property>
<name>sentry.service.client.server.rpc-port</name>
<value>8038</value>
</property>
<property>
<name>sentry.service.client.server.rpc-address</name>
<value>cdh1</value>
</property>
<property>
<name>sentry.service.client.server.rpc-connection-timeout</name>
<value>200000</value>
</property>
<property>
<name>sentry.service.security.mode</name>
<value>kerberos</value>
</property>
<property>
<name>sentry.service.server.principal</name>
<value>sentry/_HOST@JAVACHEN.COM</value>
</property>
<property>
<name>sentry.service.server.keytab</name>
<value>/etc/sentry/conf/sentry.keytab</value>
</property>
<property>
<name>sentry.hive.provider</name>
<value>org.apache.sentry.provider.file.HadoopGroupResourceAuthorizationProvider</value>
</property>
<property>
<name>sentry.hive.provider.backend</name>
<value>org.apache.sentry.provider.db.SimpleDBProviderBackend</value>
</property>
<property>
<name>sentry.hive.server</name>
<value>server1</value>
</property>
<property>
<name>sentry.metastore.service.users</name>
<value>hive</value>
</property>
<property>
<name>sentry.hive.testing.mode</name>
<value>false</value>
</property>
</configuration>
In beeline, connect to hive-server2 with the hive user's ticket (note: in Sentry, hive is an administrator user) and create roles, groups, and so on by running:
create role admin_role;
GRANT ALL ON SERVER server1 TO ROLE admin_role;
GRANT ROLE admin_role TO GROUP admin;
GRANT ROLE admin_role TO GROUP hive;
create role test_role;
GRANT ALL ON DATABASE filtered TO ROLE test_role;
GRANT ALL ON DATABASE sensitive TO ROLE test_role;
GRANT ROLE test_role TO GROUP test;
The statements above create two roles: admin_role, which has administrator privileges, can read and write all databases, and is granted to the admin and hive groups (matching groups on the operating system); and test_role, which can only read and write the filtered and sensitive databases and is granted to the test group.
On the LDAP server, create the system user yy_test, import it into LDAP with the migrationtools utilities, and finally set the user's password in LDAP.
# create the yy_test user
useradd yy_test
grep -E "yy_test" /etc/passwd >/opt/passwd.txt
/usr/share/migrationtools/migrate_passwd.pl /opt/passwd.txt /opt/passwd.ldif
ldapadd -x -D "uid=ldapadmin,ou=people,dc=lashou,dc=com" -w secret -f /opt/passwd.ldif
# Change the password with the command below; enter the new password twice:
ldappasswd -x -D 'uid=ldapadmin,ou=people,dc=lashou,dc=com' -w secret "uid=yy_test,ou=people,dc=lashou,dc=com" -S
Create the test group on every DataNode host and add the yy_test user to it:
groupadd test ; useradd yy_test; usermod -G test,yy_test yy_test
Connect to hive-server2 through beeline for testing:
# switch to the test user for testing
$ su test
$ kinit -k -t test.keytab test/cdh1@JAVACHEN.COM
$ beeline -u "jdbc:hive2://cdh1:10000/default;principal=test/cdh1@JAVACHEN.COM"
Edit the IMPALA_SERVER_ARGS parameter in /etc/default/impala and add:
-server_name=server1
-sentry_config=/etc/impala/conf/sentry-site.xml
Add to IMPALA_CATALOG_ARGS:
-sentry_config=/etc/impala/conf/sentry-site.xml
Note: server1 must match the name used in the sentry-provider.ini file.
The final IMPALA_SERVER_ARGS is as follows:
hostname=`hostname -f |tr "[:upper:]" "[:lower:]"`
IMPALA_SERVER_ARGS=" \
-log_dir=${IMPALA_LOG_DIR} \
-catalog_service_host=${IMPALA_CATALOG_SERVICE_HOST} \
-state_store_port=${IMPALA_STATE_STORE_PORT} \
-use_statestore \
-state_store_host=${IMPALA_STATE_STORE_HOST} \
-kerberos_reinit_interval=60 \
-principal=impala/${hostname}@JAVACHEN.COM \
-keytab_file=/etc/impala/conf/impala.keytab \
-enable_ldap_auth=true -ldap_uri=ldaps://cdh1 -ldap_baseDN=ou=people,dc=javachen,dc=com \
-server_name=server1 \
-sentry_config=/etc/impala/conf/sentry-site.xml \
-be_port=${IMPALA_BACKEND_PORT} -default_pool_max_requests=-1 -mem_limit=60%"
Create /etc/impala/conf/sentry-site.xml with the following content:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
<property>
<name>sentry.service.client.server.rpc-port</name>
<value>8038</value>
</property>
<property>
<name>sentry.service.client.server.rpc-address</name>
<value>cdh1</value>
</property>
<property>
<name>sentry.service.client.server.rpc-connection-timeout</name>
<value>200000</value>
</property>
<property>
<name>sentry.service.security.mode</name>
<value>kerberos</value>
</property>
<property>
<name>sentry.service.server.principal</name>
<value>sentry/_HOST@JAVACHEN.COM</value>
</property>
<property>
<name>sentry.service.server.keytab</name>
<value>/etc/sentry/conf/sentry.keytab</value>
</property>
</configuration>
For testing, see the Impala tests in the file-based storage section below.
Create a sentry-site.xml file in Hive's /etc/hive/conf directory with the following content:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
<property>
<name>hive.sentry.server</name>
<value>server1</value>
</property>
<property>
<name>sentry.hive.provider.backend</name>
<value>org.apache.sentry.provider.file.SimpleFileProviderBackend</value>
</property>
<property>
<name>hive.sentry.provider</name>
<value>org.apache.sentry.provider.file.HadoopGroupResourceAuthorizationProvider</value>
</property>
<property>
<name>hive.sentry.provider.resource</name>
<value>/user/hive/sentry/sentry-provider.ini</value>
</property>
</configuration>
Create the sentry-provider.ini file and upload it to the /user/hive/sentry/ directory in HDFS:
$ cat /tmp/sentry-provider.ini
[databases]
# Defines the location of the per DB policy file for the customers DB/schema
#db1 = hdfs://cdh1:8020/user/hive/sentry/db1.ini
[groups]
admin = any_operation
hive = any_operation
test = select_filtered
[roles]
any_operation = server=server1->db=*->table=*->action=*
select_filtered = server=server1->db=filtered->table=*->action=SELECT
select_us = server=server1->db=filtered->table=events_usonly->action=SELECT
[users]
test = test
hive= hive
$ hdfs dfs -rm -r /user/hive/sentry/sentry-provider.ini
$ hdfs dfs -put /tmp/sentry-provider.ini /user/hive/sentry/
$ hdfs dfs -chown hive:hive /user/hive/sentry/sentry-provider.ini
$ hdfs dfs -chmod 640 /user/hive/sentry/sentry-provider.ini
For the syntax of the sentry-provider.ini file, see the official documentation. Here the hive group is given full privileges, the hive user is assigned to the hive group, and the other two groups are given only partial privileges.
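Each privilege string under [roles] is a chain of key=value parts joined by '->'. A quick way to break one apart for inspection (the rule is copied from the file above):

```shell
# Split a Sentry policy rule into its key=value components, one per line.
rule='server=server1->db=filtered->table=*->action=SELECT'
echo "$rule" | awk -F'->' '{ for (i = 1; i <= NF; i++) print $i }'
```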
Then add the following configuration to hive-site.xml:
<property>
<name>hive.security.authorization.task.factory</name>
<value>org.apache.sentry.binding.hive.SentryHiveAuthorizationTaskFactoryImpl</value>
</property>
<property>
<name>hive.server2.session.hook</name>
<value>org.apache.sentry.binding.hive.HiveAuthzBindingSessionHook</value>
</property>
<property>
<name>hive.sentry.conf.url</name>
<value>file:///etc/hive/conf/sentry-site.xml</value>
</property>
Sync the configuration files to the other nodes and restart the hive-server2 service.
Since hive-server2 in this cluster has Kerberos authentication enabled, connect to hive-server2 as the hive user.
$ kinit -k -t /etc/hive/conf/hive.keytab hive/cdh1@JAVACHEN.COM
$ beeline -u "jdbc:hive2://cdh1:10000/default;principal=hive/cdh1@JAVACHEN.COM"
scan complete in 10ms
Connecting to jdbc:hive2://cdh1:10000/default;principal=hive/cdh1@JAVACHEN.COM
Connected to: Apache Hive (version 0.13.1-cdh5.2.0)
Driver: Hive JDBC (version 0.13.1-cdh5.2.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 0.13.1-cdh5.2.0 by Apache Hive
5 rows selected (0.339 seconds)
0: jdbc:hive2://cdh1:10000/default> show databases;
+----------------+--+
| database_name |
+----------------+--+
| default |
| filtered |
| sensitive |
+----------------+--+
10 rows selected (0.145 seconds)
0: jdbc:hive2://cdh1:10000/default> use filtered
No rows affected (0.132 seconds)
0: jdbc:hive2://cdh1:10000/default> show tables;
+----------------+--+
| tab_name |
+----------------+--+
| events |
| events_usonly |
+----------------+--+
2 rows selected (0.158 seconds)
0: jdbc:hive2://cdh1:10000/default> use sensitive;
No rows affected (0.115 seconds)
0: jdbc:hive2://cdh1:10000/default> show tables;
+-----------+--+
| tab_name |
+-----------+--+
| events |
+-----------+--+
1 row selected (0.148 seconds)
Edit the IMPALA_SERVER_ARGS parameter in /etc/default/impala and add:
-server_name=server1
-authorization_policy_file=/user/hive/sentry/sentry-provider.ini
-authorization_policy_provider_class=org.apache.sentry.provider.file.LocalGroupResourceAuthorizationProvider
Note: server1 must match the name used in the sentry-provider.ini file.
The final IMPALA_SERVER_ARGS is as follows:
hostname=`hostname -f |tr "[:upper:]" "[:lower:]"`
IMPALA_SERVER_ARGS=" \
-log_dir=${IMPALA_LOG_DIR} \
-catalog_service_host=${IMPALA_CATALOG_SERVICE_HOST} \
-state_store_port=${IMPALA_STATE_STORE_PORT} \
-use_statestore \
-state_store_host=${IMPALA_STATE_STORE_HOST} \
-be_port=${IMPALA_BACKEND_PORT} \
-server_name=server1 \
-authorization_policy_file=/user/hive/sentry/sentry-provider.ini \
-authorization_policy_provider_class=org.apache.sentry.provider.file.LocalGroupResourceAuthorizationProvider \
-enable_ldap_auth=true -ldap_uri=ldaps://cdh1 -ldap_baseDN=ou=people,dc=javachen,dc=com \
-kerberos_reinit_interval=60 \
-principal=impala/${hostname}@JAVACHEN.COM \
-keytab_file=/etc/impala/conf/impala.keytab \
"
Restart the impala-server service and then test. Since impala-server here integrates both Kerberos and LDAP, the tests below authenticate through LDAP.
First test with the LDAP user test:
impala-shell -l -u test
Starting Impala Shell using LDAP-based authentication
LDAP password for test:
Connected to cdh1:21000
Server version: impalad version 2.0.0-cdh5 RELEASE (build ecf30af0b4d6e56ea80297df2189367ada6b7da7)
Welcome to the Impala shell. Press TAB twice to see a list of available commands.
Copyright (c) 2012 Cloudera, Inc. All rights reserved.
(Shell build version: Impala Shell v2.0.0-cdh5 (ecf30af) built on Sat Oct 11 13:56:06 PDT 2014)
[cdh1:21000] > show databases;
Query: show databases
+---------+
| name |
+---------+
| default |
+---------+
Fetched 1 row(s) in 0.11s
[cdh1:21000] > show tables;
Query: show tables
ERROR: AuthorizationException: User 'test' does not have privileges to access: default.*
[cdh1:21000] >
As you can see, the test user has no privileges to view any databases, because sentry-provider.ini does not grant the test user any privileges.
Next, test with the hive user. Use the commands below to create the hive user and group in LDAP and set the hive user's password.
$ grep hive /etc/passwd >/opt/passwd.txt
$ /usr/share/migrationtools/migrate_passwd.pl /opt/passwd.txt /opt/passwd.ldif
$ ldapadd -x -D "uid=ldapadmin,ou=people,dc=javachen,dc=com" -w secret -f /opt/passwd.ldif
$ grep hive /etc/group >/opt/group.txt
$ /usr/share/migrationtools/migrate_group.pl /opt/group.txt /opt/group.ldif
$ ldapadd -x -D "uid=ldapadmin,ou=people,dc=javachen,dc=com" -w secret -f /opt/group.ldif
# change the hive user's password in LDAP
$ ldappasswd -x -D 'uid=ldapadmin,ou=people,dc=javachen,dc=com' -w secret "uid=hive,ou=people,dc=javachen,dc=com" -S
Then test as the hive user:
$ impala-shell -l -u hive
Starting Impala Shell using LDAP-based authentication
LDAP password for hive:
Connected to cdh1:21000
Server version: impalad version 2.0.0-cdh5 RELEASE (build ecf30af0b4d6e56ea80297df2189367ada6b7da7)
Welcome to the Impala shell. Press TAB twice to see a list of available commands.
Copyright (c) 2012 Cloudera, Inc. All rights reserved.
(Shell build version: Impala Shell v2.0.0-cdh5 (ecf30af) built on Sat Oct 11 13:56:06 PDT 2014)
[cdh1:21000] > show databases;
Query: show databases
+------------------+
| name             |
+------------------+
| _impala_builtins |
| default          |
| filtered         |
| sensitive        |
+------------------+
Fetched 11 row(s) in 0.11s
[cdh1:21000] > use sensitive;
Query: use sensitive
[cdh1:21000] > show tables;
Query: show tables
+--------+
| name   |
+--------+
| events |
+--------+
Fetched 1 row(s) in 0.11s
[cdh1:21000] > select * from events;
Query: select * from events
+--------------+---------+---------+------------+
| ip           | country | client  | action     |
+--------------+---------+---------+------------+
| 10.1.2.3     | US      | android | createNote |
| 10.200.88.99 | FR      | windows | updateNote |
| 10.1.2.3     | US      | android | updateNote |
| 10.200.88.77 | FR      | ios     | createNote |
| 10.1.4.5     | US      | windows | updateTag  |
+--------------+---------+---------+------------+
Fetched 5 row(s) in 0.76s
You can test with other users in the same way.
You can also connect to impala-server with beeline to test:
$ beeline -u "jdbc:hive2://cdh1:21050/default;" -n test -p test
scan complete in 2ms
Connecting to jdbc:hive2://cdh1:21050/default;
Connected to: Impala (version 2.0.0-cdh5)
Driver: Hive JDBC (version 0.13.1-cdh5.2.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 0.13.1-cdh5.2.0 by Apache Hive
0: jdbc:hive2://cdh1:21050/default>
When defining a URI for HDFS, you must also specify the NameNode. For example:
GRANT ALL ON URI 'file:///path/to/dir' TO <role>
GRANT ALL ON URI 'hdfs://namenode:port/path/to/dir' TO <role>
GRANT ALL ON URI 'hdfs://ha-nn-uri/path/to/dir' TO <role>
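The URI in a GRANT statement is a quoted string literal with no spaces inside the path. A dry-run sketch that assembles such a statement (the namenode address and etl_role name are hypothetical placeholders):

```shell
# Build a GRANT ... ON URI statement; echo only.
# Execute for real with: impala-shell -q "<statement>"
NN=hdfs://namenode:8020       # hypothetical NameNode address
DIR=/path/to/dir
ROLE=etl_role                 # hypothetical role name
echo "GRANT ALL ON URI '${NN}${DIR}' TO ROLE ${ROLE};"
```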
Example of managing user privileges
In this example, the SQL statements grant the whole_server role all privileges on the databases and URIs within the server.
CREATE ROLE whole_server;
GRANT ROLE whole_server TO GROUP admin_group;
GRANT ALL ON SERVER server1 TO ROLE whole_server;
A user with privileges on specific databases and tables
If a user has privileges on specific tables in specific databases, the user can access those objects but nothing else. Such users see the permitted tables and their parent databases in the output of SHOW TABLES and SHOW DATABASES, can USE the appropriate database, and can perform the relevant actions (SELECT and/or INSERT) based on their table privileges. Actually creating a table requires ALL privileges at the database level, so you might set up one role for a user who creates the schema and separate roles for other users or applications that perform day-to-day operations on the tables.
CREATE ROLE one_database;
GRANT ROLE one_database TO GROUP admin_group;
GRANT ALL ON DATABASE db1 TO ROLE one_database;
CREATE ROLE instructor;
GRANT ROLE instructor TO GROUP trainers;
GRANT ALL ON TABLE db1.lesson TO ROLE instructor;
# This particular course is all about queries, so the students can SELECT but not INSERT or CREATE/DROP.
CREATE ROLE student;
GRANT ROLE student TO GROUP visitors;
GRANT SELECT ON TABLE db1.training TO ROLE student;
Privileges for working with external data files
When data is loaded with the LOAD DATA statement, or referenced from an HDFS location outside the normal Impala database directories, the user also needs appropriate privileges on the URIs corresponding to those HDFS locations.
In this example:
The external_table role can insert into and query the Impala table external_table.sample.
The staging_dir role can specify the HDFS path /user/cloudera/external_data in a LOAD DATA statement. When Impala queries or loads data files, it operates on all the files in that directory, not just a single file, so any Impala LOCATION parameter refers to a directory rather than an individual file.
CREATE ROLE external_table;
GRANT ROLE external_table TO GROUP cloudera;
GRANT ALL ON TABLE external_table.sample TO ROLE external_table;
CREATE ROLE staging_dir;
GRANT ROLE staging_dir TO GROUP cloudera;
GRANT ALL ON URI 'hdfs://127.0.0.1:8020/user/cloudera/external_data' TO ROLE staging_dir;
Separating administrator responsibility from read and write privileges
To create a database, you need full privileges on that database, while day-to-day operations on the tables within it can be performed with lower-level privileges on specific tables. You can therefore set up separate roles for each database or application: an administrative role that can create or drop the database, and user-level roles that can access only the relevant tables.
In this example, responsibilities are divided among users in 3 different groups:
CREATE ROLE training_sysadmin;
GRANT ROLE training_sysadmin TO GROUP supergroup;
GRANT ALL ON DATABASE training1 TO ROLE training_sysadmin;
CREATE ROLE instructor;
GRANT ROLE instructor TO GROUP cloudera;
GRANT ALL ON TABLE training1.course1 TO ROLE instructor;
CREATE ROLE visitor;
GRANT ROLE visitor TO GROUP visitors;
GRANT SELECT ON TABLE training1.course1 TO ROLE visitor;
server=server_name->db=database_name->table=table_name->action=SELECT
server=server_name->db=database_name->table=table_name->action=ALL
server=impala-host.example.com->db=default->table=t1->action=SELECT
server=impala-host.example.com->db=*->table=audit_log->action=SELECT
server=impala-host.example.com->db=default->table=t1->action=*