Integrating Impala and Hive with Sentry and Kerberos Authentication

For installing Kerberos and enabling Kerberos authentication for HDFS, see "HDFS配置kerberos认证".

For installing Kerberos and enabling Kerberos authentication for YARN, see "YARN配置kerberos认证".

For installing Kerberos and enabling Kerberos authentication for Hive, see "Hive配置kerberos认证".

Configure Kerberos authentication for HDFS, YARN, and Hive first, then come back here to integrate Impala with Kerberos!

The Hadoop cluster was installed following "使用yum安装CDH Hadoop集群". It has three nodes; the IP, hostname, and components deployed on each node are as follows:

192.168.56.121        cdh1     NameNode、Hive、ResourceManager、HBase、impala-state-store、impala-catalog、Kerberos Server
192.168.56.122        cdh2     DataNode、SSNameNode、NodeManager、HBase、impala-server
192.168.56.123        cdh3     DataNode、HBase、NodeManager、impala-server

Note: use lowercase hostnames, otherwise you will run into errors when integrating Kerberos.
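Kerberos principals embed the hostname, so a mixed-case hostname produces a principal that does not match the lowercase form used below. A minimal sketch of the normalization applied later in this post (the hostname `CDH1.Example.COM` is a made-up example):

```shell
# Lowercase a hostname the same way the /etc/default/impala file later
# in this post does with `hostname -f | tr ...`.
to_lower() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}

to_lower "CDH1.Example.COM"   # prints: cdh1.example.com
```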

1. Install required dependencies

Run the following commands on every node:

    $ yum install python-devel openssl-devel python-pip cyrus-sasl cyrus-sasl-gssapi cyrus-sasl-devel -y
    $ pip-python install ssl

2. Generate keytabs

On cdh1, the KDC server node, run:

    $ cd /var/kerberos/krb5kdc/

    kadmin.local -q "addprinc -randkey impala/cdh1@JAVACHEN.COM"
    kadmin.local -q "addprinc -randkey impala/cdh2@JAVACHEN.COM"
    kadmin.local -q "addprinc -randkey impala/cdh3@JAVACHEN.COM"

    kadmin.local -q "xst -k impala-unmerge.keytab impala/cdh1@JAVACHEN.COM"
    kadmin.local -q "xst -k impala-unmerge.keytab impala/cdh2@JAVACHEN.COM"
    kadmin.local -q "xst -k impala-unmerge.keytab impala/cdh3@JAVACHEN.COM"

In addition, if you use HAProxy for load balancing (see the official document "Using Impala through a Proxy for High Availability"), you also need to generate proxy.keytab:

    $ cd /var/kerberos/krb5kdc/

    # proxy is the host where HAProxy is installed
    kadmin.local -q "addprinc -randkey impala/proxy@JAVACHEN.COM"
    kadmin.local -q "xst -k proxy.keytab impala/proxy@JAVACHEN.COM"

Merge proxy.keytab and impala-unmerge.keytab into impala.keytab:

    $ ktutil
    ktutil: rkt proxy.keytab
    ktutil: rkt impala-unmerge.keytab
    ktutil: wkt impala.keytab
    ktutil: quit

Copy the merged impala.keytab to the /etc/impala/conf directory on each node:

 
    $ scp impala.keytab cdh1:/etc/impala/conf
    $ scp impala.keytab cdh2:/etc/impala/conf
    $ scp impala.keytab cdh3:/etc/impala/conf

Then set ownership and permissions by running the following on cdh1, cdh2, and cdh3:

 
    $ ssh cdh1 "cd /etc/impala/conf/; chown impala:hadoop *.keytab; chmod 400 *.keytab"
    $ ssh cdh2 "cd /etc/impala/conf/; chown impala:hadoop *.keytab; chmod 400 *.keytab"
    $ ssh cdh3 "cd /etc/impala/conf/; chown impala:hadoop *.keytab; chmod 400 *.keytab"

A keytab is effectively a permanent credential: no password is required (although the keytab becomes invalid if the principal's password is changed in the KDC). Any user with read access to the file can impersonate the identities it contains when accessing Hadoop, so make sure the keytab file is readable only by its owner (mode 0400).
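A quick way to double-check the result is to compare the file mode after distribution. The sketch below uses a temporary file as a stand-in for the real keytab; the `stat -c` invocation assumes GNU coreutils:

```shell
# Return OK when a file is readable only by its owner (mode 400),
# mirroring the chmod 400 applied to the keytabs above.
keytab_mode_ok() {
  if [ "$(stat -c '%a' "$1")" = "400" ]; then echo OK; else echo BAD; fi
}

f=$(mktemp)           # stand-in for /etc/impala/conf/impala.keytab
chmod 400 "$f"
keytab_mode_ok "$f"   # prints: OK
rm -f "$f"
```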

3. Modify the Impala configuration

On cdh1, edit /etc/default/impala and add the following parameters to IMPALA_CATALOG_ARGS, IMPALA_SERVER_ARGS, and IMPALA_STATE_STORE_ARGS:

    -kerberos_reinit_interval=60
    -principal=impala/_HOST@JAVACHEN.COM
    -keytab_file=/etc/impala/conf/impala.keytab

If you use HAProxy (for HAProxy configuration, see "Hive使用HAProxy配置HA"), change IMPALA_SERVER_ARGS as follows (proxy is the HAProxy host name; here HAProxy is installed on cdh1):

    -kerberos_reinit_interval=60
    -be_principal=impala/_HOST@JAVACHEN.COM
    -principal=impala/proxy@JAVACHEN.COM
    -keytab_file=/etc/impala/conf/impala.keytab

Add the following to IMPALA_CATALOG_ARGS:

    -state_store_host=${IMPALA_STATE_STORE_HOST} \

Then sync the modified file to the other nodes. The final /etc/default/impala is shown below; to avoid hostnames containing uppercase letters, a hostname variable is substituted for _HOST:

 
    IMPALA_CATALOG_SERVICE_HOST=cdh1
    IMPALA_STATE_STORE_HOST=cdh1
    IMPALA_STATE_STORE_PORT=24000
    IMPALA_BACKEND_PORT=22000
    IMPALA_LOG_DIR=/var/log/impala

    IMPALA_MEM_DEF=$(free -m | awk 'NR==2{print $2-5120}')
    hostname=`hostname -f | tr "[:upper:]" "[:lower:]"`

    IMPALA_CATALOG_ARGS=" -log_dir=${IMPALA_LOG_DIR} -state_store_host=${IMPALA_STATE_STORE_HOST} \
    -kerberos_reinit_interval=60 \
    -principal=impala/${hostname}@JAVACHEN.COM \
    -keytab_file=/etc/impala/conf/impala.keytab
    "

    IMPALA_STATE_STORE_ARGS=" -log_dir=${IMPALA_LOG_DIR} -state_store_port=${IMPALA_STATE_STORE_PORT} \
    -statestore_subscriber_timeout_seconds=15 \
    -kerberos_reinit_interval=60 \
    -principal=impala/${hostname}@JAVACHEN.COM \
    -keytab_file=/etc/impala/conf/impala.keytab
    "
    IMPALA_SERVER_ARGS=" \
    -log_dir=${IMPALA_LOG_DIR} \
    -catalog_service_host=${IMPALA_CATALOG_SERVICE_HOST} \
    -state_store_port=${IMPALA_STATE_STORE_PORT} \
    -use_statestore \
    -state_store_host=${IMPALA_STATE_STORE_HOST} \
    -be_port=${IMPALA_BACKEND_PORT} \
    -kerberos_reinit_interval=60 \
    -be_principal=impala/${hostname}@JAVACHEN.COM \
    -principal=impala/cdh1@JAVACHEN.COM \
    -keytab_file=/etc/impala/conf/impala.keytab \
    -mem_limit=${IMPALA_MEM_DEF}m
    "

    ENABLE_CORE_DUMPS=false
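The `IMPALA_MEM_DEF` line in the file above gives Impala all physical memory minus 5120 MB. A sketch of that awk arithmetic run against canned `free -m` output (the 16042 MB total is made up for illustration):

```shell
# NR==2 selects the "Mem:" line of `free -m`; $2 is total memory in MB.
fake_free_output='              total        used        free
Mem:          16042        8000        8042'

mem_def=$(printf '%s\n' "$fake_free_output" | awk 'NR==2{print $2-5120}')
echo "$mem_def"   # prints: 10922
```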

Sync the modified file to the other nodes, cdh2 and cdh3:

    $ scp /etc/default/impala cdh2:/etc/default/impala
    $ scp /etc/default/impala cdh3:/etc/default/impala

Refresh the files under the Impala configuration directory and sync them to the other nodes:

    cp /etc/hadoop/conf/core-site.xml /etc/impala/conf/
    cp /etc/hadoop/conf/hdfs-site.xml /etc/impala/conf/
    cp /etc/hive/conf/hive-site.xml /etc/impala/conf/

    scp -r /etc/impala/conf cdh2:/etc/impala
    scp -r /etc/impala/conf cdh3:/etc/impala

4. Start the services

Start impala-state-store

impala-state-store runs as the impala user, so on cdh1 obtain a ticket for the impala user before starting the service:

    $ kinit -k -t /etc/impala/conf/impala.keytab impala/cdh1@JAVACHEN.COM
    $ service impala-state-store start

Then check the log to confirm it started successfully:

    $ tailf /var/log/impala/statestored.INFO

Start impala-catalog

impala-catalog runs as the impala user, so on cdh1 obtain a ticket for the impala user before starting the service:

    $ kinit -k -t /etc/impala/conf/impala.keytab impala/cdh1@JAVACHEN.COM
    $ service impala-catalog start

Then check the log to confirm it started successfully:

    $ tailf /var/log/impala/catalogd.INFO

Start impala-server

impala-server runs as the impala user, so on cdh1 obtain a ticket for the impala user before starting the service:

    $ kinit -k -t /etc/impala/conf/impala.keytab impala/cdh1@JAVACHEN.COM
    $ service impala-server start

Then check the log to confirm it started successfully:

    $ tailf /var/log/impala/impalad.INFO

5. Testing

Testing impala-shell

With Kerberos enabled, run impala-shell with the -k flag:

    $ impala-shell -k
    Starting Impala Shell using Kerberos authentication
    Using service name 'impala'
    Connected to cdh1:21000
    Server version: impalad version 1.3.1-cdh4 RELEASE (build 907481bf45b248a7bb3bb077d54831a71f484e5f)
    Welcome to the Impala shell. Press TAB twice to see a list of available commands.

    Copyright (c) 2012 Cloudera, Inc. All rights reserved.

    (Shell build version: Impala Shell v1.3.1-cdh4 (907481b) built on Wed Apr 30 14:23:48 PDT 2014)
    [cdh1:21000] >
    [cdh1:21000] > show tables;
    Query: show tables
    +------+
    | name |
    +------+
    | a    |
    | b    |
    | c    |
    | d    |
    +------+
    Returned 4 row(s) in 0.08s

6. Troubleshooting

If you hit the following exception:

[cdh1:21000] > select * from test limit 10;
Query: select * from test limit 10
ERROR: AnalysisException: Failed to load metadata for table: default.test
CAUSED BY: TableLoadingException: Failed to load metadata for table: test
CAUSED BY: TTransportException: java.net.SocketTimeoutException: Read timed out
CAUSED BY: SocketTimeoutException: Read timed out

then add the following property to hive-site.xml:

    <property>
      <name>hive.metastore.client.socket.timeout</name>
      <value>3600</value>
    </property>

This part documents configuring Impala and Hive to integrate with Sentry on a CDH 5.2 Hadoop cluster, including installing and configuring Sentry and testing the integration with Impala and Hive.

Using Sentry to manage cluster permissions requires Kerberos to be configured on the cluster first.

For configuring Kerberos and LDAP on a Hadoop cluster, see the related posts on this blog.

Sentry will be installed on the three-node Hadoop cluster; the IP, hostname, and components deployed on each node are as follows:

192.168.56.121        cdh1     NameNode、Hive、ResourceManager、HBase、impala-state-store、impala-catalog、Kerberos Server、sentry-store
192.168.56.122        cdh2     DataNode、SSNameNode、NodeManager、HBase、impala-server
192.168.56.123        cdh3     DataNode、HBase、NodeManager、impala-server

Sentry can be used in two ways: file-based storage (SimpleFileProviderBackend) and database-backed storage (SimpleDbProviderBackend). With file-based storage you only need to install sentry; otherwise you also need to install sentry-store.
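For orientation, the backend is selected via the `sentry.hive.provider.backend` property in sentry-site.xml. A sketch of the two variants, using the class names that appear in the configurations later in this post (pick one; the commented-out block shows the database-backed alternative):

```xml
<!-- File-based storage: rules are read from sentry-provider.ini -->
<property>
  <name>sentry.hive.provider.backend</name>
  <value>org.apache.sentry.provider.file.SimpleFileProviderBackend</value>
</property>

<!-- Database-backed storage: rules are served by the sentry-store service
<property>
  <name>sentry.hive.provider.backend</name>
  <value>org.apache.sentry.provider.db.SimpleDBProviderBackend</value>
</property>
-->
```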

1. Database-backed storage

1.1 Install the service

Install the sentry-store service on cdh1:

    yum install sentry sentry-store -y

Edit the Sentry configuration file /etc/sentry/conf/sentry-store-site.xml. The configuration below borrows from the examples in the Sentry source code:

 
    <?xml version="1.0" encoding="UTF-8"?>
    <configuration>
      <property>
        <name>sentry.service.admin.group</name>
        <value>impala,hive,hue</value>
      </property>
      <property>
        <name>sentry.service.allow.connect</name>
        <value>impala,hive,hue</value>
      </property>
      <property>
        <name>sentry.verify.schema.version</name>
        <value>true</value>
      </property>
      <property>
        <name>sentry.service.server.rpc-address</name>
        <value>cdh1</value>
      </property>
      <property>
        <name>sentry.service.server.rpc-port</name>
        <value>8038</value>
      </property>
      <property>
        <name>sentry.store.jdbc.url</name>
        <value>jdbc:postgresql://cdh1/sentry</value>
      </property>
      <property>
        <name>sentry.store.jdbc.driver</name>
        <value>org.postgresql.Driver</value>
      </property>
      <property>
        <name>sentry.store.jdbc.user</name>
        <value>sentry</value>
      </property>
      <property>
        <name>sentry.store.jdbc.password</name>
        <value>redhat</value>
      </property>
      <property>
        <name>sentry.hive.server</name>
        <value>server1</value>
      </property>
      <property>
        <name>sentry.store.group.mapping</name>
        <value>org.apache.sentry.provider.common.HadoopGroupMappingService</value>
      </property>
    </configuration>

Create the database (see "Hadoop自动化安装shell脚本" for reference):

 
    yum install postgresql-server postgresql-jdbc -y

    ln -s /usr/share/java/postgresql-jdbc.jar /usr/lib/hive/lib/postgresql-jdbc.jar
    ln -s /usr/share/java/postgresql-jdbc.jar /usr/lib/sentry/lib/postgresql-jdbc.jar

    su -c "cd ; /usr/bin/pg_ctl start -w -m fast -D /var/lib/pgsql/data" postgres
    su -c "cd ; /usr/bin/psql --command \"create user sentry with password 'redhat'; \" " postgres
    su -c "cd ; /usr/bin/psql --command \"CREATE DATABASE sentry owner=sentry;\" " postgres
    su -c "cd ; /usr/bin/psql --command \"GRANT ALL privileges ON DATABASE sentry TO sentry;\" " postgres
    su -c "cd ; /usr/bin/psql -U sentry -d sentry -f /usr/lib/sentry/scripts/sentrystore/upgrade/sentry-postgres-1.4.0-cdh5.sql" postgres
    su -c "cd ; /usr/bin/pg_ctl restart -w -m fast -D /var/lib/pgsql/data" postgres

/var/lib/pgsql/data/pg_hba.conf should contain:

    # TYPE  DATABASE    USER        CIDR-ADDRESS          METHOD

    # "local" is for Unix domain socket connections only
    local   all         all                               md5
    # IPv4 local connections:
    #host    all         all         0.0.0.0/0             trust
    host    all         all         127.0.0.1/32          md5

    # IPv6 local connections:
    #host    all         all         ::1/128               md5

If Kerberos is enabled on the cluster, generate a principal for the Sentry service on this node and export its keytab:

    $ cd /etc/sentry/conf

    kadmin.local -q "addprinc -randkey sentry/cdh1@JAVACHEN.COM"
    kadmin.local -q "xst -k sentry.keytab sentry/cdh1@JAVACHEN.COM"

    chown sentry:hadoop sentry.keytab ; chmod 400 *.keytab

Then add the following to /etc/sentry/conf/sentry-store-site.xml:

    <property>
      <name>sentry.service.security.mode</name>
      <value>kerberos</value>
    </property>
    <property>
      <name>sentry.service.server.principal</name>
      <value>sentry/cdh1@JAVACHEN.COM</value>
    </property>
    <property>
      <name>sentry.service.server.keytab</name>
      <value>/etc/sentry/conf/sentry.keytab</value>
    </property>

1.2 Prepare test data

Following "Securing Impala for analysts", prepare the test data:

    $ cat /tmp/events.csv
    10.1.2.3,US,android,createNote
    10.200.88.99,FR,windows,updateNote
    10.1.2.3,US,android,updateNote
    10.200.88.77,FR,ios,createNote
    10.1.4.5,US,windows,updateTag

    $ hive -S
    hive> create database sensitive;
    hive> create table sensitive.events (
      ip STRING, country STRING, client STRING, action STRING
      ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

    hive> load data local inpath '/tmp/events.csv' overwrite into table sensitive.events;
    hive> create database filtered;
    hive> create view filtered.events as select country, client, action from sensitive.events;
    hive> create view filtered.events_usonly as
      select * from filtered.events where country = 'US';

1.3 Integrating hive-server2 with Sentry

Requirements

Using Sentry imposes the following requirements:

1. Change the permissions of /user/hive/warehouse:

    hdfs dfs -chmod -R 770 /user/hive/warehouse
    hdfs dfs -chown -R hive:hive /user/hive/warehouse

2. In hive-site.xml, turn off HiveServer2 impersonation.

3. Make sure min.user.id=0 in taskcontroller.cfg.

Modify the configuration files

Add the following to hive-site.xml:

    <property>
      <name>hive.security.authorization.task.factory</name>
      <value>org.apache.sentry.binding.hive.SentryHiveAuthorizationTaskFactoryImpl</value>
    </property>
    <property>
      <name>hive.server2.session.hook</name>
      <value>org.apache.sentry.binding.hive.HiveAuthzBindingSessionHook</value>
    </property>
    <property>
      <name>hive.sentry.conf.url</name>
      <value>file:///etc/hive/conf/sentry-site.xml</value>
    </property>

Create sentry-site.xml in the /etc/hive/conf/ directory:

 
    <?xml version="1.0" encoding="UTF-8"?>
    <configuration>
      <property>
        <name>sentry.service.client.server.rpc-port</name>
        <value>8038</value>
      </property>
      <property>
        <name>sentry.service.client.server.rpc-address</name>
        <value>cdh1</value>
      </property>
      <property>
        <name>sentry.service.client.server.rpc-connection-timeout</name>
        <value>200000</value>
      </property>
      <property>
        <name>sentry.service.security.mode</name>
        <value>kerberos</value>
      </property>
      <property>
        <name>sentry.service.server.principal</name>
        <value>sentry/_HOST@JAVACHEN.COM</value>
      </property>
      <property>
        <name>sentry.service.server.keytab</name>
        <value>/etc/sentry/conf/sentry.keytab</value>
      </property>
      <property>
        <name>sentry.hive.provider</name>
        <value>org.apache.sentry.provider.file.HadoopGroupResourceAuthorizationProvider</value>
      </property>
      <property>
        <name>sentry.hive.provider.backend</name>
        <value>org.apache.sentry.provider.db.SimpleDBProviderBackend</value>
      </property>
      <property>
        <name>sentry.hive.server</name>
        <value>server1</value>
      </property>
      <property>
        <name>sentry.metastore.service.users</name>
        <value>hive</value>
      </property>
      <property>
        <name>sentry.hive.testing.mode</name>
        <value>false</value>
      </property>
    </configuration>

Create roles and groups in sentry-store

In beeline, connect to hive-server2 using the ticket of the hive user (note: in Sentry, hive is the administrator user) and create the roles and groups by running the following statements:

 
    create role admin_role;
    GRANT ALL ON SERVER server1 TO ROLE admin_role;
    GRANT ROLE admin_role TO GROUP admin;
    GRANT ROLE admin_role TO GROUP hive;

    create role test_role;
    GRANT ALL ON DATABASE filtered TO ROLE test_role;
    GRANT ALL ON DATABASE sensitive TO ROLE test_role;
    GRANT ROLE test_role TO GROUP test;
This creates two roles: admin_role, which has administrator privileges and can read and write all databases, granted to the admin and hive groups (matching the operating-system groups); and test_role, which can only read and write the filtered and sensitive databases, granted to the test group.
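To verify the grants, Hive with Sentry also provides SHOW statements that can be run from the same admin beeline session. A hedged sketch (exact syntax may vary by CDH version):

```sql
-- List the roles granted to a group, then the privileges a role holds.
SHOW ROLE GRANT GROUP test;
SHOW GRANT ROLE test_role;
```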

Create a test user in LDAP

On the LDAP server, create the system user yy_test, import it into LDAP with the migrationtools utilities, and then set the user's LDAP password:

    # create the yy_test user
    useradd yy_test

    grep -E "yy_test" /etc/passwd >/opt/passwd.txt
    /usr/share/migrationtools/migrate_passwd.pl /opt/passwd.txt /opt/passwd.ldif
    ldapadd -x -D "uid=ldapadmin,ou=people,dc=lashou,dc=com" -w secret -f /opt/passwd.ldif

    # set the password with the command below; you will be prompted twice:
    ldappasswd -x -D 'uid=ldapadmin,ou=people,dc=lashou,dc=com' -w secret "uid=yy_test,ou=people,dc=lashou,dc=com" -S

On every datanode, create the test group and add the yy_test user to it:

    groupadd test ; useradd yy_test; usermod -G test,yy_test yy_test

Testing

Connect to hive-server2 through beeline to test:

    # switch to the test user
    $ su test

    $ kinit -k -t test.keytab test/cdh1@JAVACHEN.COM

    $ beeline -u "jdbc:hive2://cdh1:10000/default;principal=test/cdh1@JAVACHEN.COM"

1.4 Integrating Impala with Sentry

Modify the configuration

In /etc/default/impala, add the following to the IMPALA_SERVER_ARGS parameter:

    -server_name=server1
    -sentry_config=/etc/impala/conf/sentry-site.xml

Add the following to IMPALA_CATALOG_ARGS:

    -sentry_config=/etc/impala/conf/sentry-site.xml

Note: server1 must match the name used in the sentry-provider.ini file.

The final IMPALA_SERVER_ARGS looks like this:

 
    hostname=`hostname -f | tr "[:upper:]" "[:lower:]"`

    IMPALA_SERVER_ARGS=" \
    -log_dir=${IMPALA_LOG_DIR} \
    -catalog_service_host=${IMPALA_CATALOG_SERVICE_HOST} \
    -state_store_port=${IMPALA_STATE_STORE_PORT} \
    -use_statestore \
    -state_store_host=${IMPALA_STATE_STORE_HOST} \
    -kerberos_reinit_interval=60 \
    -principal=impala/${hostname}@JAVACHEN.COM \
    -keytab_file=/etc/impala/conf/impala.keytab \
    -enable_ldap_auth=true -ldap_uri=ldaps://cdh1 -ldap_baseDN=ou=people,dc=javachen,dc=com \
    -server_name=server1 \
    -sentry_config=/etc/impala/conf/sentry-site.xml \
    -be_port=${IMPALA_BACKEND_PORT} -default_pool_max_requests=-1 -mem_limit=60%"

Create /etc/impala/conf/sentry-site.xml with the following content:

 
    <?xml version="1.0" encoding="UTF-8"?>
    <configuration>
      <property>
        <name>sentry.service.client.server.rpc-port</name>
        <value>8038</value>
      </property>
      <property>
        <name>sentry.service.client.server.rpc-address</name>
        <value>cdh1</value>
      </property>
      <property>
        <name>sentry.service.client.server.rpc-connection-timeout</name>
        <value>200000</value>
      </property>
      <property>
        <name>sentry.service.security.mode</name>
        <value>kerberos</value>
      </property>
      <property>
        <name>sentry.service.server.principal</name>
        <value>sentry/_HOST@JAVACHEN.COM</value>
      </property>
      <property>
        <name>sentry.service.server.keytab</name>
        <value>/etc/sentry/conf/sentry.keytab</value>
      </property>
    </configuration>

Testing

See the Impala tests in the file-based storage section below.

2. File-based storage

2.1 Integrating Hive with Sentry

Modify the configuration files

Create a sentry-site.xml file in Hive's /etc/hive/conf directory with the following content:

 
    <?xml version="1.0" encoding="UTF-8"?>
    <configuration>
      <property>
        <name>hive.sentry.server</name>
        <value>server1</value>
      </property>
      <property>
        <name>sentry.hive.provider.backend</name>
        <value>org.apache.sentry.provider.file.SimpleFileProviderBackend</value>
      </property>
      <property>
        <name>hive.sentry.provider</name>
        <value>org.apache.sentry.provider.file.HadoopGroupResourceAuthorizationProvider</value>
      </property>
      <property>
        <name>hive.sentry.provider.resource</name>
        <value>/user/hive/sentry/sentry-provider.ini</value>
      </property>
    </configuration>

Create the sentry-provider.ini file and upload it to the /user/hive/sentry/ directory in HDFS:

 
    $ cat /tmp/sentry-provider.ini
    [databases]
    # Defines the location of the per DB policy file for the customers DB/schema
    #db1 = hdfs://cdh1:8020/user/hive/sentry/db1.ini

    [groups]
    admin = any_operation
    hive = any_operation
    test = select_filtered

    [roles]
    any_operation = server=server1->db=*->table=*->action=*
    select_filtered = server=server1->db=filtered->table=*->action=SELECT
    select_us = server=server1->db=filtered->table=events_usonly->action=SELECT

    [users]
    test = test
    hive = hive

    $ hdfs dfs -rm -r /user/hive/sentry/sentry-provider.ini
    $ hdfs dfs -put /tmp/sentry-provider.ini /user/hive/sentry/
    $ hdfs dfs -chown hive:hive /user/hive/sentry/sentry-provider.ini
    $ hdfs dfs -chmod 640 /user/hive/sentry/sentry-provider.ini

For the syntax of the sentry-provider.ini file, see the official documentation. Here the hive group is given full privileges and the hive user is mapped to the hive group, while the other two groups get only partial privileges.
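Each rule under [roles] is a chain of `scope=value` pairs joined by `->`, narrowing from server down to action. A small shell sketch that splits one rule from the file above into its components (plain string handling, Sentry is not involved; assumes GNU sed):

```shell
rule='server=server1->db=filtered->table=*->action=SELECT'

# Turn each "->" separator into a newline: one scope per line.
printf '%s\n' "$rule" | sed 's/->/\n/g'
# prints:
#   server=server1
#   db=filtered
#   table=*
#   action=SELECT
```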

Then add the following to hive-site.xml:

    <property>
      <name>hive.security.authorization.task.factory</name>
      <value>org.apache.sentry.binding.hive.SentryHiveAuthorizationTaskFactoryImpl</value>
    </property>
    <property>
      <name>hive.server2.session.hook</name>
      <value>org.apache.sentry.binding.hive.HiveAuthzBindingSessionHook</value>
    </property>
    <property>
      <name>hive.sentry.conf.url</name>
      <value>file:///etc/hive/conf/sentry-site.xml</value>
    </property>

Sync the configuration files to the other nodes and restart the hive-server2 service.

Testing

Since hive-server2 in this cluster has Kerberos enabled, connect to it as the hive user:

 
    $ kinit -k -t /etc/hive/conf/hive.keytab hive/cdh1@JAVACHEN.COM

    $ beeline -u "jdbc:hive2://cdh1:10000/default;principal=hive/cdh1@JAVACHEN.COM"
    scan complete in 10ms
    Connecting to jdbc:hive2://cdh1:10000/default;principal=hive/cdh1@JAVACHEN.COM
    Connected to: Apache Hive (version 0.13.1-cdh5.2.0)
    Driver: Hive JDBC (version 0.13.1-cdh5.2.0)
    Transaction isolation: TRANSACTION_REPEATABLE_READ
    Beeline version 0.13.1-cdh5.2.0 by Apache Hive
    5 rows selected (0.339 seconds)

    0: jdbc:hive2://cdh1:10000/default> show databases;
    +----------------+--+
    | database_name  |
    +----------------+--+
    | default        |
    | filtered       |
    | sensitive      |
    +----------------+--+
    10 rows selected (0.145 seconds)

    0: jdbc:hive2://cdh1:10000/default> use filtered
    No rows affected (0.132 seconds)

    0: jdbc:hive2://cdh1:10000/default> show tables;
    +----------------+--+
    | tab_name       |
    +----------------+--+
    | events         |
    | events_usonly  |
    +----------------+--+
    2 rows selected (0.158 seconds)
    0: jdbc:hive2://cdh1:10000/default> use sensitive;
    No rows affected (0.115 seconds)

    0: jdbc:hive2://cdh1:10000/default> show tables;
    +-----------+--+
    | tab_name  |
    +-----------+--+
    | events    |
    +-----------+--+
    1 row selected (0.148 seconds)

2.3 Integrating Impala with Sentry

Modify the configuration files

In /etc/default/impala, add the following to the IMPALA_SERVER_ARGS parameter:

    -server_name=server1
    -authorization_policy_file=/user/hive/sentry/sentry-provider.ini
    -authorization_policy_provider_class=org.apache.sentry.provider.file.LocalGroupResourceAuthorizationProvider

Note: server1 must match the name used in the sentry-provider.ini file.

The final IMPALA_SERVER_ARGS looks like this:

 
    hostname=`hostname -f | tr "[:upper:]" "[:lower:]"`

    IMPALA_SERVER_ARGS=" \
    -log_dir=${IMPALA_LOG_DIR} \
    -catalog_service_host=${IMPALA_CATALOG_SERVICE_HOST} \
    -state_store_port=${IMPALA_STATE_STORE_PORT} \
    -use_statestore \
    -state_store_host=${IMPALA_STATE_STORE_HOST} \
    -be_port=${IMPALA_BACKEND_PORT} \
    -server_name=server1 \
    -authorization_policy_file=/user/hive/sentry/sentry-provider.ini \
    -authorization_policy_provider_class=org.apache.sentry.provider.file.LocalGroupResourceAuthorizationProvider \
    -enable_ldap_auth=true -ldap_uri=ldaps://cdh1 -ldap_baseDN=ou=people,dc=javachen,dc=com \
    -kerberos_reinit_interval=60 \
    -principal=impala/${hostname}@JAVACHEN.COM \
    -keytab_file=/etc/impala/conf/impala.keytab \
    "

Testing

Restart the impala-server service, then test. Because impala-server here is integrated with both Kerberos and LDAP, the tests below use LDAP.

First test with the LDAP user test:

 
    impala-shell -l -u test
    Starting Impala Shell using LDAP-based authentication
    LDAP password for test:
    Connected to cdh1:21000
    Server version: impalad version 2.0.0-cdh5 RELEASE (build ecf30af0b4d6e56ea80297df2189367ada6b7da7)
    Welcome to the Impala shell. Press TAB twice to see a list of available commands.

    Copyright (c) 2012 Cloudera, Inc. All rights reserved.

    (Shell build version: Impala Shell v2.0.0-cdh5 (ecf30af) built on Sat Oct 11 13:56:06 PDT 2014)

    [cdh1:21000] > show databases;
    Query: show databases
    +---------+
    | name    |
    +---------+
    | default |
    +---------+
    Fetched 1 row(s) in 0.11s

    [cdh1:21000] > show tables;

    Query: show tables
    ERROR: AuthorizationException: User 'test' does not have privileges to access: default.*

    [cdh1:21000] >

As you can see, the test user cannot view the tables in the default database, because the sentry-provider.ini file grants the test user no privileges on it.

Next, test with the hive user. Use the following commands to create the hive user and group in LDAP and set the hive user's password:

 
    $ grep hive /etc/passwd >/opt/passwd.txt
    $ /usr/share/migrationtools/migrate_passwd.pl /opt/passwd.txt /opt/passwd.ldif

    $ ldapadd -x -D "uid=ldapadmin,ou=people,dc=javachen,dc=com" -w secret -f /opt/passwd.ldif

    $ grep hive /etc/group >/opt/group.txt
    $ /usr/share/migrationtools/migrate_group.pl /opt/group.txt /opt/group.ldif

    $ ldapadd -x -D "uid=ldapadmin,ou=people,dc=javachen,dc=com" -w secret -f /opt/group.ldif

    # change the hive user's password in LDAP
    $ ldappasswd -x -D 'uid=ldapadmin,ou=people,dc=javachen,dc=com' -w secret "uid=hive,ou=people,dc=javachen,dc=com" -S

Then test with the hive user:

$ impala-shell -l -u hive
    Starting Impala Shell using LDAP-based authentication
    LDAP password for hive:
    Connected to cdh1:21000
    Server version: impalad version 2.0.0-cdh5 RELEASE (build ecf30af0b4d6e56ea80297df2189367ada6b7da7)
    Welcome to the Impala shell. Press TAB twice to see a list of available commands.

    Copyright (c) 2012 Cloudera, Inc. All rights reserved.

    (Shell build version: Impala Shell v2.0.0-cdh5 (ecf30af) built on Sat Oct 11 13:56:06 PDT 2014)

[cdh1:21000] > show databases;
    Query: show databases
    +------------------+
    | name             |
    +------------------+
    | _impala_builtins |
    | default          |
    | filtered         |
    | sensitive        |
    +------------------+
    Fetched 11 row(s) in 0.11s

[cdh1:21000] > use sensitive;
    Query: use sensitive

[cdh1:21000] > show tables;
    Query: show tables
    +--------+
    | name   |
    +--------+
    | events |
    +--------+
    Fetched 1 row(s) in 0.11s

[cdh1:21000] > select * from events;
    Query: select * from events
    +--------------+---------+---------+------------+
    | ip           | country | client  | action     |
    +--------------+---------+---------+------------+
    | 10.1.2.3     | US      | android | createNote |
    | 10.200.88.99 | FR      | windows | updateNote |
    | 10.1.2.3     | US      | android | updateNote |
    | 10.200.88.77 | FR      | ios     | createNote |
    | 10.1.4.5     | US      | windows | updateTag  |
    +--------------+---------+---------+------------+
    Fetched 5 row(s) in 0.76s

You can test with other users in the same way.

You can also connect to impala-server with beeline to test:

 
    $ beeline -u "jdbc:hive2://cdh1:21050/default;" -n test -p test
    scan complete in 2ms
    Connecting to jdbc:hive2://cdh1:21050/default;
    Connected to: Impala (version 2.0.0-cdh5)
    Driver: Hive JDBC (version 0.13.1-cdh5.2.0)
    Transaction isolation: TRANSACTION_REPEATABLE_READ
    Beeline version 0.13.1-cdh5.2.0 by Apache Hive
    0: jdbc:hive2://cdh1:21050/default>

3. References

  • Securing Impala for analysts
  • Setting Up Hive Authorization with Sentry
  • Sentry源码中的配置例子

    Impala authorization notes

    SHOW CREATE TABLE hue.auth_permission;
    ALTER TABLE hue.auth_permission DROP FOREIGN KEY content_type_id_refs_id_id;
    DELETE FROM hue.django_content_type;
    ALTER TABLE hue.auth_permission ADD FOREIGN KEY (content_type_id) REFERENCES django_content_type (id);

    When defining a URI for HDFS, the NameNode must also be specified. For example:
    GRANT ALL ON URI file:///path/to/dir TO <role>
    GRANT ALL ON URI hdfs://namenode:port/path/to/dir TO <role>
    GRANT ALL ON URI hdfs://ha-nn-uri/path/to/dir TO <role>
    Example of administrator privileges
    In this example, the SQL statements grant the whole_server role all privileges on the databases and URIs in the server.

    CREATE ROLE whole_server;
    GRANT ROLE whole_server TO GROUP admin_group;
    GRANT ALL ON SERVER server1 TO ROLE whole_server;


    Users with privileges on specific databases and tables
    If a user has privileges on specific tables in specific databases, the user can access those objects but nothing else. Such tables and their parent databases appear in the output of SHOW TABLES and SHOW DATABASES; the user can USE the appropriate database and perform the relevant actions (SELECT and/or INSERT) according to the table privileges. Actually creating a table requires ALL privileges at the database level, so you can set up one role for users who set up the schema and separate roles for users or applications that perform day-to-day operations on the tables.
    CREATE ROLE one_database;
    GRANT ROLE one_database TO GROUP admin_group;
    GRANT ALL ON DATABASE db1 TO ROLE one_database;

    CREATE ROLE instructor;
    GRANT ROLE instructor TO GROUP trainers;
    GRANT ALL ON TABLE db1.lesson TO ROLE instructor;

    # This particular course is all about queries, so the students can SELECT but not INSERT or CREATE/DROP.
    CREATE ROLE student;
    GRANT ROLE student TO GROUP visitors;
    GRANT SELECT ON TABLE db1.training TO ROLE student;

    Privileges for external data files
    When data is inserted with the LOAD DATA statement, or referenced from an HDFS location outside the normal Impala database directories, the user also needs appropriate privileges on the URIs corresponding to those HDFS locations.

    In this example:

    The external_table role can insert into and query the Impala table external_table.sample.
    The staging_dir role can specify the HDFS path /user/cloudera/external_data in a LOAD DATA statement. When Impala queries or loads data files, it operates on all the files in that directory, not just a single file, so any Impala LOCATION parameter refers to a directory rather than an individual file.
    CREATE ROLE external_table;
    GRANT ROLE external_table TO GROUP cloudera;
    GRANT ALL ON TABLE external_table.sample TO ROLE external_table;

    CREATE ROLE staging_dir;
    GRANT ROLE staging_dir TO GROUP cloudera;
    GRANT ALL ON URI 'hdfs://127.0.0.1:8020/user/cloudera/external_data' TO ROLE staging_dir;

    Separating administrator responsibility from read and write privileges
    To create a database you need full privileges on it, while day-to-day operations on the tables inside it can be performed with lower-level privileges on specific tables. Therefore you can set up separate roles for each database or application: an administrative role that can create or drop the database, and a user-level role that can only access the relevant tables.

    In this example, the responsibilities are split among users in three different groups:
    CREATE ROLE training_sysadmin;
    GRANT ROLE training_sysadmin TO GROUP supergroup;
    GRANT ALL ON DATABASE training1 TO ROLE training_sysadmin;

    CREATE ROLE instructor;
    GRANT ROLE instructor TO GROUP cloudera;
    GRANT ALL ON TABLE training1.course1 TO ROLE instructor;

    CREATE ROLE visitor;
    GRANT ROLE visitor TO GROUP visitor;
    GRANT SELECT ON TABLE training1.course1 TO ROLE visitor;

    Policy-file rule format:
    server=server_name->db=database_name->table=table_name->action=SELECT
    server=server_name->db=database_name->table=table_name->action=ALL

    server=impala-host.example.com->db=default->table=t1->action=SELECT
    server=impala-host.example.com->db=*->table=audit_log->action=SELECT
    server=impala-host.example.com->db=default->table=t1->action=*
