Please credit the source when reposting: http://www.cnblogs.com/xiaodf/html
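Earlier posts described how Kerberos + Sentry provide authentication and authorization for HiveServer2. This post covers how to implement authentication and authorization when Hive databases are accessed through Spark SQL JDBC.
ThriftServer is a JDBC/ODBC interface: users connect to it over JDBC/ODBC to access Spark SQL data. When ThriftServer starts, it launches one Spark SQL application, and all clients that connect over JDBC/ODBC share that application's resources, which means different users can share data. At startup ThriftServer also opens a listener that waits for JDBC clients to connect and submit queries. So when configuring ThriftServer you must at least set its host name and port, and if Hive data is to be used you must also provide the Hive metastore URIs.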
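As a rough sketch of that minimum configuration, the three settings can be passed as `--hiveconf` options. The helper name `thrift_hiveconf_args` and the metastore URI `thrift://t162:9083` are assumptions made for illustration; in this post's setup the metastore URIs actually come from hive-site.xml.

```shell
#!/bin/sh
# Hypothetical helper: print the --hiveconf options for the listening host,
# the listening port, and the Hive metastore URIs.
thrift_hiveconf_args() {
    host=$1
    port=$2
    metastore_uris=$3
    echo "--hiveconf hive.server2.thrift.bind.host=$host" \
         "--hiveconf hive.server2.thrift.port=$port" \
         "--hiveconf hive.metastore.uris=$metastore_uris"
}

# Example (assumed metastore URI), left unquoted on purpose so the shell
# splits the options into separate arguments:
#   ./sbin/start-thriftserver.sh $(thrift_hiveconf_args t162 10001 thrift://t162:9083)
```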
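Prerequisites:
The experiments in this post were run under the following deployment conditions:
(1) CDH has Kerberos authentication enabled and Sentry installed;
(2) Hive permissions are controlled by the Sentry service;
(3) HDFS has HDFS ACL and Sentry permission synchronization enabled, so permission changes made to Hive tables with SQL statements are synchronized to the corresponding HDFS files.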
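For each of these configurations, see my earlier post: http://www.cnblogs.com/xiaodf/p/5968248.html
The Spark bundled with CDH does not support the thrift server, so you need to download a prebuilt Spark package yourself from: http://spark.apache.org/downloads.html
This post uses Spark 1.5.2.
Copy the cluster's hive-site.xml file into the conf directory under the Spark directory: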
[root@t162 spark-1.5.2-bin-hadoop2.6]# cd conf/
[root@t162 conf]# ll
total 52
-rw-r--r-- 1 root root 202 Oct 25 13:05 docker.properties.template
-rw-r--r-- 1 root root 303 Oct 25 13:05 fairscheduler.xml.template
-rw-r--r-- 1 root root 5708 Oct 25 13:08 hive-site.xml
-rw-r--r-- 1 root root 949 Oct 25 13:05 log4j.properties.template
-rw-r--r-- 1 root root 5886 Oct 25 13:05 metrics.properties.template
-rw-r--r-- 1 root root 80 Oct 25 13:05 slaves.template
-rw-r--r-- 1 root root 507 Oct 25 13:05 spark-defaults.conf.template
-rwxr-xr-x 1 root root 4299 Oct 25 13:08 spark-env.sh
-rw-r--r-- 1 root root 3418 Oct 25 13:05 spark-env.sh.template
-rwxr-xr-x 1 root root 119 Oct 25 13:09 stopjdbc.sh
In hive-site.xml, set the parameter hive.server2.enable.doAs to true. doAs must be true, otherwise user permission control for Spark JDBC will not take effect.
<property>
  <name>hive.server2.enable.doAs</name>
  <value>true</value>
</property>
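Because a wrong doAs value silently disables the permission control, it may be worth checking the copied file before starting the service. The following is a minimal sketch, assuming the usual `<property><name>…</name><value>…</value></property>` layout. The helper name `get_hadoop_prop` is hypothetical and not part of Spark or Hive.

```shell
#!/bin/sh
# Hypothetical helper: print the <value> of a named property from a
# Hadoop-style XML configuration file (simple layout only, no comments
# or nested elements inside <property>).
get_hadoop_prop() {
    awk -v name="$2" '
        { text = text $0 }                      # join the whole file
        END {
            needle = "<name>" name "</name>"
            n = split(text, props, "</property>")
            for (i = 1; i <= n; i++) {
                if (index(props[i], needle) == 0) continue
                start = index(props[i], "<value>")
                stop  = index(props[i], "</value>")
                if (start > 0 && stop > start)
                    print substr(props[i], start + 7, stop - start - 7)
            }
        }' "$1"
}

# Example:
#   [ "$(get_hadoop_prop conf/hive-site.xml hive.server2.enable.doAs)" = "true" ] \
#       || echo "hive.server2.enable.doAs is not true" >&2
```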
Create the spark-env.sh file and add the following parameter:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/cloudera/parcels/CDH/lib/hadoop/lib/native
Here HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
Start the thrift server by calling the start-thriftserver.sh script from a wrapper script (startjdbc.sh):
#!/bin/sh
# Start Spark-thriftserver
export YARN_CONF_DIR=/etc/hadoop/conf
file="hive-site.xml"
dir=$(pwd)
cd conf/
# Copy the cluster hive-site.xml if it is not already in conf/
if [ ! -e "$file" ]
then
    cp /etc/hive/conf/hive-site.xml $dir/conf/
fi
cd ../sbin
./start-thriftserver.sh --name SparkJDBC --master yarn-client --num-executors 10 --executor-memory 2g --executor-cores 4 --driver-memory 10g \
--driver-cores 2 --conf spark.storage.memoryFraction=0.2 --conf spark.shuffle.memoryFraction=0.6 --hiveconf hive.server2.thrift.port=10001 \
--hiveconf hive.server2.logging.operation.enabled=true --hiveconf hive.server2.authentication.kerberos.principal=hive/t162@HADOOP.COM \
--hiveconf hive.server2.authentication.kerberos.keytab=/home/hive.keytab
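The script above simply submits a Spark job. Its main parameters are:
master: sets the Spark submit mode to yarn-client
hive.server2.thrift.port: the port of the thrift server
hive.server2.authentication.kerberos.principal: the principal of the super administrator that starts the thrift server; here the super administrator is hive
hive.server2.authentication.kerberos.keytab: the keytab of that super administrator
Before running startjdbc.sh, kinit as the super administrator of the Hive warehouse. With Sentry and HDFS permission synchronization enabled, this super administrator must be granted privileges on the entire Hive warehouse, that is, full permissions on the whole HDFS directory of the Hive warehouse as well.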
The service can be stopped with the following script:
#!/bin/sh
# Stop SparkJDBC
cd sbin
./spark-daemon.sh stop org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 1
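The purpose of Spark SQL Thriftserver authentication is to let different users log in to beeline under different identities. Kerberos does solve both mutual authentication between services and user authentication.
Start the service with the administrator account, which is already configured in the startup script. The thriftserver is really a Spark job submitted to YARN through spark-submit, and it needs this account to access YARN and HDFS. With an ordinary account the service may fail to start for lack of HDFS permissions, because startup needs to write some data to HDFS.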
[root@t162 spark-1.5.2-bin-hadoop2.6]# ./startjdbc.sh
starting org.apache.spark.sql.hive.thriftserver.HiveThriftServer2, logging to /home/iie/spark-1.5.2-bin-hadoop2/spark-1.5.2-bin-hadoop2.6/sbin/../logs/spark-root-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-t162.out
......
16/10/25 16:56:07 INFO thrift.ThriftCLIService: Starting ThriftBinaryCLIService on port 10001 with 5...500 worker threads
You can check how the service startup went in the output log:
[root@t162 spark-1.5.2-bin-hadoop2.6]# tailf /home/iie/spark-1.5.2-bin-hadoop2/spark-1.5.2-bin-hadoop2.6/sbin/../logs/spark-root-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-t162.out
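Instead of watching the log by hand, a script can look for the "Starting ThriftBinaryCLIService on port" line shown in the startup output. This is only a sketch: the function names are hypothetical, and it assumes that this log line is a good enough signal that the listener is up.

```shell
#!/bin/sh
# Hypothetical helpers: detect the thrift listener start line in the log.
thrift_server_started() {
    grep -q 'Starting ThriftBinaryCLIService on port' "$1" 2>/dev/null
}

# Poll the log up to $2 times (default 30), sleeping 2 seconds between tries.
wait_for_thrift_server() {
    log=$1
    tries=${2:-30}
    i=0
    while [ "$i" -lt "$tries" ]; do
        if thrift_server_started "$log"; then
            return 0
        fi
        i=$((i + 1))
        sleep 2
    done
    return 1
}

# Example:
#   wait_for_thrift_server /path/to/logs/spark-root-...HiveThriftServer2-1-t162.out \
#       || echo "thrift server did not report startup" >&2
```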
Because the service has Kerberos authentication enabled, connecting without first authenticating fails as follows:
[root@t161 ~]# beeline -u "jdbc:hive2://t162:10001/;principal=hive/t162@HADOOP.COM"
16/10/25 16:59:04 WARN mapreduce.TableMapReduceUtil: The hbase-prefix-tree module jar containing PrefixTreeCodec is not present. Continuing without it.
scan complete in 2ms
Connecting to jdbc:hive2://t162:10001/;principal=hive/t162@HADOOP.COM
16/10/25 16:59:06 [main]: ERROR transport.TSaslTransport: SASL negotiation failure
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
After authenticating as user1 we can connect. The user was created in advance; for how to create it, see http://www.cnblogs.com/xiaodf/p/5968282.html
[root@t161 ~]# kinit user1
Password for user1@HADOOP.COM:
[root@t161 ~]# beeline -u "jdbc:hive2://t162:10001/;principal=hive/t162@HADOOP.COM"
16/10/25 17:01:46 WARN mapreduce.TableMapReduceUtil: The hbase-prefix-tree module jar containing PrefixTreeCodec is not present. Continuing without it.
scan complete in 3ms
Connecting to jdbc:hive2://t162:10001/;principal=hive/t162@HADOOP.COM
Connected to: Spark SQL (version 1.5.2)
Driver: Hive JDBC (version 1.1.0-cdh5.7.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.1.0-cdh5.7.2 by Apache Hive
0: jdbc:hive2://t162:10001/>
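The connection step can be wrapped so that a missing ticket is reported before beeline fails with a SASL error. This sketch assumes MIT Kerberos, where `klist -s` exits non-zero when there is no valid ticket cache. The function names `beeline_url` and `connect_thrift` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: build the JDBC URL from host, port and the service
# principal of the thrift server.
beeline_url() {
    printf 'jdbc:hive2://%s:%s/;principal=%s\n' "$1" "$2" "$3"
}

# Hypothetical wrapper: refuse to connect without a valid Kerberos ticket.
connect_thrift() {
    if ! klist -s; then
        echo "No valid Kerberos ticket found; run kinit first" >&2
        return 1
    fi
    beeline -u "$(beeline_url "$1" "$2" "$3")"
}

# Example:
#   kinit user1 && connect_thrift t162 10001 hive/t162@HADOOP.COM
```

Note that the principal in the URL is the service principal (hive/t162@HADOOP.COM), not the principal of the connecting user.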
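Each user runs kinit with their own principal and password, obtains a TGT through Kerberos AS authentication, and can then log in to the Spark SQL thriftserver to view databases and tables.
However, because STS (the Spark Thrift Server) does not yet support SQL-based authorization, permission isolation is only possible at the underlying HDFS layer, which is a pity. Hive is more complete in this respect and supports SQL standard authorization.
Because HDFS ACL and Sentry permission synchronization was enabled beforehand, user permissions for Spark SQL JDBC are implemented through the permission settings of HiveServer2. That is, first log in to HiveServer2 over JDBC and set user permissions with Hive SQL statements; the table and database permissions are then synchronized to the corresponding HDFS directories and files, which gives the Spark SQL thriftserver user permission isolation based on the underlying HDFS.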
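As an illustration of that flow, the helper below prints Sentry-style grant statements that could be run through beeline against HiveServer2. The role name, the group name and the helper name `sentry_grant_sql` are all hypothetical, and the statements are a sketch of the usual Sentry SQL rather than the exact ones used for this cluster.

```shell
#!/bin/sh
# Hypothetical helper: print the SQL that creates a role, binds it to a
# group, and grants SELECT on one table to that role.
sentry_grant_sql() {
    role=$1
    group=$2
    db=$3
    table=$4
    cat <<EOF
CREATE ROLE $role;
GRANT ROLE $role TO GROUP $group;
USE $db;
GRANT SELECT ON TABLE $table TO ROLE $role;
EOF
}

# Example (assumed role and group names; run as a Sentry admin user):
#   sentry_grant_sql user1_role user1 test table1 > grant.sql
#   beeline -u "<HiveServer2 JDBC URL>" -f grant.sql
# After the sync, the resulting ACL can be inspected with:
#   hdfs dfs -getfacl /user/hive/warehouse/test.db/table1
```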
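As shown below, user1 has permission on table1 in the test database but not on table2. Reading table2 reports missing HDFS permission, which means the permission settings took effect.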
0: jdbc:hive2://node1:10000/> select * from test.table1 limit 1;
+--------------+-------------+---------------------+----------+-----------+----------+---------------------------+-----------+-----------+------------------------+------------+---------------+-------------+--+
| cint | cbigint | cfloat | cdouble | cdecimal | cstring | cvarchar | cboolean | ctinyint | ctimestamp | csmallint | cipv4 | cdate |
+--------------+-------------+---------------------+----------+-----------+----------+---------------------------+-----------+-----------+------------------------+------------+---------------+-------------+--+
| 15000000001 | 1459107060 | 1.8990000486373901 | 1.7884 | 1.92482 | 中文测试1 | /browser/addBasicInfo.do | true | -127 | 2014-05-14 00:53:21.0 | -63 | 0 | 2014-05-14 |
+--------------+-------------+---------------------+----------+-----------+----------+---------------------------+-----------+-----------+------------------------+------------+---------------+-------------+--+
1 row selected (3.165 seconds)
0: jdbc:hive2://node1:10000/> select * from test.table2 limit 10;
Error: org.apache.hadoop.security.AccessControlException: Permission denied: user=user1, access=READ_EXECUTE, inode="/user/hive/warehouse/test.db/table2":hive:hive:drwxrwx--x
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkAccessAcl(DefaultAuthorizationProvider.java:365)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:258)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:175)
at org.apache.sentry.hdfs.SentryAuthorizationProvider.checkPermission(SentryAuthorizationProvider.java:178)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:152)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6617)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6599)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPathAccess(FSNamesystem.java:6524)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getListingInt(FSNamesystem.java:5061)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getListing(FSNamesystem.java:5022)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getListing(NameNodeRpcServer.java:882)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getListing(AuthorizationProviderProxyClientProtocol.java:335)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getListing(ClientNamenodeProtocolServerSideTranslatorPB.java:615)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080) (state=,code=0)
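For the permission tests, see the earlier post: http://www.cnblogs.com/xiaodf/p/5968282.html. They are omitted here.
With Spark 1.6.0, after starting the thrift server service, running "show databases" reports the following error: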
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException:
Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
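According to what I found, this may be a bug in version 1.6. After switching to version 1.5.2 the problem no longer occurred. A discussion of the problem is here: https://forums.databricks.com/questions/7207/spark-thrift-server-on-kerberos-enabled-hadoophive.html
Seven days after the Spark SQL ThriftServer service was started, users could no longer connect to it with the beeline command.
The service log reports the following errors: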
17/01/18 13:46:08 INFO HiveMetaStore.audit: ugi=hive/t162@HADOOP.COM ip=unknown-ip-addr cmd=Metastore shutdown complete.
17/01/18 13:46:08 WARN security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 7 days before.
17/01/18 13:46:08 WARN security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 7 days before.
17/01/18 13:46:08 WARN security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 7 days before.
17/01/18 13:46:08 WARN security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 7 days before.
17/01/18 13:46:09 WARN security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 7 days before.
17/01/18 13:46:12 WARN security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 7 days before.
17/01/18 13:46:17 WARN security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 7 days before.
17/01/18 13:46:19 WARN security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 7 days before.
17/01/18 13:46:19 WARN ipc.Client: Couldn't setup connection for hive/t162@HADOOP.COM to t162/t161:8020
17/01/18 13:46:19 WARN thrift.ThriftCLIService: Error opening session:
org.apache.hive.service.cli.HiveSQLException: Failed to open new session: java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't setup connection for hive/t162@HADOOP.COM to t162/t161:8020; Host Details : local host is: "t162/t161"; destination host is: "t162":8020;
at org.apache.hive.service.cli.session.SessionManager.openSession(SessionManager.java:264)
at org.apache.spark.sql.hive.thriftserver.SparkSQLSessionMa
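Cause: when the Kerberos database was created, we set the ticket lifetime and the maximum renewable lifetime for principals, as shown in the following /etc/krb5.conf content: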
[libdefaults]
default_realm = HADOOP.COM
dns_lookup_realm = false
dns_lookup_kdc = false
ticket_lifetime = 24h
renew_lifetime = 7d
forwardable = true
renewable=true
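After 7 days the ticket can no longer be renewed, so the service's authentication fails and users cannot connect. To solve this, the service principal has to be re-authenticated with kinit periodically. We modified the service startup script to add a periodic authentication loop, as follows: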
#!/bin/sh
# Start Spark-thriftserver
export YARN_CONF_DIR=/etc/hadoop/conf
file="hive-site.xml"
dir=$(pwd)
cd conf/
# Copy the cluster hive-site.xml if it is not already in conf/
if [ ! -e "$file" ]
then
    cp /etc/hive/conf/hive-site.xml $dir/conf/
fi
cd ../sbin
./start-thriftserver.sh --name SparkJDBC --master yarn-client --num-executors 10 --executor-memory 2g --executor-cores 4 --driver-memory 10g \
--driver-cores 2 --conf spark.storage.memoryFraction=0.2 --conf spark.shuffle.memoryFraction=0.6 --hiveconf hive.server2.thrift.port=10001 \
--hiveconf hive.server2.logging.operation.enabled=true --hiveconf hive.server2.authentication.kerberos.principal=hive/t162@HADOOP.COM \
--hiveconf hive.server2.authentication.kerberos.keytab=/home/hive.keytab
# Re-authenticate the service principal in the background every 6 days,
# before the 7-day renew_lifetime runs out
while true
do
    kinit -kt /home/hive.keytab hive/t162@HADOOP.COM
    sleep 6d
done &
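One detail of the renewal loop deserves care: sleep does not evaluate arithmetic, so the interval has to be computed by the shell or written with a unit suffix that the local sleep supports. A small sketch of the same loop with the interval computed in seconds follows. The function names are hypothetical, and the 6-day period is chosen only because it is shorter than the 7d renew_lifetime.

```shell
#!/bin/sh
# Hypothetical helper: convert a number of days to seconds for sleep.
renew_interval_seconds() {
    echo $(( $1 * 24 * 3600 ))
}

# Hypothetical helper: re-run kinit from a keytab every <days> days.
# Arguments: keytab path, principal, interval in days.
renew_loop() {
    interval=$(renew_interval_seconds "$3")
    while true; do
        kinit -kt "$1" "$2" || echo "kinit failed for $2" >&2
        sleep "$interval"
    done
}

# Example (runs in the background, as in the startup script):
#   renew_loop /home/hive.keytab hive/t162@HADOOP.COM 6 &
```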
After testing, the problem was solved.