Connecting to and Accessing Hive from Java

Date: 2019-11-09


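I read a good article today about accessing Hive from Java. It covers exactly what I need, so I am sharing it here so that more people can learn from it and apply it.

Many thanks to the original author for the summary and for sharing it.

Original post: https://www.jianshu.com/p/4ef28607fc04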

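Hive Built-in Services and Using HiveServer2

Built-in services

Running hive --service help prints help for the built-in services. The Service List in the output enumerates the services Hive supports, and there are quite a few of them.

The most useful ones are described below: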

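(1) cli

cli stands for Command Line Interface. It is Hive's command-line interface, the most commonly used service and the default one, and it can be used directly from the shell.

(2) hiveserver

This runs Hive as a server that exposes a Thrift service, so that clients written in many different languages can communicate with it. The HiveServer service must be started before clients can connect. The port the server listens on can be set with the HIVE_PORT environment variable and defaults to 10000.
The service is started with a command such as hive --service hiveserver -p 10002, where the -p option also specifies the listening port.

(3) hwi

hwi stands for Hive Web Interface. It is Hive's web interface and a web-based alternative to the Hive CLI.

(4) jar

The Hive equivalent of hadoop jar. It is a convenient way to run a Java application whose classpath includes both the Hadoop and the Hive classes.

(5) metastore

By default, the metastore runs in the same process as the Hive service. With this service, the metastore can run as a separate process, and its listening port can be set with METASTORE_PORT.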

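Three ways to start Hive

Hive shell mode

bin/hive or bin/hive --service cli

Hive web interface mode

bin/hive --service hwi &, where & runs the command in the background. After starting hwi in the background, run jps to list the processes. An additional RunJar process indicates that hwi started successfully.

This mode lets you access Hive from a browser, although it is of limited practical use. The address is http://huatec01:9999/hwi/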

[Screenshot: starting hwi]

[Screenshot: hwi in the browser]

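Hive remote service mode (port 10000)

bin/hive --service hiveserver2 &

This is the mode to use when Java, Python, or other programs access Hive through JDBC or similar drivers, so it is the one programmers need most.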

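HiveServer and HiveServer2

Introduction to HiveServer2

Both HiveServer and HiveServer2 let remote clients written in a variety of languages, such as Java and Python, submit requests to Hive and retrieve the results, so a client can work with the data in Hive without starting the CLI.

The official documentation states: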

HiveServer is scheduled to be removed from Hive releases starting Hive 0.15. See HIVE-6977. Please switch over to HiveServer2.

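HiveServer is therefore no longer supported from Hive 0.15 onward (my Hive version is 2.1.1), but it is still worth mentioning here. In fact, hiveserver does not appear in the Service List shown earlier.

Running bin/hive --service hiveserver prints a log message saying Service hiveserver not found.

Both HiveServer and HiveServer2 are built on Thrift, but only HiveServer is sometimes called the Thrift server; HiveServer2 is not. Given that HiveServer already existed, why was HiveServer2 needed?

The reason is that HiveServer cannot handle concurrent requests from more than one client. This limitation comes from the Thrift interface that HiveServer exposes and cannot be fixed by changing the HiveServer code. HiveServer was therefore rewritten in Hive 0.11.0 as HiveServer2, which solves the problem. HiveServer2 supports multi-client concurrency and authentication, and provides better support for open API clients such as JDBC and ODBC.

Differences between HiveServer and HiveServer2

JDBC differences between HiveServer and HiveServer2: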

HiveServer version    Connection URL                Driver class
HiveServer2           jdbc:hive2://<host>:<port>    org.apache.hive.jdbc.HiveDriver
HiveServer            jdbc:hive://<host>:<port>     org.apache.hadoop.hive.jdbc.HiveDriver
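The table above can be captured in a few lines of Java. This is only a minimal sketch: the enum and method names are made up for illustration, and the host, port, and database in main are the values used elsewhere in this article.

```java
/** Maps each Hive server generation to its JDBC URL prefix and driver class (values from the table above). */
enum HiveServerKind {
    HIVESERVER2("jdbc:hive2://", "org.apache.hive.jdbc.HiveDriver"),
    HIVESERVER("jdbc:hive://", "org.apache.hadoop.hive.jdbc.HiveDriver");

    final String urlPrefix;
    final String driverClass;

    HiveServerKind(String urlPrefix, String driverClass) {
        this.urlPrefix = urlPrefix;
        this.driverClass = driverClass;
    }

    /** Builds a connection URL of the form prefix + host:port/database. */
    String url(String host, int port, String database) {
        return urlPrefix + host + ":" + port + "/" + database;
    }
}

final class HiveJdbcUrlDemo {
    public static void main(String[] args) {
        // huatec01:10000 and the default database are the values used in this article.
        System.out.println(HiveServerKind.HIVESERVER2.url("huatec01", 10000, "default"));
        System.out.println(HiveServerKind.HIVESERVER2.driverClass);
    }
}
```

Keeping the prefix and the driver class together makes it harder to pair a hive2 URL with the old driver class by mistake.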

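Configuring HiveServer2

HiveServer2 is configured in hive-site.xml. The relevant properties are:

hive.server2.thrift.min.worker.threads – minimum number of worker threads, default 5.
hive.server2.thrift.max.worker.threads – maximum number of worker threads, default 500.
hive.server2.thrift.port – TCP port to listen on, default 10000.
hive.server2.thrift.bind.host – host to bind the TCP interface to, default localhost.

To change one of these, search hive-site.xml for the property name, for example hive.server2.thrift.min.worker.threads, and edit its value. The file runs to 5358 lines, so searching is much easier than scrolling.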

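Starting with Hive 0.13.0, HiveServer2 can also transport messages over HTTP, which is especially useful when a proxy sits between the client and the server. The HTTP-related properties are:

hive.server2.transport.mode – default binary (TCP); the alternative is http.
hive.server2.thrift.http.port – HTTP port to listen on, default 10001.
hive.server2.thrift.http.path – endpoint name of the service, default cliservice.
hive.server2.thrift.http.min.worker.threads – minimum number of worker threads in the server pool, default 5.
hive.server2.thrift.http.max.worker.threads – maximum number of worker threads in the server pool, default 500.

These can be found and edited in hive-site.xml in the same way.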
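When the server runs in HTTP mode, the client's JDBC URL has to say so as well. The sketch below builds such a URL from the HTTP port and path properties listed above. The transportMode and httpPath URL parameters are what I understand the Hive JDBC driver to expect in recent releases; treat the exact syntax as an assumption and check it against the documentation for your Hive version. The class and method names are illustrative only.

```java
final class HiveHttpUrl {
    /**
     * Builds a HiveServer2 JDBC URL for HTTP transport mode.
     * port corresponds to hive.server2.thrift.http.port and
     * httpPath to hive.server2.thrift.http.path.
     */
    static String build(String host, int port, String database, String httpPath) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database
                + ";transportMode=http;httpPath=" + httpPath;
    }

    public static void main(String[] args) {
        // Defaults from the property list: port 10001, path cliservice.
        System.out.println(build("huatec01", 10001, "default", "cliservice"));
    }
}
```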

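Starting HiveServer2

HiveServer2 can be started in two ways: with hive --service hiveserver2, as described above, or with the shorter hiveserver2 command.

Here we use the second form.

Once started, hiveserver2 runs in the foreground. Open a new SSH session and run jps; the additional RunJar process is the HiveServer2 service.

Run hive --service hiveserver2 -H or hive --service hiveserver2 --help to see the help text.

By default, hive.server2.enable.doAs is true and HiveServer2 executes each query as the user who submitted it. If the property is set to false, queries run as the user who owns the hiveserver2 process. To prevent memory leaks in non-secure mode, the file system caches can be disabled by setting the following properties to true:

fs.hdfs.impl.disable.cache – disables the HDFS file system cache, default false.
fs.file.impl.disable.cache – disables the local file system cache, default false.

The HiveServer2 web UI is available in a browser at http://huatec01:10002.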

Configuring and using HiveServer2

Configuring the listening port and bind host

<property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
    <description>Port number of HiveServer2 Thrift interface when hive.server2.transport.mode is 'binary'.</description>
  </property>
  <property>
    <name>hive.server2.thrift.bind.host</name>
    <value>huatec01</value>
    <description>Bind host on which to run the HiveServer2 Thrift service.</description>
  </property>

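The first property can keep its default. For the second, set the host name to the node on which Hive is installed.

Configuring impersonation

With impersonation enabled, HiveServer2 executes statements as the user who submitted them. If it is set to false, statements run as the admin user that started the HiveServer2 daemon.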

<property>
    <name>hive.server2.enable.doAs</name>
    <value>true</value>
    <description>
      Setting this property to true will have HiveServer2 execute
      Hive operations as the user making the calls to it.
    </description>
  </property>

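Set the value to true.

HiveServer2 node configuration

HiveServer2 no longer needs the hive.metastore.local property. Configure hive.metastore.uris instead: an empty value means the metastore is local, and a non-empty value means it is remote.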

<property>
    <name>hive.metastore.uris</name>
    <value/>
    <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
  </property>

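The value is empty by default, which means a local metastore; the default is fine here.

To use a remote metastore instead, configure it along these lines: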

<property>
    <name>hive.metastore.uris</name>
    <value>thrift://xxx.xxx.xxx.xxx:9083</value>
</property>
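The local-versus-remote rule for hive.metastore.uris is easy to express in code. The following is a rough sketch rather than anything Hive itself ships: the class and method names are invented, and it assumes the value is a comma-separated list of thrift://host:port entries.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

final class MetastoreUris {
    /** An empty or missing value means the metastore is local; anything else means remote. */
    static boolean isRemote(String value) {
        return value != null && !value.trim().isEmpty();
    }

    /** Splits a comma-separated hive.metastore.uris value and validates each entry. */
    static List<URI> parse(String value) {
        List<URI> uris = new ArrayList<>();
        if (!isRemote(value)) {
            return uris; // local metastore: nothing to connect to
        }
        for (String part : value.split(",")) {
            URI uri = URI.create(part.trim());
            if (!"thrift".equals(uri.getScheme()) || uri.getHost() == null || uri.getPort() == -1) {
                throw new IllegalArgumentException("Expected thrift://host:port but got: " + part);
            }
            uris.add(uri);
        }
        return uris;
    }

    public static void main(String[] args) {
        System.out.println(isRemote(""));                       // empty value: local
        System.out.println(parse("thrift://huatec01:9083"));    // one remote metastore
    }
}
```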

ZooKeeper configuration

<property>
    <name>hive.support.concurrency</name>
    <value>true</value>
    <description>
      Whether Hive supports concurrency control or not. 
      A ZooKeeper instance must be up and running when using zookeeper Hive lock manager 
    </description>
  </property>
 <property>
    <name>hive.zookeeper.quorum</name>
    <value>huatec03:2181,huatec04:2181,huatec05:2181</value>
    <description>
      List of ZooKeeper servers to talk to. This is needed for: 
      1. Read/write locks - when hive.lock.manager is set to 
      org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager, 
      2. When HiveServer2 supports service discovery via Zookeeper.
      3. For delegation token storage if zookeeper store is used, if
      hive.cluster.delegation.token.store.zookeeper.connectString is not set
      4. LLAP daemon registry service
    </description>
  </property>

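The first property enables concurrency support and the second lists the ZooKeeper ensemble.

Note: without hive.zookeeper.quorum, HiveQL requests cannot run concurrently and data may be corrupted.

HiveServer2 web UI configuration

The web UI is supported only from Hive 2.0 onward; earlier versions do not have it.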

<property>
    <name>hive.server2.webui.host</name>
    <value>0.0.0.0</value>
    <description>The host address the HiveServer2 WebUI will listen on</description>
  </property>
  <property>
    <name>hive.server2.webui.port</name>
    <value>10002</value>
    <description>The port the HiveServer2 WebUI will listen on. This can beset to 0 or a negative integer to disable the web UI</description>
  </property>

The defaults are fine. HiveServer2's web UI is then reachable in a browser at http://huatec01:10002, as tried earlier.

Starting the services

Start the metastore

bin/hive --service metastore &

Start HiveServer2

bin/hive --service hiveserver2 &

Web UI: http://huatec01:10002
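Because both services are started in the background, it can take a while before they accept connections. A small probe like the one below can confirm that the ports are open before a client tries to connect. It is a sketch under assumptions: the class name is made up, 9083 is the metastore port from the remote metastore example, and 10000 and 10002 are the HiveServer2 and web UI ports configured above.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class PortCheck {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timeout, or unknown host: treat all as "not listening".
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "huatec01";
        System.out.println("metastore   (9083):  " + isListening(host, 9083, 2000));
        System.out.println("hiveserver2 (10000): " + isListening(host, 10000, 2000));
        System.out.println("web UI      (10002): " + isListening(host, 10002, 2000));
    }
}
```

An open port only shows that the process is listening; it does not prove that queries will succeed, so the Beeline check is still worth doing.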

Controlling HiveServer2 from the Beeline console

First, the metastore and HiveServer2 must be running.

Then start Beeline:

bin/beeline

Try connecting to HiveServer2:

!connect jdbc:hive2://huatec01:10000 root root

[Screenshot: successful Beeline connection]

Beeline error 1

Beeline fails to connect to HiveServer2 with the following error:

org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: master is not allowed to impersonate hive (state=,code=0)

Solution:

Stop the Hadoop cluster.
Edit core-site.xml and add the following:

<property>
      <name>hadoop.proxyuser.hadoop.groups</name>
      <value>root</value>
      <description>Allow the superuser oozie to impersonate any members of the group group1 and group2</description>
 </property>
 
 <property>
      <name>hadoop.proxyuser.hadoop.hosts</name>
      <value>huatec01,127.0.0.1,localhost</value>
      <description>The superuser can connect only from host1 and host2 to impersonate a user</description>
  </property>

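Make this change in core-site.xml on every node. The user name in the property names (hadoop in hadoop.proxyuser.hadoop.*) must be the user that runs HiveServer2, which is the user named in the error message (master in the error above).

Restart the Hadoop cluster.
Start the metastore and HiveServer2, then reconnect to HiveServer2.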
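The property names for the fix can be derived from the error message itself. The helper below is a hypothetical convenience, not part of Hadoop or Hive: it pulls the proxying user out of the "is not allowed to impersonate" text and prints the two core-site.xml property names that need to exist for that user.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ProxyUserHint {
    private static final Pattern IMPERSONATE =
            Pattern.compile("User: (\\S+) is not allowed to impersonate (\\S+)");

    /** Extracts the proxying user (the account running HiveServer2) from the error text. */
    static String proxyUser(String errorMessage) {
        Matcher m = IMPERSONATE.matcher(errorMessage);
        if (!m.find()) {
            throw new IllegalArgumentException("Not an impersonation error: " + errorMessage);
        }
        return m.group(1);
    }

    /** Returns the hosts and groups property names for the given proxying user. */
    static String[] propertyNames(String user) {
        return new String[] {
            "hadoop.proxyuser." + user + ".hosts",
            "hadoop.proxyuser." + user + ".groups"
        };
    }

    public static void main(String[] args) {
        String error = "org.apache.hadoop.ipc.RemoteException("
                + "org.apache.hadoop.security.authorize.AuthorizationException): "
                + "User: master is not allowed to impersonate hive (state=,code=0)";
        for (String name : propertyNames(proxyUser(error))) {
            System.out.println(name);
        }
    }
}
```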

Beeline error 2

Beeline connects to HiveServer2 successfully, but executing a SQL statement fails with the following error:

0: jdbc:hive2://huatec01:10000> show databases;
Error: java.io.IOException: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:user.name%7D (state=,code=0)

Solution:

Edit the hive.exec.local.scratchdir property in hive-site.xml and change ${system:user.name} to ${user.name}, as follows:

<property>
    <name>hive.exec.local.scratchdir</name>
    <value>/huatec/apache-hive-2.1.1-bin/tmp/${user.name}</value>
    <description>Local scratch space for Hive jobs</description>
  </property>

Reconnect to HiveServer2 with Beeline and run the SQL statement again; it should now succeed.
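The error text gives a hint about why the placeholder is a problem. My reading, which the original post does not state, is that the unresolved ${system:user.name} reaches the path handling as a literal string, the part before the colon is taken as a URI scheme, and the remainder becomes a relative path. Plain java.net.URI rejects that combination, as this small sketch shows. The class name is made up and the way the string is split is an assumption about the mechanism.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class ScratchDirUriDemo {
    /** Tries to build a URI from a scheme and a path; returns the exception, or null if it is valid. */
    static URISyntaxException tryBuild(String scheme, String path) {
        try {
            new URI(scheme, null, path, null, null);
            return null;
        } catch (URISyntaxException e) {
            return e;
        }
    }

    public static void main(String[] args) {
        // Assumed split of the literal "${system:user.name}" at the colon.
        URISyntaxException bad = tryBuild("${system", "user.name}");
        System.out.println(bad == null ? "valid" : bad.getMessage());

        // A plain absolute path with no scheme is accepted.
        URISyntaxException ok = tryBuild(null, "/huatec/apache-hive-2.1.1-bin/tmp/root");
        System.out.println(ok == null ? "valid" : ok.getMessage());
    }
}
```

With ${user.name} there is no colon inside the placeholder, so even an unresolved value would not be mistaken for a scheme.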

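Working with Hive from Java

Java, Python, and other programs access Hive through JDBC or similar drivers, which requires HiveServer2 to be running. If Beeline can control HiveServer2, Java code will be able to access Hive as well.

If Beeline reports errors against HiveServer2 or cannot execute SQL, fix those problems first and then move on to the code.

Preparation

Create a new Maven Java application project and add the Hive dependency. Because the code is written as JUnit tests, add the JUnit dependency as well: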

<!--junit-->
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
        </dependency>
        <!--hive jdbc-->
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-jdbc</artifactId>
            <version>2.1.1</version>
        </dependency>

Writing the test class

The complete class is as follows:

package com.huatec.hive;

import org.junit.After;
import org.junit.Before;
import org.junit.Test;

import java.sql.*;
/**
 * Created by zhusheng on 2018/1/2.
 */
public class HiveJDBC {
    private static String driverName = "org.apache.hive.jdbc.HiveDriver";
    private static String url = "jdbc:hive2://huatec01:10000/hive_jdbc_test";
    private static String user = "root";
    private static String password = "root";

    private static Connection conn = null;
    private static Statement stmt = null;
    private static ResultSet rs = null;

    // Load the driver and open the connection
    @Before
    public void init() throws Exception {
        Class.forName(driverName);
        conn = DriverManager.getConnection(url,user,password);
        stmt = conn.createStatement();
    }

    // Create the database
    @Test
    public void createDatabase() throws Exception {
        String sql = "create database hive_jdbc_test";
        System.out.println("Running: " + sql);
        stmt.execute(sql);
    }

    // List all databases
    @Test
    public void showDatabases() throws Exception {
        String sql = "show databases";
        System.out.println("Running: " + sql);
        rs = stmt.executeQuery(sql);
        while (rs.next()) {
            System.out.println(rs.getString(1));
        }
    }

    // Create the table
    @Test
    public void createTable() throws Exception {
        String sql = "create table emp(\n" +
                "empno int,\n" +
                "ename string,\n" +
                "job string,\n" +
                "mgr int,\n" +
                "hiredate string,\n" +
                "sal double,\n" +
                "comm double,\n" +
                "deptno int\n" +
                ")\n" +
                "row format delimited fields terminated by '\\t'";
        System.out.println("Running: " + sql);
        stmt.execute(sql);
    }

    // List all tables
    @Test
    public void showTables() throws Exception {
        String sql = "show tables";
        System.out.println("Running: " + sql);
        rs = stmt.executeQuery(sql);
        while (rs.next()) {
            System.out.println(rs.getString(1));
        }
    }

    // Show the table structure
    @Test
    public void descTable() throws Exception {
        String sql = "desc emp";
        System.out.println("Running: " + sql);
        rs = stmt.executeQuery(sql);
        while (rs.next()) {
            System.out.println(rs.getString(1) + "\t" + rs.getString(2));
        }
    }

    // Load data from a local file
    @Test
    public void loadData() throws Exception {
        String filePath = "/home/hadoop/data/emp.txt";
        String sql = "load data local inpath '" + filePath + "' overwrite into table emp";
        System.out.println("Running: " + sql);
        stmt.execute(sql);
    }

    // Query the data
    @Test
    public void selectData() throws Exception {
        String sql = "select * from emp";
        System.out.println("Running: " + sql);
        rs = stmt.executeQuery(sql);
        System.out.println("Employee No." + "\t" + "Name" + "\t" + "Job");
        while (rs.next()) {
            System.out.println(rs.getString("empno") + "\t\t" + rs.getString("ename") + "\t\t" + rs.getString("job"));
        }
    }

    // Count query (runs a MapReduce job)
    @Test
    public void countData() throws Exception {
        String sql = "select count(1) from emp";
        System.out.println("Running: " + sql);
        rs = stmt.executeQuery(sql);
        while (rs.next()) {
            System.out.println(rs.getInt(1) );
        }
    }

    // Drop the database
    @Test
    public void dropDatabase() throws Exception {
        String sql = "drop database if exists hive_jdbc_test";
        System.out.println("Running: " + sql);
        stmt.execute(sql);
    }

    // Drop the table
    @Test
    public void dropTable() throws Exception {
        String sql = "drop table if exists emp";
        System.out.println("Running: " + sql);
        stmt.execute(sql);
    }

    // Release resources
    @After
    public void destroy() throws Exception {
        if ( rs != null) {
            rs.close();
        }
        if (stmt != null) {
            stmt.close();
        }
        if (conn != null) {
            conn.close();
        }
    }
}

Note that Hive ships with a single database, default, as the earlier Beeline session against HiveServer2 also showed. To work with the default database, the connection URL is:

private static String url = "jdbc:hive2://huatec01:10000/default";

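Here I wrote a test method that creates a database, and all the other SQL operations run against that database, so I changed the connection URL to point at the new database: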

private static String url = "jdbc:hive2://huatec01:10000/hive_jdbc_test";
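JUnit 4 does not guarantee the order in which test methods run, so the tests above work best when run one at a time. As an alternative, here is a minimal sketch of a standalone program that runs the setup statements in a fixed order. It connects to the default database first and then switches with use, on the assumption that hive_jdbc_test may not exist yet. The class name and the if not exists clauses are my additions; the host, credentials, and table definition are the ones from the test class. It needs hive-jdbc on the classpath and a running HiveServer2.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Arrays;
import java.util.List;

final class HiveJdbcOrderedDemo {
    // Connect to the default database, because hive_jdbc_test may not exist yet.
    static final String URL = "jdbc:hive2://huatec01:10000/default";

    /** Setup statements in the order they must run. */
    static List<String> setupStatements() {
        return Arrays.asList(
                "create database if not exists hive_jdbc_test",
                "use hive_jdbc_test",
                "create table if not exists emp(empno int, ename string, job string, mgr int, "
                        + "hiredate string, sal double, comm double, deptno int) "
                        + "row format delimited fields terminated by '\\t'");
    }

    /** Runs the setup statements, then lists the tables in the new database. */
    static void run(Connection conn) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            for (String sql : setupStatements()) {
                System.out.println("Running: " + sql);
                stmt.execute(sql);
            }
            try (ResultSet rs = stmt.executeQuery("show tables")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // try-with-resources closes the connection even if a statement fails.
        try (Connection conn = DriverManager.getConnection(URL, "root", "root")) {
            run(conn);
        }
    }
}
```

Using try-with-resources here replaces the @After cleanup method, and the fixed statement list makes the dependency between creating the database and creating the table explicit.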

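There are quite a few test methods, and all of them passed when I ran them locally. The createTable test serves as an example.

[Screenshot: createTable test result]

Author: Jusen. Link: https://www.jianshu.com/p/4ef28607fc04. Source: Jianshu. Copyright belongs to the author; for reproduction in any form, contact the author for permission and cite the source.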