Building a HAWQ data warehouse on [CentOS 7 + Ambari 2.7.0 + HDP 3.0], part 04: installing the HAWQ plugin PXF 3.3.0.0

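1. Install PXF 3.3.0.0. All of the PXF package files installed here are included in apache-hawq-rpm-2.3.0.0-incubating.tar.gz.
Run all of the following steps as root.
Note that the PXF plugin depends on the Tomcat service. You must use version 7.0.62 from the installation bundle and must not install or upgrade to Tomcat 8: doing so leaves the catalina.jar dependency at a mismatched version, and PXF then fails to start!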

Because the PXF packages are all built for el6 but I am running CentOS 7, I pass "--nodeps" to rpm so that it skips its dependency checks.

cd /opt/gpadmin/hawq_rpm_packages

rpm -ivh  apache-tomcat-7.0.62-el6.noarch.rpm 
rpm -ivh --nodeps pxf-service-3.3.0.0-1.el6.noarch.rpm
rpm -ivh --nodeps pxf-hdfs-3.3.0.0-1.el6.noarch.rpm 
rpm -ivh --nodeps pxf-hive-3.3.0.0-1.el6.noarch.rpm 
rpm -ivh --nodeps pxf-hbase-3.3.0.0-1.el6.noarch.rpm 
rpm -ivh --nodeps pxf-jdbc-3.3.0.0-1.el6.noarch.rpm 
rpm -ivh --nodeps pxf-json-3.3.0.0-1.el6.noarch.rpm
rpm -ivh --nodeps pxf-3.3.0.0-1.el6.noarch.rpm
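
Since --nodeps turns off rpm's own safety net, it may be worth checking that every package file is actually present before starting. The following is only a sketch: the file names are the ones installed above, while the function name missing_rpms is my own and not part of any tool.

```shell
#!/usr/bin/env bash
# Sketch: print the expected RPM files that are NOT present in a directory.
# missing_rpms is an illustrative helper name, not part of rpm or PXF.

missing_rpms() {
    local dir="$1" f
    for f in \
        apache-tomcat-7.0.62-el6.noarch.rpm \
        pxf-service-3.3.0.0-1.el6.noarch.rpm \
        pxf-hdfs-3.3.0.0-1.el6.noarch.rpm \
        pxf-hive-3.3.0.0-1.el6.noarch.rpm \
        pxf-hbase-3.3.0.0-1.el6.noarch.rpm \
        pxf-jdbc-3.3.0.0-1.el6.noarch.rpm \
        pxf-json-3.3.0.0-1.el6.noarch.rpm \
        pxf-3.3.0.0-1.el6.noarch.rpm
    do
        # Report only the files that do not exist.
        [ -f "$dir/$f" ] || echo "$f"
    done
}
```

Calling `missing_rpms /opt/gpadmin/hawq_rpm_packages` prints nothing when all eight files are in place.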

2. Configure PXF

1) Because Hadoop here is the HDP distribution, use the HDP-specific jar classpath configuration:

[root@ep-bd01 ~]# cp  /etc/pxf/conf/pxf-privatehdp.classpath  /etc/pxf/conf/pxf-private.classpath
[root@ep-bd01 ~]# cp  /etc/pxf/conf/pxf-profiles.xml   /etc/pxf/conf/pxf-profiles.default.xml

2) Change the owner of the PXF directories:

[root@ep-bd01 ~]# chown -R pxf:pxf /opt/pxf-3.3.0.0
[root@ep-bd01 ~]# chown -R pxf:pxf /tmp/logs

3) Create the symlinked directories:

[root@ep-bd01 ~]# ln -s /etc/pxf-3.3.0.0/conf /opt/pxf-3.3.0.0/conf
[root@ep-bd01 ~]# ln -s /usr/lib/pxf  /opt/pxf-3.3.0.0/lib

4) Create the directory and template file that the init step needs:

[root@ep-bd01 ~]# mkdir /opt/pxf-3.3.0.0/conf-templates
[root@ep-bd01 ~]# cp /opt/pxf/conf/pxf-privatehdp.classpath /opt/pxf/conf-templates/pxf-private-hdp.classpath.template

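5) Rename the pxf-service file to pxf, because the init step needs to create a directory with that same name (pxf-service). The link in the init.d directory has to be updated as well:

[root@ep-bd01 ~]# mv /opt/pxf/pxf-service  /opt/pxf/pxf
[root@ep-bd01 ~]# unlink /etc/init.d/pxf-service
[root@ep-bd01 ~]# ln -s /opt/pxf/pxf   /etc/init.d/pxf-service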

6) Change the permissions on the tomcat/conf directory, and create a link to Tomcat inside the PXF directory:

[root@ep-bd01 ~]# chmod 755 -R /opt/apache-tomcat/conf/
[root@ep-bd01 ~]# ln -s /opt/apache-tomcat /opt/pxf-3.3.0.0/apache-tomcat
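
Steps 3, 5 and 6 all rely on symlinks pointing at the right place, and a wrong link tends to surface only later as a confusing init or startup failure. As a rough sketch (check_link is a helper name I made up, not something PXF ships), the links can be verified like this:

```shell
#!/usr/bin/env bash
# Sketch: confirm that a path is a symlink and points at the expected target.
# check_link is an illustrative helper, not part of PXF.

check_link() {
    local link="$1" expected="$2" actual
    if [ ! -L "$link" ]; then
        echo "NOT A SYMLINK: $link"
        return 1
    fi
    # readlink prints the target exactly as it was given to ln -s.
    actual=$(readlink "$link")
    if [ "$actual" != "$expected" ]; then
        echo "MISMATCH: $link -> $actual (expected $expected)"
        return 1
    fi
    echo "OK: $link -> $expected"
}
```

For the links created above, the calls would be `check_link /opt/pxf-3.3.0.0/conf /etc/pxf-3.3.0.0/conf`, `check_link /opt/pxf-3.3.0.0/lib /usr/lib/pxf`, `check_link /etc/init.d/pxf-service /opt/pxf/pxf` and `check_link /opt/pxf-3.3.0.0/apache-tomcat /opt/apache-tomcat`.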

7) Edit /etc/pxf/conf/pxf-env.sh and change the values of PARENT_SCRIPT_DIR and LD_LIBRARY_PATH:

[root@ep-bd01 ~]# vim /etc/pxf/conf/pxf-env.sh
export PARENT_SCRIPT_DIR=/opt/pxf-3.3.0.0
export PXF_HOME=/opt/pxf-3.3.0.0
export LD_LIBRARY_PATH=/usr/hdp/current/hadoop-client/lib/native:${LD_LIBRARY_PATH}

8) Edit the pxf script file and set PXF_HOME:

[root@ep-bd01 ~]# vim /opt/pxf-3.3.0.0/pxf
export PARENT_SCRIPT_DIR=/opt/pxf-3.3.0.0
export PXF_HOME=/opt/pxf-3.3.0.0
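
Steps 7 and 8 set the same variables in two different files, so it is easy for them to drift apart. A minimal sketch for reading a value back out of either file follows; exported_value is a name I chose for illustration, and it only understands plain `export VAR=value` lines.

```shell
#!/usr/bin/env bash
# Sketch: print the value assigned by the last "export VAR=..." line in a file.
# exported_value is an illustrative helper; it does no shell expansion, so a
# value such as ...:${LD_LIBRARY_PATH} is printed literally.

exported_value() {
    local file="$1" var="$2"
    sed -n "s/^[[:space:]]*export[[:space:]]\{1,\}${var}=//p" "$file" | tail -n 1
}
```

Comparing `exported_value /etc/pxf/conf/pxf-env.sh PXF_HOME` with `exported_value /opt/pxf-3.3.0.0/pxf PXF_HOME` should give the same path, /opt/pxf-3.3.0.0, from both files.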

9) Edit /etc/pxf/conf/pxf-public.classpath and add the following list of jar files:

[root@ep-bd01 pxf-3.3.0.0]# vim /etc/pxf-3.3.0.0/conf/pxf-public.classpath
/usr/hdp/current/hadoop-client/lib/commons-beanutils-1.9.3.jar
/usr/hdp/current/hadoop-client/lib/commons-cli-1.2.jar
/usr/hdp/current/hadoop-client/lib/commons-codec-1.11.jar
/usr/hdp/current/hadoop-client/lib/commons-collections-3.2.2.jar
/usr/hdp/current/hadoop-client/lib/commons-compress-1.4.1.jar
/usr/hdp/current/hadoop-client/lib/commons-configuration2-2.1.1.jar
/usr/hdp/current/hadoop-client/lib/commons-io-2.5.jar
/usr/hdp/current/hadoop-client/lib/commons-lang-2.6.jar
/usr/hdp/current/hadoop-client/lib/commons-lang3-3.4.jar
/usr/hdp/current/hadoop-client/lib/commons-logging-1.1.3.jar
/usr/hdp/current/hadoop-client/lib/commons-math3-3.1.1.jar
/usr/hdp/current/hadoop-client/lib/commons-net-3.6.jar
/usr/hdp/current/hadoop-client/lib/jersey-core-1.19.jar
/usr/hdp/current/hadoop-client/lib/jersey-json-1.19.jar
/usr/hdp/current/hadoop-client/lib/jersey-server-1.19.jar
/usr/hdp/current/hadoop-client/lib/jersey-servlet-1.19.jar
/usr/hdp/current/hadoop-client/lib/jsr311-api-1.1.1.jar
/usr/hdp/current/hadoop-client/lib/woodstox-core-5.0.3.jar
/usr/hdp/current/hadoop-client/lib/stax2-api-3.1.4.jar
/usr/hdp/current/hadoop-client/lib/htrace-core4-4.1.0-incubating.jar
/usr/hdp/current/hadoop-client/lib/re2j-1.1.jar
/usr/hdp/3.0.0.0-1634/hbase/lib/atlas-hbase-plugin-impl/commons-configuration-1.10.jar
/usr/hdp/current/hadoop-hdfs-datanode/hadoop-hdfs-rbf.jar
/usr/hdp/current/hadoop-hdfs-datanode/hadoop-hdfs-nfs.jar
/usr/hdp/current/hadoop-hdfs-datanode/hadoop-hdfs-native-client.jar
/usr/hdp/current/hadoop-hdfs-datanode/hadoop-hdfs-httpfs.jar
/usr/hdp/current/hadoop-hdfs-datanode/hadoop-hdfs-client-3.1.0.3.0.0.0-1634.jar
/opt/pxf-3.3.0.0/lib/pxf-service-3.3.0.0.jar
/opt/pxf-3.3.0.0/lib/pxf-api-3.3.0.0.jar
:wq
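
Many of the entries above carry exact version numbers (for example 3.0.0.0-1634), so on a cluster with a different HDP build some of them may simply not exist. Here is a sketch that lists the entries with no matching file. It assumes one literal path per line, and it skips blank lines and lines starting with #; whether PXF itself treats # as a comment is my assumption, and wildcards are not expanded.

```shell
#!/usr/bin/env bash
# Sketch: print every classpath entry that does not exist on disk.
# Assumes one literal path per line; wildcard entries are NOT expanded.
# missing_classpath_entries is an illustrative helper, not part of PXF.

missing_classpath_entries() {
    local file="$1" line
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            ''|'#'*) continue ;;   # skip blank lines and comment lines
        esac
        [ -e "$line" ] || echo "$line"
    done < "$file"
}
```

Running `missing_classpath_entries /etc/pxf-3.3.0.0/conf/pxf-public.classpath` should print nothing if every jar listed is really there.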

10) Copy pxf-profiles.xml to pxf-profiles-default.xml, then edit it and add the profile configuration:

[root@ep-bd05 pxf-3.3.0.0]# cp /etc/pxf/conf/pxf-profiles.xml   /etc/pxf/conf/pxf-profiles-default.xml 
[root@ep-bd05 pxf-3.3.0.0]# vim /etc/pxf/conf/pxf-profiles-default.xml 
<profiles>
    <profile>
        <name>HdfsTextSimple</name>
        <description>This profile is suitable for using when reading delimited single line records from plain text files
            on HDFS
        </description>
        <plugins>
            <fragmenter>org.apache.hawq.pxf.plugins.hdfs.HdfsDataFragmenter</fragmenter>
            <accessor>org.apache.hawq.pxf.plugins.hdfs.LineBreakAccessor</accessor>
            <resolver>org.apache.hawq.pxf.plugins.hdfs.StringPassResolver</resolver>
        </plugins>
    </profile>
    <profile>
        <name>HdfsTextMulti</name>
        <description>This profile is suitable for using when reading delimited single or multi line records (with quoted
            linefeeds) from plain text files on HDFS. It is not splittable (non parallel) and slower than HdfsTextSimple.
        </description>
        <plugins>
            <fragmenter>org.apache.hawq.pxf.plugins.hdfs.HdfsDataFragmenter</fragmenter>
            <accessor>org.apache.hawq.pxf.plugins.hdfs.QuotedLineBreakAccessor</accessor>
            <resolver>org.apache.hawq.pxf.plugins.hdfs.StringPassResolver</resolver>
        </plugins>
    </profile>
</profiles>
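
To double-check which profile names ended up in the file after editing, a quick line-based listing is enough. This is only a sketch: it is a plain text match rather than a real XML parser, it assumes each <name> element sits on a single line as in the file above, and list_profiles is my own helper name.

```shell
#!/usr/bin/env bash
# Sketch: list the profile names found in a PXF profiles file.
# Plain text matching, not XML parsing; assumes <name>...</name> on one line.
# list_profiles is an illustrative helper, not part of PXF.

list_profiles() {
    sed -n 's:.*<name>\(.*\)</name>.*:\1:p' "$1"
}
```

`list_profiles /etc/pxf/conf/pxf-profiles-default.xml` prints one profile name per line, in file order.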

 

3. Initialize PXF (this must be done as the pxf user)

1) Set a password for the pxf user:

passwd pxf

 

2) Run the initialization as the pxf user:

[root@ep-bd03 pxf]# source /etc/pxf/conf/pxf-env.sh
[root@ep-bd03 pxf]# sudo -u pxf service pxf-service init


Generating /opt/pxf-3.3.0.0/conf/pxf-private.classpath file from /opt/pxf-3.3.0.0/conf-templates/pxf-private-hdp.classpath.template ...

cp /opt/pxf/pxf-service/webapps/pxf/WEB-INF/lib/*.jar  /opt/pxf/lib/

4. Start the PXF service

1) Start:

sudo -u pxf service pxf-service start

Checking if tomcat is up and running...
tomcat not responding, re-trying after 1 second (attempt number 1)
Checking if PXF webapp is up and running...
PXF webapp is listening on port 51200
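
The startup output above shows the service script retrying until Tomcat answers. The same idea can be reused in your own scripts to wait for PXF before running queries. The following is a generic retry sketch of my own, not the code the pxf service script actually uses.

```shell
#!/usr/bin/env bash
# Sketch: run a probe command up to N times, pausing between attempts.
# retry is an illustrative helper; it is not taken from the pxf service script.

retry() {
    local attempts="$1" delay="$2" i=1
    shift 2
    while [ "$i" -le "$attempts" ]; do
        # "$@" is the probe command; success ends the loop immediately.
        if "$@"; then
            return 0
        fi
        echo "not responding (attempt $i of $attempts)" >&2
        # Pause only when another attempt will follow.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}
```

For example, `retry 10 1 curl -s -o /dev/null http://localhost:51200/pxf/v0` waits up to about ten seconds for the webapp to answer. The host localhost is an assumption here; without -f, curl exits successfully on any HTTP response, so this only checks that something is answering on the port.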

2) Test:

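Use the PXF plugin to access data that was previously imported from Oracle into HDFS. The import used sqoop's --compress option, so the files are gzip-compressed, but HdfsTextSimple can read them directly. Below are the commands that create the HAWQ external table; note the asterisk in the path.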

drop external table ext.yx_bw;
create external table ext.yx_bw (occur_time date, ......    ) 
location ('pxf://192.168.58.15:51200/var/data/ext/yx_bw/*?profile=hdfstextsimple') format 'text'(delimiter ',' null '');
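
The location string packs four things into one URI: the PXF host, the port, the HDFS path and the profile. When generating such DDL from a script, it can help to assemble it in one place. This is a small sketch under that assumption; pxf_location is a hypothetical helper and covers only the profile option used here.

```shell
#!/usr/bin/env bash
# Sketch: assemble a pxf:// location URI from its parts.
# pxf_location is a hypothetical helper; only the "profile" option is handled.

pxf_location() {
    local host="$1" port="$2" path="$3" profile="$4"
    # Strip one leading slash from the path so exactly one separates port and path.
    printf 'pxf://%s:%s/%s?profile=%s\n' "$host" "$port" "${path#/}" "$profile"
}
```

Quoting the path argument, as in `pxf_location 192.168.58.15 51200 '/var/data/ext/yx_bw/*' hdfstextsimple`, keeps the shell from expanding the asterisk.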

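**Note**: for the host in the location I used the IP address directly. If I use the hostname instead, HAWQ fails to access the data. As far as I can tell, the name is not being resolved correctly, but I have not been able to solve the problem; if anyone knows the cause, I would be very grateful for advice. With location ('pxf://bd05:51200/var/data/ext/yx_bw/*?profile=hdfstextsimple') the external table can still be created, but querying it produces the error below with no further detail, and the PXF service log contains no record of the failed access either!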

epbd=> select * from ext.yx_bw;      
ERROR:  remote component error (0): (libchurl.c:897)

 

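Below are the system's /etc/hosts and /etc/host.conf files. Because this cluster can reach the external network, nslookup returns a wrong address, yet ping and curl both reach the right host.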

[root@ep-bd05 pg_log]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.58.11  ep-bd01 bd01
192.168.58.12  ep-bd02 bd02
192.168.58.13  ep-bd03 bd03
192.168.58.14  ep-bd04 bd04
192.168.58.15  ep-bd05 bd05
[root@ep-bd05 pg_log]# cat /etc/host.conf 
multi off
[root@ep-bd05 pg_log]# nslookup bd01
Server:         211.137.160.5
Address:        211.137.160.5#53

Non-authoritative answer:
Name:   bd01
Address: 211.137.170.246

[root@ep-bd05 pg_log]# ping bd01    
PING ep-bd01 (192.168.58.11) 56(84) bytes of data.
64 bytes from ep-bd01 (192.168.58.11): icmp_seq=1 ttl=64 time=0.156 ms
64 bytes from ep-bd01 (192.168.58.11): icmp_seq=2 ttl=64 time=0.160 ms
^C
--- ep-bd01 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.156/0.158/0.160/0.002 ms
[root@ep-bd05 pg_log]# curl http://bd01:51200/pxf/v0
Wrong version v0, supported version is v15
[root@ep-bd05 pg_log]#
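
One thing worth keeping in mind when reading the output above: nslookup sends its query straight to the DNS server and never consults /etc/hosts, whereas ping and curl go through the system resolver, which normally reads /etc/hosts first. That accounts for the different answers, though not for why HAWQ itself fails. As a sketch for checking what the hosts file alone says about a name (hosts_lookup is my own helper, not a system tool):

```shell
#!/usr/bin/env bash
# Sketch: print the first address that a hosts-format file maps a name to.
# hosts_lookup is an illustrative helper; it reads only the given file and
# performs no DNS query.

hosts_lookup() {
    local file="$1" name="$2"
    awk -v name="$name" '
        { sub(/#.*/, "") }                 # drop comments, full-line or trailing
        {
            # Field 1 is the address; every later field is a name or alias.
            for (i = 2; i <= NF; i++) {
                if ($i == name) { print $1; exit }
            }
        }
    ' "$file"
}
```

With the file shown above, `hosts_lookup /etc/hosts bd01` prints 192.168.58.11. On the same machine, `getent hosts bd01` shows what the system resolver returns, which is the more relevant comparison for programs that do not query DNS directly.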

 

5. Optional steps:

1) Edit /opt/pxf/pxf-service/conf/catalina.properties and change base.shutdown.port:

#base.shutdown.port=-1
base.shutdown.port=8005

 

2) Edit /opt/pxf/pxf-service/conf/tomcat-users.xml and give the tomcat user the manager-gui role, so that the webapps can be managed from a browser:

[root@ep-bd01 ~]# vim /opt/pxf/pxf-service/conf/tomcat-users.xml
<role rolename="tomcat"/>
<role rolename="manager-gui"/>
<user username="tomcat" password="tomcat" roles="tomcat,manager-gui"/>
<user username="both" password="tomcat" roles="tomcat,role1"/>
<user username="role1" password="tomcat" roles="role1"/>