Spring Hadoop, as described on the official project site:

Spring Hadoop simplifies Apache Hadoop development by providing a unified configuration model and easy-to-use APIs for working with HDFS, MapReduce, Pig, and Hive. It also integrates with other Spring ecosystem projects such as Spring Integration and Spring Batch.
The Spring Hadoop 2.5 reference documentation and API docs are available at:
https://docs.spring.io/spring-hadoop/docs/2.5.0.RELEASE/reference/html/
https://docs.spring.io/spring-hadoop/docs/2.5.0.RELEASE/api/
Create a Maven project and configure its pom.xml with the following dependencies:
<repositories>
    <repository>
        <id>cloudera</id>
        <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
        <releases>
            <enabled>true</enabled>
        </releases>
        <snapshots>
            <enabled>false</enabled>
        </snapshots>
    </repository>
</repositories>

<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <hadoop.version>2.6.0-cdh5.7.0</hadoop.version>
</properties>

<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>${hadoop.version}</version>
        <scope>provided</scope>
    </dependency>
    <!-- UserAgent parsing dependency -->
    <dependency>
        <groupId>com.kumkee</groupId>
        <artifactId>UserAgentParser</artifactId>
        <version>0.0.1</version>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.10</version>
        <scope>test</scope>
    </dependency>
    <!-- Spring Hadoop dependency -->
    <dependency>
        <groupId>org.springframework.data</groupId>
        <artifactId>spring-data-hadoop</artifactId>
        <version>2.5.0.RELEASE</version>
    </dependency>
</dependencies>

<!-- mvn assembly:assembly -->
<build>
    <plugins>
        <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
                <archive>
                    <manifest>
                        <mainClass></mainClass>
                    </manifest>
                </archive>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
        </plugin>
    </plugins>
</build>
</project>
Create a resources directory in the project and add a Spring configuration file to it (the file name is up to you; the test class below loads it as beans.xml). Add the following content:
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:hdp="http://www.springframework.org/schema/hadoop"
       xmlns:context="http://www.springframework.org/schema/context"
       xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
       http://www.springframework.org/schema/hadoop http://www.springframework.org/schema/hadoop/spring-hadoop.xsd
       http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context.xsd">

    <!-- Load the property file -->
    <context:property-placeholder location="application.properties"/>

    <hdp:configuration id="hadoopConfiguration">
        <!-- The server (NameNode) URI -->
        fs.defaultFS=${spring.hadoop.fsUri}
    </hdp:configuration>

    <!-- Wire up the FileSystem bean and the user it operates as -->
    <hdp:file-system id="fileSystem" configuration-ref="hadoopConfiguration" user="root"/>
</beans>
Then create a properties file named application.properties (this file name can also be customized) and put configuration values that are likely to change in it. Here, only the HDFS server URI is externalized:
spring.hadoop.fsUri=hdfs://192.168.77.128:8020
With these steps done, the Spring Hadoop development environment is ready; Maven really does make this convenient.
Next, create a test class to verify that we can operate on the HDFS file system:
package org.zero01.spring;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import org.springframework.context.ApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;

import java.io.IOException;

/**
 * @program: hadoop-train
 * @description: Access the HDFS file system via Spring Hadoop
 * @author: 01
 * @create: 2018-04-04 17:39
 **/
public class SpringHadoopApp {

    private ApplicationContext ctx;
    private FileSystem fileSystem;

    @Before
    public void setUp() {
        ctx = new ClassPathXmlApplicationContext("beans.xml");
        fileSystem = (FileSystem) ctx.getBean("fileSystem");
    }

    @After
    public void tearDown() throws IOException {
        ctx = null;
        fileSystem.close();
    }

    /**
     * Create a directory on HDFS
     * @throws Exception
     */
    @Test
    public void testMkdirs() throws Exception {
        fileSystem.mkdirs(new Path("/SpringHDFS/"));
    }
}
The code above runs successfully. Now check on the server whether the SpringHDFS directory exists under the root directory:
[root@hadoop000 ~]# hdfs dfs -ls /
Found 7 items
-rw-r--r--   3 root supergroup    2769741 2018-04-02 21:13 /10000_access.log
drwxr-xr-x   - root supergroup          0 2018-04-04 17:50 /SpringHDFS
drwxr-xr-x   - root supergroup          0 2018-04-02 21:22 /browserout
drwxr-xr-x   - root supergroup          0 2018-04-02 20:29 /data
drwxr-xr-x   - root supergroup          0 2018-04-02 20:31 /logs
drwx------   - root supergroup          0 2018-04-02 20:39 /tmp
drwxr-xr-x   - root supergroup          0 2018-04-02 20:39 /user
[root@hadoop000 ~]# hdfs dfs -ls /SpringHDFS
[root@hadoop000 ~]#
As you can see, the SpringHDFS directory was created successfully, which confirms the project is configured correctly.
Since creating a directory works, let's write another test method that reads the contents of a file on HDFS:
/**
 * Read the contents of a file on HDFS
 * (requires imports for org.apache.hadoop.fs.FSDataInputStream and org.apache.hadoop.io.IOUtils)
 * @throws Exception
 */
@Test
public void testText() throws Exception {
    FSDataInputStream in = fileSystem.open(new Path("/browserout/part-r-00000"));
    IOUtils.copyBytes(in, System.out, 1024);
    in.close();
}
This code also runs successfully, and the console prints the following:
Chrome	2775
Firefox	327
MSIE	78
Safari	115
Unknown	6705
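Only the mkdir and read paths are exercised above; writing goes through the same fileSystem bean. Below is a minimal sketch of a matching write test (the target path /SpringHDFS/hello.txt and the content are made up for illustration, and org.apache.hadoop.fs.FSDataOutputStream needs an extra import):

/**
 * Write a small file to HDFS (illustrative sketch)
 * @throws Exception
 */
@Test
public void testWrite() throws Exception {
    // create() returns an output stream to the (hypothetical) target path
    FSDataOutputStream out = fileSystem.create(new Path("/SpringHDFS/hello.txt"));
    out.write("hello spring hadoop".getBytes("UTF-8"));
    out.close();
}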
With both reads and writes working, we can happily use Spring Hadoop to simplify development in our projects.
That covers accessing HDFS with Spring Hadoop. Next is a quick look at accessing HDFS with Spring Boot, which is even simpler.
First, add the Spring Boot dependency to pom.xml:
<!-- Spring Hadoop Boot dependency -->
<dependency>
    <groupId>org.springframework.data</groupId>
    <artifactId>spring-data-hadoop-boot</artifactId>
    <version>2.5.0.RELEASE</version>
</dependency>
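With this starter on the classpath, Spring Boot is expected to auto-configure the Hadoop beans (including FsShell) from the spring.hadoop.fsUri property in the application.properties file shown earlier, so no XML configuration should be needed; treat this as an assumption to verify against the 2.5.0 reference documentation. Then create the application class: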
package org.zero01.spring;

import org.apache.hadoop.fs.FileStatus;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.data.hadoop.fs.FsShell;

/**
 * @program: hadoop-train
 * @description: Access HDFS via Spring Boot
 * @author: 01
 * @create: 2018-04-04 18:45
 **/
@SpringBootApplication
public class SpringBootHDFSApp implements CommandLineRunner {

    @Autowired
    FsShell fsShell;  // object used to execute hdfs shell commands

    @Override
    public void run(String... strings) throws Exception {
        // List all files under the root directory
        for (FileStatus fileStatus : fsShell.ls("/")) {
            System.out.println("> " + fileStatus.getPath());
        }
    }

    public static void main(String[] args) {
        SpringApplication.run(SpringBootHDFSApp.class, args);
    }
}
The console output is as follows:
> hdfs://192.168.77.128:8020/
> hdfs://192.168.77.128:8020/10000_access.log
> hdfs://192.168.77.128:8020/SpringHDFS
> hdfs://192.168.77.128:8020/browserout
> hdfs://192.168.77.128:8020/data
> hdfs://192.168.77.128:8020/logs
> hdfs://192.168.77.128:8020/tmp
> hdfs://192.168.77.128:8020/user
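Beyond ls, FsShell mirrors most of the familiar hdfs dfs commands (mkdir, test, copyFromLocal, rm, and so on, per the Spring Hadoop reference documentation). As a rough sketch, the run method could be extended like this; the /springboot-demo directory and data/local.txt file are made up, and the exact method signatures should be checked against the FsShell javadoc:

// Hypothetical additions to run(): create a directory if missing, then upload a local file
String dir = "/springboot-demo";               // illustrative HDFS path
if (!fsShell.test(dir)) {                      // does the path already exist?
    fsShell.mkdir(dir);                        // create it
}
fsShell.copyFromLocal("data/local.txt", dir);  // copy a local file into the directory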