Big Data Technology: Principles and Applications - Lab 2: Getting Familiar with Common HDFS Operations

Date: 2019-11-09


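1. Objectives

(1) Understand the role of HDFS in the Hadoop architecture. (2) Become proficient with the common shell commands for HDFS operations. (3) Become familiar with the common Java API for HDFS operations.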

2. Platform

Operating system: Linux. Hadoop version: 2.7.3 or later. JDK version: 1.7 or later. Java IDE: IDEA.

3. Content and Requirements

(1) Implement the following functions in code, and complete the same tasks with the shell commands provided by Hadoop.

1. Upload any text file to HDFS. If the specified file already exists in HDFS, let the user choose whether to append to the end of the existing file or to overwrite it.

Shell:

hadoop fs -put /User/Binguner/Desktop/test.txt /test
hadoop fs -appendToFile /User/Binguner/Desktop/test.txt /test/test.txt
hadoop fs -copyFromLocal -f /User/Binguner/Desktop/test.txt /input/test.txt

    /**
     * @param fileSystem HDFS file system handle
     * @param srcPath    local file path
     * @param desPath    destination file path in HDFS
     */
    private static void test1(FileSystem fileSystem, Path srcPath, Path desPath) {
        try {
            if (fileSystem.exists(desPath)) {
                System.out.println("Do you want to overwrite the existing file? ( y / n )");
                if (new Scanner(System.in).next().equals("y")) {
                    // delSrc = false, overwrite = true
                    fileSystem.copyFromLocalFile(false, true, srcPath, desPath);
                } else {
                    // Append the local file to the end of the existing HDFS file;
                    // try-with-resources closes both streams even if the copy fails
                    try (FileInputStream inputStream = new FileInputStream(srcPath.toString());
                         FSDataOutputStream outputStream = fileSystem.append(desPath)) {
                        byte[] bytes = new byte[1024];
                        int read;
                        while ((read = inputStream.read(bytes)) != -1) {
                            outputStream.write(bytes, 0, read);
                        }
                    }
                }
            } else {
                fileSystem.copyFromLocalFile(srcPath, desPath);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

[Screenshots: original file list in HDFS; first run; file list in HDFS after the first run; second run; HDFS directory after the second run]

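2. Download a specified file from HDFS. If a local file with the same name as the file being downloaded already exists, rename the downloaded file automatically.

Shell:

hadoop fs -copyToLocal /input/test.txt /User/binguner/Desktop/test.txt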

    /**
     * @param fileSystem HDFS file system handle
     * @param remotePath path of the file in HDFS
     * @param localPath  local path where the file should be saved
     */
    private static void test2(FileSystem fileSystem, Path remotePath, Path localPath) {
        try {
            if (!fileSystem.exists(remotePath)) {
                System.out.println("Can't find this file in HDFS!");
                return;
            }
            Path target = localPath;
            // Check the local file explicitly instead of waiting for an exception,
            // and pick a new name when the requested one is already taken
            if (new java.io.File(localPath.toString()).exists()) {
                target = new Path("src/test" + new Random().nextInt() + ".txt");
                System.out.println(localPath + " already exists, saving as " + target);
            }
            fileSystem.copyToLocalFile(remotePath, target);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

[Screenshots: local directory before running; first run; second run]
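
The random suffix used for the rename could be replaced by a predictable one. The following is a minimal sketch of that idea using only the JDK (Java 8 or later); `LocalNames` and `nonColliding` are hypothetical names, not part of Hadoop, and `Path` here is `java.nio.file.Path`, not the Hadoop class. The existence check is passed in as a predicate so the naming logic can be tried without touching the disk.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical helper (not part of Hadoop): picks a local file name that is still free.
final class LocalNames {

    // Returns `wanted` if it is free, otherwise "name(1).ext", "name(2).ext", ...
    static Path nonColliding(Path wanted, Predicate<Path> exists) {
        if (!exists.test(wanted)) {
            return wanted;
        }
        String fileName = wanted.getFileName().toString();
        int dot = fileName.lastIndexOf('.');
        String base = dot > 0 ? fileName.substring(0, dot) : fileName;
        String ext = dot > 0 ? fileName.substring(dot) : "";
        for (int i = 1; ; i++) {
            Path candidate = wanted.resolveSibling(base + "(" + i + ")" + ext);
            if (!exists.test(candidate)) {
                return candidate;
            }
        }
    }

    // Convenience overload that checks the real local file system.
    static Path nonColliding(Path wanted) {
        return nonColliding(wanted, Files::exists);
    }

    public static void main(String[] args) {
        // Simulate a directory that already contains test.txt
        Set<Path> taken = new HashSet<>();
        taken.add(Paths.get("Desktop", "test.txt"));
        Path free = nonColliding(Paths.get("Desktop", "test.txt"), taken::contains);
        System.out.println(free.getFileName()); // test(1).txt
    }
}
```

In the download method, the string form of the returned path could be wrapped in a Hadoop `Path` and passed to `copyToLocalFile`.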

3. Print the content of a specified HDFS file to the terminal.

Shell:

hadoop fs -cat /test/test.txt

    /**
     * @param fileSystem HDFS file system handle
     * @param remotePath path of the file to print
     */
    private static void test3(FileSystem fileSystem, Path remotePath) {
        // try-with-resources closes the reader and the underlying HDFS stream
        try (FSDataInputStream inputStream = fileSystem.open(remotePath);
             BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(inputStream))) {
            String line;
            while ((line = bufferedReader.readLine()) != null) {
                System.out.println(line);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

[Screenshot: program output]

4. Display the read/write permissions, size, creation time, path, and other information of a specified file in HDFS.

Shell:

hadoop fs -ls -h /test/test.txt

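    /**
     * @param fileSystem HDFS file system handle
     * @param remotePath path of the target file
     */
    private static void test4(FileSystem fileSystem, Path remotePath) {
        try {
            FileStatus[] fileStatus = fileSystem.listStatus(remotePath);
            for (FileStatus status : fileStatus) {
                System.out.println(status.getPermission());
                // File length in bytes (getBlockSize() would return the block size instead)
                System.out.println(status.getLen());
                // FileStatus exposes modification and access times rather than a creation time;
                // the value is in milliseconds since the epoch
                System.out.println(status.getModificationTime());
                System.out.println(status.getPath());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

[Screenshot: program output]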
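
The values printed by the method above are raw numbers: `getLen()` returns a byte count and `getModificationTime()` returns milliseconds since the epoch. As a rough sketch of how they could be made readable, in the spirit of the `-h` flag, the helper below uses only the JDK (Java 8 or later). `StatusFormat` and its methods are hypothetical names, and the output format is an assumption rather than an exact copy of what `hadoop fs -ls -h` prints.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper (not part of Hadoop) for printing FileStatus values readably.
final class StatusFormat {

    // Times are rendered in UTC so the output does not depend on the machine's zone.
    private static final DateTimeFormatter TIME =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm").withZone(ZoneOffset.UTC);

    // Formats a byte count using binary units (1 K = 1024 bytes).
    static String humanSize(long bytes) {
        String[] units = {"B", "K", "M", "G", "T"};
        double value = bytes;
        int unit = 0;
        while (value >= 1024 && unit < units.length - 1) {
            value /= 1024;
            unit++;
        }
        return unit == 0
                ? bytes + " B"
                : String.format(Locale.ROOT, "%.1f %s", value, units[unit]);
    }

    // Formats a timestamp given in milliseconds since the epoch.
    static String formatTime(long epochMillis) {
        return TIME.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        System.out.println(humanSize(1536));              // 1.5 K
        System.out.println(formatTime(1573257600000L));   // 2019-11-09 00:00
    }
}
```

Inside the loop, `StatusFormat.humanSize(status.getLen())` and `StatusFormat.formatTime(status.getModificationTime())` could replace the raw values.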

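5. Given a directory in HDFS, output the read/write permissions, size, creation time, path, and other information of all files in it. If an entry is itself a directory, recursively output the information of all files under it.

Shell:

hadoop fs -ls -R -h /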

    /**
     * @param fileSystem HDFS file system handle
     * @param remotePath path of the directory to list
     */
    private static void test5(FileSystem fileSystem, Path remotePath) {
        try {
            // recursive = true: descend into subdirectories and return every file below remotePath
            RemoteIterator<LocatedFileStatus> iterator = fileSystem.listFiles(remotePath, true);
            while (iterator.hasNext()) {
                FileStatus status = iterator.next();
                System.out.println(status.getPath());
                System.out.println(status.getPermission());
                System.out.println(status.getLen());
                System.out.println(status.getModificationTime());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

[Screenshot: program output]

6. Given the path of a file in HDFS, create or delete that file. If the directory containing the file does not exist, create it automatically.

Shell:

hadoop fs -mkdir -p /test
hadoop fs -touchz /test/test.txt
hadoop fs -rm /test/test.txt

    /**
     * @param fileSystem     HDFS file system handle
     * @param remoteDirPath  path of the directory that contains the file
     * @param remoteFilePath path of the file to create or delete
     */
    private static void test6(FileSystem fileSystem, Path remoteDirPath, Path remoteFilePath) {
        try {
            if (fileSystem.exists(remoteDirPath)) {
                System.out.println("Please choose your option: 1.create. 2.delete");
                int i = new Scanner(System.in).nextInt();
                switch (i) {
                    case 1:
                        // create() returns an output stream; close it to finish the empty file
                        fileSystem.create(remoteFilePath).close();
                        break;
                    case 2:
                        // Delete the file itself, not its parent directory
                        fileSystem.delete(remoteFilePath, false);
                        break;
                }
            } else {
                fileSystem.mkdirs(remoteDirPath);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

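[Screenshots: before the first run; first run; directory created automatically after the first run; second run, choosing to create the file; third run, choosing to delete the file]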

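7. Given the path of a directory in HDFS, create or delete that directory. When creating it, automatically create any missing parent directories; when deleting it, let the user decide whether a non-empty directory should still be deleted.

Shell:

hadoop fs -mkdir -p /test
hadoop fs -rmdir /test
hadoop fs -rm -R /test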

    /**
     * @param fileSystem HDFS file system handle
     * @param remotePath path of the target directory
     */
    private static void test7(FileSystem fileSystem, Path remotePath) {
        try {
            if (!fileSystem.exists(remotePath)) {
                System.out.println("Can't find this path, the path will be created automatically");
                fileSystem.mkdirs(remotePath); // also creates missing parent directories
                return;
            }
            // One Scanner for both prompts, so no input is lost in a discarded buffer
            Scanner scanner = new Scanner(System.in);
            System.out.println("Do you want to delete this dir? ( y / n )");
            if (!scanner.next().equals("y")) {
                return;
            }
            if (fileSystem.listStatus(remotePath).length != 0) {
                System.out.println("This directory is not empty, are you sure you want to delete everything in it? ( y / n )");
                if (!scanner.next().equals("y")) {
                    return; // keep the non-empty directory
                }
            }
            if (fileSystem.delete(remotePath, true)) {
                System.out.println("Delete successful");
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

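[Screenshots: HDFS file list before running; first run (deleting all files); HDFS file list afterwards; running the program again, the directory is created automatically]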

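8. Add content to a specified file in HDFS, letting the user choose whether it goes at the beginning or at the end of the existing file.

Shell:

# Beginning: download the HDFS file, place its content after the local content, upload with overwrite
hadoop fs -get text.txt
cat text.txt >> local.txt
hadoop fs -copyFromLocal -f local.txt text.txt
# End: append the local file directly
hadoop fs -appendToFile local.txt text.txt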

    /**
     * @param fileSystem HDFS file system handle
     * @param remotePath path of the file in HDFS
     * @param localPath  local file whose content is added
     */
    private static void test8(FileSystem fileSystem, Path remotePath, Path localPath) {
        try {
            if (!fileSystem.exists(remotePath)) {
                System.out.println("Can't find this file");
                return;
            }
            System.out.println("Input 1 to add the content to the start of the remote file, 2 to add it to the end");
            byte[] bytes = new byte[1024];
            int read;
            switch (new Scanner(System.in).nextInt()) {
                case 1:
                    // Prepending is done by rewriting the file: move the old content to a local
                    // temporary file (the ".old" suffix is an arbitrary choice), then write the
                    // new content followed by the old content.
                    Path oldContent = new Path(localPath.toString() + ".old");
                    fileSystem.moveToLocalFile(remotePath, oldContent);
                    try (FSDataOutputStream out = fileSystem.create(remotePath);
                         FileInputStream added = new FileInputStream(localPath.toString());
                         FileInputStream old = new FileInputStream(oldContent.toString())) {
                        while ((read = added.read(bytes)) != -1) {
                            out.write(bytes, 0, read);
                        }
                        while ((read = old.read(bytes)) != -1) {
                            out.write(bytes, 0, read);
                        }
                    }
                    break;
                case 2:
                    // Appending to the end is supported directly by HDFS
                    try (FSDataOutputStream out = fileSystem.append(remotePath);
                         FileInputStream added = new FileInputStream(localPath.toString())) {
                        while ((read = added.read(bytes)) != -1) {
                            out.write(bytes, 0, read);
                        }
                    }
                    break;
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

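[Screenshots: file content in HDFS before running; first run, content added to the beginning of the existing file; second run, content added to the end of the existing file]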
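
Adding content at the beginning comes down to writing the new bytes first and the old bytes second into a freshly created file. The sketch below shows only that ordering, using `java.io.SequenceInputStream` from the JDK (Java 8 or later). The in-memory streams are stand-ins for the local file and the downloaded HDFS copy, and `PrependSketch` is a hypothetical name; in the Hadoop version the bytes would go to the stream returned by `fileSystem.create(remotePath)` instead of a byte array.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: "prepend" expressed as "new content, then old content".
final class PrependSketch {

    // Copies everything from `first`, then everything from `second`, into one byte array.
    static byte[] concat(InputStream first, InputStream second) {
        ByteArrayOutputStream target = new ByteArrayOutputStream();
        // SequenceInputStream switches to `second` once `first` is exhausted
        try (InputStream joined = new SequenceInputStream(first, second)) {
            byte[] buffer = new byte[1024];
            int read;
            while ((read = joined.read(buffer)) != -1) {
                target.write(buffer, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return target.toByteArray();
    }

    public static void main(String[] args) {
        InputStream added = new ByteArrayInputStream("new line\n".getBytes(StandardCharsets.UTF_8));
        InputStream old = new ByteArrayInputStream("old line\n".getBytes(StandardCharsets.UTF_8));
        // Prints "new line" followed by "old line"
        System.out.print(new String(concat(added, old), StandardCharsets.UTF_8));
    }
}
```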

9. Delete a specified file in HDFS.

Shell:

hadoop fs -rm /test/test.txt

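    /**
     * @param fileSystem HDFS file system handle
     * @param remotePath path of the file to delete
     */
    private static void test9(FileSystem fileSystem, Path remotePath) {
        try {
            // recursive = false: the target is a single file
            if (fileSystem.delete(remotePath, false)) {
                System.out.println("Delete success");
            } else {
                System.out.println("Delete failed");
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

[Screenshots: original HDFS directory structure; after the delete]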

10. Move a file in HDFS from a source path to a destination path.

Shell:

hadoop fs -mv /test/test.txt /test2

    /**
     * @param fileSystem    HDFS file system handle
     * @param oldRemotePath current path of the file
     * @param newRemotePath new path of the file
     */
    private static void test10(FileSystem fileSystem, Path oldRemotePath, Path newRemotePath) {
        try {
            // In HDFS a move is a rename of the path
            if (fileSystem.rename(oldRemotePath, newRemotePath)) {
                System.out.println("Rename success");
            } else {
                System.out.println("Rename failed");
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

[Screenshots: original file name; after the rename]

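(2) Implement a class `MyFSDataInputStream` that extends `org.apache.hadoop.fs.FSDataInputStream`, with the following requirements:

Implement a method readLine() that reads a specified HDFS file line by line; it returns null at the end of the file and otherwise returns one line of text.
Implement caching: when MyFSDataInputStream is used to read some bytes, look in the cache first; if the cache holds the required data, serve it directly from the cache, otherwise read the data from HDFS.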

import org.apache.hadoop.fs.FSDataInputStream;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

public class MyFSDataInputStream extends FSDataInputStream {

    // The reader's internal buffer acts as the cache: lines are served from it, and the
    // underlying HDFS stream is read only when the buffer has been used up.
    private final BufferedReader bufferedReader;

    /**
     * @param in stream returned by FileSystem.open(); FSDataInputStream requires it
     *           to implement Seekable and PositionedReadable
     */
    public MyFSDataInputStream(InputStream in) {
        super(in);
        bufferedReader = new BufferedReader(new InputStreamReader(in));
    }

    /**
     * Named readline because DataInputStream.readLine() is final and cannot be overridden.
     *
     * @return the next line of text, or null at the end of the file
     */
    public String readline() {
        try {
            // The stream stays open so that later calls return the following lines
            return bufferedReader.readLine();
        } catch (IOException e) {
            e.printStackTrace();
            return null;
        }
    }

}

[Screenshot: program output]
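
The caching requirement ("look in the cache first, go to the source only on a miss") can also be written out by hand instead of delegating to `BufferedReader`. The following is a minimal sketch of that read-through buffer over a plain `java.io.InputStream`, so it runs without a cluster (Java 8 or later). `CachedLineReader` is a hypothetical name; the sketch assumes UTF-8 text, a cache size greater than zero, and lines separated by `\n`. Against HDFS, the stream returned by `FileSystem.open()` would take the place of the in-memory stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a read-through cache in front of an InputStream.
final class CachedLineReader {
    private final InputStream source;
    private final byte[] cache;
    private int pos;    // index of the next unread byte in the cache
    private int limit;  // number of valid bytes currently in the cache

    CachedLineReader(InputStream source, int cacheSize) {
        this.source = source;
        this.cache = new byte[cacheSize];
    }

    // Serves the next byte from the cache; reads from the source only when the cache is empty.
    private int nextByte() throws IOException {
        if (pos == limit) {
            int filled = source.read(cache, 0, cache.length);
            pos = 0;
            limit = Math.max(filled, 0);
            if (filled <= 0) {
                return -1; // end of the stream
            }
        }
        return cache[pos++] & 0xFF;
    }

    // Returns the next line without its '\n', or null at the end of the stream.
    String readLine() throws IOException {
        int b = nextByte();
        if (b == -1) {
            return null;
        }
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        while (b != -1 && b != '\n') {
            line.write(b);
            b = nextByte();
        }
        return new String(line.toByteArray(), StandardCharsets.UTF_8);
    }

    // Convenience wrapper that reads every line and rethrows I/O errors unchecked.
    static List<String> readAllLines(InputStream source, int cacheSize) {
        CachedLineReader reader = new CachedLineReader(source, cacheSize);
        List<String> lines = new ArrayList<>();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] data = "first\nsecond\nthird".getBytes(StandardCharsets.UTF_8);
        // A deliberately tiny cache, so lines span several refills
        System.out.println(readAllLines(new ByteArrayInputStream(data), 4));
    }
}
```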

(3) Consulting the Java documentation or other references, use `java.net.URL` and `org.apache.hadoop.fs.FsUrlStreamHandlerFactory` to print the text of a specified HDFS file to the terminal.

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;
import org.apache.hadoop.fs.Path;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;

public class ShowTheContent {

    static {
        // The JVM accepts a stream handler factory only once, so register it when the
        // class is loaded rather than on every call to show()
        URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
    }

    private Path remotePath;
    private FileSystem fileSystem;

    public ShowTheContent(FileSystem fileSystem, Path remotePath) {
        this.fileSystem = fileSystem;
        this.remotePath = remotePath;
    }

    public void show() {
        // localhost:9000 is the NameNode address used in this setup; adjust it to match fs.defaultFS
        try (InputStream inputStream = new URL("hdfs", "localhost", 9000, remotePath.toString()).openStream();
             BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(inputStream))) {
            String line;
            while ((line = bufferedReader.readLine()) != null) {
                System.out.println(line);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

}

[Screenshot: program output]
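
`FsUrlStreamHandlerFactory` plugs into a general JDK mechanism: `java.net.URL` asks the registered `URLStreamHandlerFactory` for a handler whenever it meets a scheme such as `hdfs`. The sketch below illustrates that mechanism with the JDK alone (Java 8 or later; newer JDKs report `new URL(String)` as deprecated). The `demo` scheme, the `DemoUrlHandler` class, and the canned text it serves are all made up for illustration and have nothing to do with Hadoop's implementation.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.net.URLStreamHandlerFactory;
import java.nio.charset.StandardCharsets;

// Hypothetical example: a handler factory for a made-up "demo" scheme.
final class DemoUrlHandler {

    static final class Factory implements URLStreamHandlerFactory {
        @Override
        public URLStreamHandler createURLStreamHandler(String protocol) {
            if (!"demo".equals(protocol)) {
                return null; // let the JDK handle http, file, and so on
            }
            return new URLStreamHandler() {
                @Override
                protected URLConnection openConnection(URL url) {
                    return new URLConnection(url) {
                        @Override
                        public void connect() {
                            // nothing to connect to: the content is generated in memory
                        }

                        @Override
                        public InputStream getInputStream() {
                            String text = "content of " + getURL().getPath() + "\n";
                            return new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
                        }
                    };
                }
            };
        }
    }

    static {
        // Allowed only once per JVM; a second call throws an Error
        URL.setURLStreamHandlerFactory(new Factory());
    }

    // Opens the URL through the registered factory and returns its text.
    static String read(String spec) {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new URL(spec).openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (text.length() > 0) {
                    text.append('\n');
                }
                text.append(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(read("demo://localhost:9000/test/test.txt"));
    }
}
```

Because the factory can be installed only once per JVM, registering it in a static initializer keeps repeated reads from failing.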
