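1、Basics
1. Be familiar with Java and the Linux operating system.
2. To go further, your English needs to reach a certain level so that you can read technical sites from abroad, e.g. http://hadoop.apache.org/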
2、What is Hadoop
Quoted from the official site, followed by a rough paraphrase:
The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.
The project includes these modules:
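In short, Hadoop is a reliable, open-source software system for distributed computing.
The Hadoop framework uses simple programming models to support distributed processing of large data sets across a cluster. It scales from a single machine up to clusters of thousands of servers, each of which provides local computation and storage. Unlike earlier systems that rely on hardware to guarantee high availability, Hadoop detects and handles failures at the application layer, so a highly available service is delivered on top of a cluster whose individual machines may fail.
Hadoop consists mainly of the following modules: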
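The idea of handling failures at the application layer can be sketched in a few lines. This is a toy illustration only, not Hadoop's actual mechanism: the names `NodeFailure`, `run_on_node`, and `run_with_failover`, and the `healthy` flag on each node, are all hypothetical and exist only for this example.

```python
class NodeFailure(Exception):
    """Signals that a simulated node could not finish its task."""


def run_on_node(node, task):
    """Run the task on one simulated node; fail if the node is marked unhealthy."""
    if not node["healthy"]:
        raise NodeFailure(node["name"])
    return task()


def run_with_failover(nodes, task):
    """Try each node in order and return (node_name, result) from the first success."""
    for node in nodes:
        try:
            return node["name"], run_on_node(node, task)
        except NodeFailure:
            # The failure is detected in software, so the work moves to the next node.
            continue
    raise RuntimeError("all nodes failed")


if __name__ == "__main__":
    cluster = [
        {"name": "node-1", "healthy": False},  # simulated broken machine
        {"name": "node-2", "healthy": True},
    ]
    print(run_with_failover(cluster, lambda: 6 * 7))
```

The point of the sketch is that no single machine has to be reliable: the software layer notices the failure and reruns the work elsewhere.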
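(1)Hadoop Common: the common utilities that support the other modules
(2)HDFS: the distributed file system, which provides data storage for the system (roughly comparable to Oracle's storage layer)
(3)Hadoop YARN: the job scheduling and resource management module, roughly an operating system built on top of HDFS
(4)Hadoop MapReduce: a distributed computing method that runs on top of YARN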
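To make the MapReduce idea concrete, here is a minimal word-count sketch in plain Python. It runs in memory on one machine and does not use the Hadoop API; the function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` are made up for illustration. On a real cluster the framework would distribute the map and reduce steps across nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["Hadoop stores data in HDFS", "YARN schedules work on data in HDFS"]
    print(word_count(sample))
```

Each phase only looks at its own input (one line, or one key and its values), which is what lets a framework run many copies of the same function in parallel.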
3、System Breakdown