『 Spark 』1. Introduction to Spark

Date: 2019-12-05


Preface

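This series combines my own notes from learning Spark, my understanding of the reference articles, and lessons from my hands-on work with Spark. I am writing it only to organize my Spark study notes, not to produce a tutorial, so it follows my own understanding, and details I consider unnecessary are left out. For a deeper understanding, it is best to read the reference articles and the official documentation.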

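Second, this series starts from Spark 1.6.0, the latest release at the time of writing. Spark is evolving quickly, so it is worth recording the version number.
Finally, if you think anything here is wrong, please leave a comment. I will reply to every comment within 24 hours. Thank you very much.
Tips: if a figure is hard to read, you can: 1. zoom in on the page; 2. open the image in a new tab to view the original.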

1. How to introduce Spark to others

Apache Spark™ is a fast and general engine for large-scale data processing.

Apache Spark is a fast and general-purpose cluster computing system.
It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs.
It also supports a rich set of higher-level tools including:

Spark SQL for SQL and structured data processing, extended with DataFrames and Datasets
MLlib for machine learning
GraphX for graph processing
Spark Streaming for stream data processing

2. Some background on how Spark came about

Spark started in 2009 and was open sourced in 2010. Unlike the various specialized systems (Hadoop, Storm), Spark's goal was to:

generalize MapReduce to support new applications within the same engine
- it is compatible with Hadoop: it can run on Hadoop, Mesos, standalone, or in the cloud, and it can access diverse data sources including HDFS, Cassandra, HBase, and S3.
speed up iterative computing compared with Hadoop
- use memory + disk instead of disk alone as the data storage medium
- design a new programming model, RDD, which makes data processing more graceful [RDD transformations, actions, distributed jobs, stages and tasks]
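
The split between transformations and actions in the RDD model can be sketched in plain Python. This is only a toy, single-machine illustration: `ToyRDD` and all of its methods are hypothetical names made up for this sketch and are not Spark's API. It assumes that a transformation only records what should be done, and that an action is what actually runs the recorded steps.

```python
import functools


class ToyRDD:
    """A toy, single-machine stand-in for an RDD (hypothetical, not Spark's API)."""

    def __init__(self, source, ops=()):
        self._source = source    # an iterable holding the input data
        self._ops = tuple(ops)   # recorded transformations, not yet executed

    # Transformations: return a new ToyRDD and do no work yet.
    def map(self, fn):
        return ToyRDD(self._source, self._ops + (("map", fn),))

    def filter(self, pred):
        return ToyRDD(self._source, self._ops + (("filter", pred),))

    def _compute(self):
        # Replay the recorded transformations over the source data.
        data = iter(self._source)
        for kind, fn in self._ops:
            data = map(fn, data) if kind == "map" else filter(fn, data)
        return data

    # Actions: trigger the computation and return a plain Python value.
    def collect(self):
        return list(self._compute())

    def reduce(self, fn):
        return functools.reduce(fn, self._compute())


calls = []  # records which elements were actually processed


def square(x):
    calls.append(x)
    return x * x


numbers = ToyRDD(range(1, 6))
evens_squared = numbers.map(square).filter(lambda x: x % 2 == 0)
pending_calls = len(calls)          # nothing has run yet: transformations are lazy
result = evens_squared.collect()    # first action: runs the whole pipeline
total = evens_squared.reduce(lambda a, b: a + b)  # second action: runs it again
```

In this toy version every action replays the full chain of transformations from the source, so `square` runs once per element for each action. Real Spark additionally splits the work into jobs, stages and tasks across a cluster, which this sketch does not attempt to model.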

3. Why choose Spark

designed, implemented and used as libraries, instead of specialized systems;
- much more useful and maintainable

historically, it was designed to improve upon Hadoop and Storm, so it has a strong lineage;
documentation, community, products and trends;
it provides SQL, DataFrames, Datasets, a machine learning library, a graph computing library and an actively growing set of third-party libraries; it is easy to use and covers many use cases in many fields;
it supports ad-hoc exploration, which speeds up data exploration and pre-processing and helps you build ETL and data processing jobs;

4. Next

The next post briefly introduces the basic concepts in Spark that you must understand in depth.

References

Links to the articles in this series
