Spark Monitoring

 Monitoring

For monitoring Spark we will only cover four approaches for now, namely:

  • Monitoring via the Spark UI
  • Monitoring via the Spark HistoryServer UI
  • Monitoring via the REST API
  • Metrics

Monitoring via the Spark UI

Spark's web UI gives us a very good job-monitoring interface. By looking at those pages carefully we can learn a great deal, for example the details of the jobs of a running Spark program: Duration, GC time, launch time, all things that need to be watched in production. But once a task has finished or has died, we can no longer see any of that information.

  • When you start a Spark application

  • you can see the web UI address http://hadoop001:4040 in the startup output; try opening it
  • First run a join in Spark (a sketch of such a session follows this list)

  • Once the join has run, look at the web UI

  • The DAG visualization

  • When we shut the Spark application down and refresh http://hadoop001:4040, the page no longer opens

 

  • This means we have no way to see why a job died in production, and therefore no way to fix it
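To have something to look at on http://hadoop001:4040, here is a minimal sketch of such a session: spark-shell in local mode running a small RDD join (the variable names and sample data are made up purely for illustration):

[hadoop@hadoop001 spark-2.4.2-bin-2.6.0-cdh5.7.0]$ ./bin/spark-shell --master local[2]

scala> // two tiny pair RDDs, just enough to give the UI a shuffle and a couple of stages
scala> val left  = sc.parallelize(Seq(("a", 1), ("b", 2), ("c", 3)))
scala> val right = sc.parallelize(Seq(("a", "x"), ("b", "y")))
scala> left.join(right).collect().foreach(println)   // this job now shows up at http://hadoop001:4040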

Monitoring via the Spark HistoryServer UI

With the Spark HistoryServer we can look at Spark applications that have already finished.

To use the HistoryServer you first have to configure it. Be careful with the configuration, otherwise you will run into problems; the steps below follow the latest version of the official documentation.

  • Configuration step 1

[hadoop@hadoop001 conf]$ pwd
/home/hadoop/app/spark-2.4.2-bin-2.6.0-cdh5.7.0/conf

[hadoop@hadoop001 conf]$ cp  spark-defaults.conf.template  spark-defaults.conf

[hadoop@hadoop001 conf]$ ll
total 52
-rw-r--r-- 1 hadoop hadoop 996 May 2 00:49 docker.properties.template
-rw-r--r-- 1 hadoop hadoop 1105 May 2 00:49 fairscheduler.xml.template
-rw-r--r-- 1 hadoop hadoop 1129 Jun 9 21:12 hive-site.xml
-rw-r--r-- 1 hadoop hadoop 2025 May 2 00:49 log4j.properties.template
-rw-r--r-- 1 hadoop hadoop 7801 May 2 00:49 metrics.properties.template
-rw-r--r-- 1 hadoop hadoop 865 May 2 00:49 slaves.template
-rw-r--r-- 1 hadoop hadoop 1406 Jun 18 22:09 spark-defaults.conf
-rw-r--r-- 1 hadoop hadoop 1292 May 2 00:49 spark-defaults.conf.template
-rwxr-xr-x 1 hadoop hadoop 4221 May 2 00:49 spark-env.sh.template

[hadoop@hadoop001 conf]$ vim spark-defaults.conf
  • The configuration screen (screenshot)

 

  • Turn on the following two options in spark-defaults.conf
spark.eventLog.enabled true   # enable event logging

spark.eventLog.dir hdfs://hadoop001:9000/g6_directory   # where the event logs are stored
# hadoop001:9000 is whatever fs.defaultFS is set to in your Hadoop core-site.xml
# (/home/hadoop/app/hadoop/etc/hadoop/core-site.xml); write the same value here
 

 

  • Configuration step 2

[hadoop@hadoop001 conf]$ cp spark-env.sh.template spark-env.sh

[hadoop@hadoop001 conf]$ ll
total 52
-rw-r--r-- 1 hadoop hadoop 996 May 2 00:49 docker.properties.template
-rw-r--r-- 1 hadoop hadoop 1105 May 2 00:49 fairscheduler.xml.template
-rw-r--r-- 1 hadoop hadoop 1129 Jun 9 21:12 hive-site.xml
-rw-r--r-- 1 hadoop hadoop 2025 May 2 00:49 log4j.properties.template
-rw-r--r-- 1 hadoop hadoop 7801 May 2 00:49 metrics.properties.template
-rw-r--r-- 1 hadoop hadoop 865 May 2 00:49 slaves.template
-rw-r--r-- 1 hadoop hadoop 1406 Jun 18 22:09 spark-defaults.conf
-rw-r--r-- 1 hadoop hadoop 1292 May 2 00:49 spark-defaults.conf.template
-rwxr-xr-x 1 hadoop hadoop 4581 Jun 18 22:07 spark-env.sh
-rwxr-xr-x 1 hadoop hadoop 4221 May 2 00:49 spark-env.sh.template
  • The explanation from the official documentation (screenshot)

 

 

As the red box in the figure above shows, every spark.history.* parameter has to be configured inside SPARK_HISTORY_OPTS.

  • The table below lists the spark.history.* parameters

Property Name | Default | Meaning
spark.history.provider | org.apache.spark.deploy.history.FsHistoryProvider | Name of the class implementing the application history backend. Currently there is only one implementation, provided by Spark, which looks for application logs stored in the file system.
spark.history.fs.logDirectory | file:/tmp/spark-events | For the filesystem history provider, the URL to the directory containing application event logs to load. This can be a local file:// path, an HDFS path such as hdfs://namenode/shared/spark-logs, or that of an alternative filesystem supported by the Hadoop APIs.
spark.history.fs.update.interval | 10s | How often the filesystem history provider checks for new or updated logs in the log directory. A shorter interval detects new applications faster, at the expense of more server load re-reading updated applications. As soon as an update has completed, listings of the completed and incomplete applications will reflect the changes.
spark.history.retainedApplications | 50 | The number of applications to retain UI data for in the cache (i.e. kept in memory). If this cap is exceeded, the oldest applications are removed from the cache, and an application not in the cache has to be loaded from disk when it is accessed from the UI.
spark.history.ui.maxApplications | Int.MaxValue | The number of applications to display on the history summary page. Application UIs are still available by accessing their URLs directly even if they are not displayed on the history summary page.
spark.history.ui.port | 18080 | The port to which the web interface of the history server binds (18080 is the default).
spark.history.kerberos.enabled | false | Indicates whether the history server should use kerberos to log in. This is required if the history server is accessing HDFS files on a secure Hadoop cluster. If true, it uses the configs spark.history.kerberos.principal and spark.history.kerberos.keytab.
spark.history.kerberos.principal | (none) | Kerberos principal name for the History Server.
spark.history.kerberos.keytab | (none) | Location of the kerberos keytab file for the History Server.
spark.history.fs.cleaner.enabled | false | Specifies whether the History Server should periodically clean up event logs from storage. In production this cleanup is a must, so enable it.
spark.history.fs.cleaner.interval | 1d | How often the filesystem job history cleaner checks for files to delete. Files are only deleted if they are older than spark.history.fs.cleaner.maxAge.
spark.history.fs.cleaner.maxAge | 7d | Job history files older than this will be deleted when the filesystem history cleaner runs.
spark.history.fs.endEventReparseChunkSize | 1m | How many bytes to parse at the end of log files looking for the end event. This is used to speed up generation of application listings by skipping unnecessary parts of event log files. It can be disabled by setting this config to 0.
spark.history.fs.inProgressOptimization.enabled | true | Enable optimized handling of in-progress logs. This option may leave finished applications that fail to rename their event logs listed as in-progress.
spark.history.fs.numReplayThreads | 25% of available cores | Number of threads that will be used by the history server to process event logs.
spark.history.store.maxDiskUsage | 10g | Maximum disk usage for the local directory where the cached application history information is stored.
spark.history.store.path | (none) | Local directory where application history data is cached. If set, the history server will store application data on disk instead of keeping it in memory. The data written to disk will be re-used in the event of a history server restart.

 

 

Note: the -D must come first; in -Dx=y, x stands for a spark.history.* parameter and y for the value assigned to it.

Here the hdfs://hadoop001:9000/g6_directory directory in

SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs://hadoop001:9000/g6_directory"

must be exactly the same path as spark.eventLog.dir in spark-defaults.conf from configuration step 1, because the logs have to be read from wherever they are written.
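Putting this together, the relevant part of spark-env.sh might look like the sketch below; only spark.history.fs.logDirectory is strictly needed to match step 1, while the port and cleaner settings are optional extras taken from the table above:

# appended to /home/hadoop/app/spark-2.4.2-bin-2.6.0-cdh5.7.0/conf/spark-env.sh
SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs://hadoop001:9000/g6_directory \
-Dspark.history.ui.port=18080 \
-Dspark.history.fs.cleaner.enabled=true"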

Tip: the log directory must be created on HDFS in advance.
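For example, assuming the hdfs client is on the hadoop user's PATH:

[hadoop@hadoop001 ~]$ hdfs dfs -mkdir -p hdfs://hadoop001:9000/g6_directory
[hadoop@hadoop001 ~]$ hdfs dfs -ls hdfs://hadoop001:9000/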

Starting the HistoryServer

[hadoop@hadoop001 spark-2.4.2-bin-2.6.0-cdh5.7.0]$ ./sbin/start-history-server.sh

[hadoop@hadoop001 spark-2.4.2-bin-2.6.0-cdh5.7.0]$ jps
17184 Jps
22609 ResourceManager
21604 NameNode
21860 DataNode
19210 HistoryServer   # the HistoryServer process we just started
22748 NodeManager
22236 SecondaryNameNode

After the HistoryServer is started, we can launch a Spark application and then stop it. This time the information in the web UI does not disappear when the application ends. In fact, the hadoop001:18080 page only shows information about jobs after they have finished; while a job is still running it cannot be seen there. (A quick way to produce such a finished application is sketched below the screenshot.)

  • As shown in the screenshot
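A quick way to produce a finished application to browse is the bundled SparkPi example (just a sketch; any short job that picks up the event-log settings from spark-defaults.conf will do):

[hadoop@hadoop001 spark-2.4.2-bin-2.6.0-cdh5.7.0]$ ./bin/run-example SparkPi 10
# once it exits, the run is listed at http://hadoop001:18080 and its event log sits under hdfs://hadoop001:9000/g6_directory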

Stopping the HistoryServer

[hadoop@hadoop001 spark-2.4.2-bin-2.6.0-cdh5.7.0]$ ./sbin/stop-history-server.sh

 

Monitoring via the REST API

  • Job information can be filtered by completion status: at http://hadoop001:4040/api/v1/applications you can only see jobs that are currently running, not ones that have already completed
  • At http://hadoop001:18080/api/v1/applications you can see jobs in every state

 

 

  • Under the applications path we can look up whatever we need by the job's completion status, earliest start time, latest start time and so on, as in the figure above (see the curl sketches after this list)
  • This is only a part of it; the remaining details can be found on the official site
  • http://spark.apache.org/docs/latest/monitoring.html
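As a sketch of what those queries look like with curl (status, minDate and maxDate are the filter parameters documented on that page; the date below is made up):

# only completed applications, as known to the HistoryServer
[hadoop@hadoop001 ~]$ curl -s "http://hadoop001:18080/api/v1/applications?status=completed"

# only running applications, queried against a live application's own UI
[hadoop@hadoop001 ~]$ curl -s "http://hadoop001:4040/api/v1/applications?status=running"

# applications that started on or after a given date
[hadoop@hadoop001 ~]$ curl -s "http://hadoop001:18080/api/v1/applications?minDate=2019-06-18"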

Metrics

  • This approach is rarely used in production; usually only people who have studied Spark in depth use it, so it is not covered here