Spark 2.1.1
Recently a Spark job (Spark on YARN) failed with the following error:
Diagnostics: Container [pid=5901,containerID=container_1542879939729_30802_01_000001] is running beyond physical memory limits. Current usage: 11.0 GB of 11 GB physical memory used; 12.2 GB of 23.1 GB virtual memory used. Killing container.
Dump of the process-tree for container_1542879939729_30802_01_000001 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 5901 5899 5901 5901 (bash) 3 4 115843072 361 /bin/bash -c LD_LIBRARY_PATH=/export/App/hadoop-2.6.1/lib/native::/usr/lib/hadoop/lib/native:/usr/lib/hadoop/lib/native::/export/App/hadoop-2.6.1/lib/native:/usr/lib/hadoop/lib/native:/usr/lib/hadoop/lib/native:/usr/lib/hadoop/lib/native::/export/App/hadoop-2.6.1/lib/native:/export/App/hadoop-2.6.1/lib/native /export/App/jdk1.8.0_60/bin/java -server -Xmx10240m -Djava.io.tmpdir=/export/Data/tmp/hadoop-tmp/nm-local-dir/usercache/hadoop/appcache/application_1542879939729_30802/container_1542879939729_30802_01_000001/tmp '-XX:+PrintGCDetails' '-XX:+UseG1GC' '-XX:G1HeapRegionSize=32M' '-XX:+UseGCOverheadLimit' '-XX:+ExplicitGCInvokesConcurrent' '-XX:+HeapDumpOnOutOfMemoryError' '-XX:-UseCompressedClassPointers' '-XX:CompressedClassSpaceSize=3G' '-XX:+PrintGCTimeStamps' '-Xloggc:/export/Logs/hadoop/g1gc.log' -Dspark.yarn.app.container.log.dir=/export/Logs/hadoop/userlogs/application_1542879939729_30802/container_1542879939729_30802_01_000001 org.apache.spark.deploy.yarn.ApplicationMaster --class 'app.package.AppClass' --jar file:/jarpath/app.jar --properties-file /export/Data/tmp/hadoop-tmp/nm-local-dir/usercache/hadoop/appcache/application_1542879939729_30802/container_1542879939729_30802_01_000001/__spark_conf__/__spark_conf__.properties 1> /export/Logs/hadoop/userlogs/application_1542879939729_30802/container_1542879939729_30802_01_000001/stdout 2> /export/Logs/hadoop/userlogs/application_1542879939729_30802/container_1542879939729_30802_01_000001/stderr
|- 6406 5901 5901 5901 (java) 1834301 372741 13026095104 2888407 /export/App/jdk1.8.0_60/bin/java -server -Xmx10240m -Djava.io.tmpdir=/export/Data/tmp/hadoop-tmp/nm-local-dir/usercache/hadoop/appcache/application_1542879939729_30802/container_1542879939729_30802_01_000001/tmp -XX:+PrintGCDetails -XX:+UseG1GC -XX:G1HeapRegionSize=32M -XX:+UseGCOverheadLimit -XX:+ExplicitGCInvokesConcurrent -XX:+HeapDumpOnOutOfMemoryError -XX:-UseCompressedClassPointers -XX:CompressedClassSpaceSize=3G -XX:+PrintGCTimeStamps -Xloggc:/export/Logs/hadoop/g1gc.log -Dspark.yarn.app.container.log.dir=/export/Logs/hadoop/userlogs/application_1542879939729_30802/container_1542879939729_30802_01_000001 org.apache.spark.deploy.yarn.ApplicationMaster --class app.package.AppClass --jar file:/jarpath/app.jar --properties-file /export/Data/tmp/hadoop-tmp/nm-local-dir/usercache/hadoop/appcache/application_1542879939729_30802/container_1542879939729_30802_01_000001/__spark_conf__/__spark_conf__.properties
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Failing this attempt
From containerID=container_1542879939729_30802_01_000001 and the main class org.apache.spark.deploy.yarn.ApplicationMaster, we can tell that the killed container was YARN's ApplicationMaster, which in yarn-cluster mode also runs the Spark driver.
The question: the job was submitted with --driver-memory 10g, and the process launch command indeed contains -Xmx10240m, so why was the container killed for exceeding 11 GB?
Container [pid=5901,containerID=container_1542879939729_30802_01_000001] is running beyond physical memory limits. Current usage: 11.0 GB of 11 GB physical memory used;
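For context, the job was submitted along these lines (the class and jar path are taken from the launch command in the log above; the remaining options are illustrative):

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 10g \
  --class app.package.AppClass \
  /jarpath/app.jar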
Let's trace the Spark job submission process (for details, see https://www.cnblogs.com/barneywill/p/9820684.html).
org.apache.spark.launcher.SparkSubmitCommandBuilder
String tsMemory =
    isThriftServer(mainClass) ? System.getenv("SPARK_DAEMON_MEMORY") : null;
String memory = firstNonEmpty(tsMemory,
    config.get(SparkLauncher.DRIVER_MEMORY),
    System.getenv("SPARK_DRIVER_MEMORY"),
    System.getenv("SPARK_MEM"),
    DEFAULT_MEM);
cmd.add("-Xmx" + memory);
This is where the driver memory value is resolved; firstNonEmpty picks the first non-empty source in priority order, and the winning value becomes the JVM's -Xmx, which is why the launch command shows -Xmx10240m.
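A minimal Scala sketch of that resolution order (this firstNonEmpty is a stand-in written for illustration, not Spark's actual code; "1g" plays the role of DEFAULT_MEM):

// Hypothetical stand-in for the resolution logic in SparkSubmitCommandBuilder.
def firstNonEmpty(candidates: Option[String]*): String =
  candidates.flatten.find(_.nonEmpty).getOrElse("1g") // "1g" mirrors DEFAULT_MEM

val memory = firstNonEmpty(
  sys.props.get("spark.driver.memory"),  // set by --driver-memory 10g
  sys.env.get("SPARK_DRIVER_MEMORY"),
  sys.env.get("SPARK_MEM"))
// memory == "10g" for this job, hence -Xmx10240m in the launch command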
org.apache.spark.deploy.SparkSubmit
// In yarn-cluster mode, use yarn.Client as a wrapper around the user class
if (isYarnCluster) {
  childMainClass = "org.apache.spark.deploy.yarn.Client"
When submitting with --master yarn in cluster mode, org.apache.spark.deploy.yarn.Client becomes the class that is actually run.
org.apache.spark.deploy.yarn.Client
// AM related configurations
private val amMemory = if (isClusterMode) {
  sparkConf.get(DRIVER_MEMORY).toInt
} else {
  sparkConf.get(AM_MEMORY).toInt
}
private val amMemoryOverhead = {
  val amMemoryOverheadEntry = if (isClusterMode) DRIVER_MEMORY_OVERHEAD else AM_MEMORY_OVERHEAD
  sparkConf.get(amMemoryOverheadEntry).getOrElse(
    math.max((MEMORY_OVERHEAD_FACTOR * amMemory).toLong, MEMORY_OVERHEAD_MIN)).toInt
}
private val amCores = if (isClusterMode) {
  sparkConf.get(DRIVER_CORES)
} else {
  sparkConf.get(AM_CORES)
}

// Executor related configurations
private val executorMemory = sparkConf.get(EXECUTOR_MEMORY)
private val executorMemoryOverhead = sparkConf.get(EXECUTOR_MEMORY_OVERHEAD).getOrElse(
  math.max((MEMORY_OVERHEAD_FACTOR * executorMemory).toLong, MEMORY_OVERHEAD_MIN)).toInt
This is where amMemoryOverhead and executorMemoryOverhead are determined; note that in cluster mode, amMemory is simply the driver memory (spark.driver.memory).
val capability = Records.newRecord(classOf[Resource])
capability.setMemory(amMemory + amMemoryOverhead)
capability.setVirtualCores(amCores)
The resource request sent to YARN is then sized as amMemory + amMemoryOverhead, not amMemory alone.
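Plugging this job's numbers into that formula shows where the 11 GB comes from (a worked sketch; the two constants are the defaults quoted below):

val MEMORY_OVERHEAD_FACTOR = 0.10
val MEMORY_OVERHEAD_MIN = 384L
val amMemory = 10 * 1024  // --driver-memory 10g, in MB
val amMemoryOverhead =
  math.max((MEMORY_OVERHEAD_FACTOR * amMemory).toLong, MEMORY_OVERHEAD_MIN)  // 1024 MB
val containerRequest = amMemory + amMemoryOverhead  // 11264 MB = 11 GB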
The relevant defaults and configuration entries are as follows:
org.apache.spark.deploy.yarn.YarnSparkHadoopUtil
object YarnSparkHadoopUtil {
  // Additional memory overhead
  // 10% was arrived at experimentally. In the interest of minimizing memory waste while covering
  // the common cases. Memory overhead tends to grow with container size.
  val MEMORY_OVERHEAD_FACTOR = 0.10
  val MEMORY_OVERHEAD_MIN = 384L
org.apache.spark.deploy.yarn.config
private[spark] val DRIVER_MEMORY_OVERHEAD = ConfigBuilder("spark.yarn.driver.memoryOverhead")
  .bytesConf(ByteUnit.MiB)
  .createOptional

private[spark] val EXECUTOR_MEMORY_OVERHEAD = ConfigBuilder("spark.yarn.executor.memoryOverhead")
  .bytesConf(ByteUnit.MiB)
  .createOptional
So by default, the driver container request is determined as follows:
1. if spark.yarn.driver.memoryOverhead is configured, that value takes precedence as the overhead;
2. otherwise, the request is driverMemory + overhead,
where overhead = math.max((0.1 * driverMemory).toLong, 384)
So with --driver-memory 10g, the overhead is max(1024 MB, 384 MB) = 1024 MB, and the container requested from YARN is 10 GB + 1 GB = 11 GB, which is exactly the limit shown in the error message.
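Accordingly, if the driver's real footprint (heap plus metaspace, thread stacks, and other off-heap memory) does not fit within the default 10% overhead, one possible mitigation is to set the overhead explicitly; the 2048 MB below is an illustrative value, not something taken from the log:

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 10g \
  --conf spark.yarn.driver.memoryOverhead=2048 \
  --class app.package.AppClass \
  /jarpath/app.jar

This keeps -Xmx at 10240m but requests a 12 GB container, leaving 2 GB of headroom outside the heap.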