Upgrading Spark from version 1.5 to 2.0: place the pre-built Spark 2.0 package in the target directory and unpack it, replacing the old 1.5 installation.
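A minimal sketch of the replacement steps, assuming /opt/dlw/core as the install root (the directory that appears later in this post); the package and backup directory names are hypothetical:

$ cd /opt/dlw/core
$ mv spark spark-1.5-bak                  # keep the old 1.5 install as a backup
$ tar -zxf spark-2.0.0-bin-custom.tgz     # hypothetical name for the pre-built package
$ mv spark-2.0.0-bin-custom spark         # reuse the original path so $SPARK_HOME stays valid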
$ spark-sql --version
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.0
      /_/

Branch
Compiled by user jenkins on 2016-07-19T21:16:08Z
Revision
Url
Type --help for more information.
Notes on replacing the installation directory:

1. Configuration files

Configuration files that need to be carried over:
slaves -- lists all worker nodes
spark-defaults.conf -- Spark startup configuration
spark-env.sh -- Spark startup environment variables
log4j.properties -- Spark logging configuration
hive-site.xml -- Hive configuration file; copy it from the Hive config directory, or simply create a symlink:
$ ln -s $HIVE_HOME/conf/hive-site.xml hive-site.xml
2. JAR packages

Important: Hive uses MySQL as its metastore database, and Spark shares that MySQL instance with Hive, so the MySQL connector JAR mysql-connector-java-5.1.31-bin.jar must be copied over as well.

Also note that Spark 1.5 kept its JARs under $SPARK_HOME/lib, whereas Spark 2.0 stores them under $SPARK_HOME/jars; the copy step is sketched below.
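A minimal sketch of the copy, assuming the connector JAR lives under $HIVE_HOME/lib (adjust the source path to wherever your connector is kept):

$ cp $HIVE_HOME/lib/mysql-connector-java-5.1.31-bin.jar $SPARK_HOME/jars/   # Spark 2.0 scans jars/, not lib/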
With the configuration files and JARs in place, verify that spark-sql starts.

1. Local mode:
$ spark-sql --master local
It starts up normally!
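As a quick smoke test you can run a statement non-interactively with the CLI's -e flag (the query here is just a hypothetical check against the shared Hive metastore):

$ spark-sql --master local -e "show databases;"   # should list the databases registered in the Hive metastore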
2. YARN mode
$ spark-sql --master yarn
16/08/25 17:29:28 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
16/08/25 17:29:28 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
Exception in thread "main" java.lang.NoClassDefFoundError: com/sun/jersey/api/client/config/ClientConfig
at org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:45)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:163)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:150)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:149)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:500)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2256)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:831)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:823)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:823)
at org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:57)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.<init>(SparkSQLCLIDriver.scala:288)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:137)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:729)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: com.sun.jersey.api.client.config.ClientConfig
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 25 more
Startup fails: a jersey class cannot be found.

Check which jersey JARs are on the system:
$ find /opt/dlw/core/ | grep jersey
The YARN lib directory ships jersey 1.9 JARs, while the Spark jars directory ships jersey 2.22.2:
/opt/dlw/core/hadoop/share/hadoop/yarn/lib/jersey-client-1.9.jar
/opt/dlw/core/hadoop/share/hadoop/yarn/lib/jersey-core-1.9.jar
/opt/dlw/core/hadoop/share/hadoop/yarn/lib/jersey-server-1.9.jar
/opt/dlw/core/hadoop/share/hadoop/yarn/lib/jersey-json-1.9.jar
/opt/dlw/core/hadoop/share/hadoop/yarn/lib/jersey-guice-1.9.jar
/opt/dlw/core/spark/jars/jersey-container-servlet-2.22.2.jar
/opt/dlw/core/spark/jars/jersey-guava-2.22.2.jar
/opt/dlw/core/spark/jars/jersey-container-servlet-core-2.22.2.jar
/opt/dlw/core/spark/jars/jersey-common-2.22.2.jar
/opt/dlw/core/spark/jars/jersey-server-2.22.2.jar
/opt/dlw/core/spark/jars/jersey-media-jaxb-2.22.2.jar
/opt/dlw/core/spark/jars/jersey-client-2.22.2.jar
After repeated tests, it turns out the missing classes live in two JARs: jersey-core-1.9.jar and jersey-client-1.9.jar.
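One way to confirm this yourself is to list each JAR's contents and grep for the class named in the stack trace, for example:

$ unzip -l /opt/dlw/core/hadoop/share/hadoop/yarn/lib/jersey-client-1.9.jar | grep ClientConfig          # com/sun/jersey/api/client/config/ClientConfig
$ unzip -l /opt/dlw/core/hadoop/share/hadoop/yarn/lib/jersey-core-1.9.jar | grep FeaturesAndProperties   # com/sun/jersey/core/util/FeaturesAndProperties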
The error when jersey-core-1.9.jar is missing looks like this:
$ spark-sql --master yarn
16/08/25 17:42:47 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
16/08/25 17:42:47 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
Exception in thread "main" java.lang.NoClassDefFoundError: com/sun/jersey/core/util/FeaturesAndProperties
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:45)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:163)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:150)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:149)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:500)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2256)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:831)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:823)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:823)
at org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:57)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.<init>(SparkSQLCLIDriver.scala:288)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:137)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:729)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: com.sun.jersey.core.util.FeaturesAndProperties
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 37 more
Copy jersey-core-1.9.jar and jersey-client-1.9.jar into $SPARK_HOME/jars, and rename the existing jersey-client-2.22.2.jar in that directory so it is no longer picked up; Spark in YARN mode then starts normally.
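A minimal sketch of the fix, using the paths found above (renaming rather than deleting the 2.22.2 client JAR keeps it available as a backup):

$ cp /opt/dlw/core/hadoop/share/hadoop/yarn/lib/jersey-core-1.9.jar $SPARK_HOME/jars/
$ cp /opt/dlw/core/hadoop/share/hadoop/yarn/lib/jersey-client-1.9.jar $SPARK_HOME/jars/
$ mv $SPARK_HOME/jars/jersey-client-2.22.2.jar $SPARK_HOME/jars/jersey-client-2.22.2.jar.bak   # rename so it is not loaded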
$ spark-sql --master yarn --num-executors 36
16/08/25 16:40:15 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
16/08/25 16:40:15 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
16/08/25 16:40:19 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
16/08/25 16:40:38 WARN SparkContext: Use an existing SparkContext, some configuration may not take effect.
16/08/25 16:40:38 WARN Configuration: org.apache.hadoop.hive.conf.LoopingByteArrayInputStream@493be56e:an attempt to override final parameter: hive.security.authorization.enabled; Ignoring.
16/08/25 16:40:38 WARN Configuration: org.apache.hadoop.hive.conf.LoopingByteArrayInputStream@493be56e:an attempt to override final parameter: hive.security.authorization.task.factory; Ignoring.
16/08/25 16:40:39 WARN Configuration: org.apache.hadoop.hive.conf.LoopingByteArrayInputStream@520f7f74:an attempt to override final parameter: hive.security.authorization.enabled; Ignoring.
16/08/25 16:40:39 WARN Configuration: org.apache.hadoop.hive.conf.LoopingByteArrayInputStream@520f7f74:an attempt to override final parameter: hive.security.authorization.task.factory; Ignoring.
spark-sql (default)> cache table dm_cljy_m_all_label_txt_ext;
16/08/25 16:40:45 WARN Utils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.debug.maxToStringFields' in SparkEnv.conf.
[Stage 0:====> (13 + 135) / 148]16/08/25 16:42:17 WARN DFSClient: Slow ReadProcessor read fields took 91294ms (threshold=30000ms); ack: seqno: 74 status: SUCCESS status: SUCCESS status: SUCCESS downstreamAckTimeNanos: 19355905, targets: [10.162.5.167:50010, 10.162.5.164:50010, 10.162.5.162:50010]
[Stage 0:====================================================> (141 + 7) / 148]16/08/25 16:43:14 WARN DFSClient: Slow ReadProcessor read fields took 48494ms (threshold=30000ms); ack: seqno: 132 status: SUCCESS status: SUCCESS status: SUCCESS downstreamAckTimeNanos: 821343, targets: [10.162.5.167:50010, 10.162.5.164:50010, 10.162.5.162:50010]
Time taken: 155.439 seconds
spark-sql (default)> select count(*) from dm_cljy_m_all_label_txt_ext;
16/08/25 16:43:59 WARN DFSClient: Slow ReadProcessor read fields took 41094ms (threshold=30000ms); ack: seqno: 149 status: SUCCESS status: SUCCESS status: SUCCESS downstreamAckTimeNanos: 418964, targets: [10.162.5.167:50010, 10.162.5.164:50010, 10.162.5.162:50010]
54966176
Time taken: 1.104 seconds, Fetched 1 row(s)
Cache a table, then run computations on it: the count(*) over roughly 55 million rows returns in about a second once the table is cached. Spark really is fast.