PredictionIO plus the Universal Recommender lets small and mid-sized companies quickly build and deploy a personalized recommendation engine based on collaborative filtering over user behavior. Viewed purely at the engine level, the development cost is close to zero, but some prerequisites still apply. For example, the organization should ideally already run reasonably stable Hadoop and Spark clusters, and it should have at least a few developers and operations staff who know the Spark platform well; otherwise the technical team will spend a long time stumbling into pitfalls and learning by trial and error.
This article lists some of the Spark-related exceptions you may run into while using PredictionIO + Universal Recommender, together with the reasoning behind each fix and the final solution, for reference.
1. Training fails with java.lang.StackOverflowError
This one is simple. Per the documentation, you can avoid it by passing larger memory settings through to Spark when launching training, for example:
pio train -- --driver-memory 8g --executor-memory 8g --verbose
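Everything after the bare -- in a pio command is handed through to spark-submit, so any Spark flag can be passed this way. If the error persists even with more memory, note that a StackOverflowError is often bounded by the JVM thread stack size rather than the heap; a hedged variant (the -Xss8m value is an assumption to tune, not a figure from the original text):

# Raise driver/executor heap and, additionally, the JVM thread stack size
pio train -- --driver-memory 8g --executor-memory 8g \
  --conf spark.driver.extraJavaOptions=-Xss8m \
  --conf spark.executor.extraJavaOptions=-Xss8m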
2. Training fails because the emptyRDD method cannot be found
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.SparkContext.emptyRDD(Lscala/reflect/ClassTag;)Lorg/apache/spark/rdd/EmptyRDD;
    at com.actionml.URAlgorithm.getRanksRDD(URAlgorithm.scala:534)
    at com.actionml.URAlgorithm.calcAll(URAlgorithm.scala:340)
    at com.actionml.URAlgorithm.train(URAlgorithm.scala:285)
    at com.actionml.URAlgorithm.train(URAlgorithm.scala:175)
This is caused by a mismatch between the Spark version the engine was compiled against and the Spark version it runs on: the compiled bytecode looks up an emptyRDD method whose declared return type is org.apache.spark.rdd.EmptyRDD, while in the runtime's Spark the method is declared to return RDD[T]:
/** Get an RDD that has no partitions or elements. */
def emptyRDD[T: ClassTag]: RDD[T] = new EmptyRDD[T](this)
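The fix is to rebuild the engine against the same Spark release the cluster runs. A quick, hedged way to compare the two (the paths and the build.sbt layout are assumptions about a typical Universal Recommender checkout):

# Version the engine is compiled against, declared in the engine's build.sbt
grep -n "spark" build.sbt
# Version the cluster actually runs
$SPARK_HOME/bin/spark-submit --version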
3. Training on YARN fails with java.lang.NoClassDefFoundError: com/sun/jersey/api/client/config/ClientConfig

[INFO] [ServerConnector] Started ServerConnector@bd93bc3{HTTP/1.1}{0.0.0.0:4040}
[INFO] [Server] Started @6428ms
Exception in thread "main" java.lang.NoClassDefFoundError: com/sun/jersey/api/client/config/ClientConfig
    at org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:55)
    at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.createTimelineClient(YarnClientImpl.java:181)
    at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:168)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:151)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:156)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
    at org.apache.predictionio.workflow.WorkflowContext$.apply(WorkflowContext.scala:45)
    at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:59)
    at org.apache.predictionio.workflow.CreateWorkflow$.main(CreateWorkflow.scala:250)
    at org.apache.predictionio.workflow.CreateWorkflow.main(CreateWorkflow.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: com.sun.jersey.api.client.config.ClientConfig
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 21 more
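The explanation for this one did not survive in the text, but the trace shows YARN's TimelineClient failing to load Jersey 1.x classes, which Spark 2.x no longer ships. A commonly reported workaround (an assumption to verify against your cluster, not a fix taken from the original article) is to disable the timeline client for the job:

# Tell the YARN client not to initialize the timeline service at submit time
pio train -- --conf spark.hadoop.yarn.timeline-service.enabled=false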
4. Submitting to YARN fails with java.lang.ArrayIndexOutOfBoundsException: 1

[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@7772d266{/jobs,null,UNAVAILABLE}
[WARN] [YarnSchedulerBackend$YarnSchedulerEndpoint] Attempted to request executors before the AM has registered!
[WARN] [MetricsSystem] Stopping a MetricsSystem that is not running
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
    at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$setEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:154)
    at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$setEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:152)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
    at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$.setEnvFromInputString(YarnSparkHadoopUtil.scala:152)
    at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$6.apply(Client.scala:775)
    at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$6.apply(Client.scala:773)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.deploy.yarn.Client.setupLaunchEnv(Client.scala:773)
    at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:867)
    at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:170)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:156)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
    at org.apache.predictionio.workflow.WorkflowContext$.apply(WorkflowContext.scala:45)
    at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:59)
    at org.apache.predictionio.workflow.CreateWorkflow$.main(CreateWorkflow.scala:250)
    at org.apache.predictionio.workflow.CreateWorkflow.main(CreateWorkflow.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
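The surrounding explanation has been lost, but the trace itself is telling: setEnvFromInputString splits each comma-separated entry of SPARK_YARN_USER_ENV on '=' and then reads the value at index 1, so an entry that lacks an '=' (for example a key and value separated only by a space) throws exactly this ArrayIndexOutOfBoundsException: 1. Writing the variable as well-formed KEY=VALUE pairs, as in the export below, avoids the crash: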
export SPARK_YARN_USER_ENV="HADOOP_CONF_DIR=/home/hadoop/apache-hadoop/etc/hadoop"
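For contrast, a hypothetical malformed value like the following (note the missing '=') reproduces the crash:

# Broken: no '=' between key and value, so parsing the entry fails
export SPARK_YARN_USER_ENV="HADOOP_CONF_DIR /home/hadoop/apache-hadoop/etc/hadoop"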
5. Training fails with "Multiple ES-Hadoop versions detected in the classpath"

[WARN] [TaskSetManager] Lost task 3.0 in stage 173.0 (TID 250, bigdata01, executor 3): java.lang.Error: Multiple ES-Hadoop versions detected in the classpath; please use only one
jar:file:/home/hadoop/apache-hadoop/hadoop/var/yarn/local-dir/usercache/hadoop/appcache/application_1504083960020_0030/container_e235_1504083960020_0030_01_000005/universal-recommender-assembly-0.6.0-deps.jar
jar:file:/home/hadoop/apache-hadoop/hadoop-2.7.2/var/yarn/local-dir/usercache/hadoop/appcache/application_1504083960020_0030/container_e235_1504083960020_0030_01_000005/universal-recommender-assembly-0.6.0-deps.jar
    at org.elasticsearch.hadoop.util.Version.<clinit>(Version.java:73)
    at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:570)
    at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:58)
    at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:107)
    at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:107)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
It is debatable whether this counts as a bug. In any case, if YARN's configuration reaches the Hadoop directory through a symlink, this problem can occur: the same assembly jar becomes visible under two different paths, once through the symlink and once through the real directory, and ES-Hadoop treats them as two versions. See https://interset.zendesk.com/hc/en-us/articles/230751687-PhoenixToElasticSearchJob-Fails-with-Multiple-ES-Hadoop-versions-detected-in-the-classpath for reference.
The fix is also simple: on each NodeManager, change every configuration entry that points at the Hadoop directory through a symlink to use the real path instead.
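A hedged way to check, using the paths from the trace above (the symlink target is inferred from the two jar paths, not stated in the original):

# If 'hadoop' is a symlink, this prints the canonical directory to use
# in yarn-site.xml and related configuration instead of the link
readlink -f /home/hadoop/apache-hadoop/hadoop
# e.g. /home/hadoop/apache-hadoop/hadoop-2.7.2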
6. The Spark job fails to start: service 'sparkDriver' could not bind
[WARN] [Utils] Service 'sparkDriver' could not bind on port 0. Attempting port 1.
[ERROR] [SparkContext] Error initializing SparkContext.
Exception in thread "main" java.net.BindException: Cannot assign requested address: Service 'sparkDriver' failed after 60 retries (starting from 0)! Consider explicitly setting the appropriate port for the service 'sparkDriver' (for example spark.ui.port for SparkUI) to an available port or increasing spark.port.maxRetries.
    at sun.nio.ch.Net.bind0(Native Method)
    at sun.nio.ch.Net.bind(Net.java:437)
    at sun.nio.ch.Net.bind(Net.java:429)
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
    at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:127)
    at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:501)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1218)
    at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:506)
    at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:491)
    at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:965)
    at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:210)
    at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:353)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:408)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:455)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:140)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
    at java.lang.Thread.run(Thread.java:745)
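The original notes stop at the trace. The message itself says the 'sparkDriver' service cannot bind to the address Spark derives from the local hostname, which typically means the hostname does not resolve to an address of a local network interface. A commonly used workaround (an assumption to adapt, not a fix from the original text) is to name a bindable address explicitly:

# Replace 192.168.0.10 with an IP actually assigned to the driver machine
export SPARK_LOCAL_IP=192.168.0.10
# On Spark 2.1+ the bind address can also be set per job:
pio train -- --conf spark.driver.bindAddress=192.168.0.10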