File "/home/joy/spark/spark/python/pyspark/shell.py", line 28, in
html
import py4j zipimport.ZipImportError: can't decompress data; zlib not available
Following the search results, I first ran `yum install -y zlib*` to install the missing package, but the error persisted; running `./pyspark` with sudo, however, worked. For now the shell only runs under sudo, which is probably an environment issue still to be resolved. And since the installation was done with sudo, the files are owned by root, so use chown to hand ownership back (see the sketch below).
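A minimal sketch of the ownership fix, assuming the Spark tree lives under /home/joy/spark as in the traceback above:

```bash
# Hand the root-owned Spark tree back to the login user
# (user/group "joy" and the path are taken from this setup; adjust to yours)
sudo chown -R joy:joy /home/joy/spark
```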
But then pip also has to be installed and run with sudo. To fix this once and for all, recompile Python.
http://blog.csdn.net/woszsj/article/details/16848871
Solution:
1. Install the dependencies zlib and zlib-devel.
2. Recompile and install Python:
./configure
Edit the Modules/Setup file.
Find the following line and uncomment it:
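The original post omits the line itself; in CPython 2.x's Modules/Setup the standard zlib entry to uncomment is:

```
zlib zlibmodule.c -I$(prefix)/include -L$(exec_prefix)/lib -lz
```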
Recompile and install: `make && make install` (note `&&`, not a single `&`, which would background the first command).
After compiling, the build still reports that some modules were not built successfully:
Python build finished, but the necessary bits to build these modules were not found:
_bsddb _curses _curses_panel
_sqlite3 _ssl _tkinter
bsddb185 bz2 dbm
dl gdbm imageop
Reference: http://blog.csdn.net/huanle0610/article/details/41174943
Whatever the exact wording of the error, the meaning is clear: during the build, the system could not find the corresponding modules' dependencies. To resolve these errors, the dependency packages need to be installed in advance. The mapping is as follows (not necessarily complete):
Module | Dependency | Notes |
---|---|---|
_bsddb | bsddb | Interface to the Berkeley DB library. Deprecated since Python 2.6; ideally use the bsddb3 module instead. |
_curses | ncurses | Terminal handling for character-cell displays. |
_curses_panel | ncurses | A panel stack extension for curses. |
_sqlite3 | sqlite | DB-API 2.0 interface for SQLite databases. On CentOS, install sqlite-devel. |
_ssl | openssl-devel.i686 | TLS/SSL wrapper for socket objects. |
_tkinter | N/A | A thin object-oriented layer on top of Tcl/Tk. Can be ignored if you don't use desktop programs. |
bsddb185 | old bsddb module | The old bsddb module; can be ignored. |
bz2 | bzip2-devel.i686 | Compression compatible with bzip2. |
dbm | bsddb | Simple "database" interface. |
dl | N/A | Call C functions in shared objects. Deprecated since Python 2.6. |
gdbm | gdbm-devel.i686 | GNU's reinterpretation of dbm. |
imageop | N/A | Manipulate raw image data. Deprecated. |
readline | readline-devel | GNU readline interface. |
sunaudiodev | N/A | Access to Sun audio hardware. Sun-specific; can be ignored on CentOS. |
zlib | zlib | Compression compatible with gzip. |
On CentOS, these dependency packages can be installed: readline-devel, sqlite-devel, bzip2-devel.i686, openssl-devel.i686, gdbm-devel.i686, libdbi-devel.i686, ncurses-libs, zlib-devel.i686 (see the sketch below). With these in place, compile again; errors for the modules marked deprecated or ignorable in the table above can be ignored.
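A minimal sketch of the install-and-rebuild loop, assuming yum on CentOS and that you are inside the Python source directory:

```bash
# Install the build dependencies from the list above
sudo yum install -y readline-devel sqlite-devel bzip2-devel.i686 openssl-devel.i686 \
    gdbm-devel.i686 libdbi-devel.i686 ncurses-libs zlib-devel.i686

# Rebuild Python; rerun configure so the newly installed headers are picked up
./configure
make && make install
```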
After the build completes, continue with step 6 of the referenced article and install Python to the target directory. Once installation finishes, check the install directory to verify that Python was installed correctly.
Reference: http://www.2cto.com/database/201504/392307.html
First, let's look at what using HiveContext requires, following this article: http://www.cnblogs.com/byrhuangqiang/p/4012087.html
The article lists three requirements:
1. Check that the $SPARK_HOME/lib directory contains datanucleus-api-jdo-3.2.1.jar, datanucleus-rdbms-3.2.1.jar, and datanucleus-core-3.2.2.jar.
2. Check that the $SPARK_HOME/conf directory contains the hive-site.xml copied over from $HIVE_HOME/conf.
3. When submitting a program, put the database driver jar on the driver classpath, e.g. bin/spark-submit --driver-class-path *.jar, or set SPARK_CLASSPATH in spark-env.sh (example below).
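A hedged example of requirement 3; the connector jar name/version and the application script are placeholders, not taken from the original post:

```bash
# Put the MySQL JDBC driver on the driver classpath at submit time
bin/spark-submit \
    --driver-class-path $HIVE_HOME/lib/mysql-connector-java-5.1.38-bin.jar \
    my_app.py
```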
Following the article, I copied the datanucleus-* jars from $HIVE_HOME/lib to $SPARK_HOME/lib, copied hive-site.xml from $HIVE_HOME/conf to $SPARK_HOME/conf, and copied the mysql-connector jar from $HIVE_HOME/lib to $SPARK_HOME/jars; the copies are sketched below.
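A sketch of those copy steps, assuming $HIVE_HOME and $SPARK_HOME are set:

```bash
# Copy the DataNucleus jars, the Hive config, and the JDBC driver into Spark
cp $HIVE_HOME/lib/datanucleus-*.jar $SPARK_HOME/lib/
cp $HIVE_HOME/conf/hive-site.xml $SPARK_HOME/conf/
cp $HIVE_HOME/lib/mysql-connector-java-*.jar $SPARK_HOME/jars/
```

Starting the shell afterwards produced the following output: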
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/01/17 11:42:58 WARN SparkContext: Support for Java 7 is deprecated as of Spark 2.0.0
17/01/17 11:43:00 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/01/17 11:43:00 WARN Utils: Your hostname, node1 resolves to a loopback address: 127.0.0.1; using 192.168.85.128 instead (on interface eth1)
17/01/17 11:43:00 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
17/01/17 11:43:11 WARN HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
17/01/17 11:43:11 WARN HiveConf: HiveConf of name hive.server2.thrift.http.min.worker.threads does not exist
17/01/17 11:43:11 WARN HiveConf: HiveConf of name hive.mapjoin.optimized.keys does not exist
17/01/17 11:43:11 WARN HiveConf: HiveConf of name hive.mapjoin.lazy.hashtable does not exist
17/01/17 11:43:11 WARN HiveConf: HiveConf of name hive.datampi.maxslots does not exist
17/01/17 11:43:11 WARN HiveConf: HiveConf of name hive.metastore.ds.retry.attempts does not exist
17/01/17 11:43:11 WARN HiveConf: HiveConf of name hive.server2.thrift.http.max.worker.threads does not exist
17/01/17 11:43:11 WARN HiveConf: HiveConf of name hive.datampi.sendqueue does not exist
17/01/17 11:43:11 WARN HiveConf: HiveConf of name hive.optimize.multigroupby.common.distincts does not exist
17/01/17 11:43:11 WARN HiveConf: HiveConf of name hive.metastore.ds.retry.interval does not exist
17/01/17 11:43:11 WARN HiveConf: HiveConf of name hive.datampi.parallelism does not exist
17/01/17 11:43:11 WARN HiveConf: HiveConf of name hive.stats.map.parallelism does not exist
17/01/17 11:43:11 WARN HiveConf: HiveConf of name hive.datampi.memusedpercent does not exist
17/01/17 11:43:12 WARN General: Plugin (Bundle) "org.datanucleus.store.rdbms" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/home/joy/spark/spark-2.1.0-bin-hadoop2.6/jars/datanucleus-rdbms-3.2.9.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/home/joy/spark/spark/jars/datanucleus-rdbms-3.2.9.jar."
17/01/17 11:43:12 WARN General: Plugin (Bundle) "org.datanucleus.api.jdo" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/home/joy/spark/spark-2.1.0-bin-hadoop2.6/jars/datanucleus-api-jdo-3.2.6.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/home/joy/spark/spark/jars/datanucleus-api-jdo-3.2.6.jar."
17/01/17 11:43:12 WARN General: Plugin (Bundle) "org.datanucleus" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/home/joy/spark/spark/jars/datanucleus-core-3.2.10.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/home/joy/spark/spark-2.1.0-bin-hadoop2.6/jars/datanucleus-core-3.2.10.jar."
17/01/17 11:43:16 WARN HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
17/01/17 11:43:16 WARN HiveConf: HiveConf of name hive.server2.thrift.http.min.worker.threads does not exist
17/01/17 11:43:16 WARN HiveConf: HiveConf of name hive.mapjoin.optimized.keys does not exist
17/01/17 11:43:16 WARN HiveConf: HiveConf of name hive.mapjoin.lazy.hashtable does not exist
17/01/17 11:43:16 WARN HiveConf: HiveConf of name hive.datampi.maxslots does not exist
17/01/17 11:43:16 WARN HiveConf: HiveConf of name hive.metastore.ds.retry.attempts does not exist
17/01/17 11:43:16 WARN HiveConf: HiveConf of name hive.server2.thrift.http.max.worker.threads does not exist
17/01/17 11:43:16 WARN HiveConf: HiveConf of name hive.datampi.sendqueue does not exist
17/01/17 11:43:16 WARN HiveConf: HiveConf of name hive.optimize.multigroupby.common.distincts does not exist
17/01/17 11:43:16 WARN HiveConf: HiveConf of name hive.metastore.ds.retry.interval does not exist
17/01/17 11:43:16 WARN HiveConf: HiveConf of name hive.datampi.parallelism does not exist
17/01/17 11:43:16 WARN HiveConf: HiveConf of name hive.stats.map.parallelism does not exist
17/01/17 11:43:16 WARN HiveConf: HiveConf of name hive.datampi.memusedpercent does not exist
17/01/17 11:43:22 ERROR ObjectStore: Version information found in metastore differs 0.13.0 from expected schema version 1.2.0. Schema verififcation is disabled hive.metastore.schema.verification so setting version.
java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':
  at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:981)
  at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:110)
  at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:109)
  at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:878)
  at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:878)
  at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
  at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
  at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
  at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
  at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
  at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:878)
  at org.apache.spark.repl.Main$.createSparkSession(Main.scala:95)
  ... 47 elided
Caused by: java.lang.reflect.InvocationTargetException: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
  at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:978)
  ... 58 more
Caused by: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':
  at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:169)
  at org.apache.spark.sql.internal.SharedState.<init>(SharedState.scala:86)
  at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
  at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:101)
  at org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:100)
  at org.apache.spark.sql.internal.SessionState.<init>(SessionState.scala:157)
  at org.apache.spark.sql.hive.HiveSessionState.<init>(HiveSessionState.scala:32)
  ... 63 more
Caused by: java.lang.reflect.InvocationTargetException: java.lang.reflect.InvocationTargetException: java.lang.RuntimeException: java.io.FileNotFoundException: File /hive/tmp does not exist
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
  at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:166)
  ... 71 more
**Caused by: java.lang.reflect.InvocationTargetException: java.lang.RuntimeException: java.io.FileNotFoundException: File /hive/tmp does not exist**
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
  at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264)
  at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:366)
  at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:270)
  at org.apache.spark.sql.hive.HiveExternalCatalog.<init>(HiveExternalCatalog.scala:65)
  ... 76 more
Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: File /hive/tmp does not exist
  at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
  at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:192)
  ... 84 more
Caused by: java.io.FileNotFoundException: File /hive/tmp does not exist
  at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:537)
  at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:750)
  at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:527)
  at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:409)
  at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:599)
  at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
  at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
  ... 85 more
Analyzing the error, the cause is a FileNotFoundException: /hive/tmp does not exist. Searching hive-site.xml, that path is the value of hive.exec.scratchdir, an HDFS path Hive uses to store the execution plans of the various map/reduce stages and their intermediate output.
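A quick, hedged way to confirm which path the copied config points at (the exact grep output depends on the file's layout):

```bash
# Show the scratch-dir property in the hive-site.xml copied into Spark's conf
grep -A 1 'hive.exec.scratchdir' $SPARK_HOME/conf/hive-site.xml
```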
Running hadoop fs -ls /hive in a terminal gives:
Found 2 items
drwxr-xr-x - joy supergroup 0 2016-06-12 21:35 /hive/log
drwxr-xr-x - joy supergroup 0 2017-01-16 14:17 /hive/tmp
The permissions are wrong; group write should be added with hadoop fs -chmod g+w /hive/tmp and hadoop fs -chmod g+w /hive/log. But the "does not exist" error persisted.
Adding HADOOP_CONF_DIR to spark-env.sh under $SPARK_HOME/conf makes Spark resolve /hive/tmp against HDFS rather than the local filesystem (note the RawLocalFileSystem frames in the trace above). With that set, the error changes to:
java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':
  at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:981)
  at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:110)
  at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:109)
  at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:878)
  at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:878)
  at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
  at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
  at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
  at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
  at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
  at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:878)
  at org.apache.spark.repl.Main$.createSparkSession(Main.scala:95)
  ... 47 elided
Caused by: java.lang.reflect.InvocationTargetException: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
  at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:978)
  ... 58 more
Caused by: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':
  at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:169)
  at org.apache.spark.sql.internal.SharedState.<init>(SharedState.scala:86)
  at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
  at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:101)
  at org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:100)
  at org.apache.spark.sql.internal.SessionState.<init>(SessionState.scala:157)
  at org.apache.spark.sql.hive.HiveSessionState.<init>(HiveSessionState.scala:32)
  ... 63 more
Caused by: java.lang.reflect.InvocationTargetException: java.lang.reflect.InvocationTargetException: java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /hive/tmp on HDFS should be writable. Current permissions are: rwxrwxr-x
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
  at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:166)
  ... 71 more
Caused by: java.lang.reflect.InvocationTargetException: java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /hive/tmp on HDFS should be writable. Current permissions are: rwxrwxr-x
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
  at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264)
  at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:366)
  at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:270)
  at org.apache.spark.sql.hive.HiveExternalCatalog.<init>(HiveExternalCatalog.scala:65)
  ... 76 more
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /hive/tmp on HDFS should be writable. Current permissions are: rwxrwxr-x
  at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
  at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:192)
  ... 84 more
Caused by: java.lang.RuntimeException: The root scratch dir: /hive/tmp on HDFS should be writable. Current permissions are: rwxrwxr-x
  at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:612)
  at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
  at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
  ... 85 more
<console>:14: error: not found: value spark
       import spark.implicits._
              ^
<console>:14: error: not found: value spark
       import spark.sql
The error now says the directory permissions are still wrong. Running hadoop fs -ls /hive again:
drwxrwxr-x - joy supergroup 0 2016-06-12 21:35 /hive/log
drwxrwxr-x - joy supergroup 0 2017-01-16 14:17 /hive/tmp
Changing both directories' permissions to 777 finally let the shell start successfully; the combined fix is sketched below.
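A minimal sketch of the two fixes together; the Hadoop config path is an assumption for this setup:

```bash
# Point Spark at the Hadoop client configs so /hive/tmp resolves to HDFS
echo 'export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop' >> $SPARK_HOME/conf/spark-env.sh

# Open up the Hive scratch and log directories on HDFS
hadoop fs -chmod 777 /hive/tmp
hadoop fs -chmod 777 /hive/log
```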
Another error looks like the following:
Traceback (most recent call last):
  File "/Users/lyj/Programs/kiseliugit/MyPysparkCodes/test/spark2.0.py", line 5, in <module>
    spark = SparkSession.builder.master("local").appName('test 2.0').config(conf=SparkConf()).getOrCreate()
  File "/Users/lyj/Programs/Apache/Spark2/python/pyspark/conf.py", line 104, in __init__
    SparkContext._ensure_initialized()
  File "/Users/lyj/Programs/Apache/Spark2/python/pyspark/context.py", line 243, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway()
  File "/Users/lyj/Programs/Apache/Spark2/python/pyspark/java_gateway.py", line 116, in launch_gateway
    java_import(gateway.jvm, "org.apache.spark.SparkConf")
  File "/Library/Python/2.7/site-packages/py4j/java_gateway.py", line 90, in java_import
    return_value = get_return_value(answer, gateway_client, None, None)
  File "/Library/Python/2.7/site-packages/py4j/protocol.py", line 306, in get_return_value
    value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
KeyError: u'y'
The cause is an outdated py4j; upgrading it with pip fixes the problem:
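The upgrade, run for the Python interpreter that PySpark uses:

```bash
pip install --upgrade py4j
```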
Reference: http://stackoverflow.com/questions/38637988/how-could-i-write-the-right-entry-point-in-spark-2-0-program-actually-pyspark-2