pyspark LEAK: ByteBuf.release() was not called before it's garbage-collected. Enable advanced leak

A PySpark job hangs at a certain stage and reports this error:

LEAK: ByteBuf.release() was not called before it's garbage-collected. Enable advanced leak reporting

Cause:

The distributed dataset is too large, so collecting it onto a single machine (the driver) triggers the error.

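Solution:

In distributed computation, avoid pulling data back to the local machine for processing as much as possible. That means avoiding operators such as collect and countByKey, and writing the results directly to files on HDFS instead.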
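As a rough illustration of this advice, the sketch below replaces a countByKey call (which returns a dict to the driver) with reduceByKey followed by saveAsTextFile, so the per-key counts stay distributed and are written out by the executors. The function name count_by_key_to_storage, the tab-separated output format, and the example path are assumptions made for illustration, not part of the original job.

```python
from operator import add


def count_by_key_to_storage(pairs_rdd, output_path):
    """Count records per key without collecting anything to the driver.

    pairs_rdd   -- an RDD of (key, value) pairs
    output_path -- target directory (hypothetical), e.g. "hdfs:///tmp/key_counts"
    """
    # Replace every value with 1 and sum per key on the executors.
    # This yields the same counts as countByKey(), but the result is still an RDD.
    counts = pairs_rdd.mapValues(lambda _: 1).reduceByKey(add)

    # Format each (key, count) pair as one text line and let the executors
    # write the partitions directly to storage.
    counts.map(lambda kv: f"{kv[0]}\t{kv[1]}").saveAsTextFile(output_path)
    return counts


# Usage, assuming an existing SparkContext named sc:
#   pairs = sc.parallelize([("a", 1), ("b", 2), ("a", 3)])
#   count_by_key_to_storage(pairs, "hdfs:///tmp/key_counts")
```

If a small sample is still needed on the driver for inspection, take(n) on the resulting RDD bounds how much data is pulled back, unlike collect.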
