The Small-File Problem in Spark SQL

Date: 2021-01-17

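References: https://github.com/Intel-bigdata/spark-adaptive http://spark.apache.org/docs/latest/configuration.html

Processing data with the Spark SQL APIs easily produces a large number of small files, a problem that is common throughout distributed computing. There are generally three ways to deal with it: setting spark.sql.shuffle.part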