java.lang.NullPointerException: Null value appeared in non-nullable field: top level row object
If the schema is inferred from a Scala tuple/case class, or a Java bean, please try to use scala.Option[_] or other nullable types (e.g. java.lang.Integer instead of int/scala.Int).
root
 |-- window: long (nullable = false)
 |-- linkId: long (nullable = false)
 |-- mapVersion: integer (nullable = false)
 |-- passthrough: long (nullable = false)
 |-- resident: long (nullable = false)
 |-- driverId: string (nullable = true)
 |-- inLink: map (nullable = true)
 |    |-- key: long
 |    |-- value: integer (valueContainsNull = false)
 |-- outLink: map (nullable = true)
 |    |-- key: long
 |    |-- value: integer (valueContainsNull = false)
Some fields declared as non-nullable were assigned null. There are two ways to fix this:

1. Filter out the rows where these fields are null
2. Declare the fields as a nullable type
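A minimal sketch of option 1. The local session and toy data below are assumptions for illustration (mirroring the name/age/stat example later in this post), not the original dataset:

```scala
import org.apache.spark.sql.SparkSession

// Assumed local session and sample data, for illustration only.
val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq(("abc", Some(40L), "s"), ("xyz", None, "s"))
  .toDF("name", "age", "stat")

// Drop rows where the would-be non-nullable column is null
// before converting to a typed Dataset.
val cleaned = df.na.drop(Seq("age"))
```

`na.drop(Seq("age"))` removes only rows whose `age` is null, leaving other columns' nulls untouched.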
val path: String = ???
val peopleDF = spark.read
  .option("inferSchema", "true")
  .option("header", "true")
  .option("delimiter", ",")
  .csv(path)
peopleDF.printSchema
The output:
root
 |-- name: string (nullable = true)
 |-- age: long (nullable = false)
 |-- stat: string (nullable = true)
peopleDF.where($"age".isNull).show
The output:
+----+----+----+
|name| age|stat|
+----+----+----+
| xyz|null|   s|
+----+----+----+
Next, convert the Dataset[Row] to a Dataset[Person]:
// original (problematic) case class: age is a primitive Long, hence non-nullable
case class Person(name: String, age: Long, stat: String)

val peopleDS = peopleDF.as[Person]
peopleDS.printSchema
Run the following:
peopleDS.where($"age" > 30).show
Result:
+----+---+----+
|name|age|stat|
+----+---+----+
+----+---+----+
SQL treats null as a valid value: `null > 30` evaluates to null rather than true, so the row with a null age is silently filtered out instead of raising an error.
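This three-valued behaviour can be checked directly (a sketch assuming an active `SparkSession` named `spark`):

```scala
// In SQL, null > 30 evaluates to null, not false; WHERE keeps only
// rows whose predicate is literally true, so null rows vanish silently.
spark.sql("SELECT null > 30 AS cmp").show()
```

The single row of the result is `null`, neither true nor false.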
Now run the typed (lambda-based) filter:
peopleDS.filter(_.age > 30).show
This throws the error shown at the top of the post.
The cause: in Scala, Long is a primitive value type and cannot hold null, so deserializing the null age into Person fails.
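This is plain-Scala behaviour, independent of Spark; a quick illustration:

```scala
// Does not compile: scala.Long is a primitive value type (AnyVal)
// val a: Long = null

// Compiles: the boxed Java type is a reference and may be null
val b: java.lang.Long = null

// Idiomatic Scala: model the absent value explicitly
val c: Option[Long] = None
```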
The fix is to use Option:
case class Person(name: String, age: Option[Long], stat: String)
peopleDS.filter(_.age.map(_ > 30).getOrElse(false)).show
Result:
+----+---+----+
|name|age|stat|
+----+---+----+
+----+---+----+
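A slightly terser equivalent uses `Option.exists`, which returns false for `None` just like the `map(...).getOrElse(false)` pattern above (`peopleDS` as in this post):

```scala
peopleDS.filter(_.age.exists(_ > 30)).show
```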