Null value appeared in non-nullable field java.lang.NullPointerException

The error

Null value appeared in non-nullable field
java.lang.NullPointerException: Null value appeared in non-nullable field: top level row object
If the schema is inferred from a Scala tuple/case class, or a Java bean, please try to use scala.Option[_] or other nullable types (e.g. java.lang.Integer instead of int/scala.Int).

dataset schema

root
 |-- window: long (nullable = false)
 |-- linkId: long (nullable = false)
 |-- mapVersion: integer (nullable = false)
 |-- passthrough: long (nullable = false)
 |-- resident: long (nullable = false)
 |-- driverId: string (nullable = true)
 |-- inLink: map (nullable = true)
 |    |-- key: long
 |    |-- value: integer (valueContainsNull = false)
 |-- outLink: map (nullable = true)
 |    |-- key: long
 |    |-- value: integer (valueContainsNull = false)

Cause of the error

Some fields declared as non-nullable were assigned null values.

Solutions

1. Filter out the rows where these fields are null.

2. Declare the fields as nullable types.
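A minimal sketch of the first option. The DataFrame `df`, the column name "age", and the `Person` case class here are placeholders standing in for your own data:

```scala
// Option 1: drop the offending rows before converting to a typed Dataset.
// "age" is the hypothetical non-nullable column; adjust to your schema.
val cleaned = df.filter($"age".isNotNull)   // or: df.na.drop(Seq("age"))
val typed = cleaned.as[Person]              // no null can reach the non-nullable field now
```

This avoids changing the case class, at the cost of silently discarding the null rows.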

Example

val path: String = ???

val peopleDF = spark.read
  .option("inferSchema","true")
  .option("header", "true")
  .option("delimiter", ",")
  .csv(path)

peopleDF.printSchema

Output:

root
|-- name: string (nullable = true)
|-- age: long (nullable = false)
|-- stat: string (nullable = true)

peopleDF.where($"age".isNull).show

Output:

+----+----+----+
|name| age|stat|
+----+----+----+
| xyz|null|   s|
+----+----+----+

Next, convert the Dataset[Row] to a Dataset[Person].
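The conversion below assumes Person was initially declared with a primitive, non-nullable age field, reconstructed here from the schema above:

```scala
// Non-nullable declaration: age is a primitive Long, so a null value
// in the "age" column will trigger the NullPointerException later.
case class Person(name: String, age: Long, stat: String)
```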

val peopleDS = peopleDF.as[Person]

peopleDS.printSchema

Run the following code:

peopleDS.where($"age" > 30).show

Result:

+----+---+----+
|name|age|stat|
+----+---+----+
+----+---+----+

Spark SQL handles null as a valid value: the comparison $"age" > 30 evaluates to null for the null-aged row, so that row is simply filtered out instead of throwing.

Now run the following code:

peopleDS.filter(_.age > 30).show

This throws the error shown above.

The reason is that in Scala the primitive Long type cannot hold null, yet the typed filter must deserialize every row, including the one with a null age, into a Person.

Solution: use the Option type.

case class Person(name: String, age: Option[Long], stat: String)

val peopleDS = peopleDF.as[Person]
peopleDS.filter(_.age.map(_ > 30).getOrElse(false)).show

结果

+----+---+----+
|name|age|stat|
+----+---+----+
+----+---+----+
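An equivalent, slightly more concise predicate uses Option.exists, which returns false for None, so null ages are excluded just like the map/getOrElse version:

```scala
// exists applies the predicate only when age is Some(_), else yields false
peopleDS.filter(_.age.exists(_ > 30)).show
```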