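How Spark SQL wraps the values of a Seq(s1, s2, s3, …): each element si of the Seq is wrapped into a Row.
If si is a simple value, the resulting Row contains a single column named value.
If si is an N-tuple, the resulting Row contains N columns.
As a special case, a one-element "tuple" written as (x) is treated as a non-tuple, so it also produces a Row with a single value column.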
scala> Seq(("bluejoe"),("alex")).toDF().show
+-------+
|  value|
+-------+
|bluejoe|
|   alex|
+-------+

scala> Seq("bluejoe","alex").toDF().show
+-------+
|  value|
+-------+
|bluejoe|
|   alex|
+-------+

scala> Seq(("bluejoe",1),("alex",0)).toDF().show
+-------+---+
|     _1| _2|
+-------+---+
|bluejoe|  1|
|   alex|  0|
+-------+---+
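One likely reason the first two sessions print the same table is plain Scala syntax: parentheses around a single expression only group it, so ("bluejoe") is still a String, and a real one-element tuple has to be built with Tuple1. The following is a minimal sketch that needs no Spark at all; the object name OneTupleSyntax and its fields are made up for illustration.

```scala
object OneTupleSyntax {
  // Parentheses around a single expression are only grouping: this is a String.
  val parenthesized: String = ("bluejoe")

  // A genuine one-element tuple must be constructed explicitly.
  val oneTuple: Tuple1[String] = Tuple1("bluejoe")

  def main(args: Array[String]): Unit = {
    println(parenthesized.getClass.getName) // java.lang.String
    println(oneTuple.getClass.getName)      // scala.Tuple1
    println(oneTuple.productArity)          // 1
  }
}
```

So by the time Spark sees Seq(("bluejoe"),("alex")), the compiler has already typed it as a Seq[String].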
I wrote the following test case specifically to verify this behavior:
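// Assumes JUnit 4 (org.junit) is on the test classpath
import org.apache.spark.sql.SparkSession
import org.junit.{Assert, Test}

@Test
def testEncoderSchema(): Unit = {
  val spark = SparkSession.builder.master("local[4]").getOrCreate()
  val sqlContext = spark.sqlContext
  import sqlContext.implicits._
  import org.apache.spark.sql.catalyst.encoders.encoderFor

  // Schema of a plain String, and of String wrapped in one and two pairs of parentheses
  val schema1 = encoderFor[String].schema
  val schema2 = encoderFor[(String)].schema
  val schema3 = encoderFor[((String))].schema

  // All three should describe the same single-column schema
  Assert.assertEquals(schema1, schema2)
  Assert.assertEquals(schema1, schema3)
}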
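As a contrast to the test above, here is a rough sketch of what I would expect when the element really is a one-element tuple, built with Tuple1. It uses the public Encoder.schema method instead of the internal encoderFor helper. The object name Tuple1SchemaSketch and the app name are made up for illustration, and Spark has to be on the classpath. My expectation is that the plain String encoder reports a single value column while the Tuple1 encoder reports a single _1 column, but check this against your own Spark version.

```scala
import org.apache.spark.sql.{Encoder, SparkSession}

object Tuple1SchemaSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local[1]")
      .appName("tuple1-schema-sketch") // hypothetical app name
      .getOrCreate()
    import spark.implicits._

    // Encoder resolved for a plain String
    val plainSchema = implicitly[Encoder[String]].schema
    // Encoder resolved for an explicit one-element tuple
    val tuple1Schema = implicitly[Encoder[Tuple1[String]]].schema

    // Print the column names of each schema for comparison
    println(plainSchema.fieldNames.mkString(","))
    println(tuple1Schema.fieldNames.mkString(","))

    spark.stop()
  }
}
```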