Local vector
Labeled point
Local matrix
RowMatrix
BlockMatrix
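MLlib supports local vectors and local matrices stored on a single machine, as well as distributed matrices backed by one or more RDDs. Local vectors and local matrices are simple data models that serve as public interfaces. The underlying linear algebra operations are provided by Breeze and jblas. In MLlib, a training example used in supervised learning is called a "labeled point".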
A local vector has integer-typed, 0-based indices and double-typed values, and is stored on a single machine. MLlib supports two types of local vectors: dense and sparse. A dense vector is backed by a double array representing its entry values, while a sparse vector is backed by two parallel arrays: indices and values. For example, the vector (1.0, 0.0, 3.0) can be represented in dense format as [1.0, 0.0, 3.0] or in sparse format as (3, [0, 2], [1.0, 3.0]), where 3 is the size of the vector.
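To make the relationship between the two formats concrete, the sketch below expands a sparse triple (size, indices, values) into the equivalent dense array. It is plain Scala and does not use MLlib; `SparseToDenseSketch` and `sparseToDense` are hypothetical names chosen for illustration, not part of the MLlib API.

```scala
// Illustrative only: a hypothetical helper, not an MLlib function.
object SparseToDenseSketch {
  // Expand (size, indices, values) into a dense array of length `size`.
  def sparseToDense(size: Int, indices: Array[Int], values: Array[Double]): Array[Double] = {
    require(indices.length == values.length, "indices and values must be parallel arrays")
    // Entries that are not listed in `indices` are implicitly zero.
    val dense = Array.fill(size)(0.0)
    for ((i, v) <- indices.zip(values)) {
      require(i >= 0 && i < size, s"index $i out of range for size $size")
      dense(i) = v
    }
    dense
  }

  def main(args: Array[String]): Unit = {
    // The sparse form (3, [0, 2], [1.0, 3.0]) from the text above.
    val dense = sparseToDense(3, Array(0, 2), Array(1.0, 3.0))
    println(dense.mkString("[", ", ", "]")) // prints [1.0, 0.0, 3.0]
  }
}
```

The sparse form only pays off when most entries are zero, since it stores two array slots per nonzero entry instead of one slot per entry.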
The base class of local vectors is Vector, and we provide two implementations: DenseVector and SparseVector. We recommend using the factory methods implemented in Vectors to create local vectors.
Refer to the Vector Scala docs and Vectors Scala docs for details on the API.
import org.apache.spark.mllib.linalg.{Vector, Vectors}

// Create a dense vector (1.0, 0.0, 3.0) from an array.
val dv: Vector = Vectors.dense(Array(1.0, 0.0, 3.0))
// Create the same dense vector from individual values.
val densityVector: Vector = Vectors.dense(1.0, 0.0, 3.0)
// Create a sparse vector (1.0, 0.0, 3.0) from parallel arrays: (size, indices, values).
val sv1: Vector = Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0))
// Create a sparse vector (1.0, 0.0, 3.0) from its nonzero entries: (size, Seq((index, value), ...)).
val sv2: Vector = Vectors.sparse(3, Seq((0, 1.0), (2, 3.0)))

println(dv)
println(densityVector)
println(sv1)
println(sv2)

result:
[1.0,0.0,3.0]
[1.0,0.0,3.0]
(3,[0,2],[1.0,3.0])
(3,[0,2],[1.0,3.0])
Note: Scala imports scala.collection.immutable.Vector by default, so you have to import org.apache.spark.mllib.linalg.Vector explicitly to use MLlib's Vector.
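If a file needs both the Scala collection and MLlib's vector type, one option is to rename MLlib's Vector on import. The sketch below assumes Spark MLlib is on the classpath; the alias `MLVector` and the object name `VectorImportSketch` are arbitrary names chosen for illustration.

```scala
// Rename MLlib's Vector on import so it does not shadow scala.Vector.
import org.apache.spark.mllib.linalg.{Vector => MLVector, Vectors}

object VectorImportSketch {
  def main(args: Array[String]): Unit = {
    // Plain `Vector` still refers to the standard immutable collection.
    val collection: Vector[Int] = Vector(1, 2, 3)
    // `MLVector` is the MLlib local vector type under its import alias.
    val features: MLVector = Vectors.dense(1.0, 0.0, 3.0)
    println(collection.size)
    println(features.size)
  }
}
```

With a plain `import org.apache.spark.mllib.linalg.Vector` instead, the name Vector refers to the MLlib type in that scope, and the collection has to be written as scala.collection.immutable.Vector.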