mapPartitionsWithIndex

Return a new RDD by applying a function to each partition of this RDD, while tracking the index of the original partition.

val x = sc. parallelize(Array(1,2,3), 2) // make an RDD with 2 partitions

x: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[289] at parallelize at <console>:34

def f(partitionIndex:Int, i:Iterator[Int]) = {
  (partitionIndex, i.sum).productIterator
}

f: (partitionIndex: Int, i: Iterator[Int])Iterator[Any]

val y = x.mapPartitionsWithIndex(f)

y: org.apache.spark.rdd.RDD[Any] = MapPartitionsRDD[304] at mapPartitionsWithIndex at <console>:38

//glom() flattens elements on the same partition
val xout = x.glom().collect()

xout: Array[Array[Int]] = Array(Array(1), Array(2, 3))

val yout = y.glom().collect()

yout: Array[Array[Any]] = Array(Array(0, 1), Array(1, 5))

sds-3.x/ScaDaMaLe