collect

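Return all items in the RDD to the driver as a single Array. Because every element is copied into driver memory, use collect only when the result is known to be small.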

Let us look at the legend and overview of the visual RDD API.

val x = sc.parallelize(Array(1,2,3), 2) // make an RDD with two partitions
x: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[89108] at parallelize at <console>:34
// simply returns all elements in RDD x to the driver as an Array
val y = x.collect()
y: Array[Int] = Array(1, 2, 3)
// glom() groups the elements of each partition into one Array per partition
val xOut = x.glom().collect() 
xOut: Array[Array[Int]] = Array(Array(1), Array(2, 3))
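
The relationship between collect() and glom() shown above can be checked with a short sketch. It is meant to be pasted into a notebook cell or spark-shell; the app name "collect-sketch", the local[2] master, and the value names partitionSizes, flattened and firstTwo are illustrative assumptions, not part of the original example.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Reuses the running SparkContext in a notebook or spark-shell; the app name
// and local[2] master are assumptions that only matter if no context exists yet.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("collect-sketch").setMaster("local[2]"))

val x = sc.parallelize(Array(1, 2, 3), 2) // same RDD as above: two partitions

// Count the elements held by each partition, then bring the counts to the driver.
val partitionSizes = x.glom().map(_.length).collect()

// Flattening the per-partition arrays gives back the same elements as collect().
val flattened = x.glom().collect().flatten

// take(n) brings only the first n elements to the driver instead of the whole RDD.
val firstTwo = x.take(2)
```

For RDDs that may not fit in driver memory, take(n) is usually a safer way to inspect a few elements than collect().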