This is an elaboration of the Apache Spark mllib-programming-guide on mllib-data-types.

Overview

Data Types - MLlib Programming Guide

MLlib supports local vectors and matrices stored on a single machine, as well as distributed matrices backed by one or more RDDs. Local vectors and local matrices are simple data models that serve as public interfaces. The underlying linear algebra operations are provided by Breeze and jblas. A training example used in supervised learning is called a “labeled point” in MLlib.

Labeled point in Scala

A labeled point is a local vector, either dense or sparse, associated with a label/response. In MLlib, labeled points are used in supervised learning algorithms.

We use a double to store a label, so we can use labeled points in both regression and classification.

For binary classification, a label should be either 0 (negative) or 1 (positive). For multiclass classification, labels should be class indices starting from zero: 0, 1, 2, ....
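As a minimal sketch of the multiclass convention, class names can be mapped to zero-based indices stored as doubles. The object LabelEncodingSketch and its encode method are hypothetical names used only for illustration and are not part of MLlib.

```scala
// Hypothetical helper (not an MLlib API): assigns zero-based Double labels to class names.
object LabelEncodingSketch {
  // Sort the distinct class names so the assignment is deterministic,
  // then pair each name with its zero-based index converted to Double.
  def encode(classNames: Seq[String]): Map[String, Double] =
    classNames.distinct.sorted.zipWithIndex.map { case (name, idx) =>
      name -> idx.toDouble
    }.toMap
}
```

For example, LabelEncodingSketch.encode(Seq("cat", "dog", "bird", "dog")) assigns 0.0 to bird, 1.0 to cat and 2.0 to dog. The resulting doubles could then serve as the label argument of LabeledPoint.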

A labeled point is represented by the case class LabeledPoint.

Refer to the LabeledPoint Scala docs for details on the API.

// import first
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

// Create a labeled point with a "positive" label and a dense feature vector.
val pos = LabeledPoint(1.0, Vectors.dense(1.0, 0.0, 3.0))

pos: org.apache.spark.mllib.regression.LabeledPoint = (1.0,[1.0,0.0,3.0])

// Create a labeled point with a "negative" label and a sparse feature vector.
val neg = LabeledPoint(0.0, Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0)))

neg: org.apache.spark.mllib.regression.LabeledPoint = (0.0,(3,[0,2],[1.0,3.0]))
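The sparse output (3,[0,2],[1.0,3.0]) reads as (size, indices, values) and describes the same three features as the dense vector [1.0,0.0,3.0]. The sketch below expands such a triple in plain Scala to make the correspondence explicit. SparseToDenseSketch is a hypothetical name, and the code illustrates the encoding only, not how MLlib stores or converts vectors internally.

```scala
// Hypothetical helper (not an MLlib API): expands a sparse triple into a dense vector.
object SparseToDenseSketch {
  // Positions that are not listed in `indices` are implicitly zero.
  def toDense(size: Int, indices: Seq[Int], values: Seq[Double]): Vector[Double] = {
    require(indices.length == values.length, "indices and values must have the same length")
    val dense = Array.fill(size)(0.0)
    // Write each stored value at its zero-based position.
    indices.zip(values).foreach { case (i, v) => dense(i) = v }
    dense.toVector
  }
}
```

Calling SparseToDenseSketch.toDense(3, Seq(0, 2), Seq(1.0, 3.0)) yields Vector(1.0, 0.0, 3.0), which matches the dense features of pos.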

Sparse data in Scala

It is very common in practice to have sparse training data. MLlib supports reading training examples stored in LIBSVM format, which is the default format used by LIBSVM and LIBLINEAR. It is a text format in which each line represents a labeled sparse feature vector using the following format:

label index1:value1 index2:value2 ...

where the indices are one-based and in ascending order. After loading, the feature indices are converted to zero-based.
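To make the index conversion concrete, here is a minimal plain-Scala sketch that parses a single LIBSVM line and shifts the one-based indices to zero-based. LibSvmLineSketch and parseLine are hypothetical names. This is not the MLUtils implementation, and it assumes well-formed input with no comments or malformed tokens.

```scala
// Hypothetical parser for one LIBSVM line (illustration only, not MLUtils code).
object LibSvmLineSketch {
  // Returns (label, zero-based indices, values).
  def parseLine(line: String): (Double, Vector[Int], Vector[Double]) = {
    val tokens = line.trim.split("\\s+")
    val label = tokens.head.toDouble
    val pairs = tokens.tail.map { item =>
      val parts = item.split(":")
      // Indices are one-based in the file and zero-based after loading.
      (parts(0).toInt - 1, parts(1).toDouble)
    }
    (label, pairs.map(_._1).toVector, pairs.map(_._2).toVector)
  }
}
```

For the line "1 1:1.0 3:3.0", parseLine returns the label 1.0, the indices Vector(0, 2) and the values Vector(1.0, 3.0).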

MLUtils.loadLibSVMFile reads training examples stored in LIBSVM format.

Refer to the MLUtils Scala docs for details on the API.

import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.rdd.RDD

// From the programming guide, but that sample file is not available here (it can be fetched from GitHub with wget):
// val examples: RDD[LabeledPoint] = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")

import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.rdd.RDD

display(dbutils.fs.ls("/databricks-datasets/mnist-digits/data-001/mnist-digits-train.txt"))

val examples: RDD[LabeledPoint] = MLUtils.loadLibSVMFile(sc, "/databricks-datasets/mnist-digits/data-001/mnist-digits-train.txt")

examples: org.apache.spark.rdd.RDD[org.apache.spark.mllib.regression.LabeledPoint] = MapPartitionsRDD[3661] at map at MLUtils.scala:88

examples.take(1)

res1: Array[org.apache.spark.mllib.regression.LabeledPoint] = Array((5.0,(780,[152,153,154,155,156,157,158,159,160,161,162,163,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,203,204,205,206,207,208,209,210,211,212,213,214,215,216,217,218,231,232,233,234,235,236,237,238,239,240,241,260,261,262,263,264,265,266,268,269,289,290,291,292,293,319,320,321,322,347,348,349,350,376,377,378,379,380,381,405,406,407,408,409,410,434,435,436,437,438,439,463,464,465,466,467,493,494,495,496,518,519,520,521,522,523,524,544,545,546,547,548,549,550,551,570,571,572,573,574,575,576,577,578,596,597,598,599,600,601,602,603,604,605,622,623,624,625,626,627,628,629,630,631,648,649,650,651,652,653,654,655,656,657,676,677,678,679,680,681,682,683],[3.0,18.0,18.0,18.0,126.0,136.0,175.0,26.0,166.0,255.0,247.0,127.0,30.0,36.0,94.0,154.0,170.0,253.0,253.0,253.0,253.0,253.0,225.0,172.0,253.0,242.0,195.0,64.0,49.0,238.0,253.0,253.0,253.0,253.0,253.0,253.0,253.0,253.0,251.0,93.0,82.0,82.0,56.0,39.0,18.0,219.0,253.0,253.0,253.0,253.0,253.0,198.0,182.0,247.0,241.0,80.0,156.0,107.0,253.0,253.0,205.0,11.0,43.0,154.0,14.0,1.0,154.0,253.0,90.0,139.0,253.0,190.0,2.0,11.0,190.0,253.0,70.0,35.0,241.0,225.0,160.0,108.0,1.0,81.0,240.0,253.0,253.0,119.0,25.0,45.0,186.0,253.0,253.0,150.0,27.0,16.0,93.0,252.0,253.0,187.0,249.0,253.0,249.0,64.0,46.0,130.0,183.0,253.0,253.0,207.0,2.0,39.0,148.0,229.0,253.0,253.0,253.0,250.0,182.0,24.0,114.0,221.0,253.0,253.0,253.0,253.0,201.0,78.0,23.0,66.0,213.0,253.0,253.0,253.0,253.0,198.0,81.0,2.0,18.0,171.0,219.0,253.0,253.0,253.0,253.0,195.0,80.0,9.0,55.0,172.0,226.0,253.0,253.0,253.0,253.0,244.0,133.0,11.0,136.0,253.0,253.0,253.0,212.0,135.0,132.0,16.0])))

display(examples.toDF) // convert to DataFrame and display for convenient db visualization

label

features

5

[0, 780, [152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 260, 261, 262, 263, 264, 265, 266, 268, 269, 289, 290, 291, 292, 293, 319, 320, 321, 322, 347, 348, 349, 350, 376, 377, 378, 379, 380, 381, 405, 406, 407, 408, 409, 410, 434, 435, 436, 437, 438, 439, 463, 464, 465, 466, 467, 493, 494, 495, 496, 518, 519, 520, 521, 522, 523, 524, 544, 545, 546, 547, 548, 549, 550, 551, 570, 571, 572, 573, 574, 575, 576, 577, 578, 596, 597, 598, 599, 600, 601, 602, 603, 604, 605, 622, 623, 624, 625, 626, 627, 628, 629, 630, 631, 648, 649, 650, 651, 652, 653, 654, 655, 656, 657, 676, 677, 678, 679, 680, 681, 682, 683], [3, 18, 18, 18, 126, 136, 175, 26, 166, 255, 247, 127, 30, 36, 94, 154, 170, 253, 253, 253, 253, 253, 225, 172, 253, 242, 195, 64, 49, 238, 253, 253, 253, 253, 253, 253, 253, 253, 251, 93, 82, 82, 56, 39, 18, 219, 253, 253, 253, 253, 253, 198, 182, 247, 241, 80, 156, 107, 253, 253, 205, 11, 43, 154, 14, 1, 154, 253, 90, 139, 253, 190, 2, 11, 190, 253, 70, 35, 241, 225, 160, 108, 1, 81, 240, 253, 253, 119, 25, 45, 186, 253, 253, 150, 27, 16, 93, 252, 253, 187, 249, 253, 249, 64, 46, 130, 183, 253, 253, 207, 2, 39, 148, 229, 253, 253, 253, 250, 182, 24, 114, 221, 253, 253, 253, 253, 201, 78, 23, 66, 213, 253, 253, 253, 253, 198, 81, 2, 18, 171, 219, 253, 253, 253, 253, 195, 80, 9, 55, 172, 226, 253, 253, 253, 253, 244, 133, 11, 136, 253, 253, 253, 212, 135, 132, 16]]

0

[0, 780, [127, 128, 129, 130, 131, 154, 155, 156, 157, 158, 159, 181, 182, 183, 184, 185, 186, 187, 188, 189, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 289, 290, 291, 292, 293, 294, 295, 296, 297, 300, 301, 302, 316, 317, 318, 319, 320, 321, 328, 329, 330, 343, 344, 345, 346, 347, 348, 349, 356, 357, 358, 371, 372, 373, 374, 384, 385, 386, 399, 400, 401, 412, 413, 414, 426, 427, 428, 429, 440, 441, 442, 454, 455, 456, 457, 466, 467, 468, 469, 470, 482, 483, 484, 493, 494, 495, 496, 497, 510, 511, 512, 520, 521, 522, 523, 538, 539, 540, 547, 548, 549, 550, 566, 567, 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, 604, 622, 623, 624, 625, 626, 627, 628, 629, 630, 651, 652, 653, 654, 655, 656, 657], [51, 159, 253, 159, 50, 48, 238, 252, 252, 252, 237, 54, 227, 253, 252, 239, 233, 252, 57, 6, 10, 60, 224, 252, 253, 252, 202, 84, 252, 253, 122, 163, 252, 252, 252, 253, 252, 252, 96, 189, 253, 167, 51, 238, 253, 253, 190, 114, 253, 228, 47, 79, 255, 168, 48, 238, 252, 252, 179, 12, 75, 121, 21, 253, 243, 50, 38, 165, 253, 233, 208, 84, 253, 252, 165, 7, 178, 252, 240, 71, 19, 28, 253, 252, 195, 57, 252, 252, 63, 253, 252, 195, 198, 253, 190, 255, 253, 196, 76, 246, 252, 112, 253, 252, 148, 85, 252, 230, 25, 7, 135, 253, 186, 12, 85, 252, 223, 7, 131, 252, 225, 71, 85, 252, 145, 48, 165, 252, 173, 86, 253, 225, 114, 238, 253, 162, 85, 252, 249, 146, 48, 29, 85, 178, 225, 253, 223, 167, 56, 85, 252, 252, 252, 229, 215, 252, 252, 252, 196, 130, 28, 199, 252, 252, 253, 252, 252, 233, 145, 25, 128, 252, 253, 252, 141, 37]]

4

[0, 780, [160, 161, 162, 172, 173, 188, 189, 190, 200, 201, 215, 216, 217, 218, 228, 229, 243, 244, 245, 256, 257, 271, 272, 273, 283, 284, 285, 299, 300, 301, 311, 312, 313, 326, 327, 328, 329, 339, 340, 341, 354, 355, 356, 357, 367, 368, 369, 379, 380, 381, 382, 383, 384, 395, 396, 397, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 452, 453, 454, 455, 456, 457, 458, 459, 465, 466, 467, 493, 494, 495, 521, 522, 523, 549, 550, 551, 577, 578, 579, 605, 606, 607, 633, 634, 635, 661, 662, 663, 689, 690, 691], [67, 232, 39, 62, 81, 120, 180, 39, 126, 163, 2, 153, 210, 40, 220, 163, 27, 254, 162, 222, 163, 183, 254, 125, 46, 245, 163, 198, 254, 56, 120, 254, 163, 23, 231, 254, 29, 159, 254, 120, 163, 254, 216, 16, 159, 254, 67, 14, 86, 178, 248, 254, 91, 159, 254, 85, 47, 49, 116, 144, 150, 241, 243, 234, 179, 241, 252, 40, 150, 253, 237, 207, 207, 207, 253, 254, 250, 240, 198, 143, 91, 28, 5, 233, 250, 119, 177, 177, 177, 177, 177, 98, 56, 102, 254, 220, 169, 254, 137, 169, 254, 57, 169, 254, 57, 169, 255, 94, 169, 254, 96, 169, 254, 153, 169, 255, 153, 96, 254, 153]]

1

[0, 780, [158, 159, 160, 161, 185, 186, 187, 188, 189, 213, 214, 215, 216, 217, 240, 241, 242, 243, 244, 245, 267, 268, 269, 270, 271, 295, 296, 297, 298, 322, 323, 324, 325, 326, 349, 350, 351, 352, 353, 377, 378, 379, 380, 381, 404, 405, 406, 407, 408, 431, 432, 433, 434, 435, 459, 460, 461, 462, 463, 486, 487, 488, 489, 490, 514, 515, 516, 517, 518, 542, 543, 544, 545, 569, 570, 571, 572, 573, 596, 597, 598, 599, 600, 601, 624, 625, 626, 627, 652, 653, 654, 655, 680, 681, 682, 683], [124, 253, 255, 63, 96, 244, 251, 253, 62, 127, 251, 251, 253, 62, 68, 236, 251, 211, 31, 8, 60, 228, 251, 251, 94, 155, 253, 253, 189, 20, 253, 251, 235, 66, 32, 205, 253, 251, 126, 104, 251, 253, 184, 15, 80, 240, 251, 193, 23, 32, 253, 253, 253, 159, 151, 251, 251, 251, 39, 48, 221, 251, 251, 172, 234, 251, 251, 196, 12, 253, 251, 251, 89, 159, 255, 253, 253, 31, 48, 228, 253, 247, 140, 8, 64, 251, 253, 220, 64, 251, 253, 220, 24, 193, 253, 220]]

9

[0, 780, [208, 209, 210, 211, 212, 213, 214, 215, 216, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 289, 290, 291, 292, 293, 296, 297, 298, 299, 300, 316, 317, 318, 319, 320, 324, 325, 326, 327, 343, 344, 345, 346, 347, 350, 351, 352, 353, 354, 370, 371, 372, 373, 377, 378, 379, 380, 381, 382, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 489, 490, 491, 492, 517, 518, 519, 520, 546, 547, 548, 573, 574, 575, 576, 601, 602, 603, 604, 629, 630, 631, 632, 658, 659, 660, 686, 687, 688, 689, 714, 715, 716, 717, 718, 743, 744, 745, 746], [55, 148, 210, 253, 253, 113, 87, 148, 55, 87, 232, 252, 253, 189, 210, 252, 252, 253, 168, 4, 57, 242, 252, 190, 65, 5, 12, 182, 252, 253, 116, 96, 252, 252, 183, 14, 92, 252, 252, 225, 21, 132, 253, 252, 146, 14, 215, 252, 252, 79, 126, 253, 247, 176, 9, 8, 78, 245, 253, 129, 16, 232, 252, 176, 36, 201, 252, 252, 169, 11, 22, 252, 252, 30, 22, 119, 197, 241, 253, 252, 251, 77, 16, 231, 252, 253, 252, 252, 252, 226, 227, 252, 231, 55, 235, 253, 217, 138, 42, 24, 192, 252, 143, 62, 255, 253, 109, 71, 253, 252, 21, 253, 252, 21, 71, 253, 252, 21, 106, 253, 252, 21, 45, 255, 253, 21, 218, 252, 56, 96, 252, 189, 42, 14, 184, 252, 170, 11, 14, 147, 252, 42]]

2

[0, 780, [155, 156, 157, 158, 159, 181, 182, 183, 184, 185, 186, 187, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 261, 262, 263, 264, 265, 266, 267, 269, 270, 271, 272, 289, 290, 291, 292, 293, 294, 297, 298, 299, 300, 317, 318, 319, 320, 325, 326, 327, 328, 353, 354, 355, 356, 377, 378, 379, 380, 381, 382, 383, 384, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 455, 456, 457, 458, 459, 460, 462, 463, 464, 465, 466, 467, 468, 469, 482, 483, 484, 485, 486, 487, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, 509, 510, 511, 512, 513, 514, 516, 517, 518, 519, 520, 522, 523, 524, 525, 526, 527, 528, 537, 538, 539, 540, 541, 542, 543, 544, 545, 546, 547, 553, 554, 555, 556, 565, 566, 567, 568, 569, 570, 571, 572, 573, 574, 593, 594, 595, 596, 597, 598, 599, 600, 601, 621, 622, 623, 624, 625, 626], [13, 25, 100, 122, 7, 33, 151, 208, 252, 252, 252, 146, 40, 152, 244, 252, 253, 224, 211, 252, 232, 40, 15, 152, 239, 252, 252, 252, 216, 31, 37, 252, 252, 60, 96, 252, 252, 252, 252, 217, 29, 37, 252, 252, 60, 181, 252, 252, 220, 167, 30, 77, 252, 252, 60, 26, 128, 58, 22, 100, 252, 252, 60, 157, 252, 252, 60, 110, 121, 122, 121, 202, 252, 194, 3, 10, 53, 179, 253, 253, 255, 253, 253, 228, 35, 5, 54, 227, 252, 243, 228, 170, 242, 252, 252, 231, 117, 6, 6, 78, 252, 252, 125, 59, 18, 208, 252, 252, 252, 252, 87, 7, 5, 135, 252, 252, 180, 16, 21, 203, 253, 247, 129, 173, 252, 252, 184, 66, 49, 49, 3, 136, 252, 241, 106, 17, 53, 200, 252, 216, 65, 14, 72, 163, 241, 252, 252, 223, 105, 252, 242, 88, 18, 73, 170, 244, 252, 126, 29, 89, 180, 180, 37, 231, 252, 245, 205, 216, 252, 252, 252, 124, 3, 207, 252, 252, 252, 252, 178, 116, 36, 4, 13, 93, 143, 121, 23, 6]]

1

[0, 780, [124, 125, 126, 127, 151, 152, 153, 154, 155, 179, 180, 181, 182, 183, 208, 209, 210, 211, 235, 236, 237, 238, 239, 263, 264, 265, 266, 267, 268, 292, 293, 294, 295, 296, 321, 322, 323, 324, 349, 350, 351, 352, 377, 378, 379, 380, 405, 406, 407, 408, 433, 434, 435, 436, 461, 462, 463, 464, 489, 490, 491, 492, 493, 517, 518, 519, 520, 521, 545, 546, 547, 548, 549, 574, 575, 576, 577, 578, 602, 603, 604, 605, 606, 630, 631, 632, 633, 634, 658, 659, 660, 661, 662], [145, 255, 211, 31, 32, 237, 253, 252, 71, 11, 175, 253, 252, 71, 144, 253, 252, 71, 16, 191, 253, 252, 71, 26, 221, 253, 252, 124, 31, 125, 253, 252, 252, 108, 253, 252, 252, 108, 255, 253, 253, 108, 253, 252, 252, 108, 253, 252, 252, 108, 253, 252, 252, 108, 255, 253, 253, 170, 253, 252, 252, 252, 42, 149, 252, 252, 252, 144, 109, 252, 252, 252, 144, 218, 253, 253, 255, 35, 175, 252, 252, 253, 35, 73, 252, 252, 253, 35, 31, 211, 252, 253, 35]]

3

[0, 780, [151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 261, 262, 263, 264, 269, 270, 271, 272, 273, 274, 297, 298, 299, 300, 301, 324, 325, 326, 327, 328, 329, 350, 351, 352, 353, 354, 355, 356, 357, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 492, 493, 494, 495, 520, 521, 522, 523, 538, 539, 540, 547, 548, 549, 550, 551, 565, 566, 567, 568, 573, 574, 575, 576, 577, 578, 579, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, 604, 605, 606, 621, 622, 623, 624, 625, 626, 627, 628, 629, 630, 631, 632, 633, 649, 650, 651, 652, 653, 654, 655, 656, 657, 658, 659, 678, 679, 680, 681, 682, 683, 684], [38, 43, 105, 255, 253, 253, 253, 253, 253, 174, 6, 43, 139, 224, 226, 252, 253, 252, 252, 252, 252, 252, 252, 158, 14, 178, 252, 252, 252, 252, 253, 252, 252, 252, 252, 252, 252, 252, 59, 109, 252, 252, 230, 132, 133, 132, 132, 189, 252, 252, 252, 252, 59, 4, 29, 29, 24, 14, 226, 252, 252, 172, 7, 85, 243, 252, 252, 144, 88, 189, 252, 252, 252, 14, 91, 212, 247, 252, 252, 252, 204, 9, 32, 125, 193, 193, 193, 253, 252, 252, 252, 238, 102, 28, 45, 222, 252, 252, 252, 252, 253, 252, 252, 252, 177, 45, 223, 253, 253, 253, 253, 255, 253, 253, 253, 253, 74, 31, 123, 52, 44, 44, 44, 44, 143, 252, 252, 74, 15, 252, 252, 74, 86, 252, 252, 74, 5, 75, 9, 98, 242, 252, 252, 74, 61, 183, 252, 29, 18, 92, 239, 252, 252, 243, 65, 208, 252, 252, 147, 134, 134, 134, 134, 203, 253, 252, 252, 188, 83, 208, 252, 252, 252, 252, 252, 252, 252, 252, 253, 230, 153, 8, 49, 157, 252, 252, 252, 252, 252, 217, 207, 146, 45, 7, 103, 235, 252, 172, 103, 24]]

1

[0, 780, [152, 153, 154, 180, 181, 182, 183, 208, 209, 210, 211, 236, 237, 238, 239, 264, 265, 266, 267, 292, 293, 294, 295, 320, 321, 322, 323, 349, 350, 351, 377, 378, 379, 405, 406, 407, 433, 434, 435, 461, 462, 463, 489, 490, 491, 492, 517, 518, 519, 520, 546, 547, 548, 574, 575, 576, 602, 603, 604, 630, 631, 632, 658, 659, 660, 686, 687, 688], [5, 63, 197, 20, 254, 230, 24, 20, 254, 254, 48, 20, 254, 255, 48, 20, 254, 254, 57, 20, 254, 254, 108, 16, 239, 254, 143, 178, 254, 143, 178, 254, 143, 178, 254, 162, 178, 254, 240, 113, 254, 240, 83, 254, 245, 31, 79, 254, 246, 38, 214, 254, 150, 144, 241, 8, 144, 240, 2, 144, 254, 82, 230, 247, 40, 168, 209, 31]]

4

[0, 780, [134, 135, 161, 162, 163, 188, 189, 190, 191, 216, 217, 218, 236, 237, 238, 243, 244, 245, 246, 264, 265, 266, 271, 272, 273, 292, 293, 294, 298, 299, 300, 301, 319, 320, 321, 322, 325, 326, 327, 328, 329, 346, 347, 348, 349, 353, 354, 355, 373, 374, 375, 376, 380, 381, 382, 383, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 482, 483, 484, 488, 489, 490, 491, 492, 493, 494, 510, 511, 516, 517, 518, 519, 520, 521, 522, 543, 544, 545, 546, 571, 572, 573, 574, 598, 599, 600, 601, 626, 627, 628, 654, 655, 656], [189, 190, 143, 247, 153, 136, 247, 242, 86, 192, 252, 187, 62, 185, 18, 89, 236, 217, 47, 216, 253, 60, 212, 255, 81, 206, 252, 68, 48, 242, 253, 89, 131, 251, 212, 21, 11, 167, 252, 197, 5, 29, 232, 247, 63, 153, 252, 226, 45, 219, 252, 143, 116, 249, 252, 103, 4, 96, 253, 255, 253, 200, 122, 7, 25, 201, 250, 158, 92, 252, 252, 253, 217, 252, 252, 200, 227, 252, 231, 87, 251, 247, 231, 65, 48, 189, 252, 252, 253, 252, 251, 227, 35, 190, 221, 98, 42, 196, 252, 253, 252, 252, 162, 111, 29, 62, 239, 252, 86, 42, 42, 14, 15, 148, 253, 218, 121, 252, 231, 28, 31, 221, 251, 129, 218, 252, 160, 122, 252, 82]]

3

[0, 780, [123, 124, 125, 126, 127, 128, 129, 150, 151, 152, 153, 154, 155, 156, 157, 178, 179, 180, 181, 182, 183, 184, 185, 186, 207, 208, 209, 210, 211, 212, 213, 214, 236, 237, 238, 239, 240, 241, 242, 264, 265, 266, 267, 268, 269, 270, 293, 294, 295, 296, 297, 298, 320, 321, 322, 323, 324, 325, 326, 346, 347, 348, 349, 350, 351, 352, 353, 354, 374, 375, 376, 377, 378, 379, 380, 381, 382, 403, 404, 405, 406, 407, 408, 409, 410, 432, 433, 434, 435, 436, 437, 438, 463, 464, 465, 466, 467, 491, 492, 493, 494, 495, 519, 520, 521, 522, 538, 539, 540, 546, 547, 548, 549, 550, 566, 567, 568, 569, 570, 572, 573, 574, 575, 576, 577, 578, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, 604, 605, 623, 624, 625, 626, 627, 628, 629, 630, 631, 652, 653, 654, 655, 656, 657, 658, 659], [42, 118, 219, 166, 118, 118, 6, 103, 242, 254, 254, 254, 254, 254, 66, 18, 232, 254, 254, 254, 254, 254, 238, 70, 104, 244, 254, 224, 254, 254, 254, 141, 207, 254, 210, 254, 254, 254, 34, 84, 206, 254, 254, 254, 254, 41, 24, 209, 254, 254, 254, 171, 91, 137, 253, 254, 254, 254, 112, 40, 214, 250, 254, 254, 254, 254, 254, 34, 81, 247, 254, 254, 254, 254, 254, 254, 146, 110, 246, 254, 254, 254, 254, 254, 171, 73, 89, 89, 93, 240, 254, 171, 1, 128, 254, 219, 31, 7, 254, 254, 214, 28, 138, 254, 254, 116, 19, 177, 90, 25, 240, 254, 254, 34, 164, 254, 215, 63, 36, 51, 89, 206, 254, 254, 139, 8, 57, 197, 254, 254, 222, 180, 241, 254, 254, 253, 213, 11, 140, 105, 254, 254, 254, 254, 254, 254, 236, 7, 117, 117, 165, 254, 254, 239, 50]]

5

[0, 780, [216, 217, 218, 219, 220, 221, 242, 243, 244, 245, 246, 247, 248, 249, 268, 269, 270, 271, 272, 273, 274, 275, 276, 294, 295, 296, 297, 298, 299, 300, 301, 322, 323, 324, 325, 347, 348, 349, 350, 351, 352, 375, 376, 377, 378, 403, 404, 405, 431, 432, 433, 434, 456, 457, 459, 460, 461, 462, 463, 482, 483, 484, 485, 488, 489, 490, 491, 510, 511, 512, 516, 517, 518, 519, 538, 539, 540, 541, 542, 543, 544, 545, 546, 567, 568, 569, 570, 571, 572, 573, 574], [31, 40, 129, 234, 234, 159, 68, 150, 239, 254, 253, 253, 253, 215, 156, 201, 254, 254, 254, 241, 150, 98, 8, 19, 154, 254, 236, 203, 83, 39, 30, 144, 253, 145, 12, 10, 129, 222, 78, 79, 8, 134, 253, 167, 8, 255, 254, 78, 201, 253, 226, 69, 55, 6, 18, 128, 253, 241, 41, 25, 205, 235, 92, 20, 253, 253, 58, 231, 245, 108, 132, 253, 185, 14, 121, 245, 254, 254, 254, 217, 254, 223, 50, 116, 165, 233, 233, 234, 180, 39, 3]]

Truncated results, showing first 715 rows.

The pixel intensities are stored in features as a sparse vector. For example, the first observation, shown in row 1 of the output of display(examples.toDF) above, has label 5, i.e. the hand-written image is of the digit 5. This image is encoded as the following sparse vector (click the triangle to the left of the features entry in the first row to expand it):

type: 0
size: 780
indices: [152, 153, 154, ..., 682, 683]
values: [3, 18, 18, 18, 126, ..., 132, 16]

Here

type: 0 says we have a sparse vector.
size: 780 says the vector has 780 entries in total
- the indices 0,...,779 are a one-dimensional indexing of the two-dimensional array of pixels in the image
indices: [152, 153, 154, ..., 682, 683] are those of the possible indices [0,1,...,779] that hold non-zero values
values: [3, 18, 18, 18, 126, ..., 132, 16] are the actual gray-level values, each an integer encoding the gray level at the corresponding pixel index, for example:
- at pixel index 152 the gray-level value is 3,
- at index 153 the gray-level value is 18,
- ..., and finally
- at index 683 the gray-level value is 16.
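The one-dimensional indexing described above can be sketched in plain Scala. The source does not state the image layout, so this sketch assumes the usual MNIST arrangement of 28 pixels per image row, flattened row by row. PixelIndexSketch and its members are hypothetical names used only for illustration.

```scala
// Hypothetical helper; the 28-pixel row width and row-major order are assumptions.
object PixelIndexSketch {
  val Width = 28

  // Converts a flat zero-based pixel index into (row, column).
  def toRowCol(index: Int): (Int, Int) = (index / Width, index % Width)

  // Looks up the gray level at a flat index; pixels that are not stored are zero.
  def grayLevel(indices: Seq[Int], values: Seq[Double], index: Int): Double = {
    val pos = indices.indexOf(index)
    if (pos >= 0) values(pos) else 0.0
  }
}
```

Under these assumptions, index 152 corresponds to row 5, column 12, and any index absent from indices, such as 0, has gray level 0.0.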

Load MNIST training and test datasets