030_PowerPlantPipeline_03ModelTuneEvaluateDeploy(Scala)

Archived YouTube video of this live unedited lab-lecture:


Power Plant ML Pipeline Application

This is an end-to-end example of using a number of different machine learning algorithms to solve a supervised regression problem.

Table of Contents

  • Step 1: Business Understanding
  • Step 2: Load Your Data
  • Step 3: Explore Your Data
  • Step 4: Visualize Your Data
  • Step 5: Data Preparation
  • Step 6: Data Modeling
  • Step 7: Tuning and Evaluation
  • Step 8: Deployment

We are trying to predict power output given a set of readings from various sensors in a gas-fired power generation plant. Power generation is a complex process, and understanding and predicting power output is an important element in managing a plant and its connection to the power grid.

More information about Peaker or Peaking Power Plants can be found on Wikipedia https://en.wikipedia.org/wiki/Peaking_power_plant

Given this business problem, we need to translate it to a Machine Learning task. The ML task is regression since the label (or target) we are trying to predict is numeric.

The example data is provided by UCI at the UCI Machine Learning Repository: Combined Cycle Power Plant Data Set.

You can read the background on the UCI page, but in summary we have collected a number of readings from sensors at a Gas Fired Power Plant (also called a Peaker Plant) and now we want to use those sensor readings to predict how much power the plant will generate.

More information about Machine Learning with Spark can be found in the Spark MLlib Programming Guide.

Please note this example only works with Spark version 1.4 or higher
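A quick way to check the version of the attached cluster (a minimal sketch; sc and spark are the standard Databricks/spark-shell entry points, and the outputs shown below appear to come from a Spark 2.x cluster):

// Print the Spark version of the attached cluster; it should be 1.4.0 or higher.
println(sc.version)
// On Spark 2.x clusters the SparkSession entry point is also available:
// println(spark.version)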



To rerun Steps 1-4 done in the earlier notebook 009_PowerPlantPipeline_01ETLEDA, just run the following command as shown in the cell below:

  %run "/scalable-data-science/sds-2-2/009_PowerPlantPipeline_01ETLEDA"
  • Note: If you already evaluated the %run ... command above then:
    • first delete the cell by pressing the x on the top-right corner of the cell, and
    • re-evaluate the %run command above.
%run "/scalable-data-science/sds-2-x/009_PowerPlantPipeline_01ETLEDA"
res2: Int = 240
dbfs:/databricks-datasets/power-plant/data/Sheet1.tsv   Sheet1.tsv   308693
dbfs:/databricks-datasets/power-plant/data/Sheet2.tsv   Sheet2.tsv   308693
dbfs:/databricks-datasets/power-plant/data/Sheet3.tsv   Sheet3.tsv   308693
dbfs:/databricks-datasets/power-plant/data/Sheet4.tsv   Sheet4.tsv   308693
dbfs:/databricks-datasets/power-plant/data/Sheet5.tsv   Sheet5.tsv   308693
powerPlantRDD: org.apache.spark.rdd.RDD[String] = /databricks-datasets/power-plant/data/Sheet1.tsv MapPartitionsRDD[28866] at textFile at command-45638284503054:1
AT      V      AP       RH     PE
14.96   41.76  1024.07  73.17  463.26
25.18   62.96  1020.04  59.08  444.37
5.11    39.4   1012.16  92.14  488.56
20.86   57.32  1010.24  76.64  446.48
powerPlantDF: org.apache.spark.sql.DataFrame = [AT: double, V: double ... 3 more fields]
root
 |-- AT: double (nullable = true)
 |-- V: double (nullable = true)
 |-- AP: double (nullable = true)
 |-- RH: double (nullable = true)
 |-- PE: double (nullable = true)
res9: Long = 9568
+-----+-----+-------+-----+------+
|   AT|    V|     AP|   RH|    PE|
+-----+-----+-------+-----+------+
|14.96|41.76|1024.07|73.17|463.26|
|25.18|62.96|1020.04|59.08|444.37|
| 5.11| 39.4|1012.16|92.14|488.56|
|20.86|57.32|1010.24|76.64|446.48|
|10.82| 37.5|1009.23|96.62| 473.9|
|26.27|59.44|1012.23|58.77|443.67|
|15.89|43.96|1014.02|75.24|467.35|
| 9.48|44.71|1019.12|66.43|478.42|
|14.64| 45.0|1021.78|41.25|475.98|
|11.74|43.56|1015.14|70.72| 477.5|
+-----+-----+-------+-----+------+
only showing top 10 rows
[Scatter-plot matrix of PE, RH, AP, V and AT; sample based on the first 1000 rows.]

res12: Long = 9568
[Output of the %run: metastore listings. Databases: db_ad_gcs (dbfs:/user/hive/warehouse/db_ad_gcs.db) and default (Default Hive database, dbfs:/user/hive/warehouse). The default database already contains about 20 pre-existing tables (adult, business_csv_csv, checkin_table, diamonds, inventory, item_merchant_categories, items_left_csv, logistic_detail, merchant_ratings, order_data, order_ids_left_csv, repeat_csv, review_2019_csv, sample_logistic_table, sentimentlex_csv, simple_range, social_media_usage, tip_json, tips_csv_csv, users_csv), none of them temporary.]
[Table display: the first 1000 rows of the raw power-plant readings (AT, V, AP, RH, PE).]

AT   double   null
V    double   null
AP   double   null
RH   double   null
PE   double   null

summary   AT                   V                    AP                   RH                   PE
count     9568                 9568                 9568                 9568                 9568
mean      19.65123118729102    54.30580372073601    1013.2590781772603   73.30897784280926    454.3650094063554
stddev    7.4524732296110825   12.707892998326784   5.938783705811581    14.600268756728964   17.066994999803402
min       1.81                 25.36                992.89               25.56                420.26
max       37.11                81.56                1033.3               100.16               495.76
[Scatter plot: Power (PE) vs Temperature (AT); sample based on the first 1000 rows.]

[Scatter plot: Power (PE) vs Exhaust Vacuum (V); sample based on the first 1000 rows.]

[Scatter plot: Power (PE) vs Pressure (AP); sample based on the first 1000 rows.]

[Scatter plot: Power (PE) vs Humidity (RH); sample based on the first 1000 rows.]

[Scatter-plot matrix of PE, RH, AP, V and AT; sample based on the first 1000 rows.]



Now we will do the following Steps:

Step 5: Data Preparation,

Step 6: Modeling, and

Step 7: Tuning and Evaluation

Step 8: Deployment

Step 5: Data Preparation

The next step is to prepare the data. Since all of this data is numeric and consistent, this is a simple task for us today.

We will need to convert the predictor features from columns to Feature Vectors using the org.apache.spark.ml.feature.VectorAssembler

The VectorAssembler will be the first step in building our ML pipeline.

//Let's quickly recall the schema
// the table is available
table("power_plant_table").printSchema
root |-- AT: double (nullable = true) |-- V: double (nullable = true) |-- AP: double (nullable = true) |-- RH: double (nullable = true) |-- PE: double (nullable = true)
//the DataFrame should also be available
powerPlantDF 
res22: org.apache.spark.sql.DataFrame = [AT: double, V: double ... 3 more fields]
import org.apache.spark.ml.feature.VectorAssembler

// make a DataFrame called dataset from the table
val dataset = sqlContext.table("power_plant_table") 

val vectorizer =  new VectorAssembler()
                      .setInputCols(Array("AT", "V", "AP", "RH"))
                      .setOutputCol("features")
import org.apache.spark.ml.feature.VectorAssembler dataset: org.apache.spark.sql.DataFrame = [AT: double, V: double ... 3 more fields] vectorizer: org.apache.spark.ml.feature.VectorAssembler = vecAssembler_00521eb9630e
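As a quick optional sanity check (a minimal sketch, not part of the original pipeline), we can apply the assembler on its own and inspect the new features column; in the pipelines below the VectorAssembler runs automatically as the first stage:

// Apply the assembler directly to the dataset just to see what it produces.
val assembled = vectorizer.transform(dataset)

// Each row gains a "features" vector such as [14.96,41.76,1024.07,73.17].
assembled.select("AT", "V", "AP", "RH", "features").show(3, false)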

Step 6: Data Modeling

Now let's model our data to predict what the power output will be given a set of sensor readings

Our first model will be based on simple linear regression since we saw some linear patterns in our data based on the scatter plots during the exploration stage.

frameIt: (u: String, h: Int)String

Linear Regression Model

  • Linear Regression is one of the most useful work-horses of statistical learning; a sketch of its regularized least-squares objective follows this list.
  • See Chapter 7 of Kevin Murphy's Machine Learning: A Probabilistic Perspective for a good mathematical and algorithmic introduction.
  • You should have already seen Ameet's treatment of the topic in an earlier notebook.
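As a rough sketch in the same notation as the equation further below (a summary, not notebook code): with label $y$ = PE and features $x_1, \ldots, x_k$ = (AT, V, AP, RH), Spark's LinearRegression with the default elasticNetParam = 0 (a pure L2/ridge penalty) chooses the intercept $b_0$ and coefficients $b_j$ by minimizing, up to constant scaling factors,

$$\min_{b_0,\,b}\ \frac{1}{n}\sum_{i=1}^{n}\Big(y_i - b_0 - \sum_{j=1}^{k} b_j x_{ij}\Big)^2 \;+\; \lambda\, \lVert b \rVert_2^2,$$

where $\lambda$ is the regParam we set below (0.1 for the first fit, then 0.01 to 0.10 in the tuning grid of Step 7).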
// First let's hold out 20% of our data for testing and leave 80% for training
var Array(split20, split80) = dataset.randomSplit(Array(0.20, 0.80), 1800009193L)
split20: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [AT: double, V: double ... 3 more fields] split80: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [AT: double, V: double ... 3 more fields]
// Let's cache these datasets for performance
val testSet = split20.cache()
val trainingSet = split80.cache()
testSet: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [AT: double, V: double ... 3 more fields] trainingSet: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [AT: double, V: double ... 3 more fields]
testSet.count() // action to actually cache
res24: Long = 1966
trainingSet.count() // action to actually cache
res25: Long = 7602

Let's take a few elements of the three DataFrames.

dataset.take(3)
res26: Array[org.apache.spark.sql.Row] = Array([14.96,41.76,1024.07,73.17,463.26], [25.18,62.96,1020.04,59.08,444.37], [5.11,39.4,1012.16,92.14,488.56])
testSet.take(3)
res27: Array[org.apache.spark.sql.Row] = Array([1.81,39.42,1026.92,76.97,490.55], [3.2,41.31,997.67,98.84,489.86], [3.38,41.31,998.79,97.76,489.11])
trainingSet.take(3)
res28: Array[org.apache.spark.sql.Row] = Array([2.34,39.42,1028.47,69.68,490.34], [2.58,39.42,1028.68,69.03,488.69], [2.64,39.64,1011.02,85.24,481.29])
// ***** LINEAR REGRESSION MODEL ****

import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.ml.regression.LinearRegressionModel
import org.apache.spark.ml.Pipeline

// Let's initialize our linear regression learner
val lr = new LinearRegression()
import org.apache.spark.ml.regression.LinearRegression import org.apache.spark.ml.regression.LinearRegressionModel import org.apache.spark.ml.Pipeline lr: org.apache.spark.ml.regression.LinearRegression = linReg_955431dccb4f
// We use explain params to dump the parameters we can use
lr.explainParams()
res29: String =
aggregationDepth: suggested depth for treeAggregate (>= 2) (default: 2)
elasticNetParam: the ElasticNet mixing parameter, in range [0, 1]. For alpha = 0, the penalty is an L2 penalty. For alpha = 1, it is an L1 penalty (default: 0.0)
featuresCol: features column name (default: features)
fitIntercept: whether to fit an intercept term (default: true)
labelCol: label column name (default: label)
maxIter: maximum number of iterations (>= 0) (default: 100)
predictionCol: prediction column name (default: prediction)
regParam: regularization parameter (>= 0) (default: 0.0)
solver: the solver algorithm for optimization. If this is not set or empty, default value is 'auto' (default: auto)
standardization: whether to standardize the training features before fitting the model (default: true)
tol: the convergence tolerance for iterative algorithms (>= 0) (default: 1.0E-6)
weightCol: weight column name. If this is not set or empty, we treat all instance weights as 1.0 (undefined)

The cell below is based on the Spark ML pipeline API. More information can be found in the Spark ML Programming Guide at https://spark.apache.org/docs/latest/ml-guide.html

// Now we set the parameters for the method
lr.setPredictionCol("Predicted_PE")
  .setLabelCol("PE")
  .setMaxIter(100)
  .setRegParam(0.1)
// We will use the new spark.ml pipeline API. If you have worked with scikit-learn this will be very familiar.
val lrPipeline = new Pipeline()
lrPipeline.setStages(Array(vectorizer, lr))
// Let's first train on the training set to see what we get
val lrModel = lrPipeline.fit(trainingSet)
lrPipeline: org.apache.spark.ml.Pipeline = pipeline_a16f729772ca lrModel: org.apache.spark.ml.PipelineModel = pipeline_a16f729772ca

Since Linear Regression simply fits the line (or hyperplane) of best fit that minimizes the squared error over the data, given multiple input dimensions we can express the prediction as a linear function of the form:

$$y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + \ldots + b_i x_i + \ldots + b_k x_k$$

where $b_0$ is the intercept and the $b_i$'s are the coefficients.

To get the coefficients of that line, we can retrieve the fitted LinearRegressionModel stage from the linear-regression pipeline model named lrModel and read off its intercept and weights.

// The intercept is as follows:
val intercept = lrModel.stages(1).asInstanceOf[LinearRegressionModel].intercept
intercept: Double = 427.9139822165837
// The coefficients (i.e. weights) are as follows:

val weights = lrModel.stages(1).asInstanceOf[LinearRegressionModel].coefficients.toArray
weights: Array[Double] = Array(-1.9083064919040942, -0.25381293007161654, 0.08739350304730673, -0.1474651301033126)

The model has been fit and the intercept and coefficients displayed above.

Now, let us do some work to make a string of the model that is easy to understand for an applied data scientist or data analyst.

val featuresNoLabel = dataset.columns.filter(col => col != "PE")
featuresNoLabel: Array[String] = Array(AT, V, AP, RH)
val coefficentFeaturePairs = sc.parallelize(weights).zip(sc.parallelize(featuresNoLabel))
coefficentFeaturePairs: org.apache.spark.rdd.RDD[(Double, String)] = ZippedPartitionsRDD2[108] at zip at <console>:42
coefficentFeaturePairs.collect() // this just pairs each coefficient with the name of its corresponding feature
res30: Array[(Double, String)] = Array((-1.9083064919040942,AT), (-0.25381293007161654,V), (0.08739350304730673,AP), (-0.1474651301033126,RH))
// Now let's sort the (coefficient, feature) pairs by coefficient value to build a readable equation

var equation = s"y = $intercept "
//var variables = Array
coefficentFeaturePairs.sortByKey().collect().foreach({
  case (weight, feature) =>
  { 
        val symbol = if (weight > 0) "+" else "-"
        val absWeight = Math.abs(weight)
        equation += (s" $symbol (${absWeight} * ${feature})")
  }
}
)
equation: String = y = 427.9139822165837 - (1.9083064919040942 * AT) - (0.25381293007161654 * V) - (0.1474651301033126 * RH) + (0.08739350304730673 * AP)
// Finally here is our equation
println("Linear Regression Equation: " + equation)
Linear Regression Equation: y = 427.9139822165837 - (1.9083064919040942 * AT) - (0.25381293007161654 * V) - (0.1474651301033126 * RH) + (0.08739350304730673 * AP)

Based on examining the fitted Linear Regression Equation above:

  • There is a strong negative correlation between Atmospheric Temperature (AT) and Power Output, reflected in its coefficient of about -1.91.
  • But our other dimensions seem to have little to no correlation with Power Output.

Do you remember Step 4: Visualize Your Data? When we visualized each predictor against Power Output using a scatter plot, only the temperature variable seemed to have a linear correlation with Power Output, so our final equation seems logical.

Now let's see what our predictions look like given this model.

val predictionsAndLabels = lrModel.transform(testSet)

display(predictionsAndLabels.select("AT", "V", "AP", "RH", "PE", "Predicted_PE"))
[Table display: test-set predictions with columns AT, V, AP, RH, PE and Predicted_PE; showing the first 1000 rows.]

Now that we have real predictions we can use an evaluation metric such as Root Mean Squared Error to validate our regression model. The lower the Root Mean Squared Error, the better our model.
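For reference, this is the standard definition (not notebook output): with actual values $y_i$ (PE) and predictions $\hat{y}_i$ (Predicted_PE) over the $n$ test rows,

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(\hat{y}_i - y_i\right)^2 }.$$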

//Now let's compute some evaluation metrics against our test dataset

import org.apache.spark.mllib.evaluation.RegressionMetrics 

val metrics = new RegressionMetrics(predictionsAndLabels.select("Predicted_PE", "PE").rdd.map(r => (r(0).asInstanceOf[Double], r(1).asInstanceOf[Double])))
import org.apache.spark.mllib.evaluation.RegressionMetrics metrics: org.apache.spark.mllib.evaluation.RegressionMetrics = org.apache.spark.mllib.evaluation.RegressionMetrics@78aa5b8c
val rmse = metrics.rootMeanSquaredError
rmse: Double = 4.609375859170583
val explainedVariance = metrics.explainedVariance
explainedVariance: Double = 274.54186073318266
val r2 = metrics.r2
r2: Double = 0.9308377700269259
println (f"Root Mean Squared Error: $rmse")
println (f"Explained Variance: $explainedVariance")  
println (f"R2: $r2")
Root Mean Squared Error: 4.609375859170583 Explained Variance: 274.54186073318266 R2: 0.9308377700269259

Generally, if the residuals are roughly normally distributed, a good model will have about 68% of its predictions within 1 RMSE and about 95% within 2 RMSE of the actual value. Let's calculate the residuals and see if our model meets this criterion.
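This 68%/95% guideline is just the usual normal rule of thumb: if the residuals $\varepsilon_i = y_i - \hat{y}_i$ are approximately $N(0, \sigma^2)$ with $\sigma \approx \mathrm{RMSE}$, then

$$P(\lvert\varepsilon\rvert \le \mathrm{RMSE}) \approx 0.68, \qquad P(\lvert\varepsilon\rvert \le 2\,\mathrm{RMSE}) \approx 0.95.$$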

display(predictionsAndLabels) // recall the DataFrame predictionsAndLabels
[Table display of predictionsAndLabels: AT, V, AP, RH, PE, the assembled features vector and Predicted_PE; showing the first 1000 rows.]

// First we calculate the residual error (PE - Predicted_PE) and divide it by the RMSE, and register the resulting DataFrame as a temporary table named Power_Plant_RMSE_Evaluation
predictionsAndLabels.selectExpr("PE", "Predicted_PE", "PE - Predicted_PE AS Residual_Error", s""" (PE - Predicted_PE) / $rmse AS Within_RSME""").createOrReplaceTempView("Power_Plant_RMSE_Evaluation")
%sql SELECT * from Power_Plant_RMSE_Evaluation
[Table display: PE, Predicted_PE, Residual_Error and Within_RSME from Power_Plant_RMSE_Evaluation; showing the first 1000 rows.]

%sql -- Now we can display the residual error in units of RMSE (Within_RSME) as a histogram. Clearly this shows that the error is centered around 0, with the vast majority of the error within 2 RMSEs.
SELECT Within_RSME  from Power_Plant_RMSE_Evaluation
[Histogram of Within_RSME: density centered near 0, ranging from about -6 to +4; sample based on the first 1000 rows.]

We can see this definitively if we count the number of predictions within + or - 1.0 and + or - 2.0 and display this as a pie chart:

%sql 
SELECT case when Within_RSME <= 1.0 and Within_RSME >= -1.0 then 1  when  Within_RSME <= 2.0 and Within_RSME >= -2.0 then 2 else 3 end RSME_Multiple, COUNT(*) count  from Power_Plant_RMSE_Evaluation
group by case when Within_RSME <= 1.0 and Within_RSME >= -1.0 then 1  when  Within_RSME <= 2.0 and Within_RSME >= -2.0 then 2 else 3 end
[Pie chart of RSME_Multiple: about 67% of predictions within 1 RMSE, 30% between 1 and 2 RMSE, and 3% beyond 2 RMSE.]

So we have about 70% of our test predictions within 1 RMSE and about 97% (70% + 27%) within 2 RMSE of the actual values. So the model is pretty decent. Let's see if we can tune the model to improve it further.

NOTE: these numbers will vary across runs due to the seed in random sampling of training and test set, number of iterations, and other stopping rules in optimization, for example.

Step 7: Tuning and Evaluation

Now that we have a first model fit to the training data, let's try to make a better model by tuning over several parameters.

import org.apache.spark.ml.tuning.{ParamGridBuilder, CrossValidator}
import org.apache.spark.ml.evaluation._
import org.apache.spark.ml.tuning.{ParamGridBuilder, CrossValidator} import org.apache.spark.ml.evaluation._

First let's use a cross validator to split the data into training and validation subsets. See http://spark.apache.org/docs/latest/ml-tuning.html.

//Let's set up our evaluator class to judge the model based on the best root mean squared error
val regEval = new RegressionEvaluator()
regEval.setLabelCol("PE")
  .setPredictionCol("Predicted_PE")
  .setMetricName("rmse")
regEval: org.apache.spark.ml.evaluation.RegressionEvaluator = regEval_e2be8a782fd8 res37: regEval.type = regEval_e2be8a782fd8

We now treat the lrPipeline as an Estimator, wrapping it in a CrossValidator instance.

This will allow us to jointly choose parameters for all Pipeline stages.

A CrossValidator requires an Estimator and an Evaluator (which we set next), plus a set of EstimatorParamMaps (which we set after that).

//Let's create our crossvalidator with 3 fold cross validation
val crossval = new CrossValidator()
crossval.setEstimator(lrPipeline)
crossval.setNumFolds(3)
crossval.setEvaluator(regEval)
crossval: org.apache.spark.ml.tuning.CrossValidator = cv_414cd3231d9a res38: crossval.type = cv_414cd3231d9a

A CrossValidator also requires a set of EstimatorParamMaps which we set next.

For this we need a regularization parameter (more generally a hyper-parameter that is model-specific).

Now, let's tune over our regularization parameter from 0.01 to 0.10.

val regParam = (1 to 10).toArray.map(x => x / 100.0)
regParam: Array[Double] = Array(0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1)

Check out the Scala docs for syntactic details on org.apache.spark.ml.tuning.ParamGridBuilder.

val paramGrid = new ParamGridBuilder()
                    .addGrid(lr.regParam, regParam)
                    .build()
crossval.setEstimatorParamMaps(paramGrid)
paramGrid: Array[org.apache.spark.ml.param.ParamMap] = Array({ linReg_955431dccb4f-regParam: 0.01 }, { linReg_955431dccb4f-regParam: 0.02 }, { linReg_955431dccb4f-regParam: 0.03 }, { linReg_955431dccb4f-regParam: 0.04 }, { linReg_955431dccb4f-regParam: 0.05 }, { linReg_955431dccb4f-regParam: 0.06 }, { linReg_955431dccb4f-regParam: 0.07 }, { linReg_955431dccb4f-regParam: 0.08 }, { linReg_955431dccb4f-regParam: 0.09 }, { linReg_955431dccb4f-regParam: 0.1 }) res39: crossval.type = cv_414cd3231d9a
//Now let's create our model
val cvModel = crossval.fit(trainingSet)
cvModel: org.apache.spark.ml.tuning.CrossValidatorModel = cv_414cd3231d9a

In addition to CrossValidator Spark also offers TrainValidationSplit for hyper-parameter tuning. TrainValidationSplit only evaluates each combination of parameters once as opposed to k times in case of CrossValidator. It is therefore less expensive, but will not produce as reliable results when the training dataset is not sufficiently large.
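Here is a minimal sketch (not executed above) of how TrainValidationSplit could be wired up with the same pipeline, evaluator and parameter grid; the setTrainRatio value of 0.75 is just an illustrative choice:

import org.apache.spark.ml.tuning.TrainValidationSplit

// Evaluates each ParamMap once on a single 75%/25% train/validation split,
// instead of numFolds times as CrossValidator does.
val tvs = new TrainValidationSplit()
  .setEstimator(lrPipeline)
  .setEvaluator(regEval)
  .setEstimatorParamMaps(paramGrid)
  .setTrainRatio(0.75)

// val tvsModel = tvs.fit(trainingSet)   // uncomment to fit; cheaper than 3-fold CV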

Now that we have tuned, let's see what we got for tuning parameters and what our RMSE is versus our initial model.

val predictionsAndLabels = cvModel.transform(testSet)
val metrics = new RegressionMetrics(predictionsAndLabels.select("Predicted_PE", "PE").rdd.map(r => (r(0).asInstanceOf[Double], r(1).asInstanceOf[Double])))

val rmse = metrics.rootMeanSquaredError
val explainedVariance = metrics.explainedVariance
val r2 = metrics.r2
predictionsAndLabels: org.apache.spark.sql.DataFrame = [AT: double, V: double ... 5 more fields] metrics: org.apache.spark.mllib.evaluation.RegressionMetrics = org.apache.spark.mllib.evaluation.RegressionMetrics@2b71aef4 rmse: Double = 4.599964072968395 explainedVariance: Double = 277.2272873387723 r2: Double = 0.9311199234339246
println (f"Root Mean Squared Error: $rmse")
println (f"Explained Variance: $explainedVariance")  
println (f"R2: $r2")
Root Mean Squared Error: 4.599964072968395 Explained Variance: 277.2272873387723 R2: 0.9311199234339246

Let us explore other models to see if we can predict the power output better.

There are several families of models in Spark's scalable machine learning library to choose from.

So our initial untuned and tuned linear regression models are statistically identical.

Given that the only linearly correlated variable is Temperature, it makes sense to try another machine learning method, such as a Decision Tree, to handle non-linear data and see if we can improve our model.

A Decision Tree creates a model based on splitting variables using a tree structure. We will first start with a single decision tree model.

Reference Decision Trees: https://en.wikipedia.org/wiki/Decision_tree_learning
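As a rough sketch of how a regression tree predicts (standard behaviour, not notebook code): the fitted tree partitions the feature space into regions $R_1, \ldots, R_L$, one per leaf, and for any input $x$ falling in region $R_\ell$ it predicts the average training label in that leaf,

$$\hat{y}(x) \;=\; \frac{1}{\lvert\{i : x_i \in R_\ell\}\rvert} \sum_{i\,:\, x_i \in R_\ell} y_i .$$

These leaf averages are the Predict values you will see in the toDebugString output below.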

//Let's build a decision tree pipeline
import org.apache.spark.ml.regression.DecisionTreeRegressor

// we are using a Decision Tree Regressor as opposed to a classifier we used for the hand-written digit classification problem
val dt = new DecisionTreeRegressor()
dt.setLabelCol("PE")
dt.setPredictionCol("Predicted_PE")
dt.setFeaturesCol("features")
dt.setMaxBins(100)

val dtPipeline = new Pipeline()
dtPipeline.setStages(Array(vectorizer, dt))
import org.apache.spark.ml.regression.DecisionTreeRegressor dt: org.apache.spark.ml.regression.DecisionTreeRegressor = dtr_23e04c8c3476 dtPipeline: org.apache.spark.ml.Pipeline = pipeline_382103e9e31e res41: dtPipeline.type = pipeline_382103e9e31e
//Let's just reuse our CrossValidator
crossval.setEstimator(dtPipeline)
res42: crossval.type = cv_414cd3231d9a
val paramGrid = new ParamGridBuilder()
                     .addGrid(dt.maxDepth, Array(2, 3))
                     .build()
paramGrid: Array[org.apache.spark.ml.param.ParamMap] = Array({ dtr_23e04c8c3476-maxDepth: 2 }, { dtr_23e04c8c3476-maxDepth: 3 })
crossval.setEstimatorParamMaps(paramGrid)
res43: crossval.type = cv_414cd3231d9a
val dtModel = crossval.fit(trainingSet) // fit decisionTree with cv
dtModel: org.apache.spark.ml.tuning.CrossValidatorModel = cv_414cd3231d9a
import org.apache.spark.ml.regression.DecisionTreeRegressionModel
import org.apache.spark.ml.PipelineModel
dtModel.bestModel.asInstanceOf[PipelineModel].stages.last.asInstanceOf[DecisionTreeRegressionModel].toDebugString
import org.apache.spark.ml.regression.DecisionTreeRegressionModel
import org.apache.spark.ml.PipelineModel
res44: String =
"DecisionTreeRegressionModel (uid=dtr_23e04c8c3476) of depth 3 with 15 nodes
  If (feature 0 <= 17.84)
   If (feature 0 <= 11.95)
    If (feature 0 <= 8.75)
     Predict: 483.5412151067323
    Else (feature 0 > 8.75)
     Predict: 475.6305502392345
   Else (feature 0 > 11.95)
    If (feature 0 <= 15.33)
     Predict: 467.63141917293234
    Else (feature 0 > 15.33)
     Predict: 460.74754125412574
  Else (feature 0 > 17.84)
   If (feature 0 <= 23.02)
    If (feature 1 <= 47.83)
     Predict: 457.1077966101695
    Else (feature 1 > 47.83)
     Predict: 448.74750213858016
   Else (feature 0 > 23.02)
    If (feature 1 <= 66.25)
     Predict: 442.88544855967086
    Else (feature 1 > 66.25)
     Predict: 434.7293710691822
"

The line above will pull the Decision Tree model from the Pipeline and display it as an if-then-else string.

Next let's visualize it as a decision tree for regression.

display(dtModel.bestModel.asInstanceOf[PipelineModel].stages.last.asInstanceOf[DecisionTreeRegressionModel])
[Tree diagram of the fitted DecisionTreeRegressionModel: internal nodes split on feature 0 (AT) and feature 1 (V) at thresholds such as 8.75, 15.3, 47.8 and 66.3, with leaf predictions ranging from about 434.7 to 483.5.]

Now let's see how our DecisionTree model compares to our LinearRegression model

val predictionsAndLabels = dtModel.bestModel.transform(testSet)
val metrics = new RegressionMetrics(predictionsAndLabels.select("Predicted_PE", "PE").map(r => (r(0).asInstanceOf[Double], r(1).asInstanceOf[Double])).rdd)

val rmse = metrics.rootMeanSquaredError
val explainedVariance = metrics.explainedVariance
val r2 = metrics.r2

println (f"Root Mean Squared Error: $rmse")
println (f"Explained Variance: $explainedVariance")  
println (f"R2: $r2")
Root Mean Squared Error: 5.221342219456633 Explained Variance: 269.66550072645475 R2: 0.9112539444165726 predictionsAndLabels: org.apache.spark.sql.DataFrame = [AT: double, V: double ... 5 more fields] metrics: org.apache.spark.mllib.evaluation.RegressionMetrics = org.apache.spark.mllib.evaluation.RegressionMetrics@1b14107a rmse: Double = 5.221342219456633 explainedVariance: Double = 269.66550072645475 r2: Double = 0.9112539444165726

So our DecisionTree was slightly worse than our LinearRegression model (LR: 4.6 vs DT: 5.2). Maybe we can try an Ensemble method such as Gradient-Boosted Decision Trees to see if we can strengthen our model by using an ensemble of weaker trees with weighting to reduce the error in our model.

Note: since this is a complex model, the cell below can take about 16 minutes or so to run on a small cluster with a couple of nodes with about 6GB RAM, so go out and grab a coffee and come back :-).

This GBTRegressor code will be way faster on a larger cluster of course.

A visual explanation of gradient boosted trees:

Let's see what a boosting algorithm, a type of ensemble method, is all about in more detail.
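As a rough sketch (standard gradient boosting, in our own notation): the boosted model predicts with a sum of many small regression trees, each one fit to the residuals of the ensemble built before it and shrunk by a learning rate $\eta$,

$$\hat{F}_M(x) \;=\; h_0(x) + \sum_{m=1}^{M-1} \eta\, h_m(x),$$

where each $h_m$ is a shallow regression tree. This matches the toDebugString output further below: Tree 0 has weight 1.0 and the later trees have weight 0.1.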


This can take between 5 - 15 minutes in a shard with 6 workers depending on other workloads (may be longer in the Community Edition).

import org.apache.spark.ml.regression.GBTRegressor

val gbt = new GBTRegressor()
gbt.setLabelCol("PE")
gbt.setPredictionCol("Predicted_PE")
gbt.setFeaturesCol("features")
gbt.setSeed(100088121L)
gbt.setMaxBins(100)
gbt.setMaxIter(120)

val gbtPipeline = new Pipeline()
gbtPipeline.setStages(Array(vectorizer, gbt))
//Let's just reuse our CrossValidator

crossval.setEstimator(gbtPipeline)

val paramGrid = new ParamGridBuilder()
  .addGrid(gbt.maxDepth, Array(2, 3))
  .build()
crossval.setEstimatorParamMaps(paramGrid)

//gbt.explainParams
val gbtModel = crossval.fit(trainingSet)
import org.apache.spark.ml.regression.GBTRegressor gbt: org.apache.spark.ml.regression.GBTRegressor = gbtr_9c5ab45fe584 gbtPipeline: org.apache.spark.ml.Pipeline = pipeline_e6a84d2d75ba paramGrid: Array[org.apache.spark.ml.param.ParamMap] = Array({ gbtr_9c5ab45fe584-maxDepth: 2 }, { gbtr_9c5ab45fe584-maxDepth: 3 }) gbtModel: org.apache.spark.ml.tuning.CrossValidatorModel = cv_414cd3231d9a
import org.apache.spark.ml.regression.GBTRegressionModel 

val predictionsAndLabels = gbtModel.bestModel.transform(testSet)
val metrics = new RegressionMetrics(predictionsAndLabels.select("Predicted_PE", "PE").map(r => (r(0).asInstanceOf[Double], r(1).asInstanceOf[Double])).rdd)

val rmse = metrics.rootMeanSquaredError
val explainedVariance = metrics.explainedVariance
val r2 = metrics.r2


println (f"Root Mean Squared Error: $rmse")
println (f"Explained Variance: $explainedVariance")  
println (f"R2: $r2")
Root Mean Squared Error: 3.7616562931536803 Explained Variance: 282.4365553123402 R2: 0.9539379816689415 import org.apache.spark.ml.regression.GBTRegressionModel predictionsAndLabels: org.apache.spark.sql.DataFrame = [AT: double, V: double ... 5 more fields] metrics: org.apache.spark.mllib.evaluation.RegressionMetrics = org.apache.spark.mllib.evaluation.RegressionMetrics@1f1d9c34 rmse: Double = 3.7616562931536803 explainedVariance: Double = 282.4365553123402 r2: Double = 0.9539379816689415

Note that the root mean squared error is smaller now due to the ensemble of 120 trees from Gradient Boosting!

We can use the toDebugString method to dump out what our trees and weighting look like:

gbtModel.bestModel.asInstanceOf[PipelineModel].stages.last.asInstanceOf[GBTRegressionModel].toDebugString
res49: String = "GBTRegressionModel (uid=gbtr_9c5ab45fe584) with 120 trees Tree 0 (weight 1.0): If (feature 0 <= 17.84) If (feature 0 <= 11.95) If (feature 0 <= 8.75) Predict: 483.5412151067323 Else (feature 0 > 8.75) Predict: 475.6305502392345 Else (feature 0 > 11.95) If (feature 0 <= 15.33) Predict: 467.63141917293234 Else (feature 0 > 15.33) Predict: 460.74754125412574 Else (feature 0 > 17.84) If (feature 0 <= 23.02) If (feature 1 <= 47.83) Predict: 457.1077966101695 Else (feature 1 > 47.83) Predict: 448.74750213858016 Else (feature 0 > 23.02) If (feature 1 <= 66.25) Predict: 442.88544855967086 Else (feature 1 > 66.25) Predict: 434.7293710691822 Tree 1 (weight 0.1): If (feature 2 <= 1009.9) If (feature 1 <= 43.13) If (feature 0 <= 15.33) Predict: -0.31094207231231485 Else (feature 0 > 15.33) Predict: 4.302958436537365 Else (feature 1 > 43.13) If (feature 0 <= 23.02) Predict: -8.38392506141353 Else (feature 0 > 23.02) Predict: -2.399960976520273 Else (feature 2 > 1009.9) If (feature 3 <= 86.21) If (feature 0 <= 26.35) Predict: 2.888623754019027 Else (feature 0 > 26.35) Predict: -1.1489483044194229 Else (feature 3 > 86.21) If (feature 1 <= 39.72) Predict: 3.876324424163314 Else (feature 1 > 39.72) Predict: -3.058952828112949 Tree 2 (weight 0.1): If (feature 2 <= 1009.63) If (feature 1 <= 43.13) If (feature 0 <= 11.95) Predict: -1.4091086031845625 Else (feature 0 > 11.95) Predict: 2.6329466235800942 Else (feature 1 > 43.13) If (feature 0 <= 23.02) Predict: -6.795414480322956 Else (feature 0 > 23.02) Predict: -2.166560698742912 Else (feature 2 > 1009.63) If (feature 3 <= 80.44) If (feature 0 <= 26.98) Predict: 2.878622882275939 Else (feature 0 > 26.98) Predict: -1.146426969990865 Else (feature 3 > 80.44) If (feature 3 <= 94.55) Predict: -0.35885921725905906 Else (feature 3 > 94.55) Predict: -5.75364586186002 Tree 3 (weight 0.1): If (feature 0 <= 27.6) If (feature 3 <= 70.2) If (feature 1 <= 40.05) Predict: -0.9480831286616939 Else (feature 1 > 40.05) Predict: 3.660397090904016 Else (feature 3 > 70.2) If (feature 1 <= 40.64) Predict: 2.1539405832035627 Else (feature 1 > 40.64) Predict: -1.2281619807661366 Else (feature 0 > 27.6) If (feature 1 <= 65.27) If (feature 2 <= 1005.99) Predict: -15.33433697033558 Else (feature 2 > 1005.99) Predict: -5.866095468145647 Else (feature 1 > 65.27) If (feature 2 <= 1008.75) Predict: -4.03431044067007 Else (feature 2 > 1008.75) Predict: -0.23440867445577207 Tree 4 (weight 0.1): If (feature 0 <= 26.35) If (feature 0 <= 23.02) If (feature 1 <= 68.67) Predict: 0.12035773384797814 Else (feature 1 > 68.67) Predict: -13.928523073642005 Else (feature 0 > 23.02) If (feature 0 <= 24.57) Predict: 5.5622340839882165 Else (feature 0 > 24.57) Predict: 1.6938172370244715 Else (feature 0 > 26.35) If (feature 1 <= 66.25) If (feature 2 <= 1008.4) Predict: -9.009916879825393 Else (feature 2 > 1008.4) Predict: -3.059736394918022 Else (feature 1 > 66.25) If (feature 0 <= 30.2) Predict: 0.14704705100738577 Else (feature 0 > 30.2) Predict: -3.2914123948921006 Tree 5 (weight 0.1): If (feature 2 <= 1010.41) If (feature 1 <= 43.41) If (feature 3 <= 99.27) Predict: 1.2994444710077233 Else (feature 3 > 99.27) Predict: -6.649548317319231 Else (feature 1 > 43.41) If (feature 0 <= 23.02) Predict: -4.9119452777748 Else (feature 0 > 23.02) Predict: -0.9514185089440673 Else (feature 2 > 1010.41) If (feature 3 <= 89.83) If (feature 0 <= 31.26) Predict: 1.2914123584761403 Else (feature 0 > 31.26) Predict: -5.115001417285994 Else (feature 3 > 89.83) If (feature 1 <= 40.64) Predict: 
1.5160976219176363 Else (feature 1 > 40.64) Predict: -4.202813699523934 Tree 6 (weight 0.1): If (feature 2 <= 1007.27) If (feature 0 <= 27.94) If (feature 3 <= 71.09) Predict: 1.616448005210527 Else (feature 3 > 71.09) Predict: -2.1313527108274157 Else (feature 0 > 27.94) If (feature 1 <= 68.3) Predict: -8.579840063013142 Else (feature 1 > 68.3) Predict: -1.915909819494233 Else (feature 2 > 1007.27) If (feature 3 <= 95.45) If (feature 0 <= 6.52) Predict: 4.973465595410054 Else (feature 0 > 6.52) Predict: 0.3837975458985242 Else (feature 3 > 95.45) If (feature 2 <= 1013.43) Predict: -0.8175453481344352 Else (feature 2 > 1013.43) Predict: -7.264843604639278 Tree 7 (weight 0.1): If (feature 0 <= 26.35) If (feature 3 <= 71.09) If (feature 1 <= 67.83) Predict: 1.9620965187817083 Else (feature 1 > 67.83) Predict: 7.953863660960779 Else (feature 3 > 71.09) If (feature 1 <= 40.89) Predict: 1.2020440154192213 Else (feature 1 > 40.89) Predict: -0.9989659748111419 Else (feature 0 > 26.35) If (feature 1 <= 66.25) If (feature 2 <= 1008.4) Predict: -6.230272922553423 Else (feature 2 > 1008.4) Predict: -2.654681371247991 Else (feature 1 > 66.25) If (feature 2 <= 1004.52) Predict: -3.9527797601131853 Else (feature 2 > 1004.52) Predict: -0.21770148036273387 Tree 8 (weight 0.1): If (feature 0 <= 29.56) If (feature 3 <= 63.16) If (feature 1 <= 72.24) Predict: 1.9612116105231265 Else (feature 1 > 72.24) Predict: 8.756949826030025 Else (feature 3 > 63.16) If (feature 0 <= 5.95) Predict: 4.445363585074405 Else (feature 0 > 5.95) Predict: -0.4097996897633835 Else (feature 0 > 29.56) If (feature 1 <= 68.3) If (feature 2 <= 1009.9) Predict: -7.882200867406393 Else (feature 2 > 1009.9) Predict: -1.7273221348184091 Else (feature 1 > 68.3) If (feature 2 <= 1013.77) Predict: -0.7219749804525829 Else (feature 2 > 1013.77) Predict: -6.492100849806538 Tree 9 (weight 0.1): If (feature 3 <= 89.83) If (feature 0 <= 25.72) If (feature 0 <= 23.02) Predict: 0.15450088997272685 Else (feature 0 > 23.02) Predict: 3.010254802875794 Else (feature 0 > 25.72) If (feature 1 <= 66.25) Predict: -2.5821765284417615 Else (feature 1 > 66.25) Predict: -0.3935112713804148 Else (feature 3 > 89.83) If (feature 2 <= 1019.52) If (feature 0 <= 7.08) Predict: 3.264389020443774 Else (feature 0 > 7.08) Predict: -1.6246048211383168 Else (feature 2 > 1019.52) If (feature 0 <= 8.75) Predict: -8.005340799169343 Else (feature 0 > 8.75) Predict: -2.9832409167030063 Tree 10 (weight 0.1): If (feature 1 <= 56.57) If (feature 0 <= 17.84) If (feature 1 <= 45.87) Predict: 0.26309432916452813 Else (feature 1 > 45.87) Predict: -5.716473785544373 Else (feature 0 > 17.84) If (feature 2 <= 1012.56) Predict: -0.15863259341493433 Else (feature 2 > 1012.56) Predict: 7.899625065937478 Else (feature 1 > 56.57) If (feature 0 <= 17.84) If (feature 3 <= 67.72) Predict: -27.101084325134025 Else (feature 3 > 67.72) Predict: -12.755339130015875 Else (feature 0 > 17.84) If (feature 0 <= 20.6) Predict: 3.8741798886113408 Else (feature 0 > 20.6) Predict: -0.8179571837367839 Tree 11 (weight 0.1): If (feature 2 <= 1004.52) If (feature 1 <= 74.22) If (feature 1 <= 59.14) Predict: -0.6678644068375302 Else (feature 1 > 59.14) Predict: -5.0251736913870495 Else (feature 1 > 74.22) If (feature 2 <= 1000.68) Predict: -1.9453153753750236 Else (feature 2 > 1000.68) Predict: 3.954565899065237 Else (feature 2 > 1004.52) If (feature 3 <= 60.81) If (feature 0 <= 29.27) Predict: 2.256991118214201 Else (feature 0 > 29.27) Predict: -0.8956432652281918 Else (feature 3 > 60.81) If (feature 0 <= 
5.18) Predict: 5.30208686561611 Else (feature 0 > 5.18) Predict: -0.2275806642044292 Tree 12 (weight 0.1): If (feature 3 <= 93.63) If (feature 0 <= 20.6) If (feature 0 <= 17.84) Predict: -0.13650451477274114 Else (feature 0 > 17.84) Predict: 4.26138638419226 Else (feature 0 > 20.6) If (feature 0 <= 23.02) Predict: -4.145788149131118 Else (feature 0 > 23.02) Predict: 0.45010060784860767 Else (feature 3 > 93.63) If (feature 0 <= 11.95) If (feature 0 <= 6.52) Predict: 1.9630864105825856 Else (feature 0 > 6.52) Predict: -5.847103580294793 Else (feature 0 > 11.95) If (feature 1 <= 57.85) Predict: 1.6850763767018282 Else (feature 1 > 57.85) Predict: -3.57522814358917 Tree 13 (weight 0.1): If (feature 1 <= 56.57) If (feature 1 <= 49.39) If (feature 0 <= 13.56) Predict: 0.7497248523199469 Else (feature 0 > 13.56) Predict: -0.8096048572768345 Else (feature 1 > 49.39) If (feature 0 <= 17.84) Predict: -4.9975868045736025 Else (feature 0 > 17.84) Predict: 6.70181838603398 Else (feature 1 > 56.57) If (feature 0 <= 17.84) If (feature 1 <= 58.62) Predict: -8.139327595464518 Else (feature 1 > 58.62) Predict: -11.260696586956563 Else (feature 0 > 17.84) If (feature 2 <= 1020.32) Predict: -0.4173502593107514 Else (feature 2 > 1020.32) Predict: 7.350524302545053 Tree 14 (weight 0.1): If (feature 2 <= 1009.3) If (feature 1 <= 73.67) If (feature 0 <= 26.35) Predict: -0.2834715308768144 Else (feature 0 > 26.35) Predict: -2.2855655986052446 Else (feature 1 > 73.67) If (feature 0 <= 21.42) Predict: -19.886551554013977 Else (feature 0 > 21.42) Predict: 1.8345107899392203 Else (feature 2 > 1009.3) If (feature 0 <= 17.84) If (feature 1 <= 46.93) Predict: 0.2012146645141011 Else (feature 1 > 46.93) Predict: -5.331252849501989 Else (feature 0 > 17.84) If (feature 0 <= 20.6) Predict: 3.9009310043506518 Else (feature 0 > 20.6) Predict: 0.05492627340134294 Tree 15 (weight 0.1): If (feature 3 <= 80.44) If (feature 0 <= 26.57) If (feature 0 <= 23.02) Predict: 0.24935555983937532 Else (feature 0 > 23.02) Predict: 1.9734839371689987 Else (feature 0 > 26.57) If (feature 1 <= 66.25) Predict: -2.652691255012269 Else (feature 1 > 66.25) Predict: 0.10205623249441657 Else (feature 3 > 80.44) If (feature 1 <= 57.85) If (feature 2 <= 1021.65) Predict: 0.3189331596273633 Else (feature 2 > 1021.65) Predict: -2.493847422724499 Else (feature 1 > 57.85) If (feature 0 <= 23.02) Predict: -4.443277995263894 Else (feature 0 > 23.02) Predict: 1.0414575062489446 Tree 16 (weight 0.1): If (feature 0 <= 6.52) If (feature 3 <= 67.72) If (feature 1 <= 39.48) Predict: -0.021931809818089017 Else (feature 1 > 39.48) Predict: 17.644618798102908 Else (feature 3 > 67.72) If (feature 1 <= 42.07) Predict: 2.6927240976688487 Else (feature 1 > 42.07) Predict: -3.720328734281554 Else (feature 0 > 6.52) If (feature 0 <= 8.75) If (feature 0 <= 7.97) Predict: -1.1870026837027776 Else (feature 0 > 7.97) Predict: -6.311604790035118 Else (feature 0 > 8.75) If (feature 0 <= 9.73) Predict: 5.036277690956247 Else (feature 0 > 9.73) Predict: -0.07156864179175153 Tree 17 (weight 0.1): If (feature 2 <= 1005.35) If (feature 1 <= 70.8) If (feature 0 <= 21.14) Predict: 0.2557898848412102 Else (feature 0 > 21.14) Predict: -4.092246463553751 Else (feature 1 > 70.8) If (feature 0 <= 23.02) Predict: -17.7762740471523 Else (feature 0 > 23.02) Predict: 1.4679036019616782 Else (feature 2 > 1005.35) If (feature 3 <= 60.81) If (feature 2 <= 1021.17) Predict: 0.8109918761137652 Else (feature 2 > 1021.17) Predict: 6.491756407811347 Else (feature 3 > 60.81) If (feature 0 <= 25.72) 
Predict: 0.06495066055048145 Else (feature 0 > 25.72) Predict: -1.234843690619109 Tree 18 (weight 0.1): If (feature 3 <= 93.63) If (feature 0 <= 13.56) If (feature 0 <= 11.95) Predict: -0.1389635939018028 Else (feature 0 > 11.95) Predict: 4.085304226900187 Else (feature 0 > 13.56) If (feature 0 <= 15.33) Predict: -3.558076811842663 Else (feature 0 > 15.33) Predict: 0.24840255719067195 Else (feature 3 > 93.63) If (feature 0 <= 11.95) If (feature 0 <= 6.52) Predict: 1.1725211739721944 Else (feature 0 > 6.52) Predict: -4.696815201291802 Else (feature 0 > 11.95) If (feature 0 <= 23.42) Predict: -0.1435586215485262 Else (feature 0 > 23.42) Predict: 6.017267110381734 Tree 19 (weight 0.1): If (feature 0 <= 29.89) If (feature 3 <= 46.38) If (feature 2 <= 1020.32) Predict: 2.734528637686715 Else (feature 2 > 1020.32) Predict: 14.229272221061546 Else (feature 3 > 46.38) If (feature 1 <= 73.18) Predict: -0.09112932077559661 Else (feature 1 > 73.18) Predict: 2.171636618202333 Else (feature 0 > 29.89) If (feature 1 <= 68.3) If (feature 2 <= 1012.96) Predict: -4.842672386234583 Else (feature 2 > 1012.96) Predict: 0.4656753436410731 Else (feature 1 > 68.3) If (feature 1 <= 69.88) Predict: 1.9998755414672877 Else (feature 1 > 69.88) Predict: -1.377187598546301 Tree 20 (weight 0.1): If (feature 1 <= 40.89) If (feature 0 <= 11.95) If (feature 0 <= 10.74) Predict: 0.3474341793041741 Else (feature 0 > 10.74) Predict: -3.2174625433704844 Else (feature 0 > 11.95) If (feature 0 <= 13.56) Predict: 6.2521753652461385 Else (feature 0 > 13.56) Predict: 0.7467107076401086 Else (feature 1 > 40.89) If (feature 1 <= 41.16) If (feature 2 <= 1011.9) Predict: 1.6159428806525291 Else (feature 2 > 1011.9) Predict: -5.525791920129847 Else (feature 1 > 41.16) If (feature 1 <= 41.48) Predict: 2.3655609293253264 Else (feature 1 > 41.48) Predict: -0.18730957785387015 Tree 21 (weight 0.1): If (feature 0 <= 7.08) If (feature 1 <= 41.58) If (feature 1 <= 41.16) Predict: 1.9153935195932974 Else (feature 1 > 41.16) Predict: 7.0746807427814735 Else (feature 1 > 41.58) If (feature 2 <= 1020.77) Predict: -1.256554177586309 Else (feature 2 > 1020.77) Predict: -26.29941855196938 Else (feature 0 > 7.08) If (feature 0 <= 8.75) If (feature 1 <= 37.8) Predict: -8.544132394601597 Else (feature 1 > 37.8) Predict: -2.6184141709801976 Else (feature 0 > 8.75) If (feature 0 <= 9.73) Predict: 4.069411815161333 Else (feature 0 > 9.73) Predict: -0.06494039395966968 Tree 22 (weight 0.1): If (feature 0 <= 23.02) If (feature 0 <= 21.69) If (feature 0 <= 15.33) Predict: -0.48298234147973435 Else (feature 0 > 15.33) Predict: 1.2747845905419344 Else (feature 0 > 21.69) If (feature 1 <= 66.25) Predict: -3.44223180465188 Else (feature 1 > 66.25) Predict: -9.677838572965495 Else (feature 0 > 23.02) If (feature 0 <= 24.39) If (feature 1 <= 66.25) Predict: 1.4289485230939327 Else (feature 1 > 66.25) Predict: 7.493228657621072 Else (feature 0 > 24.39) If (feature 1 <= 66.25) Predict: -1.55164310941819 Else (feature 1 > 66.25) Predict: 0.5159038364280375 Tree 23 (weight 0.1): If (feature 2 <= 1010.89) If (feature 1 <= 66.93) If (feature 1 <= 43.41) Predict: 0.8366856528539243 Else (feature 1 > 43.41) Predict: -2.146264827541657 Else (feature 1 > 66.93) If (feature 0 <= 23.02) Predict: -4.593173040738928 Else (feature 0 > 23.02) Predict: 0.7595925761507126 Else (feature 2 > 1010.89) If (feature 0 <= 15.33) If (feature 0 <= 14.38) Predict: 0.19019050526253845 Else (feature 0 > 14.38) Predict: -4.931089744789576 Else (feature 0 > 15.33) If (feature 1 <= 56.57) 
Predict: 2.893896440054576 Else (feature 1 > 56.57) Predict: -0.2411893147021192 Tree 24 (weight 0.1): If (feature 2 <= 1004.52) If (feature 1 <= 39.13) If (feature 0 <= 16.56) Predict: 5.674347262101248 Else (feature 0 > 16.56) Predict: -15.35003850200303 Else (feature 1 > 39.13) If (feature 1 <= 70.8) Predict: -2.2136597249782484 Else (feature 1 > 70.8) Predict: 0.4854909471410394 Else (feature 2 > 1004.52) If (feature 0 <= 23.02) If (feature 0 <= 21.14) Predict: 0.25072963079321764 Else (feature 0 > 21.14) Predict: -3.1127381475029745 Else (feature 0 > 23.02) If (feature 0 <= 24.98) Predict: 2.513302584995404 Else (feature 0 > 24.98) Predict: -0.17126775916442186 Tree 25 (weight 0.1): If (feature 3 <= 76.79) If (feature 0 <= 28.75) If (feature 1 <= 66.25) Predict: 0.1271610430935476 Else (feature 1 > 66.25) Predict: 2.4600009065275934 Else (feature 0 > 28.75) If (feature 1 <= 44.58) Predict: -10.925990145829292 Else (feature 1 > 44.58) Predict: -0.7031644656131009 Else (feature 3 > 76.79) If (feature 0 <= 20.9) If (feature 0 <= 17.84) Predict: -0.3807566877980857 Else (feature 0 > 17.84) Predict: 2.329590528017136 Else (feature 0 > 20.9) If (feature 0 <= 23.02) Predict: -3.741947089345415 Else (feature 0 > 23.02) Predict: -0.3619479813878585 Tree 26 (weight 0.1): If (feature 0 <= 5.18) If (feature 1 <= 42.07) If (feature 3 <= 84.36) Predict: 5.869887042156764 Else (feature 3 > 84.36) Predict: 2.3621425360574837 Else (feature 1 > 42.07) If (feature 2 <= 1007.82) Predict: -1.4185266335795177 Else (feature 2 > 1007.82) Predict: -5.383717178467172 Else (feature 0 > 5.18) If (feature 3 <= 53.32) If (feature 2 <= 1021.17) Predict: 0.6349729680247564 Else (feature 2 > 1021.17) Predict: 9.504309080910616 Else (feature 3 > 53.32) If (feature 0 <= 25.95) Predict: 0.010243524812335326 Else (feature 0 > 25.95) Predict: -0.8173343910336555 Tree 27 (weight 0.1): If (feature 2 <= 1028.38) If (feature 1 <= 74.87) If (feature 1 <= 56.57) Predict: 0.28085003688072396 Else (feature 1 > 56.57) Predict: -0.378551674966564 Else (feature 1 > 74.87) If (feature 0 <= 21.42) Predict: -12.321588273833015 Else (feature 0 > 21.42) Predict: 1.8659669412137414 Else (feature 2 > 1028.38) If (feature 3 <= 89.83) If (feature 3 <= 66.27) Predict: -8.252928408643971 Else (feature 3 > 66.27) Predict: -2.023910717088332 Else (feature 3 > 89.83) If (feature 0 <= 8.39) Predict: -11.472893448110653 Else (feature 0 > 8.39) Predict: -8.030312146910243 Tree 28 (weight 0.1): If (feature 3 <= 85.4) If (feature 0 <= 7.55) If (feature 1 <= 40.05) Predict: 0.3456361310433187 Else (feature 1 > 40.05) Predict: 4.958188742864418 Else (feature 0 > 7.55) If (feature 0 <= 8.75) Predict: -3.0608059226719657 Else (feature 0 > 8.75) Predict: 0.16507864507530287 Else (feature 3 > 85.4) If (feature 2 <= 1015.63) If (feature 2 <= 1014.19) Predict: -0.3593841710339432 Else (feature 2 > 1014.19) Predict: 3.2531365191458024 Else (feature 2 > 1015.63) If (feature 1 <= 40.64) Predict: 1.0007657377910708 Else (feature 1 > 40.64) Predict: -2.132339394694771 Tree 29 (weight 0.1): If (feature 0 <= 30.56) If (feature 3 <= 55.74) If (feature 1 <= 72.24) Predict: 0.8569729911086951 Else (feature 1 > 72.24) Predict: 6.358127096088517 Else (feature 3 > 55.74) If (feature 1 <= 41.48) Predict: 0.43148253820326676 Else (feature 1 > 41.48) Predict: -0.24352278568573174 Else (feature 0 > 30.56) If (feature 2 <= 1014.35) If (feature 1 <= 68.3) Predict: -2.5522103291398683 Else (feature 1 > 68.3) Predict: -0.21266182300917044 Else (feature 2 > 1014.35) If (feature 1 
<= 74.87) Predict: -6.498613011225412 Else (feature 1 > 74.87) Predict: 0.9765776955731879 Tree 30 (weight 0.1): If (feature 0 <= 17.84) If (feature 1 <= 45.08) If (feature 0 <= 15.33) Predict: -0.14424299831222268 Else (feature 0 > 15.33) Predict: 1.8754751416891788 Else (feature 1 > 45.08) If (feature 2 <= 1020.77) Predict: -3.097730832691005 Else (feature 2 > 1020.77) Predict: -8.90070153022011 Else (feature 0 > 17.84) If (feature 0 <= 18.71) If (feature 1 <= 49.02) Predict: 1.2726140970398088 Else (feature 1 > 49.02) Predict: 6.649324687634596 Else (feature 0 > 18.71) If (feature 1 <= 46.93) Predict: -2.818245204603037 Else (feature 1 > 46.93) Predict: 0.23586447368304939 Tree 31 (weight 0.1): If (feature 2 <= 1004.52) If (feature 1 <= 59.14) If (feature 1 <= 50.66) Predict: -0.8733348655196066 Else (feature 1 > 50.66) Predict: 7.928862441716025 Else (feature 1 > 59.14) If (feature 1 <= 70.8) Predict: -3.8112988828197807 Else (feature 1 > 70.8) Predict: 0.42812840935226704 Else (feature 2 > 1004.52) If (feature 0 <= 17.84) If (feature 1 <= 46.93) Predict: 0.07282772802501089 Else (feature 1 > 46.93) Predict: -3.3364389464988706 Else (feature 0 > 17.84) If (feature 2 <= 1020.32) Predict: 0.18419167853517965 Else (feature 2 > 1020.32) Predict: 6.584432032190064 Tree 32 (weight 0.1): If (feature 1 <= 56.57) If (feature 1 <= 49.39) If (feature 0 <= 13.56) Predict: 0.36741135502935035 Else (feature 0 > 13.56) Predict: -0.7178818728654812 Else (feature 1 > 49.39) If (feature 0 <= 17.84) Predict: -1.7883686826457996 Else (feature 0 > 17.84) Predict: 4.519745157967235 Else (feature 1 > 56.57) If (feature 0 <= 17.84) If (feature 0 <= 17.5) Predict: -4.182857837547887 Else (feature 0 > 17.5) Predict: -7.917768935292194 Else (feature 0 > 17.84) If (feature 0 <= 19.61) Predict: 2.6880627533068244 Else (feature 0 > 19.61) Predict: -0.2998975340288976 Tree 33 (weight 0.1): If (feature 0 <= 11.95) If (feature 0 <= 11.03) If (feature 3 <= 93.63) Predict: 0.7278554646891878 Else (feature 3 > 93.63) Predict: -2.2492543009893162 Else (feature 0 > 11.03) If (feature 2 <= 1024.3) Predict: -5.536706488618952 Else (feature 2 > 1024.3) Predict: 4.479707018501001 Else (feature 0 > 11.95) If (feature 0 <= 13.08) If (feature 0 <= 12.5) Predict: 5.173128471411881 Else (feature 0 > 12.5) Predict: 2.3834255982190755 Else (feature 0 > 13.08) If (feature 0 <= 15.33) Predict: -1.5022006203890645 Else (feature 0 > 15.33) Predict: 0.15423852245074754 Tree 34 (weight 0.1): If (feature 0 <= 8.75) If (feature 0 <= 7.55) If (feature 3 <= 77.56) Predict: 3.015852739381847 Else (feature 3 > 77.56) Predict: -0.06103236076131486 Else (feature 0 > 7.55) If (feature 3 <= 62.1) Predict: -13.594573386743992 Else (feature 3 > 62.1) Predict: -2.6914920546129273 Else (feature 0 > 8.75) If (feature 0 <= 10.03) If (feature 3 <= 95.45) Predict: 3.213047453934116 Else (feature 3 > 95.45) Predict: -2.3699077010186502 Else (feature 0 > 10.03) If (feature 0 <= 11.95) Predict: -1.841483689919706 Else (feature 0 > 11.95) Predict: 0.1034719724734039 Tree 35 (weight 0.1): If (feature 1 <= 56.57) If (feature 1 <= 49.02) If (feature 1 <= 44.88) Predict: 0.1854471597033813 Else (feature 1 > 44.88) Predict: -1.537157071790549 Else (feature 1 > 49.02) If (feature 2 <= 1009.77) Predict: -0.7176011396833722 Else (feature 2 > 1009.77) Predict: 3.4414962844541495 Else (feature 1 > 56.57) If (feature 1 <= 66.25) If (feature 0 <= 21.92) Predict: 0.6042503983890641 Else (feature 0 > 21.92) Predict: -1.6430682984491796 Else (feature 1 > 66.25) If (feature 0 
<= 23.02) Predict: -3.919778656895867 Else (feature 0 > 23.02) Predict: 0.8520833743461524 Tree 36 (weight 0.1): If (feature 0 <= 27.6) If (feature 0 <= 23.02) If (feature 0 <= 22.1) Predict: 0.08610814822616036 Else (feature 0 > 22.1) Predict: -3.39446668206219 Else (feature 0 > 23.02) If (feature 1 <= 66.25) Predict: -0.25067209339950686 Else (feature 1 > 66.25) Predict: 2.1536703058787143 Else (feature 0 > 27.6) If (feature 3 <= 62.1) If (feature 1 <= 74.87) Predict: -0.3912307208100507 Else (feature 1 > 74.87) Predict: 2.6168301411252224 Else (feature 3 > 62.1) If (feature 1 <= 71.8) Predict: -0.1075335658351684 Else (feature 1 > 71.8) Predict: -3.3756176659678685 Tree 37 (weight 0.1): If (feature 0 <= 25.35) If (feature 0 <= 23.02) If (feature 1 <= 64.84) Predict: 0.07789630965601392 Else (feature 1 > 64.84) Predict: -2.8928836560033093 Else (feature 0 > 23.02) If (feature 1 <= 66.25) Predict: 0.13731068060749954 Else (feature 1 > 66.25) Predict: 4.15851454889221 Else (feature 0 > 25.35) If (feature 1 <= 43.65) If (feature 0 <= 27.19) Predict: -16.475158304770883 Else (feature 0 > 27.19) Predict: -7.947134756554647 Else (feature 1 > 43.65) If (feature 3 <= 62.7) Predict: 0.1725950049938879 Else (feature 3 > 62.7) Predict: -1.0926147971432427 Tree 38 (weight 0.1): If (feature 2 <= 1028.38) If (feature 0 <= 30.56) If (feature 3 <= 47.89) Predict: 1.6647926733523803 Else (feature 3 > 47.89) Predict: 0.019004190066623235 Else (feature 0 > 30.56) If (feature 2 <= 1014.35) Predict: -0.6192794789083232 Else (feature 2 > 1014.35) Predict: -4.385760311827676 Else (feature 2 > 1028.38) If (feature 1 <= 39.48) If (feature 0 <= 6.52) Predict: 4.573467616169609 Else (feature 0 > 6.52) Predict: -1.362091279334777 Else (feature 1 > 39.48) If (feature 0 <= 8.75) Predict: -7.0007999537928605 Else (feature 0 > 8.75) Predict: -1.617908469279585 Tree 39 (weight 0.1): If (feature 2 <= 1017.42) If (feature 2 <= 1014.19) If (feature 1 <= 43.13) Predict: 1.2098492492388833 Else (feature 1 > 43.13) Predict: -0.4345828650352739 Else (feature 2 > 1014.19) If (feature 3 <= 96.38) Predict: 1.0830640036331665 Else (feature 3 > 96.38) Predict: -6.6054777318343785 Else (feature 2 > 1017.42) If (feature 2 <= 1019.23) If (feature 1 <= 57.85) Predict: -0.8212874032064794 Else (feature 1 > 57.85) Predict: -2.6667829000634105 Else (feature 2 > 1019.23) If (feature 0 <= 17.84) Predict: -0.39094381687835245 Else (feature 0 > 17.84) Predict: 3.336117383932137 Tree 40 (weight 0.1): If (feature 3 <= 75.23) If (feature 1 <= 40.05) If (feature 1 <= 39.96) Predict: -1.2851367407493581 Else (feature 1 > 39.96) Predict: -9.117459296991676 Else (feature 1 > 40.05) If (feature 1 <= 40.89) Predict: 4.461974679211411 Else (feature 1 > 40.89) Predict: 0.25422282080546216 Else (feature 3 > 75.23) If (feature 0 <= 21.42) If (feature 0 <= 17.84) Predict: -0.11457026696795661 Else (feature 0 > 17.84) Predict: 0.9995406591682215 Else (feature 0 > 21.42) If (feature 0 <= 23.02) Predict: -2.664637163988949 Else (feature 0 > 23.02) Predict: -0.5023743568762508 Tree 41 (weight 0.1): If (feature 2 <= 1001.9) If (feature 1 <= 39.13) If (feature 3 <= 79.95) Predict: 9.0188365708008 Else (feature 3 > 79.95) Predict: 2.9702965803786205 Else (feature 1 > 39.13) If (feature 3 <= 63.68) Predict: -4.052067945951171 Else (feature 3 > 63.68) Predict: -1.0796516186664176 Else (feature 2 > 1001.9) If (feature 0 <= 15.33) If (feature 0 <= 14.38) Predict: 0.15316006561614587 Else (feature 0 > 14.38) Predict: -3.487291240038168 Else (feature 0 > 15.33) If 
(feature 1 <= 43.13) Predict: 2.5605988792505605 Else (feature 1 > 43.13) Predict: 0.03166127813460667 Tree 42 (weight 0.1): If (feature 0 <= 11.95) If (feature 0 <= 11.42) If (feature 1 <= 38.25) Predict: -2.0532785635493065 Else (feature 1 > 38.25) Predict: 0.4665697970110133 Else (feature 0 > 11.42) If (feature 1 <= 44.2) Predict: -4.178641719198364 Else (feature 1 > 44.2) Predict: -9.84024023297988 Else (feature 0 > 11.95) If (feature 0 <= 13.08) If (feature 1 <= 40.89) Predict: 4.383821312183712 Else (feature 1 > 40.89) Predict: 2.000819554066434 Else (feature 0 > 13.08) If (feature 0 <= 15.33) Predict: -1.0813581518144955 Else (feature 0 > 15.33) Predict: 0.11492139312962121 Tree 43 (weight 0.1): If (feature 0 <= 8.75) If (feature 0 <= 7.97) If (feature 3 <= 86.54) Predict: 0.983392336251922 Else (feature 3 > 86.54) Predict: -0.8690504742953818 Else (feature 0 > 7.97) If (feature 3 <= 62.1) Predict: -20.310342278835464 Else (feature 3 > 62.1) Predict: -2.975869736741497 Else (feature 0 > 8.75) If (feature 0 <= 9.42) If (feature 2 <= 1015.45) Predict: 5.74314556767472 Else (feature 2 > 1015.45) Predict: 2.1033141679659995 Else (feature 0 > 9.42) If (feature 1 <= 40.89) Predict: 0.6933339562649613 Else (feature 1 > 40.89) Predict: -0.10718368674776323 Tree 44 (weight 0.1): If (feature 1 <= 74.87) If (feature 1 <= 71.43) If (feature 1 <= 68.3) Predict: -0.0751396787352361 Else (feature 1 > 68.3) Predict: 1.0387569941322914 Else (feature 1 > 71.43) If (feature 1 <= 72.86) Predict: -2.5461711201599986 Else (feature 1 > 72.86) Predict: -0.0018936704520639966 Else (feature 1 > 74.87) If (feature 1 <= 77.3) If (feature 3 <= 73.33) Predict: 3.4362919081871732 Else (feature 3 > 73.33) Predict: 0.022595797531833054 Else (feature 1 > 77.3) If (feature 2 <= 1012.39) Predict: -2.0026738842740444 Else (feature 2 > 1012.39) Predict: 1.7553499174736846 Tree 45 (weight 0.1): If (feature 2 <= 1005.35) If (feature 1 <= 72.24) If (feature 1 <= 59.14) Predict: 0.030127466104975898 Else (feature 1 > 59.14) Predict: -2.2341894812350676 Else (feature 1 > 72.24) If (feature 3 <= 60.09) Predict: 4.41863108135717 Else (feature 3 > 60.09) Predict: -0.11040726869235623 Else (feature 2 > 1005.35) If (feature 0 <= 31.8) If (feature 1 <= 66.25) Predict: -0.06640264597455495 Else (feature 1 > 66.25) Predict: 0.6711276381424462 Else (feature 0 > 31.8) If (feature 1 <= 62.44) Predict: 18.071299971628946 Else (feature 1 > 62.44) Predict: -1.613111097205577 Tree 46 (weight 0.1): If (feature 0 <= 25.95) If (feature 0 <= 23.02) If (feature 0 <= 22.6) Predict: 0.0037802976144726266 Else (feature 0 > 22.6) Predict: -3.2702083989998565 Else (feature 0 > 23.02) If (feature 1 <= 47.83) Predict: 7.351532379664369 Else (feature 1 > 47.83) Predict: 0.6617643737173495 Else (feature 0 > 25.95) If (feature 3 <= 62.1) If (feature 0 <= 29.89) Predict: 0.7522949567047181 Else (feature 0 > 29.89) Predict: -0.5659530686126862 Else (feature 3 > 62.1) If (feature 1 <= 43.41) Predict: -9.179671352130104 Else (feature 1 > 43.41) Predict: -0.9646184420761758 Tree 47 (weight 0.1): If (feature 0 <= 5.18) If (feature 1 <= 38.62) If (feature 3 <= 77.17) Predict: -4.215696425771664 Else (feature 3 > 77.17) Predict: 5.655069692148392 Else (feature 1 > 38.62) If (feature 1 <= 39.13) Predict: -12.269101167501105 Else (feature 1 > 39.13) Predict: 1.081763483601667 Else (feature 0 > 5.18) If (feature 0 <= 8.75) If (feature 0 <= 7.97) Predict: -0.19756946285599916 Else (feature 0 > 7.97) Predict: -2.7184931590940438 Else (feature 0 > 8.75) If (feature 0 
<= 9.42) Predict: 2.558566383813981 Else (feature 0 > 9.42) Predict: -0.006722635545763743 Tree 48 (weight 0.1): If (feature 2 <= 1028.38) If (feature 2 <= 1010.89) If (feature 1 <= 66.93) Predict: -0.7473456438858288 Else (feature 1 > 66.93) Predict: 0.34762458916260297 Else (feature 2 > 1010.89) If (feature 1 <= 58.86) Predict: 0.4001213596367478 Else (feature 1 > 58.86) Predict: -0.33373941983121597 Else (feature 2 > 1028.38) If (feature 1 <= 42.85) If (feature 1 <= 39.48) Predict: 2.1904388134214514 Else (feature 1 > 39.48) Predict: -3.2474441160938956 Else (feature 1 > 42.85) If (feature 3 <= 71.55) Predict: -1.061140549595708 Else (feature 3 > 71.55) Predict: 6.934556118848832 Tree 49 (weight 0.1): If (feature 0 <= 11.95) If (feature 0 <= 10.74) If (feature 0 <= 8.75) Predict: -0.48190999213172564 Else (feature 0 > 8.75) Predict: 1.0350335598803566 Else (feature 0 > 10.74) If (feature 2 <= 1024.3) Predict: -3.057989388513731 Else (feature 2 > 1024.3) Predict: 2.162024696272738 Else (feature 0 > 11.95) If (feature 0 <= 12.5) If (feature 3 <= 86.91) Predict: 4.627051067913808 Else (feature 3 > 86.91) Predict: 0.9386052167341327 Else (feature 0 > 12.5) If (feature 1 <= 37.8) Predict: 4.0889321278523685 Else (feature 1 > 37.8) Predict: -0.02245818963891235 Tree 50 (weight 0.1): If (feature 2 <= 1017.42) If (feature 2 <= 1014.19) If (feature 1 <= 43.13) Predict: 0.9320375696962719 Else (feature 1 > 43.13) Predict: -0.31844348507047093 Else (feature 2 > 1014.19) If (feature 1 <= 42.42) Predict: -0.5988031510673222 Else (feature 1 > 42.42) Predict: 1.3187243855742212 Else (feature 2 > 1017.42) If (feature 2 <= 1019.23) If (feature 1 <= 44.2) Predict: -2.0646082455368195 Else (feature 1 > 44.2) Predict: -0.4969601265683861 Else (feature 2 > 1019.23) If (feature 0 <= 17.84) Predict: -0.2870181057370213 Else (feature 0 > 17.84) Predict: 2.6148230736448608 Tree 51 (weight 0.1): If (feature 1 <= 38.62) If (feature 0 <= 18.4) If (feature 0 <= 5.18) Predict: 3.850885339006515 Else (feature 0 > 5.18) Predict: -0.940687510645146 Else (feature 0 > 18.4) If (feature 0 <= 18.98) Predict: -10.80330040562501 Else (feature 0 > 18.98) Predict: -18.03404880535599 Else (feature 1 > 38.62) If (feature 2 <= 1026.23) If (feature 0 <= 13.56) Predict: 0.5295719576334972 Else (feature 0 > 13.56) Predict: -0.052812717813551166 Else (feature 2 > 1026.23) If (feature 1 <= 40.22) Predict: -4.371246083031292 Else (feature 1 > 40.22) Predict: -1.3541229527292618 Tree 52 (weight 0.1): If (feature 1 <= 66.25) If (feature 1 <= 64.84) If (feature 3 <= 41.26) Predict: 3.045631536773922 Else (feature 3 > 41.26) Predict: -0.0337837562463145 Else (feature 1 > 64.84) If (feature 1 <= 65.27) Predict: -5.921444872611693 Else (feature 1 > 65.27) Predict: -0.8270282146869598 Else (feature 1 > 66.25) If (feature 0 <= 23.02) If (feature 0 <= 19.83) Predict: 1.5405239234096135 Else (feature 0 > 19.83) Predict: -3.1288830506195398 Else (feature 0 > 23.02) If (feature 0 <= 25.35) Predict: 3.2672442442602656 Else (feature 0 > 25.35) Predict: -0.007592990267182966 Tree 53 (weight 0.1): If (feature 0 <= 17.84) If (feature 1 <= 46.93) If (feature 0 <= 17.2) Predict: 0.1228349542857993 Else (feature 0 > 17.2) Predict: -2.392588492043597 Else (feature 1 > 46.93) If (feature 2 <= 1020.77) Predict: -1.8240349072310669 Else (feature 2 > 1020.77) Predict: -6.523289398433308 Else (feature 0 > 17.84) If (feature 0 <= 18.4) If (feature 1 <= 47.83) Predict: 0.5318997435908227 Else (feature 1 > 47.83) Predict: 4.907584149653537 Else (feature 0 > 18.4) 
If (feature 1 <= 46.93) Predict: -2.110133253015907 Else (feature 1 > 46.93) Predict: 0.20708863671712482 Tree 54 (weight 0.1): If (feature 3 <= 76.79) If (feature 1 <= 40.05) If (feature 1 <= 39.96) Predict: -0.7416033424896232 Else (feature 1 > 39.96) Predict: -6.880323474190146 Else (feature 1 > 40.05) If (feature 1 <= 40.89) Predict: 2.887497917363201 Else (feature 1 > 40.89) Predict: 0.17777582956662522 Else (feature 3 > 76.79) If (feature 0 <= 19.61) If (feature 0 <= 17.84) Predict: -0.09172434324104897 Else (feature 0 > 17.84) Predict: 1.9482862934683598 Else (feature 0 > 19.61) If (feature 2 <= 1010.6) Predict: -0.15262790703036064 Else (feature 2 > 1010.6) Predict: -1.7280878096087295 Tree 55 (weight 0.1): If (feature 0 <= 24.79) If (feature 0 <= 23.02) If (feature 1 <= 66.93) Predict: 0.02682576814507517 Else (feature 1 > 66.93) Predict: -2.323863726560255 Else (feature 0 > 23.02) If (feature 1 <= 47.83) Predict: 6.909290893058579 Else (feature 1 > 47.83) Predict: 0.9944889736997976 Else (feature 0 > 24.79) If (feature 3 <= 65.24) If (feature 0 <= 28.5) Predict: 0.8432916332803679 Else (feature 0 > 28.5) Predict: -0.3680864130080106 Else (feature 3 > 65.24) If (feature 1 <= 66.51) Predict: -2.1147474860288 Else (feature 1 > 66.51) Predict: -0.3834883036951788 Tree 56 (weight 0.1): If (feature 0 <= 15.33) If (feature 0 <= 14.38) If (feature 0 <= 11.95) Predict: -0.3290262091199092 Else (feature 0 > 11.95) Predict: 0.8543511625463592 Else (feature 0 > 14.38) If (feature 2 <= 1016.21) Predict: -0.7208476709379852 Else (feature 2 > 1016.21) Predict: -4.40928839539672 Else (feature 0 > 15.33) If (feature 0 <= 16.22) If (feature 2 <= 1013.19) Predict: 4.554268903891635 Else (feature 2 > 1013.19) Predict: 1.538781048856137 Else (feature 0 > 16.22) If (feature 1 <= 46.93) Predict: -1.1488437756174756 Else (feature 1 > 46.93) Predict: 0.1634274865006602 Tree 57 (weight 0.1): If (feature 2 <= 1007.46) If (feature 1 <= 73.67) If (feature 1 <= 71.43) Predict: -0.28457458674767294 Else (feature 1 > 71.43) Predict: -2.556284198496123 Else (feature 1 > 73.67) If (feature 3 <= 60.81) Predict: 4.31886476056719 Else (feature 3 > 60.81) Predict: 0.3197495651743129 Else (feature 2 > 1007.46) If (feature 0 <= 17.84) If (feature 1 <= 46.93) Predict: 0.04575453109929229 Else (feature 1 > 46.93) Predict: -2.141138284310683 Else (feature 0 > 17.84) If (feature 1 <= 56.57) Predict: 1.3439965861050847 Else (feature 1 > 56.57) Predict: -0.02904919315788331 Tree 58 (weight 0.1): If (feature 0 <= 31.8) If (feature 1 <= 66.25) If (feature 1 <= 64.84) Predict: -0.006836636445003446 Else (feature 1 > 64.84) Predict: -2.0890363043188134 Else (feature 1 > 66.25) If (feature 1 <= 69.05) Predict: 1.8596834938858298 Else (feature 1 > 69.05) Predict: -0.2637818907162569 Else (feature 0 > 31.8) If (feature 1 <= 69.34) If (feature 2 <= 1009.63) Predict: -4.53407923927751 Else (feature 2 > 1009.63) Predict: 1.2479530412848983 Else (feature 1 > 69.34) If (feature 1 <= 69.88) Predict: 5.672382101944148 Else (feature 1 > 69.88) Predict: -0.7728960613425813 Tree 59 (weight 0.1): If (feature 2 <= 1010.89) If (feature 1 <= 68.3) If (feature 1 <= 43.41) Predict: 0.423961936091299 Else (feature 1 > 43.41) Predict: -1.0411314850417004 Else (feature 1 > 68.3) If (feature 1 <= 68.67) Predict: 7.130757445704555 Else (feature 1 > 68.67) Predict: 0.1160942217864609 Else (feature 2 > 1010.89) If (feature 3 <= 93.63) If (feature 1 <= 58.86) Predict: 0.41091291246834866 Else (feature 1 > 58.86) Predict: -0.2764637915143923 Else (feature 
3 > 93.63) If (feature 1 <= 41.74) Predict: -3.564757715833512 Else (feature 1 > 41.74) Predict: 1.1644353912440248 Tree 60 (weight 0.1): If (feature 1 <= 48.6) If (feature 1 <= 44.88) If (feature 2 <= 1016.57) Predict: 0.4410572983039277 Else (feature 2 > 1016.57) Predict: -0.44414793681792664 Else (feature 1 > 44.88) If (feature 2 <= 1014.35) Predict: -3.0626378082153085 Else (feature 2 > 1014.35) Predict: 2.0328536525605063 Else (feature 1 > 48.6) If (feature 1 <= 52.05) If (feature 2 <= 1009.9) Predict: 0.24004783900051171 Else (feature 2 > 1009.9) Predict: 3.1645061792332916 Else (feature 1 > 52.05) If (feature 0 <= 17.84) Predict: -1.95074879327582 Else (feature 0 > 17.84) Predict: 0.021106826304965107 Tree 61 (weight 0.1): If (feature 1 <= 74.87) If (feature 1 <= 71.43) If (feature 1 <= 68.3) Predict: -0.06241270845694165 Else (feature 1 > 68.3) Predict: 0.8051320337219834 Else (feature 1 > 71.43) If (feature 0 <= 24.57) Predict: 1.648459594873699 Else (feature 0 > 24.57) Predict: -1.2314608832462137 Else (feature 1 > 74.87) If (feature 1 <= 77.3) If (feature 0 <= 21.42) Predict: -7.482222216002697 Else (feature 0 > 21.42) Predict: 1.8228183337802573 Else (feature 1 > 77.3) If (feature 2 <= 1012.39) Predict: -1.4326641812285505 Else (feature 2 > 1012.39) Predict: 1.7079353624089986 Tree 62 (weight 0.1): If (feature 0 <= 5.18) If (feature 1 <= 42.07) If (feature 3 <= 96.38) Predict: 1.4583097259406885 Else (feature 3 > 96.38) Predict: 7.4053761713858615 Else (feature 1 > 42.07) If (feature 2 <= 1008.19) Predict: 0.311290850436914 Else (feature 2 > 1008.19) Predict: -5.145119802972147 Else (feature 0 > 5.18) If (feature 1 <= 38.62) If (feature 0 <= 18.4) Predict: -0.7259884411546618 Else (feature 0 > 18.4) Predict: -12.427884135864616 Else (feature 1 > 38.62) If (feature 1 <= 39.48) Predict: 1.131291291234381 Else (feature 1 > 39.48) Predict: -0.007004055574359982 Tree 63 (weight 0.1): If (feature 2 <= 1004.52) If (feature 1 <= 70.8) If (feature 1 <= 69.05) Predict: -0.45566718124370104 Else (feature 1 > 69.05) Predict: -3.3633539333883373 Else (feature 1 > 70.8) If (feature 3 <= 70.63) Predict: 1.7061073842258219 Else (feature 3 > 70.63) Predict: -0.35469491259927843 Else (feature 2 > 1004.52) If (feature 0 <= 15.33) If (feature 0 <= 14.13) Predict: 0.13165022513417465 Else (feature 0 > 14.13) Predict: -1.8886218519887454 Else (feature 0 > 15.33) If (feature 1 <= 43.13) Predict: 2.0897911694212086 Else (feature 1 > 43.13) Predict: 0.023571622513158218 Tree 64 (weight 0.1): If (feature 1 <= 41.92) If (feature 1 <= 41.58) If (feature 2 <= 1015.45) Predict: 0.6420804366913081 Else (feature 2 > 1015.45) Predict: -0.3393001000428116 Else (feature 1 > 41.58) If (feature 3 <= 91.38) Predict: -2.959889489145066 Else (feature 3 > 91.38) Predict: -14.822621379271645 Else (feature 1 > 41.92) If (feature 1 <= 43.13) If (feature 0 <= 15.33) Predict: 0.5584851317693598 Else (feature 0 > 15.33) Predict: 5.35806974907062 Else (feature 1 > 43.13) If (feature 1 <= 43.65) Predict: -2.5734171913252673 Else (feature 1 > 43.65) Predict: 0.06206747847844893 Tree 65 (weight 0.1): If (feature 2 <= 1010.89) If (feature 1 <= 66.93) If (feature 0 <= 20.6) Predict: -0.0679333275254979 Else (feature 0 > 20.6) Predict: -1.053808811058633 Else (feature 1 > 66.93) If (feature 1 <= 67.32) Predict: 7.372080266725638 Else (feature 1 > 67.32) Predict: 0.09996335027123535 Else (feature 2 > 1010.89) If (feature 3 <= 75.61) If (feature 1 <= 40.05) Predict: -0.9831581524231143 Else (feature 1 > 40.05) Predict: 
0.5486160789249349 Else (feature 3 > 75.61) If (feature 1 <= 58.86) Predict: 0.19399224442246701 Else (feature 1 > 58.86) Predict: -1.5652059699408227 Tree 66 (weight 0.1): If (feature 0 <= 28.75) If (feature 1 <= 73.18) If (feature 1 <= 71.43) Predict: 0.05143978594106816 Else (feature 1 > 71.43) Predict: -1.436513600322334 Else (feature 1 > 73.18) If (feature 3 <= 73.33) Predict: 4.1459864582084975 Else (feature 3 > 73.33) Predict: 0.34965185037807356 Else (feature 0 > 28.75) If (feature 2 <= 1014.54) If (feature 2 <= 1013.43) Predict: -0.4008005884834272 Else (feature 2 > 1013.43) Predict: 3.683818693727259 Else (feature 2 > 1014.54) If (feature 1 <= 67.83) Predict: -0.82614879352537 Else (feature 1 > 67.83) Predict: -4.535981326886069 Tree 67 (weight 0.1): If (feature 1 <= 47.83) If (feature 0 <= 23.02) If (feature 0 <= 18.71) Predict: -0.0010074123242523121 Else (feature 0 > 18.71) Predict: -3.2926535011699234 Else (feature 0 > 23.02) If (feature 2 <= 1012.39) Predict: 1.3034696914565052 Else (feature 2 > 1012.39) Predict: 11.235282784300427 Else (feature 1 > 47.83) If (feature 1 <= 56.57) If (feature 0 <= 17.84) Predict: -1.039931035628621 Else (feature 0 > 17.84) Predict: 1.9905896386111916 Else (feature 1 > 56.57) If (feature 1 <= 57.19) Predict: -2.3357601760278204 Else (feature 1 > 57.19) Predict: -0.0355403353056693 Tree 68 (weight 0.1): If (feature 0 <= 24.79) If (feature 3 <= 41.26) If (feature 1 <= 45.87) Predict: 2.4904273637383265 Else (feature 1 > 45.87) Predict: 13.013875696314063 Else (feature 3 > 41.26) If (feature 1 <= 49.02) Predict: -0.18642415027276396 Else (feature 1 > 49.02) Predict: 0.47121076166963227 Else (feature 0 > 24.79) If (feature 1 <= 65.27) If (feature 1 <= 64.84) Predict: -0.5...
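
When reading the splits in the dump above, note that feature 0 through feature 3 refer to positions in the assembled feature vector, i.e. the order in which the columns were fed to the VectorAssembler during data preparation. A small sketch to print that mapping, assuming the assembler from the earlier data-preparation step is still in scope as vectorizer (the variable name is an assumption for illustration):

// print which original column each "feature i" in the tree dump refers to
// (assumes `vectorizer` is the VectorAssembler built in the data-preparation step)
vectorizer.getInputCols.zipWithIndex.foreach { case (colName, idx) =>
  println(s"feature $idx -> $colName")
}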

Conclusion

Wow! So our best model is in fact the Gradient-Boosted Decision Tree model, which uses an ensemble of 120 trees of depth 3 to construct a better predictor than the single Decision Tree.
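
To double-check these numbers programmatically, we can pull the winning GBT stage out of the best pipeline and inspect it directly. A minimal sketch, assuming gbtModel is the fitted cross-validator model from the tuning step above (the printed values reflect this run):

import org.apache.spark.ml.PipelineModel
import org.apache.spark.ml.regression.GBTRegressionModel

// extract the winning GBT stage from the best pipeline found by cross-validation
val bestGbt = gbtModel.bestModel.asInstanceOf[PipelineModel]
  .stages.last.asInstanceOf[GBTRegressionModel]

println(s"number of trees = ${bestGbt.trees.length}")   // 120 in this run
println(s"max depth       = ${bestGbt.getMaxDepth}")    // 3 in this run
println(s"input features  = ${bestGbt.numFeatures}")    // AT, V, AP, RH => 4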

Persisting Statistical Machine Learning Models


Let's save our best model so that we can load it later without having to rerun the cross-validation and training.

// extract the best GBT stage from the winning pipeline and save it to DBFS (overwriting any previous copy)
gbtModel.bestModel.asInstanceOf[PipelineModel].stages.last.asInstanceOf[GBTRegressionModel]
        .write.overwrite().save("dbfs:///databricks/driver/MyTrainedGbtModel")
// load the persisted model back from DBFS
val sameModel = GBTRegressionModel.load("dbfs:///databricks/driver/MyTrainedGbtModel/")
sameModel: org.apache.spark.ml.regression.GBTRegressionModel = GBTRegressionModel (uid=gbtr_9c5ab45fe584) with 120 trees
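
In a deployment setting (Step 8), the reloaded model can be used to score fresh sensor readings. Before verifying the round trip below, here is a minimal sketch under stated assumptions: the new readings arrive as a DataFrame with the same AT, V, AP, RH columns, and we re-build the same kind of VectorAssembler used in the pipeline (the newReadings values and the Predicted_PE column name are illustrative, not from the original notebook):

import org.apache.spark.ml.feature.VectorAssembler
import spark.implicits._

// hypothetical fresh sensor readings: (AT, V, AP, RH)
val newReadings = Seq(
  (14.96, 41.76, 1024.07, 73.17),
  (25.18, 62.96, 1020.04, 59.08)
).toDF("AT", "V", "AP", "RH")

// re-assemble the raw columns into the "features" vector the GBT model expects
val assembler = new VectorAssembler()
  .setInputCols(Array("AT", "V", "AP", "RH"))
  .setOutputCol("features")

// score with the reloaded model and show the predicted power output
sameModel.setPredictionCol("Predicted_PE")
  .transform(assembler.transform(newReadings))
  .select("AT", "V", "AP", "RH", "Predicted_PE")
  .show()
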
// making sure we have the same model loaded from the file
sameModel.toDebugString
res53: String = "GBTRegressionModel (uid=gbtr_9c5ab45fe584) with 120 trees Tree 0 (weight 1.0): If (feature 0 <= 17.84) If (feature 0 <= 11.95) If (feature 0 <= 8.75) Predict: 483.5412151067323 Else (feature 0 > 8.75) Predict: 475.6305502392345 Else (feature 0 > 11.95) If (feature 0 <= 15.33) Predict: 467.63141917293234 Else (feature 0 > 15.33) Predict: 460.74754125412574 Else (feature 0 > 17.84) If (feature 0 <= 23.02) If (feature 1 <= 47.83) Predict: 457.1077966101695 Else (feature 1 > 47.83) Predict: 448.74750213858016 Else (feature 0 > 23.02) If (feature 1 <= 66.25) Predict: 442.88544855967086 Else (feature 1 > 66.25) Predict: 434.7293710691822 Tree 1 (weight 0.1): If (feature 2 <= 1009.9) If (feature 1 <= 43.13) If (feature 0 <= 15.33) Predict: -0.31094207231231485 Else (feature 0 > 15.33) Predict: 4.302958436537365 Else (feature 1 > 43.13) If (feature 0 <= 23.02) Predict: -8.38392506141353 Else (feature 0 > 23.02) Predict: -2.399960976520273 Else (feature 2 > 1009.9) If (feature 3 <= 86.21) If (feature 0 <= 26.35) Predict: 2.888623754019027 Else (feature 0 > 26.35) Predict: -1.1489483044194229 Else (feature 3 > 86.21) If (feature 1 <= 39.72) Predict: 3.876324424163314 Else (feature 1 > 39.72) Predict: -3.058952828112949 Tree 2 (weight 0.1): If (feature 2 <= 1009.63) If (feature 1 <= 43.13) If (feature 0 <= 11.95) Predict: -1.4091086031845625 Else (feature 0 > 11.95) Predict: 2.6329466235800942 Else (feature 1 > 43.13) If (feature 0 <= 23.02) Predict: -6.795414480322956 Else (feature 0 > 23.02) Predict: -2.166560698742912 Else (feature 2 > 1009.63) If (feature 3 <= 80.44) If (feature 0 <= 26.98) Predict: 2.878622882275939 Else (feature 0 > 26.98) Predict: -1.146426969990865 Else (feature 3 > 80.44) If (feature 3 <= 94.55) Predict: -0.35885921725905906 Else (feature 3 > 94.55) Predict: -5.75364586186002 Tree 3 (weight 0.1): If (feature 0 <= 27.6) If (feature 3 <= 70.2) If (feature 1 <= 40.05) Predict: -0.9480831286616939 Else (feature 1 > 40.05) Predict: 3.660397090904016 Else (feature 3 > 70.2) If (feature 1 <= 40.64) Predict: 2.1539405832035627 Else (feature 1 > 40.64) Predict: -1.2281619807661366 Else (feature 0 > 27.6) If (feature 1 <= 65.27) If (feature 2 <= 1005.99) Predict: -15.33433697033558 Else (feature 2 > 1005.99) Predict: -5.866095468145647 Else (feature 1 > 65.27) If (feature 2 <= 1008.75) Predict: -4.03431044067007 Else (feature 2 > 1008.75) Predict: -0.23440867445577207 Tree 4 (weight 0.1): If (feature 0 <= 26.35) If (feature 0 <= 23.02) If (feature 1 <= 68.67) Predict: 0.12035773384797814 Else (feature 1 > 68.67) Predict: -13.928523073642005 Else (feature 0 > 23.02) If (feature 0 <= 24.57) Predict: 5.5622340839882165 Else (feature 0 > 24.57) Predict: 1.6938172370244715 Else (feature 0 > 26.35) If (feature 1 <= 66.25) If (feature 2 <= 1008.4) Predict: -9.009916879825393 Else (feature 2 > 1008.4) Predict: -3.059736394918022 Else (feature 1 > 66.25) If (feature 0 <= 30.2) Predict: 0.14704705100738577 Else (feature 0 > 30.2) Predict: -3.2914123948921006 Tree 5 (weight 0.1): If (feature 2 <= 1010.41) If (feature 1 <= 43.41) If (feature 3 <= 99.27) Predict: 1.2994444710077233 Else (feature 3 > 99.27) Predict: -6.649548317319231 Else (feature 1 > 43.41) If (feature 0 <= 23.02) Predict: -4.9119452777748 Else (feature 0 > 23.02) Predict: -0.9514185089440673 Else (feature 2 > 1010.41) If (feature 3 <= 89.83) If (feature 0 <= 31.26) Predict: 1.2914123584761403 Else (feature 0 > 31.26) Predict: -5.115001417285994 Else (feature 3 > 89.83) If (feature 1 <= 40.64) Predict: 
1.5160976219176363 Else (feature 1 > 40.64) Predict: -4.202813699523934 Tree 6 (weight 0.1): If (feature 2 <= 1007.27) If (feature 0 <= 27.94) If (feature 3 <= 71.09) Predict: 1.616448005210527 Else (feature 3 > 71.09) Predict: -2.1313527108274157 Else (feature 0 > 27.94) If (feature 1 <= 68.3) Predict: -8.579840063013142 Else (feature 1 > 68.3) Predict: -1.915909819494233 Else (feature 2 > 1007.27) If (feature 3 <= 95.45) If (feature 0 <= 6.52) Predict: 4.973465595410054 Else (feature 0 > 6.52) Predict: 0.3837975458985242 Else (feature 3 > 95.45) If (feature 2 <= 1013.43) Predict: -0.8175453481344352 Else (feature 2 > 1013.43) Predict: -7.264843604639278 Tree 7 (weight 0.1): If (feature 0 <= 26.35) If (feature 3 <= 71.09) If (feature 1 <= 67.83) Predict: 1.9620965187817083 Else (feature 1 > 67.83) Predict: 7.953863660960779 Else (feature 3 > 71.09) If (feature 1 <= 40.89) Predict: 1.2020440154192213 Else (feature 1 > 40.89) Predict: -0.9989659748111419 Else (feature 0 > 26.35) If (feature 1 <= 66.25) If (feature 2 <= 1008.4) Predict: -6.230272922553423 Else (feature 2 > 1008.4) Predict: -2.654681371247991 Else (feature 1 > 66.25) If (feature 2 <= 1004.52) Predict: -3.9527797601131853 Else (feature 2 > 1004.52) Predict: -0.21770148036273387 Tree 8 (weight 0.1): If (feature 0 <= 29.56) If (feature 3 <= 63.16) If (feature 1 <= 72.24) Predict: 1.9612116105231265 Else (feature 1 > 72.24) Predict: 8.756949826030025 Else (feature 3 > 63.16) If (feature 0 <= 5.95) Predict: 4.445363585074405 Else (feature 0 > 5.95) Predict: -0.4097996897633835 Else (feature 0 > 29.56) If (feature 1 <= 68.3) If (feature 2 <= 1009.9) Predict: -7.882200867406393 Else (feature 2 > 1009.9) Predict: -1.7273221348184091 Else (feature 1 > 68.3) If (feature 2 <= 1013.77) Predict: -0.7219749804525829 Else (feature 2 > 1013.77) Predict: -6.492100849806538 Tree 9 (weight 0.1): If (feature 3 <= 89.83) If (feature 0 <= 25.72) If (feature 0 <= 23.02) Predict: 0.15450088997272685 Else (feature 0 > 23.02) Predict: 3.010254802875794 Else (feature 0 > 25.72) If (feature 1 <= 66.25) Predict: -2.5821765284417615 Else (feature 1 > 66.25) Predict: -0.3935112713804148 Else (feature 3 > 89.83) If (feature 2 <= 1019.52) If (feature 0 <= 7.08) Predict: 3.264389020443774 Else (feature 0 > 7.08) Predict: -1.6246048211383168 Else (feature 2 > 1019.52) If (feature 0 <= 8.75) Predict: -8.005340799169343 Else (feature 0 > 8.75) Predict: -2.9832409167030063 Tree 10 (weight 0.1): If (feature 1 <= 56.57) If (feature 0 <= 17.84) If (feature 1 <= 45.87) Predict: 0.26309432916452813 Else (feature 1 > 45.87) Predict: -5.716473785544373 Else (feature 0 > 17.84) If (feature 2 <= 1012.56) Predict: -0.15863259341493433 Else (feature 2 > 1012.56) Predict: 7.899625065937478 Else (feature 1 > 56.57) If (feature 0 <= 17.84) If (feature 3 <= 67.72) Predict: -27.101084325134025 Else (feature 3 > 67.72) Predict: -12.755339130015875 Else (feature 0 > 17.84) If (feature 0 <= 20.6) Predict: 3.8741798886113408 Else (feature 0 > 20.6) Predict: -0.8179571837367839 Tree 11 (weight 0.1): If (feature 2 <= 1004.52) If (feature 1 <= 74.22) If (feature 1 <= 59.14) Predict: -0.6678644068375302 Else (feature 1 > 59.14) Predict: -5.0251736913870495 Else (feature 1 > 74.22) If (feature 2 <= 1000.68) Predict: -1.9453153753750236 Else (feature 2 > 1000.68) Predict: 3.954565899065237 Else (feature 2 > 1004.52) If (feature 3 <= 60.81) If (feature 0 <= 29.27) Predict: 2.256991118214201 Else (feature 0 > 29.27) Predict: -0.8956432652281918 Else (feature 3 > 60.81) If (feature 0 <= 
5.18) Predict: 5.30208686561611 Else (feature 0 > 5.18) Predict: -0.2275806642044292 Tree 12 (weight 0.1): If (feature 3 <= 93.63) If (feature 0 <= 20.6) If (feature 0 <= 17.84) Predict: -0.13650451477274114 Else (feature 0 > 17.84) Predict: 4.26138638419226 Else (feature 0 > 20.6) If (feature 0 <= 23.02) Predict: -4.145788149131118 Else (feature 0 > 23.02) Predict: 0.45010060784860767 Else (feature 3 > 93.63) If (feature 0 <= 11.95) If (feature 0 <= 6.52) Predict: 1.9630864105825856 Else (feature 0 > 6.52) Predict: -5.847103580294793 Else (feature 0 > 11.95) If (feature 1 <= 57.85) Predict: 1.6850763767018282 Else (feature 1 > 57.85) Predict: -3.57522814358917 Tree 13 (weight 0.1): If (feature 1 <= 56.57) If (feature 1 <= 49.39) If (feature 0 <= 13.56) Predict: 0.7497248523199469 Else (feature 0 > 13.56) Predict: -0.8096048572768345 Else (feature 1 > 49.39) If (feature 0 <= 17.84) Predict: -4.9975868045736025 Else (feature 0 > 17.84) Predict: 6.70181838603398 Else (feature 1 > 56.57) If (feature 0 <= 17.84) If (feature 1 <= 58.62) Predict: -8.139327595464518 Else (feature 1 > 58.62) Predict: -11.260696586956563 Else (feature 0 > 17.84) If (feature 2 <= 1020.32) Predict: -0.4173502593107514 Else (feature 2 > 1020.32) Predict: 7.350524302545053 Tree 14 (weight 0.1): If (feature 2 <= 1009.3) If (feature 1 <= 73.67) If (feature 0 <= 26.35) Predict: -0.2834715308768144 Else (feature 0 > 26.35) Predict: -2.2855655986052446 Else (feature 1 > 73.67) If (feature 0 <= 21.42) Predict: -19.886551554013977 Else (feature 0 > 21.42) Predict: 1.8345107899392203 Else (feature 2 > 1009.3) If (feature 0 <= 17.84) If (feature 1 <= 46.93) Predict: 0.2012146645141011 Else (feature 1 > 46.93) Predict: -5.331252849501989 Else (feature 0 > 17.84) If (feature 0 <= 20.6) Predict: 3.9009310043506518 Else (feature 0 > 20.6) Predict: 0.05492627340134294 Tree 15 (weight 0.1): If (feature 3 <= 80.44) If (feature 0 <= 26.57) If (feature 0 <= 23.02) Predict: 0.24935555983937532 Else (feature 0 > 23.02) Predict: 1.9734839371689987 Else (feature 0 > 26.57) If (feature 1 <= 66.25) Predict: -2.652691255012269 Else (feature 1 > 66.25) Predict: 0.10205623249441657 Else (feature 3 > 80.44) If (feature 1 <= 57.85) If (feature 2 <= 1021.65) Predict: 0.3189331596273633 Else (feature 2 > 1021.65) Predict: -2.493847422724499 Else (feature 1 > 57.85) If (feature 0 <= 23.02) Predict: -4.443277995263894 Else (feature 0 > 23.02) Predict: 1.0414575062489446 Tree 16 (weight 0.1): If (feature 0 <= 6.52) If (feature 3 <= 67.72) If (feature 1 <= 39.48) Predict: -0.021931809818089017 Else (feature 1 > 39.48) Predict: 17.644618798102908 Else (feature 3 > 67.72) If (feature 1 <= 42.07) Predict: 2.6927240976688487 Else (feature 1 > 42.07) Predict: -3.720328734281554 Else (feature 0 > 6.52) If (feature 0 <= 8.75) If (feature 0 <= 7.97) Predict: -1.1870026837027776 Else (feature 0 > 7.97) Predict: -6.311604790035118 Else (feature 0 > 8.75) If (feature 0 <= 9.73) Predict: 5.036277690956247 Else (feature 0 > 9.73) Predict: -0.07156864179175153 Tree 17 (weight 0.1): If (feature 2 <= 1005.35) If (feature 1 <= 70.8) If (feature 0 <= 21.14) Predict: 0.2557898848412102 Else (feature 0 > 21.14) Predict: -4.092246463553751 Else (feature 1 > 70.8) If (feature 0 <= 23.02) Predict: -17.7762740471523 Else (feature 0 > 23.02) Predict: 1.4679036019616782 Else (feature 2 > 1005.35) If (feature 3 <= 60.81) If (feature 2 <= 1021.17) Predict: 0.8109918761137652 Else (feature 2 > 1021.17) Predict: 6.491756407811347 Else (feature 3 > 60.81) If (feature 0 <= 25.72) 
Predict: 0.06495066055048145 Else (feature 0 > 25.72) Predict: -1.234843690619109 Tree 18 (weight 0.1): If (feature 3 <= 93.63) If (feature 0 <= 13.56) If (feature 0 <= 11.95) Predict: -0.1389635939018028 Else (feature 0 > 11.95) Predict: 4.085304226900187 Else (feature 0 > 13.56) If (feature 0 <= 15.33) Predict: -3.558076811842663 Else (feature 0 > 15.33) Predict: 0.24840255719067195 Else (feature 3 > 93.63) If (feature 0 <= 11.95) If (feature 0 <= 6.52) Predict: 1.1725211739721944 Else (feature 0 > 6.52) Predict: -4.696815201291802 Else (feature 0 > 11.95) If (feature 0 <= 23.42) Predict: -0.1435586215485262 Else (feature 0 > 23.42) Predict: 6.017267110381734 Tree 19 (weight 0.1): If (feature 0 <= 29.89) If (feature 3 <= 46.38) If (feature 2 <= 1020.32) Predict: 2.734528637686715 Else (feature 2 > 1020.32) Predict: 14.229272221061546 Else (feature 3 > 46.38) If (feature 1 <= 73.18) Predict: -0.09112932077559661 Else (feature 1 > 73.18) Predict: 2.171636618202333 Else (feature 0 > 29.89) If (feature 1 <= 68.3) If (feature 2 <= 1012.96) Predict: -4.842672386234583 Else (feature 2 > 1012.96) Predict: 0.4656753436410731 Else (feature 1 > 68.3) If (feature 1 <= 69.88) Predict: 1.9998755414672877 Else (feature 1 > 69.88) Predict: -1.377187598546301 Tree 20 (weight 0.1): If (feature 1 <= 40.89) If (feature 0 <= 11.95) If (feature 0 <= 10.74) Predict: 0.3474341793041741 Else (feature 0 > 10.74) Predict: -3.2174625433704844 Else (feature 0 > 11.95) If (feature 0 <= 13.56) Predict: 6.2521753652461385 Else (feature 0 > 13.56) Predict: 0.7467107076401086 Else (feature 1 > 40.89) If (feature 1 <= 41.16) If (feature 2 <= 1011.9) Predict: 1.6159428806525291 Else (feature 2 > 1011.9) Predict: -5.525791920129847 Else (feature 1 > 41.16) If (feature 1 <= 41.48) Predict: 2.3655609293253264 Else (feature 1 > 41.48) Predict: -0.18730957785387015 Tree 21 (weight 0.1): If (feature 0 <= 7.08) If (feature 1 <= 41.58) If (feature 1 <= 41.16) Predict: 1.9153935195932974 Else (feature 1 > 41.16) Predict: 7.0746807427814735 Else (feature 1 > 41.58) If (feature 2 <= 1020.77) Predict: -1.256554177586309 Else (feature 2 > 1020.77) Predict: -26.29941855196938 Else (feature 0 > 7.08) If (feature 0 <= 8.75) If (feature 1 <= 37.8) Predict: -8.544132394601597 Else (feature 1 > 37.8) Predict: -2.6184141709801976 Else (feature 0 > 8.75) If (feature 0 <= 9.73) Predict: 4.069411815161333 Else (feature 0 > 9.73) Predict: -0.06494039395966968 Tree 22 (weight 0.1): If (feature 0 <= 23.02) If (feature 0 <= 21.69) If (feature 0 <= 15.33) Predict: -0.48298234147973435 Else (feature 0 > 15.33) Predict: 1.2747845905419344 Else (feature 0 > 21.69) If (feature 1 <= 66.25) Predict: -3.44223180465188 Else (feature 1 > 66.25) Predict: -9.677838572965495 Else (feature 0 > 23.02) If (feature 0 <= 24.39) If (feature 1 <= 66.25) Predict: 1.4289485230939327 Else (feature 1 > 66.25) Predict: 7.493228657621072 Else (feature 0 > 24.39) If (feature 1 <= 66.25) Predict: -1.55164310941819 Else (feature 1 > 66.25) Predict: 0.5159038364280375 Tree 23 (weight 0.1): If (feature 2 <= 1010.89) If (feature 1 <= 66.93) If (feature 1 <= 43.41) Predict: 0.8366856528539243 Else (feature 1 > 43.41) Predict: -2.146264827541657 Else (feature 1 > 66.93) If (feature 0 <= 23.02) Predict: -4.593173040738928 Else (feature 0 > 23.02) Predict: 0.7595925761507126 Else (feature 2 > 1010.89) If (feature 0 <= 15.33) If (feature 0 <= 14.38) Predict: 0.19019050526253845 Else (feature 0 > 14.38) Predict: -4.931089744789576 Else (feature 0 > 15.33) If (feature 1 <= 56.57) 
Predict: 2.893896440054576 Else (feature 1 > 56.57) Predict: -0.2411893147021192 Tree 24 (weight 0.1): If (feature 2 <= 1004.52) If (feature 1 <= 39.13) If (feature 0 <= 16.56) Predict: 5.674347262101248 Else (feature 0 > 16.56) Predict: -15.35003850200303 Else (feature 1 > 39.13) If (feature 1 <= 70.8) Predict: -2.2136597249782484 Else (feature 1 > 70.8) Predict: 0.4854909471410394 Else (feature 2 > 1004.52) If (feature 0 <= 23.02) If (feature 0 <= 21.14) Predict: 0.25072963079321764 Else (feature 0 > 21.14) Predict: -3.1127381475029745 Else (feature 0 > 23.02) If (feature 0 <= 24.98) Predict: 2.513302584995404 Else (feature 0 > 24.98) Predict: -0.17126775916442186 Tree 25 (weight 0.1): If (feature 3 <= 76.79) If (feature 0 <= 28.75) If (feature 1 <= 66.25) Predict: 0.1271610430935476 Else (feature 1 > 66.25) Predict: 2.4600009065275934 Else (feature 0 > 28.75) If (feature 1 <= 44.58) Predict: -10.925990145829292 Else (feature 1 > 44.58) Predict: -0.7031644656131009 Else (feature 3 > 76.79) If (feature 0 <= 20.9) If (feature 0 <= 17.84) Predict: -0.3807566877980857 Else (feature 0 > 17.84) Predict: 2.329590528017136 Else (feature 0 > 20.9) If (feature 0 <= 23.02) Predict: -3.741947089345415 Else (feature 0 > 23.02) Predict: -0.3619479813878585 Tree 26 (weight 0.1): If (feature 0 <= 5.18) If (feature 1 <= 42.07) If (feature 3 <= 84.36) Predict: 5.869887042156764 Else (feature 3 > 84.36) Predict: 2.3621425360574837 Else (feature 1 > 42.07) If (feature 2 <= 1007.82) Predict: -1.4185266335795177 Else (feature 2 > 1007.82) Predict: -5.383717178467172 Else (feature 0 > 5.18) If (feature 3 <= 53.32) If (feature 2 <= 1021.17) Predict: 0.6349729680247564 Else (feature 2 > 1021.17) Predict: 9.504309080910616 Else (feature 3 > 53.32) If (feature 0 <= 25.95) Predict: 0.010243524812335326 Else (feature 0 > 25.95) Predict: -0.8173343910336555 Tree 27 (weight 0.1): If (feature 2 <= 1028.38) If (feature 1 <= 74.87) If (feature 1 <= 56.57) Predict: 0.28085003688072396 Else (feature 1 > 56.57) Predict: -0.378551674966564 Else (feature 1 > 74.87) If (feature 0 <= 21.42) Predict: -12.321588273833015 Else (feature 0 > 21.42) Predict: 1.8659669412137414 Else (feature 2 > 1028.38) If (feature 3 <= 89.83) If (feature 3 <= 66.27) Predict: -8.252928408643971 Else (feature 3 > 66.27) Predict: -2.023910717088332 Else (feature 3 > 89.83) If (feature 0 <= 8.39) Predict: -11.472893448110653 Else (feature 0 > 8.39) Predict: -8.030312146910243 Tree 28 (weight 0.1): If (feature 3 <= 85.4) If (feature 0 <= 7.55) If (feature 1 <= 40.05) Predict: 0.3456361310433187 Else (feature 1 > 40.05) Predict: 4.958188742864418 Else (feature 0 > 7.55) If (feature 0 <= 8.75) Predict: -3.0608059226719657 Else (feature 0 > 8.75) Predict: 0.16507864507530287 Else (feature 3 > 85.4) If (feature 2 <= 1015.63) If (feature 2 <= 1014.19) Predict: -0.3593841710339432 Else (feature 2 > 1014.19) Predict: 3.2531365191458024 Else (feature 2 > 1015.63) If (feature 1 <= 40.64) Predict: 1.0007657377910708 Else (feature 1 > 40.64) Predict: -2.132339394694771 Tree 29 (weight 0.1): If (feature 0 <= 30.56) If (feature 3 <= 55.74) If (feature 1 <= 72.24) Predict: 0.8569729911086951 Else (feature 1 > 72.24) Predict: 6.358127096088517 Else (feature 3 > 55.74) If (feature 1 <= 41.48) Predict: 0.43148253820326676 Else (feature 1 > 41.48) Predict: -0.24352278568573174 Else (feature 0 > 30.56) If (feature 2 <= 1014.35) If (feature 1 <= 68.3) Predict: -2.5522103291398683 Else (feature 1 > 68.3) Predict: -0.21266182300917044 Else (feature 2 > 1014.35) If (feature 1 
<= 74.87) Predict: -6.498613011225412 Else (feature 1 > 74.87) Predict: 0.9765776955731879 Tree 30 (weight 0.1): If (feature 0 <= 17.84) If (feature 1 <= 45.08) If (feature 0 <= 15.33) Predict: -0.14424299831222268 Else (feature 0 > 15.33) Predict: 1.8754751416891788 Else (feature 1 > 45.08) If (feature 2 <= 1020.77) Predict: -3.097730832691005 Else (feature 2 > 1020.77) Predict: -8.90070153022011 Else (feature 0 > 17.84) If (feature 0 <= 18.71) If (feature 1 <= 49.02) Predict: 1.2726140970398088 Else (feature 1 > 49.02) Predict: 6.649324687634596 Else (feature 0 > 18.71) If (feature 1 <= 46.93) Predict: -2.818245204603037 Else (feature 1 > 46.93) Predict: 0.23586447368304939 Tree 31 (weight 0.1): If (feature 2 <= 1004.52) If (feature 1 <= 59.14) If (feature 1 <= 50.66) Predict: -0.8733348655196066 Else (feature 1 > 50.66) Predict: 7.928862441716025 Else (feature 1 > 59.14) If (feature 1 <= 70.8) Predict: -3.8112988828197807 Else (feature 1 > 70.8) Predict: 0.42812840935226704 Else (feature 2 > 1004.52) If (feature 0 <= 17.84) If (feature 1 <= 46.93) Predict: 0.07282772802501089 Else (feature 1 > 46.93) Predict: -3.3364389464988706 Else (feature 0 > 17.84) If (feature 2 <= 1020.32) Predict: 0.18419167853517965 Else (feature 2 > 1020.32) Predict: 6.584432032190064 Tree 32 (weight 0.1): If (feature 1 <= 56.57) If (feature 1 <= 49.39) If (feature 0 <= 13.56) Predict: 0.36741135502935035 Else (feature 0 > 13.56) Predict: -0.7178818728654812 Else (feature 1 > 49.39) If (feature 0 <= 17.84) Predict: -1.7883686826457996 Else (feature 0 > 17.84) Predict: 4.519745157967235 Else (feature 1 > 56.57) If (feature 0 <= 17.84) If (feature 0 <= 17.5) Predict: -4.182857837547887 Else (feature 0 > 17.5) Predict: -7.917768935292194 Else (feature 0 > 17.84) If (feature 0 <= 19.61) Predict: 2.6880627533068244 Else (feature 0 > 19.61) Predict: -0.2998975340288976 Tree 33 (weight 0.1): If (feature 0 <= 11.95) If (feature 0 <= 11.03) If (feature 3 <= 93.63) Predict: 0.7278554646891878 Else (feature 3 > 93.63) Predict: -2.2492543009893162 Else (feature 0 > 11.03) If (feature 2 <= 1024.3) Predict: -5.536706488618952 Else (feature 2 > 1024.3) Predict: 4.479707018501001 Else (feature 0 > 11.95) If (feature 0 <= 13.08) If (feature 0 <= 12.5) Predict: 5.173128471411881 Else (feature 0 > 12.5) Predict: 2.3834255982190755 Else (feature 0 > 13.08) If (feature 0 <= 15.33) Predict: -1.5022006203890645 Else (feature 0 > 15.33) Predict: 0.15423852245074754 Tree 34 (weight 0.1): If (feature 0 <= 8.75) If (feature 0 <= 7.55) If (feature 3 <= 77.56) Predict: 3.015852739381847 Else (feature 3 > 77.56) Predict: -0.06103236076131486 Else (feature 0 > 7.55) If (feature 3 <= 62.1) Predict: -13.594573386743992 Else (feature 3 > 62.1) Predict: -2.6914920546129273 Else (feature 0 > 8.75) If (feature 0 <= 10.03) If (feature 3 <= 95.45) Predict: 3.213047453934116 Else (feature 3 > 95.45) Predict: -2.3699077010186502 Else (feature 0 > 10.03) If (feature 0 <= 11.95) Predict: -1.841483689919706 Else (feature 0 > 11.95) Predict: 0.1034719724734039 Tree 35 (weight 0.1): If (feature 1 <= 56.57) If (feature 1 <= 49.02) If (feature 1 <= 44.88) Predict: 0.1854471597033813 Else (feature 1 > 44.88) Predict: -1.537157071790549 Else (feature 1 > 49.02) If (feature 2 <= 1009.77) Predict: -0.7176011396833722 Else (feature 2 > 1009.77) Predict: 3.4414962844541495 Else (feature 1 > 56.57) If (feature 1 <= 66.25) If (feature 0 <= 21.92) Predict: 0.6042503983890641 Else (feature 0 > 21.92) Predict: -1.6430682984491796 Else (feature 1 > 66.25) If (feature 0 
<= 23.02) Predict: -3.919778656895867 Else (feature 0 > 23.02) Predict: 0.8520833743461524 Tree 36 (weight 0.1): If (feature 0 <= 27.6) If (feature 0 <= 23.02) If (feature 0 <= 22.1) Predict: 0.08610814822616036 Else (feature 0 > 22.1) Predict: -3.39446668206219 Else (feature 0 > 23.02) If (feature 1 <= 66.25) Predict: -0.25067209339950686 Else (feature 1 > 66.25) Predict: 2.1536703058787143 Else (feature 0 > 27.6) If (feature 3 <= 62.1) If (feature 1 <= 74.87) Predict: -0.3912307208100507 Else (feature 1 > 74.87) Predict: 2.6168301411252224 Else (feature 3 > 62.1) If (feature 1 <= 71.8) Predict: -0.1075335658351684 Else (feature 1 > 71.8) Predict: -3.3756176659678685 Tree 37 (weight 0.1): If (feature 0 <= 25.35) If (feature 0 <= 23.02) If (feature 1 <= 64.84) Predict: 0.07789630965601392 Else (feature 1 > 64.84) Predict: -2.8928836560033093 Else (feature 0 > 23.02) If (feature 1 <= 66.25) Predict: 0.13731068060749954 Else (feature 1 > 66.25) Predict: 4.15851454889221 Else (feature 0 > 25.35) If (feature 1 <= 43.65) If (feature 0 <= 27.19) Predict: -16.475158304770883 Else (feature 0 > 27.19) Predict: -7.947134756554647 Else (feature 1 > 43.65) If (feature 3 <= 62.7) Predict: 0.1725950049938879 Else (feature 3 > 62.7) Predict: -1.0926147971432427 Tree 38 (weight 0.1): If (feature 2 <= 1028.38) If (feature 0 <= 30.56) If (feature 3 <= 47.89) Predict: 1.6647926733523803 Else (feature 3 > 47.89) Predict: 0.019004190066623235 Else (feature 0 > 30.56) If (feature 2 <= 1014.35) Predict: -0.6192794789083232 Else (feature 2 > 1014.35) Predict: -4.385760311827676 Else (feature 2 > 1028.38) If (feature 1 <= 39.48) If (feature 0 <= 6.52) Predict: 4.573467616169609 Else (feature 0 > 6.52) Predict: -1.362091279334777 Else (feature 1 > 39.48) If (feature 0 <= 8.75) Predict: -7.0007999537928605 Else (feature 0 > 8.75) Predict: -1.617908469279585 Tree 39 (weight 0.1): If (feature 2 <= 1017.42) If (feature 2 <= 1014.19) If (feature 1 <= 43.13) Predict: 1.2098492492388833 Else (feature 1 > 43.13) Predict: -0.4345828650352739 Else (feature 2 > 1014.19) If (feature 3 <= 96.38) Predict: 1.0830640036331665 Else (feature 3 > 96.38) Predict: -6.6054777318343785 Else (feature 2 > 1017.42) If (feature 2 <= 1019.23) If (feature 1 <= 57.85) Predict: -0.8212874032064794 Else (feature 1 > 57.85) Predict: -2.6667829000634105 Else (feature 2 > 1019.23) If (feature 0 <= 17.84) Predict: -0.39094381687835245 Else (feature 0 > 17.84) Predict: 3.336117383932137 Tree 40 (weight 0.1): If (feature 3 <= 75.23) If (feature 1 <= 40.05) If (feature 1 <= 39.96) Predict: -1.2851367407493581 Else (feature 1 > 39.96) Predict: -9.117459296991676 Else (feature 1 > 40.05) If (feature 1 <= 40.89) Predict: 4.461974679211411 Else (feature 1 > 40.89) Predict: 0.25422282080546216 Else (feature 3 > 75.23) If (feature 0 <= 21.42) If (feature 0 <= 17.84) Predict: -0.11457026696795661 Else (feature 0 > 17.84) Predict: 0.9995406591682215 Else (feature 0 > 21.42) If (feature 0 <= 23.02) Predict: -2.664637163988949 Else (feature 0 > 23.02) Predict: -0.5023743568762508 Tree 41 (weight 0.1): If (feature 2 <= 1001.9) If (feature 1 <= 39.13) If (feature 3 <= 79.95) Predict: 9.0188365708008 Else (feature 3 > 79.95) Predict: 2.9702965803786205 Else (feature 1 > 39.13) If (feature 3 <= 63.68) Predict: -4.052067945951171 Else (feature 3 > 63.68) Predict: -1.0796516186664176 Else (feature 2 > 1001.9) If (feature 0 <= 15.33) If (feature 0 <= 14.38) Predict: 0.15316006561614587 Else (feature 0 > 14.38) Predict: -3.487291240038168 Else (feature 0 > 15.33) If 
(feature 1 <= 43.13) Predict: 2.5605988792505605 Else (feature 1 > 43.13) Predict: 0.03166127813460667 Tree 42 (weight 0.1): If (feature 0 <= 11.95) If (feature 0 <= 11.42) If (feature 1 <= 38.25) Predict: -2.0532785635493065 Else (feature 1 > 38.25) Predict: 0.4665697970110133 Else (feature 0 > 11.42) If (feature 1 <= 44.2) Predict: -4.178641719198364 Else (feature 1 > 44.2) Predict: -9.84024023297988 Else (feature 0 > 11.95) If (feature 0 <= 13.08) If (feature 1 <= 40.89) Predict: 4.383821312183712 Else (feature 1 > 40.89) Predict: 2.000819554066434 Else (feature 0 > 13.08) If (feature 0 <= 15.33) Predict: -1.0813581518144955 Else (feature 0 > 15.33) Predict: 0.11492139312962121 Tree 43 (weight 0.1): If (feature 0 <= 8.75) If (feature 0 <= 7.97) If (feature 3 <= 86.54) Predict: 0.983392336251922 Else (feature 3 > 86.54) Predict: -0.8690504742953818 Else (feature 0 > 7.97) If (feature 3 <= 62.1) Predict: -20.310342278835464 Else (feature 3 > 62.1) Predict: -2.975869736741497 Else (feature 0 > 8.75) If (feature 0 <= 9.42) If (feature 2 <= 1015.45) Predict: 5.74314556767472 Else (feature 2 > 1015.45) Predict: 2.1033141679659995 Else (feature 0 > 9.42) If (feature 1 <= 40.89) Predict: 0.6933339562649613 Else (feature 1 > 40.89) Predict: -0.10718368674776323 Tree 44 (weight 0.1): If (feature 1 <= 74.87) If (feature 1 <= 71.43) If (feature 1 <= 68.3) Predict: -0.0751396787352361 Else (feature 1 > 68.3) Predict: 1.0387569941322914 Else (feature 1 > 71.43) If (feature 1 <= 72.86) Predict: -2.5461711201599986 Else (feature 1 > 72.86) Predict: -0.0018936704520639966 Else (feature 1 > 74.87) If (feature 1 <= 77.3) If (feature 3 <= 73.33) Predict: 3.4362919081871732 Else (feature 3 > 73.33) Predict: 0.022595797531833054 Else (feature 1 > 77.3) If (feature 2 <= 1012.39) Predict: -2.0026738842740444 Else (feature 2 > 1012.39) Predict: 1.7553499174736846 Tree 45 (weight 0.1): If (feature 2 <= 1005.35) If (feature 1 <= 72.24) If (feature 1 <= 59.14) Predict: 0.030127466104975898 Else (feature 1 > 59.14) Predict: -2.2341894812350676 Else (feature 1 > 72.24) If (feature 3 <= 60.09) Predict: 4.41863108135717 Else (feature 3 > 60.09) Predict: -0.11040726869235623 Else (feature 2 > 1005.35) If (feature 0 <= 31.8) If (feature 1 <= 66.25) Predict: -0.06640264597455495 Else (feature 1 > 66.25) Predict: 0.6711276381424462 Else (feature 0 > 31.8) If (feature 1 <= 62.44) Predict: 18.071299971628946 Else (feature 1 > 62.44) Predict: -1.613111097205577 Tree 46 (weight 0.1): If (feature 0 <= 25.95) If (feature 0 <= 23.02) If (feature 0 <= 22.6) Predict: 0.0037802976144726266 Else (feature 0 > 22.6) Predict: -3.2702083989998565 Else (feature 0 > 23.02) If (feature 1 <= 47.83) Predict: 7.351532379664369 Else (feature 1 > 47.83) Predict: 0.6617643737173495 Else (feature 0 > 25.95) If (feature 3 <= 62.1) If (feature 0 <= 29.89) Predict: 0.7522949567047181 Else (feature 0 > 29.89) Predict: -0.5659530686126862 Else (feature 3 > 62.1) If (feature 1 <= 43.41) Predict: -9.179671352130104 Else (feature 1 > 43.41) Predict: -0.9646184420761758 Tree 47 (weight 0.1): If (feature 0 <= 5.18) If (feature 1 <= 38.62) If (feature 3 <= 77.17) Predict: -4.215696425771664 Else (feature 3 > 77.17) Predict: 5.655069692148392 Else (feature 1 > 38.62) If (feature 1 <= 39.13) Predict: -12.269101167501105 Else (feature 1 > 39.13) Predict: 1.081763483601667 Else (feature 0 > 5.18) If (feature 0 <= 8.75) If (feature 0 <= 7.97) Predict: -0.19756946285599916 Else (feature 0 > 7.97) Predict: -2.7184931590940438 Else (feature 0 > 8.75) If (feature 0 
<= 9.42) Predict: 2.558566383813981 Else (feature 0 > 9.42) Predict: -0.006722635545763743 Tree 48 (weight 0.1): If (feature 2 <= 1028.38) If (feature 2 <= 1010.89) If (feature 1 <= 66.93) Predict: -0.7473456438858288 Else (feature 1 > 66.93) Predict: 0.34762458916260297 Else (feature 2 > 1010.89) If (feature 1 <= 58.86) Predict: 0.4001213596367478 Else (feature 1 > 58.86) Predict: -0.33373941983121597 Else (feature 2 > 1028.38) If (feature 1 <= 42.85) If (feature 1 <= 39.48) Predict: 2.1904388134214514 Else (feature 1 > 39.48) Predict: -3.2474441160938956 Else (feature 1 > 42.85) If (feature 3 <= 71.55) Predict: -1.061140549595708 Else (feature 3 > 71.55) Predict: 6.934556118848832 Tree 49 (weight 0.1): If (feature 0 <= 11.95) If (feature 0 <= 10.74) If (feature 0 <= 8.75) Predict: -0.48190999213172564 Else (feature 0 > 8.75) Predict: 1.0350335598803566 Else (feature 0 > 10.74) If (feature 2 <= 1024.3) Predict: -3.057989388513731 Else (feature 2 > 1024.3) Predict: 2.162024696272738 Else (feature 0 > 11.95) If (feature 0 <= 12.5) If (feature 3 <= 86.91) Predict: 4.627051067913808 Else (feature 3 > 86.91) Predict: 0.9386052167341327 Else (feature 0 > 12.5) If (feature 1 <= 37.8) Predict: 4.0889321278523685 Else (feature 1 > 37.8) Predict: -0.02245818963891235 Tree 50 (weight 0.1): If (feature 2 <= 1017.42) If (feature 2 <= 1014.19) If (feature 1 <= 43.13) Predict: 0.9320375696962719 Else (feature 1 > 43.13) Predict: -0.31844348507047093 Else (feature 2 > 1014.19) If (feature 1 <= 42.42) Predict: -0.5988031510673222 Else (feature 1 > 42.42) Predict: 1.3187243855742212 Else (feature 2 > 1017.42) If (feature 2 <= 1019.23) If (feature 1 <= 44.2) Predict: -2.0646082455368195 Else (feature 1 > 44.2) Predict: -0.4969601265683861 Else (feature 2 > 1019.23) If (feature 0 <= 17.84) Predict: -0.2870181057370213 Else (feature 0 > 17.84) Predict: 2.6148230736448608 Tree 51 (weight 0.1): If (feature 1 <= 38.62) If (feature 0 <= 18.4) If (feature 0 <= 5.18) Predict: 3.850885339006515 Else (feature 0 > 5.18) Predict: -0.940687510645146 Else (feature 0 > 18.4) If (feature 0 <= 18.98) Predict: -10.80330040562501 Else (feature 0 > 18.98) Predict: -18.03404880535599 Else (feature 1 > 38.62) If (feature 2 <= 1026.23) If (feature 0 <= 13.56) Predict: 0.5295719576334972 Else (feature 0 > 13.56) Predict: -0.052812717813551166 Else (feature 2 > 1026.23) If (feature 1 <= 40.22) Predict: -4.371246083031292 Else (feature 1 > 40.22) Predict: -1.3541229527292618 Tree 52 (weight 0.1): If (feature 1 <= 66.25) If (feature 1 <= 64.84) If (feature 3 <= 41.26) Predict: 3.045631536773922 Else (feature 3 > 41.26) Predict: -0.0337837562463145 Else (feature 1 > 64.84) If (feature 1 <= 65.27) Predict: -5.921444872611693 Else (feature 1 > 65.27) Predict: -0.8270282146869598 Else (feature 1 > 66.25) If (feature 0 <= 23.02) If (feature 0 <= 19.83) Predict: 1.5405239234096135 Else (feature 0 > 19.83) Predict: -3.1288830506195398 Else (feature 0 > 23.02) If (feature 0 <= 25.35) Predict: 3.2672442442602656 Else (feature 0 > 25.35) Predict: -0.007592990267182966 Tree 53 (weight 0.1): If (feature 0 <= 17.84) If (feature 1 <= 46.93) If (feature 0 <= 17.2) Predict: 0.1228349542857993 Else (feature 0 > 17.2) Predict: -2.392588492043597 Else (feature 1 > 46.93) If (feature 2 <= 1020.77) Predict: -1.8240349072310669 Else (feature 2 > 1020.77) Predict: -6.523289398433308 Else (feature 0 > 17.84) If (feature 0 <= 18.4) If (feature 1 <= 47.83) Predict: 0.5318997435908227 Else (feature 1 > 47.83) Predict: 4.907584149653537 Else (feature 0 > 18.4) 
If (feature 1 <= 46.93) Predict: -2.110133253015907 Else (feature 1 > 46.93) Predict: 0.20708863671712482 Tree 54 (weight 0.1): If (feature 3 <= 76.79) If (feature 1 <= 40.05) If (feature 1 <= 39.96) Predict: -0.7416033424896232 Else (feature 1 > 39.96) Predict: -6.880323474190146 Else (feature 1 > 40.05) If (feature 1 <= 40.89) Predict: 2.887497917363201 Else (feature 1 > 40.89) Predict: 0.17777582956662522 Else (feature 3 > 76.79) If (feature 0 <= 19.61) If (feature 0 <= 17.84) Predict: -0.09172434324104897 Else (feature 0 > 17.84) Predict: 1.9482862934683598 Else (feature 0 > 19.61) If (feature 2 <= 1010.6) Predict: -0.15262790703036064 Else (feature 2 > 1010.6) Predict: -1.7280878096087295 Tree 55 (weight 0.1): If (feature 0 <= 24.79) If (feature 0 <= 23.02) If (feature 1 <= 66.93) Predict: 0.02682576814507517 Else (feature 1 > 66.93) Predict: -2.323863726560255 Else (feature 0 > 23.02) If (feature 1 <= 47.83) Predict: 6.909290893058579 Else (feature 1 > 47.83) Predict: 0.9944889736997976 Else (feature 0 > 24.79) If (feature 3 <= 65.24) If (feature 0 <= 28.5) Predict: 0.8432916332803679 Else (feature 0 > 28.5) Predict: -0.3680864130080106 Else (feature 3 > 65.24) If (feature 1 <= 66.51) Predict: -2.1147474860288 Else (feature 1 > 66.51) Predict: -0.3834883036951788 Tree 56 (weight 0.1): If (feature 0 <= 15.33) If (feature 0 <= 14.38) If (feature 0 <= 11.95) Predict: -0.3290262091199092 Else (feature 0 > 11.95) Predict: 0.8543511625463592 Else (feature 0 > 14.38) If (feature 2 <= 1016.21) Predict: -0.7208476709379852 Else (feature 2 > 1016.21) Predict: -4.40928839539672 Else (feature 0 > 15.33) If (feature 0 <= 16.22) If (feature 2 <= 1013.19) Predict: 4.554268903891635 Else (feature 2 > 1013.19) Predict: 1.538781048856137 Else (feature 0 > 16.22) If (feature 1 <= 46.93) Predict: -1.1488437756174756 Else (feature 1 > 46.93) Predict: 0.1634274865006602 Tree 57 (weight 0.1): If (feature 2 <= 1007.46) If (feature 1 <= 73.67) If (feature 1 <= 71.43) Predict: -0.28457458674767294 Else (feature 1 > 71.43) Predict: -2.556284198496123 Else (feature 1 > 73.67) If (feature 3 <= 60.81) Predict: 4.31886476056719 Else (feature 3 > 60.81) Predict: 0.3197495651743129 Else (feature 2 > 1007.46) If (feature 0 <= 17.84) If (feature 1 <= 46.93) Predict: 0.04575453109929229 Else (feature 1 > 46.93) Predict: -2.141138284310683 Else (feature 0 > 17.84) If (feature 1 <= 56.57) Predict: 1.3439965861050847 Else (feature 1 > 56.57) Predict: -0.02904919315788331 Tree 58 (weight 0.1): If (feature 0 <= 31.8) If (feature 1 <= 66.25) If (feature 1 <= 64.84) Predict: -0.006836636445003446 Else (feature 1 > 64.84) Predict: -2.0890363043188134 Else (feature 1 > 66.25) If (feature 1 <= 69.05) Predict: 1.8596834938858298 Else (feature 1 > 69.05) Predict: -0.2637818907162569 Else (feature 0 > 31.8) If (feature 1 <= 69.34) If (feature 2 <= 1009.63) Predict: -4.53407923927751 Else (feature 2 > 1009.63) Predict: 1.2479530412848983 Else (feature 1 > 69.34) If (feature 1 <= 69.88) Predict: 5.672382101944148 Else (feature 1 > 69.88) Predict: -0.7728960613425813 Tree 59 (weight 0.1): If (feature 2 <= 1010.89) If (feature 1 <= 68.3) If (feature 1 <= 43.41) Predict: 0.423961936091299 Else (feature 1 > 43.41) Predict: -1.0411314850417004 Else (feature 1 > 68.3) If (feature 1 <= 68.67) Predict: 7.130757445704555 Else (feature 1 > 68.67) Predict: 0.1160942217864609 Else (feature 2 > 1010.89) If (feature 3 <= 93.63) If (feature 1 <= 58.86) Predict: 0.41091291246834866 Else (feature 1 > 58.86) Predict: -0.2764637915143923 Else (feature 
3 > 93.63) If (feature 1 <= 41.74) Predict: -3.564757715833512 Else (feature 1 > 41.74) Predict: 1.1644353912440248 Tree 60 (weight 0.1): If (feature 1 <= 48.6) If (feature 1 <= 44.88) If (feature 2 <= 1016.57) Predict: 0.4410572983039277 Else (feature 2 > 1016.57) Predict: -0.44414793681792664 Else (feature 1 > 44.88) If (feature 2 <= 1014.35) Predict: -3.0626378082153085 Else (feature 2 > 1014.35) Predict: 2.0328536525605063 Else (feature 1 > 48.6) If (feature 1 <= 52.05) If (feature 2 <= 1009.9) Predict: 0.24004783900051171 Else (feature 2 > 1009.9) Predict: 3.1645061792332916 Else (feature 1 > 52.05) If (feature 0 <= 17.84) Predict: -1.95074879327582 Else (feature 0 > 17.84) Predict: 0.021106826304965107 Tree 61 (weight 0.1): If (feature 1 <= 74.87) If (feature 1 <= 71.43) If (feature 1 <= 68.3) Predict: -0.06241270845694165 Else (feature 1 > 68.3) Predict: 0.8051320337219834 Else (feature 1 > 71.43) If (feature 0 <= 24.57) Predict: 1.648459594873699 Else (feature 0 > 24.57) Predict: -1.2314608832462137 Else (feature 1 > 74.87) If (feature 1 <= 77.3) If (feature 0 <= 21.42) Predict: -7.482222216002697 Else (feature 0 > 21.42) Predict: 1.8228183337802573 Else (feature 1 > 77.3) If (feature 2 <= 1012.39) Predict: -1.4326641812285505 Else (feature 2 > 1012.39) Predict: 1.7079353624089986 Tree 62 (weight 0.1): If (feature 0 <= 5.18) If (feature 1 <= 42.07) If (feature 3 <= 96.38) Predict: 1.4583097259406885 Else (feature 3 > 96.38) Predict: 7.4053761713858615 Else (feature 1 > 42.07) If (feature 2 <= 1008.19) Predict: 0.311290850436914 Else (feature 2 > 1008.19) Predict: -5.145119802972147 Else (feature 0 > 5.18) If (feature 1 <= 38.62) If (feature 0 <= 18.4) Predict: -0.7259884411546618 Else (feature 0 > 18.4) Predict: -12.427884135864616 Else (feature 1 > 38.62) If (feature 1 <= 39.48) Predict: 1.131291291234381 Else (feature 1 > 39.48) Predict: -0.007004055574359982 Tree 63 (weight 0.1): If (feature 2 <= 1004.52) If (feature 1 <= 70.8) If (feature 1 <= 69.05) Predict: -0.45566718124370104 Else (feature 1 > 69.05) Predict: -3.3633539333883373 Else (feature 1 > 70.8) If (feature 3 <= 70.63) Predict: 1.7061073842258219 Else (feature 3 > 70.63) Predict: -0.35469491259927843 Else (feature 2 > 1004.52) If (feature 0 <= 15.33) If (feature 0 <= 14.13) Predict: 0.13165022513417465 Else (feature 0 > 14.13) Predict: -1.8886218519887454 Else (feature 0 > 15.33) If (feature 1 <= 43.13) Predict: 2.0897911694212086 Else (feature 1 > 43.13) Predict: 0.023571622513158218 Tree 64 (weight 0.1): If (feature 1 <= 41.92) If (feature 1 <= 41.58) If (feature 2 <= 1015.45) Predict: 0.6420804366913081 Else (feature 2 > 1015.45) Predict: -0.3393001000428116 Else (feature 1 > 41.58) If (feature 3 <= 91.38) Predict: -2.959889489145066 Else (feature 3 > 91.38) Predict: -14.822621379271645 Else (feature 1 > 41.92) If (feature 1 <= 43.13) If (feature 0 <= 15.33) Predict: 0.5584851317693598 Else (feature 0 > 15.33) Predict: 5.35806974907062 Else (feature 1 > 43.13) If (feature 1 <= 43.65) Predict: -2.5734171913252673 Else (feature 1 > 43.65) Predict: 0.06206747847844893 Tree 65 (weight 0.1): If (feature 2 <= 1010.89) If (feature 1 <= 66.93) If (feature 0 <= 20.6) Predict: -0.0679333275254979 Else (feature 0 > 20.6) Predict: -1.053808811058633 Else (feature 1 > 66.93) If (feature 1 <= 67.32) Predict: 7.372080266725638 Else (feature 1 > 67.32) Predict: 0.09996335027123535 Else (feature 2 > 1010.89) If (feature 3 <= 75.61) If (feature 1 <= 40.05) Predict: -0.9831581524231143 Else (feature 1 > 40.05) Predict: 
0.5486160789249349 Else (feature 3 > 75.61) If (feature 1 <= 58.86) Predict: 0.19399224442246701 Else (feature 1 > 58.86) Predict: -1.5652059699408227 Tree 66 (weight 0.1): If (feature 0 <= 28.75) If (feature 1 <= 73.18) If (feature 1 <= 71.43) Predict: 0.05143978594106816 Else (feature 1 > 71.43) Predict: -1.436513600322334 Else (feature 1 > 73.18) If (feature 3 <= 73.33) Predict: 4.1459864582084975 Else (feature 3 > 73.33) Predict: 0.34965185037807356 Else (feature 0 > 28.75) If (feature 2 <= 1014.54) If (feature 2 <= 1013.43) Predict: -0.4008005884834272 Else (feature 2 > 1013.43) Predict: 3.683818693727259 Else (feature 2 > 1014.54) If (feature 1 <= 67.83) Predict: -0.82614879352537 Else (feature 1 > 67.83) Predict: -4.535981326886069 Tree 67 (weight 0.1): If (feature 1 <= 47.83) If (feature 0 <= 23.02) If (feature 0 <= 18.71) Predict: -0.0010074123242523121 Else (feature 0 > 18.71) Predict: -3.2926535011699234 Else (feature 0 > 23.02) If (feature 2 <= 1012.39) Predict: 1.3034696914565052 Else (feature 2 > 1012.39) Predict: 11.235282784300427 Else (feature 1 > 47.83) If (feature 1 <= 56.57) If (feature 0 <= 17.84) Predict: -1.039931035628621 Else (feature 0 > 17.84) Predict: 1.9905896386111916 Else (feature 1 > 56.57) If (feature 1 <= 57.19) Predict: -2.3357601760278204 Else (feature 1 > 57.19) Predict: -0.0355403353056693 Tree 68 (weight 0.1): If (feature 0 <= 24.79) If (feature 3 <= 41.26) If (feature 1 <= 45.87) Predict: 2.4904273637383265 Else (feature 1 > 45.87) Predict: 13.013875696314063 Else (feature 3 > 41.26) If (feature 1 <= 49.02) Predict: -0.18642415027276396 Else (feature 1 > 49.02) Predict: 0.47121076166963227 Else (feature 0 > 24.79) If (feature 1 <= 65.27) If (feature 1 <= 64.84) Predict: -0.5...

Step 8: Deployment

Now that we have a predictive model it is time to deploy the model into an operational environment.

In our example, let's say we have a series of sensors attached to the power plant and a monitoring station.

The monitoring station will need near-real-time information about how much power the plant will generate so that it can relay this to the utility.

So let's create a Spark Streaming utility that we can use for this purpose.

See http://spark.apache.org/docs/latest/streaming-programming-guide.html if you can't wait!

After deployment you will be able to use the predictions from the best gradient boosted regression tree model to feed a real-time dashboard, or to tell the utility how much power the peaker plant will deliver given current conditions.

// Let's set the variable finalModel to our best GBT Model
val finalModel = gbtModel.bestModel
finalModel: org.apache.spark.ml.Model[_] = pipeline_e6a84d2d75ba
//gbtModel.bestModel.asInstanceOf[PipelineModel]//.stages.last.asInstanceOf[GBTRegressionModel]
//        .write.overwrite().save("dbfs:///databricks/driver/MyTrainedGbtPipelineModel")
//val finalModel = PipelineModel.load("dbfs:///databricks/driver/MyTrainedGbtPipelineModel/")
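
If you want to reuse the fitted pipeline later (for example, after a cluster restart), ML persistence can save and reload it, in the spirit of the commented-out lines above. This is a minimal sketch, assuming Spark 2.x ML persistence; the DBFS path is just a placeholder:

import org.apache.spark.ml.PipelineModel

// Persist the fitted pipeline to a placeholder DBFS path
finalModel.asInstanceOf[PipelineModel]
  .write.overwrite()
  .save("dbfs:/tmp/power-plant/MyTrainedGbtPipelineModel")

// Later (even in another notebook), reload it and use it exactly like finalModel
val reloadedModel = PipelineModel.load("dbfs:/tmp/power-plant/MyTrainedGbtPipelineModel")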

Let's create our table for predictions

%sql 
DROP TABLE IF EXISTS power_plant_predictions ;
CREATE TABLE power_plant_predictions(
  AT Double,
  V Double,
  AP Double,
  RH Double,
  PE Double,
  Predicted_PE Double
);
OK

This should be updated to Structured Streaming - after the break. A minimal sketch of what that could look like is given at the end of this section.

Now let's create our streaming job to score new power plant readings in real-time.

CAUTION: There can be only one Spark Streaming context per cluster! So please first check whether a streaming context is already running, for example as in the sketch below.
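
One way to do that check, as a minimal sketch assuming Spark 1.6+ where StreamingContext.getActive is available:

import org.apache.spark.streaming.StreamingContext

// Stop any streaming context that is already running on this cluster,
// but keep the shared SparkContext alive
StreamingContext.getActive().foreach { active =>
  println(s"Stopping already-active StreamingContext: $active")
  active.stop(stopSparkContext = false)
}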

import java.nio.ByteBuffer
import java.net._
import java.io._
import concurrent._
import scala.io._
import sys.process._
//import org.apache.spark.Logging
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.Seconds
import org.apache.spark.streaming.Minutes
import org.apache.spark.streaming.StreamingContext
//import org.apache.spark.streaming.StreamingContext.toPairDStreamFunctions
import org.apache.log4j.Logger
import org.apache.log4j.Level
import org.apache.spark.streaming.receiver.Receiver
import sqlContext._
import net.liftweb.json.DefaultFormats
import net.liftweb.json._

import scala.collection.mutable.SynchronizedQueue // deprecated in recent Scala versions (see the warning in the output below), but fine for this demo


val queue = new SynchronizedQueue[RDD[String]]()

val batchIntervalSeconds = 2

var newContextCreated = false      // Flag to detect whether new context was created or not

// Function to create a new StreamingContext and set it up
def creatingFunc(): StreamingContext = {
    
  // Create a StreamingContext with the batch interval defined above (2 seconds)
  val ssc = new StreamingContext(sc, Seconds(batchIntervalSeconds))
  // Keep the RDDs of the last 5 minutes around so recent micro-batches can still be inspected
  ssc.remember(Seconds(300))
  val dstream = ssc.queueStream(queue)
  dstream.foreachRDD { 
    rdd =>
      // if the RDD has data
       if(!(rdd.isEmpty())) {
          // Use the final model to transform a JSON message into a dataframe and pass the dataframe to our model's transform method
           finalModel
             .transform(read.json(rdd.toDS).toDF())
         // Select only columns we are interested in
         .select("AT", "V", "AP", "RH", "PE", "Predicted_PE")
         // Append the results to our power_plant_predictions table
         .write.mode(SaveMode.Append).format("hive").saveAsTable("power_plant_predictions")
       } 
  }
  println("Creating function called to create new StreamingContext for Power Plant Predictions")
  newContextCreated = true  
  ssc
}

val ssc = StreamingContext.getActiveOrCreate(creatingFunc)
if (newContextCreated) {
  println("New context created from currently defined creating function") 
} else {
  println("Existing context running or recovered from checkpoint, may not be running currently defined creating function")
}

ssc.start() // start the streaming job; stop it with ssc.stop(stopSparkContext = false) when done
Creating function called to create new StreamingContext for Power Plant Predictions New context created from currently defined creating function <console>:303: warning: class SynchronizedQueue in package mutable is deprecated: Synchronization via selective overriding of methods is inherently unreliable. Consider java.util.concurrent.ConcurrentLinkedQueue as an alternative. val queue = new SynchronizedQueue[RDD[String]]() ^ import java.nio.ByteBuffer import java.net._ import java.io._ import concurrent._ import scala.io._ import sys.process._ import org.apache.spark.SparkConf import org.apache.spark.storage.StorageLevel import org.apache.spark.streaming.Seconds import org.apache.spark.streaming.Minutes import org.apache.spark.streaming.StreamingContext import org.apache.log4j.Logger import org.apache.log4j.Level import org.apache.spark.streaming.receiver.Receiver import sqlContext._ import net.liftweb.json.DefaultFormats import net.liftweb.json._ import scala.collection.mutable.SynchronizedQueue queue: scala.collection.mutable.SynchronizedQueue[org.apache.spark.rdd.RDD[String]] = SynchronizedQueue() batchIntervalSeconds: Int = 2 newContextCreated: Boolean = true creatingFunc: ()org.apache.spark.streaming.StreamingContext ssc: org.apache.spark.streaming.StreamingContext = org.apache.spark.streaming.StreamingContext@4a48de26

Now that we have created and started our streaming job, let's test it with some data. First, we clear the predictions table.

%sql truncate table power_plant_predictions
OK

Let's send some data through the stream and see how much power output our model predicts.

// First we try it with a record from our test set and see what we get:
queue += sc.makeRDD(Seq(s"""{"AT":10.82,"V":37.5,"AP":1009.23,"RH":96.62,"PE":473.9}"""))

// We may need to wait a few seconds for data to appear in the table
Thread.sleep(Seconds(5).milliseconds)
%sql 
--and we can query our predictions table
select * from power_plant_predictions
AT    | V    | AP      | RH    | PE    | Predicted_PE
10.82 | 37.5 | 1009.23 | 96.62 | 473.9 | 472.659932584668

Let's repeat with a different test measurement that our model has not seen before:

// Push a synthetic reading our model has not seen (PE is set to 0.0 since the true output is unknown)
queue += sc.makeRDD(Seq(s"""{"AT":10.0,"V":40,"AP":1000,"RH":90.0,"PE":0.0}"""))
// Again, wait a few seconds for the micro-batch to be scored and appended to the table
Thread.sleep(Seconds(5).milliseconds)
%sql 
--Note you may have to run this a couple of times to see the refreshed data...
select * from power_plant_predictions
AT    | V    | AP      | RH    | PE    | Predicted_PE
10    | 40   | 1000    | 90    | 0     | 474.5912134899266
10.82 | 37.5 | 1009.23 | 96.62 | 473.9 | 472.659932584668

As you can see, the predictions are very close to the real data points.

%sql 
select * from power_plant_table where AT between 10 and 11 and AP between 1000 and 1010 and RH between 90 and 97 and v between 37 and 40 order by PE 
AT    | V     | AP      | RH    | PE
10.37 | 37.83 | 1006.5  | 90.99 | 470.66
10.22 | 37.83 | 1005.94 | 93.53 | 471.79
10.66 | 37.5  | 1009.42 | 95.86 | 472.86
10.82 | 37.5  | 1009.23 | 96.62 | 473.9
10.48 | 37.5  | 1009.81 | 95.26 | 474.57
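
For a quick numeric check on the scored records themselves, something along these lines also works. This is a sketch assuming a Spark 2.x spark session, and it filters out the synthetic record for which we sent PE = 0.0:

import org.apache.spark.sql.functions._

// Absolute error of each scored reading that has a known true PE
spark.table("power_plant_predictions")
  .where(col("PE") > 0)
  .withColumn("Abs_Error", abs(col("PE") - col("Predicted_PE")))
  .show()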

Now you can use the predictions table to feed a real-time dashboard, or to tell the utility how much power the peaker plant will deliver.

Make sure the streaming context is stopped when you are done, as there can be only one such context per cluster!

ssc.stop(stopSparkContext = false) // got to stop the streaming context or it will keep running!
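
As noted earlier, this DStream-based job could be rewritten with Structured Streaming. Below is a minimal sketch, assuming Spark 2.x or later, a hypothetical DBFS landing directory for incoming JSON readings, and hypothetical output and checkpoint paths (none of these paths exist in this notebook):

import org.apache.spark.sql.types._

// Schema of the incoming JSON sensor readings
val readingsSchema = new StructType()
  .add("AT", DoubleType).add("V", DoubleType)
  .add("AP", DoubleType).add("RH", DoubleType).add("PE", DoubleType)

// Read a stream of JSON files dropped into a (hypothetical) landing directory
val readings = spark.readStream
  .schema(readingsSchema)
  .json("dbfs:/tmp/power-plant/landing/")

// Score each micro-batch with the fitted pipeline and keep only the columns we care about
val scored = finalModel
  .transform(readings)
  .select("AT", "V", "AP", "RH", "PE", "Predicted_PE")

// Continuously append the scored readings as Parquet files (hypothetical paths)
val query = scored.writeStream
  .outputMode("append")
  .format("parquet")
  .option("path", "dbfs:/tmp/power-plant/predictions/")
  .option("checkpointLocation", "dbfs:/tmp/power-plant/_checkpoints/")
  .start()

// Stop the query when you are done:
// query.stop()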

Datasource References:

  • Pinar Tüfekci, Prediction of full load electrical power output of a base load operated combined cycle power plant using machine learning methods, International Journal of Electrical Power & Energy Systems, Volume 60, September 2014, Pages 126-140, ISSN 0142-0615, Web Link
  • Heysem Kaya, Pinar Tüfekci, Sadik Fikret Gürgen, Local and Global Learning Methods for Predicting Power of a Combined Gas & Steam Turbine, Proceedings of the International Conference on Emerging Trends in Computer and Electronics Engineering (ICETCEE 2012), pp. 13-18 (March 2012, Dubai), Web Link