ScaDaMaLe Course site and book

Distributed Deep Learning

CNNs with Horovod, MLflow and hyperparameter tuning through SparkTrials

William Anzén (LinkedIn), Christian von Koch (LinkedIn)

2021, Stockholm, Sweden

This project was supported by Combient Mix AB through a Master's Thesis project at ISY, Computer Vision Laboratory, Linköping University.

**Resources:**

These notebooks were inspired by TensorFlow's tutorial on Image Segmentation.

01ImageSegmentationUNet

In this chapter, a simple U-Net architecture is implemented and evaluated on the Oxford Pets dataset. The model achieves a validation accuracy of 88.6% and a validation loss of 0.655 after 20 epochs (11.74 min).
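
For illustration, a minimal U-Net-style encoder-decoder in tf.keras might look like the sketch below. This is not the notebook's exact code: the depth, filter counts and input size are assumptions; the three output classes correspond to the dataset's trimap masks.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_unet(input_shape=(128, 128, 3), num_classes=3):
    """Minimal U-Net: two downsampling blocks, a bottleneck, and two
    upsampling blocks with skip connections. Sizes are illustrative."""
    inputs = tf.keras.Input(shape=input_shape)

    # Encoder: convolve, then halve the spatial resolution.
    c1 = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
    p1 = layers.MaxPooling2D()(c1)   # 128 -> 64
    c2 = layers.Conv2D(64, 3, padding="same", activation="relu")(p1)
    p2 = layers.MaxPooling2D()(c2)   # 64 -> 32

    # Bottleneck at the coarsest resolution.
    b = layers.Conv2D(128, 3, padding="same", activation="relu")(p2)

    # Decoder: upsample and concatenate the matching encoder features.
    u1 = layers.Conv2DTranspose(64, 3, strides=2, padding="same")(b)
    c3 = layers.Conv2D(64, 3, padding="same", activation="relu")(
        layers.concatenate([u1, c2]))
    u2 = layers.Conv2DTranspose(32, 3, strides=2, padding="same")(c3)
    c4 = layers.Conv2D(32, 3, padding="same", activation="relu")(
        layers.concatenate([u2, c1]))

    # Per-pixel softmax over the segmentation classes.
    outputs = layers.Conv2D(num_classes, 1, activation="softmax")(c4)
    return tf.keras.Model(inputs, outputs)

model = build_unet()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```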

02ImageSegmenationPSPNet

In this chapter, a PSPNet architecture is implemented and evaluated on the Oxford Pets dataset. The model achieves a validation accuracy of 89.8% and a validation loss of 0.332 after 20 epochs (14.25 min).
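
PSPNet's defining component is its pyramid pooling module, which average-pools a backbone feature map at several grid scales, projects each scale with a 1x1 convolution, upsamples back to the original size, and concatenates the results. A minimal sketch, assuming illustrative bin sizes and filter counts, and spatial dimensions divisible by every bin size:

```python
import tensorflow as tf
from tensorflow.keras import layers

def pyramid_pooling_module(features, bin_sizes=(1, 2, 3, 6), filters=64):
    """Pool the feature map into bin_size x bin_size grids, project each
    with a 1x1 conv, upsample back, and concatenate with the input.
    Assumes height and width are divisible by every bin size."""
    h, w = features.shape[1], features.shape[2]
    branches = [features]
    for bins in bin_sizes:
        x = layers.AveragePooling2D(pool_size=(h // bins, w // bins))(features)
        x = layers.Conv2D(filters, 1, activation="relu")(x)
        x = layers.UpSampling2D(size=(h // bins, w // bins),
                                interpolation="bilinear")(x)
        branches.append(x)
    return layers.concatenate(branches)

# Example: apply the module to a hypothetical 36x36 backbone feature map.
inputs = tf.keras.Input(shape=(36, 36, 256))
outputs = pyramid_pooling_module(inputs)
model = tf.keras.Model(inputs, outputs)
```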

03ICNetFunction

In this chapter, the ICNet architecture is implemented and evaluated on the Oxford Pets dataset. MLflow is added to keep track of results and parameters. The model achieves a validation accuracy of 86.1% and a validation loss of 0.363 after 19/20 epochs (6.8 min).
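
A minimal sketch of the MLflow tracking pattern used for this kind of run; the run name, parameter names and metric values below are placeholders, not the notebook's actual values:

```python
import mlflow

# Hypothetical training configuration; names and values are placeholders.
params = {"batch_size": 32, "learning_rate": 1e-3, "epochs": 20}

with mlflow.start_run(run_name="icnet-oxford-pets"):
    mlflow.log_params(params)
    # ... training loop would go here ...
    # Log per-epoch metrics so runs can be compared in the MLflow UI.
    for epoch, (val_loss, val_acc) in enumerate([(0.50, 0.80), (0.40, 0.85)]):
        mlflow.log_metric("val_loss", val_loss, step=epoch)
        mlflow.log_metric("val_accuracy", val_acc, step=epoch)
```

For TensorFlow models, `mlflow.tensorflow.autolog()` can record much of this automatically.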

04ICNetFunction_hvd

In this chapter, we add Horovod to the notebook, allowing distributed training of the model. MLflow is again integrated to keep track of results and parameters. The model achieves a validation accuracy of 84.4% and a validation loss of 0.454 after 16/20 epochs (13.19 min with 2 workers). Two workers led to a slower run because the communication overhead outweighed the computational gain.
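
The standard Horovod pattern for tf.keras training is: initialize Horovod, pin each process to one GPU, scale the learning rate by the number of workers, wrap the optimizer, and broadcast the initial weights from rank 0. A minimal sketch, where `build_model` and `make_dataset` are assumed helpers standing in for the ICNet model and input pipeline:

```python
import tensorflow as tf
import horovod.tensorflow.keras as hvd

def train_hvd(learning_rate=1e-3):
    hvd.init()  # one process per worker GPU

    # Pin this process to a single GPU, if any are visible.
    gpus = tf.config.list_physical_devices("GPU")
    if gpus:
        tf.config.set_visible_devices(gpus[hvd.local_rank()], "GPU")

    model = build_model()     # assumed helper returning the ICNet model
    dataset = make_dataset()  # assumed helper; shard per rank in practice

    # Scale the learning rate by the worker count and wrap the optimizer
    # so gradients are averaged across workers on every step.
    opt = hvd.DistributedOptimizer(
        tf.keras.optimizers.Adam(learning_rate * hvd.size()))
    model.compile(optimizer=opt,
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    callbacks = [
        # Start all workers from identical weights (broadcast from rank 0).
        hvd.callbacks.BroadcastGlobalVariablesCallback(0),
    ]
    # Only rank 0 checkpoints, so workers do not overwrite each other.
    if hvd.rank() == 0:
        callbacks.append(tf.keras.callbacks.ModelCheckpoint("checkpoint.h5"))

    model.fit(dataset, epochs=20, callbacks=callbacks,
              verbose=2 if hvd.rank() == 0 else 0)
```

On Databricks, such a function is typically launched across the cluster with `HorovodRunner`, e.g. `HorovodRunner(np=2).run(train_hvd)`.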

05ICNetFunctionTuningparallel

In this chapter, we run hyperparameter tuning with Hyperopt and SparkTrials, allowing the tuning runs to execute in parallel across multiple workers. MLflow is added to keep track of the outcomes of the parallel tuning runs. The best run achieved a loss of 0.43 with parameters {'batchsize': 32, 'learningrate': 0.007874409614279713}.
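
A minimal sketch of the Hyperopt + SparkTrials pattern; the search-space ranges and the `train_and_evaluate` helper are assumptions, with only the two parameter names taken from the result above:

```python
import mlflow
from hyperopt import SparkTrials, fmin, hp, tpe

# Search space over the two tuned parameters; these ranges are
# illustrative, not the notebook's exact ones.
space = {
    "batchsize": hp.choice("batchsize", [16, 32, 64]),
    "learningrate": hp.loguniform("learningrate", -7, -3),
}

def objective(params):
    # train_and_evaluate is an assumed helper that trains the model with
    # the given hyperparameters and returns the validation loss.
    return train_and_evaluate(batch_size=params["batchsize"],
                              learning_rate=params["learningrate"])

# SparkTrials evaluates trials in parallel on the cluster's workers;
# on Databricks each trial is also logged to MLflow as a nested run.
trials = SparkTrials(parallelism=2)
with mlflow.start_run():
    best = fmin(fn=objective, space=space, algo=tpe.suggest,
                max_evals=16, trials=trials)
print(best)
```

Note that for `hp.choice` dimensions, `fmin` returns the index of the chosen option rather than the value itself.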