Simplify Distributed TensorFlow Training for Fast Image Categorization at Starbucks
1. Simplifying Deep Learning with HorovodRunner at Starbucks
STARBUCKS TECHNOLOGY
2. About the presenters
Denny Lee is a Technology Evangelist with Databricks. He is a hands-on data science engineer with more than 15 years of experience developing internet-scale infrastructure, data platforms, and distributed systems, both on-premises and in the cloud. His key focus is solving complex, large-scale data problems: providing not only architectural direction but also the hands-on implementation of these systems.
Vishwanath Subramanian is a Director of Data and Analytics Engineering at Starbucks. Vishwanath has over 15 years of experience, with a background in distributed systems, product management, software engineering, and analytics. At Starbucks, his key focus is on providing next-generation analytics platforms and enabling large-scale data processing and machine learning to deliver Business Intelligence and Data Services across Starbucks.
3. Scenarios
• Smarter checkout experiences
• Predicting customer traffic
• Planogram analysis
• And more…
• On-demand, one-click provisioning of a seamlessly integrated infrastructure bill of materials for data science and intelligent apps
• Secured connectivity to the enterprise data platform, completely abstracted from analytics teams
• A solution template that organizes deployments to enable ad hoc experiments, shared data engineering, and intelligent app development
4. Current State
• Solving complex or streaming image and video analytics problems is hard
• It also typically involves distributing the problem across multiple nodes
• But how do I run Keras + TensorFlow training in a distributed environment?
5. Convolutional Neural Networks
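6. Convolutional Neural Networks
[Diagram: a 28 x 28 input passes through two convolution layers (32 and 64 filters, 28 x 28 maps) and 2 x 2 subsampling with stride (2,2) to 14 x 14 maps (feature extraction), then dropout and fully connected layers output the digit classes 0 to 9 (classification).]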
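The shapes in the diagram can be checked with a little arithmetic. This sketch assumes 3 x 3 kernels and Keras-style `padding="same"` for both convolutions (the slide shows neither) and a 128-unit dense layer, as in the `runCNN` code later in the deck; `out_size`, `conv_params`, `dense_params`, and `summarize` are hypothetical helper names, and dropout is omitted because it has no parameters.

```python
import math

def out_size(size, kernel, stride=1, padding="valid"):
    """Spatial output size of a conv/pooling layer under Keras padding rules."""
    if padding == "same":
        return math.ceil(size / stride)
    return (size - kernel) // stride + 1

def conv_params(kernel, in_ch, filters):
    """Weights (kernel x kernel x in_ch per filter) plus one bias per filter."""
    return kernel * kernel * in_ch * filters + filters

def dense_params(n_in, n_out):
    """Weight matrix plus bias vector."""
    return n_in * n_out + n_out

def summarize(input_size=28, kernel=3, hidden=128, classes=10):
    rows = []
    s = out_size(input_size, kernel, padding="same")   # convolution, 32 filters
    rows.append(("conv_32", (s, s, 32), conv_params(kernel, 1, 32)))
    s = out_size(s, kernel, padding="same")            # convolution, 64 filters
    rows.append(("conv_64", (s, s, 64), conv_params(kernel, 32, 64)))
    s = out_size(s, 2, stride=2)                       # 2 x 2 subsampling, stride (2,2)
    rows.append(("pool", (s, s, 64), 0))
    flat = s * s * 64                                  # flatten before the dense layers
    rows.append(("dense", (hidden,), dense_params(flat, hidden)))
    rows.append(("softmax", (classes,), dense_params(hidden, classes)))
    return rows

for name, shape, params in summarize():
    print(f"{name:8s} {str(shape):15s} {params:>9,d}")
```

With `padding="valid"` instead, the two convolutions would shrink the maps to 26 x 26 and 24 x 24 and pooling would give 12 x 12, so the 28 x 28 and 14 x 14 labels in the diagram are consistent with "same" padding.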
7. DEMO: Running Keras CNNs Standalone
Keras, TensorFlow, HorovodRunner, and MLflow: https://dbricks.co/2D58PDw
8. Introducing HorovodRunner
• HorovodRunner is a general API to run distributed deep learning workloads on Databricks using Uber's Horovod framework
• Combining Horovod with Apache Spark's barrier mode supports longer-running deep learning training jobs
• A Horovod MPI job is embedded as a Spark job using barrier execution mode
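Barrier execution mode gang-schedules all tasks of a stage and lets them synchronize; in PySpark it is exposed through `rdd.barrier().mapPartitions(...)`, with `BarrierTaskContext.get().barrier()` called inside each task. The sketch below is only an analogy that runs without Spark: `threading.Barrier` stands in for the barrier call and `run_barrier_stage` is a hypothetical helper. It does not model Spark's gang scheduling or its restart of the whole stage when any task fails.

```python
import threading

def run_barrier_stage(num_tasks):
    """Run num_tasks simulated tasks that synchronize at a single barrier."""
    barrier = threading.Barrier(num_tasks)  # stands in for BarrierTaskContext.barrier()
    events = []                              # ordered log of (event, task_id)
    lock = threading.Lock()

    def task(task_id):
        with lock:
            events.append(("arrive", task_id))
        barrier.wait()                       # no task passes until all have arrived
        with lock:
            events.append(("proceed", task_id))

    threads = [threading.Thread(target=task, args=(i,)) for i in range(num_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events

events = run_barrier_stage(4)
last_arrival = max(i for i, (kind, _) in enumerate(events) if kind == "arrive")
first_proceed = min(i for i, (kind, _) in enumerate(events) if kind == "proceed")
print(last_arrival < first_proceed)
```

Because every task blocks at the barrier, all "arrive" entries precede every "proceed" entry, which is the property a Horovod job needs before `mpirun` can start one process per task.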
9. HorovodRunner
• HorovodRunner takes a Python method that contains deep learning training code with Horovod hooks
• The first executor collects the IP addresses of all the task executors using BarrierTaskContext
• It then triggers a Horovod job using mpirun
• Each Python MPI process loads the pickled program back, deserializes it, and runs it
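The launch sequence above can be sketched in plain Python under loud assumptions: the task addresses are made up, `build_mpirun_command` and `train` are hypothetical names, the `mpirun -np N -H host:slots` form follows Open MPI's command-line syntax rather than HorovodRunner's actual invocation (which passes more flags), and stdlib `pickle` stands in for whatever serializer HorovodRunner uses internally.

```python
import pickle
from collections import Counter

def build_mpirun_command(task_addresses, script="train.py"):
    """Build an Open MPI command line with one slot per barrier task.

    task_addresses: "host:port" strings, like those a barrier stage can report.
    """
    slots = Counter(addr.split(":")[0] for addr in task_addresses)  # tasks per host
    hosts = ",".join(f"{host}:{count}" for host, count in slots.items())
    return ["mpirun", "-np", str(len(task_addresses)), "-H", hosts, "python", script]

def train(epochs):
    """Hypothetical stand-in for the user's Horovod training method."""
    return f"trained for {epochs} epochs"

# Driver side: serialize the training method together with its arguments.
payload = pickle.dumps((train, {"epochs": 3}))

# Worker side (inside each MPI process): deserialize and run.
fn, kwargs = pickle.loads(payload)
print(fn(**kwargs))
print(build_mpirun_command(["10.0.0.1:40001", "10.0.0.1:40002", "10.0.0.2:40001"]))
```

Stdlib `pickle` stores a module-level function by reference (module and name), so a worker can only unpickle it if it can import the same code; serializers such as cloudpickle can capture functions by value, which matters for code defined interactively in a notebook.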
10. HorovodRunner
[Diagram: Spark driver and worker nodes]
11. HorovodRunner
[Diagram: driver and workers]
    def runCNN():
        model.add(Conv2D(32, …))
        model.add(Conv2D(64, …))
        model.add(MaxPooling2D(…))
        model.add(Dense(128, …))
        model.add(Dense(10, activation='softmax'))
        optimizer = keras.optimizers.Adadelta(1.0)
In standalone or hvd local mode, the code runs on the driver.
12. HorovodRunner
[Diagram: code and variables on the driver, pushed to the workers]
    def runCNN_hvd():
        hvd.init()
        config = tf.ConfigProto()
        # Original code
        runCNN()
        callbacks = []
With HorovodRunner, we wrap the original code, and the code and variables are pushed to the workers.
16. HorovodRunner
[Diagram: driver and workers]
Variables are transferred from the driver to the workers, and the code is executed on the workers.
17. Migrate to HorovodRunner
    # Primary code differences are noted below
    + hvd.init()
    + config = tf.ConfigProto()
    + config.gpu_options.allow_growth = True
    + config.gpu_options.visible_device_list = str(hvd.local_rank())
    + epochs = int(math.ceil(12.0 / hvd.size()))
    + callbacks = [
    +     hvd.callbacks.BroadcastGlobalVariablesCallback(0),
    + ]
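The migration diff gives every Horovod process its own settings: `hvd.size()` is the total number of processes, `hvd.local_rank()` is a process's index on its own node, and rank 0 is the broadcast root. The pure-Python model below, with hypothetical helper and host names, shows what each process would compute for `visible_device_list`, `epochs`, and the broadcast root, assuming ranks are assigned by filling each host's slots in order (the real assignment depends on how `mpirun` maps processes).

```python
import math

def rank_layout(host_slots):
    """Assign (rank, local_rank) to each process, filling each host's slots in order.

    host_slots: list of (host, slots) pairs, e.g. [("worker-1", 2), ("worker-2", 2)].
    """
    layout, rank = [], 0
    for host, slots in host_slots:
        for local_rank in range(slots):
            layout.append({"host": host, "rank": rank, "local_rank": local_rank})
            rank += 1
    return layout

def per_process_settings(host_slots, total_epochs=12.0):
    layout = rank_layout(host_slots)
    size = len(layout)                                   # plays the role of hvd.size()
    epochs = int(math.ceil(total_epochs / size))         # same formula as the diff
    for proc in layout:
        proc["visible_device_list"] = str(proc["local_rank"])  # one GPU per process
        proc["epochs"] = epochs
        proc["is_broadcast_root"] = proc["rank"] == 0    # BroadcastGlobalVariablesCallback(0)
    return layout

for proc in per_process_settings([("worker-1", 2), ("worker-2", 2)]):
    print(proc)
```

With 1, 2, 4, or 8 processes the formula gives 12, 6, 3, or 2 epochs per process; at 8 processes the rounding up makes the combined epoch count 16 rather than 12.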
18. Comparing the runs using MLflow
19. DEMO: Object Detection
Keras, TensorFlow, HorovodRunner, and MLflow
20. Object Detection Approaches
R-CNN (2014)
• Region proposal algorithms give you a set of regions in the image that are likely to contain objects.
• The image region inside each bounding box is run through a pre-trained AlexNet to compute features for that box.
• A support vector machine classifies which object the region contains.
• The box is then run through a linear regression model to output tighter box coordinates.
• R-CNN -> Fast R-CNN -> Faster R-CNN
Rich feature hierarchies for accurate object detection and semantic segmentation - Girshick, Donahue, Darrell, Malik
Fast R-CNN - Girshick
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks - Ren, He, Girshick, Sun
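The "tighter coordinates" step is a regression on box offsets. As a sketch, assuming boxes are given as (center x, center y, width, height), the parameterization from the R-CNN paper regresses center offsets scaled by the proposal's size plus log-space width and height ratios; `encode` and `decode` are hypothetical helper names and the example boxes are made up.

```python
import math

def encode(proposal, target):
    """Regression targets that move a proposal box onto a ground-truth box.

    Boxes are (center_x, center_y, width, height).
    """
    px, py, pw, ph = proposal
    gx, gy, gw, gh = target
    return ((gx - px) / pw,          # x offset, scaled by proposal width
            (gy - py) / ph,          # y offset, scaled by proposal height
            math.log(gw / pw),       # log width ratio
            math.log(gh / ph))       # log height ratio

def decode(proposal, deltas):
    """Apply predicted deltas to a proposal to get the refined box."""
    px, py, pw, ph = proposal
    dx, dy, dw, dh = deltas
    return (px + pw * dx, py + ph * dy, pw * math.exp(dw), ph * math.exp(dh))

proposal = (50.0, 50.0, 40.0, 20.0)
truth = (54.0, 48.0, 32.0, 24.0)
deltas = encode(proposal, truth)
print([round(v, 4) for v in deltas])
print([round(v, 4) for v in decode(proposal, deltas)])  # recovers the ground-truth box
```

Scaling the offsets by the proposal size and using log ratios keeps the targets comparable across small and large boxes, which is why a single linear regressor per class can work.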
21. Object Detection Approaches (contd.)
• YOLO: detection as a regression problem
• Not a traditional classifier
• Divide the image into a grid; each cell is responsible for predicting n bounding boxes
• Output a confidence score for each predicted bounding box
• Give a probability distribution over all the classes it is trained on
• The confidence score and class prediction are combined into a score for object classification
• Based on a threshold, we determine the relevant boxes
• The whole image is fed to the neural network at once, so all boxes are predicted in a single pass
You Only Look Once: Unified, Real-Time Object Detection - Redmon, Divvala, Girshick, Farhadi
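The grid and scoring steps can be sketched as follows. This is a simplified illustration, not the YOLO implementation: `responsible_cell`, `class_scores`, and `keep_boxes` are hypothetical helpers, the grid size S = 7 matches the YOLO paper's PASCAL VOC setup, the 0.2 threshold is an arbitrary choice, and the non-maximum suppression YOLO applies afterward is omitted.

```python
def responsible_cell(center_x, center_y, img_w, img_h, s=7):
    """Grid cell (row, col) containing the object's center; that cell predicts it."""
    col = min(int(center_x / img_w * s), s - 1)
    row = min(int(center_y / img_h * s), s - 1)
    return row, col

def class_scores(box_confidence, class_probs):
    """Class-specific score: box confidence times each conditional class probability."""
    return [box_confidence * p for p in class_probs]

def keep_boxes(boxes, threshold=0.2):
    """Keep (box_id, class_index, score) for boxes whose best class score clears the threshold.

    boxes: list of (box_id, box_confidence, class_probs).
    """
    kept = []
    for box_id, confidence, probs in boxes:
        scores = class_scores(confidence, probs)
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] >= threshold:
            kept.append((box_id, best, scores[best]))
    return kept

print(responsible_cell(224, 100, 448, 448))
print(keep_boxes([("a", 0.9, [0.1, 0.8, 0.1]), ("b", 0.3, [0.5, 0.3, 0.2])]))
```

Box "a" is kept for class 1 with a score of about 0.72, while box "b" scores at most 0.15 and falls below the threshold.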
22. Talented technologists delivering today and leading into the future
https://www.starbucks.com/careers/