Distributed Inference on Large Datasets Using Apache MXNet and Apache Spark
1 .Distributed Deep Learning Inference using Apache MXNet and Apache Spark
Naveen Swamy, Amazon AI
2 .Outline
• Review of Deep Learning
• Apache MXNet Framework
• Distributed Inference using MXNet and Spark
3 .Deep Learning
• Originally inspired by our biological neural systems.
• A system that learns important features from experience.
• Layers of neurons learning concepts.
• Deep learning != deep understanding
[Figure: layered network. Input layer (raw pixels) → 1st hidden layer (edges) → 2nd hidden layer (corners & contours) → 3rd hidden layer (object parts) → Output (object identity: CAR, PERSON, DOG).]
Credit: Ian Goodfellow et al., Deep Learning Book
4 .
• Algorithmic Advances (Faster Learning)
• Abundance of Data (Deeper Networks)
• High Performance Compute, GPUs (Faster Experiments)
Bigger and Better Models = Better AI Products
5 .Why does Deep Learning matter?
• Health care
• Autonomous Vehicles
• Personal Assistants
• Solve Intelligence ???
6 .Deep Learning & AI, Limitations
[Figure labels: Artificial Intelligence, Machine Learning, Deep Learning]
DL Limitations:
• Requires lots of data and compute power.
• Cannot detect inherent bias in data (transparency).
• Uninterpretable results.
7 . Deep Learning Training
• Pass data through the network – forward pass
• Define an objective – loss function
• Send the error back – backward pass
Model: output of training a neural network
[Figure: training loop (data and labels → forward → prediction "dog?" → error → backward) and a small worked network showing a forward and a backward pass. Recoverable values: y = 1.0, y` = 0.9, loss = y – y` = 0.1.]
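The three steps on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not the network from the figure: it assumes a single linear neuron `y_hat = w * x` and a squared-error loss (the slide uses the plain difference y – y`), and every number in it is a made-up example value.

```python
# Minimal sketch of a training loop: forward pass, loss, backward pass.
# Assumption: one linear neuron y_hat = w * x with squared-error loss.
# All values are illustrative and are not taken from the slide's figure.

def forward(w, x):
    """Forward pass: compute the prediction."""
    return w * x

def loss_fn(y, y_hat):
    """Objective: squared error between label and prediction."""
    return 0.5 * (y - y_hat) ** 2

def backward(x, y, y_hat):
    """Backward pass: gradient of the loss with respect to w."""
    return -(y - y_hat) * x

def train(x, y, w=0.1, lr=0.1, steps=50):
    """Repeat forward, loss, backward; return the final weight and loss history."""
    history = []
    for _ in range(steps):
        y_hat = forward(w, x)
        history.append(loss_fn(y, y_hat))
        w -= lr * backward(x, y, y_hat)  # gradient descent update
    return w, history

w, history = train(x=2.0, y=1.0)
```

The trained weight `w` is the "model" in the sense of the slide: it is what training produces and what inference later reuses.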
8 .Deep Learning Inference
[Figure: input → forward pass through the model → prediction "dog".]
• Real-time inference: tasks that require an immediate result.
• Batch inference: tasks that need to run over large data sets.
  o Pre-computations are necessary, e.g. recommender systems.
  o Backfilling with state-of-the-art models.
  o Testing new models on historic data.
9 .Types of Learning
• Supervised Learning – uses labeled training data to learn to associate input data with output. Examples: image classification, speech recognition, machine translation.
• Unsupervised Learning – learns patterns from unlabeled data. Examples: clustering, association discovery.
• Active Learning – semi-supervised, human in the middle.
• Reinforcement Learning – learns from the environment, using rewards and feedback.
10 .Outline
• Apache MXNet Framework
• Distributed Inference using MXNet and Spark
11 .Why MXNet
12 .MXNet – NDArray & Symbol
• NDArray – imperative tensor operations that work on both CPUs and GPUs.
• Symbol APIs – similar to NDArray, but adopt declarative programming so the computation can be optimized.
[Figure labels: Symbolic Program, Computation Graph]
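The difference between the two styles can be shown with a toy example in plain Python. The `Node` class and `Var` helper below are hypothetical and exist only for illustration; they are not MXNet's NDArray or Symbol API. The point is the contrast: imperative code computes values statement by statement, while declarative code first builds a graph and only computes once inputs are bound.

```python
# Toy contrast between imperative and declarative styles.
# Assumption: Node and Var are hypothetical illustration classes,
# not part of MXNet.

class Node:
    """One node of a tiny computation graph."""
    def __init__(self, op, inputs=(), name=None):
        self.op, self.inputs, self.name = op, inputs, name

    def __add__(self, other):
        return Node("add", (self, other))

    def __mul__(self, other):
        return Node("mul", (self, other))

    def run(self, **bindings):
        """Evaluate the graph once concrete inputs are bound."""
        if self.op == "var":
            return bindings[self.name]
        left, right = (n.run(**bindings) for n in self.inputs)
        return left + right if self.op == "add" else left * right

def Var(name):
    """Placeholder for an input that is supplied later."""
    return Node("var", name=name)

# Imperative: each statement runs immediately on concrete values.
a, b = 2.0, 3.0
imperative_result = (a + b) * b

# Declarative: describe the graph first, then bind inputs and run it.
x, y = Var("x"), Var("y")
graph = (x + y) * y
declarative_result = graph.run(x=2.0, y=3.0)
```

Because the whole graph is known before anything runs, a framework has the chance to optimize it, which is the motivation the slide gives for the declarative style.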
13 .MXNet - Module
High-level APIs to work with Symbol:
1) Create graph
2) Bind
3) Pass data
14 .Outline
• Distributed Inference using MXNet and Spark
15 .Distributed Inference Challenges
• Similar to large-scale data processing systems
Challenges:
  o High Performance DL framework
  o Distributed Cluster
  o Resource Management
  o Job Management
  o Efficient Partition of Data
  o Deep Learning Setup
Apache Spark:
• Multiple cluster managers.
• Works well with MXNet.
• Integrates with Hadoop & big data tools.
16 .MXNet + Spark for Inference
• ImageNet-trained ResNet-18 classifier.
• For the demo, the CIFAR-10 test dataset with 10K images.
• PySpark on Amazon EMR; MXNet is also available in Scala.
• Inference runs on CPUs and can be extended to use GPUs.
17 .Distributed Inference Pipeline
1) Download S3 keys on driver
2) Create RDD and partition
3) Fetch batch of images on executor (mapPartitions)
4) Decode to numpy array (mapPartitions)
5) Run prediction (mapPartitions)
6) Collect predictions
Note: initialize the model only once.
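The executor-side part of this pipeline can be sketched as a single function suitable for `mapPartitions`. This is a hedged sketch, not the code from the talk: `load_model`, `fetch_image`, and `decode` are hypothetical stand-ins for loading the MXNet model, downloading an image from S3, and decoding it into an array, and the `MODEL_LOADS` counter exists only to make the "initialize only once" behavior visible when run locally.

```python
# Sketch of an executor-side function for rdd.mapPartitions.
# Assumptions: load_model, fetch_image, and decode are hypothetical
# stand-ins; a real job would load an MXNet model and read from S3.
from itertools import islice

MODEL_LOADS = 0  # counts model initializations, for local illustration only

def load_model():
    """Hypothetical stand-in for an expensive model initialization."""
    global MODEL_LOADS
    MODEL_LOADS += 1
    return lambda batch: [sum(pixels) % 10 for pixels in batch]  # fake class id

def fetch_image(key):
    """Hypothetical stand-in for downloading one image by its key."""
    return key.encode("utf-8")

def decode(raw):
    """Hypothetical stand-in for decoding bytes into a numeric array."""
    return list(raw)

def batches(iterator, batch_size):
    """Yield lists of up to batch_size items from an iterator."""
    iterator = iter(iterator)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

def predict_partition(keys, batch_size=4):
    """Called once per partition: load the model once, then predict in batches."""
    model = load_model()
    for key_batch in batches(keys, batch_size):
        images = [decode(fetch_image(k)) for k in key_batch]
        for key, label in zip(key_batch, model(images)):
            yield key, label

# With Spark, the driver side would look roughly like:
#   sc.parallelize(keys, numSlices=n).mapPartitions(predict_partition).collect()
# Here a single partition is simulated locally with a plain iterator.
keys = ["img-%d.png" % i for i in range(10)]
predictions = list(predict_partition(iter(keys)))
```

Using `mapPartitions` rather than `map` is what lets the model be created once per partition instead of once per image, so the initialization cost is amortized over every batch in that partition.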
18 . MXNet + Spark for Inference. On the driver
19 .On the executor
20 . Summary
• Overview of Deep Learning
  o How Deep Learning works and why Deep Learning is a big deal.
  o Phases of Deep Learning
  o Types of Learning
• Apache MXNet – efficient deep learning library
  o NDArray/Symbol/Module
• Apache MXNet and Spark for distributed inference.
21 .What’s Next?
• Released simplified Scala Inference APIs (v1.2.0)
  o Available on Maven: org.apache.mxnet
• Working on Java APIs for inference.
• DataFrame support is under consideration.
• The MXNet community is evolving fast; join hands to democratize AI.
22 .Resources/References
• https://github.com/apache/incubator-mxnet
• Blog: Distributed Inference using MXNet and Spark
• Distributed Inference code sample on GitHub
• Apache MXNet Gluon Tutorials
• Apache MXNet – Flexible and efficient deep learning
• The Deep Learning Book
• MXNet – Using pre-trained models
• Amazon Elastic MapReduce
23 . Thank You nswamy@apache.org