Introduction to BigDL, a Distributed Deep Learning Framework Based on Apache Spark
1. BigDL: A scalable & easy deep learning solution on Apache Spark
Yiheng Wang (yiheng.wang@intel.com)
Big Data Technology, Software and Service Group, Intel
Intel® Confidential - INTERNAL USE ONLY
2. Build an End-to-End Solution
https://github.com/intel-analytics/BigDL
3. Build an End-to-End Solution
Practical challenges:
• Compatibility with different data sources
• Performance and scalability
• Stability and fault tolerance
• Data management / pre-processing
• Resource sharing
• Programming tools / languages
• …
4. Build an End-to-End Solution on Hadoop/Spark
[Diagram: SQL, Streaming, GraphX, MLlib, and BigDL running on Apache Spark]
5. An Example of End-to-End Large-Scale Machine Learning
• Historical data is stored in Hive
• Data preprocessing with SparkSQL
• Spark ML pipeline for complex feature engineering
• Use multiple BigDL NN models
• Use Sample + Bagging to address the class imbalance problem
• Grid search for hyperparameter tuning
Powered by BigDL
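The "Sample + Bagging" bullet above can be sketched in plain Python. This is only an illustration of the general idea (undersample the majority class for each model, then average the models' votes), not the pipeline used in the talk. The nearest-centroid learner is a toy stand-in for a BigDL NN model, the data is synthetic, and every function name here is hypothetical.

```python
import random
from statistics import mean

def balanced_sample(majority, minority, rng):
    """Keep every minority row and draw an equally sized random subset of the majority.
    Assumes the minority class is the smaller one."""
    return rng.sample(majority, len(minority)) + minority

def fit_centroids(rows):
    """Toy base learner (assumption): one centroid per class, in place of a neural network."""
    centroids = {}
    for label in (0, 1):
        points = [x for x, y in rows if y == label]
        centroids[label] = [mean(col) for col in zip(*points)]
    return centroids

def predict_one(centroids, x):
    """Return 1 if x is closer to the positive centroid than to the negative one, else 0."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return 1 if dist2(centroids[1]) < dist2(centroids[0]) else 0

def fit_bagged(rows, n_models=5, seed=0):
    """Train n_models base learners, each on its own balanced subsample."""
    rng = random.Random(seed)
    majority = [r for r in rows if r[1] == 0]
    minority = [r for r in rows if r[1] == 1]
    return [fit_centroids(balanced_sample(majority, minority, rng))
            for _ in range(n_models)]

def predict_bagged(models, x):
    """Score = fraction of models that vote for the positive class."""
    return mean(predict_one(m, x) for m in models)

# Synthetic, heavily imbalanced data: 1000 negatives, 20 positives.
data_rng = random.Random(42)
normal = [([data_rng.gauss(0, 1), data_rng.gauss(0, 1)], 0) for _ in range(1000)]
fraud = [([data_rng.gauss(4, 1), data_rng.gauss(4, 1)], 1) for _ in range(20)]

models = fit_bagged(normal + fraud)
score_fraud = predict_bagged(models, [4.0, 4.0])
score_normal = predict_bagged(models, [0.0, 0.0])
```

Each model sees a different slice of the majority class, so the ensemble still uses most of the negative examples while no single model is trained on a skewed set.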
6. Build on Apache Spark
[Diagram: Spark stack. MLlib, GraphX, BigDL, SQL, Streaming, and ML Pipelines on top of RDD / DataFrame, on top of Spark Core]
7. BigDL Is Easy to Use
• A friendly API compatible with Torch and Keras
• Scala and Python programming APIs
• Support for Apache Spark SQL / Streaming / ML Pipeline
8. High Performance from Your Server
• Powered by Intel Math Kernel Library
• Extremely high performance on Xeon CPUs
  – An order of magnitude faster than out-of-the-box Caffe / Torch / TensorFlow
• Good scalability
  – Hundreds of nodes
9. Competitive End-to-End Performance and Scalability
[Chart: throughput of K40 compared with Intel® Xeon® processors in the image feature extraction pipeline]
More information: https://software.intel.com/en-us/articles/building-large-scale-image-feature-extraction-with-bigdl-at-jdcom
10. Development and Deployment Are Really Easy with BigDL
[Diagram: Spark jobs, each bundling BigDL, running on YARN / Mesos]
11. BigDL Programs Are Normal Spark Applications
BigDL library files are submitted with the Spark job. You don’t need to install extra files on your cluster.
12. BigDL Feature Overview
• Training, evaluation and prediction
  • Fine-tune / Streaming / Batch / Java Web application
• More than 200 layers
  • Linear, Conv2D, Conv3D, Embedding, Recurrent…
• Dozens of loss functions and optimization algorithms
  • CrossEntropy, CTC, Adam, SGD …
• Support for loading model files from other frameworks
  • Torch / Caffe / TensorFlow / Keras
13. Algorithms
• Auto-encoders, VAE
• CNN models (AlexNet, Inception, VGG, ResNet, MobileNet, DenseNet, SqueezeNet)
• RNN / LSTM / GRU / Seq2Seq
• Neural Recommendation
• Wide-and-deep
• Deep Speech
• Chatbot
• Reinforcement Learning
• SSD / Faster-RCNN
• Fraud Detection
14. Visualize Training Process
15. Use Cases
https://software.intel.com/bigdl
16. Public Cloud
• Running BigDL, Deep Learning for Apache Spark, on AWS* (Amazon* Web Service): https://aws.amazon.com/blogs/ai/running-bigdl-deep-learning-for-apache-spark-on-aws/
• Use BigDL on Microsoft* Azure* HDInsight*: https://azure.microsoft.com/en-us/blog/use-bigdl-on-hdinsight-spark-for-distributed-deep-learning/
• BigDL on Alibaba* Cloud E-MapReduce*: https://yq.aliyun.com/articles/73347
• BigDL on CDH* and Cloudera* Data Science Workbench*: http://blog.cloudera.com/blog/2017/04/bigdl-on-cdh-and-cloudera-data-science-workbench/
• Intel’s BigDL on Databricks*: https://databricks.com/blog/2017/02/09/intels-bigdl-databricks.html
• Using Apache Spark with Intel BigDL on Mesosphere* DC/OS* (by Lightbend*): http://developer.lightbend.com/blog/2017-06-22-bigdl-on-mesos/
17. Image Feature Extraction at JD.com
http://mp.weixin.qq.com/s/xUCkzbHK4K06-v5qUsaNQQ
https://software.intel.com/en-us/articles/building-large-scale-image-feature-extraction-with-bigdl-at-jdcom
18. Image Similarity Search for MLS Listings
https://homes-prod-homes-poc.azurewebsites.net/Property/ml81678150/5738-san-lorenzo-dr-san-jose-ca-95123
19. Neural Recommendation Engine at China Life
https://strata.oreilly.com.cn/strata-cn/public/schedule/detail/59722?locale=en
20. User-Merchant Propensity Modeling at MasterCard
https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/63897
21. Fraud Detection at Union-Pay
• Historical data is stored in Hive
• Data preprocessing with SparkSQL
• Spark ML pipeline for complex feature engineering
• Use multiple BigDL NN models
• Use Sample + Bagging to address the class imbalance problem
• Grid search for hyperparameter tuning
Powered by BigDL
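The grid search step in the list above can be sketched as an exhaustive loop over every combination of candidate values. This is a minimal, framework-free illustration, not the code from the talk. The parameter names and `toy_validation_score` are hypothetical; in a real pipeline the scoring function would train a model and evaluate it on held-out data.

```python
from itertools import product

def grid_search(param_grid, evaluate):
    """Try every combination in param_grid; return (best_params, best_score).
    Higher scores are treated as better."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_validation_score(params):
    """Hypothetical stand-in for 'train a model, then score it on a validation set'.
    It peaks at learning_rate=0.01 and hidden_size=64."""
    lr_penalty = ((params["learning_rate"] - 0.01) ** 2) * 1e4
    size_penalty = abs(params["hidden_size"] - 64) / 64
    return -lr_penalty - size_penalty

grid = {
    "learning_rate": [0.1, 0.01, 0.001],
    "hidden_size": [32, 64, 128],
}
best_params, best_score = grid_search(grid, toy_validation_score)
```

The cost grows with the product of the list lengths (9 evaluations here), which is why running the evaluations in parallel on a cluster is attractive for larger grids.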
22. Cray Urika-XC Provides BigDL
23. Medical Image Analysis
https://www.ucsf.edu/news/2017/01/405536/ucsf-intel-join-forces-develop-deep-learning-analytics-health-care
24. Deep Speech 2 on BigDL
[Diagram: conv → biRNN 1 → biRNN 2 → ... → biRNN k → affine → softmax → CTC]
• 9-layer biRNN: >50 million parameters
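The pipeline above ends in softmax and CTC. As a hedged illustration of how per-frame softmax outputs are commonly turned into text with CTC, here is a minimal greedy (best-path) decoder in plain Python. It is not BigDL's implementation. The three-symbol alphabet, the choice of index 0 as the blank, and the probabilities are all made up for the example.

```python
from itertools import groupby

BLANK = 0  # assumption: index 0 is the CTC blank symbol

def ctc_greedy_decode(frame_probs, alphabet):
    """Greedy CTC decoding:
    1. pick the most likely symbol in every frame,
    2. collapse consecutive repeats,
    3. drop blanks."""
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    collapsed = [idx for idx, _ in groupby(best_path)]
    return "".join(alphabet[idx] for idx in collapsed if idx != BLANK)

alphabet = ["-", "a", "b"]  # "-" marks the blank
frame_probs = [
    [0.10, 0.80, 0.10],  # a
    [0.10, 0.70, 0.20],  # a (repeat of the previous frame, collapsed)
    [0.90, 0.05, 0.05],  # blank (separates the two a's)
    [0.20, 0.70, 0.10],  # a
    [0.10, 0.10, 0.80],  # b
    [0.10, 0.20, 0.70],  # b (repeat, collapsed)
]
decoded = ctc_greedy_decode(frame_probs, alphabet)
```

The blank symbol is what lets the decoder emit a doubled letter: the two runs of "a" survive as separate characters only because a blank frame sits between them.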
25. Language Model - RNN
Source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
26. Language Model - Generate Shakespeare Poems
Output of RNN:
Long live the King . The King and Queen , and the Strange of the Veils of the rhapsodic . and grapple, and the entreatments of the pressure . Upon her head , and in the world ? `` Oh, the gods ! O Jove ! To whom the king : `` O friends ! Her hair, nor loose ! If , my lord , and the groundlings of the skies . jocund and Tasso in the Staggering of the Mankind . and
27. Transfer Learning
[Diagram: load a Caffe, Torch, or BigDL model, then fine-tune it into a BigDL model that labels images with styles such as Melancholy, Macro, and Sunny]
Image source: https://www.flickr.com/photos/
• Train on a different dataset, starting from a pre-trained model
• Predict image style instead of type
• Save training time and improve accuracy
28. Integrate with Spark Streaming
Integration with Spark Streaming for runtime training and prediction
[Diagram: Kafka, Flume, HDFS/S3, Kinesis, and Twitter feed Spark Streaming RDDs; a BigDL model is used for Train and Predict, alongside Evaluator and StreamWriter components]
29. Intelligent Query
Tight integration with Spark SQL, DataFrames and Structured Streaming

    df.select($"image")
      .withColumn("image_type", ImgClassifier("image"))
      .filter($"image_type" === "dog")
      .show()

Image classification on ImageNet (http://www.image-net.org)