Use Intel Analytics Zoo to Build an Intelligent QA Bot for Microsoft Azure
January 10, 2019
Speakers:
Kai Huang, Software Engineer, Intel Big Data Technology Team.
Yuqing Wei, Software Engineer, Microsoft C+AI Team.
Abstract: This talk introduces Intel's Analytics Zoo toolkit and shares practical experience from using Analytics Zoo to build the intelligent customer support platform on Microsoft Azure.
The Alibaba open-source big data EMR technical team has founded the Apache Spark China technical community, which regularly organizes online and offline Spark events in China. Stay tuned.
DingTalk group: 21784001
Team group: HPRX8117
WeChat official account: Apache Spark技术交流社区
1. Use Intel Analytics Zoo to Build an Intelligent QA Bot for Microsoft Azure
January 10th, 2019
2. About Us
Kai Huang, Software Engineer, Intel Data Analytics Technology Team
Yuqing Wei, Software Engineer, Microsoft C+AI Team
3. Outline
• Introduction to Analytics Zoo
• How to develop NLP modules using Analytics Zoo
• Engineering experience in building a chat bot
• Bot demo
4. BigDL: Bringing Deep Learning to the Big Data Platform
• Distributed deep learning framework for Apache Spark.
• Makes deep learning more accessible to big data users and data scientists:
  o Write deep learning applications as standard Spark programs.
  o Run on existing Spark/Hadoop clusters (no changes needed).
• High performance (on CPU): powered by Intel MKL and multi-threaded programming.
• Efficient scale-out: leverages Spark for distributed training and inference.
https://github.com/intel-analytics/BigDL
https://bigdl-project.github.io/
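As a minimal sketch of what such a standard Spark program looks like with the BigDL Python API (illustrative, not from the slides; the layer sizes, hyperparameters and train_rdd are assumptions):

   from pyspark import SparkContext
   from bigdl.util.common import create_spark_conf, init_engine
   from bigdl.nn.layer import Sequential, Linear, ReLU, LogSoftMax
   from bigdl.nn.criterion import ClassNLLCriterion
   from bigdl.optim.optimizer import Optimizer, SGD, MaxEpoch

   sc = SparkContext(conf=create_spark_conf())
   init_engine()

   # A simple two-layer classifier
   model = Sequential()
   model.add(Linear(10, 32)).add(ReLU()).add(Linear(32, 2)).add(LogSoftMax())

   # train_rdd is an RDD[Sample] built with a normal Spark pipeline;
   # labels are 1-based for ClassNLLCriterion
   optimizer = Optimizer(model=model,
                         training_rdd=train_rdd,
                         criterion=ClassNLLCriterion(),
                         optim_method=SGD(learningrate=0.01),
                         end_trigger=MaxEpoch(5),
                         batch_size=128)
   trained_model = optimizer.optimize()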
5. Analytics Zoo: A unified analytics + AI platform for distributed TensorFlow, Keras and BigDL on Apache Spark
• High-Level Pipeline APIs
  o Feature engineering: feature transformations for image, 3D image, text, time series, speech, etc.
  o Keras, autograd and transfer learning APIs for model definition.
  o Native deep learning support in Spark DataFrames and ML Pipelines.
  o POJO-style API for model serving/inference pipelines.
• Built-in Deep Learning Models
  o Image classification, object detection, text classification, recommendation, text matching, anomaly detection, seq2seq, etc.
• Backends
  o Seamlessly unite Spark, TensorFlow, Keras and BigDL programs into an integrated pipeline.
https://github.com/intel-analytics/analytics-zoo
https://analytics-zoo.github.io/
6. Feature Engineering
1. Read images as an ImageSet:
   from zoo.common.nncontext import init_nncontext
   from zoo.feature.image import *
   sc = init_nncontext()
   local_image_set = ImageSet.read(image_path)
   distributed_image_set = ImageSet.read(image_path, sc, 2)
2. Built-in image processing operations:
   transformer = ChainedPreprocessing([ImageBytesToMat(),
                                       ImageColorJitter(),
                                       ImageExpand(max_expand_ratio=2.0),
                                       ImageResize(300, 300, -1),
                                       ImageHFlip()])
   new_local_image_set = transformer(local_image_set)
   new_distributed_image_set = transformer(distributed_image_set)
Image augmentation using built-in image transformations (with OpenCV on Spark).
7. Keras-Style API
Use the Keras-style API to create an Analytics Zoo model and train, evaluate or tune it in a distributed fashion:
   from zoo.pipeline.api.keras.models import Sequential
   from zoo.pipeline.api.keras.layers import *

   model = Sequential()
   model.add(Reshape((1, 28, 28), input_shape=(28, 28, 1)))
   model.add(Convolution2D(6, 5, 5, activation="tanh", name="conv1_5x5"))
   model.add(MaxPooling2D())
   model.add(Convolution2D(12, 5, 5, activation="tanh", name="conv2_5x5"))
   model.add(MaxPooling2D())
   model.add(Flatten())
   model.add(Dense(100, activation="tanh", name="fc1"))
   model.add(Dense(10, activation="softmax", name="fc2"))

   model.compile(optimizer, loss, metrics)
   model.fit(x, batch, epoch)
   model.predict(x, batch)
   model.evaluate(x, batch)
8. Autograd API
The autograd API provides automatic differentiation for math operations, making it easy to define custom layers or losses:
   import zoo.pipeline.api.autograd as A

   log = A.log(in_node + 1.0)
   dot = A.batch_dot(embed1, embed2, axes=[2, 2])

   from zoo.pipeline.api.autograd import *

   def mean_absolute_error(y_true, y_pred):
       return mean(abs(y_true - y_pred), axis=1)
9. Transfer Learning API
Use the transfer learning APIs to easily customize pretrained models for feature extraction or fine-tuning:
   from zoo.pipeline.api.net import *
   from zoo.pipeline.api.keras.layers import Dense, Input, Flatten
   from zoo.pipeline.api.keras.models import Model

   # Load a pretrained model
   full_model = Net.load_caffe(def_path, model_path)
   # Remove the last few layers
   model = full_model.new_graph(outputs=["pool5"])
   # Freeze the first few layers
   model.freeze_up_to(["res4f"])
   # Append a few layers
   input = Input(shape=(3, 224, 224))
   resnet = model.to_keras()(input)
   flatten = Flatten()(resnet)
   logits = Dense(2)(flatten)
   new_model = Model(input, logits)
10. Distributed TensorFlow
Run TensorFlow models on Spark for distributed training and inference.
1. Data wrangling and analysis using PySpark:
   from zoo.common.nncontext import init_nncontext
   from zoo.pipeline.api.net import TFDataset

   sc = init_nncontext()

   # Each record in train_rdd consists of a list of NumPy ndarrays
   train_rdd = sc.parallelize(file_list) \
       .map(lambda x: read_image_and_label(x)) \
       .map(lambda image_label: decode_to_ndarrays(image_label))

   # TFDataset represents a distributed set of elements,
   # in which each element contains one or more TensorFlow Tensor objects.
   dataset = TFDataset.from_rdd(train_rdd,
                                names=["features", "labels"],
                                shapes=[[28, 28, 1], [1]],
                                types=[tf.float32, tf.int32],
                                batch_size=BATCH_SIZE)
11. Distributed TensorFlow
2. Deep learning model development using TensorFlow:
   import tensorflow as tf
   slim = tf.contrib.slim

   images, labels = dataset.tensors
   labels = tf.squeeze(labels)
   with slim.arg_scope(lenet.lenet_arg_scope()):
       logits, end_points = lenet.lenet(images, num_classes=10, is_training=True)
   loss = tf.reduce_mean(tf.losses.sparse_softmax_cross_entropy(logits=logits, labels=labels))
3. Distributed training on Spark and BigDL:
   from zoo.pipeline.api.net import TFOptimizer
   from bigdl.optim.optimizer import MaxIteration, Adam, MaxEpoch, TrainSummary

   optimizer = TFOptimizer(loss, Adam(1e-3))
   optimizer.set_train_summary(TrainSummary("/tmp/az_lenet", "lenet"))
   optimizer.optimize(end_trigger=MaxEpoch(5))

   saver = tf.train.Saver()
   saver.save(optimizer.sess, "/tmp/lenet/")
12. Distributed TensorFlow
4. Distributed inference:
   dataset = TFDataset.from_rdd(testing_rdd,
                                names=["features"],
                                shapes=[[28, 28, 1]],
                                types=[tf.float32])
   predictor = TFPredictor.from_outputs(sess, [logits])
   predictions_rdd = predictor.predict()
For Keras users:
   optimizer = TFOptimizer.from_keras(keras_model, dataset)
   predictor = TFPredictor.from_keras(model, dataset)
   predictions_rdd = predictor.predict()
13. NNFrames API
Native deep learning support in Spark DataFrames and ML Pipelines.
1. Load images into DataFrames using NNImageReader:
   from zoo.common.nncontext import *
   from zoo.pipeline.nnframes import *

   sc = init_nncontext()
   imageDF = NNImageReader.readImages(image_path, sc)
2. Process loaded data using DataFrame transformations:
   getName = udf(lambda row: ...)
   df = imageDF.withColumn("name", getName(col("image")))
3. Process images using built-in feature engineering operations:
   from zoo.feature.image import *

   transformer = ChainedPreprocessing(
       [RowToImageFeature(), ImageChannelNormalize(123.0, 117.0, 104.0),
        ImageMatToTensor(), ImageFeatureToTensor()])
14. NNFrames API
4. Define the model using the Keras-style API:
   from zoo.pipeline.api.keras.layers import *
   from zoo.pipeline.api.keras.models import Sequential

   model = Sequential()
   model.add(Convolution2D(32, 3, 3, activation='relu', input_shape=(1, 28, 28)))
   model.add(MaxPooling2D(pool_size=(2, 2)))
   model.add(Flatten()).add(Dense(10, activation='softmax'))
5. Train the model using Spark ML Pipelines:
   estimator = NNEstimator(model, CrossEntropyCriterion(), transformer) \
       .setLearningRate(0.003).setBatchSize(40).setMaxEpoch(1) \
       .setFeaturesCol("image")
   nnModel = estimator.fit(df)
15. Models Interoperability Support
• Load existing TensorFlow, Keras, Caffe, Torch and ONNX models.
• Useful for inference and model fine-tuning.
• Allows transitioning from single-node development to distributed application deployment.
• Allows model sharing between data scientists and production engineers.
   from zoo.pipeline.api.net import Net

   Net.load_tf(path, inputs=None, outputs=None,
               byte_order="little_endian", bin_file=None)
   Net.load_keras(hdf5_path, json_path=None, by_name=False)
   Net.load_caffe(def_path, model_path)
   Net.load_torch(path)
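As a usage illustration (not from the slides): a model loaded this way can then be used for distributed inference, assuming the loaded graph exposes the standard predict method; the file path, node names and sample_rdd below are hypothetical:

   from zoo.common.nncontext import init_nncontext
   from zoo.pipeline.api.net import Net

   sc = init_nncontext()
   # Hypothetical frozen TensorFlow graph with one input and one output node
   model = Net.load_tf("/path/to/frozen_graph.pb",
                       inputs=["input"], outputs=["output"])
   predictions = model.predict(sample_rdd)  # distributed inference on an RDD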
16. Built-in Deep Learning Models
• Object detection API: high-level API and pretrained models (e.g., SSD, Faster-RCNN) for object detection.
• Image classification API: high-level API and pretrained models (e.g., VGG, Inception, ResNet, MobileNet) for image classification.
• Recommendation API: high-level API and predefined models (e.g., Neural Collaborative Filtering, Wide and Deep Learning) for recommendation.
• Text classification API: high-level API and predefined models (using CNN, LSTM, etc.) for text classification.
17. Object Detection API
1. Load a pretrained model from the Detection Model Zoo:
   from zoo.common.nncontext import init_nncontext
   from zoo.models.image.objectdetection import *

   sc = init_nncontext()
   model = ObjectDetector.load_model(model_path)
2. Off-the-shelf inference using the loaded model:
   image_set = ImageSet.read(img_path, sc)
   output = model.predict_image_set(image_set)
3. Visualize detection results:
   config = model.get_config()
   visualizer = Visualizer(config.label_map(), encoding="jpg")
   visualized = visualizer(output).get_image(to_chw=False).collect()
18. Reference Use Cases
• Anomaly detection: using an LSTM network to detect anomalies in time series data.
• Fraud detection: using a feed-forward neural network to detect fraud in credit card transaction data.
• Recommendation: using the Analytics Zoo Recommendation API (i.e., Neural Collaborative Filtering, Wide and Deep) for recommendation on data with explicit feedback.
• Sentiment analysis: sentiment analysis on movie reviews using neural network models (e.g., CNN, LSTM, GRU, Bi-LSTM).
• Variational autoencoder: using a VAE to generate digits and faces.
https://github.com/intel-analytics/analytics-zoo/tree/master/apps
19. Public Cloud Deployment
• Deployed on AliCloud* E-MapReduce*: https://yq.aliyun.com/articles/73347
• Listed in the Microsoft* Azure* Marketplace*: https://azure.microsoft.com/en-us/blog/bigdl-spark-deep-learning-library-vm-now-available-on-microsoft-azure-marketplace/
• Available on Google* Cloud Dataproc*: https://cloud.google.com/blog/big-data/2018/04/using-bigdl-for-deep-learning-with-apache-spark-and-google-cloud-dataproc
• Optimized for Amazon* EC2* C5 instances, and listed in the AWS* Marketplace*: https://aws.amazon.com/blogs/machine-learning/leveraging-low-precision-and-quantization-for-deep-learning-using-the-amazon-ec2-c5-instance-and-bigdl/
• Deployed on IBM* Data Science Experience*: https://medium.com/ibm-data-science-experience/using-bigdl-in-data-science-experience-for-deep-learning-on-spark-f1cf30ad6ca0
• Available on Telefonica* Open Cloud*: https://support.telefonicaopencloud.com/en-us/ecs/doc/download/20180329/20180329111611_166372a698.pdf
20. Use Cases
• Industrial inspection platform in Midea* and KUKA*: https://software.intel.com/en-us/articles/industrial-inspection-platform-in-midea-and-kuka-using-distributed-tensorflow-on-analytics
• Object detection and image feature extraction at JD.com: https://software.intel.com/en-us/articles/building-large-scale-image-feature-extraction-with-bigdl-at-jdcom
• Image-similarity-based house recommendation for MLSlistings: https://software.intel.com/en-us/articles/using-bigdl-to-build-image-similarity-based-house-recommendations
• 3D medical image analysis at UCSF: https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/64023
21. Background
• Chat bots are widely used in today's intelligent customer support platforms to enhance user experience and relieve human workload.
• Goal: provide technical support for Azure users effectively and efficiently.
• AI modules provided by Analytics Zoo: text classification, question answering, intent extraction, named entity recognition, etc.
22. Why Neural Networks?
• Neural networks make feature extraction easier.
• The TextClassifier module can be adapted for sentiment analysis.
• Neural networks generally perform better, especially on QA tasks and when labeled data is scarce.
• Common components can be shared across different AI modules.
23. Why Analytics Zoo for NLP?
• Analytics Zoo provides pipeline APIs, prebuilt models and use cases for NLP tasks.
• It offers practical experience for Azure big data users building AI applications.
• A preinstalled image on the Azure Marketplace allows easy deployment.
24. Data Preprocessing
Common steps (a sketch of these steps follows this list):
• Read cleaned text data as an RDD where each record contains two columns (text, label).
• Tokenization: https://github.com/fxsjy/jieba
• Stopwords removal
• Sequence aligning
• Word2Vec: https://github.com/facebookresearch/fastText
• Conversion to BigDL Sample -> RDD[Sample]
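A minimal sketch of these preprocessing steps (illustrative, not from the slides; the stopword set, the pretrained word-vector dict word_vectors, EMBED_DIM and SEQ_LEN are assumptions):

   import jieba
   import numpy as np
   from bigdl.util.common import Sample

   SEQ_LEN = 500     # fixed sequence length, matching the model definition below
   EMBED_DIM = 200   # word vector size
   STOPWORDS = set(["的", "了", "是"])  # hypothetical stopword list

   def to_sample(text, label, word_vectors):
       # Tokenization with jieba, followed by stopwords removal
       tokens = [t for t in jieba.cut(text) if t not in STOPWORDS]
       # Word2Vec lookup: each known token becomes a vector of size EMBED_DIM
       vectors = [word_vectors[t] for t in tokens if t in word_vectors]
       # Sequence aligning: truncate, then pad with zero vectors up to SEQ_LEN
       vectors = vectors[:SEQ_LEN]
       vectors += [np.zeros(EMBED_DIM)] * (SEQ_LEN - len(vectors))
       features = np.stack(vectors).astype("float32")
       # Conversion to a BigDL Sample
       return Sample.from_ndarray(features, np.array([label]))

   sample_rdd = text_label_rdd.map(lambda x: to_sample(x[0], x[1], word_vectors))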
25. Define a TextClassifier Model
   from zoo.models.textclassification import TextClassifier

   text_classifier = TextClassifier(class_num, token_length,
                                    sequence_length=500, encoder="cnn",
                                    encoder_output_dim=256)
Parameters:
• class_num: the number of text categories to be classified.
• token_length: the size of each word vector.
• sequence_length: the length of a sequence.
• encoder: the encoder for input sequences ('cnn', 'lstm' or 'gru').
• encoder_output_dim: the output dimension of the encoder.
(Photo from: https://blog.csdn.net/littlely_ll/article/details/79151403)
26. Training, Prediction and Evaluation
Keras-style API for distributed training:
   text_classifier.compile(optimizer=Adagrad(learning_rate, decay),
                           loss="sparse_categorical_crossentropy",
                           metrics=["accuracy"])
   text_classifier.set_checkpoint(path)
   text_classifier.set_tensorboard(log_dir, app_name)
   text_classifier.fit(train_rdd, batch_size=..., nb_epoch=...,
                       validation_data=val_rdd)
   text_classifier.save_model(model_path)
   text_classifier.predict(test_rdd)
   text_classifier.predict_classes(test_rdd)
27. Ways to Improve
• Check your data first (quality, quantity, etc.).
• Use a custom dictionary for tokenization if necessary (see the sketch after this list).
• Train word2vec for unknown words if necessary.
• Tune hyperparameters (learning rate, etc.).
• Add character embeddings, etc.
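For instance, jieba supports loading a custom dictionary of domain terms (a small illustration, not from the slides; the file name and entries are hypothetical):

   import jieba

   # Each line of the file: word [frequency] [POS tag], e.g. Azure product names
   jieba.load_userdict("azure_terms.txt")

   # suggest_freq can also force a particular segmentation at runtime
   jieba.suggest_freq(("云", "服务"), tune=True)

   tokens = list(jieba.cut("Azure云服务支持深度学习"))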
28. Service Integration
Prediction service implemented in Java, with a POJO-like API for low-latency local inference:
   public class TextClassificationModel extends AbstractInferenceModel {
       public JTensor preProcess(String text) {
           // Preprocessing steps using the Java API
           // ... build and return a JTensor
       }
   }

   TextClassificationModel model = new TextClassificationModel();
   model.load(path);
   String sampleText = "text content";
   JTensor input = model.preProcess(sampleText);
   List<JTensor> inputList = new ArrayList<>();
   inputList.add(input);
   List<List<JTensor>> result = model.predict(inputList);
Web service example: https://github.com/intel-analytics/analytics-zoo/tree/master/apps/web-service-sample
29. QA Ranker
• Input: a (query, document) pair.
• Preprocessing: similar steps to text classification.
• Output: a relevance score or probability.
• Evaluation: normalized discounted cumulative gain (NDCG) or mean average precision (MAP).
A small NDCG sketch follows this list.
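A minimal sketch of NDCG@k (the standard definition, not from the slides), where relevances are the ground-truth labels of the documents in model-ranked order:

   import math

   def dcg(relevances, k):
       # DCG@k = sum over the top k positions of rel_i / log2(i + 1), with 1-based i
       return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

   def ndcg(relevances, k):
       # Normalize by the DCG of the ideal (descending-relevance) ordering
       ideal = dcg(sorted(relevances, reverse=True), k)
       return dcg(relevances, k) / ideal if ideal > 0 else 0.0

   # Example: relevance of the top five documents as ranked by the model
   print(ndcg([1, 0, 1, 0, 0], k=5))  # ~0.92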