The Azure Cognitive Services on Spark: Clusters with Embedded Intelligent Services
1 .The Azure Cognitive Services on Spark: Clusters with Embedded Intelligent Services
Mark Hamilton, Microsoft, marhamil@microsoft.com
Anand Raman, Microsoft, aram@microsoft.com
#UnifiedAnalytics #SparkAISummit
2 .Overview
• The Cognitive Services on Spark
  – Basic Usage
  – Fluent Design
• HTTP on Spark
  – Architecture and Principles
• Clusters with Embedded Services
  – Kubernetes, Databricks
• Examples
  – GANs + the Metropolitan Museum of Art
3 .Motivation
• Azure Cognitive Services provide high-quality pre-built intelligent services
• No need for time-intensive model training or deployment
• Can quickly create intelligent applications
• Leverage Microsoft Research and Azure ML
• http://www.seeingai.com
4 .Azure Cognitive Services capabilities, by category

Vision
• Object, scene, and activity detection
• Face recognition and identification
• Celebrity and landmark recognition
• Emotion recognition
• Text and handwriting recognition (OCR)
• Customizable image recognition
• Video metadata, audio, and keyframe extraction and analysis
• Explicit or offensive content moderation

Speech
• Speech transcription (speech-to-text)
• Custom speech models for unique vocabularies or complex environment
• Text-to-speech
• Custom Voice
• Real-time speech translation
• Customizable speech transcription and translation
• Speaker identification and verification

Language
• Language detection
• Named entity recognition
• Key phrase extraction
• Text sentiment analysis
• Multilingual and contextual spell checking
• Explicit or offensive text content moderation
• PII detection for text moderation
• Text translation
• Customizable text translation
• Contextual language understanding

Knowledge
• Q&A extraction from unstructured text
• Knowledge base creation from collections of Q&As
• Semantic matching for knowledge bases
• Customizable content personalization learning

Search
• Ad-free web, news, image, and video search results
• Trends for video, news
• Image identification, classification and knowledge extraction
• Identification of similar images and products
• Named entity recognition and classification
• Knowledge acquisition for named entities
• Search query autosuggest
• Ad-free custom search engine creation
5 .Azure Cognitive Services on Spark
• Easy-to-use integration between Spark and the Azure Cognitive Services
• Composable and pipelinable with all other SparkML models!
• Python, Scala, R (Beta)

    val df = new TextSentiment()
      .setTextCol("text")
      .setOutputCol("sentiment")
      .transform(inputs)
6 .http://www.seeingai.com
7 .Fluent API for Advanced Orchestration
• Any parameter can be set with a dataframe column or with a single value

Get results for multiple search terms:

    new BingImageSearch()
      .setQueryCol("queries")

queries
Cat
Dog
Antelope
Car
Bob Ross
8 .Fluent API for Advanced Orchestration
• Any parameter can be set with a dataframe column or with a single value

Get the first N pages of Bing for a specific term:

    new BingImageSearch()
      .setQuery("cats")
      .setOffsetCol("offsets")

offsets
0
100
200
300
400
9 .Fluent API for Advanced Orchestration
• Any parameter can be set with a dataframe column or with a single value

Get the first 200 results for many terms using several different accounts:

    new BingImageSearch()
      .setQueryCol("queries")
      .setOffsetCol("offsets")
      .setKeyCol("keys")

offsets  queries  keys
0        Cat      17…
100      Cat      17…
0        Tree     3e…
100      Tree     4q…
0        Car      G1…
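
The "column or single value" idea on the three slides above can be illustrated without Spark. The following is only a minimal sketch in plain Python, not MMLSpark's implementation: `ServiceParam`, `build_payloads`, and the payload field names `q` and `offset` are hypothetical names chosen for illustration.

```python
from typing import Any, Dict, List, Optional


class ServiceParam:
    """Hypothetical parameter bound either to one constant or to a column name."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.value: Optional[Any] = None
        self.column: Optional[str] = None

    def set_value(self, value: Any) -> None:
        # Binding a constant clears any earlier column binding.
        self.value, self.column = value, None

    def set_column(self, column: str) -> None:
        # Binding a column clears any earlier constant.
        self.value, self.column = None, column

    def resolve(self, row: Dict[str, Any]) -> Any:
        # Read from the row when bound to a column, else return the constant.
        return row[self.column] if self.column is not None else self.value


def build_payloads(params: List[ServiceParam],
                   rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Produce one request payload per input row."""
    return [{p.name: p.resolve(row) for p in params} for row in rows]


# Mirrors slide 8: one fixed query, offsets taken from a column.
query = ServiceParam("q")
query.set_value("cats")
offset = ServiceParam("offset")
offset.set_column("offsets")

rows = [{"offsets": 0}, {"offsets": 100}, {"offsets": 200}]
payloads = build_payloads([query, offset], rows)
```

Each row yields its own payload, so the same transformer definition covers "one term, many pages" and "many terms, many keys" simply by switching which parameters are bound to columns.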
10 .High Performance Capabilities OOTB
• Asynchronous Parallelism (P)
• Automatic Batching (B)
• Automatic Retries
  – Exponential Back-offs (EBO)
  – Backpressure (BP)

Features      Time (s)   Errors #
None          30.8       18993
EBO+BP        1163.0     0
EBO+BP+B      57.1       0
EBO+BP+B+P    49.7       0

10 nodes, 20k requests, service limited to 1k req/min
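
Two of the features in the table above, automatic batching and retries with exponential back-off, can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the library's code: `batches`, `call_with_backoff`, and the default retry count and delay are hypothetical, and backpressure is not modelled.

```python
import time
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batches(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def call_with_backoff(
    send: Callable[[], R],
    max_retries: int = 5,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> R:
    """Call `send`; on failure wait base, 2*base, 4*base, ... before retrying."""
    for attempt in range(max_retries + 1):
        try:
            return send()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            sleep(base_delay * (2 ** attempt))
    raise RuntimeError("unreachable")


# Demo: a service that rejects the first two calls, then succeeds.
# Delays are recorded instead of slept so the demo runs instantly.
delays: List[float] = []
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] <= 2:
        raise ConnectionError("throttled")
    return "ok"


result = call_with_backoff(flaky, sleep=delays.append)
groups = [list(b) for b in batches(list(range(5)), 2)]
```

Batching reduces the number of requests counted against a rate limit, while back-off trades latency for zero errors, which is consistent with the pattern in the table.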
11 .HTTP on Spark
• Full integration between the HTTP protocol and Spark SQL
• Spark as a Microservice Orchestrator
• Spark + X

    df = (SimpleHTTPTransformer()
          .setInputParser(JSONInputParser())
          .setOutputParser(JSONOutputParser().setDataType(schema))
          .setOutputCol("results")
          .setUrl(…))
12 .HTTP on Spark
[Architecture diagram: each Spark Worker holds several Partitions; every Partition has its own Client, which exchanges HTTP Requests and Responses with either a Web Service or a Local Service.]
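
The one-client-per-partition layout in the diagram can be mimicked in plain Python. This is a sketch only: `FakeClient` and `process_partition` are hypothetical stand-ins, and no real HTTP traffic is sent. In PySpark, a function of this shape could be passed to `RDD.mapPartitions`.

```python
from typing import Dict, Iterable, Iterator, List


class FakeClient:
    """Stand-in for an HTTP client; counts how many instances were created."""

    created = 0

    def __init__(self) -> None:
        FakeClient.created += 1

    def send(self, request: Dict[str, str]) -> Dict[str, str]:
        # A real client would issue an HTTP request here.
        return {"echo": request["text"].upper()}


def process_partition(rows: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Create one client for the whole partition and reuse it for every row."""
    client = FakeClient()
    for row in rows:
        yield client.send(row)


# Two partitions, as if they lived on Spark workers.
partitions: List[List[Dict[str, str]]] = [
    [{"text": "a"}, {"text": "b"}],
    [{"text": "c"}],
]
responses = [list(process_partition(p)) for p in partitions]
```

Creating the client once per partition rather than once per row keeps connection setup cost off the per-request path.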
13 .Cognitive Service Containers Now In Public Preview
• No app changes & compatible with the full Cognitive Services feature set
• Support for 6 key AI capabilities:
  – Key Phrase Extraction
  – Language Detection
  – Sentiment Analysis
  – Face & Emotion Detection
  – OCR / Text Recognition
  – Language Understanding
• Run & manage locally, try for free
• Connect to the Billing service for report-back; unified billing with on-cloud and off-cloud transactions
• Additional capabilities coming soon (e.g. Speech)
14 .Clusters with Embedded Services
• Deploy cognitive services directly onto cluster worker nodes
• Bring the compute to the data
• Use low-latency in-machine networking

[Diagram: inside a Spark Worker, PySpark talks to the Spark Scala process over the PySpark protocol, and the Spark Scala process talks to a Local Cognitive Service over HTTP.]
15 .Azure Kubernetes Service + Helm
• Works on any k8s cluster
• Helm: Package Manager for Kubernetes

    helm repo add mmlspark \
      https://dbanda.github.io/charts

    helm install mmlspark/spark \
      --set localTextApi=true

[Diagram: a Kubernetes cluster (AKS, ACS, GKE, On-Prem, etc.) with three K8s workers, each running a Cognitive Service Container next to a Spark Worker and connected to it through HTTP on Spark. HTTP on Spark also reaches the cloud Cognitive Services, and Spark Readers load from storage or other databases. Users / Apps send REST requests to deployed models through the Spark Serving load balancer (the Spark Serving hot path), and submit jobs, run notebooks, and manage the cluster through a second load balancer in front of Jupyter, Zeppelin, LIVY, or Spark Submit.]

Dalitso Banda, dbanda@microsoft.com
Microsoft AI Development Acceleration Program
16 .Creating a Visual Search Engine for the Metropolitan Museum of Art
https://gen.studio
17 .Intelligent Image Annotation
• The MET released 400k images under Open Access
• Pipe images through the Computer Vision API to annotate images for searching

[Figure: three query images with their Describe Image output ("A picture containing a person", "A picture containing a glass, cup", "A fish swimming underwater") and, for each, its deep-feature nearest neighbors.]
18 .Reverse Image Search Architecture
Query Image → ResNet Featurizer (MMLSpark) → Deep Features → Fast Nearest Neighbor Lookup (SparkML LSH or Annoy) → Closest Match

[Figure: filters from Zeiler + Fergus 2013]
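
The final lookup stage of this pipeline can be illustrated with a brute-force search. Note the substitution: the slide names SparkML LSH or Annoy for fast approximate lookup, whereas this sketch uses an exact cosine-similarity scan, which is only practical for small collections. The artwork names and three-dimensional feature vectors are invented for illustration; real deep features would come from the featurizer.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_neighbors(
    query: Sequence[float],
    index: Dict[str, Sequence[float]],
    k: int = 1,
) -> List[Tuple[str, float]]:
    """Return the k indexed items most similar to the query, best first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy index: hypothetical features for three artworks.
index = {
    "vase": [0.9, 0.1, 0.0],
    "portrait": [0.0, 1.0, 0.2],
    "armor": [0.1, 0.0, 1.0],
}
matches = nearest_neighbors([1.0, 0.2, 0.0], index, k=2)
```

An approximate index answers the same question while avoiding a comparison against every stored vector, which matters once the collection reaches hundreds of thousands of images.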
19 .Example Nearest Neighbors
[Figure: query images shown alongside their nearest neighbors.]
20 .Spark x Azure Search
• Azure Search Sink for Spark
• Allows pushing thousands of documents per second into Azure Search instances
• Built on HTTP on Spark
• Use it to create search APIs on top of Spark DataFrames
21 .Microsoft Machine Learning for Apache Spark v0.16
Microsoft's Open Source Contributions to Apache Spark
• Cognitive Services
• Spark Serving
• Model Interpretability
• LightGBM Gradient Boosting
• Deep Networks with CNTK
• HTTP on Spark

www.aka.ms/spark
Azure/mmlspark
22 .Conclusions
• Can now embed Cognitive Services into Spark workflows
• Can harness a Spark cluster for microservices
• Get started now with interactive examples!

www.aka.ms/spark
Help us advance Spark: Azure/mmlspark
Contact: marhamil@microsoft.com, mmlspark-support@microsoft.com
23 .Thanks To
• Sudarshan Raghunathan
• Ilya Matiach
• Microsoft NERD Garage Team + MIT Externship Program
• Microsoft Development Acceleration Team:
  – Dalitso Banda, Casey Hong, Karthik Rajendran, Manon Knoertzer, Tayo Amuneke, Alejandro Buendia
• Pablo Castro, Chris Hoder, Ryan Gaspar, Henrik Neilsen, Joseph Sirosh, Andrew Schonhoffer, Daniel Ciborowski, Markus Cosowicz
• Azure CAT, AzureML, and Azure Search Teams
24 .Backup Slides
25 .[Diagram: a Noise Vector feeds the Generator, which produces a Generated Image. The Discriminator receives both Training Data and Generated Images and decides for each: "Real or Generated?"]
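26 .[Diagram: a Learned Noise Vector passes through the Generator to produce a Generated Image, which is compared with the Target Image. A Pretrained ResNet 50 supplies the semantic comparison. The objective is $\mathit{Loss}_{pixel} + \mathit{Loss}_{semantic} \times \lambda$.]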
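
The objective on this slide combines a pixel term and a semantic term weighted by λ. A minimal sketch of that combination follows. Several things here are assumptions, since the slide does not specify them: both terms are taken to be squared errors, images are flat lists of floats, and `toy_features` is a hypothetical stand-in for the pretrained ResNet 50.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def squared_error(a: Vector, b: Vector) -> float:
    """Sum of squared element-wise differences (an assumed distance)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def combined_loss(
    generated: Vector,
    target: Vector,
    features: Callable[[Vector], Vector],
    lam: float,
) -> float:
    """Loss_pixel + lam * Loss_semantic, with both terms as squared errors."""
    pixel = squared_error(generated, target)
    semantic = squared_error(features(generated), features(target))
    return pixel + lam * semantic


def toy_features(image: Vector) -> List[float]:
    # Hypothetical stand-in for a pretrained network's feature extractor.
    return [sum(image), max(image)]


loss = combined_loss([0.0, 1.0], [1.0, 1.0], toy_features, lam=0.5)
```

Minimizing such a loss over the noise vector, with the generator held fixed, is what would make the noise vector "learned" in the sense of the diagram.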
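27 .Code Space Interpolation
[Diagram: two images are inverted through $G^{-1}$ into Inverted Noise Vector 1 and Inverted Noise Vector 2. Points between the two vectors are each passed through the generator $G$ to produce a sequence of images.]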
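
The interpolation step between the two inverted noise vectors can be sketched as follows. Linear interpolation is an assumption, as the slide does not say which scheme is used, and `interpolate` is a hypothetical helper name. Each resulting point would then be fed to the generator $G$.

```python
from typing import List, Sequence


def interpolate(z1: Sequence[float], z2: Sequence[float], steps: int) -> List[List[float]]:
    """Return `steps` evenly spaced points on the line from z1 to z2, inclusive."""
    if steps < 2:
        raise ValueError("steps must be at least 2 to include both endpoints")
    points = []
    for i in range(steps):
        t = i / (steps - 1)  # t runs from 0.0 (z1) to 1.0 (z2)
        points.append([(1 - t) * a + t * b for a, b in zip(z1, z2)])
    return points


# Two toy "inverted noise vectors" and five points between them.
path = interpolate([0.0, 2.0], [1.0, 0.0], steps=5)
```

Because the endpoints are included, the first and last generated images correspond to the two inverted inputs, and the intermediate ones show the transition between them.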