An AI-Powered Chatbot to Simplify Apache Spark Performance Management
1 .WIFI SSID:SparkAISummit | Password: UnifiedAnalytics
2 .An AI-powered Chatbot to Simplify Spark Performance Management Shivnath Babu Cofounder/CTO, Unravel Adjunct Professor, Duke University #UnifiedAnalytics #SparkAISummit
3 .Meet the speaker • Cofounder/CTO at Unravel • Adjunct Professor of Computer Science at Duke University • Focusing on ease-of-use and manageability of data-intensive systems • Recipient of US National Science Foundation CAREER Award, three IBM Faculty Awards, HP Labs Innovation Research Award
4 .What is a Chatbot?
5 .A program which conducts a conversation via text or voice
6 .Chatbots are making a real difference
7 . Source: https://chatbottle.co/awards/2018
8 . TOBi generates 2x more ecommerce conversions in ½ the time for Vodafone
9 . Zara provides fast services to 20% of Zurich Insurance customers
10 . Woebot, the therapist chatbot, talks to more people in a day than a human therapist does in a lifetime
11 .Chatbots ↔ Spark Performance: What is the connection?
12 .The happy Spark user • Spark is fast • Spark has easy-to-use and comprehensive APIs • Wow, I can do SQL, Streaming, AI/ML, and Graphs in one system! • Spark has a rich ecosystem
13 .The frustrated Spark user
• “I have no idea why my app is slow”
• “My app failed and I don’t know why!”
• “I have no clue which cloud instance type to pick for my workload”
• “My cloud costs are getting out of control. Help!”
14 .Typical app failure in Spark • Many levels of correlated stack traces • Identifying the root cause is hard and time consuming
15 .Spark User and Spark Chatbot
Spark User: “My app failed and I don’t know why!”
Spark Chatbot: “I know that sucks! Let me take a look here …”
Spark Chatbot: “I see the problem. Executors are running out of memory”
Spark Chatbot: “Setting spark.executor.memory to 12g fixes the problem. I have verified it. See this run here”
Spark User: “Wow. Thanks. You are awesome!”
16 .I will show you a Chatbot that • Makes you more productive • Saves you time and money • Becomes your AI-driven Spark Expert in a Bot!
17 .My app is too slow… DATA ENGINEER
18 .I need to make it faster… DATA ENGINEER
19 .Current approach
1. Review Spark/YARN UI to find the app
2. Review metrics in the UI
3. Review jobs and stages associated with the app
4. Identify all containers associated with the app
5. Review and debug container logs
6. Identify “problematic” jobs, stages, or containers
7. Guess which parameters to tune for performance
8. Do trial-and-error by changing a parameter setting
9. Rinse & repeat
20 .There has to be a better way
21 .What is going on here?
22 .Chatbot architecture from 30,000 ft • Messaging Platform • Bot’s NLP Layer • Bot’s Backend Layer
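The three layers named on this slide can be illustrated with a toy pipeline. This is only a sketch: the function names, the keyword rules, the intent labels, and the reply texts are all hypothetical and are not taken from the talk. A real bot would receive messages from a chat service and use a trained NLP model rather than keyword matching.

```python
import re

def nlp_layer(message):
    """Bot's NLP layer (hypothetical): map free text to (intent, app_id)."""
    text = message.lower()
    # Assumption: users may paste a YARN-style application ID into the chat.
    match = re.search(r"application_\d+_\d+", text)
    app_id = match.group(0) if match else None
    if "fail" in text:
        intent = "diagnose_failure"
    elif "slow" in text or "faster" in text:
        intent = "tune_performance"
    else:
        intent = "unknown"
    return intent, app_id

def backend_layer(intent, app_id):
    """Bot's backend layer (hypothetical): pick the analysis to run."""
    target = app_id or "your most recent app"
    if intent == "diagnose_failure":
        return f"Looking for the root cause of the failure in {target}."
    if intent == "tune_performance":
        return f"Searching for faster settings for {target}."
    return "Sorry, I did not understand. Is your app slow or failing?"

def messaging_platform(message):
    """Messaging platform (stand-in): pass text through both bot layers."""
    intent, app_id = nlp_layer(message)
    return backend_layer(intent, app_id)

print(messaging_platform("My app is too slow"))
```

Keeping the NLP layer separate from the backend means the tuning logic does not depend on how the user phrased the request.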
23 .Algorithm running in bot’s backend [Diagram: components labeled Recommendation Algorithm, Probe Algorithm, Orchestrator, and Cluster Services (On-premises and Cloud); data labeled App, Goal; Monitoring Data; Historic Data & Probe Data; Xnext]
24 .Spark tuning parameters
spark.driver.cores: 2
spark.executor.cores: 10
…
spark.sql.shuffle.partitions: 300
spark.sql.autoBroadcastJoinThreshold: 20MB
…
SKEW('orders', 'o_custId'): true
spark.catalog.cacheTable("orders"): true
…
We represent this setting as vector X
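One way to picture "this setting as vector X" is to map each parameter value to a number in a fixed order. The sketch below is an illustration only: the encoding rules (booleans as 0/1, sizes in megabytes) and the helper names `to_number` and `to_vector` are assumptions, and only the parameters actually listed on the slide are included.

```python
# Parameter values as listed on the slide (elided entries are left out).
SETTING = {
    "spark.driver.cores": 2,
    "spark.executor.cores": 10,
    "spark.sql.shuffle.partitions": 300,
    "spark.sql.autoBroadcastJoinThreshold": "20MB",
    "SKEW('orders', 'o_custId')": True,
    'spark.catalog.cacheTable("orders")': True,
}

# Assumed encoding: sizes are expressed in megabytes.
SIZE_UNITS_IN_MB = {"KB": 1.0 / 1024.0, "MB": 1.0, "GB": 1024.0}

def to_number(value):
    """Encode one parameter value as a float (hypothetical encoding)."""
    if isinstance(value, bool):          # check bool before int
        return 1.0 if value else 0.0
    if isinstance(value, (int, float)):
        return float(value)
    for suffix, factor in SIZE_UNITS_IN_MB.items():
        if value.upper().endswith(suffix):
            return float(value[: -len(suffix)]) * factor
    raise ValueError(f"cannot encode {value!r}")

def to_vector(setting, keys):
    """Build X by encoding the parameters in a fixed key order."""
    return [to_number(setting[key]) for key in keys]

KEYS = list(SETTING)
X = to_vector(SETTING, KEYS)
print(X)
```

Fixing the key order matters: two settings are only comparable as vectors if each position always refers to the same parameter.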
25 .Given: App + Goal • Find the setting of X that best meets the goal • Challenge: Response surface y = ƒ(X) is unknown
26 .Challenge: Response surface y = ƒ(X) is unknown
Model the response surface as $\hat{y}(X) = \vec{f}(X)^{t}\,\vec{\beta} + Z(X)$
Here: $\vec{f}(X)^{t}\,\vec{\beta}$ is a regression model, and $Z(X)$ is the residual captured as a Gaussian Process
The Gaussian Process model captures the uncertainty in our current knowledge of the response surface
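A small one-dimensional sketch can make the "regression model plus Gaussian Process residual" idea concrete. Everything specific here is an assumption, not something the slides state: the regression term is reduced to a constant (the mean of the observed y values), the residual uses an RBF kernel with unit variance and length scale 2.0, and the sample points are made up. The linear solver is written out so the block needs only the standard library.

```python
import math

def rbf(a, b, length=2.0):
    """Assumed kernel: unit-variance RBF with a fixed length scale."""
    return math.exp(-((a - b) ** 2) / (2.0 * length ** 2))

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= factor * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]

def predict(xs, ys, x_new, noise=1e-6):
    """Return (mean, std) of the modeled response at x_new."""
    beta = sum(ys) / len(ys)                  # constant regression term
    residuals = [y - beta for y in ys]        # part modeled by Z(X)
    K = [[rbf(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    k_star = [rbf(x_new, a) for a in xs]
    alpha = solve(K, residuals)
    mean = beta + sum(k * a for k, a in zip(k_star, alpha))
    weights = solve(K, k_star)
    variance = rbf(x_new, x_new) - sum(k * w for k, w in zip(k_star, weights))
    return mean, math.sqrt(max(variance, 0.0))

# Made-up observations: X is one tuning knob, y is a measured runtime.
xs = [4.0, 6.0, 8.0, 10.0, 12.0]
ys = [6.0, 3.5, 2.0, 3.0, 5.5]
print(predict(xs, ys, 8.0))   # at a probed setting: low uncertainty
print(predict(xs, ys, 9.0))   # between probes: some uncertainty
```

The returned standard deviation is what "captures the uncertainty" in practice: it shrinks near settings that have already been probed and grows away from them.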
27 .Opportunity: We can now estimate the expected improvement EIP(X) from doing a probe at any setting X
$\mathrm{EIP}(X) = \int_{p=-\infty}^{p=y(X^{*})} \left( y(X^{*}) - p \right)\, \mathrm{pdf}_{\hat{y}(X)}(p)\, dp$
• $y(X^{*}) - p$: improvement at any setting X over the best performance seen so far
• $\mathrm{pdf}_{\hat{y}(X)}(p)$: probability density function (uncertainty estimate)
Gaussian Process model helps estimate EIP(X)
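When the prediction at X is Gaussian with mean μ and standard deviation σ, the EIP integral above has a standard closed form: (y(X*) − μ)·Φ(z) + σ·φ(z), with z = (y(X*) − μ)/σ, where Φ and φ are the standard normal CDF and PDF. The sketch below evaluates that expression. It assumes, as the integral's limits imply, that lower y is better (for example, a runtime); the function and argument names are illustrative.

```python
import math

def expected_improvement(mu, sigma, best_y):
    """Closed-form EIP for a Gaussian prediction N(mu, sigma^2).

    best_y is y(X*), the best (lowest) performance value seen so far.
    """
    if sigma <= 0.0:
        # No uncertainty: the improvement is known exactly.
        return max(best_y - mu, 0.0)
    z = (best_y - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best_y - mu) * cdf + sigma * pdf

# Same predicted mean, different uncertainty: the more uncertain
# setting has the larger expected improvement.
print(expected_improvement(mu=6.0, sigma=0.5, best_y=6.0))
print(expected_improvement(mu=6.0, sigma=2.0, best_y=6.0))
```

The two terms show the trade-off directly: the first rewards settings predicted to beat the current best, and the second rewards settings where the model is still uncertain.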
28 .Probe Algorithm
1. Bootstrap: Get initial set of monitoring data from history or via probes: <X1,y1>, <X2,y2>, …, <Xn,yn>
2. Select next probe Xnext based on all history and probe data available so far to calculate the setting with maximum expected improvement EIP(X)
Repeat until the stopping condition is reached
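The two-step probe algorithm above can be sketched as a loop. Several pieces are stand-ins chosen to keep the example short and are not from the talk: the surrogate is a crude nearest-neighbor predictor rather than a Gaussian Process, the stopping condition is a probe budget plus a minimum EIP threshold, X is a single integer knob, and `toy_runtime` is a made-up response surface where lower is better.

```python
import math

def expected_improvement(mu, sigma, best_y):
    """Closed-form EIP for a Gaussian prediction; lower y is better."""
    if sigma <= 0.0:
        return max(best_y - mu, 0.0)
    z = (best_y - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best_y - mu) * cdf + sigma * pdf

def nearest_neighbor_surrogate(history, x):
    """Stand-in for the Gaussian Process: predict the nearest probed y,
    with uncertainty growing with the distance to that probe."""
    nearest_x, nearest_y = min(history, key=lambda h: abs(h[0] - x))
    return float(nearest_y), 0.5 * abs(nearest_x - x)

def tune(run_probe, candidates, bootstrap, predict,
         max_probes=10, min_eip=1e-3):
    history = list(bootstrap)                 # step 1: bootstrap data
    for _ in range(max_probes):               # assumed stopping condition
        best_y = min(y for _, y in history)
        probed = {x for x, _ in history}
        best_eip, x_next = 0.0, None
        for x in candidates:                  # step 2: maximize EIP(X)
            if x in probed:
                continue
            mu, sigma = predict(history, x)
            eip = expected_improvement(mu, sigma, best_y)
            if eip > best_eip:
                best_eip, x_next = eip, x
        if x_next is None or best_eip < min_eip:
            break
        history.append((x_next, run_probe(x_next)))   # do the probe
    return min(history, key=lambda h: h[1]), history

def toy_runtime(x):
    """Made-up response surface with its minimum at x = 7."""
    return (x - 7) ** 2 + 3

bootstrap = [(2, toy_runtime(2)), (12, toy_runtime(12))]
best, history = tune(toy_runtime, range(1, 13), bootstrap,
                     nearest_neighbor_surrogate)
print(best)
print(history)
```

Passing the surrogate in as `predict` keeps the loop independent of the model, so a Gaussian Process predictor could replace the stand-in without changing `tune`.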
29 .[Figure: performance y and expected improvement EIP(X) plotted against X; the point Xnext is marked “Do next probe here”] This approach balances Exploration vs. Exploitation