Lessons Learned Using Apache Spark for Self-Service Data Prep in SaaS World
1 .Lessons Learned Using Apache Spark for Self-Service Data Prep (and More) in SaaS World Pavel Hardak (Product Manager, Workday) Jianneng Li (Software Engineer, Workday) #UnifiedAnalytics #SparkAISummit
2 . Safe Harbor Statement This presentation may contain forward-looking statements for which there are risks, uncertainties, and assumptions. If the risks materialize or assumptions prove incorrect, Workday’s business results and directions could differ materially from results implied by the forward-looking statements. Forward-looking statements include any statements regarding strategies or plans for future operations; any statements concerning new features, enhancements or upgrades to our existing applications or plans for future applications; and any statements of belief. Further information on risks that could affect Workday’s results is included in our filings with the Securities and Exchange Commission which are available on the Workday investor relations webpage: www.workday.com/company/investor_relations.php Workday assumes no obligation for and does not intend to update any forward-looking statements. Any unreleased services, features, functionality or enhancements referenced in any Workday document, roadmap, blog, our website, press release or public statement that are not currently available are subject to change at Workday’s discretion and may not be delivered as planned or at all. Customers who purchase Workday, Inc. services should make their purchase decisions upon services, features, and functions that are currently available.
3 .Agenda ● Workday - Finance and HCM in the cloud ● Workday Platform - “Power of One” ● Prism Analytics - Powered by Apache Spark ● Production Stories & Lessons Learned ● Questions
4 . [Diagram: Plan, Execute, Analyze cycle covering Financial Management, Human Capital Management, Planning, and Prism Analytics and Reporting] ● “Pure” SaaS apps suite ○ Finance and HCM ● Customers: 2,500+ ○ 200+ of Fortune 500 ● Revenue: $2.82B ○ Growth: 32% YoY
5 . Workday Confidential
6 .One Source for Data | One Security Model | One Experience | One Community. One Platform: Business Process Framework, Object Data Model, Reporting and Analytics, Security, Machine Learning, Integration Cloud
7 . One Platform (as on slide 6). Object Data Model: Durable, Extensible, Metadata
8 . One Platform (as on slide 6). Security: Encryption, Privacy and Compliance, Trust
9 . One Platform (as on slide 6). Reporting and Analytics: Dashboards, Distribution, Collaboration
10 . [Diagram: Plan, Execute, Analyze cycle covering Financial Management, Human Capital Management, Planning, and Prism Analytics and Reporting]
11 . [Diagram: Plan, Execute, Analyze cycle labeled with products: Workday Financial Management, Workday Human Capital Management, Workday Planning, and Workday Prism Analytics and Reporting] Prism Analytics: Integrate 3rd Party Data, Data Management, Data Preparation, Data Discovery, Report Publishing
12 .Workday Prism Analytics The full spectrum of Finance and HCM insights, all within Workday. Workday Data + Non-Workday Data
13 . Prism Analytics Workflow ● Acquisition: Ingest from Finance, HCM, Operational, CRM, Service ticketing, Surveys, Point of Sale, Industry systems, Stock grants, Legacy systems, More… ● Preparation: Cleanse and Transform, Map, Blend Datasets, Apply Security Permissions, Publish Data Source ● Analysis: Data Discovery, Reporting, Worksheets
14 .Spark in Prism Analytics [Diagram: three Prism workloads, Interactive Data Prep, Data Prep Publishing, and Query Engine, each with its own Spark Driver and Spark Executors, running on YARN over HDFS / S3]
15 . Interactive Data Prep in Prism [Screenshot callouts: Number of samples, Examples and statistics, Transform Stages]
16 .Interactive Data Prep in Prism
17 .Interactive Data Prep in Prism: Powered by Spark [Screenshot callout: Edit Transform]
18 .Data Prep Publishing in Prism: Also powered by Spark
19 .Data Prep: Interactive vs. Publishing
|           | Interactive        | Publishing             |
| Data size | 100 - 100K rows    | Billions of rows       |
| Sampling  | Yes                | No                     |
| Caching   | Yes                | No                     |
| Latency   | Seconds            | Minutes to hours       |
| Result    | Returned in memory | Written to disk        |
| SLA       | Best effort        | Consistent performance |
20 .Data Prep: Interactive vs. Publishing: Same plan!
21 .Prism Logical Model
22 .Prism Logical Model • Superset of SQL operators • Compiles to Spark plans through Spark SQL • Implements custom Catalyst rules and strategies
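The last bullet refers to Spark's extension points for Catalyst. As a rough illustration of what a custom rule looks like, the sketch below defines a small optimizer rule and registers it on a session. `DropTrueFilter` and `CatalystRuleSketch` are hypothetical names chosen for this example, not Prism's actual rules, and the code assumes `spark-sql` is on the classpath.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan}
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.types.BooleanType

// Hypothetical rule (not from Prism): removes Filter nodes whose
// condition is the boolean literal TRUE, since they keep every row.
object DropTrueFilter extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan.transform {
    case Filter(Literal(true, BooleanType), child) => child
  }
}

object CatalystRuleSketch {
  // Appends the rule to the session's extra optimizer rules.
  def register(spark: SparkSession): Unit =
    spark.experimental.extraOptimizations =
      spark.experimental.extraOptimizations :+ DropTrueFilter
}
```

Custom planning strategies are registered in a similar way through `spark.experimental.extraStrategies`; how Prism wires its own rules and strategies is not shown in the slides.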
23 .Example: Interactive Data Prep Operators: IngestSampler (Prism Logical Plan) → LogicalIngestSampler (Spark Logical Plan) → IngestSamplerExec (Spark Physical Plan) → IngestSamplerRDD (RDD)
24 .Prism Data Types
25 .Implementing Additional Data Types • Prism has a richer type system than Catalyst • Uses StructType and StructField to implement additional data types
26 .Example: Prism Currency Type
import org.apache.spark.sql.types._
// A currency value is modeled as a struct of a decimal amount and a currency code.
object CurrencyType extends StructType(
  Array(
    StructField("amount", DecimalType(26, 6)),
    StructField("code", StringType)))
>> { "amount": 1000.000000, "code": "USD" }
>> { "amount": -999.000000, "code": "YEN" }
27 .Lessons Learned
28 .Lesson #1: Nested SQL
29 .Lesson #1: Nested SQL
• SQL requires computed columns to be nested
  – SELECT 1 as c1, c1 + 1 as c2; /* ✗ */
  – SELECT c1 + 1 as c2 FROM (SELECT 1 as c1); /* ✓ */
• First version: one nesting per computed column
  – Does not scale to 100s of columns
  – Takes a long time to compile and optimize
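To make the "one nesting per computed column" approach concrete, here is a minimal sketch in plain Scala of how such SQL text might be generated. `ComputedColumn`, `NestedSqlSketch`, and the table name `src` are assumptions made for illustration and do not come from Prism's code base.

```scala
// Hypothetical model: an output name plus a SQL expression that may
// refer to previously defined computed columns by name.
final case class ComputedColumn(name: String, expr: String)

object NestedSqlSketch {
  // Naive generation: every computed column wraps the query built so far
  // in one more subquery, so nesting depth equals the number of columns.
  def nestPerColumn(baseTable: String, columns: Seq[ComputedColumn]): String =
    columns.zipWithIndex.foldLeft(baseTable) { case (inner, (col, i)) =>
      // The first column selects from the base table; later ones select
      // from the previous query, aliased as t1, t2, ...
      val from = if (i == 0) inner else s"($inner) AS t$i"
      s"SELECT *, ${col.expr} AS ${col.name} FROM $from"
    }

  def main(args: Array[String]): Unit = {
    val cols = Seq(ComputedColumn("c1", "1"), ComputedColumn("c2", "c1 + 1"))
    // Prints: SELECT *, c1 + 1 AS c2 FROM (SELECT *, 1 AS c1 FROM src) AS t1
    println(nestPerColumn("src", cols))
  }
}
```

With hundreds of computed columns this produces hundreds of nested subqueries, which is consistent with the slow compile and optimize times the slide reports.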