1. Continuous Applications at the Scale of 100 Teams with Databricks Delta and Structured Streaming
Viacheslav Inozemtsev, Max Schultze
April 25, 2019
2. Outline
● Introduction of Zalando
● Zalando’s Processing Platform
● Databricks Use Cases
● Lessons Learned
3. Who we are
Viacheslav Inozemtsev
● Data Engineer
● Degrees in Applied Math and in Computer Science
● Working with Spark since 0.9.2
Max Schultze
● Data Engineer
● MSc in Computer Science
● Took part in early development of Apache Flink
4. Introduction of Zalando
5-10. Zalando’s Data Lake (architecture diagram, built up over six slides; labels: Ingestion, Storage, Serving, Data Center, DWH, Event Bus, Metastore, Ad-Hoc Querying, Processing Platform)
11. Zalando’s Databricks Processing Platform
12. Zalando’s Databricks Processing Platform - Technical Setup
17-20. Zalando’s Databricks Processing Platform - Organizational Setup
Introduction to Databricks
● RSA
● Office Hours
Initial Setup
● Inner Source Configuration
Development Phase
● Office Hours
● Guest Developer
Productionizing
● 24/7 Support
21. Databricks Use Cases
22-23. Batch Ingestion from Data Warehouse (data lake diagram with the labels Ingestion, Storage, Serving, Data Center, DWH, Event Bus, Metastore, Ad-Hoc Querying, Processing Platform)
24-27. Batch Ingestion from Data Warehouse
● Problem 1: extraction from databases via JDBC can be slow
● Solution:
  ○ use the parallelism of the Spark JDBC reader
  ○ for partitioned tables, a view with a PARTITION_ID column can be created
  ○ works especially well for tables partitioned across multiple machines
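The solution to Problem 1 can be sketched in PySpark. This is a minimal sketch and not the speakers' actual code: the helper names, the view name, and the JDBC URL in the usage note are hypothetical, and it assumes the view exposes an integer PARTITION_ID column with values 0 to N-1. It relies on the `predicates` argument of `DataFrameReader.jdbc`, where each predicate defines one partition of the resulting DataFrame, so each warehouse partition is fetched by its own query.

```python
from typing import Dict, List


def partition_predicates(num_partitions: int,
                         column: str = "PARTITION_ID") -> List[str]:
    """Build one WHERE-clause predicate per warehouse partition.

    Assumes (hypothetically) that `column` holds integers 0..num_partitions-1.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    return [f"{column} = {i}" for i in range(num_partitions)]


def read_partitioned_view(spark, url: str, view: str, num_partitions: int,
                          properties: Dict[str, str]):
    """Read a view through JDBC with one Spark partition per predicate.

    `spark` is an existing SparkSession; `view` is the (hypothetical)
    database view that adds the PARTITION_ID column to the source table.
    """
    return spark.read.jdbc(
        url=url,
        table=view,
        predicates=partition_predicates(num_partitions),
        properties=properties,
    )
```

With a live session, a call could look like `read_partitioned_view(spark, "jdbc:postgresql://dwh-host/db", "sales_partitioned_v", 16, {"user": "...", "password": "..."})`, where every value is a placeholder. Using predicates rather than a numeric `column`/`lowerBound`/`upperBound` range keeps each Spark task aligned with exactly one database partition.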
28-29. Batch Ingestion from Data Warehouse
● Problem 2: the data warehouse is often still on premises
● Solution:
  ○ resolve this early!







