【05 大沙】2018_SH_FlinkMeetup(1)
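1 . The Present and Future of Flink Computing. Joining the Flink Ecosystem, Contributing to Open Source Development. Shaoxuan Wang (王绍翾, "Dasha" 大沙), July 2018. Flink China Meetup, Shanghai 2018.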
2 . Shaoxuan Wang (王绍翾; Alibaba nickname "Dasha" 大沙), Flink Committer since 2017
• Education: Peking University (EECS); University of California at San Diego (Computer Engineer)
• Career: Broadcom (High-Perf Platform); Facebook (Social Graph Storage); Alibaba Group (Real-Time Data Infra)
• Contact: wshaoxuan@gmail.com, shaoxuan@apache.org
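3 . Core technologies of stream computing: low latency, fast fault tolerance, a general-purpose API, ease of use, elasticity, high performance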
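4 .
• Low latency: inherent in Flink's pure streaming computation
• Led the definition of Flink SQL semantics (contributed to the community): Dynamic Table, 2016-2017; Retraction, 2016-2017
• Optimization of large-scale, high-parallelism deployments, 2016 (contributed to the community)
• Completing Flink SQL functionality (contributed to the community): Aggregation, Join, Window, 2017; all TPC-H queries running, 2018
• Fast fault tolerance (contributed to the community): Incremental Checkpoint, 2016; Fine-grained Recovery, 2016
• Performance improvements (partially contributed to the community): Barrier Alignment Improvement, 2017; a large number of Query Optimizations, 2017-2018
• Performance improvements (partially contributed to the community): Async Operator, 2016; Credit Based Flow Control, 2017; Load Auto Balance, 2017
• Automated resource configuration, 2018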
5 . Same content as slide 4, with one addition: for more details, see our talks at Flink Forward and Hadoop Summit over the past three years.
6 . Unifying stream and batch. Batch processing vs. stream processing: correctness vs. real-time; batch returns one final result, whereas stream emits results as early as possible. Stream processing emits intermediate results and keeps refining them to ensure correctness.
7 . Unifying stream and batch
• WHAT results are calculated, and HOW: can be fully described by SQL
• WHEN to emit an (intermediate) result: does not affect business logic
• HOW to refine the results: can be solved by the SQL engine
8 . Unifying stream and batch (same three points as slide 7). ANSI SQL can describe stream processing; SQL can be used to unify stream and batch computation.
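The claim on slides 6 to 8, that a stream job emits intermediate results and keeps refining them until they agree with what a batch job would return, can be illustrated with a minimal sketch. It is not Flink or Blink code: the function names, the `("+", key, value)` / `("-", key, value)` changelog encoding, and the sample data are all assumptions made for illustration. The same logical query (a count per key) is evaluated once in batch style and once in stream style with retractions.

```python
from collections import Counter, defaultdict


def batch_count(events):
    """Batch mode: read all input, then return one final result per key."""
    return dict(Counter(events))


def stream_count(events):
    """Stream mode: emit a refined result after every input record.

    An update is a retraction ("-") of the previously emitted value,
    followed by the new value ("+"). This encoding is an assumption.
    """
    counts = defaultdict(int)
    for key in events:
        if counts[key] > 0:
            yield ("-", key, counts[key])  # withdraw the stale result
        counts[key] += 1
        yield ("+", key, counts[key])      # emit the refined result


def materialize(changelog):
    """Apply a changelog to an empty table to obtain the current result."""
    table = {}
    for op, key, value in changelog:
        if op == "+":
            table[key] = value
        elif table.get(key) == value:
            del table[key]
    return table


clicks = ["alice", "bob", "alice", "alice", "bob"]
changelog = list(stream_count(clicks))

# Once all input has been consumed, the refined stream result equals
# the batch result for the same query.
assert materialize(changelog) == batch_count(clicks)
```

The business logic (count per key) is identical in both functions; only the timing of emission and the refinement mechanism differ, which is the part slide 7 assigns to the engine rather than to the query.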
9 . Introducing Alibaba Blink
• Blink 1.0: an enterprise edition of Flink with many improvements
• Blink 2.0: a new unified, high-performance compute engine for complete data applications
• [Diagram labels: Alibaba's improvements; Apache Flink; contributed by Alibaba]
10 . Architecture of the Blink SQL Engine
• Same SQL query in, same results out
• Layers: SQL & Table API; Relational Query Processor (Query Optimizer & Query Executor, with batch mode and stream mode); Runtime (DAG API & Operators)
• Deployment: Local (single JVM); Cluster (Standalone, YARN); Cloud (GCE, EC2)
• Pipeline: SQL & Table API, Logical Plan, Optimizer, Physical Plan, Execution DAG
• Part of the pipeline is exactly the same for batch and stream processing; stream processing has some unique design
11 . Challenges and opportunities in optimizing stream computing (batch processing vs. stream processing)
• Predicate and projection push-down: same as batch
• Join reorder: stream collects stats in different ways
• Sort-related rules: not useful for stream
• Unique to stream design: State (MapState/ValueState), Retraction, EMIT SLA -> MicroBatch
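The "EMIT SLA -> MicroBatch" item on slide 11 is not explained further in the deck. One common reading, assumed here, is that records are buffered for a short interval so that each key's state is read and written once per batch instead of once per record, at the price of results being emitted slightly later. The sketch below is not Blink code; `CountingState`, both functions, and the sample data are hypothetical.

```python
from collections import defaultdict


class CountingState:
    """Hypothetical keyed state that counts its reads and writes."""

    def __init__(self):
        self.data = {}
        self.accesses = 0

    def get(self, key):
        self.accesses += 1
        return self.data.get(key, 0)

    def put(self, key, value):
        self.accesses += 1
        self.data[key] = value


def per_record_sum(records, state):
    """One state read and one state write for every input record."""
    for key, value in records:
        state.put(key, state.get(key) + value)


def micro_batch_sum(records, state, batch_size):
    """Pre-aggregate each batch in memory, then touch state once per key."""
    for start in range(0, len(records), batch_size):
        buffer = defaultdict(int)
        for key, value in records[start:start + batch_size]:
            buffer[key] += value
        for key, delta in buffer.items():
            state.put(key, state.get(key) + delta)


records = [("a", 1), ("a", 2), ("b", 3), ("a", 4), ("b", 5), ("a", 6)]

plain = CountingState()
per_record_sum(records, plain)

batched = CountingState()
micro_batch_sum(records, batched, batch_size=3)

assert plain.data == batched.data          # same final sums
assert batched.accesses < plain.accesses   # fewer state accesses
```

The batch size (or, in a real engine, the buffering interval) bounds how late a result may be emitted, which is presumably why the slide ties micro-batching to an emit SLA.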
12 . Stream processing, TPC-H Q13: state I/O cost plays a big role in plan choice. [Diagram: batch-processing and stream-processing plans over Customer (PK: custID, 150 million) and Order (PK: orderID, 1.5 billion), built from Join on custID and CountAgg operators. The batch plan uses a HashJoin; the stream plans annotate operators with ValueState or MapState and with intermediate sizes of 100 million, 150 million and 1.5 billion. One annotation reads "25x".]
13 . Local-Global Agg to Mitigate Data Skew. [Diagram: a SUM example comparing simple aggregation with (forwarding) local-global aggregation.]
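The SUM diagram on slide 13 survives only as residue, so the following is a minimal sketch of the general local-global idea rather than a reproduction of the slide. Each upstream partition pre-aggregates its own records, and the global aggregator receives one partial sum per key per partition instead of every raw record. The function names and the sample data (including the "hot" key) are assumptions.

```python
from collections import Counter, defaultdict


def simple_agg(partitions):
    """Send every raw record to the global aggregator for its key."""
    totals = defaultdict(int)
    received = Counter()  # records received by the global aggregator, per key
    for partition in partitions:
        for key, value in partition:
            totals[key] += value
            received[key] += 1
    return dict(totals), dict(received)


def local_global_agg(partitions):
    """Pre-aggregate inside each partition, then merge the partial sums."""
    totals = defaultdict(int)
    received = Counter()
    for partition in partitions:
        local = defaultdict(int)
        for key, value in partition:        # local aggregation
            local[key] += value
        for key, partial_sum in local.items():
            totals[key] += partial_sum      # global aggregation
            received[key] += 1
    return dict(totals), dict(received)


# A skewed input: most records carry the key "hot".
partitions = [
    [("hot", 1), ("hot", 3), ("hot", 2), ("cold", 7)],
    [("hot", 4), ("hot", 8)],
    [("hot", 5), ("cold", 1)],
]

simple_totals, simple_load = simple_agg(partitions)
lg_totals, lg_load = local_global_agg(partitions)

assert simple_totals == lg_totals                # same answer
assert lg_load["hot"] < simple_load["hot"]       # less load on the hot key
```

With local pre-aggregation, the load on the task that owns the hot key is bounded by the number of upstream partitions rather than by the number of records carrying that key.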
14 . Local-Global Agg to Mitigate Data Skew. [Diagram: a Count Distinct example over (key, value) records such as (A,1), (B,1), (A,4), flowing through Map, Local Agg and Global Agg stages, comparing (forwarding) local-global aggregation with (keyed-shuffle) local-global aggregation.]
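The Count Distinct diagram on slide 14 cannot be reconstructed in detail. The sketch below assumes that the keyed-shuffle variant resembles a common two-level scheme: records are first shuffled by (key, bucket of the distinct value), so a hot key is spread over several tasks, and the per-bucket distinct counts are then summed per key. This is an illustration under that assumption, not Blink's implementation; `bucket_of`, both functions, and the bucket count are hypothetical.

```python
import zlib
from collections import defaultdict


def bucket_of(value, num_buckets):
    """Deterministically assign a distinct value to a bucket."""
    return zlib.crc32(str(value).encode("utf-8")) % num_buckets


def count_distinct_direct(records):
    """Single-level: every record of a key goes to one aggregator."""
    seen = defaultdict(set)
    for key, value in records:
        seen[key].add(value)
    return {key: len(values) for key, values in seen.items()}


def count_distinct_split(records, num_buckets=4):
    """Two-level: partial distinct counts per (key, bucket), then a sum."""
    # Level 1: shuffle by (key, bucket). A given value always lands in the
    # same bucket, so the buckets of one key hold disjoint value sets.
    partial = defaultdict(set)
    for key, value in records:
        partial[(key, bucket_of(value, num_buckets))].add(value)

    # Level 2: shuffle by key and add up the per-bucket distinct counts.
    result = defaultdict(int)
    for (key, _bucket), values in partial.items():
        result[key] += len(values)
    return dict(result)


records = [("A", 1), ("B", 1), ("A", 4), ("A", 2),
           ("A", 3), ("A", 1), ("B", 2), ("A", 4)]

assert count_distinct_split(records) == count_distinct_direct(records)
```

Because the buckets partition the distinct values of each key, summing the partial counts gives an exact result, not an approximation.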
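15 . Flink today: widely used in ultra-low-latency stream computing scenarios. Alibaba plans to contribute Blink to Flink, to round out batch computing scenarios and improve the overall performance of both stream and batch.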
16 . The future of Flink: RT (real-time) + BI (analytics) + AI (intelligence)
17 . Future application scenarios for Flink: • unified batch and stream computing • AI/ML • IoT • real-time BI • rule engines
18 . Joining the open source community
24 . Contributing to open source development
29 . We are Hiring! Hangzhou / Beijing / Shanghai / Shenzhen, China; Seattle / San Francisco Bay Area, US. blink-jobs@list.alibaba-inc.com. Thanks. Shaoxuan Wang, shaoxuan.wsx@alibaba-inc.com, shaoxuan@apache.org, 2018.7.6