Streaming SQL Basics
1 . Foundations of streaming SQL or: how I learned to love stream & table theory Slides: https://s.apache.org/streaming-sql-qcon-london Tyler Akidau Apache Beam PMC Software Engineer at Google @takidau Covering ideas from across the Apache Beam, Apache Calcite, Apache Kafka, and Apache Flink communities, with thoughts and contributions from Julian Hyde, Fabian Hueske, Shaoxuan Wang, Kenn Knowles, Ben Chambers, Reuven Lax, Mingmin Xu, James Xu, Martin Kleppmann, Jay Kreps and many more, not to mention that whole database community thing... QCon London 2018 1
2 .Table of Contents 01 Stream & Table Theory A Basics Chapter 7 B The Beam Model 02 Streaming SQL A Time-varying relations Chapter 9 B SQL language extensions 2
3 .01 Stream & Table Theory TFW you realize everything you do was invented by the database community decades ago... A Basics B The Beam Model 3
4 .Stream & table basics https://www.confluent.io/blog/making-sense-of-stream-processing/ https://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple/ 4
5 .Special theory of stream & table relativity streams → tables: The aggregation of a stream of updates over time yields a table. tables → streams: The observation of changes to a table over time yields a stream. 5
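The two directions above can be illustrated with a minimal sketch. It assumes a stream is just a sequence of `(key, delta)` updates and a table is a dictionary of per-key sums; the names `stream_to_table` and `table_to_stream` are hypothetical and do not come from the talk.

```python
from typing import Dict, Iterable, Iterator, Tuple

Update = Tuple[str, int]  # (key, delta): one record in a stream of updates


def stream_to_table(updates: Iterable[Update]) -> Dict[str, int]:
    """streams -> tables: aggregate updates over time into per-key sums."""
    table: Dict[str, int] = {}
    for key, delta in updates:
        table[key] = table.get(key, 0) + delta
    return table


def table_to_stream(updates: Iterable[Update]) -> Iterator[Tuple[str, int]]:
    """tables -> streams: yield each new row version as the table changes."""
    table: Dict[str, int] = {}
    for key, delta in updates:
        table[key] = table.get(key, 0) + delta
        yield key, table[key]


updates = [("a", 1), ("b", 2), ("a", 3)]
table = stream_to_table(updates)
changelog = list(table_to_stream(updates))
print(table)      # {'a': 4, 'b': 2}
print(changelog)  # [('a', 1), ('b', 2), ('a', 4)]
```

Replaying the changelog and keeping only the latest value per key (`dict(changelog)`) rebuilds the same table, which is the sense in which the two forms are interchangeable.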
6 .Non-relativistic stream & table definitions Tables are data at rest. Streams are data in motion. 6
7 .01 Stream & Table Theory TFW you realize everything you do was invented by the database community decades ago... A Basics B The Beam Model 7
8 .The Beam Model What results are calculated? Where in event time are results calculated? When in processing time are results materialized? How do refinements of results relate? 8
9 .Reconciling streams & tables w/ the Beam Model ● How does batch processing fit into all of this? ● What is the relationship of streams to bounded and unbounded datasets? ● How do the four what, where, when, how questions map onto a streams/tables world? 9
10 .MapReduce: input → Map → Reduce → output 10
11 .MapReduce: input → MapRead → Map → MapWrite → ReduceRead → Reduce → ReduceWrite → output 11
12 .MapReduce: ? → MapRead → ? → Map → ? → MapWrite → ? → ReduceRead → ? → Reduce → ? → ReduceWrite → ? 12
13 .MapReduce: table → MapRead → ? → Map → ? → MapWrite → ? → ReduceRead → ? → Reduce → ? → ReduceWrite → table 13
14 .Map phase: table → MapRead → ? → Map → ? → MapWrite → ? 14
15 .Map phase API void map(K1 key, V1 value, Emit<K2, V2>); 15
17 .Map phase: table → MapRead → stream → Map → ? → MapWrite → ? 17
18 .Map phase API void map(K1 key, V1 value, Emit<K2, V2>); 18
19 .Map phase: table → MapRead → stream → Map → stream → MapWrite → ? 19
20 .Map phase API void map(K1 key, V1 value, Emit<K2, V2>); void reduce(K2 key, Iterable<V2> value, Emit<V3>); 20
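The two signatures can be mirrored in a small sketch, here as a word count. It assumes the unnamed `Emit` parameter is a callback that the function calls once per output record; `map_fn`, `reduce_fn`, and the sample input are illustrative names, not part of the talk.

```python
from typing import Callable, Iterable, List, Tuple


def map_fn(key: int, value: str, emit: Callable[[str, int], None]) -> None:
    """Called once per input record; may emit zero or more (K2, V2) pairs."""
    for word in value.split():
        emit(word, 1)


def reduce_fn(key: str, values: Iterable[int], emit: Callable[[int], None]) -> None:
    """Called once per key with all of that key's grouped values; emits V3 results."""
    emit(sum(values))


# Map consumes one record at a time and emits records one at a time.
mapped: List[Tuple[str, int]] = []
map_fn(0, "to be or not to be", lambda k, v: mapped.append((k, v)))

# Reduce needs every value for a key gathered first, which implies a grouping step.
reduced: List[int] = []
reduce_fn("to", [v for k, v in mapped if k == "to"], reduced.append)

print(mapped[:2])  # [('to', 1), ('be', 1)]
print(reduced)     # [2]
```

The shapes of the two signatures carry the point: `map` sees single records in motion, while `reduce` receives an `Iterable` of values that had to be collected at rest before it could run.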
21 .Map phase: table → MapRead → stream → Map → stream → MapWrite → table 21
22 .MapReduce: table → MapRead → stream → Map → stream → MapWrite → table → ReduceRead → ? → Reduce → ? → ReduceWrite → table 22
23 .Map phase API void map(K1 key, V1 value, Emit<K2, V2>); void reduce(K2 key, Iterable<V2> value, Emit<V3>); 23
25 .MapReduce: table → MapRead → stream → Map → stream → MapWrite → table → ReduceRead → stream → Reduce → stream → ReduceWrite → table 25
26 .Reconciling streams & tables w/ the Beam Model ● How does batch processing fit into all of this? 1. Tables are read into streams. 2. Streams are processed into new streams until a grouping operation is hit. 3. Grouping turns the stream into a table. 4. Repeat steps 1-3 until you run out of operations. ● What is the relationship of streams to bounded and unbounded datasets? ● How do the four what, where, when, how questions map onto a streams/tables world? 26
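The four steps can be traced through the six MapReduce phases with a rough sketch. It assumes a table is a Python `dict` and a stream is a generator; all function names are hypothetical, and the word-count logic is only a stand-in for real map and reduce code.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_read(table: Dict[int, str]) -> Iterator[Tuple[int, str]]:
    """Step 1: read a table into a stream of records."""
    yield from table.items()


def map_phase(stream: Iterable[Tuple[int, str]]) -> Iterator[Tuple[str, int]]:
    """Step 2: process a stream into a new stream (no grouping yet)."""
    for _line_no, line in stream:
        for word in line.split():
            yield word, 1


def map_write(stream: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Step 3: grouping by key brings the stream to rest as a table."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in stream:
        grouped[key].append(value)
    return dict(grouped)


def reduce_read(table: Dict[str, List[int]]) -> Iterator[Tuple[str, List[int]]]:
    """Step 4 (repeat of step 1): read the grouped table back into a stream."""
    yield from table.items()


def reduce_phase(stream: Iterable[Tuple[str, List[int]]]) -> Iterator[Tuple[str, int]]:
    """Repeat of step 2: stream in, stream out."""
    for word, counts in stream:
        yield word, sum(counts)


def reduce_write(stream: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Repeat of step 3: write the final stream out as a table."""
    return dict(stream)


input_table = {0: "to be or not to be", 1: "to stream or to table"}
output_table = reduce_write(
    reduce_phase(reduce_read(map_write(map_phase(map_read(input_table)))))
)
print(output_table)
# {'to': 4, 'be': 2, 'or': 2, 'not': 1, 'stream': 1, 'table': 1}
```

Only `map_write` and `reduce_write` materialize anything; every other phase is a lazy generator, which matches the table → stream → stream → table pattern in the diagram.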
27 .Reconciling streams & tables w/ the Beam Model ● How does batch processing fit into all of this? ● What is the relationship of streams to bounded and unbounded datasets? Streams are the in-motion form of data, both bounded and unbounded. ● How do the four what, where, when, how questions map onto a streams/tables world? 27
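The answer on this slide, that a stream is data in motion whether the dataset is bounded or unbounded, can be shown with a small sketch. It assumes a stream is any Python iterable; `running_sum` is a hypothetical transform chosen only for illustration.

```python
import itertools
from typing import Iterable, Iterator


def running_sum(stream: Iterable[int]) -> Iterator[int]:
    """A stream-to-stream transform that never needs to know where the input ends."""
    total = 0
    for x in stream:
        total += x
        yield total


bounded = [1, 2, 3]             # a finite dataset, read as a stream
unbounded = itertools.count(1)  # 1, 2, 3, ... with no end

print(list(running_sum(bounded)))                         # [1, 3, 6]
print(list(itertools.islice(running_sum(unbounded), 5)))  # [1, 3, 6, 10, 15]
```

The same transform handles both inputs unchanged; the only difference is that the unbounded case has to be cut off by the consumer (here with `islice`) rather than by the data running out.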
28 .Reconciling streams & tables w/ the Beam Model ● How does batch processing fit into all of this? ● What is the relationship of streams to bounded and unbounded datasets? ● How do the four what, where, when, how questions map onto a streams/tables world? 28
29 .The Beam Model What results are calculated? Where in event time are results calculated? When in processing time are results materialized? How do refinements of results relate? 29