Stream&Segment - best way to access events in Pulsar
1 .Stream/Segment - best way to access events in Pulsar ❏ Neng Lu ❏ streamnative.io
2 .Who Am I ❏ StreamNative Software Engineer ❏ Ex-Twitter ❏ Contributed to Apache Projects - Heron, Pulsar ❏ Interested in event streaming technologies
3 .Pulsar 1.X
4 .Apache Pulsar “Flexible Pub/Sub Messaging Backed by Durable Log Storage”
5 .Pulsar 2.X
6 .Apache Pulsar “Cloud-native Messaging and Event Streaming Platform”
7 .Pulsar Use Cases ❏ Unified Event Center/Bus (Queuing + Streaming) ❏ Billing Service ❏ Push Notification ❏ Worker Queue ❏ Logging Pipeline ❏ IoT ❏ Streaming-first, unified data processing
8 .Data Processing with Apache Pulsar
12 .Data Processing Categories ❏ Batch: the amount of data is huge; can run on a huge cluster; fine-grained fault tolerance ❏ Streaming: long-running jobs; time critical; scalability as well as fault tolerance ❏ Interactive: time critical; medium data size; rerun on failures ❏ Serverless: simple, light-weight processing; processing data with high velocity
13 .Apache Pulsar Layered Architecture ❏ Stateless serving layer (brokers) ❏ Durable storage layer (bookies)
14 .Pulsar Messaging API ❏ Read data from brokers with different Subscription Modes ❏ Consume / Seek / Receive ❏ Reprocessing data by rewinding (seeking) the cursors
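
The rewind idea on this slide can be sketched with a toy cursor over an append-only log. This is a minimal illustration, not the Pulsar client: `ToySubscription`, `receive`, and `seek` here are hypothetical names that only mimic the behaviour described above, and positions are plain list indices rather than Pulsar message IDs.

```python
from dataclasses import dataclass


@dataclass
class ToySubscription:
    """Hypothetical stand-in for a subscription cursor over an append-only log."""
    log: list            # shared, append-only list of messages
    position: int = 0    # index of the next message to deliver

    def receive(self):
        """Return the next message and advance the cursor, or None at the end."""
        if self.position >= len(self.log):
            return None
        message = self.log[self.position]
        self.position += 1
        return message

    def seek(self, position):
        """Move the cursor; the log itself is never modified."""
        if not 0 <= position <= len(self.log):
            raise ValueError("position out of range")
        self.position = position


log = ["m0", "m1", "m2", "m3"]
sub = ToySubscription(log)
first_pass = [sub.receive() for _ in range(4)]
sub.seek(1)  # rewind the cursor to reprocess from the second message
second_pass = [sub.receive() for _ in range(3)]
```

Because the cursor is the only state that moves, reprocessing costs nothing on the storage side: the same stored messages are simply delivered again.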
15 .Subscription Mode ❏ Exclusive ❏ Failover ❏ Shared ❏ Key_Shared
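
To make the difference between the four modes concrete, here is a toy dispatcher. It is a sketch under simplifying assumptions, not how the broker is implemented: `dispatch` is a hypothetical function, Failover simply treats the first consumer as the active one, Shared uses plain round-robin, and Key_Shared uses a CRC32 hash modulo the consumer count in place of the broker's real key-to-consumer assignment.

```python
import zlib
from collections import defaultdict


def dispatch(messages, consumers, mode):
    """Assign (key, payload) messages to consumers under a toy model of each mode."""
    if not consumers:
        raise ValueError("at least one consumer is required")
    assignment = defaultdict(list)
    if mode == "Exclusive":
        # Only a single consumer may attach to the subscription.
        if len(consumers) != 1:
            raise ValueError("Exclusive allows only one consumer")
        assignment[consumers[0]].extend(messages)
    elif mode == "Failover":
        # Several consumers may attach, but only one is active;
        # the others are standbys (assumption: the first one is active).
        assignment[consumers[0]].extend(messages)
    elif mode == "Shared":
        # Messages are spread across all consumers (round-robin here).
        for i, msg in enumerate(messages):
            assignment[consumers[i % len(consumers)]].append(msg)
    elif mode == "Key_Shared":
        # Messages with the same key always reach the same consumer.
        for key, payload in messages:
            slot = zlib.crc32(key.encode("utf-8")) % len(consumers)
            assignment[consumers[slot]].append((key, payload))
    else:
        raise ValueError(f"unknown mode: {mode}")
    return dict(assignment)
```

Exclusive and Failover preserve the full order of the topic for one reader, Shared trades ordering for parallelism, and Key_Shared keeps ordering per key while still spreading load.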
16 .Pulsar Segment API ❏ Read data from storage (bookkeeper or tiered storage) ❏ Fine-grained Parallelism ❏ Predicate pushdown (publish timestamp)
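
The two ideas on this slide, pruning by publish timestamp and reading segments in parallel, can be sketched as follows. This is an illustrative model only: `ToySegment`, `read_segment`, and `scan` are hypothetical names, segments are in-memory lists assumed to be non-empty and ordered by publish time, and a thread pool stands in for the parallel readers a query engine would use.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class ToySegment:
    segment_id: int
    entries: list  # non-empty list of (publish_time_ms, payload), ordered by time

    @property
    def min_ts(self):
        return self.entries[0][0]

    @property
    def max_ts(self):
        return self.entries[-1][0]


def read_segment(segment, start_ts, end_ts):
    """Read one segment, keeping only entries inside the time range."""
    return [payload for ts, payload in segment.entries if start_ts <= ts <= end_ts]


def scan(segments, start_ts, end_ts, workers=4):
    """Return (ids of segments actually read, matching payloads in segment order)."""
    # Predicate pushdown: skip segments whose time range cannot overlap the query.
    candidates = [s for s in segments if s.max_ts >= start_ts and s.min_ts <= end_ts]
    # Fine-grained parallelism: each surviving segment is read independently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = list(pool.map(lambda s: read_segment(s, start_ts, end_ts), candidates))
    return [s.segment_id for s in candidates], [p for chunk in chunks for p in chunk]


segments = [
    ToySegment(0, [(100, "a"), (200, "b")]),
    ToySegment(1, [(300, "c"), (400, "d")]),
    ToySegment(2, [(500, "e"), (600, "f")]),
]
read_ids, payloads = scan(segments, 250, 450)
```

In this example only segment 1 is opened; segments 0 and 2 are skipped from their timestamp bounds alone, without reading any of their entries.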
17 .Segment Centric Storage ❏ Topic Partition (Managed Ledger) ❏ The storage layer for a single topic partition ❏ Segment (Ledger) ❏ Single writer, append-only ❏ Replicated to multiple bookies
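
The structure on this slide can be modelled roughly as below. This is a simplified sketch, not BookKeeper's actual behaviour: `ToyManagedLedger` is a hypothetical class, a single `replicas` number stands in for the real ensemble and quorum settings, bookies are chosen round-robin, and a ledger rolls over after a fixed entry count.

```python
class ToyManagedLedger:
    """Hypothetical model of a topic partition as a sequence of ledgers (segments)."""

    def __init__(self, bookies, replicas=2, max_entries_per_ledger=3):
        if replicas > len(bookies):
            raise ValueError("not enough bookies for the requested replicas")
        self.bookies = list(bookies)
        self.replicas = replicas
        self.max_entries = max_entries_per_ledger
        self.ledgers = []
        self._next_bookie = 0
        self._open_ledger()

    def _open_ledger(self):
        # Each new ledger gets its own set of bookies (round-robin in this toy).
        n = len(self.bookies)
        ensemble = [self.bookies[(self._next_bookie + i) % n] for i in range(self.replicas)]
        self._next_bookie = (self._next_bookie + 1) % n
        self.ledgers.append(
            {"id": len(self.ledgers), "bookies": ensemble, "entries": [], "sealed": False}
        )

    def append(self, entry):
        """Append to the single open ledger; return (ledger_id, entry_id)."""
        current = self.ledgers[-1]
        if len(current["entries"]) >= self.max_entries:
            current["sealed"] = True  # sealed ledgers are never written again
            self._open_ledger()
            current = self.ledgers[-1]
        current["entries"].append(entry)
        return current["id"], len(current["entries"]) - 1


ml = ToyManagedLedger(["bookie1", "bookie2", "bookie3"])
positions = [ml.append(f"event-{i}") for i in range(7)]
```

Only the last ledger is ever open for writes, and every earlier ledger is immutable, which is what lets different segments of the same partition live on different bookies.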
18 .Tiered Storage ❏ Long retention ❏ Low cost ❏ Easy to access
19 .Apache Pulsar Data APIs [Diagram: producers and consumers use the Messaging API through Broker 1 to Broker 3; the Segment API reads directly from Bookie 1 to Bookie 5 and from tiered storage (S3, GCS, Hadoop)]
20 .Pulsar - Infinite Event Stream Storage
21 .Pulsar - Topic
22 .Pulsar - Topic Partitions
23 .Pulsar - Segments
24 .Pulsar - Stream
25 .Pulsar - Infinite Event Stream Storage
26 .Benefits ❏ Unlimited Topic Partition Storage ❏ Instant Scaling without Data Rebalancing ❏ Broker Failure Recovery ❏ Bookie Failure Recovery ❏ Cluster Expansion ❏ Low latency reading for messaging data ❏ High throughput reading for batch data ❏ Reduced cost for whole data storage
27 .Pulsar SQL Case
28 .Pulsar Flink Case [Diagram: Flink Job1 and Flink Job2 reading events from Pulsar]
29 .Conclusion ❏ Apache Pulsar is a cloud-native messaging and streaming system ❏ Multi-layered architecture ❏ Segment-centric storage ❏ Two levels of reading API: Pub/Sub + Segment ❏ Apache Pulsar provides a unified view of data