- 快召唤伙伴们来围观吧
- 微博 QQ QQ空间 贴吧
- 文档嵌入链接
- 复制
- 微信扫一扫分享
- 已成功复制到剪贴板
Pulsar介绍
展开查看详情
1 . Apache Pulsar 实时数据处理理中 消息,计算和存储的统⼀一 演讲者/streamlio 翟佳
2 . 4 What’s the state of the art
3 . 5 What’s the state of the art
4 . 6 Apache Pulsar — Unify Messaging Computing Pulsar Broker Pulsar Functions Segment Store Stream Table BookKeeper
5 . 7 Why Apache Pulsar? Durability Ordering Delivery Guarantees Data replicated and Guaranteed ordering At least once, at most synced to disk once and effectively once Geo-replication Multi-tenancy Low Latency Out of box support for A single cluster can Low publish latency of geographically support many tenants 5ms at 99pct distributed and use cases applications Unified messaging High throughput Highly scalable model Can reach 1.8 M Can support millions of Support both messages/s in a topics Streaming and single partition Queuing in a single model
6 . 8 Pulsar Architecture Producer Consumer Separate layers between brokers bookies • Broker and bookies can Pulsar Broker 1 Pulsar Broker 1 Pulsar Broker 1 be added independently • Traffic can be shifted very quickly across Bookie 1 Bookie 2 Bookie 4 Bookie 5 Bookie 3 brokers Apache BookKeeper • New bookies will ramp up on traffic quickly Apache Pulsar
7 . 9 Messaging Messaging Computing Pulsar Broker Pulsar Functions Segment Store Stream Table BookKeeper
8 . 10 Messaging - Concepts
9 . 11 Messaging - Namespace
10 . 12 Messaging - Queuing & Streaming (kafka, kinesis, …) (SQS, ActiveMQ, RabbitMQ, …)
11 . 13 Messaging - ACK Cumulative Individual
12 . 14 Messaging - Retention
13 . 15 Storage Messaging Computing Pulsar Broker Pulsar Functions Segment Store Stream Table BookKeeper
14 . 16 Storage - Apache BookKeeper • A replicated log storage • Low-latency durable writes • Simple repeatable read consistency • Highly available • Store many logs per node • I/O Isolation
15 . 17 Storage - Apache BookKeeper
16 . 18 Storage - Segment Centric
17 . 19 Storage - Segment/Stream/Table Messaging Computing Pulsar Broker Pulsar Functions Segment Store Stream Table BookKeeper
18 . 20 Compute Messaging Computing Pulsar Broker Pulsar Functions Segment Store Stream Table BookKeeper
19 . 21 Compute Representation Abstract View f(x) Incoming Messages Output Messages
20 . 22 Lessons learnt A significant percentage of transformations are simple ETL/Reactive Services/Classification/Real-time Aggregation Event Routing/Microservices The emergence of Serverless Simple Function API Run per event Composition APIs to do complex things Wildly popular
21 . 23 Whats needed: Stream-Native Compute Insight gained from serverless Simplest possible API Method/Procedure/Function Multi Language API Scale developers Stream native concepts Input/Output/Log as topics Flexible runtime Simple standalone applications vs system managed applications
22 . 24 Pulsar Functions — API SDK less API import java.util.function.Function; public class ExclamationFunction implements Function<String, String> { @Override public String apply(String input) { return input + "!"; } } SDK API import org.apache.pulsar.functions.api.PulsarFunction; import org.apache.pulsar.functions.api.Context; public class ExclamationFunction implements PulsarFunction<String, String> { @Override public String process(String input, Context context) { return input + "!"; } }
23 . 25 Pulsar Functions Running as a standalone application bin/pulsar-admin functions localrun \ --input persistent://sample/standalone/ns1/test_input \ --output persistent://sample/standalone/ns1/test_result \ --className org.mycompany.ExclamationFunction \ --jar myjar.jar Runs as a standalone process Run as many instances as you want. Framework automatically balances data Run and manage via Mesos/K8/Nomad/your favorite tool
24 . 26 Pulsar Functions: Use Cases Sensor devices generate tons of data Lot of local actions Edge Computing Simple filtering, threshold detection, regex matching, etc Resource Constrained Limited scope for Full blown schedulers/Job Managers Models computed via offline analysis Model Incoming requests should be classified using the model Serving Function is a natural representation for the classification action Model itself can be stored in Bookkeeper
25 . 27 Pulsar Functions Unify Messaging and Compute cluster into one Function executed for every message of input topic Supports multiple topics as inputs Runtime User Controlled Guarantees: ATMOST_ONCE / ATLEAST_ONCE / EFFECTIVE_ONCE Built-in State Management: Unified Stream & State Store with BookKeeper. Simplified application development
26 . 28 Unified Streaming Solution Messaging Computing Spark DATA Pulsar Broker Pulsar Functions Flink DATA DATA HDFS DATA 。。。 Segment Store Stream Table DATA BookKeeper
27 . 29 Messaging Benchmark https://github.com/openmessaging/openmessaging-benchmark
28 . 30 Benchmark • Testing goals • Throughput & latency under different conditions • Min 2 guaranteed copies • Running on 3 EC2 VMs with local SSDs
29 . 31 Kafka settings • Topic settings replicationFactor=3 min.insync.replicas=2 log.flush.interval.ms= # Using default: means no fsyncs • Kafka producer config acks=all linger.ms=1 batch.size=131072