1. Ownership: A Distributed Futures System for Fine-Grained Tasks
Stephanie Wang, Eric Liang, Edward Oakes, Ben Hindman, Frank Luan, Audrey Cheng, Ion Stoica
2. Outline
1. An overview of distributed futures
2. System requirements and challenges
3. Ownership: Achieving fault tolerance without giving up performance
4. Evaluation
3. RPC model
Driver program:
    o1 = f()
    o2 = f()
    o3 = add(o1, o2)
[Diagram: the driver issues f(), f(), and add(o1, o2) as RPCs to Worker 1 and Worker 2; every result (o1, o2, o3) is copied back to the driver.]
Problems:
● Data movement
● Parallelism
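To make the two problems concrete, here is a minimal sketch of the plain RPC model (the `rpc` helper and worker names are hypothetical, added for illustration and not from the slides): each call blocks the driver, and every intermediate value round-trips through it.

```python
# Sketch of the plain RPC model: each call blocks, and every result is
# copied back to the driver before the next call can be issued.
def rpc(worker, fn, *args):
    """Stand-in for a synchronous RPC: run fn on `worker`, ship the result back."""
    return fn(*args)

def f():
    return 1

def add(a, b):
    return a + b

o1 = rpc("worker1", f)              # value of o1 is copied back to the driver
o2 = rpc("worker2", f)              # cannot start until o1 has returned
o3 = rpc("worker1", add, o1, o2)    # driver re-ships o1 and o2 to the worker
print(o3)                           # 2
```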
4. Data movement: RPC model + distributed memory
Distributed memory: the ability to reference data stored in the memory of a remote process.
● The application can pass data by reference.
● The system manages data movement.
[Diagram: the workers keep o1, o2, and o3 in their own memory; the driver passes references, and the system moves o1 between workers for add(o1, o2).]
5. Parallelism: RPC model + futures
Futures: the ability to reference data that has not yet been computed.
● The application can specify parallelism and data dependencies.
● The system manages task scheduling.
[Diagram: the driver submits f(), f(), and add(o1, o2) without blocking; the two f() calls run in parallel on Worker 1 and Worker 2, and add() runs once both results are ready.]
6. Distributed futures
● Performance: the system handles data movement and parallelism.
● Generality: an RPC-like interface (data is immutable). The application does not specify when or where computation should execute.
[Diagram: the driver submits f(), f(), and add(o1, o2) asynchronously and by reference; the system schedules the tasks and moves data between Worker 1 and Worker 2.]
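For concreteness, the same program written against a distributed futures API. This snippet uses Ray's Python API (one of the systems named on the next slide); it is added here for illustration and is not part of the original slides.

```python
import ray

ray.init()

@ray.remote
def f():
    return 1

@ray.remote
def add(a, b):
    return a + b

o1 = f.remote()            # returns a future (ObjectRef) immediately
o2 = f.remote()            # the two f() calls may run in parallel on any worker
o3 = add.remote(o1, o2)    # pass by reference; the system resolves dependencies
print(ray.get(o3))         # dereference: blocks until the value is ready (prints 2)
```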
7. Distributed futures today
Distributed futures are growing in popularity, with applications in a variety of domains:
● Data processing: CIEL, Dask
● Machine learning: Ray, Distributed PyTorch
Most systems focus on coarse-grained tasks (>100 ms):
● A centralized master for system metadata.
● Lineage reconstruction (re-execution of the tasks that created an object) for fault tolerance.
8. A distributed futures system for fine-grained tasks
For generality, the system must impose low overhead. Analogy: gRPC can execute millions of tasks per second. Can we do the same for distributed futures?
Goal: build a distributed futures system that guarantees fault tolerance with low task overhead. Enable applications that dynamically generate fine-grained tasks.
→ Check out the paper for more details!
9. Outline
1. An overview of distributed futures
2. System requirements and challenges
3. Ownership: Achieving fault tolerance without giving up performance
4. Evaluation
10. Distributed futures introduce shared state
[Diagram: the driver invokes f() twice and add(o1, o2); the legend distinguishes task (RPC) invocations from data dependencies on o1 and o2.]
11. Distributed futures introduce shared state
Multiple processes refer to the same value:
1. The process that specifies how the value is created and used.
2. The process that creates the value.
3. The process that uses the value.
4. The physical location of the value.
Dereferencing a distributed future requires coordination.
12. System requirements
Requirements for dereferencing a value:
● Retrieval: the location of the value.
● Garbage collection: whether the value is referenced.
Requirements in the presence of failures:
● Detection: the location of the task that returns the value.
● Recovery: a description of the task and its dependencies.
● Persistence: metadata should survive failures.
13. System requirements
Requirements for dereferencing a value:
● Retrieval: the location of the value.
● Garbage collection: whether the value is referenced.
Requirements in the presence of failures:
● Detection: the location of the task that returns the value.
● Recovery: a description of the task and its dependencies.
● Persistence: metadata should survive failures.
Challenge: recording this metadata while ensuring latency and throughput for dynamic and fine-grained tasks.
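An illustrative sketch (not the paper's exact schema) of the per-object metadata these requirements imply; the field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ObjectMetadata:
    locations: set = field(default_factory=set)          # Retrieval: where the value lives
    reference_holders: set = field(default_factory=set)  # Garbage collection: who still refers to it
    pending_node: Optional[str] = None                    # Detection: where the creating task runs
    task_spec: Optional[tuple] = None                     # Recovery: (function, dependencies)
    # Persistence: this record itself must survive failures of the process that stores it.
```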
14. Existing solutions
Centralized master:
● Coordination: the master records all metadata updates and handles all failures.
● Performance: can scale through sharding, but high overhead due to synchronous updates.
Leases (decentralized):
● Coordination: workers coordinate among themselves, e.g., using leases to detect a task failure.
● Performance: asynchronous metadata updates; scales by adding more worker nodes.
15. Outline
1. An overview of distributed futures
2. System requirements and challenges
3. Ownership: Achieving fault tolerance without giving up performance
4. Evaluation
16. Our approach: Ownership
Existing solutions do not take advantage of the inherent structure of a distributed futures application:
1. Task graphs are hierarchical.
2. A distributed future is often passed within the scope of the caller.
17. Our approach: Ownership
Existing solutions do not take advantage of the inherent structure of a distributed futures application:
1. Task graphs are hierarchical.
2. A distributed future is often passed within the scope of the caller.
Insight: by leveraging the structure of distributed futures applications, we can decentralize without requiring expensive coordination.
18. Our approach: Ownership
Insight: by leveraging the structure of distributed futures applications, we can decentralize without requiring expensive coordination.
● Architecture: Ownership. The worker that calls a task owns the returned distributed future.
● Failure handling: each worker is a "centralized master" for the objects that it owns.
● Performance: no additional writes on the critical path of task execution; scaling through nested function calls.
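A minimal sketch of the ownership rule (hypothetical class and names, added for illustration): the worker that calls a task becomes the owner of the returned future and keeps its metadata in a purely local table, so nested calls spread the metadata load across workers instead of a central master.

```python
import itertools

class Worker:
    _ids = itertools.count()

    def __init__(self, name):
        self.name = name
        self.owned = {}                     # future id -> metadata (local to this worker)

    def call(self, fn_name, *dep_futures):
        fut = f"obj-{next(Worker._ids)}"
        self.owned[fut] = {"task": (fn_name, dep_futures), "loc": None, "val": None}
        return fut                          # this worker now owns `fut`

a = Worker("A")
x = a.call("B")                             # A owns X
y = a.call("C", x)                          # A owns Y; Y depends on X
```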
19. Ownership: Challenges
● Failure recovery
  ○ Recovering a lost worker
  ○ Recovering a lost owner
● Garbage collection and memory safety
● Handling first-class distributed futures, i.e., distributed futures that leave the caller's scope
20. Ownership: Challenges
● Failure recovery
  ○ Recovering a lost worker
  ○ Recovering a lost owner
● Garbage collection and memory safety
● Handling first-class distributed futures, i.e., distributed futures that leave the caller's scope
→ Check out the paper for more details!
21. Task scheduling
[Diagram: task A calls B() → X and C(X) → Y. Worker A on Node 1 is the owner and keeps a local table with columns Obj, Task, Val, Loc (rows: X created by B(); Y created by C(X)). Nodes 2 and 3 each run a worker and an object store.]
22. Task scheduling
[Diagram: the owner (Worker A) schedules B() on Node 2's worker and writes X's pending location (N2) into its local table.]
A task's pending location is written locally at the owner.
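An illustrative sketch of this step (hypothetical names, not the actual Ray code): scheduling B() only requires the owner to write X's pending location into its own table, so there is no remote metadata write on the task's critical path.

```python
class Owner:
    def __init__(self):
        self.table = {}                     # obj id -> {"task", "loc", "val"}

    def submit(self, obj_id, task, node):
        self.table[obj_id] = {"task": task, "loc": node, "val": None}
        # The task is then sent to `node` asynchronously; the local write above
        # is the only metadata update on the task's critical path.

owner = Owner()
owner.submit("X", ("B", ()), "N2")          # X is pending on node N2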
23. Distributed memory management
[Diagram: the worker on Node 2 finishes B() and stores X in Node 2's object store; the owner's table now records X's value as a reference (*X) located at N2.]
The owner tracks the locations of objects stored in distributed memory.
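An illustrative sketch of the location update (hypothetical names): when the worker on N2 stores X in its object store, it reports the location back to X's owner, which records it locally.

```python
owner_table = {"X": {"task": ("B", ()), "loc": "N2", "val": None, "stored_at": set()}}

def on_object_stored(obj_id, node):
    owner_table[obj_id]["stored_at"].add(node)   # owner now knows where to fetch X
    owner_table[obj_id]["val"] = "*X"            # value lives in distributed memory

on_object_stored("X", "N2")
print(owner_table["X"]["stored_at"])             # {'N2'}
```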
24. Task scheduling with dependencies
[Diagram: the owner schedules C(X) on Node 3's worker and records Y's location as N3; the worker executing C receives a reference to X together with the identity of X's owner (W1), and X's value is fetched from Node 2's object store.]
25. Worker failure
[Diagram: Node 2, which stores X, fails; the owner's table on Node 1 and the reference held by the worker running C(X) on Node 3 are unaffected.]
Reference holders only need to check whether the owner is alive.
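An illustrative sketch of failure detection (hypothetical names): a process holding a reference to X never tracks the worker running B(); it only checks that X's owner is alive and asks it for the value's location, since the owner handles worker failures itself.

```python
def resolve(obj_id, owner_alive, owner_lookup):
    if not owner_alive():
        raise RuntimeError(f"owner of {obj_id} failed; the reference fate-shares with it")
    return owner_lookup(obj_id)              # ask the owner for the value's location

# Example: the owner is alive and reports that X currently lives on node N2.
print(resolve("X", owner_alive=lambda: True, owner_lookup=lambda _: "N2"))
```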
26. Worker recovery
[Diagram: the owner re-submits B() on a new node (Node 4) and updates X's location from N2 to N4 in its local table; C(X) on Node 3 keeps waiting on the same reference.]
The owner coordinates lineage reconstruction.
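An illustrative sketch of lineage reconstruction (hypothetical names): on a node failure, the owner re-submits the tasks whose objects were lost, using the task descriptions already stored in its local table. Because values are immutable, re-execution recreates the same object.

```python
def on_node_failure(owner_table, failed_node, schedule):
    for obj_id, entry in owner_table.items():
        if entry["loc"] == failed_node:             # X was lost with Node 2
            entry["loc"] = schedule(entry["task"])  # re-execute B() elsewhere, e.g. N4

owner_table = {"X": {"task": ("B", ()), "loc": "N2"},
               "Y": {"task": ("C", ("X",)), "loc": "N3"}}
on_node_failure(owner_table, "N2", schedule=lambda task: "N4")
print(owner_table["X"]["loc"])                      # N4
```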
27. Owner failure
[Diagram: Node 1, which runs Worker A (the owner of X and Y), fails and its ownership table is lost; X still sits in Node 2's object store and C(X) still runs on Node 3.]
28. Owner recovery
[Diagram: with Worker A gone, the remaining processes (the worker running C(X) on Node 3, and X's copy in Node 2's object store) hold state that belonged to the failed owner.]
References fate-share with the object's owner.
29 . A’s owner A Owner recovery B X C Y Worker Worker References C Obj O. fate-share with the X W1 object’s owner. Object Object Store X Store Node 2 Node 3