Rethinking the Milvus System Design Philosophy
1 .Rethinking the Milvus Architecture Design. Guo Rentong. 2021.06, Shanghai
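2 .Guo Rentong. Partner & System Architect. Ph.D. in Computer Software and Theory. Member of the CCF Technical Committee on Distributed Computing and Systems. Areas of interest: distributed systems, heterogeneous computing, caching systems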
3 .CONTENTS: 01 The New Data Foundation in the Unstructured Data Era; 02 The System Challenges; 03 The Milvus Solution
4 . The New Data Foundation in the Unstructured Data Era
5 .80% of data growth is unstructured, over 40,000 exabytes per year
6 .Data are Increasing Horizontally: Types. Structured data: int, float, string, … (e.g. "ABCDEFG", 2021.04.10). Unstructured data: image, text, JSON, video, audio, domain-specific
7 .Data are Increasing Vertically: Semantics. Richer semantics are captured as embeddings
8 .Embedding as a New Source of Data. Types with high abstraction: boolean, int, float, string, …, embedding
9 .System Challenges
10 .The System Challenges 01 Semantic Complexity 02 Cost-efficiency 03 System Iteration Speed
11 .Semantic Complexity: Vector Database Only; Vector Database x Predicates; Vector Database x Key-value Database; Vector Database x Text Analytics Engine; Vector Database x …
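To make the "Vector Database x Predicates" case concrete, here is a minimal sketch of a vector search combined with a scalar filter. It is a brute-force illustration only, not how Milvus executes such queries; the function names, the `price` field, and the sample rows are all hypothetical.

```python
import math

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def filtered_search(rows, query, k, predicate):
    """Return ids of the k rows nearest to `query` among rows passing `predicate`.

    Hypothetical helper: filters first, then ranks by distance (brute force).
    """
    candidates = [r for r in rows if predicate(r)]
    candidates.sort(key=lambda r: l2(r["vec"], query))
    return [r["id"] for r in candidates[:k]]

# Hypothetical rows: an embedding plus one scalar attribute.
rows = [
    {"id": 1, "vec": [0.0, 0.0], "price": 5},
    {"id": 2, "vec": [0.1, 0.0], "price": 50},
    {"id": 3, "vec": [1.0, 1.0], "price": 8},
    {"id": 4, "vec": [0.2, 0.1], "price": 9},
]

# Nearest neighbours of the origin, restricted to price < 10.
result = filtered_search(rows, [0.0, 0.0], k=2, predicate=lambda r: r["price"] < 10)
print(result)
```

Row 2 is the second-closest vector overall but is excluded by the predicate, which is the kind of interaction between similarity and scalar conditions this slide lists.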
12 .Cost-efficiency: Another CAP to Consider! Cost, Accuracy, Performance
13 .ANNS : Recall vs. Performance
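The recall axis of this trade-off can be made concrete with a small sketch. Recall@k is the fraction of the true k nearest neighbours that an approximate search returns. The "approximate" search below simply scans a prefix of the data, which is an assumption made for illustration and a stand-in for a real ANN index; the toy one-dimensional data set is also made up.

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k ids that the approximate result contains."""
    exact = set(exact_ids)
    return len(exact & set(approx_ids)) / len(exact)

def top_k(ids, data, query, k):
    """Brute-force top-k over the given candidate ids (1-D toy data)."""
    return sorted(ids, key=lambda i: abs(data[i] - query))[:k]

# Hypothetical toy data: 100 distinct scalars in [0, 1).
data = [((i * 37) % 101) / 101 for i in range(100)]
query, k = 0.5, 10

# Ground truth: scan everything.
exact = top_k(range(len(data)), data, query, k)

# Stand-in for an approximate index: scan only the first `scanned` items.
recalls = []
for scanned in (25, 50, 100):
    approx = top_k(range(scanned), data, query, k)
    recalls.append(recall_at_k(approx, exact))
    print(f"scanned={scanned:3d}  recall@{k}={recalls[-1]:.2f}")
```

Scanning fewer candidates costs less work but can miss true neighbours; scanning everything reaches a recall of 1.0 at full cost.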
14 .System Iteration Speed
Databases (1970s, 1980s, 1990s, now; labelled "Stone Age", "Bronze Age", "Iron Age", "Roman Times"): DB2, Sybase, HBase, relational model, SQL, CAP, magnetic tapes, Oracle, MySQL, cloud native, SIGMOD, System R, CODASYL, IMS, Ingres, VLDB, ER, DBMSs for PC, Informix, PostgreSQL, Snowflake
Frameworks for AI models (2010s, 2016, now): PyTorch, ONNX, MATLAB GPU support, Chainer, CNTK, Torch, TensorFlow, Caffe, Theano, Keras, Caffe2, OpenNN, MXNet
Frameworks for AI data (2019, now; marked "We are here"): DB-plugin, Faiss, ES-plugin, Milvus, Proxima, hnswlib, Pinecone, Annoy
15 .The Milvus Solution
16 .The Basic Lessons Learned: 01 Semantic Complexity → Do not try to put every flavor of indexing/analytics capability into one unified framework! 02 Cost-efficiency → An open architecture beats deeply customized ones! 03 System Iteration Speed → Control your system complexity. Loose coupling matters!
17 .The Basic Lessons Learned: Bazaar beats Cathedral!
18 .The ‘Bazaar’ Architecture
19 .The ‘Bazaar’ Architecture. Layers, top to bottom: Query Parser/Planner; Search & Analytics Engines (Vector Engine, KV Engine, Text Engine, …); Latent-semantic Data (Embedding Vectors); Raw Data (Structured/Unstructured Data)
20 .Logical Log as the System Backbone
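The idea of a logical log as the backbone can be sketched as follows: writers append logical operations to a shared, ordered log, and each subscriber builds its own state by replaying the log from its last offset. This is a toy in-memory illustration under that assumption; the class names and the operation format are hypothetical and do not reflect Milvus's actual log schema.

```python
from dataclasses import dataclass, field

@dataclass
class LogicalLog:
    """Append-only, ordered list of logical operations (in-memory toy)."""
    entries: list = field(default_factory=list)

    def append(self, op, key, value=None):
        self.entries.append((op, key, value))
        return len(self.entries) - 1  # offset of the new entry

    def read_from(self, offset):
        return self.entries[offset:]

class Subscriber:
    """Derives its state purely by replaying the log from its own offset."""
    def __init__(self, log):
        self.log = log
        self.offset = 0
        self.state = {}

    def catch_up(self):
        for op, key, value in self.log.read_from(self.offset):
            if op == "insert":
                self.state[key] = value
            elif op == "delete":
                self.state.pop(key, None)
            self.offset += 1

log = LogicalLog()
early = Subscriber(log)

log.append("insert", 1, [0.1, 0.2])
log.append("insert", 2, [0.3, 0.4])
early.catch_up()
log.append("delete", 1)
early.catch_up()

# A subscriber that joins later replays the same log and converges.
late = Subscriber(log)
late.catch_up()
print(early.state == late.state, late.state)
```

Because the log is the single ordered source of changes, components that consume it need not talk to each other directly to stay consistent.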
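22 .From 1.0 to 2.0: Decouple Persistent State and Functionality. Milvus functionalities (replicas of state, volatile storage): Proxy, DDL handling, DML handling, DQL handling. State (non-volatile storage): TxnKV (Meta), Msg Stream (Log, Pub-sub), KV (Data). Backing engines shown: etcd, Pulsar, S3, MinIO, RocksDB. The reliability of system state is delegated to mature engines, which also resolves differences between deployment environments.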
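One way to picture this decoupling is to have functional components depend only on a narrow storage interface, so that the engine behind it can be swapped per deployment. The sketch below is an assumption-laden illustration: `KVStore`, `InMemoryKV`, `SegmentWriter`, and the key layout are hypothetical names, and the in-memory store merely stands in for an external engine.

```python
from typing import Optional, Protocol

class KVStore(Protocol):
    """Minimal storage interface the functional layer depends on (hypothetical)."""
    def put(self, key: str, value: bytes) -> None: ...
    def get(self, key: str) -> Optional[bytes]: ...

class InMemoryKV:
    """Stand-in backend; a real deployment would wrap an external engine."""
    def __init__(self):
        self._items = {}

    def put(self, key: str, value: bytes) -> None:
        self._items[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._items.get(key)

class SegmentWriter:
    """Functional component: holds no durable state of its own."""
    def __init__(self, meta: KVStore, data: KVStore):
        self.meta = meta
        self.data = data

    def flush(self, segment_id: str, payload: bytes) -> None:
        path = f"segments/{segment_id}"
        self.data.put(path, payload)                        # bulk data
        self.meta.put(f"meta/{segment_id}", path.encode())  # pointer to it

    def load(self, segment_id: str) -> Optional[bytes]:
        path = self.meta.get(f"meta/{segment_id}")
        return None if path is None else self.data.get(path.decode())

meta_store, data_store = InMemoryKV(), InMemoryKV()
SegmentWriter(meta_store, data_store).flush("s1", b"vectors")

# A brand-new instance recovers everything from the stores alone.
replacement = SegmentWriter(meta_store, data_store)
print(replacement.load("s1"))
```

Since the writer keeps nothing durable in its own memory, replacing the instance loses no state, and changing the backend means providing another class with the same two methods.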
23 .From 1.0 to 2.0: Decouple State and Stateless. Milvus functionalities (DDL handling, DML handling, DQL handling): Query Coord, Data Coord, Index Coord, Root Coord; Proxy, Query Node, Data Node, Index Node. State: TxnKV (Meta), Msg Stream (Log, Pub-sub), KV (Data). Backing engines shown: etcd, Pulsar, S3, MinIO, RocksDB. The main workload is executed in a stateless manner wherever possible.
24 .From 1.0 to 2.0: Decouple Functionality and Communication. Components: Query Coord, Data Coord, Index Coord, Root Coord; Proxy, Query Node, Data Node, Index Node. Storage: TxnKV (Meta), Msg Stream (Log, Pub-sub), KV (Data)
27 .From 1.0 to 2.0: K8S-based Component Management. Each component (Query Coord, Data Coord, Index Coord, Root Coord; Proxy, Query Node, Data Node, Index Node) is paired with a sidecar and managed on k8s. Storage: TxnKV (Meta), Msg Stream (Log, Pub-sub), KV (Data)
28 .THANKS!