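Deep Dive #3: Milvus Access Layer and Major Data Processing Flow
Deep Dive is a series of live code walkthroughs started by the Milvus community. It offers an open reading of the overall architecture of the open-source database Milvus and shares Milvus's core design ideas with the community.
If you are interested in this session and would like live Q&A with the speaker, add the assistant on WeChat (Zilliz-tech) with the note "直播" (live stream) to join the discussion group.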
Outline of this session:
- Milvus 2.0 system architecture review
- Code organization
- Data processing and read/write request flow
- Proxy code module introduction
- Q&A
1 .2021.08 Milvus Deep Dive #3: Access Layer and Major Data Processing Flow
Cao Zhenshan, zhenshan.cao@zilliz.com
2 .About me
- Zilliz Senior Software Engineer
- Education: Master, Huazhong University of Science and Technology
- Interests: databases, distributed systems, spatio-temporal data analysis and processing
3 .Contents
- 01 Milvus Architecture Overview
- 02 Code Organization
- 03 Major Data Processing Flow
- 04 Access Layer Code
4 .Milvus Architecture Overview
5 .Log As Data
State machine replication principle:
- The log is all you need to restore system state
- The log is an append-only, time-ordered sequence of records
[Figure: log timeline from the 1st record at t0 through t1 ... t5, with the next record appended at "now"]
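The "log is all you need to restore system state" idea can be sketched as a replay loop. This is a minimal illustration, not Milvus code: the `Record` type, its fields, and the `Replay` function are hypothetical names chosen for the example, and the state is assumed to be a plain key-value map.

```go
package main

import "fmt"

// Record is one entry in an append-only log (hypothetical shape for illustration).
type Record struct {
	Ts    uint64 // logical timestamp, non-decreasing along the log
	Key   string
	Value string
	Del   bool // true means the key is deleted
}

// Replay rebuilds state by applying, in log order, every record with Ts <= upTo.
func Replay(log []Record, upTo uint64) map[string]string {
	state := make(map[string]string)
	for _, r := range log {
		if r.Ts > upTo {
			break // the log is time-ordered, so all later records are newer
		}
		if r.Del {
			delete(state, r.Key)
		} else {
			state[r.Key] = r.Value
		}
	}
	return state
}

func main() {
	log := []Record{
		{Ts: 0, Key: "a", Value: "1"},
		{Ts: 1, Key: "b", Value: "2"},
		{Ts: 2, Key: "a", Value: "3"},
		{Ts: 3, Key: "b", Del: true},
	}
	fmt.Println(Replay(log, 1)) // state as of t1
	fmt.Println(Replay(log, 3)) // state as of t3
}
```

Because the state is a pure function of the log prefix, any node that can read the log can reconstruct the same state at the same timestamp.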
6 .Log Sequence: Pub-sub as System Backbone
Distributed log on a pub-sub system:
- Disaggregate log and database, making failure recovery easy and fast
- Guarantee data durability
- Make the system extensible
- Reduce system complexity
7 .Incremental + Historical
- Relying only on the log stream for reads is not practical (too slow)
- Periodically backfill history data to segments and hand off growing segments to historical
[Figure: records on a timeline, grouped by time ticks into Window 1 and Window 2]
8 .Micro Service Style
- Disaggregate storage and computation
- Scale independently
- Reduce downtime through fault isolation
- Code is easier to understand and debug
9 .Architecture
10 .Code Organization
11 .Languages
- Golang as the distributed layer development language: 120,000 lines of Golang
- C++ as the engine layer language: 80,000 lines of C++
12 .Source Code Tree - Go Directories
Project directory:
.
├── Makefile
├── cmd
├── configs
├── docs
├── go.mod
├── internal
├── go.sum
├── ruleguard.rules.go
├── scripts
├── tests
└── tools
/cmd: main applications for this project
/internal: private application and library code
13 .Source Code Tree
. (internal)
├── allocator
├── core
├── datacoord
├── datanode
├── indexcoord
├── indexnode
├── kv
├── log
├── metrics
├── distributed
├── msgstream
├── proto
├── proxy
├── querycoord
├── querynode
├── rootcoord
├── storage
├── tso
├── types
└── util
. (distributed)
├── datacoord
├── datanode
├── indexcoord
├── indexnode
├── proxy
├── querycoord
├── querynode
└── rootcoord
Slide annotations: Remote Procedure Call; Local Procedure Call; Share the same functionality
14 .Decouple Functionality and Communication
[Diagram: Query, Data, Index, and Root coordinators; Proxy, Query, Data, and Index nodes; MsgStream, TxnKV, and KV components, labeled Meta, Log, Data, and Pub-sub]
16 .Source Code Tree - Packages and Directories
- allocator: global unique ID and local buffered ID
- core: vector/scalar search engine
- kv: KV interface and implementations
- tso: timestamp allocator
- msgstream: MsgStream interface and implementations
- metrics: monitoring logic
- storage: data management
- types: type definitions
- util, log, proto
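The "global unique ID and local buffered ID" description of the allocator package suggests a batching pattern, which can be sketched as follows. This is an illustration under assumptions, not the Milvus implementation: `GlobalAllocator`, `BufferedAllocator`, `AllocRange`, and `AllocOne` are hypothetical names, the global allocator is an in-process stand-in for a remote service, and `batch` is assumed to be positive.

```go
package main

import (
	"fmt"
	"sync"
)

// GlobalAllocator stands in for a remote service that hands out unique ID ranges.
type GlobalAllocator struct {
	mu   sync.Mutex
	next int64
}

// AllocRange reserves count consecutive IDs and returns the half-open range [begin, end).
func (g *GlobalAllocator) AllocRange(count int64) (begin, end int64) {
	g.mu.Lock()
	defer g.mu.Unlock()
	begin = g.next
	g.next += count
	return begin, g.next
}

// BufferedAllocator serves IDs from a locally cached range and refills it in batches,
// so most allocations need no round trip to the global allocator.
type BufferedAllocator struct {
	mu       sync.Mutex
	global   *GlobalAllocator
	batch    int64
	cur, end int64
	Refills  int // number of round trips to the global allocator
}

// NewBufferedAllocator creates a local allocator that reserves batch IDs per refill.
func NewBufferedAllocator(g *GlobalAllocator, batch int64) *BufferedAllocator {
	return &BufferedAllocator{global: g, batch: batch}
}

// AllocOne returns the next globally unique ID, refilling the local range when it is empty.
func (b *BufferedAllocator) AllocOne() int64 {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.cur == b.end {
		b.cur, b.end = b.global.AllocRange(b.batch)
		b.Refills++
	}
	id := b.cur
	b.cur++
	return id
}

func main() {
	global := &GlobalAllocator{}
	a := NewBufferedAllocator(global, 4)
	b := NewBufferedAllocator(global, 4)
	for i := 0; i < 3; i++ {
		fmt.Println("a:", a.AllocOne(), "b:", b.AllocOne())
	}
}
```

The trade-off in this sketch is that IDs are unique but not globally ordered across allocators, and unused IDs in a cached range are lost if the holder restarts.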
17 .Major Data Processing Flow
18 .Data Model
19 .MsgStream Interface
type MsgStream interface {
    Start()
    Close()
    AsProducer(channels []string)
    AsConsumer(channels []string, subName string)
    SetRepackFunc(repackFunc RepackFunc)
    Produce(*MsgPack) error
    Broadcast(*MsgPack) error
    Consume() *MsgPack
    Chan() <-chan *MsgPack
    Seek(offset []*MsgPosition) error
}
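To make the producer and consumer roles of this interface concrete, here is a toy in-memory stream that covers only a subset of it (`AsProducer`, `AsConsumer`, `Broadcast`, `Consume`). Everything here is an assumption for illustration: `Broker`, `MemStream`, and the simplified `MsgPack` fields are hypothetical, `subName` is ignored, and `Consume` polls without blocking, which may differ from the real behavior.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// MsgPack is a simplified, hypothetical batch of messages.
type MsgPack struct {
	BeginTs, EndTs uint64
	Msgs           []string
}

// Broker is a toy in-process pub-sub: each channel name maps to a buffered Go channel.
type Broker struct {
	mu     sync.Mutex
	topics map[string]chan *MsgPack
}

func NewBroker() *Broker { return &Broker{topics: make(map[string]chan *MsgPack)} }

// topic returns the Go channel for a name, creating it on first use.
func (b *Broker) topic(name string) chan *MsgPack {
	b.mu.Lock()
	defer b.mu.Unlock()
	ch, ok := b.topics[name]
	if !ok {
		ch = make(chan *MsgPack, 16) // sends block once 16 packs are pending
		b.topics[name] = ch
	}
	return ch
}

// MemStream implements a subset of the MsgStream methods on top of Broker.
type MemStream struct {
	broker    *Broker
	producers []chan *MsgPack
	consumers []chan *MsgPack
}

func (s *MemStream) AsProducer(channels []string) {
	for _, c := range channels {
		s.producers = append(s.producers, s.broker.topic(c))
	}
}

// AsConsumer subscribes to channels; subName is ignored in this sketch.
func (s *MemStream) AsConsumer(channels []string, subName string) {
	for _, c := range channels {
		s.consumers = append(s.consumers, s.broker.topic(c))
	}
}

// Broadcast sends the same pack to every producer channel.
func (s *MemStream) Broadcast(p *MsgPack) error {
	if len(s.producers) == 0 {
		return errors.New("stream has no producer channels")
	}
	for _, ch := range s.producers {
		ch <- p
	}
	return nil
}

// Consume returns one pending pack, or nil if no consumer channel has data.
func (s *MemStream) Consume() *MsgPack {
	for _, ch := range s.consumers {
		select {
		case p := <-ch:
			return p
		default:
		}
	}
	return nil
}

func main() {
	broker := NewBroker()
	producer := &MemStream{broker: broker}
	producer.AsProducer([]string{"dml-0", "dml-1"})
	consumer := &MemStream{broker: broker}
	consumer.AsConsumer([]string{"dml-1"}, "demo-sub")

	if err := producer.Broadcast(&MsgPack{Msgs: []string{"insert"}}); err != nil {
		fmt.Println("broadcast failed:", err)
		return
	}
	fmt.Println(consumer.Consume().Msgs)
}
```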
20 .Write Path
- DmChannels
- Save binlog files
- Collections may share physical channels
- Notify DataCoord
21 .Write Path - Flowgraph
Flowgraph to filter collection data
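When several collections share one physical channel, a consumer has to drop the messages that belong to other collections. A flowgraph can be pictured as a chain of nodes where the first one does this filtering. The sketch below is only an illustration of that idea: `InsertMsg`, `Node`, `FilterByCollection`, `DropEmpty`, and `Run` are hypothetical names, not Milvus APIs.

```go
package main

import "fmt"

// InsertMsg is a hypothetical, simplified insert message read from a shared channel.
type InsertMsg struct {
	CollectionID int64
	RowIDs       []int64
}

// Node is one processing stage: it takes a batch of messages and returns a batch.
type Node func(in []InsertMsg) []InsertMsg

// FilterByCollection returns a node that keeps only messages for the given collection.
func FilterByCollection(collectionID int64) Node {
	return func(in []InsertMsg) []InsertMsg {
		out := make([]InsertMsg, 0, len(in))
		for _, m := range in {
			if m.CollectionID == collectionID {
				out = append(out, m)
			}
		}
		return out
	}
}

// DropEmpty returns a node that discards messages carrying no rows.
func DropEmpty() Node {
	return func(in []InsertMsg) []InsertMsg {
		out := make([]InsertMsg, 0, len(in))
		for _, m := range in {
			if len(m.RowIDs) > 0 {
				out = append(out, m)
			}
		}
		return out
	}
}

// Run pushes a batch through the nodes in order, feeding each output to the next node.
func Run(nodes []Node, batch []InsertMsg) []InsertMsg {
	for _, n := range nodes {
		batch = n(batch)
	}
	return batch
}

func main() {
	batch := []InsertMsg{
		{CollectionID: 1, RowIDs: []int64{10, 11}},
		{CollectionID: 2, RowIDs: []int64{20}},
		{CollectionID: 1, RowIDs: nil},
		{CollectionID: 1, RowIDs: []int64{12}},
	}
	graph := []Node{FilterByCollection(1), DropEmpty()}
	for _, m := range Run(graph, batch) {
		fmt.Println(m.CollectionID, m.RowIDs)
	}
}
```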
22 .Write Path – MsgStream Creation
When to create MsgStream
23 .Read Path
- DqRequestChannels
- DqResultChannel
24 .Read Path - Flowgraph
Same as the write path
25 .Read Path – MsgStream Creation
Triggered by the load operation
26 .Read Path
Merge to maintain data completeness
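One plausible reading of "merge" here is that partial search results, each covering only part of the data, are combined into a single global result. The sketch below shows that kind of top-k merge under stated assumptions: `Hit` and `MergeTopK` are hypothetical names, a smaller distance is assumed to mean a better match, `k` is assumed to be non-negative, and the same ID is assumed not to appear in more than one partial result.

```go
package main

import (
	"fmt"
	"sort"
)

// Hit is a hypothetical search hit: an entity ID and its distance to the query vector.
type Hit struct {
	ID       int64
	Distance float32
}

// MergeTopK combines partial results and keeps the k hits with the smallest distance.
func MergeTopK(partials [][]Hit, k int) []Hit {
	var all []Hit
	for _, p := range partials {
		all = append(all, p...)
	}
	// A stable sort keeps the input order for hits with equal distance.
	sort.SliceStable(all, func(i, j int) bool { return all[i].Distance < all[j].Distance })
	if len(all) > k {
		all = all[:k]
	}
	return all
}

func main() {
	partials := [][]Hit{
		{{ID: 1, Distance: 0.1}, {ID: 2, Distance: 0.5}},
		{{ID: 3, Distance: 0.2}, {ID: 4, Distance: 0.9}},
	}
	for _, h := range MergeTopK(partials, 3) {
		fmt.Println(h.ID, h.Distance)
	}
}
```

Each partial result must itself contain at least its own top k hits; otherwise the merged answer can miss entries that belong in the global top k.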
27 .DDL Flow
Data Definition Language
Ordered serial execution by timestamp
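"Ordered serial execution by timestamp" can be sketched as sorting pending DDL tasks by their timestamp and running them one at a time. This is a simplified illustration, not the actual scheduler: `DDLTask` and `ExecuteInOrder` are hypothetical names, and each task's `Run` is assumed to be non-nil.

```go
package main

import (
	"fmt"
	"sort"
)

// DDLTask is a hypothetical DDL request tagged with the timestamp assigned to it.
type DDLTask struct {
	Ts   uint64
	Name string
	Run  func()
}

// ExecuteInOrder runs tasks one at a time in ascending timestamp order
// and returns the names in the order they were executed.
func ExecuteInOrder(tasks []DDLTask) []string {
	sorted := append([]DDLTask(nil), tasks...) // copy so the caller's slice is untouched
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].Ts < sorted[j].Ts })
	order := make([]string, 0, len(sorted))
	for _, t := range sorted {
		t.Run() // serial: the next task starts only after this one returns
		order = append(order, t.Name)
	}
	return order
}

func main() {
	collections := map[string]bool{}
	// Tasks arrive out of order; timestamps decide the execution order.
	tasks := []DDLTask{
		{Ts: 3, Name: "drop demo", Run: func() { delete(collections, "demo") }},
		{Ts: 1, Name: "create demo", Run: func() { collections["demo"] = true }},
	}
	fmt.Println(ExecuteInOrder(tasks))
	fmt.Println("demo exists:", collections["demo"])
}
```

Running the create before the drop, regardless of arrival order, is what keeps every component that replays the same DDL stream in the same final state.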
28 .Index Building - IndexState
29 .Index Building - IndexCoord