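Deep Dive #4: Milvus Data Insertion and Persistence
Deep Dive is a livestreamed code-walkthrough series launched by the Milvus community. It gives an open reading of the overall architecture of the open-source database Milvus and shares Milvus's core design ideas with the community.
If you are interested in this session and want to ask the speaker questions live, add the assistant on WeChat (Zilliz-tech) with the note "直播" (livestream) to join the discussion group and talk with everyone!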
Outline of this session:
- Overview of the Milvus 2.0 data insertion process
- Data allocation process
- File structure and data persistence
- Q&A
1. Milvus Deep Dive #4: Data Insertion and Persistence (2021.09)
2. Speaker bio: Bingyi Sun, Software Engineer • Milvus 2.0 development • KV databases / distributed systems • Reading / basketball / computer gaming
3. Agenda • Milvus 2.0 Data Insertion Process Overview • Data Allocation • File Structure and Data Persistence • Q&A
4. 01 Data Insertion Process Overview
5. Milvus Architecture Overview
6. Components Related to Data Insertion
7. Proxy
8. Data Flow Details
9. DataCoord & DataNode
10. Data Flow Details (diagram): a collection's channels V1 to V4 are assigned by DataCoord to DataNodes.
11. RootCoord & Time Tick (diagram): first read: 1, 2; second read: 6, 7, 8.
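Slide 11 only shows that consecutive reads return different batches of messages (1, 2, then 6, 7, 8). The sketch below illustrates one way a consumer can use time ticks to decide what to release, assuming a time tick T guarantees that every message with timestamp <= T has already arrived. `TimeTickReader` and its methods are hypothetical names, not the Milvus API.

```python
from __future__ import annotations

import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Message:
    ts: int                              # message timestamp, used for ordering
    payload: str = field(compare=False)  # ignored when comparing messages


class TimeTickReader:
    """Hypothetical consumer that releases messages in time-tick-bounded batches."""

    def __init__(self) -> None:
        self._pending: list[Message] = []  # min-heap ordered by timestamp

    def on_message(self, msg: Message) -> None:
        # Messages may arrive out of order; buffer them until a tick covers them.
        heapq.heappush(self._pending, msg)

    def on_time_tick(self, tick: int) -> list[Message]:
        # Assumption: a tick promises every message with ts <= tick has arrived,
        # so all of them can be released together, in timestamp order.
        batch: list[Message] = []
        while self._pending and self._pending[0].ts <= tick:
            batch.append(heapq.heappop(self._pending))
        return batch
```

Buffering until a tick arrives is what lets messages that arrive out of order (for example, from several Proxies) be consumed in timestamp order.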
12. 02 Data Allocation
13. Data Organization
14. Segment
15. Channel: channels are assigned to DataNodes according to different strategies (e.g. consistent hashing).
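Slide 15 names consistent hashing as one possible strategy. Here is a minimal consistent-hash ring, offered as a sketch under assumptions: MD5 as the hash, 100 virtual points per DataNode, and channel names such as "v1" are illustrative choices, and `ConsistentHashRing` is a hypothetical class, not Milvus code.

```python
from __future__ import annotations

import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable 64-bit hash; Python's built-in hash() is randomized per process.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class ConsistentHashRing:
    """Hypothetical ring mapping channel names to DataNode IDs."""

    def __init__(self, nodes: list[str], replicas: int = 100) -> None:
        self._replicas = replicas
        self._ring: list[tuple[int, str]] = []  # sorted (hash, node) points
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Several virtual points per node smooth out the distribution.
        for i in range(self._replicas):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def node_for(self, channel: str) -> str:
        if not self._ring:
            raise LookupError("no DataNodes registered")
        # Walk clockwise to the first point at or after the channel's hash,
        # wrapping around to the start of the ring if necessary.
        idx = bisect.bisect_left(self._ring, (_hash(channel), ""))
        return self._ring[idx % len(self._ring)][1]
```

The property that makes this strategy attractive: when a DataNode leaves, only the channels it owned move, and every other channel keeps its assignment.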
16. When
17. How: given an InsertRequest (CollectionID, PartitionID, Channel, NumOfRows), DataCoord checks whether there is enough space to hold that many rows. If so, it returns a list of segments as the response; otherwise it opens new segments. One request maps to 1 or n segments. A segment's max size is determined by "segment.maxSize" in data_coord.yaml.
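A sketch of the allocation described on slide 17, assuming the per-segment limit has already been converted from "segment.maxSize" into a row count (`max_rows`) and that existing free space is filled before new segments are opened. `SegmentAllocator` and `assign` are hypothetical names; the real DataCoord logic may differ.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass


@dataclass
class Segment:
    segment_id: int
    max_rows: int
    allocated: int = 0

    @property
    def free(self) -> int:
        return self.max_rows - self.allocated


class SegmentAllocator:
    """Hypothetical allocator; max_rows stands in for the limit from segment.maxSize."""

    def __init__(self, max_rows: int) -> None:
        if max_rows <= 0:
            raise ValueError("max_rows must be positive")
        self._max_rows = max_rows
        self._ids = itertools.count(1)
        # (collection_id, partition_id, channel) -> growing segments
        self._growing: dict[tuple[int, int, str], list[Segment]] = {}

    def assign(self, collection_id: int, partition_id: int,
               channel: str, num_rows: int) -> list[tuple[int, int]]:
        """Return (segment_id, rows) pairs that together cover num_rows."""
        segments = self._growing.setdefault((collection_id, partition_id, channel), [])
        result: list[tuple[int, int]] = []
        remaining = num_rows
        # First, reuse free space in existing growing segments.
        for seg in segments:
            take = min(seg.free, remaining)
            if take > 0:
                seg.allocated += take
                result.append((seg.segment_id, take))
                remaining -= take
        # Then open new segments for whatever is left (1 request -> 1 or n segments).
        while remaining > 0:
            seg = Segment(next(self._ids), self._max_rows)
            take = min(seg.free, remaining)
            seg.allocated = take
            segments.append(seg)
            result.append((seg.segment_id, take))
            remaining -= take
        return result
```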
18. Data Expiration: again, the time tick plays a crucial role in insertion. A time tick in a channel means that all data before that time tick have already been sent to the channel. DataCoord returns each segment allocation with an expiration time T, and Proxy cannot use that allocation to insert data whose time tick is bigger than T. By default, a single allocation expires after 2000 ms, as defined by "segment.assignmentExpiration" in data_coord.yaml.
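The expiration rule on slide 18 reduces to a single comparison. This sketch assumes plain millisecond timestamps instead of the hybrid timestamps Milvus actually uses, and `new_allocation` / `can_use` are hypothetical helpers; only the 2000 ms default comes from the slide.

```python
from __future__ import annotations

from dataclasses import dataclass

# Default of "segment.assignmentExpiration" as stated on the slide.
ASSIGNMENT_EXPIRATION_MS = 2000


@dataclass(frozen=True)
class Allocation:
    segment_id: int
    num_rows: int
    expire_ts: int  # plain milliseconds here (assumption, not Milvus's format)


def new_allocation(segment_id: int, num_rows: int, now_ms: int,
                   expiration_ms: int = ASSIGNMENT_EXPIRATION_MS) -> Allocation:
    # DataCoord stamps each allocation with an expiration time T = now + expiration.
    return Allocation(segment_id, num_rows, now_ms + expiration_ms)


def can_use(alloc: Allocation, insert_ts: int) -> bool:
    # Proxy may only use the allocation for data whose time tick is not bigger than T.
    return insert_ts <= alloc.expire_ts
```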
19. When to Seal 1. 2. A Flush Collection request is received 3. A segment's lifetime is too long 4. There are too many growing segments in a channel
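A sketch of the sealing conditions that slide 19 lists (item 1 is not shown on the slide, so it is not modeled). The thresholds `max_lifetime_ms` and `max_growing_per_channel` are hypothetical parameters rather than Milvus configuration keys, and sealing the oldest extra segments first is an assumed tie-break.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class GrowingSegment:
    segment_id: int
    channel: str
    created_ms: int


def segments_to_seal(segments: list[GrowingSegment], now_ms: int,
                     flush_requested: bool, max_lifetime_ms: int,
                     max_growing_per_channel: int) -> set[int]:
    """Return IDs of segments to seal (hypothetical thresholds)."""
    if flush_requested:
        # A Flush Collection request seals every growing segment.
        return {s.segment_id for s in segments}
    # Lifetime too long.
    sealed = {s.segment_id for s in segments
              if now_ms - s.created_ms > max_lifetime_ms}
    # Too many growing segments left in one channel: seal the oldest extras.
    by_channel: dict[str, list[GrowingSegment]] = {}
    for s in segments:
        if s.segment_id not in sealed:
            by_channel.setdefault(s.channel, []).append(s)
    for group in by_channel.values():
        group.sort(key=lambda s: s.created_ms)
        excess = len(group) - max_growing_per_channel
        for s in group[:max(excess, 0)]:
            sealed.add(s.segment_id)
    return sealed
```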
20. When to Flush: DataNode reports the time tick of each channel to DataCoord. If the time tick DataCoord receives is larger than the time tick of a segment's last allocation, the segment's allocated space is released.
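Slide 20 can be sketched as a handler that DataCoord runs whenever a DataNode reports a channel's time tick. It assumes each outstanding allocation is tracked by its time tick, and it additionally assumes that a sealed segment with no outstanding allocations is ready to flush; that link is an inference from the slide title, not something the slide states. All names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SegmentState:
    segment_id: int
    channel: str
    sealed: bool = False
    # Time tick of each outstanding allocation on this segment.
    allocations: list[int] = field(default_factory=list)


def on_channel_time_tick(segments: list[SegmentState], channel: str,
                         tick: int) -> list[int]:
    """Hypothetical DataCoord handler for a time tick reported by a DataNode.

    Releases allocations whose time tick is smaller than the reported tick and
    returns IDs of sealed segments left without allocations (assumed flushable).
    """
    ready: list[int] = []
    for seg in segments:
        if seg.channel != channel:
            continue  # ticks only cover the channel they were reported for
        seg.allocations = [ts for ts in seg.allocations if ts >= tick]
        if seg.sealed and not seg.allocations:
            ready.append(seg.segment_id)
    return ready
```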
21. Some Details 1. How do we ensure that a segment is flushed only after all of its data has been consumed? 2. How do we ensure that no data is written to a segment after it has been flushed? 3. Is a segment strictly limited to its max size? 4. How is a segment's maximum number of rows estimated? 5. What happens when users call "Flush" frequently? 6. How do we ensure that no data is consumed multiple times after DataNodes restart? 7. When are indexes created?
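Slide 21 poses question 4 without giving the answer. One plausible approach, sketched under stated assumptions: treat "segment.maxSize" as megabytes, give every row a fixed size computed from the schema, and divide. The type names and the `size_per_row` / `estimate_max_rows` helpers are hypothetical, and variable-length fields are not handled.

```python
from __future__ import annotations

# Bytes per value for fixed-width field types (illustrative names).
_SCALAR_BYTES = {"bool": 1, "int8": 1, "int16": 2, "int32": 4,
                 "int64": 8, "float": 4, "double": 8}


def size_per_row(schema: list[tuple[str, int]]) -> int:
    """schema: (field_type, dim) pairs; dim is ignored for scalar fields."""
    total = 0
    for field_type, dim in schema:
        if field_type == "float_vector":
            total += 4 * dim   # one float32 per dimension
        elif field_type == "binary_vector":
            total += dim // 8  # one bit per dimension (dim assumed a multiple of 8)
        else:
            total += _SCALAR_BYTES[field_type]  # KeyError for unsupported types
    return total


def estimate_max_rows(max_size_mb: float, schema: list[tuple[str, int]]) -> int:
    # Assumption: the limit is in MB and every row has the same fixed size.
    return int(max_size_mb * 1024 * 1024 // size_per_row(schema))
```

For example, a schema with an int64 primary key and a 128-dimensional float vector takes 520 bytes per row under these assumptions, so a 512 MB limit (a value chosen only for illustration) yields an estimate of about one million rows.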
22. 03 File Structure and Data Persistence
23. DataNode Flush
24. File Structure: binlogs are used to 1. restore data and 2. create indexes.
25. Persistence
26. TODO: 1. Delete by ID 2. Compaction to merge small segments and release space 3. Bulk load
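27. Thanks & Q&A. Scan the QR codes to join the livestream discussion group (real-time Q&A with the speaker) and to follow the Milvus WeChat Channels account (early notice of livestream videos).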