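1. System Optimization Practices on Modern Storage Architectures - Lu Mian (卢冕)
As storage technology has advanced, modern storage architectures have developed unprecedented tiering complexity and revolutionary capabilities such as in-memory data persistence, which in turn raise new challenges for how upper-layer software should be optimized. Drawing on our practical experience, this talk shows how techniques such as persistent in-memory data structures, tiered storage, and hot/cold data separation can be used to optimize systems and exploit the characteristics of modern storage architectures.
Lu Mian, System Architect at 4Paradigm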
1 . System Optimization Practices on Modern Storage Architectures
Dr. Lu Mian (卢冕), 4Paradigm, 21 Aug 2021
4Paradigm (Beijing) Technology Co., Ltd. Copyright ©2021 4Paradigm All Rights Reserved.
2 . Outline
• Modern storage architecture
• Optimizing the feature engineering database on PMem
• Optimizing Kafka on a tiered storage architecture
• The MemArk technical community
3 . Modern Storage Architecture
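4 . Timeline of representative storage technologies
• External storage: magnetic tape (1951), hard disk drive (1956), CD-ROM (1965), floppy disk (1971), NAND Flash (1989), DVD (1996), SD Card (1999)
• Non-volatile memory: 3D XPoint (2015), Optane persistent memory (2018)
• Memory: magnetic-core memory (1947), SRAM (1961), DRAM (1966), DDR/GDDR (1998), DDR 4 (2011), HBM2 (2016), GDDR 6 (2018)
• Software and algorithms shown alongside: B+tree (1973), Oracle (1979), NTFS (1993), LRU algorithm (1993), CUDA (2007), PMDK (2014)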
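5 . The modern storage architecture pyramid
Tiers from top to bottom: registers; L1/L2 cache; DRAM (DDR/GDDR/...); non-volatile memory (Optane persistent memory); Optane SSD; 3D NAND SSD; HDD.
Two annotations on the pyramid: a complex tiered storage architecture, and revolutionary in-memory data persistence.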
6 . Non-Volatile Memory
• Intel® Optane™ persistent memory (PMem)
– Large capacity
– Low cost
– High-performance persistence
• Memory Mode
• App Direct Mode (applications read and write PMem directly)
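7 . Advantages and challenges of modern memory architecture
Tiered storage architecture
• Characteristic: a rich set of storage tiers
• Advantage: trade-offs among capacity, performance, and cost
• Challenge: tiered storage, tiered persistence, and tiered caching algorithms
In-memory data persistence
• Characteristic: in-memory data is persistent
• Advantage: fast recovery of in-memory data
• Challenge: the persistent programming model
Both sets of challenges bear on the system optimization of traditional software.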
8 . PMem: a surge of interest in academia, a steady build-up in industry
Kioxia, 4Paradigm, Alibaba. Two sessions, 8 papers!
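9 . 4Paradigm's technical practice on modern storage architectures
• Internal: optimization across the full AI pipeline
– Persistent memory in Memory Mode for low-cost memory expansion
– HyperPS: a parameter server for inference
• Internal + external: open-source work. memark.io: a technical community for modern storage architectures
– [Open-sourced] PmemStore: a storage engine optimized for AI workloads
- VLDB 2021; the persistent skiplist is about to be merged into the PMDK core library libpmemobj-cpp
- https://github.com/4paradigm/pmemstore
– [Open-sourced] Pafka v0.1.1: an optimized version of Kafka
- 10x performance improvement, 0 code changes
- https://github.com/4paradigm/pafka
– [Open source in September] OpenEmbedding: a parameter server for training, integrated with TensorFlow
– [Open source in 2021 Q4] HyperVec: a PMem-based vector search library
– [Open source in 2021 Q4] A tiered-storage optimized version of Elasticsearch
– [Open source in 2022 H1] OpenMLDB, the in-memory feature engineering database powered by PmemStore
- OpenMLDB (DRAM version): https://github.com/4paradigm/OpenMLDB/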
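10 . Heterogeneous storage optimization across the full AI pipeline
[Architecture diagram. Legend: modules optimized with persistent memory in AD Mode; modules optimized with persistent memory in Memory Mode.]
• Online inference flow: User/APP/Web sends prediction requests, receives prediction results, and returns behavior feedback
– HyperPS: parameter server (online inference, model)
– OpenMLDB: feature database (real-time feature extraction, feature engineering scripts)
– Pafka: message queue
• Offline exploration / self-learning flow: automatic model training and automatic feature engineering, both with memory expansion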
11 . Optimizing the Feature Engineering Database on PMem
12 . What is On-line Decision Augmentation (OLDA)
OLDA has been widely used in many applications:
• Real-time fraud detection
• Personalized recommendation
• Production forecast
...
Industries shown: Finance, Media, Retail, Medical, Internet, Energy
13 . OLDA Workflow
[Workflow diagram with two parts, Off-line Training and On-line Inference; the on-line path is marked "Strict time constraint". Labels in the diagram: History Data, Data Warehouse, Structured Data, Feature Engineering, Selected Features, Features Matrix, Train / Validate / Test, Trained Model, Get Parameter, Applications, Original Record, Transactional Database, Structured Raw Data, Feature Extraction, In-memory Database (OpenMLDB), Features Vector, Prediction / Score, Response.]
14 . The Characteristics of the Feature Extraction
A new transaction / purchasing record:
Card ID      Date      Time   Amount  Types of Currency  POS Info    ...
9527xxxxxx   20200702  13:26  124.8   USD                8880xxxxxx  ...
Features (hundreds of):
• Card Info: Card Level, Activation Date ...
• Shop Info: Shop ID, Type, City, Country, ...
• Basic Account Info: Card Number, Current Balance ...
Real-time Features (thousands of):
• Pattern of the Transaction Time: the top 3 most frequent transactions that occurred in the past 10s, 1 min, 5 mins and 10 mins.
• Pattern of Visited Shops: the top 3 shops that most frequently appear in the last 10s, 1 min, 5 mins and 10 mins.
...
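As a rough illustration of the "Pattern of Visited Shops" feature, the sketch below counts shop occurrences over each of the four window lengths named on the slide. It is plain Python, not FEQL or OpenMLDB code; the function names, the `(timestamp, shop ID)` record layout, and the sample data are assumptions made for this example only.

```python
from collections import Counter

# Window lengths in seconds: 10 s, 1 min, 5 min, 10 min (taken from the slide).
WINDOWS = (10, 60, 300, 600)

def top_shops(history, now, window, k=3):
    """Return the k most frequent shop IDs among transactions in (now - window, now]."""
    counts = Counter(shop for ts, shop in history if now - window < ts <= now)
    # Sort by descending count, then shop ID, so ties resolve deterministically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [shop for shop, _ in ranked[:k]]

def window_features(history, now):
    """Compute the top-3 shops feature for every window length."""
    return {w: top_shops(history, now, w) for w in WINDOWS}

# Hypothetical transaction history for one card: (timestamp in seconds, shop ID).
history = [(100, "A"), (350, "B"), (560, "B"), (595, "C"), (598, "A"), (600, "A")]
features = window_features(history, now=600)
```

Each window rescans the history here, which is fine for a toy but shows why such features are hard to serve within a strict latency budget: the work grows with the number of windows and features per request.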
15 . The Challenges of the Feature Extraction
Existing in-memory databases cannot meet the strict timing constraint (<tens of milliseconds):
1. Most of the real-time features cannot be pre-extracted.
2. Most of the real-time features are computed over multiple time windows.
3. A large number of real-time features are extracted for each online inference prediction.
16 . Feature Engineering Database (OpenMLDB)
• FEQL Engine
– FEQL: a SQL-like language
– LLVM-based execution engine
• Storage Engine
– High availability
– Double-layered skiplist: a data structure optimized for "time-windowed real-time feature extraction"
* OpenMLDB is renamed from FEDB, which is the original name used in the paper
17 . FEQL
[Figure: FEQL example]
18 . Double-layered Skiplist
[Figure: in-memory double-layered skiplist]
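The slide gives the structure only as a figure. The sketch below is a minimal stand-in, assuming the first layer indexes records by key (for example a card ID) and the second layer keeps each key's records ordered by timestamp, so a time-window query becomes a range scan. Sorted Python lists with `bisect` replace real skiplists, and the class and method names are hypothetical.

```python
import bisect
from collections import defaultdict

class TwoLayerIndex:
    """Toy two-layer index: key -> records ordered by timestamp."""

    def __init__(self):
        # First layer: key lookup. Second layer: per-key sorted timestamps,
        # with values stored in the same order.
        self._timestamps = defaultdict(list)
        self._values = defaultdict(list)

    def put(self, key, ts, value):
        """Insert a record, keeping the key's records sorted by timestamp."""
        pos = bisect.bisect_right(self._timestamps[key], ts)
        self._timestamps[key].insert(pos, ts)
        self._values[key].insert(pos, value)

    def window(self, key, start, end):
        """Return values for key with start < ts <= end, oldest first."""
        stamps = self._timestamps.get(key, [])
        lo = bisect.bisect_right(stamps, start)
        hi = bisect.bisect_right(stamps, end)
        return self._values.get(key, [])[lo:hi]

idx = TwoLayerIndex()
idx.put("card-1", 595, 20.0)
idx.put("card-1", 100, 5.0)
idx.put("card-1", 600, 124.8)
idx.put("card-2", 599, 9.9)
recent = idx.window("card-1", 590, 600)  # amounts in the last 10 seconds
```

A real skiplist would also allow lock-free concurrent inserts, which this list-based stand-in does not attempt to model.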
19 . Comparing DRAM-based OpenMLDB with Existing Databases
Micro benchmark: performance of extracting real-time features over a varied number of time windows.
20 . Limitations of DRAM-based OpenMLDB
1. Rapid data growth (~10 TB) plus the limited capacity of DRAM: high hardware cost
2. Syncing data from volatile DRAM to HDDs/SSDs: long tail latency
3. Reloading data from backup storage to DRAM after a failure: long recovery time
21 . Explore Different Ways of Using PMem in OpenMLDB
22 . Challenge of Compare-And-Swap on PMem
Timeline:
• Thread 1: CAS (add T1), then Flush(T1)
• Thread 2: compute features F1 based on T1, then Flush(F1)
If the power goes off after Flush(F1) completes but before Flush(T1) does, the data is inconsistent after recovery: T1 does not exist, but F1 exists.
23 . In-memory Double-layered Persistent Skiplist
PCAS: Persistent-Compare-And-Swap
A low-overhead consistency guarantee without locks:
1. Flush-on-read
2. Smart pointer
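The slide names flush-on-read without showing how it works. The toy simulation below is one common reading of the idea, not the talk's implementation: a value written by CAS is marked dirty until it is flushed, and any reader that sees a dirty value flushes it before acting on it. `PMemCell`, its methods, and `run` are hypothetical names, and the smart-pointer part of PCAS is not modeled.

```python
class PMemCell:
    """Toy model of one word of persistent memory behind a volatile CPU cache."""

    def __init__(self, value=None):
        self.cached = value      # what running threads see
        self.persisted = value   # what survives power loss
        self.dirty = False       # True while cached has not been flushed

    def cas(self, expected, new):
        """Compare-and-swap on the cached value; the result is not yet durable."""
        if self.cached != expected:
            return False
        self.cached = new
        self.dirty = True
        return True

    def flush(self):
        """Make the cached value durable."""
        self.persisted = self.cached
        self.dirty = False

    def read(self):
        """Plain read: may return a value that is not durable yet."""
        return self.cached

    def read_flushing(self):
        """Flush-on-read: persist a dirty value before any reader can act on it."""
        if self.dirty:
            self.flush()
        return self.cached

    def crash(self):
        """Power loss: only the persisted value remains."""
        self.cached = self.persisted
        self.dirty = False


def run(read_t1):
    """Replay the CAS timeline; read_t1 selects how thread 2 reads T1."""
    t1, f1 = PMemCell(), PMemCell()
    t1.cas(None, "T1")            # thread 1: CAS (add T1), its flush is still pending
    seen = read_t1(t1)            # thread 2: read T1
    f1.cas(None, f"F1({seen})")   # thread 2: compute F1 based on T1
    f1.flush()                    # thread 2: Flush(F1)
    t1.crash()                    # power off before thread 1 runs Flush(T1)
    f1.crash()
    return t1.read(), f1.read()

plain = run(PMemCell.read)           # (None, "F1(T1)"): F1 survives without T1
safe = run(PMemCell.read_flushing)   # ("T1", "F1(T1)"): T1 was persisted by the reader
```

With the plain read, the recovered state holds F1 without the T1 it was derived from. With flush-on-read, thread 2 pays for one extra flush but can never persist a result that depends on non-durable data.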
24 . Evaluating FEDB on Real-world Fraud Detection
Latency under a real-world workload across different DBMS configurations. Result highlighted on the slide: 37X ~ 610X faster.
25 . OpenMLDB: DRAM-based vs PMem-based
• Reduce ~20% of long tail latency
• Reduce 99.7% of recovery time
• Save 58.4% of total cost
26 . Optimizing Kafka on a Tiered Storage Architecture
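27 . Message queue system: Kafka ➔ Pafka
Kafka
• Wide range of use cases: message delivery, log collection, stream processing, ...
• High performance, high scalability, high availability
• Performance is limited by the storage device (HDD/SSD)
Pafka (PMem Accelerated Kafka)
• Built on Kafka's architecture and APIs, so business code migrates at zero cost
• Uses persistent memory to break Kafka's performance bottleneck
• Greatly increases single-node throughput and cuts hardware cost by up to 10x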
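28 . Pafka architecture
• Built on Kafka's highly scalable architecture
• Uses PMDK so that segments, previously stored on HDD/SSD, can be persisted on PMem
• Introduces the MixChannel concept to implement tiered persistence across PMem and secondary external storage
• Introduces a background data migration policy that stores hot and cold data separately (v0.2.0, in development)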
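The tiered persistence and background migration described on this slide can be pictured with the sketch below. It is an illustration under stated assumptions, not Pafka's MixChannel API: new segments go to the fast tier (standing in for PMem) while it has room and otherwise fall back to the slow tier (standing in for HDD/SSD), and a background step moves the oldest fast-tier segments down. The class, method, and parameter names are all hypothetical.

```python
from collections import deque

class TieredSegmentStore:
    """Toy two-tier placement of log segments (fast tier first, slow tier as fallback)."""

    def __init__(self, fast_capacity):
        self.fast_capacity = fast_capacity  # max number of segments on the fast tier
        self.fast = deque()                 # hot segments, oldest on the left
        self.slow = []                      # cold segments

    def add_segment(self, segment_id):
        """Place a new segment on the fast tier if it has room, else on the slow tier."""
        if len(self.fast) < self.fast_capacity:
            self.fast.append(segment_id)
            return "fast"
        self.slow.append(segment_id)
        return "slow"

    def migrate_cold(self, keep_hot):
        """Background step: move the oldest fast-tier segments down, keeping keep_hot."""
        moved = []
        while len(self.fast) > keep_hot:
            segment_id = self.fast.popleft()
            self.slow.append(segment_id)
            moved.append(segment_id)
        return moved

store = TieredSegmentStore(fast_capacity=2)
placements = [store.add_segment(i) for i in range(3)]  # third segment spills over
moved = store.migrate_cold(keep_hot=1)                 # frees room on the fast tier
after_migration = store.add_segment(3)
```

Treating age as the only measure of coldness is a simplification; a real policy could also consider consumer offsets or access frequency.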
29 . Performance results (v0.1.x)
Charts: single-node throughput (higher is better) and latency (lower is better).
Compared with the SATA SSDs commonly used in data centers, Pafka improves both throughput and latency by nearly 20x.