AliHB Real-time cold data backup
1. AliHB Real-Time Cold Data Backup
孟庆义 (mengqingyi)
2. Contents
01 HBase Backup State
02 Alibaba's requirements on Backup
03 AliHB Real-Time Cold data Backup
04 Future works
3. HBase Backup State

                                  Against hardware failure  Against user application error  RPO      RTO
Snapshot                          NO                        YES                             N/A      N/A
Replication                       YES                       NO                              seconds  seconds
HBase Backup Restore              YES                       YES                             minutes  Increases with data size
AliHB Real-Time cold data backup  YES                       YES                             seconds  minutes
4. Alibaba's requirements for Backup
• RPO < 1 minute
• Predictable RTO for PB-scale data
• Low cost
• No impact on online service
• Easy management
5. AliHB Real-Time Cold data Backup
• Real-time incremental backup
• Independent of HBase
  - No need for snapshots
• Stateless worker nodes
• Backup to heterogeneous storage maintained by another team
6. Backup Overview
[Diagram: Source Cluster, Backup Cluster, Target Cluster (pangu). Full backup: HFiles are copied by Region Copy. Incremental backup: logs are found by Log Tracker and copied by Log Copy.]
7. Full Backup
• One copy job per table
• One copy task per region
• Challenge: a region's file list keeps changing
  - Compaction removes old files
  - Split removes the entire region
  - Merge removes the entire region
8. Compaction
• At first we have files 1, 2, 3, 4, 5
• When copying file 4, we find it missing
• After refreshing the list we have files 1, 2, 6
• Copy file 6
[Diagram: Copy File over 1 2 3 4 5; after Compaction the list is 1 2 6]
9. Split
• If we are the parent region
  - When the region is found missing, reload meta and resubmit tasks
• If we are the child region
  - Copy the reference file and its original file
  - If the referenced file is missing, refresh the file list and continue
• Merge works like split
10. Algorithm
[Flowchart]
start → All files copied?
  Yes → end
  No → Select next file → File exists?
    Yes → Copy file → back to "All files copied?"
    No → Region exists?
      Yes → Refresh file list → back to "All files copied?"
      No → Reload meta and submit new task → end
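The flowchart can be sketched roughly as the loop below. This is only an illustration under stated assumptions, not the AliHB implementation: `copy_region`, `RegionGone`, and the two callbacks `list_files` and `copy_file` are hypothetical names standing in for whatever the real worker uses to list a region's HFiles and copy one of them.

```python
from typing import Callable, List, Optional, Set


class RegionGone(Exception):
    """Hypothetical signal: the region no longer exists (split or merged)."""


def copy_region(
    list_files: Callable[[], Optional[List[str]]],
    copy_file: Callable[[str], bool],
) -> Set[str]:
    """Copy every file of one region while its file list may change.

    list_files() returns the region's current file names, or None when the
    region itself is missing. copy_file(name) returns False when the file
    has disappeared from the source. Both callbacks are assumptions.
    """
    copied: Set[str] = set()
    pending = list_files()
    if pending is None:
        raise RegionGone()
    while True:
        remaining = [name for name in pending if name not in copied]
        if not remaining:          # all files copied: end
            return copied
        name = remaining[0]        # select next file
        if copy_file(name):        # file exists: copy it
            copied.add(name)
            continue
        pending = list_files()     # file missing: refresh the file list
        if pending is None:        # region missing: caller reloads meta
            raise RegionGone()     # and submits new tasks
```

Replaying the compaction slide with this sketch, files 1, 2 and 3 are copied, file 4 is found missing, the list is refreshed to 1, 2, 6, and file 6 is copied. A caller would catch `RegionGone`, reload meta, and submit tasks for the new regions.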
11. Incremental Backup
[Diagram: Source Cluster (HBase, HDFS) and Backup Cluster (Log Tracker, Zookeeper, Workers)]
• Log Tracker scans logs in HBase and registers each new log in Zookeeper as <logName, state, offset>
• Workers copy logs from HDFS
• Latency < 10 seconds
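One plausible reading of the <logName, state, offset> record is that a worker copies only the bytes past the recorded offset and then advances it, which would let any worker resume a log. The sketch below illustrates that idea with in-memory placeholders; `LogRecord` and `copy_increment` are hypothetical names, and the real system would read from HDFS and keep the record in Zookeeper.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    """Mirrors the <logName, state, offset> tuple from the slide."""
    log_name: str
    state: str = "WRITING"
    offset: int = 0  # number of bytes already copied to the backup


def copy_increment(record: LogRecord, source: bytes, backup: bytearray) -> int:
    """Append the bytes written since record.offset and advance the offset.

    `source` stands in for the log's current content on the source cluster
    and `backup` for its copy on the target storage (both assumptions).
    Returns the number of bytes copied in this round.
    """
    new_bytes = source[record.offset:]
    backup.extend(new_bytes)
    record.offset += len(new_bytes)
    return len(new_bytes)
```

Because the progress lives in the record rather than in the worker, this shape would fit the stateless worker nodes mentioned earlier in the deck.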
12. Log Lifecycle
• Writing
  - Log Tracker scans periodically and finds new logs
• Closed
  - If it is not the latest log of the region server, or it is in “.oldlogs”
• Finished
  - If a worker has copied the whole closed log
• Deleted
  - If Log Tracker cannot find it in HBase and it is finished on backup, then the log record is deleted from the backup system
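The four states above read like a small state machine. A minimal sketch of the transitions, assuming the conditions can be passed in as booleans; `LogState` and `next_state` are hypothetical names, not part of AliHB:

```python
from enum import Enum


class LogState(Enum):
    WRITING = "writing"
    CLOSED = "closed"
    FINISHED = "finished"
    DELETED = "deleted"


def next_state(
    state: LogState,
    *,
    is_latest_log: bool,
    in_oldlogs: bool,
    fully_copied: bool,
    exists_in_hbase: bool,
) -> LogState:
    """Advance a log record by at most one lifecycle step."""
    if state is LogState.WRITING and (not is_latest_log or in_oldlogs):
        return LogState.CLOSED      # no longer being written
    if state is LogState.CLOSED and fully_copied:
        return LogState.FINISHED    # a worker copied the whole closed log
    if state is LogState.FINISHED and not exists_in_hbase:
        return LogState.DELETED     # gone from HBase, record can be dropped
    return state
```

Advancing one step per call keeps each transition tied to exactly one condition from the slide, which makes the sketch easy to check against the list.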
13. Data Consistency
• Full comparison
  - Do sample comparison
  - Sample on every region
  - Balanced sample: use the index of the largest file for each region
• Incremental comparison
  - Compare recent logs
14. Restore Scenarios
• Cluster level
  - Restore the whole cluster
• Table level
  - Restore one table or a list of tables
• Region level
  - Restore a range of data of some table
• Restore to a given point in time
15. Restore Tools
• Bulkload the full backup
  - Filter HFiles by table name and range
• Use the LogRestore tool to restore logs
  - Filter by table name
  - Filter by range
  - Filter by timestamp
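The three log filters can be illustrated with a small generator. The deck does not show the LogRestore tool's options, so `LogEntry`, `filter_entries`, and their parameters are hypothetical; the row range is taken as start-inclusive and stop-exclusive, as in HBase scans.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class LogEntry:
    """Simplified stand-in for one edit read from a backed-up log."""
    table: str
    row: bytes
    timestamp: int  # milliseconds


def filter_entries(
    entries: Iterable[LogEntry],
    table: str,
    start_row: bytes = b"",
    stop_row: Optional[bytes] = None,
    max_timestamp: Optional[int] = None,
) -> Iterator[LogEntry]:
    """Yield entries of `table` within [start_row, stop_row) up to max_timestamp."""
    for entry in entries:
        if entry.table != table:
            continue                                   # filter by table name
        if entry.row < start_row:
            continue                                   # filter by range
        if stop_row is not None and entry.row >= stop_row:
            continue
        if max_timestamp is not None and entry.timestamp > max_timestamp:
            continue                                   # filter by timestamp
        yield entry
```

Passing `max_timestamp` is how a sketch like this would express restoring to a given point in time.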
16. Restore Runtime
• HFiles
  - Split by region, one task per region
• Logs
  - Each log is a task
[Diagram: Restore Manager submits tasks to Workers, which run Bulkload and LogRestore]
17. Real-Time Cold data Backup
[Architecture diagram]
• Master: Log Tracker, Backup Manager, Data Cleaner, Restore Manager
• Workers: Copy Log, Copy Region, Bulkload, Log Restore
18. Web UI
19. Performance
• Backup system: 200 nodes
• HBase: 377 nodes
• Data: 110 TB
• Backup: 22 minutes
• Restore: 53 minutes
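To put these figures in perspective, the aggregate throughput they imply can be worked out directly. The calculation below assumes decimal units (1 TB = 1000 GB) and that both durations refer to the same 110 TB; the per-node figure further assumes all 200 backup nodes take part evenly, which the slide does not state.

```python
def throughput_gb_per_s(data_tb: float, minutes: float) -> float:
    """Aggregate throughput in GB/s, with 1 TB taken as 1000 GB."""
    return data_tb * 1000 / (minutes * 60)


backup_rate = throughput_gb_per_s(110, 22)    # about 83.3 GB/s
restore_rate = throughput_gb_per_s(110, 53)   # about 34.6 GB/s

# Per-node share, assuming all 200 backup nodes participate evenly.
backup_per_node_mb = backup_rate * 1000 / 200     # about 417 MB/s
restore_per_node_mb = restore_rate * 1000 / 200   # about 173 MB/s
```

Under these assumptions, restore runs at roughly 40% of the backup rate.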
20. Conclusion
• AliHB Real-time Cold data backup
  - Real-time incremental backup keeps the latency within seconds
  - Scales out to obtain more power on restore
  - Uses fewer resources during normal backup
  - Independent of HBase, easy to deploy and upgrade
21. Future works
• Incremental restore
  - Recognize hot / cold data
  - Resume the HBase service after restoring hot data
  - Access the cold data through reference files
  - Restore cold data in the background
• Move log lifecycle management onto HBase
  - Periodic scans of .oldlogs put pressure on the NN
  - Keep only the necessary logs on Zookeeper
• Compact HLogs into HFiles
  - Save storage space
  - Speed up restore
22. Thanks for watching