PingCAP Infra Meetup 105: Chaos practice in TiDB
1 .Chaos practice in TiDB PingCAP 舒科 June 1, 2019
2 .Testing in TiDB ● Unit Testing ● Integration Testing ● Performance Testing ● Schrodinger Testing ○ Chaos Testing
3 .Contents ● Why Chaos ● Practice in TiDB ● Schrodinger
4 .Why Chaos ● Uses fault injection ● Pioneered by Netflix in 2010 ○ Break things on purpose ● Why 2010? ○ Netflix moved to AWS ○ Lots of errors ■ hardware failures ■ network latency ■ ...
5 .Why Chaos (cont.) ● Goal: ○ Make the system stronger ● Steps
6 .Why Chaos (cont.) ● Examples ○ Chaos at EMC ■ A robot pulling hard disks during a BMW POC ○ Chaos at Facebook ■ Shutting down a data center ● Lack of chaos ○ Boeing 737 MAX
7 .Why Chaos (cont.) ● Microservices ○ Too complex to fully understand ● Errors always happen ● Run chaos experiments to gain confidence
8 .Why Chaos (cont.)
9 .Why Chaos (cont.) ● etcd bug ● RocksDB bug ● Partitioned leader ● Transferring leader when busy ● Too many Regions ● Crash when processing batched Raft messages ● ...
10 .● Why Chaos ● Practice in TiDB ● Schrodinger
11 .Chaos practice in TiDB
12 .Chaos practice in TiDB (cont.) ● Region heartbeats ○ Check what happens when a huge number of Regions sit on one machine ○ Choose a metric: CPU ○ Hypothesis: ■ CPU usage stays low ○ Experiment ■ 40k Regions on one machine ○ What happened? ■ OOM ■ 30% of CPU occupied
13 .Chaos practice in TiDB (cont.) ● Choose metrics ○ Usually QPS ○ CPU ○ Memory ● Hypothesis ○ QPS returns to its previous level within X seconds ○ QPS drops by 1/x
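The two hypotheses on slide 13 can be checked mechanically against a QPS time series. A minimal Python sketch, assuming one QPS sample per second, a fault injected at a known sample index, and reading "QPS drops by 1/x" as "QPS never falls more than 1/x below the pre-fault baseline". The function names, the 5% tolerance and the sample data are illustrative assumptions, not part of the talk.

```python
from statistics import mean
from typing import Sequence


def baseline_qps(qps: Sequence[float], fault_index: int) -> float:
    """Average QPS over the steady-state samples taken before the fault."""
    return mean(qps[:fault_index])


def recovers_within(qps: Sequence[float], fault_index: int,
                    max_seconds: int, tolerance: float = 0.05) -> bool:
    """Hypothesis: QPS is back within `tolerance` of the baseline
    `max_seconds` after the fault, and stays there."""
    threshold = baseline_qps(qps, fault_index) * (1 - tolerance)
    after = qps[fault_index + max_seconds:]
    return len(after) > 0 and all(v >= threshold for v in after)


def drop_at_most(qps: Sequence[float], fault_index: int, x: float) -> bool:
    """Hypothesis: from the fault onward, QPS never falls by more than
    1/x of the baseline."""
    threshold = baseline_qps(qps, fault_index) * (1 - 1 / x)
    return min(qps[fault_index:]) >= threshold


# Made-up samples: steady at 1000 QPS, fault injected at index 5.
samples = [1000] * 5 + [600, 700, 900, 980, 1000, 1000]
print(recovers_within(samples, 5, max_seconds=3))  # True
print(recovers_within(samples, 5, max_seconds=2))  # False
print(drop_at_most(samples, 5, x=2))               # True
print(drop_at_most(samples, 5, x=4))               # False
```

Keeping the baseline computed from pre-fault samples means the same check works whatever absolute QPS the cluster normally runs at.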
14 .Chaos practice in TiDB (cont.) Error injection
15 .Chaos practice in TiDB (cont.) ● Applications ○ kill, kill -9 ○ renice ○ SIGSTOP, SIGCONT
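The process-level faults on slide 15 map directly onto POSIX signals and the nice value. A minimal, Unix-only Python sketch using just the standard library; the helper names are hypothetical, and a throwaway `sleep` process stands in for a real TiDB, TiKV or PD process.

```python
import os
import signal
import subprocess
import time


def kill_process(pid: int, force: bool = False) -> None:
    """`kill <pid>` sends SIGTERM; `kill -9 <pid>` sends SIGKILL."""
    os.kill(pid, signal.SIGKILL if force else signal.SIGTERM)


def renice(pid: int, niceness: int = 19) -> None:
    """Roughly what `renice` does: lower the process's CPU priority."""
    os.setpriority(os.PRIO_PROCESS, pid, niceness)


def freeze(pid: int, seconds: float) -> None:
    """SIGSTOP the process for `seconds`, then resume it with SIGCONT.
    A stopped process keeps its sockets open but answers nothing,
    so to its peers it looks like a hung node."""
    os.kill(pid, signal.SIGSTOP)
    try:
        time.sleep(seconds)
    finally:
        os.kill(pid, signal.SIGCONT)


if __name__ == "__main__":
    victim = subprocess.Popen(["sleep", "60"])
    renice(victim.pid)
    freeze(victim.pid, 0.5)
    kill_process(victim.pid, force=True)
    print(victim.wait())  # -9: terminated by SIGKILL
```

SIGSTOP/SIGCONT is often the more interesting fault of the two: unlike a kill, the process comes back with stale state and has to rejoin the cluster.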
16 .Chaos practice in TiDB (cont.) ● Memory ○ cgroup ● Storage ○ FUSE ○ rm -rf ● Network ○ tc ○ iptables
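The network faults on slide 16 are commonly applied with tc's netem qdisc and with iptables DROP rules. The sketch below only builds the command lines, so it can be read and tested without root; the device name `eth0` and the peer address are placeholder assumptions, and actually applying a command (for example with `subprocess.run(cmd, check=True)` as root) is left to the caller.

```python
import shlex
from typing import List


def netem_delay(dev: str, delay_ms: int, jitter_ms: int = 0) -> List[str]:
    """Add fixed (optionally jittered) latency to egress traffic on `dev`."""
    cmd = ["tc", "qdisc", "add", "dev", dev, "root", "netem",
           "delay", f"{delay_ms}ms"]
    if jitter_ms:
        cmd.append(f"{jitter_ms}ms")
    return cmd


def netem_loss(dev: str, percent: float) -> List[str]:
    """Randomly drop `percent`% of egress packets on `dev`."""
    return ["tc", "qdisc", "add", "dev", dev, "root", "netem",
            "loss", f"{percent}%"]


def netem_clear(dev: str) -> List[str]:
    """Remove the netem qdisc again."""
    return ["tc", "qdisc", "del", "dev", dev, "root"]


def partition(peer_ip: str, heal: bool = False) -> List[List[str]]:
    """Drop all traffic to and from `peer_ip` (-A appends the rules);
    with heal=True, delete the same rules instead (-D)."""
    op = "-D" if heal else "-A"
    return [
        ["iptables", op, "INPUT", "-s", peer_ip, "-j", "DROP"],
        ["iptables", op, "OUTPUT", "-d", peer_ip, "-j", "DROP"],
    ]


if __name__ == "__main__":
    print(shlex.join(netem_delay("eth0", 100, jitter_ms=10)))
    # tc qdisc add dev eth0 root netem delay 100ms 10ms
    for cmd in partition("10.0.1.5"):
        print(shlex.join(cmd))
    # iptables -A INPUT -s 10.0.1.5 -j DROP
    # iptables -A OUTPUT -d 10.0.1.5 -j DROP
```

Pairing every "inject" builder with its "undo" counterpart makes it easy to guarantee the fault is removed once the experiment ends.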
17 .Chaos practice in TiDB (cont.) ● Other errors ○ etcd key deleted ○ NTP errors ○ ...
18 .Chaos practice in TiDB (cont.) ● Observe results ○ Learn from history
19 .Chaos practice in TiDB (cont.) ● Observe results ○ Learn from logs
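Learning from logs after an experiment often starts with counting what went wrong. A rough Python sketch, assuming each log line carries a bracketed level such as `[ERROR]` and that Go runtime panics start with `panic:`; the exact TiDB log format is not confirmed here, and the sample lines are made up for illustration.

```python
import re
from collections import Counter
from typing import Iterable

# Assumption: log lines carry a bracketed level such as "[ERROR]",
# and Go runtime panics start with "panic:".
LEVEL_RE = re.compile(r"\[(WARN|ERROR|FATAL)\]")


def summarize_log(lines: Iterable[str]) -> Counter:
    """Count warning/error/fatal lines and panics."""
    counts: Counter = Counter()
    for line in lines:
        match = LEVEL_RE.search(line)
        if match:
            counts[match.group(1)] += 1
        elif line.startswith("panic:"):
            counts["PANIC"] += 1
    return counts


# Made-up sample lines, for illustration only.
sample = [
    '[2019/06/01 10:00:00.000 +08:00] [INFO] [server.go:1] ["started"]',
    '[2019/06/01 10:00:05.000 +08:00] [ERROR] [client.go:2] ["request timed out"]',
    "panic: runtime error: index out of range",
]
print(dict(summarize_log(sample)))  # {'ERROR': 1, 'PANIC': 1}
```

Comparing these counts between a baseline run and a run with injected faults is a cheap first signal before reading individual log lines.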
20 .Chaos practice in TiDB (cont.) ● Automation ○ Borrow some machines from SRE ○ Deploy ○ Experiment ○ Debug ○ Return the machines
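The automation loop on slide 20 has one property worth making explicit: borrowed machines must go back to SRE even when deploying or experimenting blows up. A minimal Python sketch of that loop; `MachinePool`, `chaos_round` and the callbacks are hypothetical names, not an actual PingCAP tool, and running the debug step only when the hypothesis fails is an interpretation of the slide.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Hosts = List[str]


@dataclass
class MachinePool:
    """Hypothetical stand-in for the SRE team's spare machines."""
    free: int
    events: List[str] = field(default_factory=list)

    def borrow(self, n: int) -> Hosts:
        if n > self.free:
            raise RuntimeError("not enough spare machines")
        self.free -= n
        self.events.append(f"borrow {n}")
        return [f"host-{i}" for i in range(n)]

    def give_back(self, hosts: Hosts) -> None:
        self.free += len(hosts)
        self.events.append(f"return {len(hosts)}")


def chaos_round(pool: MachinePool, n: int,
                deploy: Callable[[Hosts], None],
                experiment: Callable[[Hosts], bool],
                debug: Callable[[Hosts], None]) -> bool:
    """Borrow -> deploy -> experiment -> debug (only if the hypothesis
    failed) -> return. The machines go back even if a step raises."""
    hosts = pool.borrow(n)
    try:
        deploy(hosts)
        held = experiment(hosts)
        if not held:
            debug(hosts)
        return held
    finally:
        pool.give_back(hosts)


pool = MachinePool(free=5)
result = chaos_round(pool, 3,
                     deploy=lambda h: None,
                     experiment=lambda h: False,  # pretend the hypothesis failed
                     debug=lambda h: print("debugging on", h))
print(result, pool.free, pool.events)
```

The `try`/`finally` is the design point: a crashed deploy or a hung experiment should never leak machines out of the shared pool.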
21 .● Why Chaos ● Practice in TiDB ● Schrodinger
22 .Schrodinger
23 .Schrodinger (cont.)
24 .Schrodinger (cont.)
25 .Schrodinger (cont.)
26 .Schrodinger (cont.) cat
27 .Schrodinger (cont.)
28 .Schrodinger (cont.) Chaos Operator
29 .Schrodinger (cont.) Run with your own Helm charts