- 快召唤伙伴们来围观吧
- 微博 QQ QQ空间 贴吧
- 文档嵌入链接
- 复制
- 微信扫一扫分享
- 已成功复制到剪贴板
Lessons from database failures
加入珀科纳的首席福音传道者科林·查尔斯,他介绍了“数据库故障的教训”。
MySQL在规模上的失败可以教会我们很多东西。它们可以引出关于高可用性(HA)、地理冗余和自动故障转移等主题的讨论。在本次网络研讨会中,科林将展示案例研究材料(自动故障转移如何导致Github离线,为什么Facebook使用辅助故障转移而不是完全自动故障转移,以及其他场景),以了解MySQL世界是如何使事情变得更好的。例如,一种方法是使用半同步复制来运行完全可扩展的服务。
网络研讨会将以一个明显的例子开始,说明由于不正确的MySQL备份过程,一个企业是如何死亡的。议程包括备份(和验证)、复制(和故障转移)和安全(和加密)。
网络研讨会将涵盖来自该领域的各种大的“失败鲸”问题,以及如何通过正确地构建解决方案来避免这些问题。
展开查看详情
1 .Lessons from database failures Colin Charles, Chief Evangelist, Percona Inc. colin.charles@percona.com / byte@bytebot.net http://www.bytebot.net/blog/ | @bytebot on Twitter Percona Webminar 18 January 2017
2 .whoami • Chief Evangelist (in the CTO office), Percona Inc • Focusing on the MySQL ecosystem (MySQL, Percona Server, MariaDB Server), as well as the MongoDB ecosystem (Percona Server for MongoDB) + 100% open source tools from Percona like Percona Monitoring & Management, Percona xtrabackup, Percona Toolkit, etc. • Founding team of MariaDB Server (2009-2016), previously at Monty Program Ab, merged with SkySQL Ab, now MariaDB Corporation • Formerly MySQL AB (exit: Sun Microsystems) • Past lives include Fedora Project (FESCO), OpenOffice.org • MySQL Community Contributor of the Year Award winner 2014
3 .Agenda • Backups (and verification) • Replication (and failover) • Security (and encryption)
4 .ma.gnolia.com
5 .ma.gnolia.com’s failure • January 30 2009: complete outage • February 17 2009: data corruption in the UDB, essentially dead • What happened? • Ruby on Rails on four self-hosted Mac Mini’s, a couple of XServe’s, 500GB+ MySQL 5 DB • Filesystem corruption, corrupted database backup • No versioning, didn’t check if the backups worked, made use of rsync to backup the database over Firewire network
6 .ma.gnolia.com today? • EC2 for the app with EBS snapshots, RDS with snapshots, Multi-AZ deployment • Self-hosted? • xtrabackup • START TRANSACTION WITH CONSISTENT SNAPSHOT + mysqldump —single-transaction —master-data • Backup a replica • Replication event checksums
7 .Couchsurfing, 2006
8 .Couchsurfing problems 1. major, avoidable hard drive crash 2. incremental backups weren’t executed in the correct manner, and twelve of our most important data files didn’t survive
9 .Time-delayed replication • MySQL 5.6+ has time-delayed replication. Stop replication when you know a mistake has happened before it propagates to all the slaves. • Feature suggestion since 2001! Bug reported August 2006 (mysql#21639). Pushed June 2010 (WL#344). GA February 2013.
10 .Why replicate? • Scale out • [automatic] (master) failover • Geographical redundancy across multiple data centres • Online schema changes
11 .Replication • Asynchronous (default) • (Enhanced loss-less) Semi-synchronous (plugin) • Synchronous (Galera, group replication, NDBCLUSTER) • DRBD
12 .Frameworks • MySQL-MMM • Percona Replication Manager • Severalnines ClusterControl (https://github.com/percona/ percona-pacemaker-agents/) • Orchestrator • Replication Manager • MySQL MHA (github.com/tanji/replication- • Tungsten Replicator manager) • 5.6+ utilities: mysqlfailover, mysqlrpladmin
13 .GitHub
14 .GitHub
15 .GitHub
16 .GitHub https://github.com/blog/1261-github-availability-this-week
17 .Fully automated failover a good idea? • False alarms • Repeated failover • Overloaded master? MHA doesn’t allow a failover within 8h, unless —last_failover_min=n is set • Data loss • id=103 latest, relay logs at id=101 => loss • group commit in the binary log • Split brain
18 .Proxies • MariaDB MaxScale • Popular use: load balancing Galera clusters • MySQL Router + MySQL Fabric • ProxySQL • Used alongside Galera clusters too • Included with Percona XtraDB Cluster 5.7
19 .
20 .Sharding • SPIDER • Tungsten Replicator • Tumblr JetPants
21 .Vitess • Servers & tools to scale MySQL for web written in Go • Has MariaDB support too (*) • Python client interface • DML annotation, connection pooling, shard management, workflow management, zero downtime restarts • Become super easy to use: http://vitess.io/ (with the help of Kubernetes)
22 .Failwhales • Twitter started on MySQL, and is still MySQL - you just need to “evolve” • Gizzard (sharding), Mesos + Apache Cotton • Digg started on MySQL, migrated to Cassandra, and came back to MySQL
23 .Security • Philippines voter data leave 55m at risk: 338GB MySQL dump • Ashley Madison: 6.9GB compressed dump, 36m email addresses leaked, 9.6m credit card transactions • Patreon: 13.7GB MySQL dump, 99 tables
24 .Mossack Fonseca: Panama Papers
25 .Prevent SQL injections • MariaDB MaxScale database firewall filter • Configurable filter actions on rule match (Allow the query, block the query or ignore the match), Logging of matching and/or non- matching queries • MySQL Enterprise firewall • ProxySQL
26 .Encryption at rest • MariaDB Server 10.1: table or tablespace encryption • design goal: Encrypt all user data that may touch the disk — InnoDB data, InnoDB logs, binary logs, temporary tables, temporary files • key management on the filesystem? [no key rotation] Amazon KMS? • caveats: mysqlbinlog needs work with encrypted binlogs; Galera Cluster gcache isn’t encrypted • MySQL 5.7: only encrypts InnoDB tablespaces (innodb_file_per_table; logs unencrypted)
27 .In conclusion… • Use semi-sync replication with a failover solution that ensures you don’t failover too often • Make good backups. Test them. Save them. • You’ll most definitely need to shard your data, use proven frameworks and get a proxy involved. Complete backups with multi- source replication when needed. • Use mysqldump and xtrabackup together (and mydumper for parallel backup/restore; mysqlpump) • Security is key: prevent SQL injections, encrypt your data at rest
28 .It’s 2016, you don’t want this…
29 .Percona Monitoring and Management (PMM) • http://pmmdemo.percona.com/