• Backups come first. You can’t recover unless you’ve first backed up, so your
attention naturally focuses on backups when building a system. It’s important to
counter this tendency by planning for recovery first. In fact, you shouldn’t build
your backup systems until you figure out your recovery requirements.
• Backups are routine. This makes you focus on automating and fine-tuning the
backup process, often without thinking of it. Five-minute tweaks to your backup
process may not seem important, but are you applying the same attention to
recovery, day in and day out? You should intentionally practice your recovery
procedure until it is as smooth and bug-free as your backup process.
• Backups aren’t usually made under extreme pressure, but recovery is usually a
crisis situation. It’s hard to overstate how important this is.
• Security often gets in the way. If you’re doing offsite backups, you’re probably
encrypting the backup data or taking other measures to protect it. It’s easy to
focus on how damaging it would be for your data to be compromised, and lose
sight of how damaging it is when nobody can unlock your encrypted volume to
recover your data—or when you need to extract a single file from a monolithic
encrypted file.
• One person can plan, design, and implement backups, especially with the excel-
lent tools available. That person might not be available when disaster strikes.
You need to train several people and plan for coverage, so you’re not asking an
unqualified person to recover your data.
Here are some of the things you should consider:
• How much data can you lose without serious consequences? Do you need point-
in-time recovery, or is it acceptable to lose whatever work has happened since
your last regular backup? Are there legal requirements?
• How fast does recovery have to be? What kind of downtime is acceptable? What
impacts (e.g., partial unavailability) can your application and users accept, and
how will you build in the capability to continue functioning when those scenar-
ios happen?
• What do you need to recover? Common requirements are to recover a whole
server, a single database, a single table, or just specific transactions or statements.
I Use Replication As a Backup.”
This is a mistake we see quite often. A replication slave is not a backup. Neither is a
RAID array. To see why, consider this: will they help you get back all your data if you
accidentally execute DROP DATABASE on your production database? RAID and replication
don’t pass even this simple test. Not only are they not backups, they’re not a substitute
for backups. Nothing but backups fill the need for backups.
The Big Picture
• Raw backups are practically a must-have for large databases: they’re fast, which
is very important. Snapshot-based backups are our favorite, but InnoDB Hot
Backup is a good alternative if you use only InnoDB tables.
• Back up your binary logs for point-in-time recovery.
• Keep several backup generations, and keep binary log files long enough that you
can restore from them.
• Test your backups and recovery process periodically by going through the entire
recovery process.
• Create logical backups (probably from the raw backups, for efficiency) periodi-
cally. Make sure you keep enough binary logs to recover from your last logical
backup.
• Test your raw backups, if possible, to make sure they’re useful for recovery. If
you can, test them during the backup process, before you copy them to the
destination.
• Think hard about security. What happens if someone compromises your
server—can he then get access to the backup server too, or vice versa?
• Monitor your backups and backup processes independently from the backup
tools themselves. You need external verification that they’re OK.
• Be smart about how you copy files between machines. There are much more effi-
cient ways to copy files than scp or rsync.
Why Backups?
Disaster recovery
Disaster recovery is what you do when hardware fails, a nasty bug corrupts your
data, or your server and its data become unavailable or unusable for some other
reason (the potential reasons are many and varied—use your imagination).
Although the odds of any particular disaster striking are fairly low, taken
together they add up. You need to be ready for everything from someone acci-
dentally connecting to the wrong server to type ALTER TABLE,* to the building
burning down, to a malicious attacker or a MySQL bug.
People changing their minds
You’d be surprised how often we’ve seen the need to recover at least some data
as it existed at some point. For some applications, this might happen even more
often than disasters (for example, if an important customer intentionally deletes
some data and then wants it back).
Auditing
Sometimes you need to know what your data or schema looked like at some
point in the past. You might be involved in a lawsuit, for example, or you might
discover a bug in your application and need to see what the code used to do
(sometimes just having your code in version control isn’t enough).
Testing
One of the easiest ways to test on realistic data is to periodically refresh a test
server with the latest production data. If you’re making backups, it’s easy; just
use the backup.
Online or Offline Backups?
If you can get away with it, shutting down MySQL to make a backup is the easiest,
safest, and overall best way to get a consistent copy of the data with minimal risk of
corruption or inconsistency.
However, taking a server offline is more expensive than it might seem. Even if you
can minimize the downtime, shutting down and restarting MySQL can take a long
time under demanding loads and high data volumes.
Logical backups
• They’re normal files that you can manipulate and inspect with editors and
command-line tools such as grep and sed. This can be very helpful when restor-
ing data, or when you just want to inspect the data without restoring.
• They’re easy to restore. You can just pipe them into mysql or use mysqlimport.
• You can back up and restore across the network—that is, on a different machine
from the MySQL host.
• They can be very flexible because mysqldump—the tool most people prefer to
use to make them†—can accept lots of options, such as a WHERE clause to restrict
what rows are backed up.
• They’re storage engine-independent. Because you create them by extracting data
from the MySQL server, they abstract away differences in the underlying data
storage. Thus, you can back up from InnoDB tables and restore to MyISAM
tables with very little work. You can’t do this with raw file copies.
• If you specify the right options to mysqldump, in many cases, you can even
import your logical backups into another database server, such as PostgreSQL.
• They can help avoid data corruption. If your disk drives are failing and you copy
the raw files, you’ll make a corrupt backup, and unless you check the backup,
you won’t notice it and it’ll be unusable later. If the data MySQL has in memory
is not corrupt, you can sometimes get a trustworthy logical backup when you
can’t get a good raw file copy.
Logical backups have their shortcomings, though:
• The server has to do the work of generating them, so they use more CPU cycles.
• Logical backups can be bigger than the underlying files in some cases.* The
ASCII representation of the data isn’t always as efficient as the way the storage
engine stores the data. For example, an integer requires 4 bytes to store, but
when written in ASCII, it can require up to 12 characters. You can often com-
press the files effectively, but this uses more CPU resources.
• The loss of precision in floating-point representations may prevent accurate res-
toration from dump files. (Google’s patches to the MySQL server include a patch
to mysqldump that works around this.)
• Restoring from a logical backup requires MySQL to load and interpret the state-
ments and rebuild indexes, which piles more work onto the server.
The biggest disadvantages are really the cost of dumping the data from MySQL and
the cost of loading data back in via SQL statements.
Here are the main problem areas:
Schema and data stored together
Although this is convenient if you want to restore from a single file, it makes
things difficult if you need to restore only one table, or want to restore only the
data. You can alleviate this concern by dumping twice—once for data, once for
schema—but you’ll still have the next problem.
Huge SQL statements
It’s a lot of work for the server to parse and execute all of the SQL statements.
This is a relatively slow way to load data.
A single huge file
Most text editors can’t edit large files or files with very long lines. Although you
can sometimes use command-line stream editors—such as sed or grep—to pull
out the data you need, it’s preferable to keep the files small.
Logical backups are expensive
There are more efficient ways to get data out of MySQL than sending it over the
client/server protocol as a result set.
These limitations mean that SQL dumps quickly become unusable as tables get large.
There’s another option, though: export data to delimited files.
Logical backups produced by mysqldump are not always text files. SQL dumps can contain many different
character sets, and can even include binary data that’s not valid character data at all. Lines can be too long
for many editors, too. Still, many such files will contain data a text editor can open and read, especially if you
run mysqldump with the –hex-blob option.
Raw backups
Raw backups have the following advantages:
• Raw file backups simply require you to copy the desired files somewhere else for
backup. The raw files don’t require any extra work to generate.
• Restoring raw backups can be simpler, depending on the storage engine. For
MyISAM, it can be as easy as just copying the files into their destinations.
InnoDB, however, requires you to stop the server and possibly take other steps
as well.
• Raw backups are generally pretty portable across platforms, operating systems,
and MySQL versions.
• It can be faster to restore raw backups, because the MySQL server doesn’t have
to execute any SQL or build indexes. If you have InnoDB tables that don’t fit
entirely in the server’s memory, it can be much faster to restore raw files.
Here are some disadvantages of raw backups:
• InnoDB’s raw files are often far larger than the corresponding logical backups.
The InnoDB tablespace typically has lots of unused space. Quite a bit of space is
also used for purposes other than storing table data (the insert buffer, the roll-
back segment, and so on).
• Raw backups are not always portable across platforms, operating systems, and
MySQL versions. Filename case sensitivity and floating-point formats are places
where you might encounter trouble. You might not be able to move files to a sys-
tem whose floating-point format is different (however, the vast majority of pro-
cessors use the IEEE floating-point format).
What to Back Up
Nonobvious data
Don’t forget data that’s easy to overlook: your binary logs and InnoDB transac-
tion logs, for example.
Code
A modern MySQL server can store a lot of code, such as triggers and stored pro-
cedures. If you back up the mysql database, you’ll back up much of this code,
but then it will be hard to restore a single database in its entirety, because some
of the “data” in that database, such as stored procedures, will actually be stored
in the mysql database.
Replication configuration
If you are recovering to a server that is involved in replication, you should
include in your backups whatever replication files you’ll need for that—e.g.,
binary logs, relay logs, log index files, and the .info files. At a minimum, you
should include the output of SHOW MASTER STATUS and/or SHOW SLAVE STATUS. It’s
also helpful to issue FLUSH LOGS so MySQL starts a new binary log. It’s easier to
do point-in-time recovery from the beginning of a log file than the middle.
Server configuration
If you have to recover from a real disaster—say, if you’re building a server from
scratch in a new data center after an earthquake—you’ll appreciate having the
server’s configuration files included in the backup.
Selected operating system files
As with the server configuration, it’s important to back up any external configu-
ration that is essential to a production server. On a Unix server, this might
include your cron jobs, user and group configurations, administrative scripts,
and sudo rules.
Incremental backups
A common strategy for dealing with too much data is to do regular incremental
backups. Here are some ideas:
• Back up your binary logs. This is the simplest, the most widely used, and overall
the best way to make incremental backups.
• Don’t back up tables that haven’t changed. Some storage engines, such as
MyISAM, record the last time each table was modified. You can see these times
by inspecting the files on disk or by running SHOW TABLE STATUS. If you use
InnoDB, a trigger can help you keep track of the last changes by recording the
change times in a small “last changed time” table. You need to do this only on
tables that change infrequently, so the cost should be minimal. A custom backup
script can easily determine which tables have changed.
If you have “lookup” tables that contain data such as lists of month names in
various languages or abbreviations for states or regions, it can be a good idea to
place them into a separate database, so you don’t have to back them up all the
time.
• Don’t back up rows that haven’t changed. If a table is INSERT-only, such as a
table that logs hits to a web page, you can add a TIMESTAMP column and back up
only rows that have been inserted since the last backup. You can also use the
Merge storage engine to good effect so that older data lives in static tables.
• Don’t back up some data at all. Sometimes this makes a lot of sense—for exam-
ple, if you have a data warehouse that’s built from other data and is technically
redundant, you can merely back up the data you used to build the warehouse,
instead of the data warehouse itself. This can be a good idea even if it’s very slow
to “recover” by rebuilding the warehouse from the original files. Avoiding the
backups can add up over time to much greater savings than the potentially faster
recovery time you’ll gain by having a full backup. You can also opt not to back
up some temporary data, such as tables that hold web site session data.
• Back up just the changes to the binary log. You can use rdiff to get binary deltas
of your binlogs and back up just the changes made since the last backup (period-
ically doing a full backup as well). Another useful tool we’ve used is rdiff-
backup, which combines rdiff and rsync functionality into a complete backup
solution. Or you can just use FLUSH LOGS to begin a new binary log after each
backup, so you don’t need to do binary deltas at all.
• Back up just the changes to the data files. This is like backing up differences in
your binary logs. Common Unix tools for this purpose are, again, rdiff and rdiff-
backup. This strategy is helpful for extremely large databases that don’t change
much. Suppose you have a terabyte of data, only 50 GB of which changes every
day. It might be a good idea to back up binary differences daily, and just do a
full backup once in a while. The benefit is that you can apply the binary differ-
ences to the full backup in a sequential disk read/write operation much faster
than you can apply a binary log. The binary difference backup itself might be
slower than making a full backup, though.
The drawback of incremental backups is increased complexity during recovery. If
you have do recovery under stress, you’ll appreciate being able to restore just one
backup instead of having to apply incremental backups one after the other. If you
can do full backups, we suggest that you do so for simplicity’s sake.
Bin log
Here’s our recommended server configuration for binary logging:
log_bin = mysql-bin
sync_binlog = 1
innodb_support_xa = 1 # MySQL 5.0 and newer only
innodb_safe_binlog # MySQL 4.1 only, roughly equivalent to innodb_support_xa
Filesystem Snapshots
Filesystem snapshots are a great way to make online backups. Snapshot-capable file-
systems can create a consistent image of their contents at an instant in time, which
you can then use to make a backup. Snapshot-capable filesystems and appliances
include FreeBSD’s filesystem, the ZFS filesystem, GNU/Linux’s Logical Volume
Manager (LVM), and many SAN systems and file-storage solutions, such as NetApp
storage appliances.
Recovering from a Backup
How you recover your data depends on how you backed it up. You might need to
take some or all of the following steps:
• Stop the MySQL server.
• Take notes on the server’s configuration and file permissions.
• Move the data from the backup into the MySQL data directory.
• Make configuration changes.
• Change file permissions.
• Restart the server with limited access, and wait for it to start fully.
• Reload logical backup files.
• Examine and replay binary logs.
• Verify what you’ve restored.
• Restart the server with full access.
Limit access
start MySQL with the –skip-networking and –socket=/tmp/mysql_recover.sock
Loading SQL files
$ mysql < sakila-backup.sql
OR
mysql> SET SQL_LOG_BIN = 0;
mysql> SOURCE sakila-backup.sql;
mysql> SET SQL_LOG_BIN = 1;
OR
$ gunzip -c sakila-backup.sql.gz | mysql
What if you want to restore only a single table (for example, the actor table)? If your
data has no line breaks, it’s not hard to restore the data if the schema is already in
place:
$ grep 'INSERT INTO `actor`' sakila-backup.sql | mysql sakila
Or, if the file is compressed:
$ gunzip -c sakila-backup.sql.gz | grep 'INSERT INTO `actor`'| mysql sakila
so you’ll probably have to do some command-line
work. It’s easy to use grep to pull out only the INSERT statements for a given table, as
we did in the previous commands, but it’s harder to get the CREATE TABLE statement.
Here’s a sed script that extracts the paragraph you need:
$ sed -e '/./{H;$!d;}' -e 'x;/CREATE TABLE `actor`/!d;q' sakila-backup.sql
We disable normal connections by adding the following to the server’s my.cnf file
while we work:
skip-networking
socket=/tmp/mysql_recover.sock
Now it’s safe to start the server:
server1# /etc/init.d/mysql start
The next task is to find which statements in the binary log we want to replay, and
which we want to skip. As it happens, the server has created only one binary log
since the backup at midnight. We can examine this file with grep and find the
offending statement:
server1# mysqlbinlog –database=sakila /var/log/mysql/mysql-bin.000215 | grep -B
3 -i 'drop table sakila.payment'
# at 352
#070919 16:11:23 server id 1 end_log_pos 429 Query thread_id=16 exec_time=0
error_code=0
SET TIMESTAMP=1190232683/*!*/;
DROP TABLE sakila.payment/*!*/;
The statement we want to skip is at position 352 in the log file, and the next state-
ment is at position 429. We can replay the log up to position 352, and then from 429
on, with the following commands:
server1# mysqlbinlog –database=sakila /var/log/mysql/mysql-bin.000215
–stop-position=352 | mysql -uroot -p
server1# mysqlbinlog –database=sakila /var/log/mysql/mysql-bin.000215
–start-position=429 | mysql -uroot -p
mysqldump
$ mysqldump –host=server1 test t1 | mysql –host=server2 test
To make a logical backup of everything on a server to a single file:
$ mysqldump –all-databases > dump.sql
To make a logical backup of only the Sakila sample database:
$ mysqldump –databases sakila > dump.sql
To make a logical backup of only the sakila.actor table:
$ mysqldump sakila actor > dump.sql
You can use the –result-file option to specify an output file, which can help prevent
newline conversion on Windows:
$ mysqldump sakila actor –result-file=dump.sql
–opt
Enables a group of options that disable buffering (which could make your server
run out of memory), write more data into fewer SQL statements in the dump, so
they’ll load more efficiently, and do some other useful things. Read your ver-
sion’s help for the details. If you disable this group of options, mysqldump will
store each table you dump in its memory before writing it to the disk, which is
impractical for large tables.
–allow-keywords, –quote-names
Make it possible to dump and restore tables that use reserved words as names.
–complete-insert
Makes it possible to move data between tables that don’t have identical
columns.
–tz-utc
Makes it possible to move data between servers in different time zones.
–lock-all-tables
Uses FLUSH TABLES WITH READ LOCK to get a globally consistent backup.
–tab
Dumps files with SELECT INTO OUTFILE, which is very fast to dump and to restore.
–skip-extended-insert
Causes each row of data to have its own INSERT statement. This can help you
selectively restore certain rows if necessary. The cost is larger files that are more
expensive to import into MySQL; you should enable this only if you need it.
Zmanda Recovery Manager
Zmanda Recovery Manager for MySQL, or ZRM (http://www.zmanda.com), is the
most comprehensive of the backup and recovery tools we mention here. It comes in
both free (GPL) and commercial versions.
Install at Ubuntu
sudo apt-get install mailx
download file http://www.zmanda.com/downloads/community/ZRM-MySQL/2.1/Debian/mysql-rm_2.1_all.deb
Install it
sudo dpkg -i mysql-zrm_2.1_all.deb
Watch demo for configuration
http://www.zmanda.com/ZRM-3-Demo/ZRM-3-Demo.html
Quick start
http://www.zmanda.com/quick-mysql-backup.html