Regular automated backup of WordPress blog
For my old Uboot blog I had a very simple backup strategy.
The whole blog was available as an RSS feed. (I.e. not just the latest entries: the feed included every article right back to the beginning.) I set up a crontab job on a friendly Linux machine to download that with “wget”, then commit the resulting file to my private Subversion repository.
A similar technique is available with WordPress. It has an “export” feature which allows you to download the entire blog in an “extended” RSS format, including comments etc.
You have to be logged in to the WordPress admin system to use the export. However, “wget” takes a “--load-cookies” parameter, which reads cookies from a “cookies.txt” file. The documentation assumes you’ll be running your browser on the same machine, so it expects a “cookies.txt” file to be available already. I didn’t have one there, but it was a small matter to take the “cookies.txt” file from my Windows machine (where Mozilla stores it under the “Documents and Settings” directory), find the two lines for the cookies “wordpressuser_nnn” and “wordpresspass_nnn” (the “nnn” is a long hex string) and strip out the rest. I copied the result across to the Linux machine with “scp”, and “wget” accepted the file fine.
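If you’d rather not hand-edit the file each time the cookies change, the filtering step can be scripted. This is a minimal sketch, assuming the standard Netscape “cookies.txt” layout (tab-separated lines with the cookie name in the sixth field); the function name “extract_wp_cookies” and the file names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy only the WordPress login cookies
# (wordpressuser_... and wordpresspass_...) from a Netscape-format
# cookies.txt. Assumes tab-separated lines with the cookie name in field 6;
# comment lines such as "# Netscape HTTP Cookie File" are dropped.
extract_wp_cookies() {
  local src=$1 dest=$2
  # Subshell so the restrictive umask doesn't leak to the caller;
  # the output holds a login credential, so keep it owner-readable only.
  (
    umask 077
    awk -F '\t' '$6 ~ /^wordpress(user|pass)_/' "$src" > "$dest"
  )
}
```

For example, “extract_wp_cookies cookies-full.txt cookies.txt” on the Windows side (e.g. under Cygwin) before copying the result across with “scp”.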
A small change to my script now downloads this WordPress export and commits it into my Subversion repository as before.
There was one slight inelegance. Subversion only creates a new revision when the files being committed actually have different content, and I like this feature. This worked fine with the Uboot RSS feed, but the WordPress export includes a comment “generated on <date/time to minute accuracy>”, so every night when the script ran a new revision would be created. No matter: I added a line using “sed” to strip the timestamp out of that comment, committed the result, then did an “svn diff” to check that really only that line had changed.
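To confirm the substitution behaves as intended before relying on it, you can feed it two lines that differ only in their timestamp and check that they come out identical. This is a minimal sketch: the sample comment lines are hypothetical, modeled only on the created="..."--> pattern that the “sed” expression matches, so compare them against your own export.

```shell
#!/usr/bin/env bash
# Same sed expression as the backup script: drop the
# created="YYYY-MM-DD HH:MM" timestamp from the end of the comment.
strip_created() {
  sed 's/created="....-..-.. ..:.."-->/-->/'
}

# Hypothetical comment lines from two nightly downloads.
a='<!-- generator="wordpress/2.3" created="2008-02-07 04:00"-->'
b='<!-- generator="wordpress/2.3" created="2008-02-08 04:00"-->'

# After stripping, both become: <!-- generator="wordpress/2.3" -->
if [ "$(printf '%s\n' "$a" | strip_created)" = "$(printf '%s\n' "$b" | strip_created)" ]; then
  echo "identical after stripping: no new revision"
else
  echo "still different: a new revision would be created"
fi
```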
So now my script looks like:
#!/usr/local/bin/bash
# in crontab:
# 0 4 * * * ~/private-svn/blog/backup/download-backup.sh

cd ~/private-svn/blog/backup || exit 1

export SITE=http://www.databasesandlife.com
export FILE=databasesandlife.wordpress.xml

# download the full export, authenticating with the saved cookies
wget --quiet \
    --output-document "$FILE.tmp" --load-cookies cookies.txt \
    "$SITE/wp-admin/export.php?author=all&download=true&submit=xx" || exit 1

# the created="xx" attribute in a comment causes each download
# to be a new commit; yet I only want actual changes to show
# up in the Subversion revision history
sed 's/created="....-..-.. ..:.."-->/-->/' < "$FILE.tmp" > "$FILE"

svn commit -m '* Automatic blog download from crontab' "$FILE"
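One thing the script doesn’t check is whether the download really is the export. If the cookies expire, WordPress will presumably serve its login page instead, and “wget” would save that HTML, which would then be committed over the previous backup. A minimal guard, assuming a genuine export contains an <rss element and the login page does not (worth confirming against your own files); the function name “looks_like_export” is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical guard: succeed only if FILE is non-empty and contains an
# <rss element, i.e. it looks like the WordPress export rather than an
# HTML login page served after the cookies have expired.
looks_like_export() {
  local file=$1
  [ -s "$file" ] && grep -q '<rss' "$file"
}
```

In the script above, a line such as looks_like_export "$FILE.tmp" || exit 1 just before the “sed” step would abort that night’s run instead of committing a bad file.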
I’d be interested to know if you considered a straight mysqldump?
Yeah I did consider it.
a) I only have phpMyAdmin access to the database. I have to log in afresh each time, even if I just close the browser tab, so I presume that not only cookies but some kind of parameter passing is used for authentication. I didn’t really want to have to discover how that worked and write (perhaps multiple) wget calls to retrieve the data from phpMyAdmin.
b) I felt that exporting “extended RSS” was cleaner, as I could maybe feed it into some other blog system in the future. If this one went down, maybe I’d use the opportunity to change blog software rather than just reinstall the same one, or move to a different version of WordPress. I haven’t determined how standard this “extended RSS” format is, but if it’s reasonably standard then it offers more options than a database dump.
Hi there, I just published an entry on my blog on how to export WordPress data with bash and wget, and thought I’d search around a little to see if anybody else was doing the same thing. (I too use this to check it into Subversion.)
I found a way to actually create the login cookie, so feel free to have a look.
http://itpoetry.wordpress.com/2008/02/07/bash-script-to-export-and-backup-some-data/
BR,
Hugo