Monthly Archives: October 2011

Redis-only migration

The best critique and praise for Heroku is that it’s opinionated: only deploy with Git, only use PostgreSQL. On the whole this is a good thing, because it simplifies the choices that developers have to make. Fewer choices means more time to spend on things that matter.

But at $200/month for a dedicated database, there’s a very real cost with PostgreSQL on Heroku. For a service like “Tweet Marker”:http://tweetmarker.net/ that is currently free, I wanted to save money on hosting and simplify things down to a single database. I might have preferred to streamline to PostgreSQL-only, since it’s rock solid and Heroku’s tools are excellent for managing backups and moving data around, but Resque (which Tweet Marker uses to process background work) is built on Redis. If I wanted to streamline to a single database, it had to be to Redis.

Tweet Marker as essentially an OAuth-wrapped key/value store anyway, so why burden the app with a full relational database? The handful of extra tables I had for tracking users and sessions could also be migrated to Redis. And even though it’s a memory database at heart, it is persistent, so I can use it for real data.

I conducted the transition in stages:

  • First, I was using Redis for stats. Because Redis has an atomic increment command, it’s trivial to use it to keep counters on various metrics: total number of reads/writes per hour, day, or month, broken down by different Twitter clients. This was my first use of Redis outside Resque tasks and it performed great.

  • Next, I moved the marker data. I wrote a script to dump all the data from PostgreSQL to Redis, and at the same time modified Tweet Marker to write data to both locations when requests came in. I let that run for about 2 days to confirm that the data was identical in both places before shutting PostgreSQL down and routing reads to Redis only.

  • Finally, I finished moving some additional tables to Redis so that I could completely shutdown PostgreSQL. Again, simple is better.

All of this was using Heroku’s “Redis To Go”:https://redistogo.com/ add-on. I love it because there’s no configuration; click a couple buttons and you have a server. But as soon as I was running full time on Redis To Go, I started to be concerned that I had given up a great amount of control over my data. While Redis To Go conveniently offers daily backup snapshots, it doesn’t appear to have any commands for restoring data or otherwise getting access to the server and Redis files. And though I could probably set up a separate Redis To Go instance and manually send it a slaveof command to turn it into a slave, not being able to edit the config file meant that if the server was rebooted I would lose the slave until it could be reset. Not good.

I also expected I would quickly outgrow the smaller Redis To Go plans, eventually forcing the monthly pricing up to the same level as I was paying for PostgreSQL, but with less control. I had to run my own database server instead. And because Heroku is all on Amazon EC2, it also had to be on EC2 for the best performance between web server and database.

I went to work provisioning an EC2 instance. Initially it ran as a Redis slave to the master on Redis To Go. After I felt comfortable with EC2 and had automated backups in place, I added a separate instance on Amazon for the master and switched the app over to it. The cost for 2 “micro” instances is about the same as I was paying for Redis To Go, but with much more available memory, hourly backups to S3, and access to the Redis aof log. And my total monthly hosting costs are about half what they were before.

I’m hoping to keep this setup for at least the rest of the year and well into next. As the Tweet Marker user base grows, I’ll bump up to higher-capacity EC2 instances. Heroku continues to be really great, so I don’t expect to change anything there.