Wednesday, April 20, 2011

DB Sharding

This is not new and its in practice for close to a decade. Goal of this posting is to share how we implemented it in Amazon cloud.

When we started development we evaluated NOSQL databases before using DB Sharding. Due to lack of internal expertise, time and resource constraints we decided to go with traditional approach of sharding. We did move to Cassandra in iteration 2 and threw away all the hard work of iteration 1.

We are an Ad Network, currently serving daily 10-20 million ad impressions  for 2-3 million unique users.  We have no in-house DBA and wanted our db to be elastically scalable with minimum administrative efforts. Amazon RDS is a wonderful service if you are looking for minimal upfront cost, simplified DB management, zero DB software license cost and scalability.

Firstly,  we came up with a logic on how we create  cookie id's (primary key used for sharding) based on the region, number of RDS servers in the region and number of databases in each server.  We made concrete decision to sacrifice certain user cookie data when they get connected to different region. This will happen only when user travels from coast to coast.

Secondly, based on our privacy policy cookie data is retained only for 6 months. We decided to use it to our advantage and used MySQL table partition to store the cookie created in a given month in one physical file. We had a maintenance job that runs every month to reset the data file for six month old data on 8th month. Viola! Millions of records that are not needed are deleted in seconds  like magic without any costly delete or truncate operation.

Thirdly, we implemented our own Zend_DB_Table class that took care of connecting to correct RDS server, correct DB within the server based on cookie id depending on read or write operation.

Fourth, we used Multi-AZ RDS deployment to get the built-in failover protection. We never tried to see how it works and were fortunate and lucky enough quickly switched to Cassandra for OLTP.

Fifth, we had two write master RDS instances (one in each region) and two read replica's (one in each region) and each write RDS had 8 instances of our cookie database. This really helped us in scale and performance. All the reads and writes were screaming fast and never experienced dead-locks/performance issues.

With this solution we managed our explosive DB growth easily with no DBA and zero administrative costs. So far I have shared only good things about Amazon MySQL RDS and  will share my negative thoughts/limitations when I blog about our iteration 2 architecture.

Next post thinking about sharing our experience with Amazon cloud in general.

Hope you enjoyed it!


No comments:

Post a Comment