Wednesday, April 27, 2011

Cloud Provider - Amazon

The goal here is not to talk about the benefits of the cloud in general, just to share our experiences with Amazon.

Good things first!

Management Interface: Amazon really stands out here with the ease of use of its tools; everything is well packaged, with an excellent API and an easy-to-use AWS Management Console.

AMI: The large catalog of community AMIs lets you quickly launch a VM with the required software pre-installed and configured, with minimal effort. We used both Amazon's free AMIs and third-party AMIs.

Security: The basic integrated firewall lets you easily set up access controls at the IP/port level.

Database: Managed MySQL (RDS) works. Many DB management tasks such as backups, monitoring and replication are handled by Amazon, so we don't have to worry about them.

Functionality: You name it and Amazon has a solution for it. They are a one-stop shop.

Backups: With the help of EBS and S3, backups are made easy.

Now the bad stuff!

1. Support: They have automated way too much, and it lacks the human touch. It takes days, if not weeks, to get a response from their sales support team. You also need to enroll in Gold or Platinum level support to get phone support or a quick turnaround, and that can cost up to 10% of the monthly bill.

2. Performance: I have a lot to complain about here:
  • Instance CPUs lack power compared to the competition. A process that used to take 40-50% CPU on an EC2 large instance runs quietly on a medium instance on our new cloud partner's platform.
  • Inconsistent performance - at times we paid a high price for load issues in Amazon's cloud. We had to keep a close eye on our Keynote monitoring and divert traffic between regions.
  • RDS - a write or read operation takes 80-100 milliseconds on average. With a goal of response times under 100 ms, we had to constantly redesign our DB operations. This pushed us to migrate to Cassandra very quickly.
3. Failures: We had been pretty happy until the recent mega crash in their East DC. Initially we weren't sure whether it was a temporary glitch or a bigger problem, and there was no ETA. Fortunately, we were able to divert our traffic to N. California by updating our geo-load-balancing rules.

I will share our experience with our new cloud provider later.


Wednesday, April 20, 2011

DB Sharding

This is not new; it has been in practice for close to a decade. The goal of this post is to share how we implemented it in the Amazon cloud.

When we started development we evaluated NoSQL databases before choosing DB sharding. Due to a lack of internal expertise, plus time and resource constraints, we decided to go with the traditional approach of sharding. We did move to Cassandra in iteration 2 and threw away all the hard work of iteration 1.

We are an ad network, currently serving 10-20 million ad impressions daily for 2-3 million unique users. We have no in-house DBA and wanted our DB to be elastically scalable with minimal administrative effort. Amazon RDS is a wonderful service if you are looking for minimal upfront cost, simplified DB management, zero DB software license cost and scalability.

First, we came up with the logic for how we create cookie IDs (the primary key used for sharding) based on the region, the number of RDS servers in the region and the number of databases on each server. We made a deliberate decision to sacrifice certain user cookie data when a user gets connected to a different region; this happens only when a user travels from coast to coast.
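As an illustration of that first step, here is a minimal Python sketch of a cookie ID that carries its own shard coordinates. The field layout, region code and helper names are my assumptions for illustration; the post does not show the actual production format:

```python
import uuid

def make_cookie_id(region: str, num_servers: int, num_dbs: int) -> str:
    """Mint a cookie ID that embeds which RDS server and database it maps to."""
    suffix = uuid.uuid4().hex
    # Derive the server and database indices from the random suffix so the
    # ID alone is enough to route any later read or write to the right shard.
    server = int(suffix[:4], 16) % num_servers
    db = int(suffix[4:8], 16) % num_dbs
    return f"{region}-{server}-{db}-{suffix}"

cid = make_cookie_id("use1", num_servers=1, num_dbs=8)
region, server, db, suffix = cid.split("-", 3)
```

Because the coordinates are baked in at creation time, no lookup table is needed later: splitting the ID is enough to find the shard.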

Second, per our privacy policy, cookie data is retained for only 6 months. We decided to use this to our advantage and used MySQL table partitioning to store the cookies created in a given month in one physical file. A maintenance job runs every month and, in month 8, resets the data file holding the six-month-old data. Voila! Millions of records that are no longer needed are deleted in seconds, as if by magic, without any costly delete or truncate operation.
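A sketch of what that monthly job might compute (the table and partition names are hypothetical; the post doesn't show the actual schema). With MySQL RANGE partitioning by month, dropping a whole partition is a metadata operation, so there is no row-by-row delete:

```python
from datetime import date

def expired_partition(today: date, retention_months: int = 6) -> str:
    """Name of the monthly partition that has aged out (p_YYYYMM, hypothetical)."""
    # A cookie written in month 1 is dropped by the job that runs in month 8,
    # i.e. retention_months + 1 calendar months later.
    months = today.year * 12 + (today.month - 1) - (retention_months + 1)
    return f"p_{months // 12:04d}{months % 12 + 1:02d}"

def drop_statement(today: date) -> str:
    # DROP PARTITION removes the physical file for that month in one shot,
    # without scanning or deleting individual rows.
    return f"ALTER TABLE cookies DROP PARTITION {expired_partition(today)}"
```

For example, the job running in August 2011 would emit `ALTER TABLE cookies DROP PARTITION p_201101`, discarding January's cookies in one step.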

Third, we implemented our own Zend_DB_Table class that took care of connecting to the correct RDS server, and the correct database within that server, based on the cookie ID and whether the operation was a read or a write.
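Sketched in Python rather than PHP/Zend, the routing decision boils down to a few lines. The hostname scheme and database naming here are invented for illustration, and the sketch assumes a cookie ID of the form region-server-db-suffix:

```python
def route(cookie_id: str, write: bool) -> tuple[str, str]:
    """Return (host, database) for the shard that owns this cookie."""
    region, server, db, _ = cookie_id.split("-", 3)
    # Writes go to the region's master; reads go to a replica.
    role = "master" if write else "replica"
    host = f"{role}{server}.{region}.example.com"  # hypothetical naming
    return host, f"cookie_db_{db}"
```

Keeping this logic in one data-access class means application code never needs to know which physical server a cookie lives on.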

Fourth, we used a Multi-AZ RDS deployment to get the built-in failover protection. We never got to see it in action; we were fortunate, and quickly switched to Cassandra for OLTP.

Fifth, we had two write-master RDS instances (one in each region) and two read replicas (one in each region), and each write master hosted 8 instances of our cookie database. This really helped our scale and performance. All reads and writes were screaming fast and we never experienced deadlocks or performance issues.

With this solution we managed our explosive DB growth easily, with no DBA and near-zero administrative cost. So far I have shared only the good things about Amazon MySQL RDS; I will share my negative thoughts and its limitations when I blog about our iteration 2 architecture.

For the next post I am thinking of sharing our experience with the Amazon cloud in general.

Hope you enjoyed it!


Tuesday, April 19, 2011

Geo-Load Balancing

During my learning years at my previous job, we did launch a secondary DC, but it ran active/passive rather than active/active. We were worried about DB replication, latency and implementing sharding for transactional systems. In the end we had solid ideas for DB sharding but couldn't implement them for various reasons.

At my current job, I was lucky enough to start everything from scratch. For our first iteration we designed for active/active using Amazon EC2, ELB, MySQL RDS and Dyn Inc's Dynect CDN Manager. We migrated completely to a different architecture in iteration two; more about that later.

Our goal was to do all our ad-optimization processing and hand the hints to our ad server within 100 ms. To do this we had to reduce latency as much as possible, so we decided to take advantage of cloud solutions and geo load balancing.

Amazon Elastic Load Balancer (ELB) supports load balancing only within a region, so we had to look for options to handle geo load balancing. After a lot of research we found that Dyn Inc's solution was very cost-efficient and had products that met our needs. We used Dynect CDN Manager because ELB doesn't provide a static IP address. We defined rules in the CDN to send 100% of our west-coast traffic to the Amazon US-WEST ELB and 100% of east-coast traffic to the US-EAST ELB, and we split the rest of the traffic between both regions.
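The routing policy itself is just a small rule table. The split is the one described above, but the target names and this Python representation are my own sketch, not Dynect's actual configuration format:

```python
import random

# Hypothetical target names for the two regional ELBs.
ELB = {"west": "us-west-elb", "east": "us-east-elb"}

def pick_target(client_geo: str) -> str:
    """Mimic the CDN rules: pin coastal traffic, split everything else."""
    if client_geo == "west":
        return ELB["west"]    # 100% of west-coast traffic
    if client_geo == "east":
        return ELB["east"]    # 100% of east-coast traffic
    return random.choice(list(ELB.values()))  # everyone else is split 50/50
```

Failing a whole region out during an outage then amounts to a one-line change to this rule table, which is how we diverted traffic during the US-EAST incident.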

Behind the Amazon ELBs we took advantage of multiple zones in each region, purely for DR rather than for geo load balancing. We had an equal number of web servers in all zones, used a central RDS instance in each region for DB writes, and read replicas in each zone for reads. It looks easy when you read it, but it was an interesting and tough journey for my team, who had to build and implement the entire solution in 3 months.

Note: The support we get from Dyn Inc is really good; I sometimes wonder whether our sales contact John D'Amato is a sales or a support person.

I am thinking of making the next topic either cloud providers or DB sharding. As I mentioned earlier, though we started with DB sharding, we moved to big data.

Hope you enjoyed my first post!


Me too.....

After a lot of thought I have started blogging. To start with, I am planning to share my experiences in building elastically scalable systems. My goal is to blog about my professional experience rather than my personal experiences.

First thing first:

I have a habit of naming a product before starting it. At my current employer, we named our products mInstinct, mShare, mAudience, etc. So I thought of naming this mStory and getting the url as But for now it's going to be

Next Topic:
Geo-Load Balancing