Serving Content on S3

Feb 20, 2008 – 09:51 by Ryan

Amazon WS

After the recent Amazon S3 outage, we decided to investigate why people use S3 to serve content. On the surface it seems like a good deal, but once you look at the bandwidth costs, geographic locations, reporting and fault tolerance, you’ll see it may not be the best deal out there for serving content.

Billing Models

There are two common ways to pay for bandwidth: by sustained rate (per Mbps) or by bytes transferred, which is how S3 bills. If your application constantly serves 1 Mbps, you will transfer roughly 321 GB per month. That figure does not include inbound traffic, which Amazon also charges for. In reality, applications don’t serve a constant amount of data. Bandwidth consumption fluctuates, and the peak-to-mean ratio for a site is usually around 2:1.

One important aspect of paying by Mbps is that you are typically billed on the 95th percentile. This means that each month the top 5% of your bandwidth samples are thrown out before the bill is calculated, so short bursts cost you nothing. With a bytes-transferred model you pay for everything, in and out. As network systems become more efficient, it is likely that everything will move to a bytes-transferred model, but for the next few years, take advantage of 95th percentile billing!
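To make the two billing models concrete, here is a minimal Python sketch. The function names are ours, and so are the assumptions: the ~321 GB figure works out if 1 Mbps means 2^20 bits per second, 1 GB means 2^30 bytes and a month is 365.25 / 12 days, and the 95th percentile is taken by sorting evenly spaced rate samples and discarding the top 5% (providers differ on sample interval and rounding).

```python
SECONDS_PER_MONTH = 365.25 / 12 * 86400  # assumed average month: 2,629,800 s


def gb_per_month(mbps, binary=True):
    """Volume transferred in one month at a constant rate of `mbps`.

    binary=True assumes 1 Mbps = 2**20 bit/s and 1 GB = 2**30 bytes,
    which reproduces the ~321 GB figure quoted above.
    binary=False uses decimal units (10**6 bit/s, 10**9 bytes).
    """
    if binary:
        return mbps * 2**20 / 8 * SECONDS_PER_MONTH / 2**30
    return mbps * 10**6 / 8 * SECONDS_PER_MONTH / 10**9


def percentile_95(samples_mbps):
    """Billable rate under 95th percentile billing.

    Sort the rate samples, throw out the top 5% (rounded down),
    and bill at the highest sample that remains.
    """
    if not samples_mbps:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_mbps)
    discarded = len(ordered) // 20  # top 5% of the samples
    return ordered[len(ordered) - discarded - 1]


if __name__ == "__main__":
    print(round(gb_per_month(1)))  # about 321 GB for a constant 1 Mbps

    # Hypothetical month: mostly 1 Mbps, with bursts to 10 Mbps in 5% of samples.
    samples = [1.0] * 95 + [10.0] * 5
    mean_rate = sum(samples) / len(samples)
    print("billed per Mbps on:", percentile_95(samples), "Mbps")
    print("billed per byte on:", round(gb_per_month(mean_rate)), "GB")
```

In the hypothetical month, the bursts are discarded entirely under 95th percentile billing, while the bytes-transferred bill is computed from the mean rate and so includes them.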

Costs

Now, let’s look at the costs associated with serving content from S3. At S3’s starting rate, you’re paying approximately $58 per Mbps each month. This is a decent price per Mbps but it’s not the best out there. Once you reach S3’s top tier, you’re paying around $41 per Mbps. Of course, to get to this level you must be serving 50+ TB of data per month, or a consistent 160 Mbps. The real problem is that once you’re in the top tier, you can buy that much bandwidth yourself for a lot less. If you use a modern CDN, you’re going to pay somewhere in the same price range as S3, but you get geographic distribution, advanced reporting and fault tolerance as well.
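The per-Mbps prices above can be converted back to per-GB terms with a little arithmetic. The sketch below uses only the numbers quoted in this post (about 321 GB per Mbps per month, $58 and $41 per Mbps, a 50 TB threshold). The function names are ours, the per-GB rates it prints are implied by those numbers rather than taken from Amazon’s price list, and 1 TB = 1024 GB is an assumption.

```python
GB_PER_MBPS_MONTH = 321.0  # from the post: 1 Mbps sustained is ~321 GB/month


def implied_rate_per_gb(dollars_per_mbps):
    """Per-GB price implied by a monthly per-Mbps price."""
    return dollars_per_mbps / GB_PER_MBPS_MONTH


def sustained_mbps_for(tb_per_month, gb_per_tb=1024):
    """Constant rate needed to transfer `tb_per_month` in a month.

    gb_per_tb=1024 is an assumption (binary units).
    """
    return tb_per_month * gb_per_tb / GB_PER_MBPS_MONTH


if __name__ == "__main__":
    print(f"starting tier: ${implied_rate_per_gb(58):.3f} per GB")
    print(f"top tier:      ${implied_rate_per_gb(41):.3f} per GB")
    print(f"50 TB/month is about {sustained_mbps_for(50):.0f} Mbps sustained")
```

Running the same conversion on a quote from a bandwidth provider or CDN puts all three options in the same unit before you compare them.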

Content Delivery Network

Amazon really isn’t a CDN. They serve content out of a couple of geographic locations. This may work well if you’re physically near one of the data centers, but for the rest of the world it’s not adequate. Also, as the recent outage confirmed, they’re not able to redistribute load in the event of a failure. As an aside, it looks like Google isn’t the only Internet company using a two-level name system to host content; Amazon appears to be using this approach as well.

Review

We think S3 is a good system for archiving data. Amazon provides a trusted remote data center where you can store your content at a reasonable price. Would we use it to serve content in real time? No. Would we use S3 as an origin server for a CDN? Yes!
