A common problem with large scale web applications

When working on large scale web application, typically hosted on multiple content delivery servers, regardless if it’s due to high load or a requirement for redundancy, there are quite a few pitfalls to handle.

One of them is on-demand, content delivery web server generated content. My most common case are ordinary web images. Images are typically selected by content authors and they don’t know anything about pixels, bandwidth etc. The people responsible of that are the web developers. The allowed image size constraints should be specified in the aspx/ascx/cshtml files. This means that you utilize some sort of function to resize the image on request of that image and you cache the result, typically as a file on the content deliver web server.

Since you’re good internet citizen, you have version control of your generated images as part of the image URL and you use long time client caching. And since you need speed, you use a Content Delivery Network (CDN) as well. Here I assume you use the site as CDN origin, i.e the images aren’t uploaded to a CDN storage.

Now, using multiple content delivery servers behind a load balancer, this usually causes problems. Let’s say you update something that’ll change the image and its URL (the versioning part). During the publishing sequence of your CMS, you’ll have a state where some content delivery servers are updated and some are not. The time may be short, but it’ll be there (*).

Since we’re talking large scale here, you will now have a problem. A visitor to your newly published site will get an updated html page with the new image URL. The client browser will then load the new image, typically through the CDN, and that request eventually goes into the content delivery cluster. Where will it end up? Possibly (or probably if your load balancer considers backend latency) on a server that hasn’t got the new image yet. We’re still in the middle of a publishing process, right. So, depending on your implementation, you’ll serve an old image, generate a new image based on old data, or give a 404. All are bad.

To make things worse, it’ll probably get cached by your CDN, so now all visitors will get an incorrect image.
Continue reading

Amazon CloudFront+Sitecore bugfix

Serving your Sitecore Media items, on public sites, through a Content Delivery Network (CDN) is always good. I like using Amazon CloudFront since it’s really cheap and easy to set up. For large, high volume sites, I’d look at alternatives as well. But since there is no startup cost or fixed monthly fees etc using CloudFront, I think all small and mid size Sitecore setups should leverage from it.

There are quite many tutorials out there on how to configure Sitecore for CloudFront, so I won’t get into that here. Maybe I’ll post something later..

There are a few minor problems with CloudFront though. One is a nasty bug that spoils proper caching.
Continue reading