One of Umbraco's strong points is that it can run on a variety of hardware configurations with minimal set-up. If your website gets a lot of traffic, or a sudden spike in traffic, then you need to consider how it is going to scale. Out of the box, Umbraco is configured to work on a single server, but with a few configuration tweaks it can be used in a load-balanced environment. There are a few hidden gotchas along the way. In today's post, I'm going to talk about some of the things you need to consider when configuring Umbraco for load-balanced scenarios.
How Should I Set Up Umbraco To Work In A Load-Balanced Environment?
One of the first questions you need to consider when setting up Umbraco in a load-balanced environment is how you will share the common resources. Each node will use the same connection string, so the database part is simple. Files and media are a little bit more complex.
When a content editor uploads some media in Umbraco, it gets stored in the 'Media' folder. Let's say you have two nodes in your server set-up. When a content editor uploads some media, it will be uploaded to one server but not the other. This means that half the time someone visits that page they would see a missing image, which is sub-optimal, to say the least. To solve this dilemma we have two options: automatically copy all the files from one server to the other (replication), or create a shared folder somewhere that both servers use.
With either option, when media gets uploaded into one site, all the other sites in your cluster will be able to see the same data. I've used both approaches before and they both work as well as each other. Really, which approach you take will probably depend on your current environment and budget.
When I've used replication before, I used a program called SugarSync. You set SugarSync up on each server and point it to the folders you want to share (like the media folder). If a change is made in a targeted folder on any of the servers, it will be synced across to the others. The more commonly recommended approach is to use Microsoft's DFSR (Distributed File System Replication). As I haven't personally used it, I can't vouch for it myself. When deploying Umbraco using file replication, the following folder and file should be excluded from the sync. If you don't do this you will see file locking issues:
When I've used replication, I've found that the sync can take anywhere from a few seconds to a few minutes, so there's always a chance someone might see a missing image briefly.
How Do You Make Sure All Umbraco Nodes Are Serving The Same Content?
In a shared environment, you use something like a SAN, NAS, Clustered File Server or Network Share. When I've used this approach I've always used a SAN and it works pretty well, although it does come at a cost.
Regardless of which approach you took, all the websites in the cluster are now using the same files, so our problems are all solved, right? Not quite. The last part of the puzzle is cache invalidation. I've written about this previously in Umbraco Caching Explained For Beginners - What Is the Umbraco.config?. As a quick recap: to improve performance, Umbraco uses an in-memory cache and a file cache to keep information about the site close at hand and reduce the number of calls to the database. However, let's say we have two nodes in the cluster and a content editor updates the site on node A.
Even though the files are now in sync/shared, the in-memory cache on each server will still be different until all of the nodes' caches have been rebuilt. Without some sort of notification, how does a node know when to invalidate its cache? It doesn't, so if a content editor adds a page it may work on one node but 404 on another due to the difference in the memory cache. Now, obviously getting a developer to recycle the app pool every time someone edits some content is not a feasible solution.
Instead, Umbraco has an inbuilt feature that ensures updates are pushed out to all the servers in your cluster. To configure this, we need to make Umbraco aware of all the nodes in the cluster. When Umbraco knows about all the servers, it can invalidate all of the caches whenever an item is published, edited or deleted on any node.
Configuring Umbraco to support load-balanced clusters is probably the easiest part. First, we need to ensure that only one of the nodes is the 'admin' one, i.e. you only allow content editors to access the Umbraco backend from one node. That node should be reachable on a dedicated URL, such as admin.website.com, which points to that node only. Next, block or disable access to /Umbraco/ on the other nodes.
The next step is to tell Umbraco about the other nodes in the cluster. Like a lot of Umbraco config, this lives in 'umbracoSettings.config'. You can find this file within the 'Config' folder in your website's webroot. In here you should find a distributedCall element set to false; you will need to set it to true.
In the distributedCall section, you will need to add the URL or IP address used to reach each node. One key part of this process is ensuring each node has its own unique hostname or IP address, e.g. something like node1.website.com and node2.website.com.
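As a rough sketch, the distributedCall section might end up looking something like the fragment below. The element layout follows the legacy (Umbraco 6/7) settings schema as I understand it. The user value of 0 and the two hostnames are placeholders, so compare it against the file shipped with your own version before relying on it.

```xml
<!-- umbracoSettings.config: sketch only, check against the file shipped with your Umbraco version -->
<distributedCall enable="true">
  <!-- Id of the Umbraco user the cache-refresh calls run as (0 is an assumption here) -->
  <user>0</user>
  <servers>
    <!-- One entry per node, using a hostname or IP that bypasses the load balancer -->
    <server>node1.website.com</server>
    <server>node2.website.com</server>
  </servers>
</distributedCall>
```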
The server names need to be unique names that reference each node. You can't just use the main website URL or IP, as that will route the call through the load balancer and the cache on some nodes might not get invalidated. If you are running in a shared environment, you also need to make sure that each server is set to use its own local cache file and not a shared one. In your web.config there's an option called umbracoContentXMLUseLocalTemp; set this to true so that each server keeps its cache file in its own local temp folder.
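For reference, umbracoContentXMLUseLocalTemp is an appSettings key in web.config. In the Umbraco 7 documentation, true is the value that moves the umbraco.config cache file out of the shared App_Data folder and into each server's local ASP.NET temp folder. Treat the fragment below as a sketch and check the behaviour against your own version.

```xml
<!-- web.config: sketch only, other appSettings entries omitted -->
<appSettings>
  <!-- true asks Umbraco to keep umbraco.config in this server's local temp folder
       instead of the App_Data folder that every node shares -->
  <add key="umbracoContentXMLUseLocalTemp" value="true" />
</appSettings>
```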
How To Configure Your Umbraco Backend In A Load Balanced Environment?
Now that we have the files being shared and the cache invalidating correctly, the last part is to make sure the Umbraco backend works as expected. In a load-balanced environment without any extra configuration, if you try to access the backend you will be flip-flopped between the different nodes on each page request, which can cause unexpected results.
Umbraco's recommended approach is to have a unique hostname for each node, like admin.website.com and node2.website.com. On the node that admin.website.com resolves to, you do nothing; this is the server all content editing is done on. On the server that resolves to node2.website.com, you block the 'Umbraco' path. The rule to do this might look something similar to:
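One way to sketch such a rule, assuming the IIS URL Rewrite module is installed on the node, is shown below. The rule name, the regular expression and the 403 response are my own illustrative choices rather than an official Umbraco rule, so adapt them to your environment.

```xml
<!-- web.config on the non-admin node: sketch only, assumes the IIS URL Rewrite module -->
<system.webServer>
  <rewrite>
    <rules>
      <rule name="Block Umbraco backoffice" stopProcessing="true">
        <!-- Matches /umbraco and anything beneath it (matching is case-insensitive by default) -->
        <match url="^umbraco(/.*)?$" />
        <!-- Caveat: anything else your site serves from under /umbraco, such as
             cache-refresh calls between nodes or front-end controllers, may need
             extra conditions here so that it is not blocked as well -->
        <action type="CustomResponse" statusCode="403"
                statusReason="Forbidden"
                statusDescription="The backoffice is not available on this node" />
      </rule>
    </rules>
  </rewrite>
</system.webServer>
```

If all nodes share one web.config (for example on a SAN), the rule would also need a condition on the requested hostname so that it does not lock editors out of the admin node.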
This approach is recommended to stop the servers from getting out of sync and potentially nasty things happening. The reason you should use this approach, and not something like a load-balancer affinity strategy, is the risk of data corruption if users are using the CMS concurrently. A few examples: two editors save content at the same time; in Umbraco, the last one wins.
In a load-balanced environment there is no guarantee of order, so there's potential for a content item to end up without a parent and you'll get YSODs; you cannot guarantee the state of the system. Another example: one editor could edit a page on server 1 whilst another user has deleted it on server 2.
In this scenario, due to the lack of order, it's possible a page could get saved minus a parent, which will cause the website to blow up. The cache instructions could also arrive in the wrong order, and again you can't guarantee the state of the site. Maybe the issue on a single-page basis doesn't scare you that much. The more likely scenario is that a content editor deletes a page on one node while another editor moves a whole section on the second node. As the two things are happening simultaneously on different servers, you can't guarantee how the cache will end up.
These issues highlight why you shouldn't use multiple servers to edit your Umbraco content. The configuration is not supported, and if you ignore the warning then the stability of your website is at risk when load balancing. If you don't care about this risk because you only have one content editor, then it is possible to run the backend in a load-balanced environment using an affinity strategy.
In an affinity load-balancing strategy, after a content editor has made an initial connection with a server, the load balancer will always assign that user to the same server in the cluster for as long as they keep their browser window open. I've seen this strategy used on low-traffic, rarely edited Umbraco sites and I've never seen an issue. In general, though, I strongly recommend always going with the future-proof recommended approach, which with Umbraco is only allowing content editing on one node.
Another possible approach is to configure the load balancer to favor one server for any www.website.com/umbraco request, but that depends on your environment. If you want more information about how to set this up, I suggest you look at How To Change The Umbraco Backend Url.
In today's guide, I've written in detail about the considerations when setting up a load-balanced Umbraco environment. It's a pretty easy thing to set up, but it's definitely not good practice to just treat each website as a standalone instance; they need to be set up to work in a load-balanced environment. Whether you use file replication or a shared drive solution, like a SAN, is up to you.
Umbraco will work either way, so this is definitely a decision based on your server set-up. If pressed, I would usually recommend a shared resource approach, like a SAN. When you set up Umbraco in a load-balanced environment, the backend should be configured to only ever work from one node. If you ignore this rule then the content editors may take your site down. If you only have one content editor this might be a gamble you want to take; however, setting it up correctly can be done in 10 minutes, so it seems a bit silly not to.