In this tutorial, I will hopefully make you look at release management in a new light 💡. There are a million and one different ways to ship code into production. Searching online for the best way to release code turns up lots of articles AND lots of opinions 😕. Over the years, I've made hundreds of code deployments using a very wide range of technology stacks, including C#, React and JAMStack. One technique that I have personally found works really well is the tried and trusted canary release process 🐦🐦🐦. If you are unsure what a canary release strategy is, or how you can easily start adopting this pattern in your organisation, this is the tutorial for you 🔥🔥🔥
Before talking about a canary release, let us first consider some of the pain points around releasing code using the 'normal' pattern:
- If you release new code that contains a bug to all of your customers, you will impact everyone
- In the event of an error, the whole release needs to be reverted
- Releasing code can lead to instability and performance bottlenecks within the platform for a period of time
- It is often hard to know exactly how the site is configured
- Reverting a whole release due to one critical bug can mean losing new features that are giving the customer some benefit
- It is hard to measure the impact of your release. How can you tell if the code is giving you the desired outcome you predicted?
Ask any organisation that manages its own digital platform and, in every single instance, you will hear horror stories about releases that have gone wrong at some point in its history. Shipping bad code into production is inevitable. After dealing with a number of releases that have gone tits-up, many companies logically start to add stricter processes and governance around their release management. This approach might help; however, it has two massive drawbacks:
- It slows down every single release
- It only mitigates the risk of bad code being pushed into production; it does not stop it
Trying to add some governance around your releases is great; however, at some point a critical bug will be pushed into production and you are left with the same issues as before. You need to revert an entire release and all of your customers are impacted. This is where having a release management strategy, like a canary release, comes into its own.
What Is A Canary Release?
From 1911 to 1986, it was standard practice for coal miners to bring two canaries down into the pit with them. As canaries are far more sensitive than humans to carbon monoxide and other toxic gases, a distressed (or dead) canary would provide an early warning signal to the miners that the air was not quite right. A dead canary was a signal for the miners to leave the mine straight away.
The term canary release was coined at some point to describe a safer release process that contains an early warning signal. The term is often mixed up with the similar concept of blue/green deployments. In a canary release, instead of releasing your code to your entire customer base in one big bang, the release is made visible to only a small fraction of users. After the release, the stability of the platform is measured for a period of time. If, during this monitoring period, you notice that the release is causing a P1 bug, two important things follow. First, the impact of the bug will be smaller. Second, the canary can be quickly disabled. How this is done can vary. The two main ways are cutting traffic to the server that contains the new changes, or using a feature flag. Doing all the DevOps work to revert a server can be complex and expensive, not to mention the monitoring software. Using a feature flag and turning off a feature in production is usually a lot simpler to implement. If we can agree feature flags are good, the next question is: what's the best strategy for adding them to our projects?
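To make the "small fraction of users" idea concrete, here is a minimal sketch of how a user could be placed in or out of a canary group. The function names (bucketFor, isInCanary) and the choice of an FNV-1a-style hash are my own assumptions for illustration, not how any particular tool does it. The important property is that the same user always lands in the same bucket, so they do not flip between the old and new experience on every request.

```typescript
// Hypothetical helper: map a user ID to a stable bucket in the range 0..99.
// Uses an FNV-1a-style 32-bit hash over the string's UTF-16 code units,
// chosen here only because it is short and deterministic.
function bucketFor(userId: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < userId.length; i++) {
    hash ^= userId.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // multiply by the FNV prime in 32-bit space
  }
  // ">>> 0" converts the signed 32-bit result to an unsigned integer.
  return (hash >>> 0) % 100;
}

// Hypothetical helper: a user is in the canary group when their bucket
// falls below the configured percentage (0 = nobody, 100 = everybody).
function isInCanary(userId: string, canaryPercent: number): boolean {
  return bucketFor(userId) < canaryPercent;
}
```

Because the bucket is fixed per user, raising the percentage only ever adds users to the canary group; nobody who already has the feature loses it.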
The Power Of The Feature Flag
A feature flag is a simple boolean switch that can be called at runtime to decide if a feature should be enabled or not. The code to create a feature flag can be pretty simple: an if statement that checks whether a configuration value is true or false. You might be reading this and thinking, big deal, I can build my own feature flagging solution. It's simple, right? The answer is not clear-cut. It might be yes and no.
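As a rough illustration of that "simple if statement" version, here is a minimal sketch. The flagConfig object, the newCheckout flag name and both helper functions are hypothetical, and in a real project the values would be loaded from wherever you keep your settings.

```typescript
// Hypothetical flag configuration. Hard-coded here to keep the sketch
// self-contained; a real project would load this from a file or a service.
const flagConfig: Record<string, boolean> = {
  newCheckout: false,
};

// Returns true only when the flag exists and is explicitly set to true.
// Unknown flags default to off, so a typo cannot switch a feature on.
function isEnabled(flag: string): boolean {
  return flagConfig[flag] === true;
}

// The feature flag itself: a plain if statement around the new code path.
function renderCheckout(): string {
  if (isEnabled("newCheckout")) {
    return "new checkout";
  }
  return "old checkout";
}
```

Flipping flagConfig.newCheckout to true switches renderCheckout() to the new code path without touching the function itself.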
If you have a single website, in a single environment, with one feature flag, then yes. If you work in a large company with many teams, multiple tech stacks, multiple websites and multiple mobile apps, creating a bespoke feature flagging solution that works well will take time and effort. Not getting it right will also likely become a big pain point at some point in the future.
The main complexities around building a bespoke feature flagging solution for your company are the interoperability of the tech stacks you use and the actual management of the configuration. In order to turn the flag on and off, you will need to add some configuration, somewhere. Managing configuration can be complex. I've implemented a number of architectures for managing application configuration over the years. These have ranged from storing config within a CI/CD tool, using Mongo or Redis to access settings, storing settings within the CMS, and the classic... adding configuration settings in the application itself.
If we are honest, I'm guessing most of us still have lots of configuration lurking in a settings file somewhere within our project, such as package.json or a cheeky custom .env file. I will be the first to admit that when I am building a new project, I still do this all the time 😕. Why? When I am developing a feature locally and I'm simply trying to get it working, it is often quicker and easier to add the setting to a local config file. In the rush to hit the sprint deadlines, the setting quietly becomes part of the application. If so many applications are built this way, is it a big deal?
The typical challenges of managing configuration within code can be boiled down to these main pain points:
- Only developers can access configuration
- Sharing the same configuration in real-time between your app and website can be difficult
- When things go wrong, the configuration hunting game gets old very quickly. Having to search in multiple repositories (website, app, API, etc.) to try and find the relevant config is a waste of time
- Merge conflicts and build transforms are always a source of configuration bugs.
- It is often hard to know exactly what config settings will be deployed without building and deploying first
- A release is required to change a feature in production
Dynamic configuration can solve a lot of the process problems listed above, so how do you get started?
The nice thing about using a tool is that you will also get additional benefits:
- Want to do targeted releases of new features, e.g. canary releases ✔️
- Want to use experimentation to capture data and prove that your customers are using the feature how you expect ✔️
- Want an easy way to pass dynamic configuration into your features that can be managed by non-developers ✔️
- Want a way to manage the same flag across devices in real-time without having to do anything ✔️
This is where a tool like Optimizely Fullstack could help you. Fullstack is an off-the-shelf tool that can get you up and running with feature flagging in less than an hour. You can implement flags using any programming language and you will have a central area to manage your flags. Even if you are a small team, a feature flagging tool will be quicker and easier than building something custom. All you need is an account and some knowledge to get started.
Going back to this post's original point, on any feature flag you create within Optimizely Fullstack, you can also specify what percentage of your customer base will see a feature.
Using Fullstack makes it very easy to start incorporating canary releases into your release process:
- Create a new feature and wrap it in a feature flag
- Release the code with the feature turned off
- Enable the feature and only allow 20% of your audience to see it
- If things are bad, hit the feature kill switch to disable it
- If things are good, ramp up traffic allocation until you hit 100%
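The steps above can be sketched as a tiny state transition. This is a vendor-neutral illustration, not the Fullstack API: the RolloutState shape, the nextRolloutState function and the intermediate 50% step are all assumptions of mine (the steps only mention starting at 20% and finishing at 100%). In practice you would make these changes in the tool's dashboard rather than in code.

```typescript
// Assumed ramp schedule: 20% and 100% come from the steps above,
// the 50% step in between is purely illustrative.
const RAMP_STEPS = [20, 50, 100] as const;

// Hypothetical shape for the state of one flag's rollout.
interface RolloutState {
  enabled: boolean;
  trafficPercent: number;
}

// Decide the next rollout state after a monitoring period.
// "healthy" stands in for whatever signal you trust (error rate, P1 bugs, etc.).
function nextRolloutState(current: RolloutState, healthy: boolean): RolloutState {
  if (!healthy) {
    // Kill switch: turn the feature off for everyone.
    return { enabled: false, trafficPercent: 0 };
  }
  if (!current.enabled) {
    // First enablement: start with the smallest slice of traffic.
    return { enabled: true, trafficPercent: RAMP_STEPS[0] };
  }
  // Otherwise move to the next step above the current allocation, capped at 100%.
  const next = RAMP_STEPS.find((step) => step > current.trafficPercent);
  return { enabled: true, trafficPercent: next ?? 100 };
}
```

Starting from a released-but-disabled flag, three healthy monitoring periods take the allocation through 20%, 50% and 100%, while a single unhealthy one drops it straight back to zero.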
Out of the box, traffic allocation will be based on all of your visitors. What happens if you want to add some personalisation to your site by only allowing certain customer segments to see your feature? This is also possible by creating custom audiences/segments.
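Conceptually, an audience is just a set of conditions that a visitor's attributes have to satisfy. The sketch below shows that idea in its simplest form, using exact-match conditions only. The Attributes and Audience types, the matchesAudience function and the example attribute names are all hypothetical; a real tool will support richer rules than this.

```typescript
// Hypothetical visitor attributes, e.g. { country: "UK", plan: "premium" }.
type Attributes = Record<string, string | number | boolean>;

// Hypothetical audience definition: every condition must match exactly.
interface Audience {
  name: string;
  conditions: Attributes;
}

// A visitor belongs to the audience only if all conditions match their attributes.
function matchesAudience(user: Attributes, audience: Audience): boolean {
  return Object.keys(audience.conditions).every(
    (key) => user[key] === audience.conditions[key]
  );
}

// Example audience: premium customers in the UK.
const ukPremium: Audience = {
  name: "UK premium customers",
  conditions: { country: "UK", plan: "premium" },
};
```

A feature would then be shown only when the visitor matches the audience and also falls inside the traffic allocation for the flag.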
Another nice feature is that you can define different environments. This means you can have different configuration values per environment. Changing which environment your code calls is done by passing in a different access token. If you have a development, staging and production server, you can simply set up three environments in the Fullstack portal. As long as you use the correct access token in each environment, you have a simple way to test that your feature flags are working anywhere in your stack.
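One way to keep the token-per-environment idea tidy in your own code is a small lookup that fails loudly when the environment name is wrong. This is only a sketch: the token values are placeholders, the accessTokenFor function is hypothetical, and in a real project the tokens would come from your deployment settings rather than being hard-coded.

```typescript
type Environment = "development" | "staging" | "production";

// Placeholder tokens for illustration only; real tokens come from the
// portal and should be injected by your deployment, not committed to code.
const ACCESS_TOKENS: Record<Environment, string> = {
  development: "dev-token-placeholder",
  staging: "staging-token-placeholder",
  production: "production-token-placeholder",
};

// Hypothetical helper: resolve the token for the current environment,
// throwing instead of silently falling back to the wrong environment.
function accessTokenFor(envName: string): string {
  if (!Object.prototype.hasOwnProperty.call(ACCESS_TOKENS, envName)) {
    throw new Error("Unknown environment: " + envName);
  }
  return ACCESS_TOKENS[envName as Environment];
}
```

Throwing on an unknown name is a deliberate choice here: defaulting to production flags on a staging server (or the other way round) is exactly the kind of configuration bug that is painful to track down.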
If you want to learn more about Fullstack, check out my YouTube channel (video released in a few weeks). I have created a video that will show you how you can get up and running using a Next.js application in under 15 minutes 🔥🔥🔥
Happy Coding 🤘