Disaster Recovery of Azure VM – Step by step configuration guide

If you are an old-school infrastructure management techie, you have probably been part of many DR exercises in your various job roles. If you are a pre-sales techie, you may have tried in many pre-sales discussions to convince customers that the Microsoft Azure environment is so reliable and highly available that they don’t need to set up a DR environment. That is hard for customers to accept because of compliance needs: requirements such as ISO 27001 still call for a provable disaster recovery solution as part of a business continuity plan (BCP). For a long time, questions about setting up a DR site in another Azure region had no concrete answer, until May 2017, when Microsoft released Disaster Recovery (Preview) for Azure VMs. However, I would say this functionality is still not complete, since there is no support for managed disks.


Edit: Managed disks are now fully supported in Azure Site Recovery (ASR). Please refer to the article below.

Article for the support of managed disks.

Today we will see how to configure disaster recovery step by step.

Configure Azure VM disaster recovery step by step for VMs with unmanaged disks

I have selected a VM in my lab. It is located in West US 2 and runs Windows Server 2016.

It’s a Windows Server 2016 Datacenter VM; the OS version is shown below.

The next step is to go to the Disaster recovery (Preview) tab, as you can see below.

In the next step you need to configure the disaster recovery for this VM.

Select the resource group under which the replicated VM will be created when the VM is failed over.

Select the virtual network in the target region with which the failed-over VM will be associated.

Select the cache storage account, which is located in the source region. Cache storage accounts are used as a temporary data store before changes are replicated to the target region. By default, one cache storage account is created per vault and reused. You can select a different cache storage account if you want to customize the one used for this VM.

Data replicated from the source VM is stored in replica managed disks in the target region. For each managed disk in the source VM, one replica managed disk is created and used in the target region.

The Recovery Services vault contains the target VM configuration settings and orchestrates replication. In the event of a disruption where your source VM is not available, you can fail over from the Recovery Services vault.

The vault resource group is the resource group of the Recovery Services vault. The replication policy defines the settings for recovery point retention history and app-consistent snapshot frequency.
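To keep the settings above straight before you click through the portal, it can help to write them down as a small checklist. The sketch below is only an illustration in Python: the class, the function and the resource names are all made up for this post and are not part of any Azure SDK. It checks two rules that follow from the descriptions above: the target region differs from the source region, and the cache storage account sits in the source region.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationSettings:
    """Hypothetical record of the choices made on the settings blade."""
    source_region: str
    target_region: str
    target_resource_group: str
    target_virtual_network: str
    cache_storage_region: str
    vault_name: str


def validate(settings: ReplicationSettings) -> list[str]:
    """Return a list of problems; an empty list means the choices are consistent."""
    problems = []
    if settings.source_region == settings.target_region:
        problems.append("target region must differ from the source region")
    if settings.cache_storage_region != settings.source_region:
        problems.append("cache storage account must be in the source region")
    return problems


# Lab values: West US 2 replicating to East US 2.
# The resource names are placeholders, not the ones from my lab.
lab = ReplicationSettings(
    source_region="westus2",
    target_region="eastus2",
    target_resource_group="dr-lab-rg",
    target_virtual_network="dr-lab-vnet",
    cache_storage_region="westus2",
    vault_name="dr-lab-vault",
)
print(validate(lab))  # prints []
```

An empty list means the two rules hold; anything else is a reminder to revisit the corresponding drop-down in the portal.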

The world map below shows the Azure data centers we have chosen for the replication. We have chosen to replicate the VM from West US 2 to East US 2.

The next step is to create the Azure resource.

When you check the progress, you can see that the deployment is under way.

Next, the portal shows replication in progress for the VM.

Since this is part of ASR (Azure Site Recovery), it performs the same jobs that are generally run during a VM migration. You can find the triggered jobs below.

Note: For more details on Azure Site Recovery you can click here.

After some time you can see that the enable replication job has completed.

Replication may take anywhere from 15 minutes to a few hours, depending on the size of the VM.

As you can see below, in my case 98% was completed after 20 minutes.

Since the VM was small, replication completed after 25 minutes.

What is RPO: RPO is the Recovery Point Objective.

Recovery Point Objective (RPO) describes the interval of time that might pass during a disruption before the quantity of data lost during that period exceeds the Business Continuity Plan’s maximum allowable threshold or “tolerance.”

Example: If the last available good copy of data upon an outage is from 16 hours ago, and the RPO for this business is 20 hours, then we are still within the parameters of the Business Continuity Plan’s RPO. In other words, it answers the question: “Up to what point in time could the Business Process’s recovery proceed tolerably given the volume of data lost during that interval?”
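The arithmetic in this example is simple enough to sketch in a few lines of Python. The function names are my own and purely illustrative; the numbers are the 16 and 20 hours from the example.

```python
def within_rpo(hours_since_last_good_copy: float, rpo_hours: float) -> bool:
    """True when the age of the last good copy does not exceed the RPO."""
    return hours_since_last_good_copy <= rpo_hours


def rpo_margin_hours(hours_since_last_good_copy: float, rpo_hours: float) -> float:
    """Hours of headroom left; a negative value means the RPO is breached."""
    return rpo_hours - hours_since_last_good_copy


print(within_rpo(16, 20))        # True: 16 hours old against a 20-hour RPO
print(rpo_margin_hours(16, 20))  # 4
print(within_rpo(24, 20))        # False: a 24-hour-old copy breaches the RPO
```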

Next, I shut down the primary VM to check the RPO status after two days. After two days the RPO was showing 2 days, as you can see below.

There was also an error stating that replication had halted.

After I started the VM again, replication completed, the data on the primary site and the DR site was synced, and the RPO came down.
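One way to picture why the RPO climbed to 2 days and then dropped again is to treat the displayed RPO as the age of the latest recovery point. That reading is my assumption for this sketch, not a statement of how the portal computes the value, and the dates below are made up.

```python
from datetime import datetime, timedelta


def current_rpo(latest_recovery_point: datetime, now: datetime) -> timedelta:
    """Age of the most recent recovery point (assumed meaning of the RPO shown)."""
    return now - latest_recovery_point


# Illustrative timestamps only.
shutdown_time = datetime(2017, 6, 1, 9, 0)      # last recovery point before shutdown
check_time = shutdown_time + timedelta(days=2)  # looking at the portal two days later

# While the VM is off, no new recovery points arrive, so the age keeps growing.
print(current_rpo(shutdown_time, check_time).days)  # 2

# Once the VM is started and replication resumes, a fresh recovery point appears.
fresh_point = check_time + timedelta(minutes=30)
later = fresh_point + timedelta(minutes=5)
print(current_rpo(fresh_point, later))  # 0:05:00
```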

Run a disaster recovery drill for Azure VMs to a secondary Azure region

To test, I decided to run a test failover.

A test failover configuration is shown below.

The test failover took some time, but not very long.

After a few minutes the failover completed successfully.

Now I can see both VMs, one in the primary site and one in the DR site, running in two different Azure regions.

The next step is to clean up the test failover.

You can enter a note, as shown below.

Once you click OK, it starts the task to delete the test failover VM.

After some time the task will be completed as you can see below.

That’s all for today. I hope you liked my post on Azure Disaster Recovery (Preview). I will bring more on BCP and DR on Azure in future posts. For more details on each replication step, you can click here.

Enjoy the rest of your day!