AVD Disaster Recovery

  1. Introduction
    1. Planning Considerations
      1. Region Considerations
    2. Components
      1. Identity Provider (Active Directory)
      2. Networking
      3. VM Image (Golden Image)
      4. Storage and Profile Data
      5. Session Host VMs
        1. VM Replication using Azure Site Recovery
        2. Pre-build Hosts into DR Region
        3. Build Hosts into DR Region when needed
    3. Single Host Pool or Multiple?
    4. Overview

Introduction

When it comes to the wonderful world of AVD there are some questions that come up from customers quite often. One such question is:

“How do I handle Business Continuity and Disaster Recovery of the platform?”

I recently wrote a blog on the use of FSLogix Cloud Cache to enable a highly available profile solution that could be used to provide basic DR in the event of storage access issues. Obviously, though, this only covered the profile (storage) aspect of the solution. What about the Host Pools and Session Hosts themselves?

The purpose of this blog is to advise on the various DR setups that can be configured to protect your AVD environment in the event of a regional/service outage.


Planning Considerations

In order to implement an effective DR plan we need to review the current platform design and the existing DR processes to ensure the new plan meets all business requirements.

When looking at a DR plan it is advised to concentrate on a few key areas.

Identify Critical Services
Which services/functions are critical to the service operating? In other words, which services will need to be protected/replicated?
In our case, we need to identify which parts of the AVD platform are critical to usage so we can include these components in our DR plan.

Impact Assessment for Loss of Service
What would be the business impact of any of these critical services being offline?
We need to understand the potential impact if this loss of service were to occur. Would the loss of the platform leave users unable to perform their jobs, or are there potential workarounds available in the event of an issue?

Recovery Objectives (RTO and RPO)
What are the business requirements for RPO (Recovery Point Objective) and RTO (Recovery Time Objective) for DR scenarios?
In the event of a DR issue, what timeframe do we have to restore the platform (RTO)? How much potential data loss is acceptable (RPO)? These are standard DR considerations and will affect how the DR plan is implemented.
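As a rough illustration of how RTO and RPO can be used to shortlist DR approaches, here is a minimal Python sketch. The option names and all of the minute values are placeholders invented for the example, not measurements; you would substitute figures from your own DR testing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrOption:
    """A candidate DR approach with estimated recovery characteristics."""
    name: str
    estimated_recovery_minutes: int   # time until users can work again
    estimated_data_loss_minutes: int  # how much recent data could be lost


def meets_objectives(option: DrOption, rto_minutes: int, rpo_minutes: int) -> bool:
    """True when the option recovers within the RTO and stays within the RPO."""
    return (option.estimated_recovery_minutes <= rto_minutes
            and option.estimated_data_loss_minutes <= rpo_minutes)


# Placeholder estimates for illustration only (hypothetical values).
options = [
    DrOption("option A", 30, 15),
    DrOption("option B", 20, 0),
    DrOption("option C", 180, 0),
]

# Hypothetical business requirement: back online in 60 minutes,
# losing at most 15 minutes of data.
acceptable = [o.name for o in options if meets_objectives(o, rto_minutes=60, rpo_minutes=15)]
print(acceptable)
```

Any option that fails either objective drops off the shortlist, which is usually where the cost conversation starts.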

There are other considerations such as Risk Assessment, Testing Plan, etc., but I am going to concentrate on these main areas for planning purposes.

Region Considerations

One of the main considerations for DR is going to be the location of the DR data. If we have our core AVD environment running in North Europe, we will likely want to pick West Europe for the DR region. Ideally, we want the region to be close to the user entry point.

You will need to be careful when choosing this region to ensure you do not affect the performance of the environment when running in DR.


Components

It is important to recognize which aspects of the AVD environment you are responsible for managing and maintaining as the customer. AVD itself has been configured by Microsoft as a globally provisioned, highly-available service. The core AVD infrastructure itself is managed and maintained by Microsoft.

The table below explains where responsibility lies for each component:

Microsoft-Managed   | Customer-Managed
--------------------|--------------------------------------
Load Balancer       | Identity Provider (Active Directory)
Gateway             | Networking
Connection Broker   | VM Image (Golden Image)
Diagnostics         | Session Host VMs
                    | Storage and Profile Data

In the event of a regional outage, the Microsoft-Managed components will be made available in another region with minimal impact on the customer.

Therefore we don’t need to consider the AVD core infrastructure itself within our DR plan as Microsoft will do its best to ensure the services stay operational.

In order to correctly implement a DR process for our AVD environment we will need to ensure the customer-managed resources are covered by our DR plan.

Next, I will detail the options for each component type and how we can approach DR:


Identity Provider (Active Directory)


DR for Active Directory will very likely be in place already for most organizations. If there are any other services hosted within Azure, it is likely there are already Active Directory Domain Controllers deployed into multiple regions.

Ideally, we will need a Domain Controller in each region that has infrastructure deployed, so that in the event of a primary region failure services can still contact a local Domain Controller.

If there are already multiple Domain Controllers within your Azure tenant spread across multiple regions and you don’t wish to add another specifically for the AVD service, you will just need to ensure the AVD DR VNET has connectivity to an active Domain Controller.


Networking

First things first, you will need a suitable Virtual Network configured in your chosen DR region. This network will be required to host the AVD Session Hosts and to provide connectivity to Storage and, if required, access to corporate data.

Therefore, the VNET must have the relevant peerings or VPN capabilities to access all the required networks.
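One practical point when planning the peerings: Azure will not peer virtual networks whose address spaces overlap, so it is worth checking the DR range against the primary before building anything. The sketch below uses Python's standard `ipaddress` module; the address ranges are hypothetical examples, not values from any real environment.

```python
import ipaddress


def overlapping_ranges(primary_prefixes, dr_prefixes):
    """Return every (primary, dr) pair of address ranges that overlap."""
    clashes = []
    for p in primary_prefixes:
        for d in dr_prefixes:
            if ipaddress.ip_network(p).overlaps(ipaddress.ip_network(d)):
                clashes.append((p, d))
    return clashes


# Hypothetical address spaces for the primary and DR virtual networks.
primary_vnet = ["10.10.0.0/16"]
dr_vnet_ok = ["10.20.0.0/16"]
dr_vnet_bad = ["10.10.128.0/17"]  # sits inside the primary range

print(overlapping_ranges(primary_vnet, dr_vnet_ok))
print(overlapping_ranges(primary_vnet, dr_vnet_bad))
```

An empty result means the two networks can be peered as far as addressing is concerned; any pair returned needs re-addressing first.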



VM Image (Golden Image)

This is something that is not always thought about but I believe is useful to consider for a DR plan.

There may be situations where the DR plan involves a rebuild into a different region rather than a replication.

Therefore, we need to ensure that the VM image that the AVD environment is built from is available in multiple regions. This is where Azure Compute Gallery comes into play.

With Azure Compute Gallery, a VM Image Definition is created (think of this as the image type, e.g. AVDGoldenImage). The definition contains Image Versions that can be replicated to various Azure regions.

Images in an Azure Compute Gallery can also be shared across subscriptions. This gives you the ability to have a deployment script deploy a new AVD environment into the DR region using the standard build image.
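To make the definition/version relationship concrete, here is a small Python model of it. This is only an illustration of the concept, not the Azure SDK; the class names, version numbers, and region lists are all assumptions made for the example. It shows one thing worth checking in a DR plan: the newest Image Version is not necessarily the newest one replicated to the DR region.

```python
from dataclasses import dataclass, field


@dataclass
class ImageVersion:
    version: str                                      # e.g. "1.1.0"
    target_regions: set = field(default_factory=set)  # regions it is replicated to


@dataclass
class ImageDefinition:
    name: str
    versions: list = field(default_factory=list)

    def latest_in_region(self, region):
        """Return the highest version replicated to the region, or None."""
        candidates = [v for v in self.versions if region in v.target_regions]
        if not candidates:
            return None
        # Compare versions numerically so that "1.10.0" sorts above "1.9.0".
        return max(candidates, key=lambda v: tuple(int(p) for p in v.version.split(".")))


# Hypothetical gallery contents.
golden = ImageDefinition("AVDGoldenImage", [
    ImageVersion("1.0.0", {"northeurope", "westeurope"}),
    ImageVersion("1.1.0", {"northeurope"}),  # not yet replicated to the DR region
])

primary = golden.latest_in_region("northeurope")
dr = golden.latest_in_region("westeurope")
print(primary.version, dr.version)
```

In this made-up example a DR rebuild in West Europe would use 1.0.0 rather than 1.1.0, which is the kind of drift a DR test should catch.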


Storage and Profile Data

Handling replication of the Storage and Profile configuration is key to a successful DR plan.

Depending on the storage solution in place this could turn into another full blog with all the possible options. In this case, I am going to focus on FSLogix-provided profiles running on Azure Storage Accounts.

I have written a blog on one of the ways that DR can be achieved using FSLogix Cloud Cache: "Using FSLogix Cloud Cache for DR".

For a lot of customers, this will be suitable.
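As a reminder of what that configuration looks like: Cloud Cache reads its list of storage providers from the `CCDLocations` value under `HKLM\SOFTWARE\FSLogix\Profiles`, with each SMB provider written as `type=smb,connectionString=<share>` and providers separated by semicolons. The Python sketch below simply builds that string; the storage account names and share paths are hypothetical, and how you push the value out (GPO, script, image) is up to you.

```python
def build_ccd_locations(smb_shares):
    """Join SMB share paths into a single CCDLocations string.

    Providers are listed in the order given, separated by semicolons.
    """
    if not smb_shares:
        raise ValueError("at least one share is required")
    return ";".join(f"type=smb,connectionString={share}" for share in smb_shares)


# Hypothetical share paths: one storage account per region.
shares = [
    r"\\stavdprimary.file.core.windows.net\profiles",
    r"\\stavddr.file.core.windows.net\profiles",
]

ccd_locations = build_ccd_locations(shares)
print(ccd_locations)
```

The order of the shares is kept as given, so list the share closest to the Session Hosts first if you want it treated as the first provider.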

As mentioned, other storage architectures will allow a variety of different replication strategies. Using physical storage, you could have storage snapshots replicated between enclosures.

For software-based storage solutions (Windows File Server) you could employ something like DFSR (DFS Replication) to perform replication for high availability. The options will depend on the solution you have deployed.

However, the main thing is to make sure this data is replicated and available for access by the Session Hosts in the DR region.


Session Host VMs

The main component of the AVD environment is going to be the AVD Session Host VMs themselves.

How we deal with this component is key to keeping the AVD environment available in the event of any major issues.

There are a few options you have available to you for this component of the AVD environment. I will talk about 3 possible options below:

VM Replication using Azure Site Recovery

This is the solution that Microsoft would recommend. This involves using Azure Site Recovery to replicate the AVD Session Hosts into a secondary DR region.

In the event of a failure of the primary region, failover of the VMs would be invoked and the replica machines would become active to serve user connections.

When performing this failover the administrator will need to remove sessions from AVD in the primary region before the failover can be actioned.

This strategy is most useful for Personal Session Hosts which store user data on the VM itself. It may not be needed for Pooled Session Hosts as advised below.

Pre-build Hosts into DR Region

If you don’t wish to set up replication of the Session Hosts, you can instead pre-build a number of Session Hosts into the DR region. These Session Hosts will exist within the same Host Pool as the primary Session Hosts.

These DR Session Hosts would remain powered off until needed. This method could potentially save on ASR costs and time to failover the VMs.

This could be used for Pooled Session Hosts where there is no personal data stored on the devices. This method would not be ideal for Personal Session Hosts as any user data would be lost.

Build Hosts into DR Region when needed

Another option is to build the Session Hosts in the DR region as needed. This is useful as it incurs neither replication charges nor storage/compute charges for the DR Session Hosts until they are built.

It would be advised to have this process automated through a Deployment Script or a management tool such as Nerdio or Project Hydra.

All the above options can be used with a single Host Pool configuration. This means there is no need for a specific DR Host Pool, which would require replication of the App Group configuration.

Using a single Host Pool also ensures users have the same login experience in a DR scenario as they would have for day-to-day use.

However, an important consideration is whether there are different DR requirements for different sets of users. If this is the case Microsoft recommends that you have multiple Host Pools with different configurations.
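The guidance in this section can be summarised as a small lookup. The Python sketch below is just a restatement of the three options above keyed by Host Pool type; the function name and the returned labels are made up for the example, and real decisions will also weigh cost and recovery time.

```python
def candidate_strategies(host_pool_type):
    """Map a Host Pool type to the Session Host DR options discussed above."""
    pool = host_pool_type.strip().lower()
    if pool == "personal":
        # User data lives on the VM itself, so the VM needs to be replicated.
        return ["Azure Site Recovery replication"]
    if pool == "pooled":
        # No personal data on the hosts, so they can be rebuilt from the image.
        # ASR would also work here but may not be needed.
        return [
            "Pre-build hosts in DR region",
            "Build hosts in DR region when needed",
        ]
    raise ValueError(f"unknown host pool type: {host_pool_type!r}")


print(candidate_strategies("Personal"))
print(candidate_strategies("Pooled"))
```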


Single Host Pool or Multiple?

One consideration when designing DR for an AVD environment is whether you have a single Host Pool that will contain all Session Hosts, or you deploy the DR Session Hosts into a separate Host Pool configuration.

Obviously, the answer to this question depends entirely on the use case for your environment.

In most cases, extending the existing Host Pool into the DR environment (whether via ASR or deployed VMs) will be the preferred approach. The main advantage is that end users will experience no difference in login experience: they will connect to the same Host Pool and the same Application Group Desktop Session as they would on any other day.

Creating a separate DR Host Pool would require the creation of all Application Groups and configuration with the DR site which then requires the administration of 2 separate AVD Host Pools.

A potential use case for a new Host Pool may be that you wish to limit DR access to a specific subset of users (different from the standard access). In that case, you could have the Application Groups on the DR Host Pool configured with these access requirements.


Overview

Hopefully, this post has shed some light on the various ways to go about DR scenarios for AVD.

Obviously, each environment will have different requirements from a business standpoint, so unfortunately it is not a one-size-fits-all type of thing.

I would advise anyone looking into implementing DR to first look at the business requirements and then tailor the relevant solution around those requirements. For instance, some organizations may have no need for the profile data to replicate if all data is already stored within OneDrive/SharePoint.

Others may want to have additional redundancy on the profile storage by means of Storage Snapshots or backups.

Also, the cost of running ASR for Session Host replications may stop this from being used as an option. Therefore it may be that deploying Session Hosts during the disaster is the chosen solution. In short, there are many ways to provide the DR functionality and each environment will have its own way of doing this.

This post aimed mainly to draw attention to the various components that have to be considered for a DR scenario and to offer some suggestions on how these can be dealt with whilst following Microsoft’s Best Practices.
