Doing an Azure Assessment: Disk IOPS and Bandwidth Have a High Impact on Correct Sizing

I know that many of you, when doing an Azure assessment, focus mainly on memory, CPU cores, and the size and number of disks to find the right T-shirt size for your Azure VMs. However, the most important part, and one that many assessment tools may ignore, is correct sizing for disk IOPS, latency and throughput. This is critical for most applications. If you get it wrong, you may need to change the VM template at a later stage, which will affect the overall migration cost forecast.

To learn more about the sizing parameters, you can refer to one of my posts on the top 10 pieces of information required for a reasonable sizing estimate of an Azure VM.

People from a storage background are well aware of the terms IOPS, latency and throughput. In the golden days of SAN storage, disk manufacturers generally benchmarked their products by the maximum throughput the disk could deliver.

Throughput is simply the average I/O size multiplied by IOPS, and it is generally measured in MB/s.
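The formula can be written as a small helper. This is a minimal sketch: the function names are hypothetical, the I/O size is assumed to be in bytes, and MBps is taken as 10^6 bytes per second (the convention Azure uses for its disk limits).

```python
def throughput_mbps(iops: float, io_size_bytes: float) -> float:
    """Throughput in MBps (10^6 bytes/s) = IOPS x average I/O size in bytes."""
    return iops * io_size_bytes / 1_000_000


def avg_io_size_bytes(throughput: float, iops: float) -> float:
    """Invert the formula: average I/O size in bytes = throughput (MBps) / IOPS."""
    if iops == 0:
        raise ValueError("IOPS must be non-zero to derive an average I/O size")
    return throughput * 1_000_000 / iops


# 1,000 IOPS of 64 KiB (65,536-byte) I/Os
print(round(throughput_mbps(1000, 65536), 3))  # prints 65.536
```

The same relationship explains why a workload of many small I/Os can hit an IOPS limit long before a bandwidth limit, while large sequential I/Os tend to hit the bandwidth limit first.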

This can be compared with the new bullet train which is going to be launched in India by 2020. A bullet train has a maximum speed it can reach (currently 500 km/h); in the same way, a disk may have a throughput of 500 MB/s, which is the maximum it can deliver.

The next parameter is IOPS, which is very important for correct disk sizing. IOPS means I/O operations per second, that is, the number of read or write operations that can be completed in one second.

Another important parameter is the I/O size, which is the size of the data processed by each I/O operation.

The last, and one of the most important, parameters is latency. Latency is how fast a single I/O request is handled.
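Latency and IOPS are linked. For a steady workload, Little's law from queueing theory says that the average number of outstanding I/Os (the queue depth) equals IOPS multiplied by latency. This relationship is a standard result added here for illustration rather than something measured in this post, and the helper names are hypothetical.

```python
def achievable_iops(queue_depth: float, latency_ms: float) -> float:
    """Little's law: outstanding I/Os = IOPS x latency, so IOPS = queue depth / latency."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return queue_depth / (latency_ms / 1000.0)


def implied_queue_depth(iops: float, latency_ms: float) -> float:
    """Average number of I/Os in flight for a given IOPS and latency."""
    return iops * latency_ms / 1000.0


# One outstanding I/O at 1 ms latency caps out near 1,000 IOPS;
# 32 outstanding I/Os at 2 ms can sustain about 16,000 IOPS.
print(achievable_iops(1, 1.0), achievable_iops(32, 2.0))
```

In practice this means a single-threaded application with one I/O in flight cannot reach a high IOPS figure unless latency is very low, no matter what the disk is rated for.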

What is the best way to get IOPS and throughput information from a Windows Server where the current application is running?

There are multiple ways, and many tools are available on the internet, such as Iometer from Intel and DiskSpd. However, if you are evaluating the disks of a Windows-based system, I recommend always using the Windows performance counters (Perfmon) for your assessment. Perfmon gives you the metrics required for a correct assessment.

To collect the metrics, configure the data collector set so that it captures the right set of counters, and run it during a period when the VM or physical machine sees its highest activity. For example, the metrics for an ERP application can be taken from Monday to Friday, because the application is at its peak during that time, and that period's data should be used to size IOPS and throughput. However, for some database VMs the highest peak can be at the weekend, because the team may run jobs then; if that is the case, you should collect data over the weekend. To find the best period for collecting metrics, contact the application owners.
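When you are not sure which window holds the real peak, one option is to collect data across a full week and compare the weekday and weekend peaks. This is a minimal sketch that assumes the samples have already been loaded as (timestamp, IOPS) pairs; the function name and the sample values are invented for illustration.

```python
from datetime import datetime


def peak_by_period(samples):
    """Split (timestamp, value) samples into weekday and weekend peaks.

    datetime.weekday() returns 0-4 for Monday-Friday and 5-6 for Saturday-Sunday.
    """
    peaks = {"weekday": 0.0, "weekend": 0.0}
    for ts, value in samples:
        period = "weekend" if ts.weekday() >= 5 else "weekday"
        peaks[period] = max(peaks[period], value)
    return peaks


# Hypothetical samples: a busy business day and an overnight weekend batch job.
samples = [
    (datetime(2024, 1, 3, 10, 0), 310.0),   # Wednesday, business hours
    (datetime(2024, 1, 6, 2, 0), 1450.0),   # Saturday, overnight job
]
print(peak_by_period(samples))  # prints {'weekday': 310.0, 'weekend': 1450.0}
```

If the weekend peak is clearly higher, that is the window to size against, which is exactly the database case described above.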

Now let's consider the most important part of this article: how to determine the correct size of an Azure VM. Before we do that, we should find out how Microsoft has sized its VM templates. For this analysis I have taken a couple of on-premises VMs to understand the sizing.

I must say that Microsoft is not consistent in the disk parameters it publishes across templates. However, in most cases Microsoft measures disk throughput using the following two parameters:

  • IOPS (Input/output operations per second)
  • MBps (Bandwidth for disk traffic only, where MBps = 10^6 bytes/sec.)

Please note that IOPS is a count here, while bandwidth is measured in MBps.

As mentioned earlier, we can collect the server's storage data with Windows Perfmon counters. I configured the counters marked in red in a data collector set, which I ran for 24 hours to collect metrics from the server.
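Because the counters marked in the screenshot are not reproduced here, the sketch below assumes the data collector set captured the standard PhysicalDisk(_Total) counters "Disk Transfers/sec" (IOPS) and "Disk Bytes/sec" (bandwidth), and that the .blg log was converted to CSV, for example with `relog <log>.blg -f csv -o <out>.csv`. Matching columns by counter-name suffix is an assumption about the export layout; adjust it if your header differs. The function names are hypothetical.

```python
import csv
import io

# Assumed counters; the host prefix in each CSV header is ignored.
IOPS_COUNTER = r"PhysicalDisk(_Total)\Disk Transfers/sec"
BYTES_COUNTER = r"PhysicalDisk(_Total)\Disk Bytes/sec"


def find_column(header, suffix):
    # Headers look like \\SERVER\PhysicalDisk(_Total)\Disk Transfers/sec,
    # so match on the counter suffix and ignore the host prefix.
    for index, name in enumerate(header):
        if name.endswith(suffix):
            return index
    raise ValueError(f"counter not found in CSV header: {suffix}")


def peak_disk_metrics(csv_text: str) -> dict:
    """Peak IOPS and peak bandwidth (MBps = 10^6 bytes/s) from a perfmon CSV export."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    iops_col = find_column(header, IOPS_COUNTER)
    bytes_col = find_column(header, BYTES_COUNTER)

    peak_iops = 0.0
    peak_bytes = 0.0
    for row in reader:
        try:
            iops = float(row[iops_col])
            bytes_per_sec = float(row[bytes_col])
        except (ValueError, IndexError):
            continue  # perfmon often writes a blank " " value for the first sample
        peak_iops = max(peak_iops, iops)
        peak_bytes = max(peak_bytes, bytes_per_sec)
    return {"peak_iops": peak_iops, "peak_mbps": peak_bytes / 1_000_000}


# A tiny, made-up export loosely following perfmon's CSV layout.
sample = (
    '"(PDH-CSV 4.0) (UTC)(0)",'
    '"\\\\APP01\\PhysicalDisk(_Total)\\Disk Transfers/sec",'
    '"\\\\APP01\\PhysicalDisk(_Total)\\Disk Bytes/sec"\n'
    '"01/03/2024 10:00:00.000"," "," "\n'
    '"01/03/2024 10:00:15.000","120.5","4000000"\n'
    '"01/03/2024 10:00:30.000","310","12000000"\n'
)
print(peak_disk_metrics(sample))  # prints {'peak_iops': 310.0, 'peak_mbps': 12.0}
```

Note that the two peaks may come from different samples; sizing against both independently is the conservative choice.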

To understand this better, let's take three use cases: the first is a low-configuration application server, the second a high-configuration database server, and the third an old ERP server.

In the first example below, I have taken a low-configuration on-premises application server. As you can see in the graph below, I collected the storage (Physical Disk) data for the VM.

Fig: Physical disk IOPS and throughput usage over 24 hrs for the sample on-premises application server.

As you can see in the example above, I collected Perfmon data for 24 hours on a typical business day and plotted IOPS and throughput (disk bandwidth). The maximum IOPS is 310, and in the bandwidth-only graph the maximum bandwidth is around 12 MBps.

Based on these metrics, I can conclude that a VM template that supports 300 to 400 IOPS and more than 12 MBps of bandwidth is suitable for this application. Now let's take a look at the CPU and memory utilization of this server.

For the same VM, CPU and memory usage is shown below.

If you look at CPU utilization, the average is around 40% across the system's two cores, which is roughly the load of a single core, and the average memory utilization is around 2.15 GB. Based on CPU alone we could choose a 1-core VM; however, that will not fit, because the 1-core templates do not offer enough memory.
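The reasoning above can be sketched as two steps: work out how many cores the average load needs, then take the smallest template that satisfies both cores and memory. This is a simplified sketch; the function names are hypothetical, and the template figures in the candidate list (including the 1-core entry) are assumptions for illustration that should be checked against the current Azure size tables.

```python
import math


def required_cores(current_cores: int, avg_cpu_util: float) -> int:
    """Cores needed to carry the average load, rounded up (never below 1)."""
    return max(1, math.ceil(current_cores * avg_cpu_util))


def smallest_fit(templates, cores_needed: int, memory_needed_gb: float):
    """First template (list ordered smallest first) meeting both CPU and memory."""
    for name, cores, memory_gb in templates:
        if cores >= cores_needed and memory_gb >= memory_needed_gb:
            return name
    return None


cores = required_cores(2, 0.40)  # 2 cores at ~40% average -> about 1 core of work
# Hypothetical candidate list (name, cores, memory in GB), smallest first.
candidates = [("Standard_A1", 1, 1.75), ("Standard_A2", 2, 3.5)]
print(cores, smallest_fit(candidates, cores, 2.15))  # prints 1 Standard_A2
```

CPU alone would allow a 1-core template, but memory pushes the choice up to two cores, which matches the conclusion in the text.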

So if you look at the general-purpose A-series VMs, you will find that the Standard_A2 template is suitable for this application.

You may now ask why the table above doesn't show the second storage parameter, the disk bandwidth I mentioned earlier. To get that information, you need to refer to another table here.

So, for VM templates where the storage bandwidth is not listed, refer to the table above, which states that Standard tier VMs support a maximum bandwidth of 60 MB/s.

Based on the assessment of the Perfmon data, the table below summarizes the sizing parameters we considered and the best fit.

Parameter | On-premises VM | Selected Azure template (Standard_A2)
CPU cores | 2 | 2
Memory | 2.2 GB (max utilized) | 3.5 GB
IOPS | 310 (max required) | 500 (striped volume N/A)
Storage bandwidth | 12 MBps | 60 MBps
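The comparison in this table amounts to checking every dimension of the requirement against the template's limit. A minimal sketch, using the figures from the table; the function name and dictionary keys are hypothetical.

```python
def shortfalls(requirement: dict, template: dict) -> list:
    """Return every dimension where the template's limit is below the requirement."""
    return [key for key, needed in requirement.items() if template.get(key, 0) < needed]


# Figures from the comparison table (max values for the on-premises VM).
on_prem = {"cores": 2, "memory_gb": 2.2, "iops": 310, "mbps": 12}
standard_a2 = {"cores": 2, "memory_gb": 3.5, "iops": 500, "mbps": 60}

print(shortfalls(on_prem, standard_a2))  # prints []
```

An empty list means the template covers every measured dimension; any name returned points at the limit that would throttle the workload.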

Now let's look at our second example, a database server. This is a high-IOPS database server, and its graph looks like this:

In the figure above, IOPS goes above 20,000 and disk bandwidth touches 700 MBps. Let's now check the CPU and memory utilization for this server.

The figure above shows the CPU and memory utilization. Apart from a few CPU spikes, average CPU utilization is below 30%, and average memory utilization is below 128 GB, with occasional spikes.

In this example, let's look at the table for the ESv3-series. The ESv3-series is a family of memory-optimized VMs based on the 2.3 GHz Intel Xeon® E5-2673 v4 (Broadwell) processor, which can reach 3.5 GHz with Intel Turbo Boost Technology 2.0, and it uses premium storage. ESv3-series instances are ideal for memory-intensive enterprise applications.

The ESv3-series VM template table looks like this:

The seventh column of the table above shows the maximum IOPS and the maximum disk bandwidth (MBps).

In the next step, concentrate on the premium disk table. You can add disks to achieve the required IOPS and disk bandwidth.
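Working out how many disks to stripe is a ceiling division on each limit, taking whichever dimension needs more disks. This is a minimal sketch that assumes striping adds up per-disk limits linearly (which only holds up to the VM's own limits). The P30 and P40 figures are assumed per-disk limits and should be checked against the current premium disk table; the function name is hypothetical.

```python
import math


def disks_needed(req_iops: float, req_mbps: float, disk_iops: float, disk_mbps: float) -> int:
    """Smallest number of identical striped disks whose summed limits meet both targets."""
    return max(math.ceil(req_iops / disk_iops), math.ceil(req_mbps / disk_mbps))


# Assumed per-disk limits: P30 = 5,000 IOPS / 200 MBps, P40 = 7,500 IOPS / 250 MBps.
print(disks_needed(20000, 700, 5000, 200))  # database server example: prints 4
print(disks_needed(18000, 850, 7500, 250))  # old ERP server example: prints 4
```

In the ERP case IOPS alone would need three P40 disks, but bandwidth needs four, so bandwidth decides the count.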

For the Standard_E32s_v3 VM, the IOPS limit is 51,200 (marked in red) and the throughput limit is 768 MBps; with 32 CPU cores and 256 GB of memory, it could be the best fit for this server. However, if we consider average CPU utilization, memory utilization, IOPS and disk bandwidth, we could also select the Standard_E16s_v3 template to make the most of the resources. This is a call the Azure system admin needs to make. Note that an Azure VM template can easily be upgraded if utilization causes any issue. Price-wise, there is almost a 50% difference between the two VM templates.
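One way to frame this call is to express the workload as a fraction of each template's IOPS and bandwidth limits, for both the average and the peak figures. A minimal sketch with a hypothetical function name; the template limits are the ones quoted in the text for the two ESv3 candidates, and the 350 MBps and 700 MBps figures come from this example's comparison table and graph respectively.

```python
def utilization_report(req_iops: float, req_mbps: float, templates: dict) -> dict:
    """Map each template to (IOPS used, bandwidth used) as fractions of its limits."""
    return {
        name: (req_iops / iops_limit, req_mbps / mbps_limit)
        for name, (iops_limit, mbps_limit) in templates.items()
    }


# IOPS and MBps limits quoted in the text for the two ESv3 candidates.
ESV3 = {"Standard_E32s_v3": (51200, 768), "Standard_E16s_v3": (25600, 384)}

# 350 MBps is the comparison-table figure; 700 MBps is the peak read off the graph.
for mbps in (350, 700):
    for name, (iops_used, mbps_used) in utilization_report(20000, mbps, ESV3).items():
        status = "throttled" if max(iops_used, mbps_used) > 1 else "within limits"
        print(f"{mbps} MBps on {name}: IOPS {iops_used:.0%}, bandwidth {mbps_used:.0%} ({status})")
```

At 350 MBps the E16s_v3 would run at about 91% of its bandwidth limit, while the 700 MBps peak would need roughly 182% of it, so the smaller template trades cost for throttling during spikes.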

Let's verify the sizes we have considered in this exercise.

Parameter | On-premises VM | Selected Azure template (Standard_E32s_v3) | Optimized Azure template (Standard_E16s_v3)
CPU cores | 32 | 32 | 16
Memory | 128 GB (average), 250 GB (max) | 256 GB | 128 GB
IOPS (average) | 20,000 | 51,200 | 25,600
Storage bandwidth (max) | 350 MBps | 768 MBps | 384 MBps

Let's take a third example. This server is old and has only an 8-core CPU, but it has 48 GB of memory, which has been increased as required over the last 6 years. Its utilization is shown below.

On this physical server, CPU usage is 80 to 100 percent and maximum memory usage is 90%.

The IOPS and disk bandwidth graph looks like this:

Max IOPS touches 18,000 and max disk bandwidth touches 850 MBps.

This is a special case where the I/O and disk bandwidth requirements are very high, so we need to select a high-I/O (storage-optimized) VM template. The table for high-I/O VMs looks like this:

From the table above, my choice would be Standard_L8s, which fits the CPU and memory requirements. To meet the IOPS and bandwidth requirements, we would need a striped volume of at least four P40 disks (refer to the premium disk table above), which would give the required disk bandwidth of 900 MBps; however, according to this article by MS, this is not guaranteed or possible. If you need a guarantee, you need to choose Standard_L32s, which is a dedicated machine for your workload in Azure. It is very much overkill in terms of CPU and memory but fits the disk bandwidth requirement well; however, it is also very expensive.

The link below states that the VM's throughput limit should be higher than the combined IOPS/throughput limits of the attached disks. This rules out the disk-striping idea above, because the VM's own limit is lower than what is required here. Please check this URL for more details.

Clearly, as per the statement above, the VM template's limit overrides the combined IOPS and disk bandwidth of the striped disks, which is sad news.
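In other words, the usable limit of a striped volume is the sum of the disk limits, capped by the VM's own limits. A minimal sketch with a hypothetical function name; the P40 figures are assumed per-disk limits, the Standard_L32s limits are taken from the comparison table for this example, and the smaller VM cap is a hypothetical value chosen only to show the capping.

```python
def effective_limits(vm_limits, disk_limits):
    """Sum striped-disk limits, then cap them at the VM's own limits.

    vm_limits and each entry of disk_limits are (iops, mbps) tuples.
    """
    total_iops = sum(iops for iops, _ in disk_limits)
    total_mbps = sum(mbps for _, mbps in disk_limits)
    vm_iops, vm_mbps = vm_limits
    return min(vm_iops, total_iops), min(vm_mbps, total_mbps)


four_p40 = [(7500, 250)] * 4  # assumed P40 limits: 7,500 IOPS / 250 MBps each
print(effective_limits((40000, 1000), four_p40))  # Standard_L32s: prints (30000, 1000)
print(effective_limits((10000, 250), four_p40))   # hypothetical VM cap: prints (10000, 250)
```

With a small VM, adding more disks raises the summed disk limits but not the effective limit, which is why the striping idea fails for the smaller template.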

Let's verify the correct sizing if we go with the article by MS above.

Parameter | On-premises server | Selected Azure template (Standard_L32s*)
CPU cores | 8 | 32
Memory | 48 GB (max utilized) | 256 GB
IOPS | 18,000 (max required) | 40,000
Storage bandwidth | 900 MBps | 1,000 MBps

* Standard_L32s is a dedicated VM; it will incur a high cost for the enterprise and is overkill for CPU and memory.

Conclusion:

As we have seen in these examples, IOPS and disk bandwidth play an important role in correct VM sizing, so you should always consider these parameters in your next Azure assessment; otherwise it can become a nightmare. If you are migrating an on-premises VM or physical server to Azure and find that its IOPS and bandwidth requirements are very high, ask the application owners whether they can tune the application or database, which can help reduce the VM's T-shirt size. An Azure assessment is not an easy process; it takes time and effort to make the best use of your Azure budget.
