Windows Server 2012 Deduplication is Amazing!

The following article describes how to use Windows Server data deduplication on an Solid State Drive (SSD) that holds active Hyper-V virtual machines.

Coloring Outside the Lines Statement:
This configuration is not supported by Microsoft.  See Plan to Deploy Data Deduplication for more information.  Use these procedures at your own risk. That said, it works great for me.  Your mileage may vary.

A while back I decided to add another 224GB SATA III SSD to my blistering Windows Server 2012 Hyper-V server for my active VMs.  The performance is outstanding and it makes the server dead silent.  I moved my primary always-on HyperV VM workloads to this new SSD:

  • Domain Controller on WS2012
  • Exchange 2010 multi-role server on WS2012
  • TMG server on WS2008 R2
These VMs took 134GB, or 60%, of the capacity of the drive which was fine at the time.  Later, I added a multi-role Exchange 2013 server which took up another 60GB of space.  That left me with only 13% free space, which didn’t leave much room for VHD expansion and certainly not enough to host any other VMs.  Rather than buy another larger and more expensive SSD, I decided to see how data deduplication performs in Windows Server 2012.
Add the Data Deduplication Feature
Data Deduplication is a feature of the File and Storage Services role in Windows Server 2012.  It’s not installed by default, so you need to install it using the Add Roles and Features Wizard (as above) or by using the following PowerShell commands:

PS C:\> Import-Module ServerManager
PS C:\> Add-WindowsFeature -Name FS-Data-Deduplication
PS C:\> Import-Module Deduplication

Next, you need to enable data deduplication on the volume.  Use the File and Storage Services node of Server Manager and click Volumes.  Then right-click the drive you want to configure for deduplication and select Configure Data Deduplication, as shown below:

Configuring Data Deduplication on Volume X:

So far, this is how you normally configure deduplication for a volume.  You would normally configure deduplication to run on files older than X days, enable background optimization, and schedule throughput optimization to run on at specified days and times.  It’s pretty much a “set it and forget it” configuration.

From here on I’m going to customize deduplication for my Hyper-V SSD.

In the Configure Data Deduplication Settings for the SSD, select Enable data deduplication and configure it to deduplicate files older than 0 days. Click the Set Deduplication Schedule button and uncheck Enable background optimization, Enable throughput optimization, and Create a second schedule for throughput optimization.

Enable Data Deduplication for Files Older Than 0 Days

Disable Background Optimization and Throughput Optimization Schedules

Click OK twice to finish the configuration.  What we’ve done is enabled data deduplication for all files on the volume, but deduplication will not run in real-time or on a schedule.  Note that these deduplication schedule settings are global and affect all drives configured for deduplication on the server.

You can also configure these data deduplication settings from PowerShell using the following commands:

PS C:\> Enable-DedupVolume X:
PS C:\> Set-Dedupvolume X: -MinimumFileAgeDays 0
PS C:\> Set-DedupSchedule -Name “BackgroundOptimization”, “ThroughputOptimization”, “ThroughputOptimization-2″ -Enabled $false

This configuration mitigates the reason why Microsoft does not support data deduplication on drives that host Hyper-V VMs.  Mounted VMs are always open for writing and have a fairly large change rate.1  This is the reason Microsoft says, “Deduplication is not supported for files that are open and constantly changing for extended periods of time or that have high I/O requirements.

In order to deduplicate the files and recover substantial disk space you need to shutdown the VMs hosted on the volume and then run deduplication manually with this command:

PS C:\> Start-DedupJob –Volume X: –Type Optimization

This manual deduplication job can take some time to run depending on the amount of data and the speed of your drive.  In my environment it took about 90 minutes to deduplicate a 224GB SATA III SSD that was 87% full.  You can monitor the progress of the deduplication job at any time using the Get-DedupJob cmdlet.  The cmdlet shows the percentage of progress, but does not return any output once the job finishes.

You can also monitor the job using Resource Monitor, as shown below:

Process Monitor During Deduplication

Here you can see that the Microsoft File Server Data Management Host process (fsdmhost.exe) is processing the X: volume.  When the deduplication process completes, the X: volume queue length will return to 0.

Once deduplication completes you can restart your VMs, check the level of deduplication, and how much data has been recovered.  From the File and Storage Services console, right-click the volume and select Properties:

Properties of Deduplicated SSD Volume

Here we can see that 256GB of raw data has been deduplicated to 61.5GB on this 224GB SSD disk – a savings of 75%!!!  That leaves 162GB of raw disk storage free.  I could easily create or move additional VMs to this disk and run the deduplication job again.

The drive above now actually holds more reconstituted data than the capacity of the drive itself with no noticeable degradation in performance.  It currently hosts the following active Hyper-V VMs:

  • Domain Controller on WS2012
  • Exchange 2010 multi-role server on WS2012
  • TMG server on WS2008 R2
  • Exchange 2013 multi-role server on WS2012
  • Exchange 2013 CAS on WS2012
  • Exchange 2013 Mailbox Server on WS2012
  • Because real-time optimization is not being performed, the VMs will grow over time as changes are made and data is added. The manual deduplication job would need to be run as needed to recover space.
  • Since the SSD actually contains more raw duplicated data than the drive can hold, I’m unable to disable deduplication without moving some data off the volume first.
  • Even though more VMs can be added to this volume, you have to be sure that there is sufficient free space on the volume to perform deduplication.
For even more information about Windows Server 2012 data deduplication, I encourage your to read Step-by-Step: Reduce Storage Costs with Data Deduplication in Windows Server 2012!
I hope you find this article useful in your own deployments and I’m interested to know what your experience is.  Please leave a comment below!