In the first part of my 2-part review of Windows Server 8 I looked at some of the best of the more than 300 new features Microsoft packed into the upcoming server OS. Now it's time to turn our attention to some massive storage enhancements.
My personal highlight of the entire three days of the Windows Server 8 reviewers workshop was the set of talks about storage. The killer feature to me is the new, built-in data deduplication, which detects duplicate data in files and folders, moves it into a separate store (under System Volume Information) and simply gets rid of the redundant bytes. The file itself remains 100% intact: when it gets accessed, it pulls the (now missing) information back from that single data store.
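To make the idea concrete, here is a minimal sketch of chunk-level deduplication in Python. It is not Microsoft's algorithm: the fixed 4 KB chunk size, the SHA-256 hashing, and the names `ChunkStore`, `dedup`, `read_back` and `savings` are all assumptions made for illustration only.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size, chosen only for this sketch


class ChunkStore:
    """Toy single-instance store: every unique chunk is kept once, keyed by its hash."""

    def __init__(self):
        self.chunks = {}  # hex digest -> chunk bytes

    def put(self, chunk):
        digest = hashlib.sha256(chunk).hexdigest()
        self.chunks.setdefault(digest, chunk)  # store only if not seen before
        return digest

    def get(self, digest):
        return self.chunks[digest]


def dedup(data, store):
    """Replace a file's contents with a list of references into the chunk store."""
    return [store.put(data[i:i + CHUNK_SIZE]) for i in range(0, len(data), CHUNK_SIZE)]


def read_back(refs, store):
    """Reassemble the original bytes by pulling each chunk back from the store."""
    return b"".join(store.get(ref) for ref in refs)


def savings(logical_bytes, store):
    """Fraction of space saved compared with storing every file in full."""
    stored = sum(len(chunk) for chunk in store.chunks.values())
    return 1 - stored / logical_bytes


# Two hypothetical files that share their first 8 KB.
store = ChunkStore()
file_a = b"A" * 8192 + b"B" * 4096
file_b = b"A" * 8192 + b"C" * 4096
refs_a = dedup(file_a, store)
refs_b = dedup(file_b, store)

assert read_back(refs_a, store) == file_a  # the file is still 100% intact
assert read_back(refs_b, store) == file_b
print(f"space saved: {savings(len(file_a) + len(file_b), store):.0%}")
```

A real implementation would have to persist the store on disk, track how many files reference each chunk, and cope with the fragmentation that reading from scattered chunks causes.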
Now, deduplication isn't something groundbreaking. It's been done before, and it's been done well, but until now dedup has never found its way into the OS, where it is deeply integrated and highly manageable. Microsoft Research spent two years on this algorithm and came up with techniques to minimize the performance impact caused by pulling one piece of data from one part of the disk and fetching other parts from the deduplication store (fragmentation!). According to Microsoft's server team, dedup has an impact of less than 3-4% on overall performance when accessing the data, although only performance tests will tell the true story.
However, the benefit greatly outweighs the possible downsides. Generally, you can expect space savings of between 30% and 90%, which is absolutely amazing. On day 3 of the Windows Server 8 reviewers workshop, I had the chance to catch up with the development and program management team behind data deduplication and found out a couple of interesting tidbits:
- Deduplication automatically runs on "idle". Say you've enabled deduplication on drive E and copied 20 gigs of files over: deduplication wouldn't start immediately. Instead, it would wait until the server isn't quite as busy and then perform the deduplication process. Keep in mind that going through files and detecting duplicate data is quite an I/O eater.
- Admins can determine which files get deduplicated based on their age. Maybe you don't want to dedup files that the server just created.
- Failsafe: When asked about the possible risk of having one important part of possibly dozens or hundreds of files in one chunk, I was told that several failsafe mechanisms are in place that perform file system health checks on the specific area that the deduplicated data gets written to. Also, if your server accesses data very frequently (they wouldn't share a number), it gets automatically duplicated again to avoid possible risk.
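The age-based selection mentioned in the list above can be sketched in a few lines of Python. This is only an illustration of the policy, not how Windows Server 8 implements it: the function name `eligible_files` and the `min_age_days` parameter are hypothetical, and the sketch uses a file's last-modified time as its "age".

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def eligible_files(root, min_age_days, now=None):
    """Return files under root that were last modified at least min_age_days ago."""
    now = time.time() if now is None else now
    cutoff = now - min_age_days * SECONDS_PER_DAY
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file() and path.stat().st_mtime <= cutoff
    )
```

A background job could call something like `eligible_files("E:/", 5)` once the server is idle and hand only those files to the deduplication pass, leaving freshly created files untouched.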
And, by the way, this very same technique applies to network transfers (think VDI!) and even local RAM. Windows Server 8 saves local memory by finding duplicate information and getting rid of it.