High Performance Deduplication

Time and again, enterprise customers, especially those migrating from competing solutions, ask us about the scalability of Druva inSync. Since the launch of v4.0, inSync has scaled exceptionally well, especially for large deployments. The software has succeeded where the majority of competing solutions have either failed or turned off deduplication.

About a week ago, at the request of a large customer, we started testing one of the competing solutions. We ran the test on 1 million files totaling 2TB, of which 48% was duplicate data. inSync finished the backup in about 22 hours; the competing software is still backing up.

inSync doesn’t offer deduplication as an “integration”; the whole product was designed around deduplication and CDP. There is NO flag to turn off dedupe and there never will be.

This article focuses on my thoughts on how Druva succeeds where the majority of others fail.

Why Source Deduplication Fails to Scale for Most Vendors
The biggest bottleneck for deduplication performance at scale is random disk I/O. Almost all dedupe systems include a database that stores the block-hash index, which must be consulted for every hash check. A server-class magnetic disk usually has a latency of 8-12ms, which restricts hash lookups to about 100/sec and throttles dedupe performance drastically.

When the data set is small, the entire index can reside in memory and the hash checks are much faster. As the index grows, I/O congestion brings down the software’s capacity to perform inline deduplication.
Consider this: just about 1000 users can create over 10 billion blocks for backup. Checking them at a rate of 100/sec would take about 3.2 years.
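
The arithmetic behind that estimate is easy to check. The snippet below is only a back-of-the-envelope sketch: the 10ms latency is an assumed midpoint of the 8-12ms range quoted above, and it assumes exactly one random disk read per hash check with no caching at all. Under those assumptions the answer comes out at a little over three years.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

def lookups_per_second(seek_latency_ms: float) -> float:
    """One random disk read per hash check, so disk latency caps the rate."""
    return 1000.0 / seek_latency_ms

def years_to_check(total_blocks: int, rate_per_second: float) -> float:
    """Time to check every block hash once at the given rate."""
    return total_blocks / rate_per_second / SECONDS_PER_YEAR

rate = lookups_per_second(10.0)                 # assumed midpoint of 8-12 ms
years = years_to_check(10_000_000_000, rate)    # 10 billion blocks
print(f"{rate:.0f} checks/sec, {years:.2f} years")
```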

Learnings from Storage Guys
Data Domain had an interesting approach. They optimized their inline dedupe performance for backup streams. Since the backups were mostly of servers with a few large files, and the data arrived as long streams in tar format, Data Domain used a simple index read-ahead algorithm to load the relevant parts of the index before the stream’s block hashes reached the server. Since the streams changed less than 10% between two successive backups, the algorithm helped deduplicate them at a very fast pace.
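
To make the read-ahead idea concrete, here is a toy simulation. It is not Data Domain’s implementation: the class name, the window size, and the idea of storing the index in the order of the previous backup stream are all assumptions made for illustration. A miss in memory costs one simulated disk read; if the hash is found, the next run of hashes from the previous stream is pulled into memory along with it.

```python
class ReadAheadIndex:
    """Hypothetical index laid out in the order of the previous backup stream."""

    def __init__(self, previous_stream, window=1000):
        self.on_disk = list(previous_stream)          # hashes in stream order
        self.position = {h: i for i, h in enumerate(self.on_disk)}
        self.window = window
        self.in_memory = set()                        # currently loaded slice
        self.disk_reads = 0

    def contains(self, block_hash):
        if block_hash in self.in_memory:
            return True                               # served from the read-ahead slice
        self.disk_reads += 1                          # simulated random disk read
        pos = self.position.get(block_hash)
        if pos is None:
            return False                              # new block, not in the index
        # Read ahead: load the hashes that followed this one last time.
        self.in_memory = set(self.on_disk[pos:pos + self.window])
        return True

# Previous backup: 10,000 blocks. Current backup: every 20th block changed (5%).
previous = [f"h{i}" for i in range(10_000)]
current = [f"new{i}" if i % 20 == 19 else h for i, h in enumerate(previous)]

index = ReadAheadIndex(previous)
duplicates = sum(index.contains(h) for h in current)
print(f"{duplicates} duplicate blocks found with {index.disk_reads} disk reads")
```

In this toy run the unchanged blocks are almost all served from memory; most of the remaining disk reads come from genuinely new blocks, which still have to be looked up once.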

Solid State Disks
A simple solution to the random-I/O problem is to store the index on SSDs. Although we tweaked certain features to support SSDs, the solution wasn’t complete because of their limited capacity.

Two Step Approach for Druva: No-SQL + HyperCache
The “Data Domain approach” did not work for us because our data was much more random and came from many different sources. On the flip side, we had much more knowledge of the data formats we were backing up.
The first step towards scalability was to get rid of the built-in SQL database, which imposed a lot of latency because of SQL query serialization and execution. We replaced PostgreSQL with Oracle Berkeley DB (BDB), an embedded NoSQL database, which improved performance and is much simpler to maintain.

The second major innovation was HyperCache, a selective in-memory cache of the index. HyperCache consists of both a positive and a negative cache, which remember the most probable and the least probable hashes for the ongoing backup. It uses a continuously learning algorithm that weighs parameters such as the time, frequency and probability of a hash to decide whether to cache it.
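
The general shape of a positive-plus-negative cache can be sketched in a few lines. To be clear, this is not HyperCache itself: the class and method names are hypothetical, a plain Python set stands in for the on-disk index, the negative side is interpreted here as "hashes recently confirmed absent", and simple LRU eviction is swapped in for the time, frequency and probability scoring described above.

```python
from collections import OrderedDict

class TwoSidedCache:
    """Hypothetical positive + negative cache in front of a slow hash index."""

    def __init__(self, index, capacity=100_000):
        self.index = index                  # stand-in for the on-disk index (a set)
        self.capacity = capacity            # max entries per side
        self.positive = OrderedDict()       # hashes known to be in the index
        self.negative = OrderedDict()       # hashes known to be absent
        self.disk_lookups = 0

    def _remember(self, cache, block_hash):
        cache[block_hash] = True
        cache.move_to_end(block_hash)
        if len(cache) > self.capacity:
            cache.popitem(last=False)       # evict the least recently used entry

    def contains(self, block_hash):
        if block_hash in self.positive:
            self.positive.move_to_end(block_hash)
            return True
        if block_hash in self.negative:
            self.negative.move_to_end(block_hash)
            return False
        self.disk_lookups += 1              # only cache misses touch the disk
        found = block_hash in self.index
        self._remember(self.positive if found else self.negative, block_hash)
        return found

    def add(self, block_hash):
        """Store a new block's hash; it must no longer be cached as absent."""
        self.index.add(block_hash)
        self.negative.pop(block_hash, None)
        self._remember(self.positive, block_hash)

cache = TwoSidedCache(index={"a", "b"}, capacity=2)
results = [cache.contains(h) for h in ["a", "x", "a", "x"]]
print(results, "disk lookups:", cache.disk_lookups)
```

The one subtlety worth noting is in `add`: once a new block is stored, its hash has to be dropped from the negative side, otherwise the cache would keep reporting it as absent.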

The Result
The result was an 85% reduction in disk I/O, using 4GB of RAM for every 1TB of data stored. The reduction in I/O translates to 4X better scalability, and the solution can easily scale to thousands of users with a linear improvement in performance.

Using SSDs improves performance by a further 6X. The inSync core has been modified to keep only the most concurrently accessed part of the database index on SSDs and to optimize it for solid state drives.

Motif #4.1

Every well-planned release has a motif, a term often used to describe a dominant theme in a literary, artistic, or musical work.

In all the early releases, the motif for Druva has been Simplicity: simplicity for both end-users and for IT administrators. For end users, Druva inSync is now so simple to use that it just works without their knowing. It’s completely non-intrusive and works on all kinds of networks such as WAN and VPN. It’s so simple to use that end users can access their data from any Web browser without having to contact IT. Likewise, for IT administrators, it’s so simple to download (~40MB), install (under 20 minutes), and manage (almost zero maintenance) that the total cost of ownership is almost negligible. The simplicity motif has been a strong differentiator for Druva’s offerings.

For inSync 4.0, we made Storage and Bandwidth Optimization the motif. To optimize storage, we introduced App-aware Dedupe, an industry-first dedupe technology that offers 90% storage savings across all user data and 100% dedupe accuracy at the source (laptops) for supported applications such as Outlook and Office. To optimize bandwidth, we introduced the Octopus WAN optimization engine, a multi-threaded client architecture that does smart bandwidth throttling to offer a 5x performance gain for every client backing up over a WAN.

The eye-catching red shack on the wharf (Rockport, Massachusetts) is often called Motif #1, a reference to its popularity among artists.

The theme for inSync 4.1 naturally emerged as “Scale”, as customers were increasingly deploying Druva to more users in each of their environments. With release 4.1, we wanted to make inSync scale efficiently along several dimensions, as outlined below:

Scale -

  • 2000 users per server
  • 16TB of data per server
  • 200 parallel connections per server

Performance -

  • We’re excited to introduce HyperCache, an innovative technology that can improve backup performance by 6x compared to inSync 4.0. HyperCache is an in-memory cache that can be configured to hold the most useful subset of your dedupe index, resulting in a high hit rate. The usual 80-20 rule applies here: with just a 30% subset of the dedupe index, HyperCache can deliver a hit rate upwards of 75%. We recommend 4GB of HyperCache for every 1TB of data to maximize performance. The admin console offers a simple way for you to configure HyperCache for optimal performance.
  • You can now configure SSD storage for your dedupe index to further enhance your server performance. Lab results show a whopping 12x performance improvement with HyperCache and SSD configurations.
  • You can now install Druva on a 64-bit system for enhanced performance.
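
As a quick sizing aid, the 4GB-per-1TB recommendation above can be turned into a small helper. This is just illustrative arithmetic: the function name is made up, it is not an inSync setting or API, and rounding up to a whole gigabyte is an assumption.

```python
import math

GB_CACHE_PER_TB = 4  # recommended ratio quoted in the release notes above

def recommended_hypercache_gb(stored_tb: float) -> int:
    """Cache size in GB for a given amount of stored data, rounded up."""
    return math.ceil(stored_tb * GB_CACHE_PER_TB)

for tb in (1, 4, 16):
    print(f"{tb} TB stored -> {recommended_hypercache_gb(tb)} GB HyperCache")
```

For a server at the 16TB limit listed under Scale, that works out to 64GB of cache.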

Administration

  • 4.1 now supports a new administrative role in addition to a Server Administrator. A Profile Administrator role grants permissions to manage one or more user profiles in order to edit profile settings, add users, and manage data restore for those profiles. This is a great way to scale the administration tasks across your organization between server and profile administration.
  • In light of the above role, we’ve enhanced our dashboard and reporting, so an administrator can get a customized view of their reports depending on their role.
  • You can now automate the import of users to inSync from your Active Directory. A periodic import from your AD can be set up to dynamically add users to inSync.

Access -

We’re excited about the upcoming deployments of inSync 4.1 and the performance benefits to all of you. In my next blog, I’ll talk about the 2 editions of inSync 4.1 (Enterprise and Professional), how they compare, and which one is right for you. Stay tuned….

 

Say Hello to Blackbird!

With inSync v4.0 going live last week, Druva showcased the new Blackbird storage engine, which introduces a new concept called “Application Aware Data Deduplication”. This new engine, although currently available only in inSync, will form the core of all future product offerings.

The idea of “app-aware deduplication” emerged from the fact that complex applications like MS Outlook or Exchange need much more intelligent duplicate removal than a simple block-based approach.

Each data block in a PST file is of fixed size and usually contains a header and a footer (ref: libpst), which makes it impossible for simple dedupe approaches to identify block boundaries and restricts deduplication accuracy to just 30-40%.

Application aware data deduplication depends upon APIs exposed by the application to understand the construct of on-disk data and deduplicate at the logical-block or message level. This guarantees 100% deduplication accuracy and faster processing of data.
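
A toy example shows why hashing at the message level beats hashing fixed-size chunks of the container file. Everything here is made up for illustration: the “mailbox” format (a 4-byte slot-number header in front of each message) is not the PST format, and a real implementation would obtain the messages through the application’s own APIs rather than from a Python list.

```python
import hashlib

def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def message(i: int) -> bytes:
    """Deterministic 96-byte stand-in for an email body."""
    return b"".join(hashlib.sha256(f"msg{i}-{j}".encode()).digest() for j in range(3))

def container(messages: list[bytes]) -> bytes:
    """Toy mailbox file: each message is prefixed with its 4-byte slot number."""
    return b"".join(slot.to_bytes(4, "big") + m for slot, m in enumerate(messages))

def fixed_block_hashes(blob: bytes, size: int = 64) -> set[str]:
    """Simple block-based dedupe: hash every fixed-size chunk of the file."""
    return {digest(blob[i:i + size]) for i in range(0, len(blob), size)}

def message_hashes(messages: list[bytes]) -> set[str]:
    """App-aware dedupe: hash each logical message on its own."""
    return {digest(m) for m in messages}

old = [message(1), message(2), message(3)]
new = [message(0)] + old          # one new mail arrives at the front of the mailbox

fixed_dupes = len(fixed_block_hashes(container(old)) & fixed_block_hashes(container(new)))
message_dupes = len(message_hashes(old) & message_hashes(new))
print(f"fixed-block matches: {fixed_dupes}, message-level matches: {message_dupes}")
```

The single new message shifts every byte that follows it and changes the per-message headers, so the fixed-size chunks no longer line up with anything seen before, while the three unchanged messages are still recognized when hashed individually.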

Another interesting change is the shift from a PostgreSQL database to an embedded NoSQL database from Oracle. This small (less than 1MB in size) embedded database removes the heavy “SQL” and networking layers between the server and the database, greatly improving performance and scalability. The new engine can now support 16TB of dedupe data and about 200 parallel backups.

In a nutshell, the Blackbird engine will have the following features -

  1. App-Aware deduplication
  2. Light-weight and highly scalable
  3. Simple to install and zero-maintenance
  4. Near-CDP – timeline/event based near-continuous backups
  5. Search enabled restores
  6. Replication (to be showcased soon :)

InSync v4.0
inSync v4 is definitely a new benchmark for laptop backup. I am extremely confident that anyone who tries this solution will never buy anything else for laptop backup. With new storage, redesigned WAN optimization and a new dashboard, it’s clearly leaps and bounds ahead of what’s available in the market.

More about new features – http://www.druva.com/insync/version-4-0
Download inSync v4 – http://www.druva.com/download/insync