Motif #4.1

Every well-planned release has a motif, a term often used to describe a dominant theme in a literary, artistic, or musical work.

In all the early releases, the motif for Druva has been Simplicity: simplicity for both end-users and for IT administrators. For end users, Druva inSync is now so simple to use that it just works without their knowing. It’s completely non-intrusive and works on all kinds of networks such as WAN and VPN. It’s so simple to use that end users can access their data from any Web browser without having to contact IT. Likewise, for IT administrators, it’s so simple to download (~40MB), install (under 20 minutes), and manage (almost zero maintenance) that the total cost of ownership is almost negligible. The simplicity motif has been a strong differentiator for Druva’s offerings.

For inSync 4.0, we made Storage and Bandwidth Optimization the motif. To optimize storage, we introduced App-aware Dedupe, an industry-first dedupe technology that offers 90% storage savings across all user data and 100% dedupe accuracy at the source (laptops) for supported applications such as Outlook and Office. To optimize bandwidth, we introduced the Octopus WAN optimization engine, a multi-threaded client architecture that does smart bandwidth throttling to offer a 5x performance gain for every client backing up over a WAN.

The eye-catching red shack on the wharf (Rockport, Massachusetts) is often called Motif #1, a reference to its popularity among artists.

The theme for inSync 4.1 naturally became “Scale”, as customers were increasingly deploying Druva to more users in each of their environments. With release 4.1, we wanted inSync to scale efficiently along several dimensions, as outlined below -

Scale -

  • 2000 users per server
  • 16TB of data per server
  • 200 parallel connections per server

Performance -

  • We’re excited to introduce HyperCache, an innovative technology that can improve backup performance by 6x compared to inSync 4.0. HyperCache is an in-memory cache that can be configured to hold the most useful subset of your dedupe index in memory, resulting in a high hit rate. The usual 80-20 rule applies here: with just a 30% subset of the dedupe index, HyperCache can deliver upwards of a 75% hit rate. We recommend 4GB of HyperCache for every 1TB of data to maximize performance. The admin console offers a simple way to configure HyperCache for optimal performance.
  • You can now configure SSD storage for your dedupe index to further enhance server performance. Lab results show a whopping 12x performance improvement with HyperCache and SSD configurations.
  • You can now install Druva on a 64-bit system for enhanced performance.
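
To make the HyperCache idea a little more concrete, here is a minimal Python sketch of an in-memory cache sitting in front of a slower dedupe index. It is an illustration only, not the inSync implementation: the `IndexCache` class, the LRU eviction policy, and the dictionary standing in for the on-disk index are all assumptions. The only number taken from this post is the 4GB-per-1TB sizing guideline.

```python
from collections import OrderedDict

GB_OF_CACHE_PER_TB = 4  # sizing guideline quoted in this post


def recommended_cache_gb(data_tb):
    """Cache size (in GB) suggested by the 4GB-per-1TB guideline."""
    return GB_OF_CACHE_PER_TB * data_tb


class IndexCache:
    """Hypothetical LRU cache in front of a slower on-disk dedupe index."""

    def __init__(self, capacity, disk_index):
        self.capacity = capacity          # max number of cached entries
        self.disk_index = disk_index      # dict standing in for the on-disk index
        self.entries = OrderedDict()      # fingerprint -> block location
        self.hits = 0
        self.misses = 0

    def lookup(self, fingerprint):
        """Return the stored location for a fingerprint, or None if unknown."""
        if fingerprint in self.entries:
            self.entries.move_to_end(fingerprint)    # mark as recently used
            self.hits += 1
            return self.entries[fingerprint]
        self.misses += 1
        location = self.disk_index.get(fingerprint)  # slow path
        if location is not None:
            self.entries[fingerprint] = location
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)     # evict least recently used
        return location

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

With a skewed access pattern (a few fingerprints looked up again and again), even a cache much smaller than the full index answers most lookups from memory, which is the 80-20 effect described above.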

Administration -

  • inSync 4.1 now supports a new administrative role in addition to the Server Administrator. The Profile Administrator role grants permission to manage one or more user profiles: editing profile settings, adding users, and managing data restores for those profiles. This is a great way to split administration tasks across your organization between server and profile administrators.
  • To go with the new role, we’ve enhanced our dashboard and reporting, so administrators get a customized view of their reports depending on their role.
  • You can now automate the import of users to inSync from your Active Directory. A periodic import from your AD can be set up to dynamically add users to inSync.

Access -

We’re excited about the upcoming deployments of inSync 4.1 and the performance benefits they will bring to all of you. In my next post, I’ll talk about the two editions of inSync 4.1 (Enterprise and Professional), how they compare, and which one is right for you. Stay tuned.

 

Green-ness of Data De-duplication

The Storage Hunger

Sales of disk-based storage systems already crossed 2,500 petabytes in 2008, up 58.1% YoY (one petabyte = 1 million GB). These figures do not include the direct-attached storage that comes pre-loaded with PCs or servers.[1]

This is understandable, as 1TB (1000GB) NAS/SAN storage devices are now a commodity. The top three vendors in this space are HP, IBM and EMC, with market shares of approximately 29%, 20% and 14% respectively.[2]

The overall consumption doubles when this storage is backed up :)

Energy Consumption

On average, a datacenter consumes 100 watts per square foot, and the best solid-state storage consumes about 5 watts for 1MB IOPs.[3]

This puts the total cost of maintaining (cooling + power) a 1TB disk array at about USD 2,500 annually (at 16c per kWh and 20GB average daily usage).

This makes the annual energy cost of newly bought storage USD 5 billion!

And backing up this 5 billion dollar inventory surely adds a couple of billion more.
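
As a quick sanity check on these numbers, here is a back-of-the-envelope sketch in Python. It uses only the figures quoted in this post (2500 petabytes sold, about USD 2,500 per TB per year); the function name is made up for illustration. The post does not spell out how the USD 5 billion figure was derived, so treat this as a rough reconstruction rather than the original working. Straight multiplication lands a little higher, but in the same ballpark.

```python
PETABYTES_SOLD = 2500          # disk-based storage sold in 2008 (from the post)
TB_PER_PB = 1000               # 1 PB = 1 million GB = 1000 TB
COST_PER_TB_PER_YEAR = 2500    # USD, power + cooling per 1TB array (from the post)


def annual_energy_cost_usd(petabytes, cost_per_tb=COST_PER_TB_PER_YEAR):
    """Yearly power and cooling cost in USD for the given amount of storage."""
    return petabytes * TB_PER_PB * cost_per_tb


total = annual_energy_cost_usd(PETABYTES_SOLD)
print(f"{total / 1e9:.2f} billion USD per year")  # prints "6.25 billion USD per year"
```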

Data De-duplication

Data de-duplication technology saves a single copy of duplicate data. There are two important aspects of any data de-duplication solution/product -

  1. Scope of duplicate discovery – File-level / Sub-File level / Block level
  2. Point of duplicate discovery – Source / Target

Most storage vendors that use data de-duplication provide block-level duplicate removal at the target (i.e. when the data reaches the storage). But it’s not very difficult to imagine that source-level removal of sub-file or block-level duplicates would be much better, for two reasons -

  1. Sending less (de-duplicated) data saves time and bandwidth (apart from storage)
  2. Duplicate discovery would be much better, as you have access to the structured data
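
Here is a minimal Python sketch of what block-level de-duplication at the source could look like. It is only an illustration of the general idea, not how any particular product does it: the fixed 4096-byte block size, the use of SHA-256 fingerprints, the function names, and the plain dictionary standing in for the server's block store are all assumptions.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical fixed block size


def split_blocks(data, block_size=BLOCK_SIZE):
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def backup_from_source(data, server_store, block_size=BLOCK_SIZE):
    """Send only blocks the server has not seen yet.

    server_store is a dict (fingerprint -> block) standing in for the server.
    Returns the list of fingerprints needed to rebuild the data, plus the
    number of bytes that actually had to be sent.
    """
    recipe = []
    bytes_sent = 0
    for block in split_blocks(data, block_size):
        digest = hashlib.sha256(block).hexdigest()
        if digest not in server_store:    # unseen block: it must be uploaded
            server_store[digest] = block
            bytes_sent += len(block)
        recipe.append(digest)             # duplicates only cost a reference
    return recipe, bytes_sent


def restore(recipe, server_store):
    """Rebuild the original bytes from the stored blocks."""
    return b"".join(server_store[digest] for digest in recipe)
```

Because the duplicate check happens before anything is uploaded, repeated blocks cost neither bandwidth nor storage, which is the first of the two reasons above.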

Considering Microsoft’s report on de-duplication assessment [4] -

  1. 20-30% data duplicates are easily visible even in unstructured data sources like ERP databases
  2. 40-80% data duplicates can be seen in file servers and mail servers.
  3. 60-90% data duplicates can be seen between different PCs. (Just my observation and opinion)

On average, even a conservative 30% duplicate removal can save $1.6B on storage energy and $2B on bandwidth and backup costs.

De-duplication and Druvaa

We see Druvaa inSync as a product/platform to provide de-duplicated (at source) backup for PCs, PDAs and servers. The current version is available only for PCs, and we can easily see up to 90% savings in time and cost (bandwidth and storage) for enterprises.

I just don’t see a reason why all storage and backup vendors wouldn’t do it. EMC and NetApp have already announced de-duplication as an additionally licensable technology on their arrays (target-based).[5] No major vendor except EMC has announced agent/source-based de-dup, though.[6]

Surely, Druvaa has a good lead and is cashing in on it :)