Green-ness of Data De-duplication

The Storage Hunger

Sales of disk-based storage systems have already crossed 2,500 petabytes in 2008, up 58.1% year over year (one petabyte = 1 million GB). These figures do not include the direct-attached storage that comes pre-loaded with PCs or servers.[1]

This is understandable, as 1 TB (1,000 GB) NAS/SAN storage devices are now a commodity. The top three vendors in this space are HP, IBM and EMC, with market shares of approximately 29%, 20% and 14% respectively.[2]

The overall consumption doubles when this storage is backed up :)

Energy Consumption

On average, a datacenter draws about 100 watts per square foot, and the best solid-state storage consumes about 5 watts for 1MB IOPs.[3]

This puts the total cost of maintaining (cooling + power) a 1 TB disk array at about USD 2,500 annually (at 16c per kWh and 20 GB average daily usage).

This makes the annual energy cost of newly bought storage roughly USD 5 billion!

And backing up this 5-billion-dollar inventory surely adds a couple more billion.
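
As a rough sanity check, the arithmetic behind this estimate can be sketched in Python. The inputs are the figures quoted in this post. Treating every petabyte sold as a disk array that costs USD 2,500 per TB per year is a simplifying assumption, and the function name is only illustrative. With these inputs the result comes out somewhat above USD 5 billion, but in the same order of magnitude.

```python
# Figures quoted in the post. Applying the per-TB cost to all capacity sold
# is a simplifying assumption made for this sketch.
PETABYTES_SOLD = 2_500          # disk-based storage sold in 2008
TB_PER_PB = 1_000               # the post uses 1 PB = 1 million GB
COST_PER_TB_PER_YEAR = 2_500    # USD per year, cooling + power


def annual_energy_cost_usd(petabytes, cost_per_tb=COST_PER_TB_PER_YEAR):
    """Yearly cooling + power cost in USD for the given capacity."""
    return petabytes * TB_PER_PB * cost_per_tb


if __name__ == "__main__":
    cost = annual_energy_cost_usd(PETABYTES_SOLD)
    print(f"Estimated annual cost: USD {cost / 1e9:.2f} billion")
```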

Data De-duplication

Data de-duplication technology saves a single copy of duplicate data. There are two important aspects of any data de-duplication solution or product:

  1. Scope of duplicate discovery – File-level / Sub-File level / Block level
  2. Point of duplicate discovery – Source / Target
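
To make block-level discovery concrete, here is a minimal sketch of a de-duplicating store. It assumes fixed-size 4 KB blocks identified by their SHA-256 digest. The class name `BlockStore`, its methods and the block size are hypothetical choices for illustration, not any vendor's implementation.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative fixed block size (an assumption)


class BlockStore:
    """Toy block-level de-duplicating store (hypothetical, for illustration)."""

    def __init__(self):
        self.blocks = {}  # digest -> block bytes; each unique block kept once
        self.files = {}   # file name -> ordered list of block digests

    def put(self, name, data):
        """Split data into blocks and store only blocks not seen before."""
        digests = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)
            digests.append(digest)
        self.files[name] = digests

    def get(self, name):
        """Rebuild a file from its list of block digests."""
        return b"".join(self.blocks[d] for d in self.files[name])

    def stored_bytes(self):
        """Physical bytes held after de-duplication."""
        return sum(len(block) for block in self.blocks.values())


if __name__ == "__main__":
    store = BlockStore()
    store.put("a.bin", b"A" * 8192 + b"B" * 4096)
    store.put("b.bin", b"A" * 8192 + b"C" * 4096)
    print("logical bytes:", 2 * 12288, "stored bytes:", store.stored_bytes())
```

File-level de-duplication would hash whole files instead, so these two files, which differ only in their last block, would both be stored in full.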

Most storage vendors that use data de-duplication provide block-level duplicate removal at the target (i.e. when the data reaches the storage). But it's not very difficult to imagine that source-level removal of sub-file or block-level duplicates would be much better, for two reasons:

  1. Sending less (de-duplicated) data saves time and bandwidth (apart from storage)
  2. Duplicate discovery would be much better as you have access to the structured data
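
The bandwidth argument can be sketched as a simple exchange: the source hashes its blocks, asks the target which digests it lacks, and uploads only those. This is a minimal sketch under assumed names (`Target`, `missing`, `upload`, `backup_from_source`) and an assumed 4 KB block size. It is not Druvaa's or any vendor's actual protocol.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative fixed block size (an assumption)


def split_blocks(data, size=BLOCK_SIZE):
    """Cut data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class Target:
    """Hypothetical backup target that stores blocks by SHA-256 digest."""

    def __init__(self):
        self.blocks = {}

    def missing(self, digests):
        """Return the digests the target does not hold yet."""
        return {d for d in digests if d not in self.blocks}

    def upload(self, blocks_by_digest):
        """Store the blocks sent by the source."""
        self.blocks.update(blocks_by_digest)


def backup_from_source(data, target):
    """Send only blocks the target lacks; return the payload bytes sent."""
    by_digest = {hashlib.sha256(b).hexdigest(): b for b in split_blocks(data)}
    needed = target.missing(by_digest.keys())
    payload = {d: by_digest[d] for d in needed}
    target.upload(payload)
    return sum(len(block) for block in payload.values())


if __name__ == "__main__":
    target = Target()
    first = backup_from_source(b"A" * 8192 + b"B" * 4096, target)
    second = backup_from_source(b"A" * 8192 + b"C" * 4096, target)
    print("bytes sent:", first, second)
```

With target-side de-duplication both 12 KB files would cross the network in full before duplicates are dropped. Here only new blocks travel, plus the small digest exchange, which this sketch does not count.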

Considering Microsoft’s report on de-duplication assessment [4]:

  1. 20-30% data duplicates are easily visible even in structured data sources like ERP databases
  2. 40-80% data duplicates can be seen in file-servers and mail servers.
  3. 60-90% data duplicates can be seen between different PCs. (Just my observation and opinion)

On average, a conservative 30% duplicate removal can save about $1.6B on storage energy and $2B on bandwidth and backup costs.


De-duplication and Druvaa

We see Druvaa inSync as a product/platform that provides de-duplicated (at source) backup for PCs, PDAs and servers. The current version is available only for PCs, and we can easily see up to 90% savings in time and cost (bandwidth and storage) for enterprises.

I just don’t see a reason why all storage and backup vendors wouldn’t do it. EMC and NetApp have already announced de-duplication as an additionally licensable technology on their arrays (target based).[5] No major vendor except EMC has announced agent/source-based de-dup though.[6]

Surely, Druvaa has a good lead and is cashing in on it :)

September 20th, 2008