<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Druva Blog &#187; data dedeuplication</title>
	<atom:link href="http://blog.druva.com/tag/data-dedeuplication/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.druva.com</link>
	<description>Enterprise Data Backup and Beyond</description>
	<lastBuildDate>Wed, 21 Dec 2011 23:25:01 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
		<item>
		<title>Green-ness of Data De-duplication</title>
		<link>http://blog.druva.com/2008/09/20/de-duplication-greentech/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=de-duplication-greentech</link>
		<comments>http://blog.druva.com/2008/09/20/de-duplication-greentech/#comments</comments>
		<pubDate>Sat, 20 Sep 2008 18:49:57 +0000</pubDate>
		<dc:creator>Jaspreet</dc:creator>
				<category><![CDATA[Data Protection]]></category>
		<category><![CDATA[Technology & Innovation]]></category>
		<category><![CDATA[cleantech]]></category>
		<category><![CDATA[data dedeuplication]]></category>
		<category><![CDATA[greentech]]></category>
		<category><![CDATA[inSync]]></category>
		<category><![CDATA[storage growth]]></category>

		<guid isPermaLink="false">http://blog.druvaa.com/?p=46</guid>
		<description><![CDATA[The Storage Hunger: Sales of disk-based storage systems have already crossed 2,500 petabytes in 2008, up 58.1% YOY (one petabyte = 1 million GB). These figures do not include the direct attached storage which comes pre-loaded with PCs &#8230; <a href="http://blog.druva.com/2008/09/20/de-duplication-greentech/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<h3>The Storage Hunger</h3>
<p>Sales of disk-based storage systems have already crossed 2,500 petabytes in 2008, up 58.1% YOY (one petabyte = 1 million GB). These figures do not include the direct attached storage which comes pre-loaded with PCs or servers.[<a title="Storage  Growth" href="http://www.infomaticsonline.co.uk/vnunet/news/2141818/storage-hunger-reaches-457" target="_blank">1</a>]</p>
<p>This is understandable, as 1TB (1000GB) NAS/SAN storage devices are now a commodity. The top three vendors in this space are HP, IBM and EMC, with market shares of approximately 29%, 20% and 14% respectively.[<a title="Storage  Growth" href="http://www.infomaticsonline.co.uk/vnunet/news/2141818/storage-hunger-reaches-457" target="_blank">2</a>]</p>
<p>The overall consumption doubles when this storage is backed up <img src='http://blog.druva.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<h3>Energy Consumption</h3>
<p>On average, a datacenter consumes 100 watts per square foot, and the <span style="text-decoration: underline">best</span> solid state storage consumes about 5 watts for 1MB IOPs.[<a title="Storage energy consumption comparison" href="http://www.lsi.com/powerconsumption/ESG_Power_Efficiency_WP_082807.pdf" target="_blank">3</a>]</p>
<p>This puts the total cost of maintaining (cooling + power) a 1 TB disk array at about USD 2,500 annually (16c per kWh, and 20 GB average daily usage).</p>
<p>This makes the annual energy cost of newly bought storage about <strong>USD 5 Billion</strong>!</p>
<p>And backing up this 5 billion dollar inventory surely adds a couple of billion more.</p>
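<p>The arithmetic behind that estimate can be checked roughly from the figures quoted above. This is only a back-of-the-envelope sketch: it assumes every terabyte sold costs the same USD 2,500 a year to power and cool, and the function and constant names are made up for illustration. With these inputs the product comes out somewhat above the rounded USD 5 Billion, but in the same range.</p>

```python
# Figures taken from the post; names are illustrative, not from any tool.
PETABYTES_SOLD = 2500        # disk-based storage sold in 2008
TB_PER_PB = 1000             # the post uses 1 TB = 1000 GB and 1 PB = 1 million GB
COST_PER_TB_PER_YEAR = 2500  # USD per year, cooling + power for a 1 TB array


def annual_energy_cost_usd(petabytes, cost_per_tb=COST_PER_TB_PER_YEAR):
    """Yearly power and cooling bill if every terabyte costs the same to run."""
    return petabytes * TB_PER_PB * cost_per_tb


total = annual_energy_cost_usd(PETABYTES_SOLD)
print(f"USD {total / 1e9:.2f} billion per year")  # prints: USD 6.25 billion per year
```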
<h3>Data De-duplication</h3>
<p>Data de-duplication technology saves a single copy of duplicate data. There are two important aspects of any data de-duplication solution/product:</p>
<ol>
<li>Scope of duplicate discovery &#8211; File-level / Sub-File level / Block level</li>
<li>Point of duplicate discovery &#8211; Source / Target</li>
</ol>
<p>Most of the storage vendors which use data de-duplication provide block-level duplicate removal at the target (i.e. when the data reaches the storage). But it&#8217;s not very difficult to imagine that source-level removal of sub-file or block-level duplicates would be much better, for two reasons:</p>
<ol>
<li>Sending less (de-duplicated) data saves time and bandwidth (apart from storage)</li>
<li>Duplicate discovery would be much better as you have access to the structured data</li>
</ol>
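<p>To make the source-side, block-level idea concrete, here is a minimal sketch in Python. It is not how inSync or any vendor product works; it assumes fixed-size blocks and SHA-256 fingerprints, and the names <code>dedup_at_source</code>, <code>restore</code> and <code>BLOCK_SIZE</code> are hypothetical. The client hashes each block and sends only the blocks the target has not seen before, plus a small recipe of hashes to rebuild the file.</p>

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical fixed block size, chosen only for illustration


def split_blocks(data, block_size=BLOCK_SIZE):
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def dedup_at_source(data, known_hashes, block_size=BLOCK_SIZE):
    """Return (recipe, new_blocks) for one file.

    recipe     : ordered list of block hashes needed to rebuild the file
    new_blocks : dict of hash to block, holding only blocks the target lacks
    """
    recipe = []
    new_blocks = {}
    for block in split_blocks(data, block_size):
        digest = hashlib.sha256(block).hexdigest()
        recipe.append(digest)
        # Only blocks that are new to the target need to cross the network.
        if digest not in known_hashes and digest not in new_blocks:
            new_blocks[digest] = block
    return recipe, new_blocks


def restore(recipe, store):
    """Rebuild the original bytes from the target's block store."""
    return b"".join(store[digest] for digest in recipe)


# Usage: a 12 KB file made of two identical blocks and one different block.
store = {}
file_a = b"A" * 8192 + b"B" * 4096
recipe, new_blocks = dedup_at_source(file_a, store.keys())
store.update(new_blocks)
sent = sum(len(block) for block in new_blocks.values())
print(f"sent {sent} of {len(file_a)} bytes")  # prints: sent 8192 of 12288 bytes
```

<p>Because the hashing happens before anything is sent, the saving shows up in bandwidth as well as in storage, which is the first of the two reasons listed above.</p>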
<p>Considering Microsoft&#8217;s report on de-duplication assessment [<a title="Microsoft Report: SIS and its internal effects at Microsoft" href="https://www.microsoft.com/downloads/details.aspx?FamilyID=99F8EE58-4FAF-4951-BA84-7237B5C639B5&amp;displaylang=en" target="_blank">4</a>]:</p>
<ol>
<li>20-30% data duplicates are easily visible even in structured data sources like ERP databases</li>
<li>40-80% data duplicates can be seen in file-servers and mail servers.</li>
<li>60-90% data duplicates can be seen between different PCs. (Just my observation and opinion)</li>
</ol>
<p>On average, a conservative 30% removal of duplicate data can <strong>save $1.6B on storage energy and $2B on bandwidth costs and backups</strong>.</p>
<h3>De-duplication and Druvaa</h3>
<p>We see Druvaa inSync as a product/platform to provide de-duplicated (<span style="text-decoration: underline">at source</span>) backup for PCs, PDAs and servers. The current version is available only for PCs, and we can easily see up to 90% savings in <strong>time and cost</strong> (bandwidth and storage) for enterprises.</p>
<p>I just don&#8217;t see a reason why all storage and backup vendors wouldn&#8217;t do it. EMC and NetApp have already announced de-duplication as a separately licensable technology on their arrays (<span style="text-decoration: underline">target based</span>).[<a title="NetApp announcement" href="www.netapp.com/us/company/news/news_rel_20070515.html" target="_blank">5</a>] No major vendor except EMC has announced agent/source-based de-dup, though.[<a title="EMC Avamar" href="www.emc.com/products/detail/software/avamar.htm" target="_blank">6</a>]</p>
<p>Surely, Druvaa has a good lead and is cashing in on it <img src='http://blog.druva.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://blog.druva.com/2008/09/20/de-duplication-greentech/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

