<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Druva Blog &#187; database</title>
	<atom:link href="http://blog.druva.com/tag/database/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.druva.com</link>
	<description>Enterprise Data Backup and Beyond</description>
	<lastBuildDate>Wed, 21 Dec 2011 23:25:01 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
		<item>
		<title>File-systems Vs Databases</title>
		<link>http://blog.druva.com/2009/01/25/file-systems-vs-databases/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=file-systems-vs-databases</link>
		<comments>http://blog.druva.com/2009/01/25/file-systems-vs-databases/#comments</comments>
		<pubDate>Sun, 25 Jan 2009 20:05:46 +0000</pubDate>
		<dc:creator>Jaspreet</dc:creator>
				<category><![CDATA[Data Protection]]></category>
		<category><![CDATA[Technology & Innovation]]></category>
		<category><![CDATA[backup]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[File system]]></category>
		<category><![CDATA[restore]]></category>
		<category><![CDATA[technology]]></category>

		<guid isPermaLink="false">http://blog.druvaa.com/?p=124</guid>
		<description><![CDATA[This topic has been on my plate for some time now. It&#8217;s interesting to see how databases have come a long way and have clearly overshadowed file-systems for storing structured or unstructured information. Technically, both of them support the basic &#8230; <a href="http://blog.druva.com/2009/01/25/file-systems-vs-databases/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>This topic has been on my plate for some time now. It&#8217;s interesting to see how far databases have come: they have clearly overshadowed file-systems for storing both structured and unstructured information.</p>
<p>Technically, both support the basic features necessary for data access. For example, both:</p>
<ul>
<li>Manage data to ensure its integrity and quality</li>
<li>Allow shared access by a community of users</li>
<li>Use a well-defined schema for data access</li>
<li>Support a query language</li>
</ul>
<p>But file-systems seriously lack some of the critical features necessary for managing data. Let&#8217;s take a look at some of these features.</p>
<p><strong>Transaction support</strong><br />
Atomic transactions guarantee that an operation either succeeds completely or fails completely, never leaving the data half-updated. This is especially needed when there is concurrent access to the same data-set. It is one of the basic features provided by nearly all databases.</p>
<p>But most file-systems don&#8217;t have this feature. Only the lesser-known file-systems (<a title="Transactional NTFS (TxF)" href="http://en.wikipedia.org/wiki/Transactional_NTFS" target="_blank">Transactional NTFS (TxF)</a>, <a title="Sun ZFS" href="http://en.wikipedia.org/wiki/ZFS" target="_blank">Sun ZFS</a>, <a title="Veritas VxFS" href="http://en.wikipedia.org/wiki/Veritas_File_System" target="_blank">Veritas VxFS</a>) support it. Most of the popular open-source file-systems (including ext3, xfs, reiserfs) are not even POSIX compliant.</p>
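<p>As a rough illustration of what atomicity buys you, here is a minimal Python sketch using the standard-library <code>sqlite3</code> module. The <code>accounts</code> table, the account names and the <code>transfer</code> helper are made up for this example. The only point is that when the second update fails, the first one is rolled back along with it.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# CHECK constraint: the database itself rejects a negative balance.
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Credit dst, then debit src; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst))
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "alice", "bob", 30)       # both updates are committed
failed = transfer(conn, "alice", "bob", 500)  # debit violates the CHECK
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

<p>In the failed transfer the credit to <code>bob</code> had already been applied when the debit was rejected, yet it does not survive the rollback. Getting the same guarantee with two plain files would typically require the application to implement its own journaling.</p>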
<p><strong>Fast Indexing</strong><br />
Databases allow indexing based on any attribute or data-property (i.e. SQL columns). This speeds up retrieval of data based on the indexed attribute. Most file-systems do not offer this functionality; for example, you can&#8217;t quickly access &#8220;all files created after 2PM today&#8221;.</p>
<p>Desktop search tools like Google Desktop or Mac OS X Spotlight offer this functionality. But to do so, they have to scan and index the complete file-system and store the information in an internal relational database.</p>
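<p>To make the &#8220;files created after 2PM&#8221; example concrete, here is a small Python sketch of the kind of side index such a tool might keep. It is not how any particular desktop search product works; the <code>files</code> table, the index name and the sample paths are all hypothetical, and SQLite stands in for the internal database.</p>

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (path TEXT PRIMARY KEY, created_at TEXT)")
# Secondary index on the attribute we want to search by.
conn.execute("CREATE INDEX idx_files_created ON files (created_at)")
conn.executemany("INSERT INTO files VALUES (?, ?)", [
    ("/home/a/report.doc", "2009-01-25T09:15:00"),
    ("/home/a/notes.txt", "2009-01-25T14:30:00"),
    ("/home/b/photo.jpg", "2009-01-25T16:05:00"),
])

# ISO-8601 timestamps sort correctly as plain strings.
cutoff = datetime(2009, 1, 25, 14, 0).isoformat()
recent = [row[0] for row in conn.execute(
    "SELECT path FROM files WHERE created_at > ? ORDER BY created_at",
    (cutoff,))]

# EXPLAIN QUERY PLAN reports how SQLite intends to run the query; the last
# column of each row is a human-readable description of the step.
plan = [row[-1] for row in conn.execute(
    "EXPLAIN QUERY PLAN SELECT path FROM files WHERE created_at > ?",
    (cutoff,))]
print(recent)
print(plan)
```

<p>With an index in place the database can answer the range query without visiting every row, which is exactly what a plain directory walk cannot do.</p>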
<p><strong>Snapshots</strong><br />
A snapshot is a point-in-time copy/view of the data. Snapshots are needed by backup applications, which require consistent point-in-time copies of data.</p>
<p>Transactional and journaling capabilities enable most databases to offer snapshots without stopping access to the data. Most file-systems, however, don&#8217;t provide this feature (ZFS and VxFS being the only exceptions). Backup software has to depend on either the running application or the underlying storage for snapshots.</p>
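<p>The sketch below shows the point-in-time idea using SQLite&#8217;s online backup API, exposed in Python 3.7 and later as <code>Connection.backup</code>. Note the assumption: this makes a full copy rather than a space-efficient copy-on-write snapshot, so it only illustrates the semantics (a consistent view that later writes do not disturb). The <code>docs</code> table is invented for the example.</p>

```python
import sqlite3

live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
live.execute("INSERT INTO docs (body) VALUES ('version 1')")
live.commit()

# Take a point-in-time copy of the committed state while the source
# connection stays open and usable.
snapshot = sqlite3.connect(":memory:")
live.backup(snapshot)

# Later writes to the live database do not affect the copy.
live.execute("UPDATE docs SET body = 'version 2' WHERE id = 1")
live.commit()

live_body = live.execute("SELECT body FROM docs WHERE id = 1").fetchone()[0]
snap_body = snapshot.execute("SELECT body FROM docs WHERE id = 1").fetchone()[0]
print(live_body, "|", snap_body)
```

<p>A backup tool can then read from the copy at its own pace while the application keeps writing to the live database.</p>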
<p><strong>Clustering</strong><br />
Advanced databases like Oracle (and now MySQL) also offer clustering capabilities. The &#8220;<strong>g</strong>&#8221; in &#8220;Oracle 11<strong>g</strong>&#8221; actually stands for &#8220;<em>grid</em>&#8221;, i.e. clustering capability. MySQL offers shared-nothing clusters using synchronous replication. This helps databases scale and support larger, more fault-tolerant production environments.</p>
<p>File systems still don&#8217;t support this option <img src='http://blog.druva.com/wp-includes/images/smilies/icon_sad.gif' alt=':(' class='wp-smiley' />   The only exceptions are Veritas CFS and GFS (Open Source).</p>
<p><strong>Replication</strong><br />
Replication is a commodity feature of databases and forms the basis for disaster-recovery plans. File-systems still have to evolve to handle it.</p>
<p><strong>Relational View of Data</strong><br />
File systems store files and other objects only as a stream of bytes, and have little or no information about the data stored in the files. Such file systems also provide only a single way of organizing the files, namely via directories and file names. The associated attributes are also limited in number, e.g. type, size, author and creation time. This does not help in managing related data, as disparate items do not have any relationships defined.</p>
<p>Databases, on the other hand, offer easy means to relate stored data. They also offer a flexible query language (SQL) to retrieve the data. For example, it is possible to query a database for <em>&#8220;contacts of all persons who live in Acapulco and sent emails yesterday&#8221;</em>, but this is impossible with a file-system.</p>
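<p>Here is roughly what that Acapulco query could look like, sketched in Python with an in-memory SQLite database. The schema (<code>persons</code>, <code>emails</code>), the sample rows and the fixed value used for &#8220;yesterday&#8221; are all assumptions made for the sake of a reproducible example.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE persons (
        id INTEGER PRIMARY KEY, name TEXT, city TEXT, phone TEXT);
    CREATE TABLE emails (
        id INTEGER PRIMARY KEY,
        sender_id INTEGER REFERENCES persons(id),
        sent_on TEXT);
    INSERT INTO persons VALUES
        (1, 'Ana',  'Acapulco', '555-0101'),
        (2, 'Luis', 'Acapulco', '555-0102'),
        (3, 'Mira', 'Pune',     '555-0103');
    INSERT INTO emails VALUES
        (1, 1, '2009-01-24'),
        (2, 2, '2009-01-17'),
        (3, 3, '2009-01-24');
""")

yesterday = "2009-01-24"  # fixed date so the example is reproducible
contacts = conn.execute(
    """
    SELECT DISTINCT p.name, p.phone
    FROM persons AS p
    JOIN emails AS e ON e.sender_id = p.id
    WHERE p.city = ? AND e.sent_on = ?
    ORDER BY p.name
    """,
    ("Acapulco", yesterday),
).fetchall()
print(contacts)
```

<p>The join is what expresses the relationship between a person and the emails they sent. Directories and file names alone have no equivalent way to state it.</p>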
<p>File-systems need to evolve and provide capabilities to relate different data-sets. This will help the application writers to make use of native file-system capabilities to relate data. A good effort in this direction <em>was</em> <a title="Microsoft WinFS" href="http://en.wikipedia.org/wiki/WinFS" target="_blank">Microsoft WinFS</a>.</p>
<h3>Conclusion</h3>
<p>The only disadvantage of using databases as the primary storage option seems to be the additional cost. But I see no reason why file-systems will not borrow features from databases in the future.</p>
<h3>Disclosure</h3>
<p>Druvaa inSync uses a proprietary file-system to store and index the backed-up data. The meta-data for the file-system is stored in an embedded PostgreSQL database. The database-driven model was chosen to store additional identifiers with <em>each</em> block: size, hash and time. This helps the file-system to:</p>
<ol>
<li>Divide files into variable-sized blocks</li>
<li>Deduplicate data &#8211; store a single copy of duplicate blocks</li>
<li>Act as a temporal file-system &#8211; store time information with each block. This enables faster time-based restores.</li>
</ol>
</ol>
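<p>To give a feel for how these three ideas fit together, here is a toy Python sketch. It is emphatically not the inSync implementation: the rolling-sum chunker, the constants, and the <code>split_blocks</code> and <code>BlockStore</code> names are all invented for illustration, and plain dictionaries stand in for the metadata database. It only shows blocks cut at content-defined boundaries, stored once per hash, and looked up by time.</p>

```python
import hashlib
import random

WINDOW = 16     # bytes in the rolling window (illustrative value)
DIVISOR = 64    # controls the typical block size (illustrative value)
MIN_CHUNK = 32  # avoid degenerate tiny blocks


def split_blocks(data):
    """Cut where a rolling sum over the last WINDOW bytes hits a pattern."""
    blocks, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= WINDOW:
            rolling -= data[i - WINDOW]
        length = i - start + 1
        if length >= MIN_CHUNK and rolling % DIVISOR == DIVISOR - 1:
            blocks.append(data[start:i + 1])
            start = i + 1
    if start != len(data):
        blocks.append(data[start:])
    return blocks


class BlockStore:
    def __init__(self):
        self.blocks = {}    # hash -> block bytes, each stored once
        self.versions = []  # (timestamp, path, [(hash, size), ...])

    def backup(self, path, data, timestamp):
        recipe = []
        for block in split_blocks(data):
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)  # dedup by hash
            recipe.append((digest, len(block)))
        self.versions.append((timestamp, path, recipe))

    def restore(self, path, as_of):
        """Return the newest version of path taken at or before as_of."""
        candidates = [v for v in self.versions
                      if v[1] == path and as_of >= v[0]]
        if not candidates:
            return None
        _, _, recipe = max(candidates, key=lambda v: v[0])
        return b"".join(self.blocks[digest] for digest, _ in recipe)


rng = random.Random(0)
v1 = bytes(rng.getrandbits(8) for _ in range(4096))
v2 = v1[:1000] + b"EDIT" + v1[1000:]  # small insertion in the middle

store = BlockStore()
store.backup("report.doc", v1, timestamp=100)
store.backup("report.doc", v2, timestamp=200)

restored_old = store.restore("report.doc", as_of=150)
restored_new = store.restore("report.doc", as_of=250)
unique_bytes = sum(len(b) for b in store.blocks.values())
print(len(v1) + len(v2), unique_bytes)
```

<p>Because block boundaries depend on the content rather than on fixed offsets, an insertion in the middle of a file changes only the blocks around the edit, and the unchanged blocks are stored once and shared between the two versions.</p>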
]]></content:encoded>
			<wfw:commentRss>http://blog.druva.com/2009/01/25/file-systems-vs-databases/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
	</channel>
</rss>

