Posts filed under 'Technology & Innovation'

Say Hello to Blackbird !

With inSync v4.0 going live last week, Druva showcased the new Blackbird storage engine, which introduces a new concept called “Application Aware Data Deduplication”. This new engine, although currently available only in inSync, will form the core of all future product offerings.



The idea of “app-aware deduplication” emerged from the fact that complex applications like MS Outlook or Exchange need much more intelligent duplicate removal than a simple block-based approach.



Each data block in a PST file is of fixed size and usually contains a header and a footer (ref: libpst), which makes it impossible for simple dedupe approaches to identify block boundaries and hence restricts deduplication accuracy to just 30-40%.

Application aware data deduplication depends upon APIs exposed by the application to understand the structure of on-disk data and deduplicate at the logical-block or message level. This guarantees 100% deduplication accuracy and faster processing of data.
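
To make the difference concrete, here is a minimal Python sketch that compares fixed-block hashing with message-level hashing. The container layout (an `HDR` marker, a 4-byte length, the message body, an `FTR` marker) and all function names are assumptions made up for illustration. This is not the real PST format and not the Blackbird implementation; a real engine would obtain the messages through the application's own APIs.

```python
import hashlib


def pack(messages):
    """Toy mailbox container (hypothetical layout, NOT the real PST format):
    each message is wrapped as HDR + 4-byte length + body + FTR."""
    return b"".join(
        b"HDR" + len(m).to_bytes(4, "big") + m + b"FTR" for m in messages
    )


def fixed_block_hashes(blob, block_size=32):
    """Hash the raw byte stream in fixed-size blocks, ignoring its structure."""
    return {
        hashlib.sha256(blob[i:i + block_size]).hexdigest()
        for i in range(0, len(blob), block_size)
    }


def message_hashes(messages):
    """Hash each logical message on its own (the app-aware view)."""
    return {hashlib.sha256(m).hexdigest() for m in messages}


shared = [
    b"Quarterly numbers attached, please review before Friday.",
    b"Lunch at 1pm? The new place near the office looks good.",
    b"Reminder: backup window moves to 11pm starting next week.",
]
mailbox_a = shared
# Same three messages, but one extra message arrives first and shifts offsets.
mailbox_b = [b"Welcome to the team!"] + shared

common_blocks = fixed_block_hashes(pack(mailbox_a)) & fixed_block_hashes(pack(mailbox_b))
common_messages = message_hashes(mailbox_a) & message_hashes(mailbox_b)

print("shared fixed-size blocks:", len(common_blocks))
print("shared messages:", len(common_messages))
```

The message-level comparison finds all three shared messages regardless of where they sit in the container, while the fixed-block result depends on how the block grid happens to line up with the data.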

Another interesting change is the shift from a PostgreSQL database to a NoSQL embedded database from Oracle. This small (less than 1MB in size) embedded database removes the heavy “SQL” and networking layers between the server and the database, greatly improving performance and scalability. The new engine can now support 16TB of deduplicated data and about 200 parallel backups.

In a nutshell, the Blackbird engine will have the following features -

  1. App-Aware deduplication
  2. Light-weight and highly scalable
  3. Simple to install and zero-maintenance
  4. Near-CDP – timeline/event based near-continuous backups
  5. Search enabled restores
  6. Replication (to be showcased soon :)



InSync v4.0
InSync v4 is definitely a new benchmark for laptop backup. I am extremely confident that anyone who tries this solution will never buy anything else for laptop backup. With new storage, redesigned WAN Optimization and dashboard, it’s clearly leaps and bounds ahead of what’s available in the market.

More about new features – http://www.druva.com/insync/version-4-0
Download inSync v4 – http://www.druva.com/download/insync

1 comment September 6th, 2010

Introducing The Blackbird Storage Engine

Today the inSync team released the first internal alpha build of the all-new Blackbird engine, which is going to be featured in inSync v4.0.

Druva BlackBird



And what I saw totally changed my perception of “fast” and “small”. Forever.





If you have missed my earlier posts, Blackbird (name inspired by the SR-71 Blackbird) is a new storage engine for inSync and Phoenix which will introduce “Application Aware Data Deduplication” and get rid of any dependency on a SQL database.

Getting rid of the SQL database shrinks the Blackbird core to only about 2MB, against the current 32MB. It also avoids the overhead of TCP connections to the SQL DB and of the SQL query optimizer, making it blazingly fast and lightweight.

Application aware deduplication makes sure data deduplication happens at the logical level. For example, rather than treating a PST file like a stream of blocks, we would understand and deduplicate it message by message. This guarantees faster processing and 100% deduplication accuracy.

Quickly highlighting the features from today’s presentation -

  1. Fast - Amazing speed.
  2. Light-weight - Multi-threaded. No SQL Database.
  3. Scalable – High level of fault tolerance against data corruptions in the store.
  4. Small - InSync v4.0 will probably shrink to just 12MB !!
  5. Simple – No DB required for installation !

The public beta for InSync v4.0 is expected by the end of April. I am extremely eager to showcase this engineering marvel.

Will follow with more posts and some screenshots !

6 comments March 15th, 2010

Kingston 500GB USB Flash Drive !

First of all, sorry for not updating this blog regularly. Hopefully we will soon have some great news to break on the blog :)

Milind and I were killing some free time in a Chinese market in New Delhi, when we suddenly noticed Kingston USB flash drives with 256GB capacity costing about $30. We checked everything and it all looked real :)

Flash Drive

I quickly googled and saw the same product on the Kingston website. Well, I started to bargain and put my skills to the test. And soon the poor guy was ready to sell it for $20 :) And just before I was about to pay, as a last (shameless) bargaining tactic, I told the guy “Well, I am not too pleased, do you have something with bigger capacity?” and holy shit, the next moment he handed over a 500GB Kingston flash drive. And now this was something not even on the website !

Milind asked me, “Do you really want this ? I mean what will you use it for ?” and I said, well, it’s a good toy for 20 bucks. And finally both of us got ourselves 2 flash drives of 500GB each :)

As soon as I got on the flight, I started stress testing it … and surprisingly it still works !!

Shameless Geek Thought: And now I am thinking, if disks keep getting bigger and bigger like this, backing them up would be almost impossible. File-system and hardware vendors surely need to come up with something intelligent (possibly like the NTFS change journal) to avoid scanning the device for changes. Surprisingly, such interfaces are still not present in Linux (ext*) and Solaris (UFS).

4 comments February 14th, 2010

Hello World !

After a long wait and about 4 months of the beta program, I am extremely excited to announce the general availability of Druvaa Phoenix v1.0.

The entire team has been super busy making this happen. And I am sure that will be quite evident when you give it a try.

Druvaa Phoenix

Reinventing Backup

Phoenix is designed from the ground up for remote backups. Here are some of the key product features which make it ultra special -

  1. Global Source Based Data Deduplication – Over 90% reduction in backup time, bandwidth and storage.
  2. WAN Optimization – Understands high latency and noisy networks.
  3. Near Continuous Data Protection – snapshot/restore-points based point-in-time restores. No age-old full, incremental backups.
  4. Smart Bandwidth Scheduling - Set smart bandwidth limits for each backup schedule.


The Road Ahead

What we currently have is just a platform which will be used to showcase some market changing features -

  1. Search Based Restore – We missed this feature in v1.0, but it should be available in the next release, v1.2.
  2. “Blackbird SR-71” – A new storage engine with application aware data deduplication. This should be able to match an attachment inside an Exchange store in New Jersey to a file stored on a file server in Kent. This should set the standard for backup performance.
  3. Long Distance Replication – Replicate backed up data over noisy long distance IP networks.
  4. Advanced Dashboard – The second best reporting dashboard (after Google Analytics).
  5. Application Aware Agents – Phoenix currently comes only with a generic Windows agent; we plan to introduce these starting with v2.0.

I welcome you guys to download a copy and share your feedback !

2 comments December 15th, 2009

Why so much delay in inSync 3.1 and Phoenix ??

Well, first let me confess that inSync v3.1 took much more time than we planned. We had initially planned to release inSync by July 09 and the Phoenix public beta by Sep 09.

In Short -
We are working on a new storage engine codenamed Blackbird (after the legendary SR-71). The new engine will use application-specific deduplication technology to improve performance and bandwidth/storage savings.

Initially planned for inSync v3.1 and Phoenix v1.0, this will now be available in the next major releases.

The longer version -
For the past two years, we have been experimenting with various algorithms for global source based data deduplication. While releasing inSync v2.0 we settled on chunk based (variable-block based) data deduplication, for the simple reason that it was tough to find similar data blocks at natural block boundaries across different users. We also worked on the performance, which gradually improved over time.
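
As a rough illustration of why variable-size (content-defined) chunks survive shifted data better than fixed blocks, here is a small Python sketch. It is not the inSync algorithm: the boundary rule (a `zlib.adler32` checksum over the last 16 bytes, with a 6-bit mask), the parameter values and the function names are all assumptions chosen for brevity. The checksum is recomputed at every position rather than updated as a true rolling hash, which a production chunker would avoid.

```python
import random
import zlib


def chunk(data, window=16, mask=0x3F):
    """Split data wherever a checksum of the last `window` bytes has its low
    bits equal to zero, so boundaries depend on content, not on offsets."""
    chunks, start = [], 0
    for end in range(window, len(data) + 1):
        if zlib.adler32(data[end - window:end]) & mask == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes after the last boundary
    return chunks


def fixed_blocks(data, size=64):
    """Split data on a fixed grid of `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


rng = random.Random(42)
original = bytes(rng.randrange(256) for _ in range(4096))
shifted = b"X" + original  # one byte inserted at the front

variable_shared = set(chunk(original)) & set(chunk(shifted))
fixed_shared = set(fixed_blocks(original)) & set(fixed_blocks(shifted))

print("variable-size chunks shared:", len(variable_shared), "of", len(set(chunk(original))))
print("fixed-size blocks shared:", len(fixed_shared), "of", len(set(fixed_blocks(original))))
```

Because a boundary depends only on the preceding 16 bytes, every chunk after the first one reappears unchanged in the shifted copy, whereas the fixed 64-byte grid generally no longer lines up after the single inserted byte.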

While the approach was reasonably accurate, there was scope for significant improvement. We realized that 90% of the backup data on customer PCs comes from documents and PST files, hence something totally focussed on PST files could dramatically improve deduplication performance.

Also, while working on Phoenix, we came across a bigger challenge: finding duplicates across different data sources within the enterprise. We soon realized that a simple block based approach would not take us too far. We also realized that most vendors use fixed and variable block/chunk based hashing techniques. This works well for them, because they have been treating backups as “byte streams”, and the only way to remove duplicates from a byte stream is fixed or variable size data deduplication.

Looking at various data types and possible ways to improve, we could clearly see two fundamental changes in our approach which could bring a paradigm shift in data deduplication -

  1. For accuracy – Application aware data deduplication
  2. For performance – Hierarchical block based deduplication

Application aware deduplication can actually pinpoint duplicates across PST file attachments and normal office documents.

On the PC side, the majority of the data is office documents and email files. This makes it simpler to introduce the new approach, but a lot of work still needs to be done to productise it. For Phoenix, the problem is much bigger and will take some more time to solve.

The new engine should be ready soon. It will be shipped first in inSync v4.0 early next year and then in Phoenix v2.0. In the next few posts, I will try and get some benchmark data.

3 comments November 9th, 2009

Six things customers love about inSync & three things they don’t

As I mentioned in the last post, Druvaa acquired some great customers across 12 different countries. A large majority of the new customers are large enterprises from Technology and Finance verticals. The smallest deal size was about 20 licenses and largest about 10,000 licenses.

As a company policy we don’t engage in traditional enterprise selling; instead we just help interested customers buy, mainly through the website. With this model, it’s even more important for us to know what the customer likes and dislikes about the product.

Customer Survey

I tried to reach almost all major customers to see why they purchased inSync, and some others to see why they did not. In this post I have tried to summarize the responses I received.

Six things the customers really love about the product -

  1. Usability and ease of use – The quick 20-minute setup time for inSync v3 was especially appreciated.
  2. Data deduplication and time savings - On average, customers are seeing 1:15 storage and bandwidth savings compared to traditional software.
  3. WAN performance – Most of the customers are backing up their mobile workforce easily and enjoying this feature.
  4. Search based restore – Search is used by almost 100% of the users and absolutely loved by all. This is something truly unique to inSync.
  5. Granular user policy control – Control over bandwidth, folders and restore policies is appreciated by all.
  6. Invisible backups – Smaller, smarter backups and especially the flexible scheduling are liked by all.

One fast-emerging favorite feature is Bare Metal Restore, but I believe it may take some more time for this feature to get adopted properly.

Three things the customers or prospects would like to see improve -

  1. Mac Client - We do have Linux and Windows clients, and all I can say is that the Mac client is on its way :)
  2. Price - In the current economic scenario, unfortunately pricing is a major issue. This is probably one reason we saw a sharp rise in sales when we offered the online introductory discount on v3. We do plan to release a low-cost offering for inSync in the near future.
  3. Performance for large backup/restore - With the new release there are a few performance issues for non-compressible data, especially over Gigabit networks. This is expected to be taken care of in the next release.

With the new v3 release now production ready, in the next few posts I would like to outline the product roadmap and what’s happening with the new product – Druvaa Phoenix.

1 comment April 29th, 2009

Druvaa inSync v3 Beta – Usability Enhancements

Not sure if you have noticed, we have made significant changes to the website. It’s a complete redesign with primary focus on usability. I have requested George to write a post about it.
Take a look – http://www.druvaa.com/


We also released the new v3 beta. The new beta (just like the website) has a lot more colors, style, wizards and a whole bunch of usability enhancements. Take a look at the screenshots -





2 comments February 26th, 2009

Six Common Usability Mistakes in Software Product Design

By now, all good designers and developers realize the importance of usability for their work. Usable products offer great user experiences, and great user experiences lead to happy customers.

Six common mistakes and recommendations for product design and usability -

1. Usability Vs Utility
Utility refers to the ability of the product to perform tasks. The more tasks the product is designed to perform, the more utility it has. Usability refers to the ease of learning and performing these tasks.

utility vs usability

Most software gives higher priority to features than to usability. As a result it becomes more and more confusing for the end user to get work done.

2. Liking it Vs Using It
Likeability is always a desirable trait in a product. If people like the product, they are more likely to use it and to recommend it to others. But as with utility, likeability is often confused with usability.

Liking It - Skype

People often like a product for reasons unrelated to utility and usability. They may be attracted to its styling and flash, or to the status they believe the product confers upon them. People tend to like highly usable products, but you should not assume that means a well-liked product is usable.

3. Discovery Vs Flow
Some of the most widely used products do not have an instruction manual, e.g. a toothbrush or Skype.

Products with “Installation Manuals” turn me off. IMO, there is no place on earth for installation manuals. And admin guides should be needed only when you want to learn a little extra or troubleshoot. Instead the product should try and use inline or in-GUI help as much as possible.

Discovery involves looking for, and finding, a product’s feature in response to a particular need. And it gets worse when a complex feature needs multiple inputs or choices to be made.

I am a big fan of wizards. I think a task becomes much simpler when broken down into a series of actions.

4. Tiny Meaningless buttons
The buttons should signify action. The most common mistake with buttons is when they are labeled “OK” which in my opinion makes very little sense.

button with action

Buttons should have labels which signify clear actions like – “Modify Report Schedule”

5. Duplicate Actions
Quite often, products have more than one way of performing the same task. This is confusing and often irritating. There should always be one clear way of performing an action.

6. Don’t Give Too Many Choices
Never confuse flexibility with giving too many choices. You would never buy a car from a salesman who gives you 8-10 names for “the cars that might suit you”. Instead you are more likely to buy when the salesman gives you 1 (or max 2) options and convinces you.

Too many choices

If you are sure that more than 80% of your audience is likely to vote “yes” for the option, please make it a default. Or if you really want it, add it to “advanced”.

Disclosure
If you think inSync is a very user friendly product, you will be pleasantly surprised by the upcoming upgrade. Usability has been one of our core focus areas in the inSync v3 release.

6 comments February 22nd, 2009

The Dark Side of The Cloud

We all pay our monthly electricity bills. I am sure no one wants to own a power plant :) But, on the contrary, most of us own cars and very few rent one for daily use.

The two most important factors which decide how we want to use these two services are -

  1. The cost of ownership
  2. The cost and effort in maintenance

Cloud computing today promises benefits (similar to those of using electricity) for computing, hosted applications and storage. Although the offer is very attractive, there is a dark side to it as well.

This post just tries to highlight some aspects which you must keep in mind before taking the plunge.

The Dark Side of the Cloud

Application Integration

Most of the services like SimpleDB, EBS and SQS still need a lot of application integration and porting. And that’s something enterprises hate. It’s one of the primary reasons the x86 architecture and IPv4 are so widely used. Even if someone ports the application to these services, he is guaranteed to be locked in for the rest of his life :)

Services like salesforce.com don’t need any porting, but there have been cases of access to data being refused to customers who wish to change vendors.

Uptime and QoS Guarantees

Most of these services including Amazon and Salesforce do not give uptime and QoS guarantees. The billing and EULA are free from any such clauses.

And when there is downtime, you can’t do much more than start calling the support center to play the blame-game. And it’s funny when you see the cloud provider talking the same language to its own service provider :)

It’s Nowhere Even Close to Perfect

Take the recent unfortunate situation of Ylastic, a company that provides a single front-end to manage Amazon Web Services, which was recently an unwilling participant in one of these cloud bursts. Ylastic noticed something strange occurring with one of its Amazon Elastic Compute Cloud (EC2) Elastic Block Store (EBS) volumes.

Something wasn’t quite right. Over the course of a few hours the story played out via Twitter as Ylastic noticed issues with its EBS instances. When the problem was finally identified, Ylastic discovered that the data could not be recovered. They were forced to recover from an earlier snapshot that contained only a subset of the data.

Finally, after recovering what data they could, Ylastic had to go to its customers with the unfortunate message:

“AWS has finally terminated the frozen instances. But the EBS volume is still detaching and has been for hours. It doesn’t seem like we will be able to get into it at this point. Some time in the last month or so, our EBS snapshotting of this stuck volume seems to have stopped working correctly…. We have gone back and run through all the snapshots, and the last good snapshot that we have is from October 1.”

Who was at fault? Amazon? Ylastic? Truly, no one. It was simply a combination of issues. A perfect storm in the cloud, as it were. And that perfect storm resulted in data loss for Ylastic and its customer base.

Control

Take for example the case when you take up a cheap hosted website plan on a shared server. You can still negotiate uptime and QoS guarantees. But, what you just can’t control is a SPAM King sharing the same server and IP address with you :)

Most likely you will face two problems -

  1. A slow response on the website- the SPAM King has taken up the computing
  2. Public mail servers will mark the mail traffic from you as spam :)

Plus, there have been many stories about Salesforce (read this and this) and Twitter getting hacked.

ROI

The cost of ownership for a power plant is so damn high that you just can’t afford one, even if you are not happy with your power company. That is exactly how it has to be for the cloud. No one would think of hosting his own solution when the cloud offers the same for peanuts.

2 comments February 11th, 2009

File-systems Vs Databases

This topic has been on my plate for some time now. It’s interesting to see how databases have come a long way and have clearly overshadowed file-systems for storing structured or unstructured information.

Technically, both of them support the basic features necessary for data access. For example, both of them -

  • Manage data to ensure its integrity and quality
  • Allow shared access by a community of users
  • Use a well-defined schema for data access
  • Support a query language

But file-systems seriously lack some of the critical features necessary for managing data. Let’s take a look at some of these features.

Transaction support
Atomic transactions guarantee complete failure or success of an operation. This is especially needed when there is concurrent access to the same data-set. This is one of the basic features provided by all databases.

But most file-systems don’t have this feature. Only lesser-known file-systems such as Transactional NTFS (TxF), Sun ZFS and Veritas VxFS support it. Most of the popular open source file-systems (including ext3, xfs, reiserfs) are not even POSIX compliant.
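
As a small illustration of the all-or-nothing behaviour on the database side, here is a sketch using Python's built-in `sqlite3` module. The `accounts` table, its `CHECK` constraint and the `transfer` helper are hypothetical names made up for this example; the point is only that when the second statement fails, the first one is undone too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(src, dst, amount):
    # `with conn` commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))


transfer("alice", "bob", 30)       # succeeds: both updates are applied
try:
    transfer("alice", "bob", 500)  # second update violates the CHECK constraint
except sqlite3.IntegrityError:
    pass                           # the first update was rolled back as well

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)
```

Doing the same with two plain file writes would leave the data half-updated if the second write failed, unless the application built its own journaling on top.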

Fast Indexing
Databases allow indexing based on any attribute or data-property (i.e. SQL columns). This helps fast retrieval of data, based on the indexed attribute. This functionality is not offered by most file-systems i.e. you can’t quickly access “all files created after 2PM today”.

Desktop search tools like Google Desktop or Mac Spotlight offer this functionality. But for this, they have to scan and index the complete file-system and store the information in an internal relational database.
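
Here is a rough sketch of what such an attribute index looks like once the file metadata sits in a database, using Python's built-in `sqlite3`. The `files` table, the sample paths and the timestamps are invented for illustration; timestamps are stored as ISO-8601 strings so that plain string comparison matches chronological order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (path TEXT PRIMARY KEY, created_at TEXT)")
# The index lets the database answer range queries on created_at
# without scanning every row.
conn.execute("CREATE INDEX idx_files_created_at ON files (created_at)")
conn.executemany(
    "INSERT INTO files VALUES (?, ?)",
    [
        ("/home/jdoe/notes.txt", "2009-01-25T09:15:00"),
        ("/home/jdoe/report.doc", "2009-01-25T14:30:00"),
        ("/home/jdoe/inbox.pst", "2009-01-25T16:05:00"),
        ("/home/jdoe/old.xls", "2009-01-24T18:00:00"),
    ],
)

cutoff = "2009-01-25T14:00:00"  # "2PM today" in this made-up data set
recent = [
    path
    for (path,) in conn.execute(
        "SELECT path FROM files WHERE created_at > ? ORDER BY created_at", (cutoff,)
    )
]
print(recent)
```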

Snapshots
A snapshot is a point-in-time copy/view of the data. Snapshots are needed for backup applications, which need consistent point-in-time copies of data.

The transactional and journaling capabilities enable most databases to offer snapshots without stopping access to the data. Most file-systems, however, don’t provide this feature (ZFS and VxFS being the only exceptions). Backup software has to depend on either the running application or the underlying storage for snapshots.

Clustering
Advanced databases like Oracle (and now MySQL) also offer clustering capabilities. The “g” in “Oracle 11g” actually stands for “grid” or clustering capability. MySQL offers shared-nothing clusters using synchronous replication. This helps the databases scale up and support larger and more fault-tolerant production environments.

File systems still don’t support this option :(   The only exceptions are Veritas CFS and GFS (Open Source).

Replication
Replication is a commodity feature with databases and forms the basis for disaster-recovery plans. File-systems still have to evolve to handle it.

Relational View of Data
File systems store files and other objects only as a stream of bytes, and have little or no information about the data stored in the files. Such file systems also provide only a single way of organizing the files, namely via directories and file names. The associated attributes are also limited in number e.g. – type, size, author, creation time etc. This does not help in managing related data, as disparate items do not have any relationships defined.

Databases, on the other hand, offer easy means to relate stored data. They also offer a flexible query language (SQL) to retrieve the data. For example, it is possible to query a database for “contacts of all persons who live in Acapulco and sent emails yesterday”, but this is impossible in the case of a file system.
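
The Acapulco query can be sketched in a few lines with Python's built-in `sqlite3`. The `persons` and `emails` tables, their columns and the sample rows are all hypothetical, and "yesterday" is pinned to a fixed date so the example is repeatable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE persons (id INTEGER PRIMARY KEY, name TEXT, city TEXT, phone TEXT);
    CREATE TABLE emails  (id INTEGER PRIMARY KEY,
                          sender_id INTEGER REFERENCES persons(id),
                          sent_on TEXT);
    INSERT INTO persons VALUES (1, 'Ana',   'Acapulco', '555-0101');
    INSERT INTO persons VALUES (2, 'Luis',  'Acapulco', '555-0102');
    INSERT INTO persons VALUES (3, 'Maria', 'Pune',     '555-0103');
    INSERT INTO emails VALUES (1, 1, '2009-01-24');
    INSERT INTO emails VALUES (2, 1, '2009-01-24');
    INSERT INTO emails VALUES (3, 2, '2009-01-20');
    INSERT INTO emails VALUES (4, 3, '2009-01-24');
""")

yesterday = "2009-01-24"  # fixed date, for a repeatable example
contacts = conn.execute(
    """
    SELECT DISTINCT p.name, p.phone
    FROM persons AS p
    JOIN emails AS e ON e.sender_id = p.id
    WHERE p.city = ? AND e.sent_on = ?
    ORDER BY p.name
    """,
    ("Acapulco", yesterday),
).fetchall()
print(contacts)
```

Only Ana matches both conditions: Luis lives in Acapulco but sent nothing yesterday, and Maria sent an email yesterday but lives elsewhere.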

File-systems need to evolve and provide capabilities to relate different data-sets. This will help the application writers to make use of native file-system capabilities to relate data. A good effort in this direction was Microsoft WinFS.

Conclusion

The only disadvantage of using databases as the primary storage option seems to be the additional cost. But I see no reason why file-systems in future will not borrow features from databases.

Disclosure

Druvaa inSync uses a proprietary file-system to store and index the backed up data. The meta-data for the file-system is stored in an embedded PostgreSQL database. The database driven model was chosen to store additional identifiers with each block – size, hash and time. This helps the file-system to -

  1. Divide files into variable sized blocks
  2. Data deduplication – Store single copy of duplicate blocks
  3. Temporal File-system – Store time information with each block. This enables faster time-based restores.
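
To give a feel for how per-block size, hash and time metadata enable these three things, here is a toy sketch. It is not the inSync design: it uses Python's built-in `sqlite3` in place of embedded PostgreSQL, fixed 4-byte blocks instead of variable sized ones, integer timestamps, and a schema and function names (`open_store`, `backup`, `restore`) that are made up for illustration.

```python
import hashlib
import sqlite3

BLOCK_SIZE = 4  # tiny fixed blocks, purely to keep the example readable


def open_store():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE blocks (hash TEXT PRIMARY KEY, size INTEGER, data BLOB)")
    db.execute(
        "CREATE TABLE file_blocks (path TEXT, backup_time INTEGER, seq INTEGER, hash TEXT)"
    )
    return db


def backup(db, path, data, backup_time):
    """Record one version of a file; identical blocks are stored only once."""
    for seq, offset in enumerate(range(0, len(data), BLOCK_SIZE)):
        block = data[offset:offset + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        db.execute("INSERT OR IGNORE INTO blocks VALUES (?, ?, ?)", (digest, len(block), block))
        db.execute("INSERT INTO file_blocks VALUES (?, ?, ?, ?)", (path, backup_time, seq, digest))


def restore(db, path, as_of):
    """Rebuild the newest version of `path` backed up at or before `as_of`."""
    (version,) = db.execute(
        "SELECT MAX(backup_time) FROM file_blocks WHERE path = ? AND backup_time <= ?",
        (path, as_of),
    ).fetchone()
    if version is None:
        return None  # nothing was backed up by that time
    rows = db.execute(
        "SELECT b.data FROM file_blocks AS f JOIN blocks AS b ON b.hash = f.hash "
        "WHERE f.path = ? AND f.backup_time = ? ORDER BY f.seq",
        (path, version),
    )
    return b"".join(data for (data,) in rows)


db = open_store()
backup(db, "a.txt", b"AAAABBBBCCCC", backup_time=100)
backup(db, "a.txt", b"AAAABBBBDDDD", backup_time=200)
backup(db, "b.txt", b"BBBBCCCC", backup_time=200)

stored_blocks = db.execute("SELECT COUNT(*) FROM blocks").fetchone()[0]
referenced_blocks = db.execute("SELECT COUNT(*) FROM file_blocks").fetchone()[0]
print("blocks stored:", stored_blocks, "blocks referenced:", referenced_blocks)
print(restore(db, "a.txt", as_of=150))
```

Three backups reference eight blocks, but only four distinct blocks are stored, and a restore as of time 150 returns the first version of `a.txt` because each block reference carries its backup time.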

5 comments January 25th, 2009
