Why so much delay in inSync 3.1 and Phoenix?
By Jaspreet on November 9th, 2009 under Data Protection, Technology & Innovation
Well, first let me confess that inSync v3.1 took much more time than we planned.
We had initially planned to release inSync by July 2009 and the Phoenix public beta by September 2009.
In short:
We are working on a new storage engine, codenamed Blackbird (after the legendary SR-71). The new engine will use application-specific deduplication technology to improve performance and bandwidth/storage savings.
Initially planned for inSync v3.1 and Phoenix v1.0, it will now be available in the next major releases.
The longer version:
For the past two years, we have been experimenting with various algorithms for global, source-based data deduplication. While releasing inSync v2.0, we settled on chunk-based (variable-block) deduplication, for the simple reason that it was tough to find similar data blocks at natural block boundaries across different users. We also worked on performance, which gradually improved over time.
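To make the block-boundary problem concrete, here is a minimal sketch (not the inSync engine). It compares fixed-size blocks with content-defined, variable-size chunks when one byte is added at the front of a file. The function names, the rolling-sum boundary test and all the size parameters are assumptions chosen for illustration; a production chunker would use a stronger rolling hash.

```python
import hashlib
import random


def fixed_chunks(data: bytes, size: int = 64) -> list:
    """Split at fixed offsets: boundaries depend on position, not content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def variable_chunks(data: bytes, window: int = 16, mask: int = 0x3F,
                    min_size: int = 16, max_size: int = 256) -> list:
    """Split where a rolling sum of the last `window` bytes matches a bit pattern.

    The boundary test looks only at nearby content, so the same content tends
    to produce the same boundaries even when it sits at a different offset.
    """
    chunks, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= window:
            rolling -= data[i - window]  # slide the window forward
        length = i - start + 1
        if (length >= min_size and (rolling & mask) == mask) or length >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def digests(chunks) -> set:
    """Fingerprint each chunk; equal digests are treated as duplicates."""
    return {hashlib.sha256(c).hexdigest() for c in chunks}


def compare_chunking() -> tuple:
    """Count chunks shared between a file and a copy shifted by one byte."""
    rng = random.Random(0)  # fixed seed so the demo is repeatable
    original = bytes(rng.getrandbits(8) for _ in range(4096))
    shifted = b"X" + original  # e.g. another user's copy with one extra byte
    fixed_shared = len(digests(fixed_chunks(original)) & digests(fixed_chunks(shifted)))
    variable_shared = len(digests(variable_chunks(original)) & digests(variable_chunks(shifted)))
    return fixed_shared, variable_shared


if __name__ == "__main__":
    fixed_shared, variable_shared = compare_chunking()
    print("shared fixed-size blocks:", fixed_shared)
    print("shared variable-size chunks:", variable_shared)
```

With fixed offsets, every block of the shifted copy starts one byte late, so none of them line up with the original. The content-defined chunker cuts at the same content positions in both copies, so most of its chunks are found again.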
While the approach was reasonably accurate, there was scope for significant improvement. We realized that 90% of the backup data on customer PCs comes from documents and PST files, so something focused entirely on PST files could dramatically improve deduplication performance.
Also, while working on Phoenix, we came across a bigger challenge: finding duplicates across different data sources within the enterprise. We soon realized that a simple block-based approach would not take us very far. We also realized that most vendors use fixed or variable block/chunk-based hashing techniques. This works well for them because they treat backups as “byte streams”, and the only way to remove duplicates from a byte stream is fixed- or variable-size block deduplication.
Looking at the various data types and possible ways to improve, we could clearly see two fundamental changes in our approach that could bring a paradigm shift in data deduplication:
- For accuracy – Application aware data deduplication
- For performance – Hierarchical block based deduplication
Application-aware deduplication can actually pinpoint duplicates across PST file attachments and normal office documents.
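As a rough illustration of the idea (not the Blackbird engine), the sketch below uses a MIME email message from Python's standard `email` package as a stand-in for a PST file, since PST parsing needs third-party tooling. The helper names (`build_message`, `block_digests`, `dedup_units`), the file extensions and the 64-byte block size are all assumptions. The point is that an attachment is stored encoded inside the mail container, so a byte-stream view finds nothing in common with the standalone document, while a format-aware view that decodes the attachment finds an exact match.

```python
import hashlib
from email import message_from_bytes, policy
from email.message import EmailMessage


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def block_digests(raw: bytes, size: int = 64) -> set:
    """Byte-stream view: fingerprint fixed-size blocks of the raw file."""
    return {digest(raw[i:i + size]) for i in range(0, len(raw), size)}


def build_message(attachment: bytes) -> bytes:
    """Build a mail with one attachment (stand-in for an item inside a PST)."""
    msg = EmailMessage()
    msg["Subject"] = "Quarterly report"
    msg["From"] = "a@example.com"
    msg["To"] = "b@example.com"
    msg.set_content("Report attached.")
    # Binary attachments are base64-encoded in the serialized message.
    msg.add_attachment(attachment, maintype="application",
                       subtype="octet-stream", filename="report.doc")
    return msg.as_bytes()


def dedup_units(name: str, raw: bytes) -> set:
    """Application-aware view: fingerprint logical objects, not raw bytes."""
    if name.endswith(".eml"):
        # Hypothetical mail handler: decode each attachment and hash it whole.
        msg = message_from_bytes(raw, policy=policy.default)
        return {digest(part.get_payload(decode=True))
                for part in msg.iter_attachments()}
    if name.endswith((".doc", ".xls", ".ppt")):
        # Treat a whole office document as one logical object.
        return {digest(raw)}
    # Unknown format: fall back to plain block-based fingerprints.
    return block_digests(raw)


if __name__ == "__main__":
    document = bytes(range(256)) * 8  # stand-in bytes for an office document
    mail = build_message(document)
    print("shared raw blocks:", len(block_digests(document) & block_digests(mail)))
    print("shared logical objects:",
          len(dedup_units("report.doc", document) & dedup_units("inbox.eml", mail)))
```

Hashing the whole attachment is the simplest possible choice here; a real engine would presumably combine format-aware extraction with chunking inside each extracted object.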
On the PC side, the majority of the data is office documents and email files. This makes it simpler to introduce the new approach, but a lot of work still needs to be done to productise it. For Phoenix, the problem is much bigger and will take some more time to solve.
The new engine should be ready soon. It will ship first in inSync v4.0 early next year and then in Phoenix v2.0. In the next few posts, I will try to share some benchmark data.
3 Comments
1. Prat | November 11th, 2009 at 1:58 pm
Impressive!!
I am looking forward to Phoenix. When is it finally going to be released?
Is there any other product doing application-level dedupe?
2. Steve | November 12th, 2009 at 2:23 pm
This will make a lot of sense for long-distance backup and replication.
Who else is doing application-level dedupe?
What if the application is not supported by the custom dedupe algo? What if MS changes the format?
3. Jaspreet | November 12th, 2009 at 2:30 pm
@Prat,
Thanks.
The Phoenix public (final) beta is hopefully coming out early next week.
@Steve,
Yes, Phoenix should do a good job for low bandwidth backup (and eventually replication).
If we don’t have a custom algo for a format, Phoenix will fall back to the simple block-based approach.
Jaspreet