Motif #4.1

Every well-planned release has a motif, a term often used to describe a dominant theme in a literary, artistic, or musical work.

In all the early releases, the motif for Druva has been Simplicity: simplicity for both end-users and for IT administrators. For end users, Druva inSync is now so simple to use that it just works without their knowing. It’s completely non-intrusive and works on all kinds of networks such as WAN and VPN. It’s so simple to use that end users can access their data from any Web browser without having to contact IT. Likewise, for IT administrators, it’s so simple to download (~40MB), install (under 20 minutes), and manage (almost zero maintenance) that the total cost of ownership is almost negligible. The simplicity motif has been a strong differentiator for Druva’s offerings.

For inSync 4.0, we made Storage and Bandwidth Optimization the motif. To optimize storage, we introduced App-aware Dedupe, an industry-first dedupe technology that offers 90% storage savings across all user data and 100% dedupe accuracy at the source (laptops) for supported applications such as Outlook and Office. To optimize bandwidth, we introduced the Octopus WAN optimization engine, a multi-threaded client architecture that does smart bandwidth throttling to offer a 5x performance gain for every client backing up over a WAN.

The eye-catching red shack on the wharf (Rockport, Massachusetts) is often called Motif #1, a reference to its popularity among artists.

The theme for inSync 4.1 emerged naturally as “Scale”, since customers were increasingly deploying Druva to more users in each of their environments. With release 4.1, we wanted to make inSync scale efficiently along several dimensions, as outlined below –

Scale -

  • 2000 users per server
  • 16TB of data per server
  • 200 parallel connections per server

Performance -

  • We’re excited to introduce an innovative HyperCache technology, which can improve backup performance by 6x compared to inSync 4.0. HyperCache is an in-memory cache that can be configured to hold the most useful subset of your dedupe index in memory, resulting in a high hit rate. The usual 80-20 rule applies here: with just a 30% subset of the dedupe index, HyperCache can deliver upwards of a 75% hit rate. We recommend 4GB of HyperCache for every 1TB of data to maximize performance. The admin console offers a simple way for you to configure HyperCache for optimal performance.
  • You can now configure SSD storage for your dedupe index to further enhance your server performance. Lab results show a whopping 12x performance improvement with HyperCache and SSD configurations.
  • You can now install Druva on a 64-bit system for enhanced performance.
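To make the sizing rule above concrete, here is a small back-of-the-envelope sketch in Python. It only encodes the 4GB-per-1TB recommendation quoted in this post; the function name `recommended_hypercache_gb` is made up for illustration and is not part of inSync or its admin console.

```python
# Recommendation quoted in this post: 4GB of HyperCache per 1TB of data.
GB_PER_TB = 4


def recommended_hypercache_gb(data_tb: float) -> float:
    """Return the suggested HyperCache size (GB) for a data volume given in TB.

    Hypothetical helper: it just applies the 4GB-per-1TB rule of thumb.
    """
    if data_tb < 0:
        raise ValueError("data volume cannot be negative")
    return GB_PER_TB * data_tb


if __name__ == "__main__":
    for tb in (1, 4, 16):
        size = recommended_hypercache_gb(tb)
        print(f"{tb} TB of data -> {size:g} GB of HyperCache")
```

By this rule of thumb, a server holding the full 16TB listed under Scale would want roughly 64GB of HyperCache.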

Administration -

  • 4.1 now supports a new administrative role in addition to a Server Administrator. A Profile Administrator role grants permissions to manage one or more user profiles in order to edit profile settings, add users, and manage data restore for those profiles. This is a great way to scale the administration tasks across your organization between server and profile administration.
  • In light of the above role, we’ve enhanced our dashboard and reporting, so an administrator can get a customized view of their reports depending on their role.
  • You can now automate the import of users to inSync from your Active Directory. A periodic import from your AD can be set up to dynamically add users to inSync.

Access -

We’re excited about the upcoming deployments of inSync 4.1 and the performance benefits to all of you. In my next blog, I’ll talk about the 2 editions of inSync 4.1 (Enterprise and Professional), how they compare, and which one is right for you. Stay tuned….

 

Product Testing – Chicken Gun Theory

I was recently watching a Discovery Network program on the Airbus A380, and it was an eye opener. It was amazing how they tested the aircraft for impact. About 75% of air collisions happen because of mid-air bird impacts. The simplest solution in the past was to make the Aluminium sheet thicker (about 0.8mm). Because of the new A380’s weight, they decided to try a new approach and design a new, lighter material.

As a base, they took a thinner 0.6mm Aluminium sheet. But this time, rather than trying the usual impact analysis, they created a real-life scenario. They built a Chicken Gun, a gun capable of firing a 1kg skinned chicken at 260 mph towards an Aluminium target.

The results of this testing were completely unexpected. The chicken actually tore through the Aluminium sheet as if it were paper. The result was a new material that combined glass fiber and Aluminium.

Honestly, this has been an eye opener for me!

Testing inSync isn’t easy. Unlike server backup, the data is much more dispersed and diverse, coming from 1000 different sources over different networks. This time, for v4.0 testing, we have decided to follow the Chicken Gun Theory. We are in the process of building four new test suites for -

  1. Parallel Stress
  2. Fatigue
  3. Volume of Data
  4. Different Networks

To simulate parallel load, we have leased 200+ small servers from a grid provider for the next few months. Each of these servers will simulate multiple users to detect bottlenecks under parallel load. Hopefully this will also be useful for testing the scalability of the storage engine. For example, we recently discovered that enabling native compression in the embedded database puts some marginal load on abundantly available CPU power, but reduces critical disk I/O by 30%!

Sometimes we have seen issues surface only after prolonged usage, when the server has been working relentlessly for months. To test this, a small set of these servers will be kept in a constant “synchronizing” state for weeks. This fatigue testing should be able to exercise various corner cases such as memory leaks, open file handles, fragmentation, etc.

And to test the effect of different networks, we will be using different proxies that induce latency and dropped connections. This should help us understand “remote” backup better. This approach has already helped us understand a recent customer case, where backup was suffering because of VPN time-outs.

I am sure this will help us make your laptop backup much more robust … without sacrificing any chickens (except for the launch parties) :)

Six Common Usability Mistakes in Software Product Design

By now, all good designers and developers realize the importance of usability for their work. Usable products offer great user experiences, and great user experiences lead to happy customers.

Six common mistakes and recommendations for product design and usability -

1. Usability Vs Utility
Utility refers to the ability of the product to perform tasks. The more tasks the product is designed to perform, the more utility it has. Usability refers to the ease of learning and performing these tasks.

Most software gives higher priority to features than to usability. As a result, it becomes more and more confusing for the end user to get work done.

2. Liking it Vs Using It
Likeability is always a desirable trait in a product. If people like the product, they are more likely to use it and to recommend it to others. But as with utility, likeability is often confused with usability.

People often like a product for reasons unrelated to utility and usability. They may be attracted to its styling and flash, or to the status they believe the product confers upon them. People tend to like highly usable products, but you should not assume that means a well-liked product is usable.

3. Discovery Vs Flow
Some of the most widely used products do not have an instruction manual, e.g. a toothbrush or Skype.

Products with “Installation Manuals” turn me off. IMO, there is no place on earth for installation manuals. Admin guides should be needed only when you want to learn a little extra or troubleshoot. Instead, the product should use inline or in-GUI help as much as possible.

Discovery involves looking for, and finding, a product’s feature in response to a particular need. And it gets worse when a complex feature needs multiple inputs or choices to be made.

I am a big fan of wizards. I think a task becomes much simpler when it is broken down into a series of actions.

4. Tiny Meaningless buttons
The buttons should signify action. The most common mistake with buttons is when they are labeled “OK” which in my opinion makes very little sense.

Buttons should have labels that signify clear actions, such as “Modify Report Schedule”.

5. Duplicate Actions
Quite often, products have more than one way of performing the same task. This is confusing and often irritating. There should always be one clear way of performing an action.

6. Don’t Give Too Many Choices
Never confuse flexibility with giving too many choices. You would never buy a car from a salesman who gives you 8-10 names for “the cars that might suit you”. Instead, you are more likely to buy when the salesman gives you 1 (or max 2) options and convinces you.

If you are sure that more than 80% of your audience is likely to vote “yes” for the option, please make it a default. Or if you really want it, add it to “advanced”.

Disclosure
If you think inSync is a very user-friendly product, you will be pleasantly surprised by the upcoming upgrade. Usability has been one of our core focus areas in the inSync v3 release.

Performance Optimization

One of the major goals for the inSync 2.1 release (due this week) is improved performance. With this new release, users should see speed improvements of almost 30%, especially while syncing smaller files.

While working on inSync 2.1, team Druvaa rediscovered some tips and tricks for performance improvement -

Code Profilers
They can give you very quick insights into bottlenecks. It’s better to start from profiler output than from a hypothesis. Start working out a hypothesis only after the profiler points out a bad function. We used gprof2dot, which plots a nice graph from prof or gprof output. An example is shown below -
The graph shows the top-down hierarchy of functions, the percentage of time each function consumes, the number of calls, etc. The percentage of time consumed by a function puts the performance optimization exercise in the right perspective. You don’t want to optimize a function if it contributes just 1% to the whole processing time. The general idea is to concentrate on functions that consume substantial time and are not supposed to. Once a few functions like this are optimized, you can go for another round of profiling.
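If you want to try the “profile first, hypothesize later” workflow yourself, here is a minimal sketch using Python’s built-in cProfile and pstats modules. This is not the setup we used for inSync (that was prof/gprof with gprof2dot); the functions `slow_checksum`, `fast_noop` and `sync_files` are made-up stand-ins so the profiler has something to measure.

```python
import cProfile
import io
import pstats


def slow_checksum(data: bytes) -> int:
    """Deliberately slow byte-by-byte checksum (the hotspot in this toy example)."""
    total = 0
    for byte in data:
        total = (total + byte) % 65521
    return total


def fast_noop(data: bytes) -> int:
    """Cheap function that should barely register in the profile."""
    return len(data)


def sync_files(files):
    """Hypothetical workload: process every file with both functions."""
    return [(slow_checksum(f), fast_noop(f)) for f in files]


def profile_top_functions(func, *args, limit=5) -> str:
    """Run func under cProfile and return a report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(limit)
    return out.getvalue()


if __name__ == "__main__":
    files = [bytes(range(256)) * 200 for _ in range(20)]
    print(profile_top_functions(sync_files, files))
```

The report lists the functions with the highest cumulative time first, which is the same information the graph gives you: spend your effort on the entries at the top and ignore the ones that contribute next to nothing.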

Network Utilization
It’s not sufficient to just reduce the network bandwidth usage. It’s equally important to completely utilize your share of the network bandwidth.
Especially for non-interactive applications, throughput matters much more than latency. In a system that uses a single-threaded client to issue RPC calls, the throughput is governed by the latency. If one RPC call takes a long time, the throughput is low even though there is no bottleneck per se. Looking at it in a different way, the network is not being utilized while the server is processing the call. A multi-threaded client improves network utilization and also throughput. Sometimes the cause of poor network performance could be outside your code. For example, the default TCP window size performs poorly on high-latency, high-bandwidth networks. Increasing the TCP window size improves performance for such networks, and so does the use of multiple TCP connections.
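Here is a rough illustration of the single-threaded versus multi-threaded client point, sketched in Python. The “RPC” is simulated with a sleep, and the 50 ms latency, the `fake_rpc` function and the worker count are all assumptions picked for the demo, not inSync’s actual protocol or numbers.

```python
import time
from concurrent.futures import ThreadPoolExecutor

RPC_LATENCY_SECONDS = 0.05  # assumed round-trip time, for illustration only


def fake_rpc(chunk_id: int) -> int:
    """Stand-in for a remote call: the client simply waits out the round trip."""
    time.sleep(RPC_LATENCY_SECONDS)
    return chunk_id * 2


def run_sequential(chunk_ids):
    """Single-threaded client: each call waits for the previous one to finish."""
    return [fake_rpc(c) for c in chunk_ids]


def run_threaded(chunk_ids, workers=8):
    """Multi-threaded client: several calls are in flight at the same time."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fake_rpc, chunk_ids))


def timed(func, *args):
    """Return (result, elapsed_seconds) for a single call to func."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    ids = list(range(16))
    _, t_seq = timed(run_sequential, ids)
    _, t_par = timed(run_threaded, ids)
    print(f"sequential: {t_seq:.2f}s, threaded: {t_par:.2f}s")
```

Both clients do exactly the same work and return the same results. The threaded one finishes sooner only because it stops waiting idle for each round trip.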

Caching

Caching frequently used data reduces the number of database queries and disk reads. Database queries and disk reads may not consume many CPU cycles, but they add to the latency in a big way.

Multi-threading can work around latency, but it comes with its own overheads in terms of code complexity and resource consumption. Simple caching avoids frequent trips to the database or disk. Databases and operating systems maintain their own caches, but an application-level cache also avoids the overhead of connecting to the database or issuing a system call.
Beware of stale caches and serialization issues.
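A minimal sketch of simple application-level caching, using Python’s `functools.lru_cache`. The backend function `read_block_from_disk` and the `stats` counter are hypothetical. They stand in for whatever slow database query or disk read you want to avoid repeating.

```python
from functools import lru_cache

# Counts how often we actually hit the (pretend) slow backend.
stats = {"backend_reads": 0}


def read_block_from_disk(block_id: int) -> bytes:
    """Hypothetical slow path: imagine a disk read or database query here."""
    stats["backend_reads"] += 1
    return f"block-{block_id}".encode()


@lru_cache(maxsize=1024)
def read_block(block_id: int) -> bytes:
    """Cached front end: repeated requests for a block skip the backend."""
    return read_block_from_disk(block_id)


if __name__ == "__main__":
    for _ in range(1000):
        read_block(7)
    print("backend reads after 1000 requests:", stats["backend_reads"])

    # If the underlying data changes, the cached copy is stale until cleared.
    read_block.cache_clear()
    read_block(7)
    print("backend reads after invalidation:", stats["backend_reads"])
```

The `cache_clear()` call is the blunt way to deal with staleness. A real system would usually invalidate just the affected entries whenever the underlying data is written.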

Delayed Writes
Synchronous writes are slow. Some writes, for example activity logs, can be delayed indefinitely. Other writes that need persistence guarantees can be synced in batches rather than individually.

This holds true for both databases and file systems. It’s cheaper to do multiple inserts in one sqlite transaction than to create one transaction for each insert. On the file system side, you are better off writing a few MBytes to a file followed by an fsync than doing multiple writes of a few KBytes with an fsync for each write.
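The sqlite side of this is easy to try with Python’s built-in `sqlite3` module. The sketch below is only an illustration: the `activity_log` table and the helper names are invented for the example and are not inSync’s schema. It compares one commit per insert against a single transaction for the whole batch.

```python
import os
import sqlite3
import tempfile
import time


def open_log_db(path):
    """Open (or create) a small on-disk database with a hypothetical log table."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS activity_log (message TEXT)")
    conn.commit()
    return conn


def insert_one_commit_each(conn, rows):
    """Slow path: every insert is its own transaction, so each one is synced."""
    for row in rows:
        conn.execute("INSERT INTO activity_log (message) VALUES (?)", row)
        conn.commit()


def insert_single_transaction(conn, rows):
    """Fast path: all inserts share one transaction and one commit."""
    with conn:  # commits once on success, rolls back on an exception
        conn.executemany("INSERT INTO activity_log (message) VALUES (?)", rows)


if __name__ == "__main__":
    rows = [(f"event {i}",) for i in range(500)]
    with tempfile.TemporaryDirectory() as tmp:
        conn = open_log_db(os.path.join(tmp, "log.db"))
        for label, func in (("commit per insert", insert_one_commit_each),
                            ("single transaction", insert_single_transaction)):
            start = time.perf_counter()
            func(conn, rows)
            print(f"{label}: {time.perf_counter() - start:.3f}s")
        conn.close()
```

How big the gap is depends heavily on your disk and sqlite settings, so treat the timings as something to measure on your own hardware rather than a fixed ratio.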

Batch requests

A batch of 10 queries sent to a database works faster than 10 queries issued one after the other. Encoding the 10 queries as a pl/sql function works even better. This is primarily due to the socket communication overheads, specifically the latency involved in it.

For inSync 2.1, we found that the lowest-hanging fruit was in the database and file system interactions. We sure plucked all of it :)