Motif #4.1

Every well-planned release has a motif, a term often used to describe a dominant theme in a literary, artistic, or musical work.

In all the early releases, the motif for Druva has been Simplicity: simplicity for both end users and IT administrators. For end users, Druva inSync is now so simple to use that it just works without their even noticing. It’s completely non-intrusive and works on all kinds of networks, such as WAN and VPN. End users can also access their data from any Web browser without having to contact IT. Likewise, for IT administrators, it’s so simple to download (~40MB), install (under 20 minutes), and manage (almost zero maintenance) that the total cost of ownership is almost negligible. The simplicity motif has been a strong differentiator for Druva’s offerings.

For inSync 4.0, we made Storage and Bandwidth Optimization the motif. To optimize storage, we introduced App-aware Dedupe, an industry-first dedupe technology that offers 90% storage savings across all user data and 100% dedupe accuracy at the source (laptops) for supported applications such as Outlook and Office. To optimize bandwidth, we introduced the Octopus WAN optimization engine, a multi-threaded client architecture that does smart bandwidth throttling to offer a 5x performance gain for every client backing up over a WAN.

The eye-catching red shack on the wharf (Rockport, Massachusetts) is often called Motif #1, a reference to its popularity among artists.

The theme for inSync 4.1 emerged naturally as “Scale,” since customers were increasingly deploying Druva to more users in each of their environments. With release 4.1, we wanted inSync to scale efficiently along several dimensions, as outlined below:

Scale -

  • 2000 users per server
  • 16TB of data per server
  • 200 parallel connections per server

Performance -

  • We’re excited to introduce an innovative HyperCache technology, which can improve backup performance by 6x compared to inSync 4.0. HyperCache is an in-memory cache that can be configured to hold the optimal subset of your dedupe index, resulting in a high hit rate. The usual 80-20 rule applies here: with just a 30% subset of the dedupe index, HyperCache can deliver a hit rate upwards of 75%. We recommend 4GB of HyperCache for every 1TB of data to maximize performance. The admin console offers a simple way for you to configure HyperCache for optimal performance.
  • You can now configure SSD storage for your dedupe index to further enhance your server performance. Lab results show a whopping 12x performance improvement with HyperCache and SSD configurations.
  • You can now install Druva on a 64-bit system for enhanced performance.
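
As a rough worked example of the HyperCache sizing guideline above (4GB of cache per 1TB of data), here is a small Python sketch. The helper name and the assumption that the guideline scales linearly are ours, for illustration only; this is not how the admin console computes anything.

```python
GB_OF_CACHE_PER_TB = 4  # guideline quoted above: 4GB of HyperCache per 1TB of data


def recommended_hypercache_gb(data_tb):
    """Return the suggested HyperCache size in GB for `data_tb` terabytes of data.

    Hypothetical helper: assumes the guideline scales linearly with data size.
    """
    if data_tb < 0:
        raise ValueError("data size cannot be negative")
    return data_tb * GB_OF_CACHE_PER_TB


# A server at the 16TB-per-server limit listed under Scale:
print(recommended_hypercache_gb(16))  # 64
```

So a server holding the full 16TB would want roughly 64GB of HyperCache under this guideline.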

Administration -

  • inSync 4.1 now supports a new administrative role in addition to Server Administrator. The Profile Administrator role grants permissions to manage one or more user profiles: edit profile settings, add users, and manage data restore for those profiles. This is a great way to scale administration across your organization by splitting it between server and profile administrators.
  • In light of the above role, we’ve enhanced our dashboard and reporting, so an administrator can get a customized view of their reports depending on their role.
  • You can now automate the import of users to inSync from your Active Directory. A periodic import from your AD can be set up to dynamically add users to inSync.

Access -

We’re excited about the upcoming deployments of inSync 4.1 and the performance benefits to all of you. In my next blog, I’ll talk about the two editions of inSync 4.1 (Enterprise and Professional), how they compare, and which one is right for you. Stay tuned…

 

Performance Optimization

One of the major goals for the inSync 2.1 release (due this week) is improved performance. With this new release, users should see speed improvements of almost 30%, especially while syncing smaller files.

While working on inSync 2.1, team Druvaa rediscovered some tips and tricks for performance improvement -

Code Profilers
They can give you very quick insights into bottlenecks. It’s better to start from profiler output than from a hypothesis. Start working out a hypothesis only after the profiler points out a bad function. We used gprof2dot, which plots a nice graph from prof or gprof output. An example is shown below -
The graph shows a top-down hierarchy of functions, the percentage of time each function consumes, the number of calls, and so on. The percentage of time consumed by a function puts the performance optimization exercise in the right perspective. You don’t want to optimize a function if it contributes just 1% to the whole processing time. The general idea is to concentrate on functions that consume substantial time and are not supposed to. Once a few functions like this are optimized, you can go for another round of profiling.
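
To make the "profile first, hypothesize later" workflow concrete, here is a minimal sketch using Python's built-in cProfile and pstats modules. The functions being profiled (sync_files, checksum_blocks, build_blocks) are made-up stand-ins, not inSync code; the point is only that the report ranks functions by time, so the hot spot shows up without any guessing.

```python
import cProfile
import io
import pstats


def checksum_blocks(blocks):
    """Deliberately slow byte-by-byte loop: the hot spot we expect to see."""
    total = 0
    for block in blocks:
        for byte in block:
            total = (total + byte) % 65521
    return total


def build_blocks(count, size):
    """Cheap by comparison: allocate `count` zero-filled blocks."""
    return [bytes(size) for _ in range(count)]


def sync_files():
    """Hypothetical top-level operation that we want to speed up."""
    blocks = build_blocks(200, 4096)
    return checksum_blocks(blocks)


def profile_report(func, top=5):
    """Run `func` under the profiler and return the top entries as text."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(top)
    return out.getvalue()


print(profile_report(sync_files))
```

In the printed report, checksum_blocks accounts for nearly all of the cumulative time of sync_files, so that is where a first round of optimization should go. If you save the stats to a file instead (for example with `python -m cProfile -o out.pstats script.py`), gprof2dot can read that format through its `-f pstats` option.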

Network Utilization
It’s not sufficient to just reduce the network bandwidth usage. It’s equally important to completely utilize your share of the network bandwidth.
Especially for non-interactive applications, throughput matters much more than latency. In a system that uses a single-threaded client to issue RPC calls, the throughput is governed by the latency. If one RPC call takes a long time, the throughput is low even though there is no bottleneck, per se. Looking at it a different way, the network is not being utilized while the server is processing the call. A multi-threaded client improves network utilization and also throughput. Sometimes the cause of poor network performance could be outside your code. For example, the default TCP window size performs poorly on high-latency, high-bandwidth networks. Increasing the TCP window size improves performance on such networks, and so does the use of multiple TCP connections.
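
The effect of latency on a single-threaded client is easy to reproduce with a toy example. In the sketch below, fake_rpc is a stand-in that simply sleeps for an assumed 50 ms to mimic a round trip plus server processing; it is not a real RPC library. The same eight calls are issued one at a time and then through a thread pool.

```python
import time
from concurrent.futures import ThreadPoolExecutor

SIMULATED_LATENCY = 0.05  # seconds per call; an assumed figure for illustration


def fake_rpc(block_id):
    """Stand-in for a remote call: the client just waits for the 'server'."""
    time.sleep(SIMULATED_LATENCY)
    return block_id * 2


def run_single_threaded(block_ids):
    """Issue calls one after another; total time is roughly N x latency."""
    return [fake_rpc(b) for b in block_ids]


def run_multi_threaded(block_ids, workers=8):
    """Keep several calls in flight at once so waiting time overlaps."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fake_rpc, block_ids))


def timed(func, *args):
    """Return (result, elapsed_seconds) for a single invocation of func."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


ids = list(range(8))
_, single_secs = timed(run_single_threaded, ids)
_, multi_secs = timed(run_multi_threaded, ids)
print(f"single-threaded: {single_secs:.2f}s, multi-threaded: {multi_secs:.2f}s")
```

Neither version does any more work per call; the multi-threaded client finishes sooner only because the waits overlap. The worker count of 8 is arbitrary here and would need tuning against a real server.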

Caching

Caching frequently used data reduces the database queries or disk reads. Database queries and disk reads may not consume the CPU cycles but they add to the latency in a big way.

Multi-threading can work around latency, but it comes with its own overheads in terms of code complexity and resource consumption. Simple caching avoids frequent trips to the database or disk. Databases and operating systems maintain their own caches, but the overhead of connecting to a database or issuing a system call is still best avoided.
Beware of stale caches and serialization issues.
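
Here is a minimal sketch of such a cache, with the two warnings above built in: writes invalidate the cached entry so readers never see stale data, and a lock serializes access from multiple threads. The class name and the dict standing in for the "slow" backend are our own assumptions for illustration.

```python
import threading


class ReadThroughCache:
    """Tiny read-through cache in front of a slow, dict-like backend."""

    def __init__(self, backend):
        self._backend = backend          # stand-in for a database or disk store
        self._cache = {}
        self._lock = threading.Lock()    # serializes access across threads
        self.backend_reads = 0           # counts trips to the backend

    def get(self, key):
        with self._lock:
            if key not in self._cache:
                self.backend_reads += 1
                self._cache[key] = self._backend[key]
            return self._cache[key]

    def put(self, key, value):
        with self._lock:
            self._backend[key] = value
            # Drop the cached copy so the next get() re-reads the fresh value.
            self._cache.pop(key, None)


store = ReadThroughCache({"profile:1": "default"})
for _ in range(3):
    store.get("profile:1")
print(store.backend_reads)  # 1
```

Three reads cost a single trip to the backend. This only stays correct if every write goes through put(); anything that changes the backend behind the cache's back brings the stale-cache problem right back.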

Delayed Writes
Synchronous writes are slow. Some writes, for example activity logs, can be delayed indefinitely. Other writes that need persistence guarantees can be synced in batches rather than individually.

This holds true for both databases and file systems. It’s cheaper to do multiple inserts in one sqlite transaction than to create one transaction for each insert. On the file system side, you are better off writing a few MBytes to a file, followed by an fsync, than doing multiple writes of a few KBytes with an fsync for each write.
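
The sqlite case can be sketched with Python's built-in sqlite3 module. The activity_log table and its columns are invented for this example. Both functions insert the same rows; the only difference is how many transactions (and therefore how many commits and syncs) are involved.

```python
import sqlite3

INSERT_SQL = "INSERT INTO activity_log (ts, message) VALUES (?, ?)"


def open_db(path=":memory:"):
    """Open a database and create the (hypothetical) activity_log table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS activity_log (ts INTEGER, message TEXT)"
    )
    conn.commit()
    return conn


def insert_one_transaction_each(conn, rows):
    """Slow pattern: every insert is committed on its own."""
    for row in rows:
        conn.execute(INSERT_SQL, row)
        conn.commit()


def insert_single_transaction(conn, rows):
    """Faster pattern: all inserts share one transaction and one commit."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany(INSERT_SQL, rows)


def row_count(conn):
    return conn.execute("SELECT COUNT(*) FROM activity_log").fetchone()[0]


conn = open_db()
insert_single_transaction(conn, [(i, f"event {i}") for i in range(1000)])
print(row_count(conn))  # 1000
```

With an in-memory database the two patterns perform about the same because there is nothing to sync. Point open_db at a file on disk and the per-insert commits should become noticeably slower, since each commit has to reach the disk.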

Batch Requests

A batch of 10 queries sent to a database runs faster than 10 queries issued one after the other. Encoding the 10 queries as a PL/SQL function works even better. This is primarily due to socket communication overhead, specifically the round-trip latency involved.
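
A toy model shows where the saving comes from. SimulatedConnection below is a hypothetical client, not a real database driver: it just counts how many round trips are needed, on the assumption that each send() costs one trip regardless of how many queries it carries.

```python
class SimulatedConnection:
    """Hypothetical client in which every send() costs one network round trip."""

    def __init__(self):
        self.round_trips = 0
        self.executed = []

    def send(self, queries):
        """Ship a list of queries in one request; return one result per query."""
        self.round_trips += 1
        self.executed.extend(queries)
        return ["ok"] * len(queries)


def run_one_by_one(conn, queries):
    """Each query waits for its own round trip."""
    results = []
    for query in queries:
        results.extend(conn.send([query]))
    return results


def run_as_batch(conn, queries):
    """All queries travel together and share a single round trip."""
    return conn.send(queries)


queries = [f"SELECT size FROM files WHERE id = {i}" for i in range(10)]

slow, fast = SimulatedConnection(), SimulatedConnection()
run_one_by_one(slow, queries)
run_as_batch(fast, queries)
print(slow.round_trips, fast.round_trips)  # 10 1
```

The server executes the same 10 queries either way; the batched client simply pays the round-trip latency once instead of 10 times.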

For inSync 2.1, we found that the lowest-hanging fruit was in the database and file system interactions. We sure plucked all of it :)