Predictive Restore (Under the hood)

This article is part of our "Tiger Bridge - Under the hood" series, where we uncover technical details about our product, dive deep into the solutions it offers, and explain which decisions we had to make while designing it.

Dilemma

In the previous topics, we discussed the process of working with parts of whole files. Here, we will look at files as belonging to a bigger entity – a macro object.

If multiple files together make up a single entity (e.g. a project), it might make sense to work with all of them at once. We didn’t know whether our clients needed this or what it would take to implement. The ultimate question was: how can we optimize the time cost of restoring such data, and can we pay it before we actually need the data?

Background

Tiger Bridge works on the file level. In certain cases, this might be inconvenient. When you want to back up, restore, replicate, or perform another operation on every single file from a common entity like a project, it would be better to apply a policy to the entire folder that stores them. The problem is that this folder might be huge, and restoring it could be time-consuming.

Restoration time cost is the time it takes to recover data when it is needed. It can be paid:

  1. Upfront – you retrieve the data for your whole project in advance and work with it when you need it; by then it is available locally
  2. On demand – you get each file when you click to open it
  3. Predictive – you get a file when you click on it; multiple other files get retrieved as well because you may need them soon after
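The three options above can be compared in a small Python sketch. It is only an illustration of which files a single user action would retrieve under each approach; `RestoreMode` and `files_to_restore` are hypothetical names and do not correspond to any Tiger Bridge API.

```python
from enum import Enum


class RestoreMode(Enum):
    UPFRONT = "upfront"
    ON_DEMAND = "on_demand"
    PREDICTIVE = "predictive"


def files_to_restore(mode, project_files, requested=None):
    """Return the files one user action would retrieve under each mode.

    project_files: every file name in the macro object (e.g. a project folder).
    requested: the file the user opened; not used for an upfront restore.
    """
    if mode is RestoreMode.UPFRONT:
        # The user explicitly asks for the whole project ahead of time.
        return list(project_files)
    if requested not in project_files:
        raise ValueError("requested file is not part of the project")
    if mode is RestoreMode.ON_DEMAND:
        # Only the file that was opened is retrieved.
        return [requested]
    # Predictive: the opened file comes first, followed by its siblings.
    return [requested] + [f for f in project_files if f != requested]
```

For a project with the files `a.mov`, `b.wav`, and `c.xml`, opening `b.wav` retrieves only that file on demand, but all three files (starting with `b.wav`) in predictive mode.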

Pros/Cons Analysis

The upfront method is the easiest to manage because all files are treated together. However, it only works if you can afford to wait for every file to be downloaded locally, so it suits work that is planned in advance. If data retrieval is especially time-consuming, as is the case when it is done from an archive, upfront would be the best option. Still, it is a manual operation. The system has no way of knowing that you are about to work on a project, so you have to request it yourself.

Restoring data on demand is the cheapest of the three options, because it only gets the files that you need, exactly when you need them. However, it can introduce a delay every time you try to open a file. This could be a showstopper for time-critical operations on macro objects.

The predictive restore mechanism treats all files that are considered part of an entity as a group and retrieves them in parallel. When you click to open one of them within a project, the rest of the files (or parts of them, if they are big) get downloaded as well, because the system expects you to access them next. There is a risk that not all of that data will be needed. At the same time, if you are going to be working with one project, chances are you will need everything in it. This works best for certain consistent workflows.
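The idea of predictive restore can be sketched in Python as follows. This is a toy model under several assumptions, not how Tiger Bridge is implemented: the remote storage is a plain dictionary, the class and method names (`PredictiveRestorer`, `open`, `wait`, `close`) are hypothetical, and whole files are always restored, whereas the real mechanism may fetch only parts of big files.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class PredictiveRestorer:
    """Toy model: opening one file triggers a parallel restore of its siblings."""

    def __init__(self, remote, max_workers=4):
        self.remote = remote    # dict of name -> bytes, stands in for cloud storage
        self.local = {}         # files that have already been restored locally
        self._lock = threading.Lock()
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._pending = []

    def _restore(self, name):
        data = self.remote[name]  # a real restore would be a network transfer
        with self._lock:
            self.local[name] = data

    def open(self, name, project_files):
        """Return the contents of `name` and prefetch the rest of the project."""
        if name not in self.local:
            # The requested file is restored first, so the user waits only for it.
            self._restore(name)
        for sibling in project_files:
            if sibling != name and sibling not in self.local:
                # Siblings are restored in the background, in parallel.
                self._pending.append(self._pool.submit(self._restore, sibling))
        return self.local[name]

    def wait(self):
        """Block until all background restores have finished."""
        for future in self._pending:
            future.result()
        self._pending.clear()

    def close(self):
        self._pool.shutdown(wait=True)
```

In this sketch, the call to `open` returns as soon as the requested file is local, while the remaining files arrive in the background. If the user never touches them, that transfer was wasted, which is the risk described above.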

Decision

Since our main goal is to serve the mission-critical and time-critical applications that our customers run on-premises, the predictive restore mechanism was a really good fit for a lot of cases. It makes better use of the hybrid cloud and comes closest to the functionality of the operating system.

The other solutions wait for you to ask for a file before they get it. Tiger Bridge is built to be proactive in that area, which is helpful for workflows that operate at the macro object level. Since we also support the other two mechanisms, we didn’t have to make an either/or kind of decision. It was more a matter of how to optimize things to work even better.