This article is part of our "Tiger Bridge - Under the hood" series, where we uncover technical details about our product, dive deep into the solutions it offers, and explain which decisions we had to make while designing it.
Definition/Background
Versioning here refers to the creation of restoration points, specifically multiple restoration points for the same file or folder. It expands on the concept of CDP (Continuous Data Protection) that we explained earlier. With traditional backup solutions, there is just a single point in time we can restore to; with CDP, it can be any point.
The practice of keeping restoration points aims to achieve two main goals:
- Protection – if you lose one or more files or make an undesired change, you must be able to restore a previous version
- Evolution – the ability to see the progress and everything that has happened to the file or folder, not only so you can restore a known good version but also so you can pick and choose among versions
Businesses quite often require multiple restoration points to be kept. This is especially true for highly dynamic datasets in which data changes frequently. There might have been an unintended change at some point in the past that led to a bad result, and it only became obvious later. Source control in software development is a good example: a modification might turn out to be unwanted after an entire year, so the source control system in a software company needs virtually unlimited depth. This is exactly where versioning comes into play.
The need for versioning existed long before CDP became practical. People achieved versioning capabilities by taking a lot of snapshots, which let them restore their data to specific dates or hours. The disadvantage of that approach is that you have to work with the entire dataset: by restoring to a previous date and time, you restore everything in it, including things which should not necessarily be restored.
Problem
The question here is how to build a solution that helps users do what they want, even though what they want is not clearly defined. There are versioning mechanisms available in the cloud, but how do we make use of them on-premises? Versions are supported by the most popular cloud providers, but on their own they are of limited use, because they only let you find a single file and restore it to the version you need. The cloud can also let you restore to a certain point in time, but that is the equivalent of traditional backups.
Existing Solutions
On-premises solutions take up expensive local space because they create versions of files locally. The cloud can help, as it stores versions elsewhere, but it has a problem keeping the integrity of a whole dataset.
People have tried to address this with Git and similar systems, which let you keep versions of individual files while also retaining the dataset's integrity. These systems are specifically tailored to producing deltas: you can use them for research, compare files for changes, and drive business decisions based on that.
However, even though Git is a great example of versioning, it cannot help you identify changes made in Photoshop, like adding a new layer, for example. To get insights like this, you really need to understand the purpose of your work.
File systems do not provide versioning capabilities on their own, because one of their main purposes is to store data without wasting space. Solutions which perform versioning at the file system level are optional modules or applications, like Time Machine and Volume Shadow Copy.
When doing versioning at the file level, we see the evolution of a file with each modification, each listed as a separate version marked with a timestamp. However, maintaining a coherent version of an object is not straightforward. It presents corner cases that need to be considered, such as:
- Are the versions a property of the file?
If you copy file A and create file A2, do you expect A2 to have the same versions as A? Normally, this would be the case, but in reality, you get a new object with no previous versions.
- When you delete, do you simply remove the file or delete it together with its versions? (meaning: Is the delete operation a version?)
It gets even more interesting here. The delete operation typically marks the end of an object's history. However, restoring a folder to a previous state is essentially the same as undeleting the file. Normally, each version of a file is a delta of the previous one; with undelete, this is not the case. If you make a change after the undelete, what do you expect to happen? We consider the delete operation a change. You would probably want to create a version which knows it is a delta of an earlier version, not the latest one, but that requires not just the flat list of versions the system gives you, but a tree with different branches and inheritance.
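The branch that undelete introduces can be illustrated with a small tree structure. This is a hypothetical Python sketch, not Tiger Bridge's internal model: `Version`, `add_child`, and the labels are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional, List

@dataclass
class Version:
    """One node in a version tree; `parent` is the version this delta is based on."""
    label: str
    deleted: bool = False
    parent: Optional["Version"] = None
    children: List["Version"] = field(default_factory=list)

    def add_child(self, label: str, deleted: bool = False) -> "Version":
        child = Version(label, deleted, parent=self)
        self.children.append(child)
        return child

# Normal history: v1 -> v2 -> delete marker
v1 = Version("v1")
v2 = v1.add_child("v2")
tombstone = v2.add_child("deleted", deleted=True)

# Undelete restores v2; an edit made afterwards is a delta of v2, not of the tombstone
restored = v2.add_child("v3-after-undelete")

assert restored.parent is v2   # the new delta branches off v2...
assert len(v2.children) == 2   # ...so v2 now has two children: a fork in the history
```

A flat, timestamp-ordered list cannot represent that fork; only a tree records which version each delta is actually based on.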
Versioning can also be done at the folder level – like a snapshot of a folder at a given time. It includes versioning of all files within that folder, but since we are talking about a snapshot of the whole folder, pre-create and post-delete scenarios should also be considered.
- How to treat the rename operation?
What happens when you rename a file – does it become a new file, and what happens to its versions? If a file named A has two versions and you change its name to B, what should you expect: file B with the two versions coming from A, or a deleted file A with its versions and a new file B which is yet to have versions created?
You may expect the new file to be without versions; in reality, you get a file with a new name and all of the old versions attached to it.
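The asymmetry between rename and copy can be modeled with a tiny version store keyed by a file's internal identifier rather than its name. This is a simplified sketch under that assumption; `VersionStore` and its methods are hypothetical, not an actual API:

```python
import itertools

class VersionStore:
    """Toy store: versions follow a file's identity (an internal id), not its name."""
    _ids = itertools.count()

    def __init__(self):
        self.files = {}      # name -> file id
        self.versions = {}   # file id -> list of version labels

    def create(self, name):
        fid = next(self._ids)
        self.files[name] = fid
        self.versions[fid] = []

    def save(self, name, label):
        self.versions[self.files[name]].append(label)

    def rename(self, old, new):
        # same identity, so the full history travels with the file
        self.files[new] = self.files.pop(old)

    def copy(self, src, dst):
        # a copy is a brand-new object: no history is carried over
        self.create(dst)

    def history(self, name):
        return self.versions[self.files[name]]

store = VersionStore()
store.create("A")
store.save("A", "v1")
store.save("A", "v2")

store.copy("A", "A2")
store.rename("A", "B")

print(store.history("B"))   # -> ['v1', 'v2']: rename keeps the versions
print(store.history("A2"))  # -> []: the copy starts with none
```

This matches the behavior described above: the renamed file keeps all of its old versions attached, while the copied file starts from scratch.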
- What to do when the rename operation is also a location change?
If a user moves a file from one folder to another, in their view it is the same file. From the file system’s perspective, it is a different one because of its new identifier. To make things more intriguing, we can introduce the cloud: it works with objects, and the name is just a property that can be changed; it has no effect on the object’s identity. And what if you wanted to revert the rename operation?
This is linked to the rename corner case we discussed: a rename operation within a folder is usually associated by the user with preserving the same file; a rename together with a file move operation is typically associated with a new file.
What happens if you have a folder with file A, delete the file, and then create a new file called A, this time with new contents? Would you expect to have two versions of file A?
There is no right or wrong answer to any of these questions. The system cannot make a decision as it might not be what the user wants or expects.
Let’s see how Microsoft Word handles its save operation. This practical example illustrates why the concept described above is so important.
When saving your changes, you expect to keep the same file you started with, with a new version added for the latest changes you made. In reality, since Microsoft Word uses temporary files, after the save operation you end up with a brand-new doc file with no versions; all of its old versions remain useless, together with the temporary file Word created and used as part of its save operation.
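Word's save sequence can be simulated with plain file operations: the application writes the new content to a temporary file and then swaps it in place of the original. Any versioning keyed to the original file's identity is orphaned by the swap. A simplified sketch (the real sequence has more steps, and the file names here are invented):

```python
import os
import tempfile

# Identity here = inode number, which identity-based versioning would key on
with tempfile.TemporaryDirectory() as d:
    doc = os.path.join(d, "report.docx")
    with open(doc, "w") as f:
        f.write("draft 1")
    original_id = os.stat(doc).st_ino

    # "Save": write the new content to a temp file, then replace the original
    tmp = os.path.join(d, "~WRL0001.tmp")
    with open(tmp, "w") as f:
        f.write("draft 2")
    os.replace(tmp, doc)  # atomic swap: the temp file takes over the doc's name

    new_id = os.stat(doc).st_ino
    print(original_id != new_id)  # True: same name, brand-new identity
```

The file the user sees is unchanged by name, but the object underneath is new, which is exactly why naive identity-based version chains break on every save.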
Cons
The existing on-prem solutions are very limited: users seek storage efficiency, and file systems do not have a built-in solution for that. Furthermore, even Time Machine and Volume Shadow Copy work like snapshots and backups. They do not give you the CDP advantage of hand-picking a few files and restoring them to the point in time you need. The challenge is to create a good time machine solution.
Clouds provide you with tools for versioning with great depth. They allow you to turn versioning on, but they do not know what you are saving in the objects. This looks like an attempt at a generic solution.
We treat cloud versions more like tools rather than solutions. The cloud gives you a basic level of versioning which can be further enhanced with other software solutions. The main problem of working with cloud versions alone is that they only allow you to use versions of a single file.
Our Solution
We’ve created a better time machine which takes advantage of cloud features and brings them on-premises, producing a true hybrid solution. Our users get the experience they would expect from a time machine, but with the help and additional benefits of the cloud.
Tiger Bridge is not trying to solve the user’s dilemma for them. Our mission is to make the cloud’s versions workable and understandable. The user still has to decide how to approach each case, but the options we give them are better.
With the help of Tiger Bridge, you will be able to take advantage of enhanced versions with on-premises workflows, together with soft delete capability (the delete operation being another version of the file).
It allows you not only to revert files but also to identify, and potentially delete, all obsolete or newer files. You can create policies for that.
By comparing versions, you can identify exactly when a problem like a virus attack or accidental deletion happened. You can revert your entire dataset to an earlier point in time, but then change the versions of just a handful of files you recently worked on so you do not lose progress and important changes.
This hybrid solution gives you the on-premises way of dealing with corner cases within the cloud, like the file rename or file copy operations.
The restore version operation is no longer attached to a single file, as it is in the cloud. You can now perform it at the folder level.