Stub Files (Under the hood)

This article is part of our "Tiger Bridge - Under the hood" series, where we uncover technical details about our product, dive deep into the solutions it offers, and explain which decisions we had to make while designing it.

Dilemma

A stub file is a replacement file, or an object which represents another object. Stub files are used when files cannot reside in a specific location, usually because they are too big for the local storage. Users still want to see their files in their folder structure but without them consuming space.

The dilemma here is not whether to use stub files, as they are obviously needed. If we simply moved the objects to the cloud or any other location because they are too big and left nothing behind, end users would have a hard time recovering them on demand. This is why we absolutely must have stub files. The question is how representative they should be, where in the data flow they should be implemented, and how.

Background

To illustrate the stub file concept a little better, we will use a library analogy. If you have a big library and occasionally lend books to other people, you may need to build a mechanism for tracking books which are currently missing.

There are two options here:

    1. Keep a list which works like an Excel spreadsheet. Inside, you can record the book you have lent as well as the name and contact information of the person you gave it to.
    2. Put a marker directly in the place in the library where the book is supposed to reside.

If you know which book you are looking for, both approaches will work. Going with the first option might be easier in the beginning, but the bigger the list gets, the harder it will be to manage. On top of that, you sacrifice your library’s browsability. If you are searching for a new book to read or lend and you don’t know its exact title, you may want to go to the library, navigate to the section of the genre you need, and then look for the books of a specific author.

Once you get there, there may be a few books to choose from, for example fantasy titles written by a particular author. The first option lets you see only the books that are currently in the library. In terms of our analogy, this option corresponds to an external mechanism, such as a database, which keeps track of the missing books or files.

The second option makes all books visible, so you will be able to make your choice. The marker put in the missing book’s place will still show you there is another book from the same genre and author, albeit currently not there. If you decide this is the book you need, you may call your friend or family member and ask them to bring it back to you. In terms of our analogy, the second option corresponds to using stub files.

Going back to the dilemma, we still don’t know how representative the marker should be. In our books example, should we just put the title on the marker, or maybe the author’s name as well? Do we need the book’s cover for visual identification or just the summary on the last page? We can even go further: when lending a book to someone, we can make a copy of the first chapter and leave it in the placeholder for that book. This way, if we need the book, we can start reading from the first chapter as we usually do. By the time we finish it, chances are the book will have been returned to us, and our workflow will not suffer from the fact that the book was temporarily missing when we needed it.

Going back to the file system environment, there is no universal and complete stub file which can be used exactly like the original file. It would be the equivalent of the file itself, not a stub file. This is why we need a reasonable compromise which takes into account how the files will be used.
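To make the compromise a bit more tangible, here is a minimal sketch of the "first chapter" idea from the analogy: a placeholder that keeps the beginning of the file locally and contacts the target only for the rest. The `Stub` class, its fields, and the `fetch` callback are hypothetical names invented for this illustration, and the remote side is simulated in memory. It is not a description of how Tiger Bridge stores its stub files.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Stub:
    """Hypothetical placeholder: metadata plus the first bytes of the real file."""
    name: str
    size: int                            # size of the full object on the target
    head: bytes                          # locally kept "first chapter"
    fetch: Callable[[int, int], bytes]   # fetch(offset, length) from the target
    remote_reads: int = 0                # how often the target had to be contacted

    def read(self, offset: int, length: int) -> bytes:
        end = min(offset + length, self.size)
        if offset >= end:
            return b""
        # Serve whatever part of the range is already available locally.
        local_end = min(end, len(self.head))
        data = self.head[offset:local_end] if offset < local_end else b""
        if end > len(self.head):
            # The rest of the range is not local, so ask the target for it.
            start = max(offset, len(self.head))
            data += self.fetch(start, end - start)
            self.remote_reads += 1
        return data


# The "remote" object is simulated with an in-memory byte string.
original = b"Chapter 1. " + b"x" * 50 + b" The End."
stub = Stub(
    name="book.txt",
    size=len(original),
    head=original[:11],
    fetch=lambda off, n: original[off:off + n],
)
```

Reading the first 11 bytes is answered from the stub alone, while any read past that point triggers a fetch. How much to keep locally is exactly the kind of decision that depends on how the files are used.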

Normal data flow starts from the user of the application, goes through the operating system to the file system, and eventually reaches some kind of storage.

Historically speaking, in the file system world, the first invention related to our topic was the hard link: an additional directory entry which points to the same file data within the same file system. It was used to represent a file existing in multiple locations, a feature that is easy to implement but limited in functionality.
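As a small illustration of the hard link behavior described above, the following sketch uses Python's standard `os.link` in a temporary directory. The file names are made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    original = os.path.join(root, "report.txt")
    alias = os.path.join(root, "alias.txt")

    with open(original, "w") as f:
        f.write("quarterly numbers")

    # Create a second name for the same file within the same file system.
    os.link(original, alias)

    same_file = os.path.samefile(original, alias)
    link_count = os.stat(original).st_nlink

    # Removing one name leaves the data reachable through the other.
    os.remove(original)
    with open(alias) as f:
        surviving_content = f.read()
```

Both names refer to the same data, which is why a hard link cannot point outside its own file system.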

At some point, symbolic links (also known as soft links) were invented. They were supposed to remain transparent for the application. In other words, the application requested a file or a function on that file and the operating system handled the redirection. The application worked as if the files were local. This approach is suitable for similar file systems which share the same function set. If the sym link points to another, similar file system, everything should work properly.
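The transparency of sym links can be seen in a short sketch with Python's standard `os.symlink`. The paths are invented for the example, both sides live in the same temporary directory, and on Windows creating sym links may require extra privileges.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "archive", "video.mov")
    link = os.path.join(root, "video.mov")

    os.makedirs(os.path.dirname(target))
    with open(target, "w") as f:
        f.write("frame data")

    # The link only stores a path; the operating system follows it on access.
    os.symlink(target, link)

    is_link = os.path.islink(link)
    with open(link) as f:              # the application simply opens the link path
        content_via_link = f.read()

    resolved = os.path.realpath(link)
    expected_target = os.path.realpath(target)
```

The application code never mentions the real location of the file, which is the whole point of the redirection.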

This is far from ideal when the two sides differ, for example an on-premises file system where the application does its work and cloud storage which holds the files that application needs. The two offer different function sets. If the application requests a function native to the file system it resides on, but this function is not available in the system which hosts the actual file, an error message or an application crash is likely to occur. That would disrupt the normal workflow of the application user. When using sym links, you can only present the functions available in both the local file system and the target storage environment.
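The "functions available on both sides" constraint is essentially a set intersection. The sketch below models it with two made-up function sets; the operation names are assumptions chosen for illustration, and real file systems expose far more operations.

```python
# Hypothetical function sets, chosen only to illustrate the mismatch.
LOCAL_FS_FUNCTIONS = {"read", "write", "delete", "acl", "byte_range_lock"}
CLOUD_TARGET_FUNCTIONS = {"read", "write", "delete"}


def functions_via_sym_link(local: set, target: set) -> set:
    """Only operations supported on both sides survive the redirection."""
    return local & target


def request(operation: str) -> str:
    """Simulate an application calling an operation through a sym link."""
    usable = functions_via_sym_link(LOCAL_FS_FUNCTIONS, CLOUD_TARGET_FUNCTIONS)
    if operation not in usable:
        # In a real application this is where an error or crash would surface.
        raise NotImplementedError(f"target does not support '{operation}'")
    return f"{operation}: ok"


available = functions_via_sym_link(LOCAL_FS_FUNCTIONS, CLOUD_TARGET_FUNCTIONS)
missing = LOCAL_FS_FUNCTIONS - CLOUD_TARGET_FUNCTIONS
```

Everything in `missing` is a function the local application may expect but cannot get through a plain sym link.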

If this limitation is not acceptable in the client’s environment, stub files can help. A stub file is a specialized kind of link which can point to an object in another file system, such as cloud storage. Stub files are the hardest to integrate but offer the most functionality. Microsoft designed a complete software development kit (SDK) just for implementing stub files. With it, all specific functions which the target does not have but the local application may need are custom-implemented. This way, we provide local applications with the ability to work with seemingly fully functional local files.

With the stub file implementation, whenever we want the target data, we access the target object. However, if we need specific file functions such as access control lists for security, we open the stub file, where these functions are implemented at the stub file level. On NTFS, sym links are implemented as reparse points; stub files typically build on the same mechanism and are additionally stored as sparse files, which is why they take up almost no local space.
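The split between "data comes from the target" and "file functions are answered by the stub" can be sketched as follows. The `StubFile` class and its methods are hypothetical, the cloud target is a plain dictionary, and the access control list is reduced to a set of user names. This is a simplified model of the idea, not the Tiger Bridge implementation or the Microsoft SDK.

```python
class StubFile:
    """Hypothetical stub: data lives on the target, ACLs live in the stub."""

    def __init__(self, name: str, target: dict, allowed_users: set):
        self.name = name
        self._target = target              # dict standing in for cloud storage
        self._acl = set(allowed_users)     # kept locally, at stub level

    def read(self, user: str) -> bytes:
        # The security check is answered locally, without asking the target.
        if user not in self._acl:
            raise PermissionError(f"{user} may not read {self.name}")
        # The data itself always comes from the target object.
        return self._target[self.name]

    def grant(self, user: str) -> None:
        # A function the target does not offer, implemented by the stub.
        self._acl.add(user)

    def acl(self) -> list:
        return sorted(self._acl)


cloud = {"scan.tiff": b"image bytes"}
stub = StubFile("scan.tiff", cloud, {"alice"})
```

From the caller's point of view the stub behaves like a local file with working permissions, even though the simulated target knows nothing about them.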

Pros/Cons Analysis

Here are the arguments we considered when deciding whether to implement stub files instead of sym links:

Pros

    1. Functionality / transparency for the application
    2. Efficiency: works faster

Cons

    1. Complexity: the sym link is much easier to incorporate
    2. Implementation time: related to the complexity
    3. OS-specific: there is no compatibility between the implementations for the different operating systems
    4. Vendor name: sym links are implemented by Microsoft, stub files have a custom vendor-specific application

Decision

There are a lot of cons to using stub files, and sym links are the easier option overall. However, we decided to commit to building dedicated stub files with Tiger Bridge, which provide much greater functionality than sym links. In essence, we invested our own time and effort so that we would not have to compromise the performance and speed we provide to our clients and their applications.

Arguments

Our number one argument is the added functionality that comes with stub files.

We are aware that our decision brings significant complexity and requires ongoing implementation work, but this does not deter us from serving our clients.

Stub files are OS-specific and do not offer compatibility between the implementations for different operating systems. In other words, they do not work for cross-platform solutions. However, Windows clients can benefit greatly from the added functionality.

While our vendor name is not Microsoft, we are still putting a lot of effort into making stub files functional and stable using Microsoft’s SDK.

Conclusion

There are certain applications which cannot work with sym links, and for some of our clients these applications are mission-critical. This is why we decided to go the extra mile and deliver better, more functional stub files. Check out the appendix for a full list of file system functions which do not work with sym links but work well with our stub files.