Managing Large Datasets (Under the hood)

This article is part of our "Tiger Bridge - Under the hood" series, where we uncover technical details about our product, dive deep into the solutions it offers, and explain which decisions we had to make while designing it.

Here, we focus on how the product's design addresses end users’ problems. We believe the central one is managing large datasets.

Our clients encounter several problems when dealing with large datasets:

  1. Understanding – as a dataset grows, your grasp of its data and your control over it decrease. If you do not understand your data well enough, you may not be able to make the most of it.
  2. Protection/DR – larger datasets are harder to back up; backups cost more and take longer.
  3. Scalability – the bigger a dataset becomes and the more rapidly it grows, the sooner you hit the same resource limits again.
  4. Cost – storage methods differ mainly in cost, and that cost should be weighed against the value of the data being stored. In a large dataset, the value of individual pieces of data varies widely, and the variance grows as the dataset grows, so spending the same amount on storing every piece is inefficient.
  5. Collaboration – larger datasets often hold information for more than one person. They usually contain enterprise data and require different people to collaborate on the same files.

With Tiger Bridge, we have tried to address these challenges:

  • For Protection/DR, we have Replication/Backup policies which can be configured to protect your data by storing it in the cloud
  • For Scalability, we have Extension – the ability to attach the cloud to your on-premises infrastructure and make it virtually unlimited
  • For Cost management, we have implemented tiering – high-value data can be replicated to the more expensive hot tiers of the cloud, and replicated more frequently, while other data can be replicated to archive tiers less frequently; this way, both storage and transfer costs are limited
  • For Collaboration, we have added a sync-and-share mechanism which allows you to easily collaborate with colleagues, overcoming limitations of traditional solutions
  • For Understanding, Tiger Bridge does not have a built-in solution itself; however, since we recognize this problem, we have created an Analytics tool that can help you get a much better understanding of your data
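
To make the tiering idea from the list above more concrete, here is a minimal sketch of how a value-based choice between a hot and an archive tier might look. It is not Tiger Bridge code or configuration: the `ReplicationPlan` type, the `plan_for` function, the 0 to 1 value score, the threshold, and the intervals are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class ReplicationPlan:
    """Hypothetical pairing of a cloud tier with a replication interval."""
    tier: str            # illustrative label, not an actual product setting
    interval: timedelta  # how often the data would be replicated


# Illustrative values only: high-value data goes to a costlier hot tier and
# is replicated often; the rest goes to an archive tier and is replicated rarely.
HOT = ReplicationPlan(tier="hot", interval=timedelta(hours=1))
ARCHIVE = ReplicationPlan(tier="archive", interval=timedelta(days=7))


def plan_for(value_score: float, threshold: float = 0.5) -> ReplicationPlan:
    """Choose a plan from an assumed value score between 0 and 1."""
    if not 0.0 <= value_score <= 1.0:
        raise ValueError("value_score must be between 0 and 1")
    return HOT if value_score >= threshold else ARCHIVE


if __name__ == "__main__":
    for name, score in [("active-project", 0.9), ("old-footage", 0.1)]:
        plan = plan_for(score)
        print(f"{name}: tier={plan.tier}, every {plan.interval}")
```

The point of the sketch is only that the storage tier and the replication frequency are decided together from the value of the data, which is what keeps both storage and transfer costs in check. How that value is measured in practice is left open here.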