This is the introduction to a series of blog posts exploring the data factory: an assembly-line approach to managing data at scale.
The data explosion
From the evolving landscape of data privacy regulation to record-breaking acquisitions in data production, distribution and analytics, the world has its eye on data.
Even in the finance industry, data is no longer the sole concern of quantitative funds and data scientists. To varying degrees, data now informs the investment decision-making process for every type of investor. While a quant fund may rigorously test a dataset for months to ensure it’s up to the task of being built into signals for years to come, a fundamental or discretionary fund is more likely than ever to treat data as a key piece of a larger puzzle.
There are many benefits to a quantitative perspective in investing, from bypassing common human biases to uncovering alpha. The only wrench in the works is that, even for the most seasoned of investors, it’s hard to know where to start.
Too much data, too little infrastructure
The data explosion has made data readily available, but it has also led the investment industry into the paradox of choice: with so many options on offer, it becomes only harder to discern the right ones for your fund. What is the right data? What questions should funds be asking of the providers and of the data itself? How can funds find value in all the noise? And perhaps most importantly, how can they do all of this at scale?
It’s true that generalist data scientists can do everything from acquiring data to deploying it and building dashboards. But this type of data scientist is something of a unicorn in the investment field: highly prized and difficult to find. Further, this model presents myriad challenges when it comes to implementation at scale: repeatability, robustness and efficiency being just a few of them.
The data factory approach
Nasdaq’s Quandl—now known as Nasdaq Data Link—has been at the vanguard of the data industry since its founding in 2011, having pioneered the category of alternative data in finance. From corporate aviation intelligence to patent value estimates, we search for unique, predictive insights that go beyond core financial information.
One key to our position as the premier source for financial, economic and alternative datasets is our assembly-line process, inspired by the production methods Ford pioneered on the factory floor.
In the world of data, the equivalent is what we like to call “the data factory”: an assembly line that breaks the data workflow down into stages and specializes at each stage to maximize efficiency and quality.
The result is a pipeline that can handle large volumes of data with maximum efficiency while maintaining high quality standards. At every distinct stage, we can apply specialist expertise to optimize and troubleshoot the process. While it may be tempting to search for those rare data professionals who can effectively manage the entire assembly line, specializing at each stage is far easier to implement.
Further, the data factory approach is easier to scale, because funds can hire on demand for each stage. And when the data process is broken down into its constituent parts, it’s also much more tunable: bottlenecks and problem areas are easier to identify, and fixes can be focused exactly where they’re needed.
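To make the stage-by-stage idea concrete, here is a minimal sketch of such a pipeline in Python. The stage names mirror this series (acquire, transform, apply, deploy), but every function and record is a hypothetical stand-in rather than our production system; the point is that timing each independent stage makes bottlenecks visible.

```python
import time

# A minimal sketch of a staged data pipeline. Every function and record here
# is a hypothetical stand-in; each stage takes the previous stage's output.

def acquire(_):
    """Stage 1: pull raw records from a source (stubbed with sample rows)."""
    return [{"ticker": "ABC", "value": "42.0"}, {"ticker": "XYZ", "value": "17.5"}]

def transform(records):
    """Stage 2: clean the raw records, e.g. cast string values to floats."""
    return [{"ticker": r["ticker"], "value": float(r["value"])} for r in records]

def apply_signals(records):
    """Stage 3: derive a (toy) signal from the cleaned data."""
    return [{**r, "signal": "buy" if r["value"] > 20 else "hold"} for r in records]

def deploy(records):
    """Stage 4: hand results off to consumers (here, just print them)."""
    for r in records:
        print(r)
    return records

def run_pipeline(stages):
    """Run the stages in order, timing each one so bottlenecks stand out."""
    data = None
    for stage in stages:
        start = time.perf_counter()
        data = stage(data)
        print(f"{stage.__name__} took {time.perf_counter() - start:.4f}s")
    return data

run_pipeline([acquire, transform, apply_signals, deploy])
```

Because each stage is just a function with a uniform interface, a specialist can swap in a better transform, or speed up a slow acquire, without touching the rest of the line.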
The factory replicates what data scientists do every day, but makes it more efficient, scalable and tunable. In the next post, we’ll take you through the first stage in the assembly line: acquiring the right data for your organization’s needs.
Further reading in the Data Factory series:
Part One: Acquiring the right data
Part Two: Transforming your data
Part Three: Applying your data
Part Four: Deploying your data