February Community Newsletter – Data Engineering: Get to the Crux

Hello from the CEO

Philip Brittan, Crux CEO

Delightful data is useful and usable. Data Scientists make data useful through analysis that extracts valuable insights from the data. But first, Data Engineers make that data usable by whipping it into shape: loading it, cleaning it, normalizing it, mapping it, joining it, and other transformations that get the data ready for Data Scientists to wring value out of it.

While Data Science gets the headlines, Data Engineering is working hard behind the scenes to make the Data Science magic possible. And by working hard, I mean that Data Engineering typically accounts for 70-80% of the total effort a firm spends on making use of data. Data Science and the unique insights it delivers are business differentiators, but most firms spend a minority of the time on them.

That’s why forward-looking companies increasingly turn to a partner like Crux. By offloading their Data Engineering work, these companies give more time and energy to Data Science and move much more quickly to produce valuable new insights that power their businesses.

Crux brings laser focus, deep expertise, operational oversight, and a valuable network of data suppliers to help you orchestrate, implement, and operate your information supply chain.

At Crux, we make data delightful.

 


Crux Insights Blog

How can you keep the right data flowing into your business? It is simple: Orchestrate, Implement and Operate. Read about Crux’s three-step process in our last blog post.

 

 


Five in 5 with Head of Data Engineering Andrew Clark

Andrew Clark is Crux’s head of data engineering with a tall ask. At 6’6” he sees the full spectrum of data needs for Crux clients. With deep experience in managing unstructured data, he’s a master of data transportation, storage and repackaging.  Here are five questions in 5 minutes with Andrew:

 

What does a data engineer do?
At Crux, being a data engineer means handling the tough work that makes data more actionable for our clients, and designing the tools that make our clients’ lives easier over time. Data engineers sit on the “data wrangling” side of the pipeline, meaning we are the folks who handle the hard work of figuring out where certain elements of the dataset live, slicing and dicing data, and repackaging it for distribution.

 

How has the data engineering landscape changed in the last 5 years?
Today, the folks managing information supply chains are embracing the fact that the whole process does not need to exist on-premises anymore. While firms used to believe their data engineering was their “secret sauce”, today they realize it’s the insights they can glean that are more important. Using experts like Crux to remove as much of the tedious, upfront work as possible is now the preferred model.

 

What are you most excited about?
At Crux, we’re illustrating the art of the possible for our clients. What was once difficult has now become easy. Helping clients realize the full potential of their data is truly exciting.

 

What do you do when you are not engineering data?
I am a big outdoorsman, so my favorite activities tend to be outside. I am an avid cyclist, I have a motorcycle and am currently building an airplane.

 

What would geolocation data tell us about you?
If you were to assess my geolocation data, you’d probably find that when I am not working, I like to go to places where the population density is low. This means you’ll probably find me on my bicycle, hiking, or somewhere outdoors and away from the city.

 


Crux Community

Is it difficult to get access to usable data? Let Crux experts engineer your data to make it ready to use. Our data engineers take on your data challenges so that you can spend your time finding signals. Click HERE to chat with our team of experts.

Have data to share? Our data supplier community is growing by leaps and bounds.  Our diverse datasets range from stock quotes to corporate trends to transportation data and more.  No data is irrelevant.  Create a Crux login HERE to browse our network and become a supplier.


Out and About

We’ve been building our community. In the past month, we’ve met with hundreds of suppliers and buyers of alternative data.

 

Quandl Alternative Data Conference | January 18, 2018
New York, NY

 

Battlefin Discovery Day Miami | January 30-31, 2018
Miami, FL

 

Outsell Data Money | February 1, 2018
New York, NY

 

AI in Fintech Forum | February 8, 2018
Stanford University, Stanford, CA

Inside Market Data: Industry Heavyweights Unveil Startup ‘Data Wrangler’ Crux, Raise $10M Funding

Originally published November 8, 2017 at Inside Market Data

Author: Max Bowie

“I had the idea that there was an opportunity in the market that was not being met. At Bloomberg, Google and Thomson Reuters, I was on the supply side of the market. I saw the burden that companies faced in trying to make use of data—and nobody was stepping up to help clients with this,” he says. “So late last year, I started getting serious about it, and spent this year getting started, finding the right partners, getting people involved, talking to clients, creating the product and testing it with the market. We tried to listen to clients about where their pain points are.”
These pain points tended to be around identifying and integrating new sources of data and processing it in such a way as to be useful to firms. To address these challenges, Crux’s offering comprises three core areas: a “supplier network” of data sources that allows potential users to find, evaluate and procure new content via an online portal; a managed “data engineering concierge” service that leverages Crux’s team of data scientists and engineers to clean, normalize and transform raw data from those sources into “actionable” formats specific to each client; and its informatics platform—a “secure and scalable” cloud environment in which firms can analyze that transformed data in different ways, share it with other applications via a suite of APIs, and control who can access it via granular entitlements.

Reuters: Goldman Sachs leads $10 million investment round in data startup Crux

Originally published November 8, 2017 at Reuters

Author:  Anna Irrera

Crux’s platform processes the data for financial firms, including banks, hedge funds, private equity groups and insurers, so they can focus resources on carrying out more differentiating tasks such as building artificial intelligence algorithms to extract value from the information.

This removes the biggest pain point, or “crux” of data analytics in finance, said Philip Brittan, chief executive of Crux.

“Everyone is looking at how to get more data and how to get more value out of the data that they have,” Brittan said in an interview. But “firms spend the majority of their data time on stuff that is not differentiated,” he added.

Crux does not sell or resell the data, but has established a network of information suppliers to help clients discover new sources.

Silicon Angle: Fintech data startup Crux Informatics raises $10M to ‘make data delightful’

Originally published November 8, 2017 at Silicon Angle

Author: Eric David

The San Francisco-based company calls itself a “data engineering concierge service,” allowing businesses to extract value from their unstructured data quickly and efficiently. Rather than doing the data analysis itself, which is generally left up to specialized artificial intelligence programs, Crux instead extracts and organizes companies’ data to make it more digestible. The company specializes in financial data for banks, hedge funds, financial firms and so on, which is what drew Goldman Sachs to the startup.

American Banker: Data service startup Crux gets $10 million in funding led by Goldman Sachs

Originally published November 8, 2017 at American Banker

Author: Penny Crosman

It’s a given that banks, hedge funds, insurance companies, research firms and others have an insatiable need for data to make decisions – where to place bets, what companies to buy or fund, to whom to extend credit, and so forth.

Finding the right data from new sources, including data aggregators, alternative credit bureaus and satellite imagery, and making it readable to existing programs is a huge chore.

Techcrunch: Goldman Sachs leads $10 million round for data structuring startup Crux Informatics

Originally published November 8, 2017 at Techcrunch

Author: Jonathan Shieber

“Think of Crux as a Switzerland for data storage and services. The company won’t reveal any information or resell to anyone else the proprietary information it processes and holds for its clients. It’s merely a processing engine for taking the data that big banks and businesses that depend on big data sets need, and crunches that data — reducing it to the metrics that matter most for the clients it serves.”

Business Insider: A startup that wants to help Wall Street clean its data just landed $10 million in funding

Originally published November 8, 2017 at Business Insider 

Author: Frank Chaparro

“However, acquiring data, storing it, and then making sense of it, is a timely process for many firms and often draws resources away from actually figuring out ways to execute strategies based off the data you have. Brittan said he wants to take that non-differentiated grunt work off the hands of Wall Street’s hedge funds, banks, and private equity firms.

“Just like how a logistics firm helps manufacturing company orchestrate supply chains, we are orchestrating the supply chain of data,” Brittan said.

As such, Crux doesn’t just connect its customers with data from providers. Instead, it aims to help guide clients through the entire data supply chain, from acquiring data from providers to cleaning and preparing the data to then packaging it to clients in a way that’s relevant for them.”

 

Crux Informatics Announces $10 Million Series A Led by Goldman Sachs Principal Strategic Investments

NEW YORK and SAN FRANCISCO, Nov. 8, 2017 / PRNewswire/ — With the volume and variety of data exploding year-over-year, extracting value has never been more challenging. Trying to process data into actionable insights is a massively expensive and exhaustive undertaking. With all of this new data, companies’ information supply chains are getting more complex and harder to manage, yet to date, financial firms have generally managed them themselves. That’s why Crux Informatics (Crux) today emerged from stealth mode to announce it closed $10 million in Series A financing, led by Goldman Sachs Principal Strategic Investments (PSI) and other institutional investors.

“The emergence of unstructured data as an important input into the investment process creates a great opportunity for financial institutions, but only if actionable insights can be extrapolated from it,” said Darren Cohen, global head of Goldman Sachs’ Principal Strategic Investments group. “Crux’s innovative approach, coupled with their deep expertise in financial services and capital markets, brings economies of scale that will allow companies to be more agile, inventive and effective with data.”

Simplifying Data

Crux is an informatics company that is a natural outgrowth of the non-differentiated and time-consuming data work companies need to do before they can extract value from their data. Crux helps companies reduce the effort and money spent acquiring, exploring and processing large amounts of data. Crux implements and operates the data processing pipelines it creates to meet each customer’s unique requirements. With Crux, companies can find actionable insights faster and more easily.

For data consumers, Crux provides a data engineering concierge service, which delivers easy access to actionable data.

To fuel its customers’ appetite for data, Crux has created a rich network of data content suppliers. The network connects data suppliers with data consumers through Crux engineered supply chains.

Crux’s Informatics Platform offers best of breed technologies to store, explore and transform data. Delivered as an integrated cloud service, the platform makes it easy to develop and manage the execution of industrial grade information processing pipelines for its customers.

Immediate Access to Data Insights

Hedge funds, banks and insurance carriers rely on Crux to help them explore and make use of a wide range of traditional and alternative financially relevant data.

“Gathering information about the world, doing analysis, and driving unique insight is the life-blood of the financial industry,” said Philip Brittan, CEO of Crux. “It is a hard process filled with numerous pain points. Crux is a unique new offering, created to help our clients much more easily find, explore, and make use of relevant data. We take on the burdensome and non-differentiating aspects of our customers’ information supply chains, so they can focus on what really matters for their business. In doing so, we strive to make data delightful.”

A New Solution for Financial Services

Crux’s goal is to solve the pain of data wrangling by helping firms access data with ease. Crux delivers three core elements to address data challenges:

Informatics Platform

Crux offers a secure and scalable environment to store, explore and transform data through its integrated cloud service:

  • Toolset for creating data production pipelines to include data import, analysis, visualization and exploration
  • Rich suite of APIs to connect with best of breed third party services
  • Detailed audit trails, versioning and consumption metrics
  • Controlled access to data through granular entitlements 

Data Engineering Concierge Service

The Crux engineering team of data scientists and data engineers applies machine learning technologies and Crux-developed data wrangling tools to onboard, clean and normalize raw data at scale, making it actionable.

We will:

  • Onboard unique content from a variety of data sources
  • Clean data to remove outliers and inconsistencies
  • Normalize and transform data to meet your requirements
  • Provide operational management and monitoring directly with vendors

Supplier Network

The Crux Supplier Network brings together content from multiple sources and enables in-house datasets to work alongside those from established data providers. The Network enables:

  • Discovery of new and interesting content
  • Evaluation of data sets before making a purchase decision
  • Ability to acquire the dataset directly from the data provider
  • Connection and interaction with a global data-focused community

Crux does not sell or resell any data or analytics—producers and consumers can count on Crux being an objective, neutral partner. Producers have full control over where their data goes. Providers license their content directly to customers and Crux acts as a third party facilitator to wire up and watch over the data pipelines, on behalf of customers. 

About Crux Informatics

Crux was built to solve the pain of data wrangling by helping firms acquire, explore and transform data with ease. Actionable insights are the life-blood of industry, and Crux takes on the burdensome and non-differentiating aspects of our customers’ information supply chains so they can focus on what matters most for their business. For providers of content and analytics, Crux accelerates sales by enabling direct access to target consumers and reducing the friction those consumers face in evaluating and acquiring content. With the Crux Informatics Platform, customers benefit from a secure and scalable cloud environment to store, explore and transform data. At Crux, we make data delightful. Visit us at www.cruxinformatics.com

 

Media Contact:
Crux Informatics
Elizabeth Pritchard
email: [email protected]

 

Orchestrate, Implement, Operate

 

In my last blog post, I talked about how informatics firms help companies ‘orchestrate, implement, and operate’ their information supply chains.  What exactly do I mean by that?  As an Informatics firm, this is what Crux does:

 

Orchestrate: ‘Orchestrating’ in general means pulling together and coordinating a variety of components to work together effectively, the way the conductor of an orchestra makes sure the individual musicians are playing together to bring the music to life. The first step in creating a supply chain is deciding the elements that need to go into it. This is driven by the use case of the consuming customer (hedge fund, bank, insurance company, etc.). What data do they need, and in what form do they need it? Crux works with a supplier network of partners: data publishers, analytics firms, and service providers who form the components of the supply chain that Crux implements and operates. In some cases, a consumer may have a specific dataset or vendor that they know they want to work with. In other cases, the consumer only knows the type of data they want, and they look to Crux to help them surface potential providers of that data and possibly to run tests on candidate datasets to objectively assess the fitness of that data for the customer’s use case. Crux works with a wide range of tools and third-party service providers and assembles the appropriate set to meet the needs of the specific supply chain. For instance, there may be a specialist who transforms the data in some specific way (akin to a ‘refiner’ in my last blog post). Crux partners can make themselves visible to clients on the Crux platform so that customers can browse and learn about specific datasets, analytics, and services, get inspired, and express interest in exploring any of them more deeply.

 

Importantly, Crux does not sell or resell any data or analytics itself — producers and consumers can count on Crux being an objective neutral partner and producers have full control over where their data goes.  Providers license their content directly to customers and Crux acts as a third party facilitator to wire up and watch over the data pipelines, on behalf of customers, as described below.

 

Implement: A supply chain fundamentally involves the flow of goods from producer to consumer. In the physical goods world (traditional logistics), that involves transportation, storage, and (potentially) repackaging. In the case of an information supply chain, it involves the transportation, storage, and repackaging of data. These are the fundamental data engineering tasks that allow data to flow between parties in a way that is maximally actionable for the consumer. These data engineering tasks generally involve writing software that ingests the data (maybe picks up FTP files, copies from an entitled S3 bucket, legally scrapes a web site, hits an API, etc.), validates it (looks for missing, unrecognizable, or erroneous data), structures it (usually into one or more database tables), cleans it, normalizes it, transforms it, enriches it, maps embedded identifiers, joins it with other data, removes duplicate entries, etc., all to support the specific use case of the customer. This is the kind of data engineering work that Crux does to implement a specific supply chain for a customer, pulling in the appropriate data providers, tools, and value-added service providers identified in the Orchestration phase.
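To make the ingest, validate, normalize, and de-duplicate steps above a little more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not a description of Crux's actual tooling: the sample feed, the column names (ticker, date, close), and every function name are hypothetical.

```python
import csv
import io

# Hypothetical raw feed; the columns and values are invented for illustration.
RAW_CSV = """ticker,date,close
 aapl ,2018-01-02,172.26
MSFT,2018-01-02,85.95
MSFT,2018-01-02,85.95
GOOG,2018-01-02,
"""

REQUIRED = ("ticker", "date", "close")


def ingest(text):
    """Parse delimited text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def validate(rows):
    """Split rows into usable ones and rejects missing a required field."""
    good, bad = [], []
    for row in rows:
        if all((row.get(col) or "").strip() for col in REQUIRED):
            good.append(row)
        else:
            bad.append(row)
    return good, bad


def normalize(rows):
    """Trim whitespace, upper-case tickers, and convert prices to floats."""
    return [
        {
            "ticker": r["ticker"].strip().upper(),
            "date": r["date"].strip(),
            "close": float(r["close"]),
        }
        for r in rows
    ]


def deduplicate(rows):
    """Keep the first row seen for each (ticker, date) key."""
    seen, unique = set(), []
    for r in rows:
        key = (r["ticker"], r["date"])
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


def run_pipeline(text):
    """Chain the steps and return (clean rows, rejected rows)."""
    good, rejected = validate(ingest(text))
    return deduplicate(normalize(good)), rejected


clean_rows, rejected_rows = run_pipeline(RAW_CSV)
```

In this sketch the rejected rows are returned rather than silently dropped, so that someone can follow up with the data provider about them.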

 

Operate: Rarely is a dataset static. The vast majority of datasets receive regular updates, whether that’s once a month, or once per millisecond. As that data flows, constant vigilance is needed to make sure data shows up when it is supposed to, that it’s not missing anything, and that it doesn’t contain unidentifiable components. Data Operations includes the monitoring and remediation of ongoing data streams. Crux Data Operators set up dashboards and alerts to keep a close eye on data in motion and all the systems it travels through. When a problem is spotted, they immediately begin diagnosing and remediating the issue, in tight collaboration with the relevant data provider(s), to try to get ahead of the issue before it affects downstream consumers. Data Operations also includes handling standard maintenance tasks such as watching for and reacting to data specification changes and scheduled maintenance outages coming from the data provider(s).
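As a rough illustration of the kind of check that could sit behind such an alert, the sketch below flags a feed that arrives late or with fewer rows than expected. The thresholds, timestamps, and the check_delivery function are all hypothetical and are not taken from Crux's systems.

```python
from datetime import datetime, timedelta


def check_delivery(last_arrival, now, max_age, row_count, min_rows):
    """Return a list of alert labels for one feed; an empty list means healthy."""
    alerts = []
    if now - last_arrival > max_age:
        alerts.append("late")  # data did not show up when it was supposed to
    if row_count < min_rows:
        alerts.append("incomplete")  # data appears to be missing rows
    return alerts


# Hypothetical daily feed that last arrived 27 hours ago with 950 of 1000 rows.
alerts = check_delivery(
    last_arrival=datetime(2018, 1, 31, 6, 0),
    now=datetime(2018, 2, 1, 9, 0),
    max_age=timedelta(hours=24),
    row_count=950,
    min_rows=1000,
)

# The same feed arriving on time and complete raises nothing.
healthy = check_delivery(
    last_arrival=datetime(2018, 2, 1, 6, 0),
    now=datetime(2018, 2, 1, 9, 0),
    max_age=timedelta(hours=24),
    row_count=1000,
    min_rows=1000,
)
```

A real monitoring setup would run checks like this on a schedule and route the alerts to a dashboard or an on-call operator.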

 

These are the key elements of Information Supply Chain Logistics in a nutshell.  It is a rich process and gives customers tremendous leverage in harnessing the integrated value of a network of suppliers.

 

Contact Crux if you’d like to learn more.

Informatics Firms and Information Supply Chains

Philip Brittan, CEO of Crux Informatics, Inc.

One of the most revolutionary steps in the evolution of manufacturing has been the emergence of sophisticated supply chains. To understand them, first imagine how a person or a firm could create a new product by gathering raw materials and making all the parts themselves. Then imagine how pieces of that process are picked up by others who specialize in various ingredients that go into creating the finished products, such as raw materials providers, tools makers, and (eventually) component manufacturers who create standardized subsets of a product that can be assembled by multiple downstream firms to produce different end products.

Over time the raw materials become more refined (planed lumber instead of timber, jet fuel instead of crude oil, steel instead of iron), and the refiners may in fact be separate companies in the supply chain who take in raw materials and output refined materials, perhaps in several steps by several companies. Over time, tools become more sophisticated and specialized, consuming materials and tools from their own supply chains. Components become increasingly complex and comprehensive (producing larger assemblages), again consuming materials, tools, and possibly sub-components from upstream. With this evolution, manufacturing supply chains have become exceedingly sophisticated and complex, with literally thousands of companies working together to build a car, for example.

One of the key innovations needed to allow this is standards. Thanks to accepted and widely used industry standards, a screw firm can specialize in making screws for a large number of downstream firms, without each screw being a bespoke project. That specialization/focus, and the automation that’s possible when manufacturing standardized components, drives economies of scale and advances in efficiency.

Along with physical goods, supply chains eventually also come to encompass value-added services, such as consulting, metrics gathering, supplier ratings, etc. A special kind of service provider associated with supply chains is the Logistics company. The Wikipedia definition of supply-chain logistics explains that “logistics is the management of the flow of things between the point of origin and the point of consumption in order to meet requirements of customers”. Logistics firms help companies orchestrate, implement, and operate their complex supply chains. They generally work with a network of suppliers that they can bring to bear when helping a firm set up a supply chain. And they have the skills and tools to make sure that the supply chains are operating smoothly, which in the physical goods world frequently involves planning and arranging efficient transportation and storage.

In information-intensive industries, such as financial services, processing information to drive valuable insights is the core “manufacturing process”. For example, financial firms of all kinds (banks, hedge funds, research houses, private equity firms, insurance companies, etc.) take in relevant information about the world, process and perform analysis on that information, drive insights, and take action on those insights. That action can take many forms depending on the type of firm (make a loan, place a trade, rebalance a portfolio, pitch a client, author a research report, buy a company, underwrite a policy, etc.), but all firms have at their core that critical process of gathering information and performing analysis to drive insight.

Over time, the range of information that firms utilize in this core process has grown in volume, velocity, and variety. As such, firms have started to move beyond simply collecting raw material (data), to thinking about their information supply chains, an evolution that closely mirrors what we have seen in manufacturing industries. We are witnessing rapid evolution in the tools that are available to companies to process and analyze data. And a large variety of suppliers, in the form of ‘alternative’ data vendors, have sprung up to meet the ever-expanding needs of financial firms to feed their insight generation processes. One interesting feature of information supply chains is that they may be looping, meaning company A may produce some data (perhaps exhaust from a trading system), feed it to one or more refiners, aggregators, or derived-data producers, who then feed their output back to company A to use in their analytics.

These information supply chains are getting more complex and thus harder to manage, yet to date financial firms have generally managed them themselves. This has led to inefficiencies and redundancies across the industry. Every firm has had to become at least basically competent in data management, and many have built some form of in-house platform (some well, some poorly) to help manage their data flows. And we are left with a situation where hundreds (in some cases thousands) of firms are wiring up to the same sources of data, downloading the same data, storing the same data, cleaning the same data, mapping the same data, etc., independently and redundantly, with no economies of scale.

Just as Logistics firms arose to help manufacturing firms manage their increasingly complex and burdensome supply chains, a new type of firm, the Informatics firm, is an inevitable evolution of the market to help companies manage their information supply chains. Informatics firms help companies discover relevant sources of data and help them evaluate that data for fitness to the needs of the firm. They implement and operate the data processing pipelines that are needed to get the information from the supplier to the customer, while validating, cleaning, transforming, mapping, and enriching the data along the way (what we might call Data Engineering) so that it arrives at the customer in a form that is immediately actionable, meaning a firm can do something with it that is pertinent for their business (what we might call Data Science), as is, without requiring further refinement. With a supply chain mentality, Informatics firms pull in the right tools and partners to get the job done.

In effect, Informatics firms ‘manage the flow of information between the point of origin and the point of consumption in order to meet requirements of customers’. Informatics firms can bring economies of scale to the industry by wiring up to a specific source of data once, storing that data once, cleaning that data once, mapping that data once, on behalf of many clients, who can share the costs of those things rather than bearing them independently and redundantly. Informatics firms can also help with the broad implementation of industry standards, which allows for more automation and greater efficiency for everyone.
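The economies-of-scale argument comes down to simple arithmetic, sketched below. The cost figure and client count are invented purely for illustration, and the sketch assumes the cost of wiring up a source is the same whether one firm or a shared provider does it, ignoring any per-client delivery costs.

```python
def cost_if_each_firm_builds(pipeline_cost, firms):
    """Total industry spend when every firm wires up the same source itself."""
    return pipeline_cost * firms


def cost_per_client_when_shared(pipeline_cost, clients):
    """Each client's share when the source is wired up once and the cost is split."""
    if clients < 1:
        raise ValueError("at least one client is required")
    return pipeline_cost / clients


# Hypothetical numbers: one source costing 100,000 to onboard, used by 50 firms.
PIPELINE_COST = 100_000
CLIENTS = 50

redundant_total = cost_if_each_firm_builds(PIPELINE_COST, CLIENTS)
shared_per_client = cost_per_client_when_shared(PIPELINE_COST, CLIENTS)
industry_savings = redundant_total - PIPELINE_COST
```

Under these made-up numbers, the industry as a whole spends 5,000,000 when every firm builds independently, versus 100,000 when the work is done once and each client bears 2,000 of it.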

Firms in information-driven industries, such as financial services, need to think of their core data and analytics workflow as their ‘manufacturing’ process and they need to think about the content that feeds that process as their critical supply chain. As they do so, Informatics firms can help them orchestrate, implement, and operate those supply chains more effectively and efficiently.