Managing and Monitoring Our Data Pipelines

Crux, Dec 10, 2019

Data Operations (DataOps) at Crux

Beyond the core Crux Deliver feature of building data pipelines, a critical element of our value proposition is ensuring that those pipelines are well maintained. We sat down with Tim Marrin, Director of Data Operations, to learn how his team keeps over 1,000 dataset pipelines running smoothly.

First off, what is the scale of what Crux is doing?

Did you know that the Crux team currently processes tens of thousands of data instances (or ingestion pipelines) over a 24-hour period? Each of these data instances can have multiple discrete tasks and some data instances can even have over a hundred discrete tasks. That’s not an insignificant number of activities and processes running through our pipelines.

What is the role of DataOps in this?

One of the jobs of our DataOps team is to monitor these tens of thousands of ingestion pipelines every day. Pipelines are composed of tasks that are orchestrated and run on cloud-based microservices. The team monitors those microservices as well, to ensure consistent, reliable, and fast processing and delivery of data.

What is the full range of services that Crux DataOps provides?

The easiest way will be to bullet this out. We have:

- 24/7 global support that monitors all data feeds
- Cloud-based big-data stores and microservices for data processing and storage
- Dedicated telephone, email, or ticket-based support, depending on client preference
- Incident and problem management with regular status updates to clients
- Identification of, and proactive planning for, format and schema changes with suppliers and data consumers
- Fully managed and automated CI/CD infrastructure with canary deployments
- Automated notifications through various delivery methods to show data availability and metadata
- Full support and monitoring of supplier data feeds, including handling supplier support on behalf of data consumers
- Full transparency with reporting and analytics on production incidents and outages

What are example errors that the DataOps team monitors for?

This question comes up often with our clients and prospects. These are the examples we usually point to:

- Data validation issues
  - Invalid schema
  - Invalid datatype
  - Missing values
- File delivery timing issues
- Market and calendar holiday-related failures
- Remote source not available (supplier FTP down, supplier late with files, incomplete files, etc.)
- Schema issues due to unscheduled changes
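
To make a few of these categories concrete, here is a minimal Python sketch of what such checks could look like. It is an illustration only, not Crux's actual implementation. The schema, the column names, the function names `validate_rows` and `is_delivery_late`, and the rule that no file is expected on weekends or listed holidays are all assumptions made for the example.

```python
from datetime import date, datetime, time
from typing import Any

# Hypothetical expected schema: column name -> Python type (an assumption).
EXPECTED_SCHEMA: dict[str, type] = {"ticker": str, "price": float, "volume": int}


def validate_rows(rows: list[dict[str, Any]], schema: dict[str, type]) -> list[str]:
    """Return a list of issue descriptions; an empty list means the rows are clean."""
    issues: list[str] = []
    for i, row in enumerate(rows):
        # Invalid schema: columns missing from the row, or columns we did not expect.
        missing = sorted(schema.keys() - row.keys())
        extra = sorted(row.keys() - schema.keys())
        if missing or extra:
            issues.append(f"row {i}: invalid schema (missing={missing}, extra={extra})")
            continue
        for col, expected in schema.items():
            value = row[col]
            if value is None or value == "":
                # Missing values: the column exists but carries no data.
                issues.append(f"row {i}: missing value in '{col}'")
            elif not isinstance(value, expected):
                # Invalid datatype: the value is present but of the wrong type.
                issues.append(
                    f"row {i}: invalid datatype in '{col}' "
                    f"(expected {expected.__name__}, got {type(value).__name__})"
                )
    return issues


def is_delivery_late(now: datetime, due: time, holidays: set[date], arrived: bool) -> bool:
    """Flag a file as late only on days when a delivery is actually expected."""
    if arrived:
        return False
    # Assumption: nothing is expected on weekends or on listed market holidays,
    # so a missing file on those days is not treated as a failure.
    if now.weekday() >= 5 or now.date() in holidays:
        return False
    return now.time() > due
```

Returning a list of issues, rather than raising on the first problem, lets a monitoring process report everything wrong with a delivery in one pass. The holiday check shows why a calendar matters: without it, every market holiday would look like a late file.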

What are some of the challenges that our DataOps team is working on?

The big challenge is supporting our clients, platform, and data simultaneously while we scale the number of data instances that we monitor. As we grow, the complexity increases with the different instances, consumption methods, and SLAs we work with. These challenges can literally keep me up at night, but they also keep us at the top of our game.

How does this compare to what you were doing previously?

I spent 15 years working on trading floors, most recently with the Electronic Trading SRE team at Goldman Sachs. We faced similar issues of scale, managing complex systems, real-time processing, and high throughput. I honestly find myself drawing on that experience daily as we try to solve even greater challenges here at Crux.

With this complexity, why should a firm work with us to outsource their data pipeline operations?

We’re combining performant big data and cloud solutions with best practices in ITSM (IT Service Management) to provide a uniquely high level of service and technical solutions. The challenge of scaling to manage the volume of datasets is what’s driving industry interest in Crux, and the DataOps team is meeting that challenge through innovative engineering solutions. I’m really proud of our ability to solve these technical challenges while providing a white-glove client experience in a startup.
