November 9, 2025

DataOps for Manufacturing: Scaling Industrial Data Pipelines Across Multiple Facilities

Many manufacturers face a common challenge: they've successfully implemented data collection at one facility or for one use case, but when they try to scale across dozens of factories, the approach falls apart. The problem isn't technical complexity. It's what Aron Semle, CTO at HighByte, calls the "n plus one problem" - how to change your data infrastructure at scale without rebuilding everything from scratch.

Semle has spent 15 years in industrial connectivity, starting with Kepware in 2008 and now leading DataOps strategy at HighByte. His perspective on what works and what doesn't comes from watching manufacturers try various approaches to digital transformation.

Understanding DataOps in Manufacturing Operations

DataOps is relatively new in manufacturing, and even in traditional IT sectors like e-commerce, it's still developing. The concept centers on three elements: people, process, and technology working together to deliver data pipelines at scale.

Picture a large manufacturer with dozens of factories worldwide doing similar work. They want to centralize data for machine learning and analytics. A DataOps team - perhaps half a dozen people - manages a backlog of business-driven use cases: OEE tracking, scrap reduction, energy monitoring. This team implements the complete pipeline to enable these use cases.

Think of it like the oil and gas industry: raw data gets extracted down in the factory, moved efficiently across networks like crude through a pipeline, and refined into something business applications can use. The key difference from traditional approaches is that this needs to work across 50 factories with a small team, not as one-off projects requiring dedicated staff at each site.

Solving the Scale Problem: Data Pipeline Replication Across Sites

Many manufacturers misunderstand the fundamental challenge. They think the problem is getting all their data to the cloud. It's not. That's actually expensive and creates a difficult engineering problem when you're dealing with millions of tags.

The real problem is what happens after your first success. You get a data pipeline working for one use case at one factory. You learn from it. Then you need to change it and roll those changes across all your facilities at nearly zero cost. How do you do that?

This is where most custom-built solutions fail. Organizations try to build their own systems, write custom applications, and manage their own source code. They feel the pain of the scale problem quickly. By the time they realize they need a different approach, they've already spent significant resources.

Building Your Industrial DataOps Capability

Starting a DataOps initiative requires getting three things right: people, process, and technology.

People with the right skills: You need team members with strong IT capabilities who understand OT complexity. Semle suggests a simple test - have someone walk through a data center, then walk through a factory floor, and explain the differences. The environments are dramatically different. Your DataOps team must understand both worlds.

Process for governance: This determines how you manage changes at scale. When you want to add a new data point or modify a calculation, what's the workflow? Who approves it? How does it get deployed? These questions become critical when operating across multiple sites.

Technology that doesn't burden OT staff: Your OT personnel are already managing production and keeping equipment running. They can't take on massive data projects. Give them simple interfaces to contribute their knowledge - like filling in what specific PLC tags represent - but don't hand them complex modeling exercises.
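
To make that concrete, here is a minimal sketch (in Python, with entirely hypothetical tag names and fields) of the kind of lightweight context an automation engineer might fill in through a simple form or spreadsheet, leaving the actual data modeling to the DataOps team:

    # Hypothetical example: the kind of context an automation engineer might
    # fill in via a form or spreadsheet. Tag names, meanings, and units are
    # illustrative, not taken from any specific PLC program.
    tag_context = {
        "PLC5_DB12_W4": {"means": "infeed conveyor speed",  "unit": "m/min"},
        "PLC5_DB12_B7": {"means": "press 2 running state",  "unit": "on/off"},
        "PLC7_AI_003":  {"means": "spindle vibration",      "unit": "mm/s"},
    }

    def describe(raw_tag: str) -> str:
        """Return a readable label for a raw PLC tag, if context was provided."""
        ctx = tag_context.get(raw_tag)
        return f"{ctx['means']} ({ctx['unit']})" if ctx else raw_tag

    print(describe("PLC5_DB12_B7"))  # -> "press 2 running state (on/off)"

The OT contribution is a few rows of "what this tag means," not a modeling project.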

Starting Small and Scaling Up

The most practical approach starts with a single use case at one location. Don't try to model your entire factory or capture every data point. Pick one specific problem - maybe energy monitoring or a particular quality issue.

Get that working, prove the value, then figure out how to replicate it. This teaches you what your governance process needs to be. When you want to add something similar at another facility, how much work does it take? If it requires rebuilding everything, your approach won't scale.

The Unified Namespace (UNS) architecture supports this gradual approach. You don't need everything modeled perfectly before you start. Build your namespace as you add use cases, focusing on making it easy to extend rather than achieving perfect completeness.
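
To make "easy to extend" concrete, here is a minimal sketch of how a UNS topic namespace might grow one use case at a time. The enterprise/site/area/line hierarchy and the topic names are illustrative assumptions, not a prescribed standard:

    # Hypothetical UNS topics, added incrementally as use cases come online.
    # The enterprise/site/area/line hierarchy is an assumed convention.

    # Use case 1: energy monitoring at one site
    uns_topics = [
        "acme/plant-berlin/utilities/main-feed/energy/kwh",
    ]

    # Use case 2: OEE tracking on one line, added later without touching use case 1
    uns_topics += [
        "acme/plant-berlin/assembly/line-2/oee/availability",
        "acme/plant-berlin/assembly/line-2/oee/performance",
        "acme/plant-berlin/assembly/line-2/oee/quality",
    ]

    # Replicating to a second factory reuses the same structure; only the site segment changes
    uns_topics += [t.replace("plant-berlin", "plant-austin") for t in uns_topics]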

Unified Namespace Implementation Strategies

Two main approaches exist for UNS implementation: centralized and distributed.

A centralized approach puts one MQTT broker in the cloud serving all facilities. This works when you have good network reliability and lower data volumes. It's simpler to manage but creates a single point of failure.

A distributed approach puts MQTT brokers at each facility, with data flowing up to a central broker as needed. This requires more sophisticated management - you need tools to configure and monitor multiple brokers - but it's more resilient and handles higher data volumes better.
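As a rough sketch of the distributed pattern, the following Python script (using the paho-mqtt client, 1.x-style API) could run at each site: it subscribes to the local broker and republishes a selected slice of the namespace to the central broker. Hostnames and topic filters are placeholders, and a production deployment would more likely lean on the brokers' built-in bridging and management features plus proper authentication and store-and-forward buffering.

    # Minimal site-to-central bridge sketch using paho-mqtt (1.x-style API).
    # Hostnames, ports, and topic filters are placeholders.
    import paho.mqtt.client as mqtt

    SITE_BROKER = "mqtt.plant-berlin.local"
    CENTRAL_BROKER = "uns.example-cloud.com"
    FORWARD_FILTER = "acme/plant-berlin/+/+/oee/#"   # only forward what the use case needs

    central = mqtt.Client(client_id="berlin-uplink")
    central.connect(CENTRAL_BROKER, 1883)
    central.loop_start()  # handles reconnects and outgoing publishes in the background

    def on_message(client, userdata, msg):
        # Republish unchanged; the topic already carries the site context.
        central.publish(msg.topic, msg.payload, qos=1, retain=msg.retain)

    site = mqtt.Client(client_id="berlin-bridge")
    site.on_message = on_message
    site.connect(SITE_BROKER, 1883)
    site.subscribe(FORWARD_FILTER, qos=1)
    site.loop_forever()
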

Most organizations end up with distributed approaches as they scale, even if they start centralized. The key is using tools that make managing multiple brokers practical rather than building custom solutions for each site.

Managing Structured and Transactional Data

Streaming data over MQTT handles your real-time, time-series information well. But manufacturers also need access to structured data from ERP, MES, and other systems. This is where REST APIs become important.

The goal is harmonizing your MQTT namespace with your transactional data access. When a pump status message comes through MQTT with a pump ID, you should be able to query a REST API with a similar structure to get the most recent work order for that pump.

This creates an abstraction layer between client applications and underlying systems. Applications don't need to know whether data comes from MQTT streams or database queries. They use consistent patterns to access whatever information they need.
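
A minimal sketch of that pattern, assuming a hypothetical REST endpoint that mirrors the UNS topic path (the URL, topic layout, and JSON fields are all illustrative):

    # Hypothetical consumer: reacts to a pump status message from MQTT, then
    # fetches the latest work order for the same asset over REST. The topic
    # layout, API URL, and JSON fields are assumptions for illustration.
    import json
    import requests
    import paho.mqtt.client as mqtt

    API_BASE = "https://ops.example.com/api/v1"

    def on_message(client, userdata, msg):
        status = json.loads(msg.payload)                # e.g. {"state": "fault", "pumpId": "pump-7"}
        asset_path = msg.topic.rsplit("/status", 1)[0]  # "acme/plant-berlin/utilities/pump-7"
        if status.get("state") == "fault":
            # Same asset path, different access pattern: request/response instead of streaming.
            resp = requests.get(f"{API_BASE}/{asset_path}/work-orders/latest", timeout=5)
            resp.raise_for_status()
            print("Most recent work order:", resp.json())

    client = mqtt.Client(client_id="maintenance-app")   # paho-mqtt 1.x-style API
    client.on_message = on_message
    client.connect("uns.example-cloud.com", 1883)
    client.subscribe("acme/+/+/+/status", qos=1)
    client.loop_forever()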

Edge Computing as a Factory Data Platform

Modern DataOps approaches treat the factory as an edge data platform. You're essentially building an edge data lake that provides real-time data access, stores historical information, and serves both cloud applications and local factory systems.

This requires edge computing infrastructure - Docker containers, Kubernetes orchestration, or similar technologies. The most advanced manufacturers are already deploying these platforms. They enable you to deploy and manage data applications across many sites without manual configuration at each location.
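
As one hedged illustration, the sketch below uses the Docker SDK for Python to roll the same containerized pipeline service out to several edge hosts, with only the site identifier changing. The hostnames, image name, and environment variables are placeholders, and a real fleet would more likely be managed through Kubernetes or similar orchestration rather than a script like this:

    # Sketch: push the same containerized data-pipeline service to several edge
    # hosts, parameterized only by site. Hostnames, image, and env vars are
    # placeholders; requires the Docker SDK for Python ("pip install docker")
    # and Docker daemons reachable over TCP on each edge host.
    import docker

    SITES = {
        "plant-berlin": "tcp://edge.plant-berlin.local:2375",
        "plant-austin": "tcp://edge.plant-austin.local:2375",
    }
    IMAGE = "registry.example.com/dataops/pipeline:1.4.2"

    for site, docker_host in SITES.items():
        client = docker.DockerClient(base_url=docker_host)
        client.containers.run(
            IMAGE,
            name="uns-pipeline",
            detach=True,
            restart_policy={"Name": "always"},
            environment={"SITE_ID": site, "CENTRAL_BROKER": "uns.example-cloud.com"},
        )
        print(f"Deployed pipeline {IMAGE} to {site}")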

Right now, most consumers of factory data live in the cloud. But as edge platforms mature, factory systems will increasingly connect directly to this local data infrastructure, and the edge platform becomes the central data source rather than another isolated system.

Common Mistakes to Avoid

Several patterns consistently cause problems:

Building everything yourself: Unless you're in the business of building data infrastructure software, don't try to create your own DataOps platform. Use existing tools and focus your resources on implementing use cases.

Trying to capture all data: Don't start by attempting to get every tag from every machine into your cloud platform. It's expensive and you won't know what to do with most of it. Start with specific use cases and expand based on actual needs.

Perfect modeling before starting: You don't need complete digital twins or comprehensive data models before you begin. Start with basic context and add detail as use cases require it. Perfect is the enemy of good enough.

Putting too much work on OT staff: Your automation engineers are busy keeping production running. They have critical knowledge about what data means, but they can't spend weeks on modeling projects. Design your processes so their contribution takes hours, not months.

The Future: AI and Automated Context

An interesting question is emerging around large language models: Can you push raw data to the cloud and use AI to add context and build data models automatically?

Some experimentation shows this doesn't work reliably yet. A PLC tag named by a controls engineer for a specific machine doesn't contain enough information for AI to determine what it represents. Even with the raw data stream available, distinguishing an on/off state from a vibration measurement is difficult when a cryptic tag name is the only other context.

However, if you're already adding basic context at the edge - using UNS patterns that provide structure - AI tools become more useful. You can potentially generate applications that use your data without writing traditional code. This area will likely develop significantly in coming years.

Building Your DataOps Team Structure

DataOps requires a blend of IT and OT expertise, but in practice it tends to be IT-driven. The team needs to understand OT problems while using IT tools and methods.

This creates an interesting organizational challenge. The team can't put heavy requirements on OT personnel, but they need OT knowledge captured in the data infrastructure. The solution is giving OT staff simple ways to contribute what they know while the DataOps team handles the technical implementation.

For many organizations, this means the DataOps team sits between IT and operations, reporting to whoever owns the digital transformation initiative. They work with both groups but operate somewhat independently.

Moving Forward with Industrial DataOps

The pattern that emerges from successful implementations is clear: start with use cases, solve them at small scale, then figure out how to replicate efficiently. Don't build custom infrastructure when commercial tools exist. Use edge computing platforms to manage deployment complexity. Keep the burden on OT staff minimal while capturing their essential knowledge.

This isn't about technology choices as much as approach. Whether you use HighByte, other commercial tools, or open-source solutions matters less than having a strategy that scales. The manufacturers who succeed treat data infrastructure as a platform that evolves with their needs rather than a project with a fixed end state.