October 7, 2025
Your process engineer needs vibration data from the historian, maintenance records from the CMMS, and quality results from the lab system to diagnose why Line 3 keeps going down. So they export three CSV files, manually align timestamps in Excel, add context about which batches were running, and spend two days on data wrangling before they can even start the actual analysis.
Next week, another engineer needs similar data for a different problem. The entire process repeats from scratch. No economy of scale. No reusable infrastructure. Just expensive expertise doing manual data plumbing.
This is the reality at most manufacturing companies, even those investing heavily in "digital transformation." Everyone agrees you need a data platform, but what does that actually mean? What capabilities does it require? Where do you start?
David Ariens, who spent 12 years at BASF building industrial data infrastructure before co-founding IT OT Insider, created the Industrial Data Platform Capability Map to answer exactly these questions. It's the clearest framework available for understanding what you're actually building, and what vendors are actually selling you.
Twenty years ago, the manufacturing data landscape looked simple on the surface but was a nightmare in practice. You had historians storing time series data on the OT side. You had data warehouses and BI tools on the IT side. Getting data between them meant custom interfaces, PowerShell scripts, CSV files on FTP servers, duct tape and prayer.
The fundamental problem: these systems weren't designed to work together, and neither side could handle the other's data properly. IT systems couldn't process high-frequency time series data with the operational context manufacturing needs. OT systems couldn't scale or integrate with enterprise analytics.
Fast forward to today, and most companies still operate in silos: historians and SCADA on the OT side, warehouses and BI tools on the IT side, and manual exports bridging the gap.
A platform approach changes this fundamentally. Think of it like the evolution on the IT side from isolated databases to data lakes and data warehouses—a centralized place where all operational data comes together with proper context, creating a single source of truth that gets smarter with each use case you build.
Every time you clean data, improve your asset model, or build a machine learning model, that intelligence stays in the platform. Future users automatically benefit from better, richer data. That's when you finally achieve economy of scale with your data initiatives.
The capability map breaks down the platform vision into concrete technical requirements you can actually evaluate and implement. Here's what you need:
Capability 1: Connectivity Layer
You need to support both legacy industrial protocols (Modbus, Profibus, proprietary vendor protocols) and modern standards (MQTT, OPC UA). But connectivity extends beyond the plant floor—you also need database connections and APIs for grabbing data from cloud-based IoT services, MES systems, and ERP platforms.
Example: A water utility needs to connect their SCADA systems managing treatment plants and pumping stations, plus the smart water meters living in a vendor's cloud service. Combining both sources enables digital twins that can detect network leaks by balancing flow data across the entire distribution system.
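To make that concrete, here is a minimal Python sketch of what a connectivity layer does conceptually: wrap every source, whether an OPC UA server, a Modbus PLC, or a vendor's cloud API, behind one interface so the rest of the platform never deals with protocol details. All class and tag names are hypothetical, and a real implementation would call actual protocol libraries instead of returning placeholder values.

```python
# Sketch of a connectivity layer: every source is wrapped behind the same
# interface, so the rest of the platform never deals with protocol details.
# Class and tag names are hypothetical.
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Reading:
    tag: str            # e.g. "line3/pump1/vibration_rms"
    value: float
    unit: str
    timestamp: datetime
    source: str         # which connector produced it


class Connector(ABC):
    """Common contract every protocol adapter implements."""

    @abstractmethod
    def poll(self) -> list[Reading]:
        ...


class OpcUaConnector(Connector):
    def __init__(self, endpoint: str, node_ids: list[str]):
        self.endpoint, self.node_ids = endpoint, node_ids

    def poll(self) -> list[Reading]:
        # A real adapter would call an OPC UA client library here;
        # this just returns a placeholder reading.
        return [Reading("line3/pump1/vibration_rms", 2.7, "mm/s",
                        datetime.now(timezone.utc), self.endpoint)]


class CloudMeterConnector(Connector):
    def __init__(self, api_url: str):
        self.api_url = api_url

    def poll(self) -> list[Reading]:
        # Would call the vendor's REST API for smart-meter data.
        return [Reading("network/zone4/flow", 131.5, "m3/h",
                        datetime.now(timezone.utc), self.api_url)]


def collect(connectors: list[Connector]) -> list[Reading]:
    """One loop feeds the platform regardless of where data comes from."""
    return [r for c in connectors for r in c.poll()]
```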
Capability 2: Contextualization and Data Transformation
This is where most companies stumble. They think contextualization means organizing data in an ISA-95 tree structure. That's just the starting point.
Real contextualization means linking data to the physical and operational world: the assets that produced it, the batches and recipes that were running, the material lots and suppliers involved, and the maintenance and quality events that happened alongside it.
Without this context, you can't ask the right questions. You can't say "show me temperature profiles for all runs where we made chocolate chip cookies using Recipe A with flour from Supplier X." That level of specificity requires a rich contextual model, more like a graph than a tree.
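Here is a deliberately small Python sketch of that graph-like context, using plain dictionaries and made-up identifiers. The point is that once runs are linked to assets, recipes, and material lots, the cookie question above becomes a simple traversal.

```python
# A run links to an asset, a recipe, and a material lot, so "temperature
# profiles for Recipe A runs with flour from Supplier X" is just a lookup
# followed by a trend query. All identifiers are illustrative.
runs = [
    {"run_id": "R-1041", "asset": "Line3/Oven1", "product": "choc_chip_cookie",
     "recipe": "Recipe A", "flour_lot": "LOT-778",
     "start": "2025-09-01T06:00", "end": "2025-09-01T09:30"},
    {"run_id": "R-1042", "asset": "Line3/Oven1", "product": "choc_chip_cookie",
     "recipe": "Recipe B", "flour_lot": "LOT-912",
     "start": "2025-09-01T10:00", "end": "2025-09-01T13:30"},
]

material_lots = {
    "LOT-778": {"material": "flour", "supplier": "Supplier X"},
    "LOT-912": {"material": "flour", "supplier": "Supplier Y"},
}


def runs_for(recipe: str, supplier: str) -> list[dict]:
    """All runs that used the given recipe and a lot from the given supplier."""
    return [
        r for r in runs
        if r["recipe"] == recipe
        and material_lots[r["flour_lot"]]["supplier"] == supplier
    ]


# The returned time windows are then used to pull temperature trends
# for Line3/Oven1 from the historian or platform.
matching = runs_for("Recipe A", "Supplier X")
print([r["run_id"] for r in matching])   # -> ['R-1041']
```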
Data transformation handles the calculations and conversions needed before data enters the platform: unit conversions, aggregations, derived values.
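A quick sketch of those transformations in Python with pandas, using illustrative column names: a unit conversion, a derived value, and a time aggregation applied before the data lands in the platform.

```python
# Illustrative pre-ingest transformations: convert units, derive a value,
# and aggregate to the resolution the central platform actually needs.
import pandas as pd

raw = pd.DataFrame(
    {
        "oven_temp_F": [392.0, 393.8, 391.1, 394.2],
        "power_kW": [48.0, 50.5, 49.2, 51.0],
        "throughput_kg_h": [120.0, 118.0, 121.0, 119.0],
    },
    index=pd.date_range("2025-09-01 06:00", periods=4, freq="15s"),
)

# Unit conversion: Fahrenheit to Celsius.
raw["oven_temp_C"] = (raw["oven_temp_F"] - 32.0) * 5.0 / 9.0

# Derived value: specific energy consumption per kg of product.
raw["kWh_per_kg"] = raw["power_kW"] / raw["throughput_kg_h"]

# Aggregation: one-minute averages are often enough for the central platform.
per_minute = raw[["oven_temp_C", "kWh_per_kg"]].resample("1min").mean()
print(per_minute)
```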
Capability 3: Data Quality
Validate data before it enters your platform. Check for null values, spikes, connectivity issues, stale data, drift. Give each data point a quality flag so downstream users know what's trustworthy.
A Belgian water utility learned this the hard way when faulty smart meter readings generated €2 million invoices for residential customers. When you're making automated decisions based on data, quality validation isn't optional.
Note: For getting started, you can often skip this capability. Humans looking at trends can spot bad data visually. You only need automated quality checks when making automated decisions.
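When you do reach the point of automated decisions, the core mechanic is simple: every incoming point gets a quality flag before it enters the platform. A minimal Python sketch follows, with purely illustrative thresholds and tag names.

```python
# Point-level quality flagging: downstream users see a flag on every value
# and know what to trust. Thresholds and tags are illustrative only.
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class FlaggedValue:
    tag: str
    value: float | None
    timestamp: datetime
    quality: str   # "good", "missing", "stale", "spike"


def flag(tag: str, value: float | None, ts: datetime,
         last_value: float | None, last_ts: datetime | None,
         max_step: float = 50.0,
         max_age: timedelta = timedelta(minutes=5)) -> FlaggedValue:
    if value is None:
        return FlaggedValue(tag, value, ts, "missing")
    if last_ts is not None and ts - last_ts > max_age:
        return FlaggedValue(tag, value, ts, "stale")      # long gap since last update
    if last_value is not None and abs(value - last_value) > max_step:
        return FlaggedValue(tag, value, ts, "spike")      # implausible jump
    return FlaggedValue(tag, value, ts, "good")


now = datetime.now(timezone.utc)
print(flag("zone4/meter_12/flow", 980.0, now,
           last_value=4.2, last_ts=now - timedelta(seconds=30)))  # -> "spike"
```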
Capability 4: The Platform/Broker
The central repository where all your contextualized, quality-checked data lives. Could be on-premise, cloud, or hybrid depending on your industry and requirements. This is your single source of truth—the foundation everything else builds on.
Capability 5: Edge Analytics
Run advanced calculations where you have high-throughput data that you don't want to store entirely. Vision systems processing images at high frame rates. Vibration analysis at kilohertz frequencies. Edge ML models that calculate specific parameters rather than sending raw data streams.
You don't want to store every vibration waveform in your central platform—that's massive data volumes. But you do want to store calculated features from those waveforms.
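A small Python sketch of that pattern, assuming a 10 kHz vibration signal and illustrative feature choices: the raw waveform stays at the edge, and only a handful of calculated features go to the central platform.

```python
# Edge feature extraction: keep the kilohertz waveform local, forward only
# calculated features. Sampling rate and features are illustrative.
import numpy as np

FS = 10_000  # samples per second


def vibration_features(waveform: np.ndarray) -> dict:
    rms = float(np.sqrt(np.mean(waveform ** 2)))
    peak = float(np.max(np.abs(waveform)))
    spectrum = np.abs(np.fft.rfft(waveform))
    freqs = np.fft.rfftfreq(len(waveform), d=1.0 / FS)
    dominant_hz = float(freqs[np.argmax(spectrum[1:]) + 1])  # skip the DC bin
    return {"rms": rms, "peak": peak, "crest_factor": peak / rms,
            "dominant_hz": dominant_hz}


# One second of a synthetic 120 Hz vibration signal with noise.
t = np.arange(FS) / FS
waveform = 0.8 * np.sin(2 * np.pi * 120 * t) + 0.05 * np.random.randn(FS)
print(vibration_features(waveform))  # only this dict is sent upstream
```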
Capability 6: Visualization
Don't underestimate this. Most users just want to see a graph quickly. They don't need advanced analytics—they need simple, fast visualization of contextualized data. If you build sophisticated sharing capabilities but no easy visualization, adoption will suffer because your core user base can't easily access what they need.
Capability 7: Data Sharing
APIs and SDKs that let applications pull contextualized data from the platform. This enables integration with BI tools like Power BI, trending tools like Grafana, cloud analytics platforms like Databricks or Snowflake, or custom applications. Data should flow both ways—calculated results should feed back into the platform, making it smarter over time.
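As a sketch of that two-way flow, here is what it might look like against a hypothetical REST API; the endpoints, parameters, and tag names are invented for illustration, and a real platform's SDK will differ. The pattern is what matters: pull contextualized data out, push calculated results back in.

```python
# Hypothetical two-way data sharing over a platform REST API.
import requests

BASE = "https://platform.example.com/api/v1"   # hypothetical platform API

# Pull: a temperature trend for a specific run, already linked to its context.
resp = requests.get(
    f"{BASE}/timeseries",
    params={"tag": "Line3/Oven1/temperature", "run_id": "R-1041",
            "resolution": "1min"},
    timeout=30,
)
resp.raise_for_status()
points = resp.json()["points"]

# ...run whatever analysis you need on `points`...
anomaly_score = 0.12   # placeholder result

# Push: write the result back as a new contextualized series,
# so the next user (or model) can build on it.
requests.post(
    f"{BASE}/timeseries/Line3/Oven1/anomaly_score",
    json={"run_id": "R-1041", "value": anomaly_score},
    timeout=30,
).raise_for_status()
```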
You don't need all seven capabilities immediately. Start with the minimum that delivers value, then build from there based on a clear maturity model:
Dark Ages → Insights → Diagnose → Predict → Optimize
Phase 1: Insights (Minimum viable platform)
Connectivity, central storage, and simple visualization. This alone gets you out of the dark ages: suddenly, engineers can trend data without manually exporting CSVs. That's real value.
Phase 2: Diagnose (Where it gets powerful)
Add contextualization. Now you can compare temperature profiles across all batches of a specific product, linked to quality outcomes. You can diagnose problems by correlating operational context with process data. This is where ROI accelerates dramatically, because you're answering questions that were previously impossible.
Phase 3: Predict (Advanced analytics)
Build predictive models. Run ML at the edge. Make forecasts. But only once you have the foundation—trying to jump here without proper context and data quality leads to "AI projects" that never leave the lab.
Phase 4: Optimize (Closed loop)
Close the loop so that optimization feeds back into the process automatically. This is the autonomous agent territory from the reinforcement learning discussion, but you can't get here without the foundation.
Skip data quality initially if you're just providing insights to humans. They can visually identify bad data. Only invest in automated quality validation when you start making automated decisions.
If you're familiar with Unified Namespace (UNS), you're probably wondering how it relates to this capability map. The answer: UNS concepts span multiple capabilities but don't cover everything you need.
UNS core ideas show up across several capabilities: the connectivity layer, the central platform/broker, and the way data is structured and accessed.
But here's the critical point: if your UNS stops at the broker level, you're missing historical data storage, rich contextualization, quality validation, and most of the capabilities needed for a scalable platform.
The industry needs to move beyond the "UNS = MQTT broker" misconception. The aspirational state UNS describes—standardized, contextualized, accessible operational data—requires the full platform approach. Use UNS thinking for how you structure and access data, but don't mistake a broker for a complete platform.
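To make the distinction concrete, here is roughly what UNS thinking does give you: a standardized, hierarchical namespace and self-describing payloads. The topic layout below is illustrative, loosely following ISA-95 levels, and everything beyond naming and live access still comes from the other capabilities.

```python
# Illustration of UNS-style structure: a hierarchical topic and a
# self-describing payload. Layout and fields are illustrative.
import json
from datetime import datetime, timezone

topic = "acme/antwerp/packaging/line3/oven1/temperature"
payload = json.dumps({
    "value": 201.4,
    "unit": "degC",
    "timestamp": datetime.now(timezone.utc).isoformat(),
})

# Publishing this to a broker gives you consistent naming and live access,
# but not history, rich context (batches, recipes, lots), or quality flags.
print(topic, payload)
```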
When evaluating vendors, you'll discover an uncomfortable truth: every vendor claims they cover all seven capabilities. They don't.
The technical capabilities exist across the industry, but they're spread across different vendors and technologies. No single technology stack solves everything today. That's reality.
There are two sets of questions worth working through: questions to ask yourself first, and questions to ask vendors.
The detailed capability descriptions and questions are available on the IT OT Insider blog. David deliberately published the capability map under Creative Commons—use it, adapt it, make it your own. It's not proprietary intellectual property; it's a framework to help the industry move forward.
Data platforms aren't just about technology; they're about creating scalable infrastructure that gets smarter with each use case. Without this foundation, every analysis is a custom integration project. With it, you build once and reuse continuously.
Start with the minimum viable platform: connectivity, storage, and visualization. Get engineers away from manual CSV exports. Then add contextualization as quickly as possible—that's where diagnostic power and real ROI live.
The capability map gives you a common language for discussing what you're building and what vendors are selling. Use it to cut through marketing buzzwords and focus on concrete technical requirements.
Your competitors are building these platforms now. The manufacturers who get the foundation right—proper connectivity, rich contextualization, quality validation—will be positioned to leverage AI, advanced analytics, and autonomous systems. Those who keep doing custom integrations for every analysis will fall further behind.
The choice isn't whether to build a data platform. It's whether to build it strategically with a clear capability framework, or continue accumulating technical debt through one-off projects that never scale.