October 7, 2025
Your process engineer needs vibration data from the historian, maintenance records from the CMMS, and quality results from the lab system to diagnose why Line 3 keeps going down. So they export three CSV files, manually align timestamps in Excel, add context about which batches were running, and spend two days on data wrangling before they can even start the actual analysis.
Next week, another engineer needs similar data for a different problem. The entire process repeats from scratch. No economy of scale. No reusable infrastructure. Just expensive expertise doing manual data plumbing.
This is the reality at most manufacturing companies, even those investing heavily in "digital transformation." Everyone agrees you need a data platform, but what does that actually mean? What capabilities does it require? Where do you start?
David Ariens, who spent 12 years at BASF building industrial data infrastructure before co-founding IT OT Insider, created the Industrial Data Platform Capability Map to answer exactly these questions. It's the clearest framework available for understanding what you're actually building, and what vendors are actually selling you.
Twenty years ago, the manufacturing data landscape looked simple on the surface but was a nightmare in practice. You had historians storing time series data on the OT side. You had data warehouses and BI tools on the IT side. Getting data between them meant custom interfaces, PowerShell scripts, CSV files on FTP servers, duct tape and prayer.
The fundamental problem: these systems weren't designed to work together, and neither side could handle the other's data properly. IT systems couldn't process high-frequency time series data with the operational context manufacturing needs. OT systems couldn't scale or integrate with enterprise analytics.
Fast forward to today, and most companies still operate in silos: historians and SCADA on the OT side, warehouses and BI tools on the IT side, and manual exports bridging the gap.
A platform approach changes this fundamentally. Think of it like the evolution on the IT side from isolated databases to data lakes and data warehouses—a centralized place where all operational data comes together with proper context, creating a single source of truth that gets smarter with each use case you build.
Every time you clean data, improve your asset model, or build a machine learning model, that intelligence stays in the platform. Future users automatically benefit from better, richer data. That's when you finally achieve economy of scale with your data initiatives.
The capability map breaks down the platform vision into concrete technical requirements you can actually evaluate and implement. Here's what you need:
Capability 1: Connectivity Layer
You need to support both legacy industrial protocols (Modbus, Profibus, proprietary vendor protocols) and modern standards (MQTT, OPC UA). But connectivity extends beyond the plant floor—you also need database connections and APIs for grabbing data from cloud-based IoT services, MES systems, and ERP platforms.
Example: A water utility needs to connect their SCADA systems managing treatment plants and pumping stations, plus the smart water meters living in a vendor's cloud service. Combining both sources enables digital twins that can detect network leaks by balancing flow data across the entire distribution system.
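To make that concrete, here is a minimal Python sketch of what a connectivity layer does conceptually: wrap every source, whether an OPC UA server, a Modbus PLC, or a vendor's cloud API, behind one interface so the rest of the platform never deals with protocol details. All class and tag names are hypothetical, and a real implementation would call actual protocol libraries instead of returning placeholder values.

```python
# Sketch of a connectivity layer: every source is wrapped behind the same
# interface, so the rest of the platform never deals with protocol details.
# Class and tag names are hypothetical.
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Reading:
    tag: str            # e.g. "line3/pump1/vibration_rms"
    value: float
    unit: str
    timestamp: datetime
    source: str         # which connector produced it


class Connector(ABC):
    """Common contract every protocol adapter implements."""

    @abstractmethod
    def poll(self) -> list[Reading]:
        ...


class OpcUaConnector(Connector):
    def __init__(self, endpoint: str, node_ids: list[str]):
        self.endpoint, self.node_ids = endpoint, node_ids

    def poll(self) -> list[Reading]:
        # A real adapter would call an OPC UA client library here;
        # this just returns a placeholder reading.
        return [Reading("line3/pump1/vibration_rms", 2.7, "mm/s",
                        datetime.now(timezone.utc), self.endpoint)]


class CloudMeterConnector(Connector):
    def __init__(self, api_url: str):
        self.api_url = api_url

    def poll(self) -> list[Reading]:
        # Would call the vendor's REST API for smart-meter data.
        return [Reading("network/zone4/flow", 131.5, "m3/h",
                        datetime.now(timezone.utc), self.api_url)]


def collect(connectors: list[Connector]) -> list[Reading]:
    """One loop feeds the platform regardless of where data comes from."""
    return [r for c in connectors for r in c.poll()]
```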
Capability 2: Contextualization and Data Transformation
This is where most companies stumble. They think contextualization means organizing data in an ISA-95 tree structure. That's just the starting point.
Real contextualization means linking data to the physical and operational world: the assets that produced it, the batches and recipes that were running, the material lots and suppliers involved, and the maintenance and quality events that happened alongside it.
Without this context, you can't ask the right questions. You can't say "show me temperature profiles for all runs where we made chocolate chip cookies using Recipe A with flour from Supplier X." That level of specificity requires a rich contextual model, more like a graph than a tree.
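Here is a deliberately small Python sketch of that graph-like context, using plain dictionaries and made-up identifiers. The point is that once runs are linked to assets, recipes, and material lots, the cookie question above becomes a simple traversal.

```python
# A run links to an asset, a recipe, and a material lot, so "temperature
# profiles for Recipe A runs with flour from Supplier X" is just a lookup
# followed by a trend query. All identifiers are illustrative.
runs = [
    {"run_id": "R-1041", "asset": "Line3/Oven1", "product": "choc_chip_cookie",
     "recipe": "Recipe A", "flour_lot": "LOT-778",
     "start": "2025-09-01T06:00", "end": "2025-09-01T09:30"},
    {"run_id": "R-1042", "asset": "Line3/Oven1", "product": "choc_chip_cookie",
     "recipe": "Recipe B", "flour_lot": "LOT-912",
     "start": "2025-09-01T10:00", "end": "2025-09-01T13:30"},
]

material_lots = {
    "LOT-778": {"material": "flour", "supplier": "Supplier X"},
    "LOT-912": {"material": "flour", "supplier": "Supplier Y"},
}


def runs_for(recipe: str, supplier: str) -> list[dict]:
    """All runs that used the given recipe and a lot from the given supplier."""
    return [
        r for r in runs
        if r["recipe"] == recipe
        and material_lots[r["flour_lot"]]["supplier"] == supplier
    ]


# The returned time windows are then used to pull temperature trends
# for Line3/Oven1 from the historian or platform.
matching = runs_for("Recipe A", "Supplier X")
print([r["run_id"] for r in matching])   # -> ['R-1041']
```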
Data transformation handles the calculations and conversions needed before data enters the platform: unit conversions, aggregations, derived values.
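A quick sketch of those transformations in Python with pandas, using illustrative column names: a unit conversion, a derived value, and a time aggregation applied before the data lands in the platform.

```python
# Illustrative pre-ingest transformations: convert units, derive a value,
# and aggregate to the resolution the central platform actually needs.
import pandas as pd

raw = pd.DataFrame(
    {
        "oven_temp_F": [392.0, 393.8, 391.1, 394.2],
        "power_kW": [48.0, 50.5, 49.2, 51.0],
        "throughput_kg_h": [120.0, 118.0, 121.0, 119.0],
    },
    index=pd.date_range("2025-09-01 06:00", periods=4, freq="15s"),
)

# Unit conversion: Fahrenheit to Celsius.
raw["oven_temp_C"] = (raw["oven_temp_F"] - 32.0) * 5.0 / 9.0

# Derived value: specific energy consumption per kg of product.
raw["kWh_per_kg"] = raw["power_kW"] / raw["throughput_kg_h"]

# Aggregation: one-minute averages are often enough for the central platform.
per_minute = raw[["oven_temp_C", "kWh_per_kg"]].resample("1min").mean()
print(per_minute)
```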
Capability 3: Data Quality
Validate data before it enters your platform. Check for null values, spikes, connectivity issues, stale data, drift. Give each data point a quality flag so downstream users know what's trustworthy.
A Belgian water utility learned this the hard way when faulty smart meter readings generated €2 million invoices for residential customers. When you're making automated decisions based on data, quality validation isn't optional.
Note: For getting started, you can often skip this capability. Humans looking at trends can spot bad data visually. You only need automated quality checks when making automated decisions.
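When you do reach the point of automated decisions, the core mechanic is simple: every incoming point gets a quality flag before it enters the platform. A minimal Python sketch follows, with purely illustrative thresholds and tag names.

```python
# Point-level quality flagging: downstream users see a flag on every value
# and know what to trust. Thresholds and tags are illustrative only.
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class FlaggedValue:
    tag: str
    value: float | None
    timestamp: datetime
    quality: str   # "good", "missing", "stale", "spike"


def flag(tag: str, value: float | None, ts: datetime,
         last_value: float | None, last_ts: datetime | None,
         max_step: float = 50.0,
         max_age: timedelta = timedelta(minutes=5)) -> FlaggedValue:
    if value is None:
        return FlaggedValue(tag, value, ts, "missing")
    if last_ts is not None and ts - last_ts > max_age:
        return FlaggedValue(tag, value, ts, "stale")      # long gap since last update
    if last_value is not None and abs(value - last_value) > max_step:
        return FlaggedValue(tag, value, ts, "spike")      # implausible jump
    return FlaggedValue(tag, value, ts, "good")


now = datetime.now(timezone.utc)
print(flag("zone4/meter_12/flow", 980.0, now,
           last_value=4.2, last_ts=now - timedelta(seconds=30)))  # -> "spike"
```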
Capability 4: The Platform/Broker
The central repository where all your contextualized, quality-checked data lives. Could be on-premise, cloud, or hybrid depending on your industry and requirements. This is your single source of truth—the foundation everything else builds on.
Capability 5: Edge Analytics
Run advanced calculations where you have high-throughput data that you don't want to store entirely. Vision systems processing images at high frame rates. Vibration analysis at kilohertz frequencies. Edge ML models that calculate specific parameters rather than sending raw data streams.
You don't want to store every vibration waveform in your central platform—that's massive data volumes. But you do want to store calculated features from those waveforms.
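A small Python sketch of that pattern, assuming a 10 kHz vibration signal and illustrative feature choices: the raw waveform stays at the edge, and only a handful of calculated features go to the central platform.

```python
# Edge feature extraction: keep the kilohertz waveform local, forward only
# calculated features. Sampling rate and features are illustrative.
import numpy as np

FS = 10_000  # samples per second


def vibration_features(waveform: np.ndarray) -> dict:
    rms = float(np.sqrt(np.mean(waveform ** 2)))
    peak = float(np.max(np.abs(waveform)))
    spectrum = np.abs(np.fft.rfft(waveform))
    freqs = np.fft.rfftfreq(len(waveform), d=1.0 / FS)
    dominant_hz = float(freqs[np.argmax(spectrum[1:]) + 1])  # skip the DC bin
    return {"rms": rms, "peak": peak, "crest_factor": peak / rms,
            "dominant_hz": dominant_hz}


# One second of a synthetic 120 Hz vibration signal with noise.
t = np.arange(FS) / FS
waveform = 0.8 * np.sin(2 * np.pi * 120 * t) + 0.05 * np.random.randn(FS)
print(vibration_features(waveform))  # only this dict is sent upstream
```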
Capability 6: Visualization
Don't underestimate this. Most users just want to see a graph quickly. They don't need advanced analytics—they need simple, fast visualization of contextualized data. If you build sophisticated sharing capabilities but no easy visualization, adoption will suffer because your core user base can't easily access what they need.
Capability 7: Data Sharing
APIs and SDKs that let applications pull contextualized data from the platform. This enables integration with BI tools like Power BI, trending tools like Grafana, cloud analytics platforms like Databricks or Snowflake, or custom applications. Data should flow both ways—calculated results should feed back into the platform, making it smarter over time.
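As a sketch of that two-way flow, here is what it might look like against a hypothetical REST API; the endpoints, parameters, and tag names are invented for illustration, and a real platform's SDK will differ. The pattern is what matters: pull contextualized data out, push calculated results back in.

```python
# Hypothetical two-way data sharing over a platform REST API.
import requests

BASE = "https://platform.example.com/api/v1"   # hypothetical platform API

# Pull: a temperature trend for a specific run, already linked to its context.
resp = requests.get(
    f"{BASE}/timeseries",
    params={"tag": "Line3/Oven1/temperature", "run_id": "R-1041",
            "resolution": "1min"},
    timeout=30,
)
resp.raise_for_status()
points = resp.json()["points"]

# ...run whatever analysis you need on `points`...
anomaly_score = 0.12   # placeholder result

# Push: write the result back as a new contextualized series,
# so the next user (or model) can build on it.
requests.post(
    f"{BASE}/timeseries/Line3/Oven1/anomaly_score",
    json={"run_id": "R-1041", "value": anomaly_score},
    timeout=30,
).raise_for_status()
```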
You don't need all seven capabilities immediately. Start with the minimum that delivers value, then build from there based on a clear maturity model:
Dark Ages → Insights → Diagnose → Predict → Optimize
Phase 1: Insights (Minimum viable platform)
Connectivity, central storage, and simple visualization. This alone gets you out of the dark ages: suddenly, engineers can trend data without manually exporting CSVs. That's real value.
Phase 2: Diagnose (Where it gets powerful)
Add contextualization. Now you can compare temperature profiles across all batches of a specific product, linked to quality outcomes. You can diagnose problems by correlating operational context with process data. This is where ROI accelerates dramatically, because you're answering questions that were previously impossible.
Phase 3: Predict (Advanced analytics)
Build predictive models. Run ML at the edge. Make forecasts. But only once you have the foundation—trying to jump here without proper context and data quality leads to "AI projects" that never leave the lab.
Phase 4: Optimize (Closed loop)
Close the loop so that optimization feeds back into the process automatically. This is the autonomous agent territory from the reinforcement learning discussion, but you can't get here without the foundation.
Skip data quality initially if you're just providing insights to humans. They can visually identify bad data. Only invest in automated quality validation when you start making automated decisions.
If you're familiar with Unified Namespace (UNS), you're probably wondering how it relates to this capability map. The answer: UNS concepts span multiple capabilities but don't cover everything you need.
UNS core ideas show up across several capabilities: the connectivity layer, the central platform/broker, and the way data is structured and accessed.
But here's the critical point: if your UNS stops at the broker level, you're missing historical data storage, rich contextualization, quality validation, and most of the capabilities needed for a scalable platform.
The industry needs to move beyond the "UNS = MQTT broker" misconception. The aspirational state UNS describes—standardized, contextualized, accessible operational data—requires the full platform approach. Use UNS thinking for how you structure and access data, but don't mistake a broker for a complete platform.
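To make the distinction concrete, here is roughly what UNS thinking does give you: a standardized, hierarchical namespace and self-describing payloads. The topic layout below is illustrative, loosely following ISA-95 levels, and everything beyond naming and live access still comes from the other capabilities.

```python
# Illustration of UNS-style structure: a hierarchical topic and a
# self-describing payload. Layout and fields are illustrative.
import json
from datetime import datetime, timezone

topic = "acme/antwerp/packaging/line3/oven1/temperature"
payload = json.dumps({
    "value": 201.4,
    "unit": "degC",
    "timestamp": datetime.now(timezone.utc).isoformat(),
})

# Publishing this to a broker gives you consistent naming and live access,
# but not history, rich context (batches, recipes, lots), or quality flags.
print(topic, payload)
```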
When evaluating vendors, you'll discover an uncomfortable truth: every vendor claims they cover all seven capabilities. They don't.
The technical capabilities exist across the industry, but they're spread across different vendors and technologies. No single technology stack solves everything today. That's reality.
There are two sets of questions worth working through: questions to ask yourself first, and questions to ask vendors.
The detailed capability descriptions and questions are available on the IT OT Insider blog. David deliberately published the capability map under Creative Commons—use it, adapt it, make it your own. It's not proprietary intellectual property; it's a framework to help the industry move forward.
Data platforms aren't just about technology; they're about creating scalable infrastructure that gets smarter with each use case. Without this foundation, every analysis is a custom integration project. With it, you build once and reuse continuously.
Start with the minimum viable platform: connectivity, storage, and visualization. Get engineers away from manual CSV exports. Then add contextualization as quickly as possible—that's where diagnostic power and real ROI live.
The capability map gives you a common language for discussing what you're building and what vendors are selling. Use it to cut through marketing buzzwords and focus on concrete technical requirements.
Your competitors are building these platforms now. The manufacturers who get the foundation right—proper connectivity, rich contextualization, quality validation—will be positioned to leverage AI, advanced analytics, and autonomous systems. Those who keep doing custom integrations for every analysis will fall further behind.
The choice isn't whether to build a data platform. It's whether to build it strategically with a clear capability framework, or continue accumulating technical debt through one-off projects that never scale.