November 7, 2025

If you're leading data and analytics for a manufacturing organization, you face a unique challenge: industrial operations generate massive volumes of high-frequency data, yet roughly 90% of data transactions still happen on-premises, on the factory floor. How do you build analytics infrastructure that respects this reality while enabling the advanced capabilities your business needs?
Marcos Taccolini has spent three decades building industrial automation and data acquisition systems. As founder and CTO of TadSoft, he's seen what works and what fails when organizations try to implement manufacturing analytics. His perspective cuts through vendor hype about AI and machine learning to focus on building the foundational infrastructure that actually delivers value.
Here's what many organizations get wrong: they try to jump directly to machine learning and AI applications without establishing basic real-time data collection and metrics. This consistently fails. The systems aren't magic—they can't independently discover what data matters or define what constitutes good performance. You need humans with process expertise to teach them, and that requires months of operating with solid real-time metrics first.
The successful approach is phased. Start with real-time visibility into your processes. Establish key performance indicators. Understand your bottlenecks. Only then move to advanced analytics that build on this foundation. This guide provides the framework for building manufacturing analytics infrastructure that actually works, based on implementations across thousands of production lines.
Before designing any cloud strategy for manufacturing, understand this fundamental constraint: research across manufacturers shows approximately 90% of data transactions happen on-premises at factories.
Why this matters: The physical equipment sits on the factory floor. The operators work there. Most interactions with systems occur locally. Even when you deploy cloud analytics, the vast majority of transactions remain at the facility.
This is fundamentally different from consumer IoT where 100% of data flows to the cloud. A smart thermostat or doorbell exists solely to send data to backend services. Manufacturing equipment has local control loops, operator interfaces, and machine-to-machine coordination that must continue operating regardless of cloud connectivity.
Architectural implications: Your cloud infrastructure must integrate with data that remains on-premises. You can't simply publish everything to the cloud like consumer IoT applications. Cloud analytics tools need the ability to reach into factory data stores in real time when they need deeper analysis.
For example, you might aggregate key metrics in the cloud for cross-facility analysis. But when an analyst needs to investigate an anomaly, the cloud system must connect to on-premises historians to retrieve detailed time-series data. You're not replicating terabytes of high-frequency sensor data to the cloud—you're providing selective access when needed.
The flexible approach: Some calculations happen only in the cloud, perhaps comparing performance across global facilities. But those cloud systems must seamlessly connect to factory-level data stores rather than assuming everything lives centrally.
This hybrid architecture respects the reality that you cannot and should not move all manufacturing data off-premises. Design systems that work with this constraint rather than fighting it.
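As a rough sketch of that selective-access pattern, the snippet below pushes small aggregated KPIs to a cloud endpoint while pulling detailed history from the plant historian only when someone drills into an anomaly. The URLs, payload shapes, and function names are hypothetical placeholders, not any specific product's API.

```python
# Sketch of the selective-access pattern: aggregate KPIs in the cloud,
# but fetch detailed time-series from the on-premises historian on demand.
# Endpoint URLs and JSON shapes are hypothetical placeholders.
import requests

CLOUD_KPI_API = "https://analytics.example.com/api/kpi"              # hypothetical
PLANT_HISTORIAN_API = "https://plant1.example.com/historian/query"   # hypothetical

def push_hourly_kpis(site: str, kpis: dict) -> None:
    """Publish aggregated metrics (a small payload) to the cloud store."""
    requests.post(CLOUD_KPI_API, json={"site": site, "kpis": kpis}, timeout=10)

def fetch_detail_on_demand(tag: str, start: str, end: str) -> list:
    """Pull high-frequency raw data from the factory historian only when an
    analyst drills into an anomaly, instead of replicating everything."""
    resp = requests.get(
        PLANT_HISTORIAN_API,
        params={"tag": tag, "start": start, "end": end},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["values"]
```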
Manufacturing generates data volumes and frequencies that break traditional database approaches. Understanding why helps you architect appropriate storage infrastructure.
The volume problem: Consider a furnace with 1,000 measurements sampled every second. One day has 86,400 seconds, generating 86.4 million data points daily. In 10 days, you have 864 million records. Standard SQL tables fail quickly under this load.
Compare this to fleet management IoT—a truck sending GPS position every minute with 5-6 data points. That's 7,200 data points per truck daily. Manufacturing generates 10,000 times more data per device.
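A quick back-of-the-envelope check of those figures; the furnace actually comes out at roughly 12,000 times the truck's daily volume, so the "10,000 times" figure above is, if anything, conservative.

```python
# Back-of-the-envelope volume check for the furnace example above.
points_per_second = 1_000            # measurements sampled each second
seconds_per_day = 86_400
furnace_daily = points_per_second * seconds_per_day   # 86,400,000 points/day

truck_daily = 24 * 60 * 5            # one GPS fix per minute, ~5 values per fix

print(f"Furnace: {furnace_daily:,} points/day")        # 86,400,000
print(f"Truck:   {truck_daily:,} points/day")          # 7,200
print(f"Ratio:   {furnace_daily // truck_daily:,}x")   # 12,000x
```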
Time-series database characteristics: Time-series databases optimize for data with three characteristics: massive volume, temporal sequencing, and grouped timestamps. At any moment, you're recording 1,000 values all sharing the same timestamp. These databases compress and index based on these patterns.
Available options: Specialized vendors sell industrial historian packages designed specifically for this use case. Major cloud providers now offer time-series or stream data services in Azure, Google Cloud, and Oracle Cloud. These provide the storage and query optimizations you need without building custom infrastructure.
When to use them: Any manufacturing data collection at scale requires time-series storage. If you're reading hundreds or thousands of points at subsecond intervals, standard databases are the wrong tool. Plan for historians from the beginning rather than trying to retrofit after your SQL tables groan under billions of records.
This is not optional for industrial applications—it's a fundamental architectural requirement.
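To make the "grouped timestamps" point concrete, here is a minimal sketch of the write pattern a historian or time-series service is built for: many tag values sharing one timestamp, appended in sequential batches. The write_batch call and tag names are placeholders, not a specific historian API.

```python
# Minimal sketch of the write pattern a time-series store is optimized for:
# one timestamp shared by many tag values, appended in large sequential batches.
# "write_batch" stands in for whatever historian or cloud time-series API you use.
from datetime import datetime, timezone

def sample_furnace() -> dict:
    # Placeholder: in practice these values come from the PLC / gateway scan.
    return {f"furnace.tag_{i:04d}": 0.0 for i in range(1_000)}

batch = []
for _ in range(5):                       # five one-second scans, for illustration
    ts = datetime.now(timezone.utc)
    values = sample_furnace()            # 1,000 points, all sharing this timestamp
    batch.append({"time": ts.isoformat(), "values": values})

# write_batch(batch)  # hand the grouped, time-ordered batch to the historian
print(f"{len(batch)} scans x {len(batch[0]['values'])} tags queued")
```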
Modern manufacturing data comes in multiple forms—high-frequency sensor readings, maintenance logs, operator notes, quality reports. Your data architecture must handle heterogeneous data types without creating integration nightmares.
The evolution: Twenty years ago, data lived in rigid SQL tables with predefined schemas. Then document-oriented databases like MongoDB offered complete flexibility: any structure, any data. This pendulum swing created new problems when nothing had structure.
The convergence: Modern databases like PostgreSQL and Oracle now support both models. You maintain structured tables for core data but can attach unstructured documents to key records. The customer table has standard fields like email and address, but can also store arbitrary JSON documents with customer-specific attributes.
Why this matters for manufacturing: Equipment data follows standard schemas (temperature, pressure, RPM measurements in defined formats). But maintenance notes, quality observations, and process adjustments are unstructured text or images. Trying to force everything into rigid tables fails. Trying to maintain completely unstructured data loses critical relationships.
The practical approach: Use databases that support hybrid models. Structured fields for equipment IDs, timestamps, and measured values. Unstructured storage for operator notes, photos, and process documentation. The database maintains relationships between them while allowing appropriate flexibility for each data type.
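As one illustration of that hybrid model, the sketch below uses PostgreSQL's JSONB type through the psycopg2 driver: structured columns for the measurements, a JSON document for free-form operator context. The connection string, table name, and columns are assumptions made for the example.

```python
# Rough sketch of a hybrid schema in PostgreSQL: structured columns for
# equipment readings plus a JSONB column for free-form operator context.
# Connection string and table/column names are illustrative assumptions.
import psycopg2
from psycopg2.extras import Json

conn = psycopg2.connect("dbname=plant user=analytics")  # placeholder DSN
with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS readings (
            equipment_id TEXT        NOT NULL,
            ts           TIMESTAMPTZ NOT NULL,
            temperature  DOUBLE PRECISION,
            pressure     DOUBLE PRECISION,
            context      JSONB        -- unstructured notes, photo references, etc.
        )""")
    cur.execute(
        "INSERT INTO readings VALUES (%s, now(), %s, %s, %s)",
        ("furnace-07", 512.4, 1.8,
         Json({"operator_note": "slight vibration after burner swap",
               "photo_ref": "s3://qc-images/f07-0412.jpg"})),
    )
```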
The visualization challenge: Storing hybrid data is largely a solved problem. The emerging challenge is finding tools that can consume and visualize it effectively. The market for dashboards and business intelligence tools that handle structured and unstructured data together is growing rapidly because traditional BI tools struggle with this flexibility.
When evaluating analytics platforms, test how well they handle the hybrid data models you'll encounter in manufacturing environments.
Industrial IoT requires protocols optimized for bandwidth efficiency and dynamic device provisioning. Understanding why helps you select appropriate connectivity standards.
The bandwidth constraint: Many industrial sites still use 3G or 4G connections. Bandwidth matters. You need lightweight protocols that minimize data transmission. MQTT provides this; it's far more efficient than alternatives like HTTP for high-frequency data transmission.
The dynamic provisioning problem: In traditional industrial programming, adding a device or data point takes months of planning and configuration. True IoT solutions must let you add and remove devices easily. This requires protocols that support dynamic discovery and self-identification.
Sparkplug B extensions: Sparkplug B builds on MQTT to solve the asset modeling problem. Devices can identify themselves and describe their capabilities dynamically. You can create hierarchical namespaces (Factory 1, Line 2, Sensor 5) programmatically rather than through manual configuration.
This enables true IoT scalability. New devices join the network and announce their capabilities. Your systems discover them automatically and incorporate their data into existing models without manual intervention.
Why this matters: If you have IoT devices that connect easily but still require old-style manual configuration to integrate their data, you're not actually implementing IoT architecture. The protocol layer must support dynamic provisioning, not just data transport.
MQTT with Sparkplug B provides both bandwidth efficiency and the dynamic device management industrial IoT requires.
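A minimal publishing sketch using the open-source paho-mqtt client gives a feel for the Sparkplug-style namespace. The broker address is a placeholder, and the payload is JSON purely for readability; real Sparkplug B payloads are protobuf-encoded per the specification.

```python
# Sketch of publishing into a Sparkplug-style topic namespace over MQTT,
# using the open-source paho-mqtt client.
import json
import time

import paho.mqtt.client as mqtt

# paho-mqtt 1.x style constructor; 2.x additionally expects a CallbackAPIVersion argument.
client = mqtt.Client()
client.connect("broker.plant.example.com", 1883)   # hypothetical broker address

# Sparkplug-style hierarchical namespace: spBv1.0/<group>/<message type>/<edge node>/<device>
topic = "spBv1.0/Factory1/DDATA/Line2/Sensor5"

# JSON payload for readability only; real Sparkplug B payloads are protobuf-encoded.
payload = json.dumps({
    "timestamp": int(time.time() * 1000),
    "metrics": [{"name": "Temperature", "value": 512.4}],
})

client.publish(topic, payload, qos=1)
client.disconnect()
```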
OPC UA serves a different purpose than MQTT—it solves the protocol fragmentation problem that has plagued industrial automation for decades.
The fragmentation problem: Twenty years ago, there were 300+ communication protocols on factory floors. Even now, 50-70 protocols remain in active use. Each manufacturer historically created proprietary protocols to protect their market position. This creates integration nightmares.
OPC UA's role: OPC UA provides a unified interface that hides underlying protocol diversity. Your applications communicate with OPC UA servers. Those servers translate to specific device protocols: Modbus, Profinet, proprietary PLCs. You write integration logic once instead of supporting dozens of protocols.
Asset modeling: OPC UA specifications include standards for asset modeling and namespace organization. However, adoption of these standards has been slow. Most implementations use MQTT brokers or generic tools for data modeling rather than OPC UA's built-in capabilities.
The security driver: The migration from OPC Classic to OPC UA accelerated because of security and web accessibility. OPC Classic was difficult to secure and nearly impossible to access remotely through firewalls. OPC UA provides robust security and works easily over HTTP connections.
This matters for cloud integration. You can securely access factory floor devices through OPC UA from cloud systems without complex firewall configurations.
Where it fits: OPC UA excels as the common interface layer between diverse field devices and your applications. It's not replacing MQTT for lightweight device communication or real-time coordination. It's solving the multi-protocol integration problem that has plagued industrial automation.
Use OPC UA to simplify connectivity to existing equipment. Use MQTT/Sparkplug B for dynamic IoT device management. They solve different problems and often work together in complete architectures.
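For a sense of what "write integration logic once" looks like in practice, here is a minimal read through an OPC UA server, using the open-source python-opcua package as a stand-in for whatever client stack you adopt. The endpoint URL and node ID are illustrative assumptions.

```python
# Minimal sketch of reading one value through an OPC UA server.
from opcua import Client

client = Client("opc.tcp://plc-gateway.plant.example.com:4840")  # hypothetical endpoint
try:
    client.connect()
    # The OPC UA server translates this read into whatever underlying
    # protocol the device actually speaks (Modbus, Profinet, proprietary PLC...).
    node = client.get_node("ns=2;s=Line2.Furnace.Temperature")   # hypothetical node ID
    print("Temperature:", node.get_value())
finally:
    client.disconnect()
```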
Before attempting machine learning or advanced analytics, establish basic real-time monitoring of your manufacturing processes. This is not optional—it's the foundation that makes everything else possible.
The fundamental principle: You can only optimize what you can measure. Without visibility into current operations, you're operating blind. Real-time metrics provide that visibility.
Starting with OEE: Overall Equipment Effectiveness (OEE) provides a standard starting point for manufacturing. It measures three critical dimensions: quality (good parts versus defects), availability (uptime versus downtime), and performance (actual speed versus rated capacity).
OEE is well-defined and applicable across almost all production environments. Modern IoT gateways make implementation straightforward—you don't need to touch existing infrastructure. Deploy gateways to collect data, push to the cloud, and run calculations there.
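The calculation itself is simple once the three inputs are collected. A minimal sketch, with made-up shift figures:

```python
# Straightforward OEE calculation from the three dimensions described above.
# The sample figures are invented for illustration.
def oee(availability: float, performance: float, quality: float) -> float:
    """OEE = Availability x Performance x Quality, each expressed as 0..1."""
    return availability * performance * quality

planned_time, downtime = 480.0, 47.0                    # minutes in the shift
ideal_cycle, total_parts, good_parts = 1.0, 402, 389    # minutes/part, part counts

availability = (planned_time - downtime) / planned_time
performance = (ideal_cycle * total_parts) / (planned_time - downtime)
quality = good_parts / total_parts

print(f"OEE = {oee(availability, performance, quality):.1%}")   # ~81%
```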
Immediate payback: Some real-time monitoring systems pay for themselves in the first month. When you prevent one expensive equipment failure or avoid stopping a production line unnecessarily, the cost savings can exceed the entire system investment.
The case for a $50,000-$300,000 monitoring system becomes obvious when unplanned downtime costs $500,000 or more. Engineers get professional satisfaction from implementing modern systems while simultaneously delivering clear financial returns.
Beyond OEE: Real-time monitoring extends to maintenance prediction. When a heater requires increasing energy to maintain temperature, you detect degradation before failure. When equipment performance drifts from expected ranges, you schedule preventive maintenance during planned downtime.
The key is having current visibility. Dashboards showing production status, equipment health, and performance metrics against planned workloads enable proactive management rather than reactive firefighting.
The critical lesson: Companies that try jumping directly to machine learning and AI without establishing basic real-time metrics consistently fail. The advanced tools need humans with process expertise to define what variables matter and what constitutes good performance. You can't provide that guidance without understanding your current operations through real-time monitoring.
Invest in foundational visibility before advanced analytics. This sequential approach delivers value at each stage while building toward more sophisticated capabilities.
Machine learning can deliver significant value in manufacturing, but only when deployed strategically on proper foundations. Understanding when and how to apply it prevents expensive failures.
The teaching requirement: Machine learning systems are not magic. They don't independently discover what matters in your processes. You must teach them by identifying relevant variables, defining good outcomes, and providing examples of desired performance.
This requires process expertise and data. You need months to a year of operations with solid real-time metrics before you have sufficient understanding to guide machine learning effectively.
Predictive maintenance: One high-value application is predicting equipment maintenance needs before failures occur. Rather than relying on manufacturer specifications for maintenance schedules, machine learning analyzes actual equipment performance over time.
For example, monitor how much energy a heater consumes to maintain target temperature. As it degrades, energy requirements increase. Machine learning detects this degradation pattern and predicts when maintenance will be needed based on real performance rather than theoretical schedules.
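The sketch below is a deliberately simple stand-in for that idea: a linear trend fitted to daily energy readings, with a made-up "+10% over baseline" threshold. A production system would use a trained model and validated thresholds; this only illustrates the degradation-detection pattern.

```python
# Simple linear-trend stand-in for the heater example: fit a trend to daily
# energy readings and estimate when a maintenance threshold will be crossed.
# Readings and the +10% threshold are invented for illustration.
import numpy as np

# kWh needed to hold the target temperature, one reading per day (illustrative)
energy_per_day = np.array([101.2, 101.5, 101.4, 102.1, 102.8, 103.6, 104.9, 106.1])
days = np.arange(len(energy_per_day))

slope, intercept = np.polyfit(days, energy_per_day, 1)    # kWh increase per day
baseline = energy_per_day[:3].mean()
alarm_level = baseline * 1.10                             # assume +10% means "service soon"

days_to_alarm = (alarm_level - energy_per_day[-1]) / slope if slope > 0 else float("inf")
print(f"Trend: +{slope:.2f} kWh/day, ~{days_to_alarm:.0f} days until maintenance threshold")
```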
The reality advantage: Manufacturer specifications may not reflect your actual operating conditions; perhaps the equipment was purchased used, or environmental factors differ from specifications. Machine learning trained on your actual operations provides predictions based on reality, not theory.
Process optimization: In continuous process manufacturing, machine learning optimizes setpoints and control parameters. Rather than relying on static configurations, the system learns optimal settings for different conditions and materials.
Where it fails: Machine learning fails when deployed without adequate foundational data. You need reliable data acquisition, clear identification of variables to monitor, and defined success metrics. Attempting to deploy advanced AI on immature data infrastructure wastes time and money while generating frustration.
The phased approach: Phase 1: Deploy real-time monitoring and establish KPIs. Operate for months to understand your processes and identify optimization opportunities.
Phase 2: Select specific use cases where machine learning can improve on human-defined rules. Start with high-value problems like predictive maintenance or quality optimization.
Phase 3: Deploy machine learning with proper teaching—domain experts defining relevant variables and success criteria based on Phase 1 understanding.
This sequential approach succeeds where attempts to skip directly to AI fail. Machine learning delivers genuine improvements, but only when built on solid foundations.
Digital twins provide operational benefits beyond the marketing hype, but understanding what they actually are and when they deliver value matters for implementation decisions.
Demystifying digital twins: A digital twin is a computational model representing a physical device or system. If a temperature sensor outputs 5 milliamps at 25 degrees and 15 milliamps at 50 degrees, a software function implementing this relationship is a digital twin of that sensor.
Scale this to entire production lines—model every instrument, machine, and process step. That's a complete digital twin of your operation.
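Taken literally, that sensor twin is a few lines of code, assuming the sensor is linear between the two calibration points:

```python
# The sensor example above as code: a digital twin can be as small as one
# function capturing the 5 mA @ 25 C / 15 mA @ 50 C relationship,
# assuming linear behavior between the calibration points.
def temperature_sensor_twin(temperature_c: float) -> float:
    """Expected output current (mA) for a given temperature."""
    return 5.0 + (temperature_c - 25.0) * (15.0 - 5.0) / (50.0 - 25.0)

assert temperature_sensor_twin(25.0) == 5.0
assert temperature_sensor_twin(50.0) == 15.0
print(temperature_sensor_twin(37.5))   # 10.0 mA at the midpoint
```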
Three primary applications:
Pre-production simulation: Before building physical production lines, simulate operations using digital twins. Test process flows, identify bottlenecks, and optimize configurations before capital investment. This is increasingly common in manufacturing design.
Operational comparison: Run your digital twin alongside your real production line. Feed actual sensor data to the model and compare predictions with reality. This creates a learning loop: refine the model when its predictions miss, and once it is accurate, treat deviations as early warnings of equipment problems.
Over time, as your digital twin becomes more reliable, it can increasingly predict operational issues before they occur based on deviations from expected behavior.
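A minimal sketch of that comparison loop, using the same linear sensor model and an assumed tolerance:

```python
# Sketch of the operational-comparison loop: feed the same conditions to the
# twin and to the plant, and flag readings that drift beyond a tolerance.
# The tolerance, model, and sample values are assumptions for illustration.
TOLERANCE_MA = 0.5   # acceptable |predicted - measured| in mA

def twin_predict_ma(temperature_c: float) -> float:
    """Same linear sensor model as the earlier sketch (5 mA @ 25 C, 15 mA @ 50 C)."""
    return 5.0 + (temperature_c - 25.0) * 0.4

def check_against_twin(temperature_c: float, measured_ma: float) -> bool:
    predicted = twin_predict_ma(temperature_c)
    deviation = abs(predicted - measured_ma)
    if deviation > TOLERANCE_MA:
        print(f"Anomaly: expected {predicted:.2f} mA, measured {measured_ma:.2f} mA")
        return False
    return True

check_against_twin(30.0, 7.1)    # within tolerance (twin predicts 7.0 mA)
check_against_twin(30.0, 9.3)    # flagged: model is wrong or the sensor is degrading
```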
Planning and testing: Use validated digital twins to test process changes, equipment modifications, or new materials before implementing them physically. Run simulations to understand impacts without disrupting actual production.
The investment timeline: Digital twins require initial investment before returning value. You need time to validate models and refine them based on actual operations. This might take weeks or months depending on complexity.
Early benefits come from model refinement and anomaly detection. Later benefits come from using validated models for planning and optimization. Don't expect immediate returns—this is a medium-term investment.
The prerequisite: To build and validate digital twins, you need comprehensive real-time data acquisition. You cannot compare model predictions with reality without accurate, continuous measurements of actual operations.
This reinforces the theme: establish foundational real-time monitoring before attempting advanced capabilities like digital twins.
Manufacturing analytics is not about deploying the latest AI technology. It's about building layered infrastructure that delivers value at each stage while enabling progressively more sophisticated capabilities.
The foundation is real-time visibility—understanding current operations through continuous monitoring and key performance indicators. This pays for itself through operational improvements while providing the data foundation for everything else.
The next layer is predictive analytics—using historical patterns to anticipate equipment failures, optimize processes, and improve quality. This requires the data foundation from Phase 1 plus domain expertise to guide the analytics effectively.
The advanced layer is digital twins and comprehensive optimization—validated models that predict system behavior and enable testing changes before physical implementation. This builds on Phases 1 and 2 to deliver sophisticated capabilities.