November 8, 2025

Apache Kafka for Industrial IoT: Building Event-Driven Manufacturing Architecture

If you're leading data infrastructure for a manufacturing enterprise, you face competing demands: operations teams need millisecond responses at the factory floor while business analytics require aggregated data from facilities worldwide. Traditional architectures force you to choose between real-time operational capabilities and enterprise-wide analytics.

Kai Waehner, Field CTO at Confluent, has spent years implementing Apache Kafka across manufacturing operations globally. His perspective addresses how event streaming architectures enable both local real-time processing and global analytics without forcing artificial compromises.

Here's what makes event streaming different from traditional data architectures: rather than batch-moving data between systems or maintaining point-to-point integrations, you create a central nervous system where events flow continuously. Systems publish and subscribe to event streams independently, enabling loose coupling that scales as you add consumers without touching producers.

The insight most organizations miss: Kafka is not just messaging. It's messaging combined with storage, data integration, and stream processing. This combination eliminates the need for separate ETL tools, message queues, and integration middleware. You run one infrastructure that handles ingestion, transformation, routing, and storage.
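As a rough illustration of that decoupling, here is a minimal sketch of publishing machine events with the standard Java producer. The broker address, topic name, and JSON payload are placeholder assumptions; any number of downstream consumers can subscribe to the same topic later without this producer changing.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class MachineEventPublisher {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Key by machine ID so all events for one machine stay ordered on one partition.
            String machineId = "press-17";
            String event = "{\"temperatureC\": 87.4, \"rpm\": 1200}";
            producer.send(new ProducerRecord<>("machine.telemetry", machineId, event));
            // The producer neither knows nor cares which systems consume this topic.
        }
    }
}
```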

This guide provides the framework for implementing event streaming in manufacturing based on deployments processing trillions of events daily across automotive, aerospace, and heavy industry.

Event Streaming: Beyond Point-in-Time Processing

Before implementing any technology, understand what event streaming actually means and why it differs from traditional data processing approaches.

The core concept: Event streaming means continuously processing data flows rather than batch-processing snapshots. Events arrive from sensors, application logs, transactions, and systems. Rather than accumulating batches for periodic processing, you analyze and act on events as they occur.

Real-time definition: When Kafka vendors talk about "real-time," they mean milliseconds to seconds, not microseconds. This is not the hard real-time of PLC control loops. It's fast enough for predictive maintenance alerts, quality decisions before parts reach the next assembly stage, or coordinating actions across distributed facilities.

Hard real-time (microseconds) remains in PLCs and control systems. Event streaming (milliseconds) sits above this layer, enabling decisions too fast for batch processing but not requiring control-loop timing.

The value comes from correlation: The real power emerges when correlating events from multiple sources. A vibration sensor reading alone means little. That same reading correlated with production speed, ambient temperature, maintenance history, and similar equipment across facilities reveals patterns enabling predictive maintenance.

Traditional architectures require moving data to data lakes, running batch jobs, then feeding results back to operations—too slow for many manufacturing use cases. Event streaming processes correlations in-flight while data is in motion.

Storage and messaging together: Unlike pure messaging systems that deliver and forget, Kafka stores events. This enables replay, multiple consumers reading at different speeds, and historical analysis. Some consumers process events in milliseconds, others aggregate hourly, still others run daily batch analytics—all from the same event stream.
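A minimal sketch of that replay property, assuming the machine.telemetry topic from the earlier example: a brand-new consumer group configured to start from the earliest retained offset rebuilds its view from history, while live consumers of the same topic continue unaffected.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class HistoricalReplayConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "quality-backfill");  // a brand-new group...
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest"); // ...starts at the oldest retained event
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("machine.telemetry"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Re-process history at whatever pace this consumer needs;
                    // real-time consumers on the same topic are unaffected.
                    System.out.printf("%s @ %d: %s%n", record.key(), record.timestamp(), record.value());
                }
            }
        }
    }
}
```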

This combination of continuous processing and durable storage eliminates the choice between real-time operations and historical analytics. You get both.

Kafka Versus MQTT: Complementary Not Competing

Understanding when to use Kafka versus MQTT prevents architectural mistakes. They solve different problems and often work together.

MQTT's role: MQTT is lightweight publish-subscribe messaging designed for constrained devices and unreliable networks. It's perfect for hundreds of thousands of sensors in oil fields with intermittent connectivity or connected vehicles with bandwidth constraints.

Kafka's role: Kafka provides messaging plus storage plus data integration plus stream processing. It's designed for stable networks and scales to massive throughput—single clusters handling 10 gigabytes per second. But it's not optimized for hundreds of thousands of direct device connections or highly unreliable networks.

The complementary pattern: Most industrial deployments combine both. MQTT connects massive numbers of edge devices with potentially unreliable connectivity. An MQTT broker or IoT gateway aggregates from devices and publishes to Kafka. From there, Kafka handles enterprise-scale event distribution, storage, and processing.

Think of it as networking layers. MQTT handles the device access layer where constraints matter. Kafka handles the enterprise integration and processing layer where scale, durability, and sophisticated processing matter.
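A hedged sketch of that bridge pattern, using the Eclipse Paho MQTT client and the standard Java producer; in practice an MQTT broker with a Kafka connector or an IoT gateway usually plays this role, and the broker URLs and topic names here are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttException;
import org.eclipse.paho.client.mqttv3.MqttMessage;

public class MqttToKafkaBridge {
    public static void main(String[] args) throws MqttException {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "edge-kafka:9092");   // assumed edge cluster
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        KafkaProducer<String, String> producer = new KafkaProducer<>(props);

        MqttClient mqtt = new MqttClient("tcp://plant-broker:1883", MqttClient.generateClientId());
        mqtt.connect();
        // Subscribe to all sensor topics; forward each MQTT message into one Kafka topic,
        // keyed by the originating MQTT topic so per-device ordering is preserved.
        mqtt.subscribe("sensors/#", (String topic, MqttMessage message) -> {
            String payload = new String(message.getPayload(), StandardCharsets.UTF_8);
            producer.send(new ProducerRecord<>("plant.sensor.raw", topic, payload));
        });
    }
}
```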

When to use which: Use MQTT when connecting hundreds of thousands of devices, especially with unreliable connectivity or severe bandwidth constraints. Use Kafka when integrating systems at enterprise scale, processing high-volume event streams, or coordinating across facilities and business systems.

Don't force everything through one technology. Architectural elegance comes from using appropriate technologies for different problems.

Hybrid Deployment: Edge and Cloud Working Together

Most manufacturing implementations deploy Kafka in hybrid architectures—some processing at the edge, some in the cloud. Understanding when and why helps you architect appropriately.

The cloud-first reality: Nearly all large manufacturers have cloud-first strategies. Cloud provides elasticity, integration with SaaS applications, and capabilities difficult to replicate on-premises. Modern ERP and analytics increasingly run in the cloud.

Why edge deployment still matters:

Latency requirements: Predictive maintenance alerts, quality decisions in assembly lines, or coordination within facilities can't wait for cloud round-trips. Processing at the edge delivers millisecond responses where needed.

Cost optimization: Connecting thousands of sensors and replicating all data to the cloud becomes expensive quickly. Many implementations start cloud-only during pilots but add edge processing for production to pre-filter and aggregate before cloud transmission.

Data sovereignty: Some regions legally prohibit data from leaving borders. China is a clear example—automotive companies selling there must process Chinese factory data entirely within China. Edge Kafka clusters in those regions operate independently from global infrastructure.

The replication pattern: Typical architectures run small Kafka clusters at factories or edge sites for local processing and integration with MES or SCADA systems. These replicate filtered, aggregated data to large Kafka clusters in the cloud for enterprise analytics, cross-facility comparisons, and long-term storage.
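One possible shape for the edge half of this pattern, sketched with Kafka Streams: reduce raw vibration samples to one-minute peaks per machine before the resulting topic is replicated to the cloud cluster. Topic names, serdes, and the aggregation choice are assumptions; the replication itself would be handled by Cluster Linking or MirrorMaker 2, not shown here.

```java
import java.time.Duration;
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.TimeWindows;

public class EdgePreAggregator {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        builder.stream("plant.vibration.raw", Consumed.with(Serdes.String(), Serdes.Double()))
               .groupByKey(Grouped.with(Serdes.String(), Serdes.Double()))
               // Keep only the per-minute peak per machine instead of every raw sample.
               .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(1)))
               .reduce(Math::max)
               .toStream()
               .map((windowedKey, peak) -> KeyValue.pair(windowedKey.key(), peak))
               // This much smaller topic is what gets replicated to the cloud cluster.
               .to("plant.vibration.1min.max", Produced.with(Serdes.String(), Serdes.Double()));

        Properties config = new Properties();
        config.put(StreamsConfig.APPLICATION_ID_CONFIG, "edge-pre-aggregator");
        config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "edge-kafka:9092");
        new KafkaStreams(builder.build(), config).start();
    }
}
```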

Confluent provides replication technology linking clusters in real-time regardless of location. Some implementations use replication for disaster recovery and business continuity. Others use it for data aggregation across global operations.

Regional deployments: Even cloud-only implementations usually deploy regionally—one cluster in the US, one in Europe, one in Asia. This reduces latency, cuts data transfer costs, and addresses data privacy regulations. Global manufacturers coordinate across regional clusters rather than forcing everything through one location.

The insight: hybrid isn't an architecture choice—it's the reality of global manufacturing operations. Design for it from the beginning rather than retrofitting later.

Connectivity to Industrial Control Systems

Kafka needs access to industrial data, but factory floor systems use hundreds of proprietary protocols. Understanding connectivity options prevents integration nightmares.

Open interfaces work directly: When systems provide open interfaces—TCP, HTTP, REST APIs, or even MQTT—Kafka connects directly. Kafka Connect, the built-in integration framework, provides connectors for standard protocols without additional middleware.

Proprietary protocols need intermediaries: The reality of OT environments is proprietary systems and legacy equipment. Siemens PLCs, Modbus devices, and specialized SCADA systems don't speak protocols Kafka understands directly.

Two approaches work:

Open source options: Apache PLC4X provides open-source connectivity to common PLCs and industrial protocols. If you're comfortable with open source integration work, this eliminates vendor dependencies for common industrial equipment.

IoT gateways and existing infrastructure: Many implementations leverage existing industrial software—OSIsoft PI, Siemens MindSphere, or specialized IoT gateways. These systems handle the "last mile" connection to proprietary equipment, then provide open interfaces Kafka can consume.
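As a hedged illustration of consuming such an open interface, the sketch below polls a hypothetical gateway REST endpoint and publishes each snapshot to Kafka. The endpoint path, polling rate, and topic name are assumptions, and a pre-built HTTP source connector could replace the hand-rolled loop entirely.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class GatewayPoller {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "edge-kafka:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        KafkaProducer<String, String> producer = new KafkaProducer<>(props);

        HttpClient http = HttpClient.newHttpClient();
        // Hypothetical gateway endpoint that fronts the proprietary PLC protocol.
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://iot-gateway.plant.local/api/tags/line3/current")).GET().build();

        while (true) {
            HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
            // Publish the gateway's JSON snapshot as-is; downstream stream processing normalizes it.
            producer.send(new ProducerRecord<>("plant.line3.plc", "line3", response.body()));
            Thread.sleep(1000); // 1 Hz polling for the sketch; real deployments tune this per signal
        }
    }
}
```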

The practical approach: Most deployments combine both. New equipment with open interfaces connects directly to Kafka or through lightweight open-source components. Legacy equipment uses existing gateways that already solve proprietary protocol challenges.

Migration paths: Some organizations maintain parallel infrastructure long-term. Others gradually migrate to more open architectures as equipment reaches the end of its lifecycle. The decision depends on vendor strategies—some are moving toward open standards under market pressure, others remain stubbornly proprietary.

The key principle: Don't rip out working infrastructure to force everything through Kafka. Complement existing systems during transitions. Over time, you can choose whether to maintain hybrid approaches or consolidate to more open architectures.

Enterprise Application Integration

Once you have industrial data in Kafka, integrating with enterprise applications becomes straightforward. This is where Kafka was originally designed to excel.

Built-in integration framework: Kafka Connect provides out-of-the-box connectivity to enterprise systems without additional middleware. You're not adding another integration layer—Kafka itself handles integration.

Broad system support: Connectors exist for traditional databases (Oracle, MySQL, PostgreSQL), modern data stores (MongoDB, Elasticsearch, Cassandra), cloud data warehouses (Snowflake, BigQuery, Redshift), and enterprise applications (Salesforce, SAP, ServiceNow).

This means your single Kafka infrastructure connects everything from factory floor to cloud analytics to enterprise applications. No separate ETL tools, no additional message queues, no custom integration code for each connection.
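A sketch of what wiring up one such connector looks like: Kafka Connect exposes a REST API, and registering a connector is a single POST. The connector class and properties below follow the Confluent JDBC source connector but are illustrative; exact property names vary by connector and version, and the database and hostnames are assumptions.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RegisterJdbcSource {
    public static void main(String[] args) throws Exception {
        // Illustrative config: stream new MES order rows into Kafka topics prefixed with "mes."
        String connector = """
            {
              "name": "mes-orders-source",
              "config": {
                "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
                "connection.url": "jdbc:postgresql://mes-db:5432/mes",
                "mode": "incrementing",
                "incrementing.column.name": "order_id",
                "topic.prefix": "mes."
              }
            }
            """;

        HttpResponse<String> response = HttpClient.newHttpClient().send(
                HttpRequest.newBuilder(URI.create("http://connect:8083/connectors"))
                        .header("Content-Type", "application/json")
                        .POST(HttpRequest.BodyPublishers.ofString(connector))
                        .build(),
                HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```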

The custom connector option: When no pre-built connector exists—perhaps for proprietary internal systems or specialized data formats—building custom connectors is straightforward. The Kafka ecosystem provides frameworks making connector development manageable for standard engineering teams.

The OT-IT security challenge: The primary integration challenge isn't technical capabilities—it's security policies. IT systems routinely make outbound requests to other services. OT security policies typically prohibit inbound connections from IT to OT systems.

This requires architectural patterns where OT systems initiate connections. Kafka wasn't originally designed for this, so Confluent provides capabilities like cluster linking and dedicated routing that maintain OT security policies while enabling data flow to IT systems.

The practical pattern: Deploy Kafka in the OT zone at the factory. It collects from industrial systems and initiates outbound connections to IT systems or cloud platforms. This respects security boundaries while enabling integration.

The result: you break down OT-IT silos without violating security policies that keep industrial operations safe.

Real-Time Analytics at the Edge

Running Kafka at the edge enables analytics where latency matters most—directly on the factory floor before parts reach the next assembly stage or equipment degrades further.

Beyond simple rules: Some implementations start with basic business rules—if temperature exceeds threshold, trigger alert. This adds value but barely scratches the surface of edge analytics capabilities.

Machine learning model deployment: Advanced implementations train models in the cloud using big data sets, then deploy trained models to edge Kafka applications. TensorFlow, H2O.ai, or other ML frameworks create models. Kafka applications at factories load these models and apply them to every event in real-time.

Quality assurance example: In assembly lines, edge Kafka applications apply image recognition or deep learning models to every product at production speeds. The decision—proceed to next stage, reprocess, or scrap—happens in milliseconds before the product moves on.

This requires processing potentially millions of events per second with consistent low latency. Batch processing that analyzes quality afterward is too late—you've already invested in subsequent assembly stages.
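A hedged sketch of this pattern with Kafka Streams: every inspection frame is scored by a locally loaded model and routed to a pass or reject topic in-flight. The QualityModel interface, topic names, and threshold are assumptions; in practice the model would be a TensorFlow or ONNX artifact exported from central training.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;

public class InlineQualityCheck {

    /** Hypothetical wrapper around a model trained centrally and shipped to the edge. */
    interface QualityModel {
        double defectProbability(byte[] inspectionImage);
    }

    public static void main(String[] args) {
        QualityModel model = loadModel("/models/surface-defect-v12"); // assumed local model path
        StreamsBuilder builder = new StreamsBuilder();

        KStream<String, byte[]> frames =
                builder.stream("line3.camera.frames", Consumed.with(Serdes.String(), Serdes.ByteArray()));

        // Score every frame in-flight so the routing decision lands before the next station.
        KStream<String, Double> scored = frames.mapValues(model::defectProbability);
        scored.filter((partId, p) -> p > 0.8)
              .to("line3.qa.reject", Produced.with(Serdes.String(), Serdes.Double()));
        scored.filter((partId, p) -> p <= 0.8)
              .to("line3.qa.pass", Produced.with(Serdes.String(), Serdes.Double()));

        Properties config = new Properties();
        config.put(StreamsConfig.APPLICATION_ID_CONFIG, "line3-inline-qa");
        config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "edge-kafka:9092");
        new KafkaStreams(builder.build(), config).start();
    }

    static QualityModel loadModel(String path) {
        // Stand-in scorer for the sketch; swap in a TensorFlow Java or ONNX Runtime session here.
        return image -> 0.0;
    }
}
```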

Predictive maintenance: Rather than waiting for equipment failure, edge analytics detect degradation patterns. Vibration analysis, energy consumption changes, or temperature drift trigger maintenance scheduling during planned downtime.

The difference from cloud-based predictive maintenance: edge applications make decisions in milliseconds without network round-trips. When prediction indicates imminent failure, you respond immediately rather than after cloud analysis completes.

The deployment pattern: Train models centrally where you have computational resources and aggregate data from all facilities. Deploy trained models to edge Kafka clusters for inference. Update models periodically as you gather more training data.

This separates computationally expensive training (batch, centralized) from lightweight inference (real-time, distributed). You get both sophisticated models and millisecond edge decisions.

Kafka as Data Historian: Rethinking Industrial Data Storage

Traditional industrial historians face challenges at Industry 4.0 scale. Kafka provides an alternative approach that solves scale problems while opening architecture.

The scale problem: Traditional historians like SCADA systems and proprietary middleware weren't designed for the data volumes Industry 4.0 generates. Adding thousands of high-frequency sensors overwhelms systems built for hundreds.

Two architectural patterns emerge:

Kafka as pre-processor: Some implementations use Kafka for pre-processing and filtering before data reaches traditional historians. Kafka consumes raw sensor streams, aggregates and filters, then feeds manageable volumes to existing historians. This extends historian life while adding capabilities they lack.

Kafka as historian replacement: Others implement Kafka as the primary historian. Kafka is inherently a distributed, scalable log where events are appended with timestamps and guaranteed ordering. This is exactly what a historian should be.

Confluent adds tiered storage enabling long-term retention of terabytes or petabytes in Kafka without separate archival systems. You can query historical data directly from Kafka rather than extracting to separate analytics databases.
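As a rough sketch of querying the log like a historian, the consumer below seeks to the offset closest to a requested timestamp and reads forward from there; with tiered storage the same API reaches older data transparently. Topic, partition, and time range are assumptions, and a single partition is used for brevity.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class HistorianRangeQuery {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false"); // ad-hoc query, no consumer group needed
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        long from = Instant.parse("2025-10-01T00:00:00Z").toEpochMilli();
        TopicPartition tp = new TopicPartition("machine.telemetry", 0);

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.assign(List.of(tp));
            Map<TopicPartition, Long> query = new HashMap<>();
            query.put(tp, from);
            // Resolve the first offset at or after the requested timestamp (null if none exists).
            OffsetAndTimestamp start = consumer.offsetsForTimes(query).get(tp);
            consumer.seek(tp, start.offset());
            for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(5))) {
                System.out.printf("%d %s %s%n", record.timestamp(), record.key(), record.value());
            }
        }
    }
}
```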

The open architecture advantage: Proprietary historians lock you into specific vendors with limited integration options and escalating license costs. Kafka-based historians remain open—you choose analytics tools, storage approaches, and integration patterns without vendor constraints.

The practical decision: Legacy historians often continue for existing processes while new data flows through Kafka. Over time, organizations choose whether to maintain parallel systems or consolidate. The flexibility to make this decision based on actual needs rather than vendor limitations is itself valuable.

For greenfield deployments or significant expansions, Kafka as historian avoids introducing proprietary systems that become integration bottlenecks and scale limitations later.

Postmodern ERP: Escaping Monolithic Architectures

Gartner coined "postmodern ERP" to describe architectures escaping monolithic enterprise systems. Kafka enables this transition in manufacturing environments.

The monolithic problem: Traditional ERP, MES, and APM systems are monolithic, proprietary, and inflexible. They don't scale well with Industry 4.0 data volumes. Adding functionality is difficult or impossible. You're locked to one vendor's timeline and pricing.

The postmodern approach: Rather than one massive system, combine best-of-breed solutions for different functions. Buy commodity systems for non-differentiating capabilities. Build custom solutions for competitive differentiators. Connect everything through a flexible event streaming backbone.

Kafka as the central nervous system: Kafka provides the integration layer connecting heterogeneous systems. Real-time event streams flow between your custom applications, purchased enterprise software, cloud services, and legacy systems.

The key difference from traditional integration: Kafka operates in real-time at scale with guaranteed reliability. Traditional ETL batch jobs or fragile custom integrations can't support modern manufacturing requirements.

The vendor evolution: Interestingly, many ERP, MES, and APM vendors now build their next-generation systems on Kafka. They face the same scale and flexibility challenges as their customers. Under the hood, modern versions of enterprise manufacturing software often use Kafka even if users don't see it.

The migration path: You don't rip out working ERP systems. You incrementally add new capabilities through event streaming architecture. Some functions remain in legacy systems. New capabilities deploy as microservices connected through Kafka. Over time, you reduce reliance on monolithic systems where it makes business sense.

The build-versus-buy principle: Use packaged software for non-differentiating capabilities. Build custom solutions where you compete. Kafka enables both approaches in one architecture without forcing everything through one vendor's vision.

Machine Learning Integration: Training Centrally, Inferring at the Edge

Kafka plays critical roles throughout machine learning pipelines in manufacturing, from data collection through model training to real-time inference.

Understanding the ML lifecycle: Machine learning involves two distinct processes:

Model training: Batch process using historical data to create models. This is computationally expensive and happens periodically (daily, weekly) in data centers or cloud environments with significant resources.

Model inference: Applying trained models to make predictions on new data. This happens continuously in real-time and can run on modest hardware.

Kafka's roles:

Data lake ingestion: Kafka is nearly ubiquitous as the ingestion layer for big data lakes feeding model training. Whether using Hadoop, Spark, or cloud-based data science platforms, data typically flows through Kafka before landing in training data sets.

Real-time ETL: Rather than batch transformations, Kafka enables real-time data preparation. Filter irrelevant events, aggregate data, enrich with context—all in-flight before storage. This improves data quality for training while reducing storage costs (sketched below, after the list of roles).

Model deployment: Deploy trained models into Kafka applications at the edge. The TensorFlow or PyTorch model becomes part of a Kafka stream processing application. Every event gets evaluated by the model as it flows through.

Language flexibility: Kafka's open architecture supports multiple programming languages. Use Java for standard applications, C/C++ for low-latency embedded environments, or SQL for stream processing without coding. Deploy ML models in whatever environment makes sense.

Monitoring and feedback: The third ML component is monitoring whether models perform well in production. Kafka enables real-time monitoring of model predictions and outcomes, feeding back to data scientists for model refinement.
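A minimal sketch of the real-time ETL role mentioned above: filter out faulty samples and enrich each reading with machine master data held in a KTable before the result lands in the training data set. Topic names, the string-based JSON handling, and the fault marker are assumptions.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

public class TrainingDataEtl {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Machine master data (type, line, commissioning date), keyed by machine ID.
        KTable<String, String> machines =
                builder.table("machine.metadata", Consumed.with(Serdes.String(), Serdes.String()));

        KStream<String, String> readings =
                builder.stream("plant.sensor.raw", Consumed.with(Serdes.String(), Serdes.String()));

        readings
            // Drop obviously broken samples before they pollute the training set.
            .filter((machineId, json) -> json != null && !json.contains("\"status\":\"SENSOR_FAULT\""))
            // Attach machine context so the data science team gets self-describing records.
            .join(machines, (reading, machineMeta) ->
                    "{\"reading\":" + reading + ",\"machine\":" + machineMeta + "}")
            .to("training.sensor.enriched", Produced.with(Serdes.String(), Serdes.String()));

        Properties config = new Properties();
        config.put(StreamsConfig.APPLICATION_ID_CONFIG, "training-data-etl");
        config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");
        new KafkaStreams(builder.build(), config).start();
    }
}
```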

The complete architecture: Kafka connects data collection, model training infrastructure, edge deployment environments, and monitoring systems in one coherent architecture. This is far simpler than cobbling together separate systems for each ML pipeline stage.

Digital Twins: Event-Driven State Management

Digital twins—virtual representations tracking physical assets throughout their lifecycle—benefit from event streaming architectures in several ways.

Defining digital twins: A digital twin maintains the complete state and history of a physical asset. For a manufactured car, this includes every customer interaction, configuration choice, manufacturing step, maintenance event, and operational data throughout the vehicle's life.

Two architectural approaches:

Separate database storage: Some implementations store digital twin state in dedicated databases (MongoDB, PostgreSQL, etc.). Kafka streams events to these databases, updating state as things happen. Queries go to the database for current state or historical analysis.

Kafka as twin storage: Others store the complete event stream in Kafka itself. Since Kafka provides durable, timestamped event storage with guaranteed ordering per asset key, it inherently maintains everything needed for digital twins.

The advantage: Kafka's immutable log naturally provides the event history digital twins require. You can replay any asset's complete history by consuming its events. No separate database means less complexity, lower cost, and fewer failure modes.
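One hedged way to get a queryable twin directly from Kafka: a Kafka Streams application folds each vehicle's events (keyed by VIN) into a state store and serves current state through interactive queries, while the underlying topic retains the full history for replay. The topic name, the naive JSON-array append, and the example VIN are assumptions.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

public class VehicleTwin {
    public static void main(String[] args) throws InterruptedException {
        StreamsBuilder builder = new StreamsBuilder();

        // Fold every event for a vehicle (keyed by VIN) into a running state document.
        builder.stream("vehicle.events", Consumed.with(Serdes.String(), Serdes.String()))
               .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
               .aggregate(() -> "[]",
                          (vin, event, history) -> appendEvent(history, event),
                          Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as("vehicle-twin-store")
                                      .withKeySerde(Serdes.String())
                                      .withValueSerde(Serdes.String()));

        Properties config = new Properties();
        config.put(StreamsConfig.APPLICATION_ID_CONFIG, "vehicle-twin");
        config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");
        KafkaStreams streams = new KafkaStreams(builder.build(), config);
        streams.start();

        // Interactive query: fetch the current twin for one vehicle without any external database.
        Thread.sleep(5000); // crude wait for the store to come up; production code checks the streams state
        ReadOnlyKeyValueStore<String, String> store = streams.store(
                StoreQueryParameters.fromNameAndType("vehicle-twin-store",
                        QueryableStoreTypes.keyValueStore()));
        System.out.println(store.get("WVWZZZ1JZXW000001")); // example VIN
    }

    static String appendEvent(String history, String event) {
        // Naive JSON-array append for the sketch; a real twin would use a proper document model.
        return history.equals("[]") ? "[" + event + "]"
                                    : history.substring(0, history.length() - 1) + "," + event + "]";
    }
}
```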

End-to-end customer experience: Consider buying a car. You receive marketing emails (events), visit the mobile app and configure a vehicle (events), discuss with family and modify configuration (events), visit a dealer who sees your complete history (event replay), purchase and track manufacturing progress (events), receive over-the-air updates (events), and schedule maintenance (events).

Each interaction adds events to that customer and vehicle's digital thread. At any point, systems can replay history to understand context and personalize experience. The dealership doesn't query 15 different systems—they replay the customer's event stream.

Real-time coordination: Digital twins aren't just historical records. They enable real-time coordination. As the vehicle moves through manufacturing, its tracking status updates in real time. When maintenance is due, systems correlate driving behavior, part wear, and service history to recommend specific actions.

The architectural benefit: Event streaming architectures naturally support digital twins because they already maintain ordered, timestamped event histories. You're not building separate twin infrastructure—you're leveraging capabilities event streaming provides inherently.

Conclusion

Event streaming represents a fundamental architectural shift for manufacturing data infrastructure. Rather than batch-moving data between systems or maintaining fragile point-to-point integrations, you create a real-time event backbone where systems publish and subscribe independently.

The value isn't theoretical. Organizations like Tesla process trillions of events daily through Kafka, integrating manufacturing, energy systems, vehicle operations, customer interactions, and business systems. Automotive companies worldwide use Kafka to coordinate across global factory networks. Manufacturers across industries depend on it for real-time quality control and predictive maintenance.

Success requires understanding that Kafka is not just messaging. It combines messaging, storage, integration, and stream processing—eliminating the need for separate technologies handling these functions. This consolidation reduces complexity while improving capabilities.

The architectural pattern that works: hybrid deployments processing locally where latency matters while aggregating globally for enterprise analytics; MQTT connecting massive device populations while Kafka handles enterprise integration; ML models trained centrally but deployed at the edge for real-time inference.