November 11, 2025

Open-Source Industrial IoT Platform for Data Collection and Analysis: Apache StreamPipes

Bridging the gap between operational technology (OT) and information technology (IT) remains a significant challenge in manufacturing. Apache StreamPipes addresses this by providing an open-source toolbox that enables non-technical users to connect, analyze, and exploit industrial IoT data streams. This comprehensive guide explores StreamPipes' architecture, features, and capabilities for building connected factory solutions.

This guide is aimed at industrial engineers, data scientists, and architects who want to build IIoT analytics platforms without extensive programming expertise.

The Challenge: Making Industrial Data Accessible

Modern factories contain numerous sensors and data sources using diverse protocols—OPC UA, MQTT, Modbus, PROFIBUS, and robotics-specific protocols like ROS. These systems generate substantial data from sensors, production control systems, and automated assembly lines.

The core problem: a significant gap exists between OT personnel familiar with production processes and IT specialists who can perform data analytics. Bridging this gap requires tools that make industrial data accessible to non-technical experts.

Apache Ecosystem Integration

As an Apache project, StreamPipes integrates with other Apache Software Foundation projects:

Apache PLC4X

StreamPipes relies on Apache PLC4X for IoT connectivity. PLC4X is a library of drivers for industrial protocols that enables connections to PLCs and production systems.

Apache ECharts

The platform uses Apache ECharts, a powerful visualization library, for all data visualizations including charts, graphs, and dashboards.

Other Integrations

Beyond Apache projects, StreamPipes connects to numerous third-party systems:

  • Message brokers (Kafka, MQTT, RocketMQ, NATS)
  • Databases (InfluxDB, PostgreSQL, TimescaleDB)
  • Cloud platforms
  • Analytics frameworks

System Architecture Overview

StreamPipes consists of several modules supporting different stages of the IIoT lifecycle:

Data Connectivity

Quickly connect to industrial data sources without programming:

  • OPC UA servers
  • PLCs (Siemens, Allen-Bradley, Modicon)
  • MQTT brokers
  • REST APIs
  • File uploads (CSV, JSON)
  • Simulators for testing

Pipeline Editor

Web-based tool for creating data analytics pipelines:

  • Drag-and-drop interface
  • Reusable processing elements
  • Real-time data transformations
  • Harmonization and enrichment
  • Complex event processing

Data Explorer

Visual exploration of historical time-series data:

  • Query builder interface
  • Multiple visualization types
  • Aggregation functions
  • Data export capabilities
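
To make the idea of an aggregation function concrete, the following is a minimal sketch of averaging a measurement over fixed time buckets. It is not how the Data Explorer is implemented; the function name `aggregate_mean`, the `pressure` field, and the sample rows are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def aggregate_mean(rows, field, bucket_seconds):
    """Average `field` per fixed-size time bucket (timestamps in seconds)."""
    buckets = defaultdict(list)
    for row in rows:
        # Align each timestamp to the start of its bucket.
        bucket_start = row["timestamp"] - row["timestamp"] % bucket_seconds
        buckets[bucket_start].append(row[field])
    return {start: mean(values) for start, values in sorted(buckets.items())}


# Hypothetical historical rows for one measurement.
rows = [
    {"timestamp": 0, "pressure": 1.0},
    {"timestamp": 30, "pressure": 3.0},
    {"timestamp": 60, "pressure": 5.0},
    {"timestamp": 119, "pressure": 7.0},
]
per_minute = aggregate_mean(rows, "pressure", 60)
```

With 60-second buckets, the four rows collapse into two averaged points, which is the kind of reduction that keeps long time ranges readable in a chart.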

Live Dashboard

Real-time monitoring for shop floor personnel:

  • Customizable widgets
  • Multiple dashboard layouts
  • Real-time updates
  • Role-based access control

Programmatic Access

SDKs for developers:

  • Python client for data science workflows
  • Java client for application development
  • Integration with machine learning libraries

Technical Architecture

Data Flow Architecture

Data Sources: OPC UA, PLCs, MQTT, REST APIs, file systems

Adapter Library: Microservices containing protocol-specific connectors that are configured through the web UI. Adapters collect data from sources and forward it to the message broker.

Message Broker: Central communication channel between data sources and processing algorithms. StreamPipes supports multiple brokers:

  • Apache Kafka
  • NATS
  • MQTT

The messaging layer is exchangeable—choose the broker that fits your infrastructure at installation time.
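
One way to picture an exchangeable messaging layer is a small interface that adapter code depends on, with the concrete broker chosen elsewhere. The sketch below is an illustration only: `Broker`, `InMemoryBroker`, and `forward_readings` are hypothetical names, not StreamPipes APIs, and the in-memory broker stands in for Kafka, NATS, or MQTT.

```python
from collections import defaultdict
from typing import Callable, Protocol

Event = dict  # one event is a flat mapping of field name to value


class Broker(Protocol):
    """Hypothetical minimal broker interface (not a StreamPipes API)."""

    def publish(self, topic: str, event: Event) -> None: ...

    def subscribe(self, topic: str, handler: Callable[[Event], None]) -> None: ...


class InMemoryBroker:
    """Stand-in broker for local experiments; delivers events synchronously."""

    def __init__(self) -> None:
        self._handlers = defaultdict(list)

    def publish(self, topic: str, event: Event) -> None:
        for handler in self._handlers[topic]:
            handler(event)

    def subscribe(self, topic: str, handler: Callable[[Event], None]) -> None:
        self._handlers[topic].append(handler)


def forward_readings(broker: Broker, topic: str, readings: list) -> None:
    """Adapter-side code depends only on the Broker interface."""
    for reading in readings:
        broker.publish(topic, reading)


received = []
broker = InMemoryBroker()
broker.subscribe("machine-1", received.append)
forward_readings(broker, "machine-1", [{"temperature": 71.5}, {"temperature": 72.0}])
```

Because `forward_readings` only knows the interface, swapping the in-memory class for a wrapper around a real broker client would not change the adapter code.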

Time-Series Database: Included storage for historical data analysis. StreamPipes can use various time-series databases including InfluxDB.

Pipeline Element Microservices: Standalone services providing business logic for transforming and analyzing live data:

  • Trend detectors
  • Filters and aggregators
  • Statistical operators
  • Machine learning algorithms
  • Notification handlers
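
The business logic of such elements can be small. As a rough sketch (not the actual StreamPipes processors), the example below chains a threshold filter and a simple rising-trend detector over a stream of events. The function names, the `temperature` field, and the sample values are assumptions for illustration.

```python
from collections import deque


def threshold_filter(events, field, minimum):
    """Pass through only events whose `field` is at least `minimum`."""
    for event in events:
        if event[field] >= minimum:
            yield event


def rising_trend(events, field, window, min_increase_pct):
    """Emit an alert when `field` rose by at least `min_increase_pct`
    between the oldest and newest of the last `window` events."""
    recent = deque(maxlen=window)
    for event in events:
        recent.append(event[field])
        if len(recent) == window and recent[0] > 0:
            increase = (recent[-1] - recent[0]) / recent[0] * 100
            if increase >= min_increase_pct:
                yield {
                    "field": field,
                    "increase_pct": round(increase, 1),
                    "timestamp": event["timestamp"],
                }


# Hypothetical live readings, one per time step.
values = [20.0, 60.0, 61.0, 63.0, 70.0, 69.0]
stream = [{"timestamp": i, "temperature": v} for i, v in enumerate(values)]

# Chain the two elements the way a pipeline would connect them.
hot_only = threshold_filter(stream, "temperature", minimum=50.0)
alerts = list(rising_trend(hot_only, "temperature", window=3, min_increase_pct=10.0))
```

Each element consumes one stream and produces another, which is what makes them reusable building blocks in a pipeline.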

Pipeline Management: Component orchestrating microservices based on user-defined pipelines, interacting with the web UI and programmatic clients.

User Interface: Web-based interface for all StreamPipes modules—pipeline editor, data explorer, dashboards, adapter configuration.

Microservices Architecture

StreamPipes uses a microservices architecture providing several advantages:

Scalability: Individual components scale independently based on workload

Flexibility: Add or remove processing elements without affecting the system

Technology Diversity: Different microservices can use different technologies and languages

Fault Isolation: Failures in one component don't cascade to others

Connecting Industrial Data Sources

Supported Protocols

StreamPipes supports numerous industrial protocols:

  • OPC UA: Modern standard for industrial communication
  • S7 (Siemens): Direct PLC connectivity via Apache PLC4X
  • Modbus: Widely used in legacy equipment
  • MQTT: Lightweight messaging protocol
  • REST APIs: HTTP-based interfaces
  • File Formats: CSV, JSON uploads for testing
  • Simulators: Built-in data generators

Adapter Configuration Process

Connecting data sources follows a consistent workflow:

Step 1: Select Protocol

Choose the appropriate adapter from the library based on your data source type.

Step 2: Configure Protocol Settings

Provide protocol-specific parameters:

  • Connection details (IP address, port)
  • Authentication credentials
  • Polling frequency or subscription mode
  • Security settings

Configuration templates can be saved for reuse with similar data sources.
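
In StreamPipes these settings are entered through the web UI. Purely to illustrate what such a parameter set and a reusable template might look like, here is a sketch using a dataclass. `AdapterSettings` and all of its fields are hypothetical, the IP addresses come from a documentation-only range, and 4840 is the default OPC UA port.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class AdapterSettings:
    """Hypothetical protocol settings; not a StreamPipes data model."""

    host: str
    port: int
    polling_interval_ms: Optional[int] = None  # None means subscription mode
    username: Optional[str] = None
    password: Optional[str] = None
    use_tls: bool = True

    def validate(self) -> list:
        """Return a list of human-readable problems (empty if valid)."""
        problems = []
        if not self.host:
            problems.append("host must not be empty")
        if not 1 <= self.port <= 65535:
            problems.append("port must be between 1 and 65535")
        if self.polling_interval_ms is not None and self.polling_interval_ms <= 0:
            problems.append("polling interval must be positive")
        if (self.username is None) != (self.password is None):
            problems.append("username and password must be set together")
        return problems


# A saved template for one machine, reused for a similar second machine.
opc_template = AdapterSettings(host="192.0.2.10", port=4840, polling_interval_ms=1000)
second_machine = replace(opc_template, host="192.0.2.11")
```

Reusing a template this way changes only the connection detail that differs and keeps polling and security settings consistent across similar sources.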

Step 3: Define Format (if applicable)

For brokers or file sources, specify the data format:

  • JSON
  • CSV
  • XML
  • Binary formats
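
The format matters because the same reading looks different on the wire. The sketch below parses an identical measurement from a JSON payload and a CSV payload using only the Python standard library. The helper names and sample payloads are assumptions, not part of StreamPipes.

```python
import csv
import io
import json


def parse_json_lines(payload: str) -> list:
    """Parse one JSON object per non-empty line into a list of events."""
    return [json.loads(line) for line in payload.splitlines() if line.strip()]


def parse_csv(payload: str, delimiter: str = ",") -> list:
    """Parse CSV text with a header row into a list of events."""
    return [dict(row) for row in csv.DictReader(io.StringIO(payload), delimiter=delimiter)]


json_events = parse_json_lines('{"sensor": "s1", "temperature": 71.5}\n')
csv_events = parse_csv("sensor,temperature\ns1,71.5\n")
```

Note that the CSV reader returns every value as a string, while JSON preserves the number. This is one reason a separate schema step that assigns data types is useful.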

Step 4: Model Event Schema

StreamPipes works with event streams having defined schemas. Configure:

Data Types: Integer, Float, Boolean, String, Timestamp

Semantic Descriptions: Add metadata describing what measurements represent

Transformations: Apply unit conversions or calculations during ingestion (e.g., Fahrenheit to Celsius)

Required Fields: Ensure timestamp fields exist (auto-generate if missing)

The event schema modeling step is crucial—it enables StreamPipes to provide intelligent assistance when building analytics pipelines.
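
The schema step can be sketched as a list of field specifications applied to each raw reading: cast to the declared type, run an optional transformation, and add a timestamp if none is present. This is an illustrative model, not the StreamPipes schema format. `FieldSpec`, `apply_schema`, and the sample field names are hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical description of one event field."""

    name: str
    data_type: Callable[[Any], Any]  # e.g. int, float, str
    description: str = ""  # semantic description of the measurement
    transform: Optional[Callable[[Any], Any]] = None


def fahrenheit_to_celsius(value: float) -> float:
    return (value - 32.0) * 5.0 / 9.0


def apply_schema(raw, fields, timestamp_field="timestamp", clock=time.time):
    """Cast and transform each declared field; add a timestamp if missing."""
    event = {}
    for spec in fields:
        value = spec.data_type(raw[spec.name])
        if spec.transform is not None:
            value = spec.transform(value)
        event[spec.name] = value
    raw_ts = raw.get(timestamp_field)
    # Timestamps are kept as epoch milliseconds; generate one if absent.
    event[timestamp_field] = int(raw_ts) if raw_ts is not None else int(clock() * 1000)
    return event


schema = [
    FieldSpec("machine_id", str, "Identifier of the machine"),
    FieldSpec("temperature", float, "Temperature in degrees Celsius",
              transform=fahrenheit_to_celsius),
]
# Raw reading as it might arrive from a CSV source: strings, Fahrenheit, no timestamp.
raw = {"machine_id": "press-07", "temperature": "212"}
event = apply_schema(raw, schema, clock=lambda: 1700000000.0)
```

Passing the clock in as a parameter keeps the example deterministic. In normal use the default `time.time` would supply the ingestion time.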

Step 5: Execute Adapter

Start the adapter, which begins collecting data and publishing it to the message broker. Optionally, the data can be persisted automatically to the time-series database for later analysis.
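
Conceptually, a running adapter is a loop that reads from the source, publishes each event, and optionally stores it. The sketch below shows that loop with a simulated source. `run_adapter`, `simulated_source`, and the in-memory lists standing in for the broker and the database are all assumptions made for illustration.

```python
from itertools import islice


def simulated_source(start=20.0, step=0.5, interval_ms=1000):
    """Endless stream of fake temperature readings for testing."""
    value, timestamp = start, 0
    while True:
        yield {"timestamp": timestamp, "temperature": value}
        value += step
        timestamp += interval_ms


def run_adapter(source, publish, persist=None, max_events=None):
    """Forward events from `source` to `publish`, and to `persist` if given.

    Returns the number of events handled. `max_events=None` runs until
    the source is exhausted.
    """
    count = 0
    for event in islice(source, max_events):
        publish(event)
        if persist is not None:
            persist(event)
        count += 1
    return count


published = []  # stands in for the message broker
stored = []     # stands in for the time-series database
handled = run_adapter(simulated_source(), published.append, stored.append, max_events=3)
```

Leaving `persist` unset corresponds to an adapter that only feeds live pipelines without keeping history.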

Summary

Apache StreamPipes provides a comprehensive open-source platform for industrial IoT data collection and analysis. Its microservices architecture enables non-technical users to connect diverse industrial protocols, build sophisticated analytics pipelines through drag-and-drop interfaces, explore historical data visually, and create monitoring dashboards—all without programming.

For data scientists and developers, StreamPipes offers Python and Java clients enabling programmatic access, integration with machine learning libraries like River, and a platform for building custom IoT applications. The active Apache community ensures ongoing development, while commercial support options exist for enterprise deployments.

By bridging the gap between OT and IT, StreamPipes empowers manufacturing organizations to unlock the value in their industrial data, enabling data-driven decision-making and continuous improvement across production operations.