November 8, 2025

When data and analytics leaders plan industrial data platforms, the database selection often becomes a tactical decision delegated to engineering teams. But Brian Gilmore, Director of Product Management for IoT at InfluxData, argues that how you architect time series data storage fundamentally shapes what analytics capabilities you can deliver and at what cost. His perspective on time series databases reveals strategic choices that affect everything from predictive maintenance accuracy to cloud infrastructure spending.
Manufacturing generates fundamentally different data than transactional business systems. When a sensor on a production line captures vibration measurements 1,000 times per second, or when you're monitoring temperature across hundreds of reactors every few seconds, you're dealing with data where the timestamp isn't just metadata—it's the primary organizing principle.
Gilmore explains: "Individual timestamp streams together make up a high-definition model of operations. Having all that information streamed and stored in a way that you can piece it back together based on that construct of time is what creates value."
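To make that concrete, here is a minimal sketch of piecing two streams back together on the time axis with pandas. The column names, sampling rates, and placeholder readings are assumptions for illustration, not details from the interview.

```python
# Illustrative only: align two independently sampled sensor streams by time.
import pandas as pd

# Vibration sampled every 10 ms; temperature sampled every 5 s.
vibration = pd.DataFrame({
    "time": pd.date_range("2025-11-08 00:00", periods=6_000, freq="10ms"),
    "vibration_g": 0.02,  # placeholder readings
})
temperature = pd.DataFrame({
    "time": pd.date_range("2025-11-08 00:00", periods=12, freq="5s"),
    "temp_c": 72.5,  # placeholder readings
})

# merge_asof stitches the streams together on the shared time axis, pairing
# each vibration sample with the most recent temperature reading.
aligned = pd.merge_asof(vibration, temperature, on="time", direction="backward")
print(aligned.tail())
```

Neither stream knows about the other; the timestamp is the only shared key, which is exactly why it functions as the primary organizing principle rather than as metadata.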
This distinction matters because traditional relational databases weren't designed for this data pattern. When you attempt to store high-frequency time series data in SQL databases, you typically encounter:

- Write throughput bottlenecks, as B-tree indexes degrade under constant append-only inserts
- Storage bloat, because row-oriented layouts compress repetitive timestamped values poorly
- Slow time-range queries, since the engine scans far more data than the query actually returns
Process historians solved this problem decades ago in industrial control systems, becoming the original time series databases. But they typically operated in isolation—proprietary systems locked inside plant networks, disconnected from enterprise analytics platforms. Modern time series databases bring that same specialized capability but integrate with cloud analytics, machine learning frameworks, and enterprise data ecosystems.
One of the most consequential decisions data leaders face is where to store time series data. This isn't just about database deployment location—it fundamentally affects your platform's capabilities and economics.
Edge storage provides:

- Low-latency access for real-time monitoring and control decisions
- Resilience to network outages, since data capture continues even when cloud connectivity drops
- Lower bandwidth costs, because raw high-frequency data never has to leave the site
Cloud storage provides:

- Elastic capacity for long-term historical retention
- Cross-facility aggregation for fleet-wide insights and model training
- Integration with machine learning frameworks and the broader enterprise data ecosystem
Hybrid architectures deliver both, but require thoughtful planning. Gilmore emphasizes understanding your use cases first: "Are you trying to monitor operations in real-time? Are you trying to do historical analysis? Are you trying to do predictive stuff? These different use cases require different deployment patterns."
The common pattern emerging in manufacturing is edge collection and local analytics with selective cloud replication. Store everything locally at high frequency, run operational analytics at the edge, then transmit aggregated or anomaly-flagged data to the cloud for cross-facility insights and model training.
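A minimal sketch of that pattern, assuming a 1 kHz sensor; the window size, anomaly threshold, and the local_db and cloud_queue interfaces are hypothetical stand-ins, not real APIs.

```python
# Illustrative edge pattern: store everything locally at full resolution,
# forward only summaries (and anomalous raw windows) to the cloud.
import statistics

WINDOW = 1_000           # raw samples per summary, e.g. 1 s at 1 kHz
ANOMALY_THRESHOLD = 3.0  # z-score beyond which raw data is escalated

def process_window(samples: list[float], local_db, cloud_queue) -> None:
    # Everything lands in the local time series store at full resolution.
    local_db.write(samples)

    # Under normal conditions, only a compact summary crosses the network.
    mean = statistics.fmean(samples)
    stdev = statistics.stdev(samples)
    cloud_queue.put({"mean": mean, "stdev": stdev, "count": len(samples)})

    # Raw data goes to the cloud only when the window looks anomalous.
    peak_z = max(abs(s - mean) for s in samples) / stdev if stdev else 0.0
    if peak_z > ANOMALY_THRESHOLD:
        cloud_queue.put({"anomaly": True, "raw": samples})
```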
Time series databases offer multiple deployment models, and the choice reflects your organization's operational maturity and resource constraints. Each model carries different trade-offs for control, operational burden, and cost.
Self-managed on-premises deployment:
You install and operate the database on your own infrastructure. This provides maximum control over data location, security policies, and customization. But it requires dedicated database administration expertise, capacity planning, backup management, and upgrade processes. For organizations with established data operations teams and specific compliance requirements, this model makes sense.
Managed cloud service:
The database vendor operates the infrastructure while you use the database through APIs. This eliminates operational overhead—no capacity planning, patching, or backup management. You pay for usage rather than infrastructure. The trade-off is less control over data location and potential vendor lock-in. For organizations prioritizing speed of deployment and focusing data team resources on analytics rather than operations, managed services reduce time-to-value.
Hybrid edge-cloud deployment:
Edge gateways run local database instances while synchronizing with cloud-hosted central storage. This combines edge resilience with cloud scalability but increases architectural complexity. You're now managing distributed database synchronization, conflict resolution, and ensuring data consistency across locations.
The strategic question isn't which deployment model is "best"—it's which aligns with your organization's capabilities and constraints. A global manufacturer like Toyota with established data operations teams might prioritize control through self-managed deployment. A pharmaceutical company like Novo Nordisk prioritizing rapid deployment of analytics across new facilities might choose managed services to accelerate rollout.
The database selection matters less than the ingestion architecture—how you actually get data from industrial equipment into storage. Manufacturing environments present unique challenges that enterprise IT systems don't typically face.
Protocol diversity across the industrial stack:
Modern facilities mix legacy SCADA systems using Modbus with newer equipment using OPC UA, alongside IoT sensors communicating via MQTT. Your data platform needs adapters for all of these protocols. Gilmore notes that InfluxData maintains integrations with most industrial protocols, but emphasizes the importance of message brokers like HiveMQ as intermediation layers.
The MQTT + time series database pattern:
A common architecture pattern uses MQTT brokers as the ingestion point for all devices, then subscribers write data to the time series database. This decouples data collection from storage, providing flexibility to add new data consumers without modifying edge devices. It also provides buffering during database maintenance or upgrades.
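A minimal sketch of such a subscriber, assuming the paho-mqtt and influxdb-client Python packages; the broker address, credentials, topic convention, and payload format are assumptions for illustration.

```python
# Illustrative MQTT-to-time-series bridge: the broker decouples devices
# from storage, so this subscriber can be replaced without touching the edge.
import json

import paho.mqtt.client as mqtt
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

influx = InfluxDBClient(url="http://localhost:8086", token="my-token", org="plant-1")
write_api = influx.write_api(write_options=SYNCHRONOUS)

def on_message(client, userdata, msg):
    # Assumed topic convention: plant/<line>/<machine>/<sensor>
    _, line, machine, sensor = msg.topic.split("/")
    payload = json.loads(msg.payload)
    point = (
        Point("telemetry")
        .tag("line", line)
        .tag("machine", machine)
        .field(sensor, float(payload["value"]))
    )
    write_api.write(bucket="factory", record=point)

client = mqtt.Client()
client.on_message = on_message
client.connect("broker.local", 1883)
client.subscribe("plant/#")
client.loop_forever()  # devices keep publishing to the broker regardless
```

Because devices publish to the broker rather than to the database, a second consumer (a stream processor, say) can be added later without reconfiguring a single edge device.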
Edge preprocessing considerations:
Not all sensor data deserves storage. High-frequency vibration sensors might generate gigabytes per hour, but only anomalies or statistical summaries need long-term storage. Edge preprocessing—filtering, aggregation, or feature extraction—before database ingestion dramatically reduces storage and transmission costs.
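As one illustration, a short numpy sketch of window summarization; the specific features (RMS, peak, crest factor) are common vibration summaries chosen for the example, not a prescription from the article.

```python
# Illustrative edge preprocessing: collapse a high-frequency window of
# vibration samples into a few features worth storing long-term.
import numpy as np

def summarize_window(samples: np.ndarray) -> dict:
    rms = float(np.sqrt(np.mean(samples ** 2)))
    peak = float(np.max(np.abs(samples)))
    return {
        "rms": rms,
        "peak": peak,
        "crest_factor": peak / rms if rms else 0.0,
    }

# One second of 10 kHz data (~40 KB as float32) reduces to three floats.
window = np.random.default_rng(0).normal(0, 0.01, 10_000).astype(np.float32)
print(summarize_window(window))
```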
The ingestion architecture decision affects your long-term flexibility. Tightly coupling devices directly to database APIs creates brittle systems. Using intermediation layers like message brokers and edge processing provides architectural flexibility as your requirements evolve.
Perhaps the most valuable insight Gilmore offers is his "Legos analogy" for building industrial data platforms. Rather than purchasing monolithic platform suites from single vendors, modern data leaders assemble best-of-breed components.
"In the hands of amazing technologists, systems integrators, consultants, these Legos come together and create models that people have no idea how happened. But when we just hand operations managers a box full of Legos, they have no idea how to get from the box to what they want."
This presents both opportunity and challenge. The opportunity is architectural flexibility—you're not locked into a single vendor's roadmap or pricing. You can swap components as better options emerge. The challenge is integration complexity and the need for architectural expertise.
Components in a modern industrial data stack:

- A message broker (such as HiveMQ) as the ingestion and intermediation layer
- A time series database (such as InfluxDB) for storage and querying
- Edge device and fleet management (such as Balena) for deploying software at scale
- Visualization and dashboarding (such as Grafana) for operational monitoring
The ecosystem approach works when you have strong architectural guidance. This is why partnerships between complementary vendors matter—not just technical integration, but documented reference architectures that show how components work together. Gilmore mentions collaboration between InfluxData, HiveMQ, Balena, and Grafana to provide these "instruction manuals" that help organizations assemble complete solutions.
For data leaders, this means evaluating vendors not just on individual product capabilities, but on their ecosystem partnerships and commitment to interoperability.
InfluxDB's open source foundation raises important questions about when to use open source versus commercial database offerings. This decision affects both short-term costs and long-term platform sustainability.
Open source advantages:

- No license costs, which lowers the barrier for proof-of-concept and development work
- Full visibility into how the database works, with no procurement cycle to slow experimentation
- A broad community ecosystem of integrations, client libraries, and shared knowledge
Commercial offering advantages:

- Enterprise support with defined response commitments for production incidents
- Advanced capabilities such as clustering, high availability, and fine-grained security controls
- A vendor accountable for upgrades, patches, and the long-term product roadmap
The strategic approach many organizations take is starting with open source for proof-of-concept and development environments, then upgrading to commercial offerings for production deployments requiring enterprise support and advanced features. This lets data teams validate technology fit before committing budget, while ensuring production systems have necessary support.
Time series databases aren't just specialized storage systems—they're foundational infrastructure that determines what analytics capabilities your platform can deliver. The choice between edge and cloud storage, self-managed versus managed services, and open source versus commercial offerings reflects deeper strategic choices about control, speed, and organizational capability.
For data and analytics leaders in manufacturing, the message is clear: database selection decisions made today affect your platform's capabilities for years. Choose based on your actual query patterns, plan explicitly for edge-cloud hybrid architectures, and prioritize ecosystem compatibility over feature checklists. The goal isn't finding the "best" time series database—it's architecting a complete data platform that delivers analytics value at sustainable cost.
As industrial data volumes continue growing exponentially, organizations with thoughtfully architected time series storage will deliver real-time operational intelligence while controlling infrastructure costs. Those defaulting to general-purpose databases will find themselves fighting performance battles and absorbing unnecessary costs. The difference is strategic architecture, not just technology selection.