October 3, 2025

How to build a local industrial AI agent [Python, LangChain, Ollama, Mistral, OPC UA]

The AI agent we built in Part 1 works beautifully—but it has a significant limitation. It relies entirely on Claude, a cloud-based large language model hosted on Anthropic's servers. For many industrial applications, this creates real problems around cost, data privacy, and latency. When you're making operational decisions on a factory floor, you can't always depend on internet connectivity or accept the risk of sending sensitive production data to external servers.

That's exactly what we're solving in this tutorial.

What We're Building

This is Part 2 of our five-part series on building agentic AI for industrial systems. In this tutorial, we're taking the exact same AI agent we built previously—the one that monitors our simulated batch production facility and makes go/no-go decisions about production runs—and we're making it run entirely locally on edge hardware. No cloud dependency, no external API calls, no data leaving your network.

The remarkable thing is how little code we actually need to change. We're keeping all our data connections, all our tool definitions, and all our agentic logic exactly the same. We're literally just swapping out the brain of our agent, replacing the cloud-based Claude model with a locally-running large language model using Ollama and Mistral.

Why Local Deployment Matters

Running AI agents in the cloud works fine for many applications, but industrial environments have unique requirements that make local deployment not just preferable but often essential. Data privacy is a major concern—production data, recipes, and operational metrics are often proprietary information that companies cannot risk sending to external servers. Latency becomes critical when decisions need to happen in real time, and you can't afford the delay of round-trip network calls. Cost considerations matter too, especially when you're making thousands of queries per day across multiple facilities.

Beyond these practical concerns, there's also the matter of reliability. Industrial operations can't stop because of an internet outage or an API service disruption. A locally-running AI agent continues to function regardless of network conditions, making it far more suitable for mission-critical industrial applications. This is what we mean by "edge deployment"—running the AI where the data is generated and where decisions need to be made, not in some distant data center.

The Technical Approach

The beauty of using LangChain as our framework is that it abstracts away the differences between various large language models. Whether you're talking to Claude in the cloud or Mistral running on your local hardware, the interface remains the same. This means our agent's architecture—how it calls tools, processes information, and makes decisions—doesn't need to change at all.
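To make that concrete, here's a minimal sketch of what that model-agnostic interface looks like. This isn't the code from the series, and the model names are just examples (the Claude line also assumes your Part 1 API key is still configured), but it shows why the surrounding agent code doesn't care which model it's handed:

```python
# Illustrative sketch: both chat models expose the same LangChain interface,
# so the code around them doesn't change when we swap one for the other.
from langchain_anthropic import ChatAnthropic
from langchain_ollama import ChatOllama

def ask(llm, question: str) -> str:
    # Identical call path whether llm is a cloud model or a local one.
    return llm.invoke(question).content

cloud_llm = ChatAnthropic(model="claude-3-5-sonnet-20240620")  # Part 1 (cloud)
local_llm = ChatOllama(model="mistral")                        # this tutorial (local)

print(ask(local_llm, "Summarise what an OPC UA server does in one sentence."))
```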

We're using Ollama to manage our local language model deployment. If you're familiar with Docker, think of Ollama as Docker but specifically designed for AI models. It handles all the complexity of downloading models, managing memory and compute resources, optimizing performance for your specific hardware, and providing clean APIs that our application can connect to. Ollama is completely free, open source, and runs on Windows, Mac, and Linux, making it accessible regardless of your development environment.

For the language model itself, we're using Mistral, specifically the 7-billion-parameter version. This is a powerful open-source model that's small enough to run on typical edge hardware but capable enough to handle the reasoning required for our industrial decision-making. The model selection depends on your hardware—more powerful machines can run larger models with better performance, while more constrained environments might need smaller, more efficient models.

What the Tutorial Covers

The full video tutorial walks through the complete process of converting our cloud-based agent to run locally. We start by reviewing the existing agent code so you understand exactly what we're working with—the data access scripts that connect to our OPC UA server and TimescaleDB database, the tools that expose these functions to the AI, and the main agent logic that orchestrates everything together.
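To give a rough feel for what one of those tool definitions looks like, here's a hedged sketch of a LangChain tool wrapping an OPC UA read. It uses the python-opcua client, and the endpoint URL and node-ID convention are placeholders rather than the values (or necessarily the library) used in the series:

```python
from opcua import Client               # python-opcua synchronous client
from langchain_core.tools import tool

OPC_UA_ENDPOINT = "opc.tcp://localhost:4840"  # placeholder endpoint

@tool
def read_tank_level(node_id: str) -> float:
    """Read the current value of a tank-level node from the OPC UA server."""
    client = Client(OPC_UA_ENDPOINT)
    client.connect()
    try:
        return client.get_node(node_id).get_value()
    finally:
        client.disconnect()
```

Because a tool is just a Python function with a docstring, the agent can discover and call it the same way no matter which language model is doing the reasoning.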

Then we dive into installing and configuring Ollama. You'll see the actual installation process, how to verify it's working correctly, and how to download and run your first local language model. We test the model with a simple query to make sure it's functioning properly before integrating it into our agent.
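If you want to sanity-check your own install the same way, a one-off query against Ollama's local HTTP endpoint (it listens on port 11434 by default) is enough. This assumes you've already pulled the Mistral model, and the prompt is just an example:

```python
import requests

# Assumes the Ollama service is running locally and the Mistral model
# has already been downloaded (e.g. via Ollama's pull command).
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "mistral", "prompt": "In one sentence, what is OPC UA?", "stream": False},
    timeout=120,
)
print(resp.json()["response"])
```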

The code modification itself is surprisingly minimal. We install the LangChain Ollama integration, change a single import statement, and replace one line where we instantiate our language model. That's essentially it. The rest of our code—all the industrial data connections, all the tool definitions, all the prompt engineering—remains completely unchanged. This demonstrates the power of building with abstraction layers and frameworks that separate concerns.
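In practice, the change looks something like the sketch below. The model IDs are examples (use whichever Claude model you ran in Part 1 and whichever tag you pulled into Ollama), and the integration is installed with pip as the langchain-ollama package:

```python
# Before (Part 1): cloud-hosted Claude
# from langchain_anthropic import ChatAnthropic
# llm = ChatAnthropic(model="claude-3-5-sonnet-20240620")

# After (this tutorial): local Mistral served by Ollama
from langchain_ollama import ChatOllama
llm = ChatOllama(model="mistral", temperature=0)

# Everything downstream (tools, prompts, agent logic) stays untouched.
```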

Finally, we run the modified agent and watch it make production decisions using the locally-running model instead of the cloud-based one. You'll see the agent query our OPC UA server for real-time tank levels and machine states, pull product recipes from the database, perform the necessary calculations, and deliver a decision with detailed reasoning—all without a single call to an external API.
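As a self-contained illustration of that final step, here's a hedged sketch of wiring a tool-calling agent around the local model. The stub tool, the system prompt, and the question are all placeholders, and the real agent from Part 1 may be assembled differently:

```python
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.tools import tool
from langchain_ollama import ChatOllama

@tool
def read_tank_level(tank: str) -> float:
    """Return the current level of a tank (stand-in for the real OPC UA tool)."""
    return 4200.0  # hypothetical stub value for this sketch

llm = ChatOllama(model="mistral", temperature=0)

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a production assistant for a batch facility. "
               "Check tank levels and recipes with the tools before deciding."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])

tools = [read_tank_level]  # in the real agent: the OPC UA and database tools
agent = create_tool_calling_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

result = executor.invoke({"input": "Can we start a full run of Product A right now?"})
print(result["output"])
```

How reliably a given model emits tool calls depends on the model and Ollama version you're running, which is part of the model-specific tuning discussed in the limitations below.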

What You'll Learn

By following this tutorial, you'll understand the fundamental differences between cloud-based and edge-deployed AI systems, and more importantly, when each approach makes sense for your use case. You'll learn how to install and configure Ollama for local LLM deployment, including how to choose appropriate models based on your hardware constraints. The tutorial demonstrates the practical process of migrating from cloud to local AI, showing you that it's often simpler than you might expect.

You'll also gain insight into the role of frameworks like LangChain in creating portable, flexible AI applications. Because we built our agent properly from the start—with clear abstractions and separation of concerns—swapping out the underlying language model becomes trivial. This is a crucial lesson for building production AI systems that can adapt as technology evolves.

Perhaps most importantly, you'll see a working example of an industrial AI agent running entirely at the edge, processing real operational data without any external dependencies. This is the foundation for production deployments where reliability, privacy, and performance are non-negotiable.

Watch the Full Tutorial

The complete video tutorial provides a step-by-step walkthrough of converting a cloud-based industrial AI agent to run entirely locally on edge hardware. You'll see the actual installation process, code modifications, and live testing of the local agent making production decisions.

Current Limitations and What's Next

The video demonstrates that while the local agent works and makes correct decisions, there are some rough edges with response formatting that need adjustment for specific models. Different language models have different quirks and strengths, and part of deploying locally means tuning your prompts and parsing logic for your chosen model. This is normal and expected—the video shows you the reality of working with these systems, not just the polished final result.
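One lightweight example of that kind of tuning (purely illustrative, not the exact adjustment made in the video) is spelling out the expected output format in the system prompt, so a smaller local model is less likely to drift into free-form answers:

```python
# Hypothetical prompt tweak for a smaller local model: constrain the output
# format explicitly so downstream parsing stays simple.
SYSTEM_PROMPT = (
    "You are a production assistant for a batch facility.\n"
    "After using the tools, answer with exactly two lines:\n"
    "DECISION: GO or NO-GO\n"
    "REASON: one short sentence explaining the decision."
)
```

Other models may need the opposite treatment, with looser parsing on the application side; the right adjustment depends on the model you deploy.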

More significantly, there's a major limitation that applies to both the cloud and local versions of our agent. Right now, the agent only reasons over structured data—the real-time signals from our OPC UA server and product recipes from our database. But industrial operations generate massive amounts of unstructured data that's equally important for decision-making. Think about maintenance records stored in PDFs, operational procedures in Word documents, or historical performance data in Excel spreadsheets.

For example, imagine there's a scheduled maintenance task due in eight hours. If our AI agent doesn't know about it, it might make the wrong call about whether to proceed with a long production run. The agent needs access to this unstructured context to make truly informed decisions.

Part 3 of this series addresses exactly this challenge. We'll build an agentic RAG (Retrieval-Augmented Generation) pipeline that allows your AI agent to bring unstructured documents into its context, creating a system that can reason over both your real-time operational data and your institutional knowledge stored in various document formats.