Serving a Model Inside a Closed Oil-and-Gas Network

Production subsurface AI is not a cloud-inference problem. It is an on-prem serving problem where the network's air gap is a hard requirement, not a deployment preference. This is how a fracture-detection model went from a notebook on a GPU server to a per-well inference service inside a mid-sized Middle East carbonate operator's closed corporate network: a Streamlit app containerized with Docker, exposed on port 8501, integrated into the operator's earth-analytics platform through an API that lives entirely on the internal LAN, with zero internet egress.

By Tarry Singh · 10 min read
EarthScan insight

The model worked. That was supposed to be the hard part. A Detection Transformer, trained on borehole image logs to pick fractures and bedding planes and regress the depth, dip, and azimuth of each sinusoid, was clearing its accuracy gates and producing picks a petrophysicist would sign off on. In a roughly twenty-month, three-phase engagement with a mid-sized Middle East carbonate operator, the science was effectively settled. And then we hit the part nobody puts on a roadmap slide: getting that model to serve predictions, reliably, to a real interpreter at a real workstation, inside a corporate network that does not, and will not, touch the public internet.

This piece is about that second hard part. It argues something applied-AI teams underestimate constantly: for an operator handling confidential subsurface data, production serving is not a cloud-inference problem you solve with a managed endpoint. It is an on-prem serving problem, and the network's isolation is not a deployment preference to be negotiated away. It is a hard requirement that shapes the entire serving architecture from the first line of the Dockerfile.

The deployment constraint comes first, not last

Most MLOps tutorials present serving as the victory lap: train the model, wrap it in a web framework, push the container to a registry, expose an HTTPS endpoint, point a DNS record at it. Every step assumes the one thing a producing oil-and-gas asset will not give you — egress.

The data here is a confidential producing carbonate play. The image logs, the interpreter's picks, the well tops, the field-wide correlation: all of it is among the operator's most sensitive intellectual property, and none of it leaves the perimeter. There is no model artefact uploaded to a cloud bucket, no inference request that traverses the open internet, no telemetry phoning home. The serving target was the operator's own earth-analytics platform, on the operator's own hardware, reachable only from inside the operator's own network: a closed corporate network behind a VPN, with zero internet egress.

That constraint is not a footnote. It is the design input that determines everything downstream — the framework, the transport, the packaging, the place the container physically runs. So we designed serving backward from the air gap.

[Figure: The 85% under the model. A six-layer stack (HPC, data engineering, data unification, AI/ML, agents, platform and deployment) sits beneath the model, which is only the cap at ~15% of the journey; ~50% of pilots never ship.]
Pilots don't stall because the model is weak. The working model is only ~15% of the journey; the other ~85% is a six-layer engineering stack (HPC → Data engineering → Data unification → AI/ML → Agents → Platform/deployment), and a project ships only when every layer below the model is built to production grade. Drag the build line up the load-bearing column: with all six built the model reaches the production ceiling; with any gap below it the model detaches into POC purgatory — the ~50% that never ship. The ~15%/~85% split, the six layers and the ~50% figure are the whitepaper's own; the equal-sixths column sizing is schematic.

The stack diagram above is the reason this matters. A working model is a thin slice of a production system, on the order of fifteen percent. The other eighty-five percent is a load-bearing stack of engineering layers below it, and the bottom layer, platform and deployment, is precisely the one that sits outside the perimeter by default. Roughly half of AI pilots never cross from proof-of-concept into production, and in our experience with operators across the Middle East and the United States the failure is rarely the model. It is that the serving layer was treated as an afterthought and collided, late, with a security requirement that should have been a starting assumption. Get the bottom of the stack wrong and the fifteen percent at the top never ships.

The serving shape: a containerized inference app, not an API gateway

So what does serving actually look like when the network is closed? Concretely: a single, self-contained inference application, containerized, running on a server inside the network, reachable over the internal LAN — and nothing more exotic than that.

The serving framework was Streamlit. That choice surprises people who expect a hardened model-server like TorchServe or Triton fronting a REST contract. But the deployment context rewards a different trade-off. The consumer of these predictions is a geoscientist, not another microservice. What they need is to open a well, see the model's predicted bedding and fracture sinusoids overlaid on the uninterpreted image-log section, and judge the picks with their own eyes. A serving stack that bundles inference and an interpretable visual surface into one artefact is a better fit than a bare prediction API that then needs a separate front end someone has to build, deploy, and secure independently. One app, one container, one thing to harden.

The packaging was Docker. The model, its Python runtime, the inference code, and the Streamlit serving layer were built into a single image with a Dockerfile that EXPOSEs Streamlit's default port, 8501. That image is the entire deployable unit. It is not pulled from a public registry at runtime; it is built, carried inside the perimeter, and run on an internal corporate server. Containerization here is doing the job it is actually good at — pinning the exact runtime so the thing that serves is bit-identical to the thing that was validated — rather than its more glamorous job of elastic cloud orchestration, which is irrelevant when there is exactly one place the container is allowed to run.
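One way to back the "bit-identical" claim once the image has been carried across the perimeter is to compare a checksum recorded at validation time with one computed on the internal server. The sketch below is an illustration, not the engagement's actual procedure. It assumes the image travels as a single archive file (for example, the output of `docker save`), and the function names are hypothetical.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so a multi-gigabyte archive never sits in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_validated_digest(archive: Path, expected_hex: str) -> bool:
    """True if the archive's digest equals the one recorded at validation time."""
    actual = sha256_of_file(archive)
    return hmac.compare_digest(actual, expected_hex.strip().lower())
```

If the two digests differ, the archive on the internal server is not the artefact that was validated, and it should not be loaded or run.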

The transport was a LAN API. The architecture and timeline were agreed directly with the operator's geomatics team, and the integration was defined as an API living entirely within the local LAN. The earth-analytics platform embeds the Streamlit app per well: each well's description card carries a link that opens the model's view for that specific well. From the interpreter's side it is one click from a well to its predicted sinusoids. From the network's side it is one internal hostname and one port, never resolvable, never routable from outside.
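The per-well link can be as simple as a URL on the internal host that carries the well identifier as a query parameter. A minimal sketch, assuming a hypothetical LAN-only hostname (`fracture-app.corp.internal`) and a hypothetical `well` parameter; the platform's real URL scheme is not described in this piece.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

INTERNAL_HOST = "fracture-app.corp.internal"  # hypothetical LAN-only hostname
STREAMLIT_PORT = 8501  # Streamlit's default port, as exposed by the container


def well_view_url(well_id: str) -> str:
    """Build the link a well's description card would open."""
    if not well_id.strip():
        raise ValueError("well_id must be non-empty")
    query = urlencode({"well": well_id})  # percent-encodes unsafe characters
    return f"http://{INTERNAL_HOST}:{STREAMLIT_PORT}/?{query}"


def well_id_from_url(url: str) -> str:
    """Recover the well identifier from a link built by well_view_url."""
    params = parse_qs(urlsplit(url).query)
    return params["well"][0]
```

The app would read the same parameter from the incoming request's query string and load that well's image log, so the platform only has to render a link and holds no inference logic of its own.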

Why port 8501 is a security decision, not a default

Streamlit listens on 8501 out of the box, and it is tempting to treat that as boilerplate. Inside a closed network it is the opposite. Because the container is reachable only over the internal LAN, that single exposed port is the entire attack surface of the serving layer. There is no public ingress to firewall, no TLS-terminating proxy to misconfigure, no API key to leak — the perimeter is the network itself. The serving app does not authenticate users or guard egress because, architecturally, it cannot reach anything it should not, and nothing outside the network can reach it. The air gap does the work a stack of cloud security controls would otherwise have to.
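A cheap guard that fits this posture is a deployment-time check that the host address on which port 8501 is published belongs to a private range. The sketch below illustrates the idea with Python's standard `ipaddress` module. The deployment described here is not stated to include such a check, and the function name is hypothetical.

```python
import ipaddress


def is_lan_only_address(address: str) -> bool:
    """True for private, non-wildcard addresses such as 10.x.x.x or 192.168.x.x.

    The wildcard address (0.0.0.0 or ::) is rejected because publishing on it
    exposes the port on every interface of the host, which defeats the point
    of a single, deliberate entry point.
    """
    ip = ipaddress.ip_address(address)
    if ip.is_unspecified:
        return False
    return ip.is_private
```

A check like this does not replace the network perimeter. It only catches the configuration mistake of publishing the port on an interface it was never meant to use.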

Why "just deploy it to the cloud" is the wrong instinct

It is worth naming the alternatives we did not take, because each is the reflexive answer an engineer trained on consumer ML reaches for, and each breaks against the same wall. A managed cloud endpoint is out: the model cannot be served from anywhere the data is not. A hybrid "model in the cloud, data on-prem" split is out for the same reason — inference needs the image-log pixels, and those pixels are the confidential asset. Even a private cloud VPC, which we did use for development and test labs earlier in the programme, is the wrong home for production serving, because production lives where the production data lives. The deployment-strategy menu collapses from the usual containers-versus-VMs-versus-managed-services debate down to one viable answer: a container, on raw internal iron, behind the VPN.

This is the inversion that defines on-prem serving. In a typical cloud deployment the network is plumbing you provision to reach your model. In a closed oil-and-gas network the network is a given you serve your model into. You inherit where the data lives, how it is reached, and whether the world can see your endpoint — and the engineering problem is to make a working model live comfortably inside someone else's locked building.

What the serving layer inherits from the model

A closed-network serving design also leans on properties the model has to already possess, because you cannot patch them in production when production is unreachable.

Inference has to be self-contained and deterministic. The served model applies a fixed inference path with a probability threshold of 0.5 to decide which predicted sinusoids to keep — no post-hoc tuning, no human-in-the-loop reranking at serve time. It also has to respect the data's physical limits honestly: a single pixel of the source binary wireline log image corresponds to about 3 cm of true depth, so an irreducible roughly ±3 cm depth uncertainty is baked into every served pick, and the interpretation surface presents picks as decision support against that floor rather than implying a precision the sensor never had.
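The fixed threshold and the depth floor can be made concrete with a short sketch. It assumes each prediction carries a pixel row, dip, azimuth, and probability. The `Pick` type, the function names, and the choice to treat the 0.5 threshold as inclusive are illustrative assumptions, not the served model's actual code.

```python
from dataclasses import dataclass

PROB_THRESHOLD = 0.5     # fixed keep/drop threshold described in the text
METRES_PER_PIXEL = 0.03  # about 3 cm of true depth per image pixel


@dataclass(frozen=True)
class Pick:
    pixel_row: float    # vertical position of the sinusoid centre in the image
    dip_deg: float
    azimuth_deg: float
    probability: float  # model confidence for this sinusoid


def keep_confident(picks: list[Pick], threshold: float = PROB_THRESHOLD) -> list[Pick]:
    """Keep picks at or above the threshold (inclusive is an assumption)."""
    return [p for p in picks if p.probability >= threshold]


def depth_interval_m(pick: Pick, top_depth_m: float) -> tuple[float, float, float]:
    """Return (depth, shallowest bound, deepest bound) in metres for one pick.

    top_depth_m is the depth of pixel row 0; the bounds apply the one-pixel
    (about 3 cm) uncertainty floor on either side of the estimate.
    """
    depth = top_depth_m + pick.pixel_row * METRES_PER_PIXEL
    return depth, depth - METRES_PER_PIXEL, depth + METRES_PER_PIXEL
```

Showing the interval next to each pick keeps the interpretation surface honest: a pick at pixel row 100 below a 2000 m image top reads as roughly 2003.00 m, plus or minus 3 cm.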

And the apparatus has to be reproducible enough that the operator's own ICT team can rebuild and redeploy it without us in the room. That is why the surrounding programme ran on a deliberately ownable on-prem stack, all inside the perimeter: a multi-GPU, server-class training box (on the order of 7.7 TB SSD, 512 GB RAM, 320 GB of GPU memory), a long-term-support Linux base, an enterprise Git platform, experiment tracking, self-hosted file sync, and a custom MLOps layer. Serving is the last mile of that chain. A container only one vendor can build is not a handover; it is a hostage.

The takeaway for infra and security teams

If you are standing up subsurface AI inside a real operator's network, the lesson is to treat the air gap as the first requirement, not the last surprise. The model is the easy fifteen percent. The serving layer is where applied AI meets your security posture, and in a closed oil-and-gas network those two things are not in tension — the network is the security control, and a well-shaped on-prem serving design lets you lean on it instead of rebuilding it in software.

Concretely: a single containerized inference-plus-visualization app, packaged with Docker so the served runtime is identical to the validated one, exposed on one internal port, reachable only over the LAN, embedded per-well into the platform the interpreter already uses, with no egress and nothing to phone home to. It is unglamorous. It is also the difference between a model that impressed a steering committee and a model an interpreter actually uses on Monday morning.

Key takeaways

  1. For an operator handling confidential subsurface data, production serving is an on-prem problem, not a cloud one. The network's isolation is a hard requirement that shapes the whole serving architecture — design backward from the air gap, not forward from a managed endpoint.
  2. The deployed system was a Streamlit inference app containerized with Docker (EXPOSE port 8501), run on an internal corporate server behind the VPN, reachable only over a local LAN API, and embedded per-well into the operator's earth-analytics platform — with zero internet egress.
  3. A working model is roughly 15% of a production system; the other 85% is the engineering stack beneath it, and the platform-and-deployment layer is the one that sits outside the perimeter by default. Roughly half of AI pilots never ship, and the serving layer — not the model — is where they usually die.
  4. The network constraint collapses the deployment menu: managed endpoints, hybrid splits, and even development VPCs are all out for production. The one viable answer is a container on internal iron behind the VPN — so the single exposed port becomes the entire, deliberately minimal, attack surface.
  5. Serving inherits what the model already is: deterministic inference at a fixed 0.5 probability threshold, honest physical limits (1 binary wireline log pixel ≈ 3 cm depth, an irreducible ±3 cm floor), and enough reproducibility that the operator's own ICT team can rebuild and redeploy the container without the vendor in the room.

© 2026 Copyright. Earthscan