From R&D to Continuous AI: A Phased Operating Model for O&G AI Programs

Almost every oil-and-gas AI program is sold on the model and dies on the handover. A team trains something that beats the baseline, presents a deck that impresses a steering committee, and then the program quietly stops — not because the model failed, but because nobody built the operating path from a research result to a system the operator runs by itself. We have watched this pattern across operators in the Middle East and the United States, and the cure is not a better architecture. It is an explicit, phased operating model that treats the transition into production as a funded engineering phase rather than a closing courtesy. This piece lays out the Phase 0-5 ladder we built and ran across a roughly twenty-month subsurface-AI engagement with a mid-sized Middle East carbonate operator we partnered with, and explains why the most under-budgeted box on that ladder — the ICT handover — is where most programs stall.

Why "the model is done" is the most dangerous sentence in an AI program

A trained model is a function: data in, predictions out. In our engagement the anchor model was a Detection Transformer that picks fractures, beds, and vugs off high-resolution borehole image logs as a set of sinusoids — a genuinely hard computer-vision problem, solved end-to-end with set prediction and a Hungarian matching loss rather than a hand-tuned image pipeline. That model earned real peer-reviewed metrics. And on the day it converged, the program was nowhere near done.
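To make the set-prediction idea concrete, here is a minimal sketch of the matching step: each predicted sinusoid is paired one-to-one with a ground-truth sinusoid so that the total cost is minimal. The `(depth, amplitude, phase)` parameterisation, the cost weights, and the function names are illustrative assumptions, not the engagement's actual loss. A brute-force search over permutations stands in for the Hungarian algorithm; it finds the same optimum on small sets, while a production pipeline would use a polynomial-time solver.

```python
from itertools import permutations
from math import pi

# Each sinusoid is (depth_m, amplitude_m, phase_rad). This parameterisation
# and the cost weights below are illustrative assumptions.
Sinusoid = tuple[float, float, float]


def pair_cost(pred: Sinusoid, truth: Sinusoid) -> float:
    """L1-style distance between two sinusoids, with phase difference wrapped to [0, pi]."""
    d_depth = abs(pred[0] - truth[0])
    d_amp = abs(pred[1] - truth[1])
    d_phase = abs(pred[2] - truth[2]) % (2 * pi)
    d_phase = min(d_phase, 2 * pi - d_phase)
    return d_depth + d_amp + 0.1 * d_phase


def match(preds: list[Sinusoid], truths: list[Sinusoid]) -> list[tuple[int, int]]:
    """Return the minimum-cost one-to-one assignment as (pred_idx, truth_idx) pairs.

    Brute force over permutations, so only suitable for a handful of objects.
    Assumes len(preds) >= len(truths); unmatched predictions count as "no object".
    """
    best_cost, best_perm = float("inf"), ()
    for perm in permutations(range(len(preds)), len(truths)):
        cost = sum(pair_cost(preds[p], truths[t]) for t, p in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return sorted((p, t) for t, p in enumerate(best_perm))


# Hypothetical picks: two true features, three predictions (one spurious).
truths = [(1000.2, 0.05, 0.3), (1001.0, 0.20, 2.0)]
preds = [(1001.1, 0.18, 2.1), (1005.0, 0.01, 0.0), (1000.3, 0.06, 0.2)]
matches = match(preds, truths)
print(matches)
```

Because the assignment is computed over the whole set at once, the order in which the model emits its predictions does not matter, which is what removes the need for hand-tuned post-processing such as duplicate suppression.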

The reason is that the model is a small fraction of the engineering stack underneath it. A working model needs seismic-scale compute beneath it, a data-engineering layer that knows the provenance of every raw log file it ingests, a unification layer that resolves the operator's many logging conventions into one ontology, an MLOps layer that versions data and experiments and watches for drift, and finally a platform layer that serves predictions inside a closed network where a petrophysicist, not a data scientist, invokes them. Skip any layer below the model and the model detaches into pilot purgatory. The model ships only when every layer in that load-bearing column is built to production grade.

Figure: The six-layer engineering stack beneath the model (HPC → Data engineering → Data unification → AI/ML → Agents → Platform/deployment). Pilots don't stall because the model is weak. The working model is only ~15% of the journey; the other ~85% is the stack, and a project ships only when every layer below the model is built to production grade. With all six layers built, the model reaches the production ceiling; with any gap below it, the model detaches into POC purgatory, the ~50% that never ship. The ~15%/~85% split, the six layers and the ~50% figure are the whitepaper's own; the equal-sixths column sizing is schematic.

The funnel is the technical statement of the problem. The operating model is the organisational answer to it: a sequence of phases, each with a clear deliverable and a clear decision gate, that walks a program from a research idea up that stack without skipping a layer. We numbered them 0 through 5.

The Phase 0-5 ladder

The ladder is a value-versus-time path. Early phases cost money and produce capability, not revenue; later phases convert that capability into operational leverage and, eventually, into a reusable asset the operator can scale across business units. The discipline is to know which phase you are in, what its exit gate is, and to refuse to claim a later phase's value while still standing in an earlier one.

Phase 0 — Governance, scoping, and EDA. Before a single model trains, you settle data-sharing and confidentiality, agree the scope, and run exploratory data analysis on what the operator actually has. In our engagement the scope was a database of more than eighty processed and interpreted borehole image logs, with image detail down to roughly 50 micrometres. Phase 0 is unglamorous and it is where most of the avoidable failure is designed out: a program that has not agreed how data leaves the operator's perimeter, or what "good" looks like, has not started — it has only scheduled its first argument.

Phase 1 — Dataset development, MLOps foundation, algorithm and model development, performance evaluation. This is the phase people mistake for the whole program. It is where the dataset gets engineered (in our case roughly 900 image-and-ground-truth pairs were grown to more than 55,000 — a 65x increase — through overlap and augmentation), where the data-versioning and experiment-tracking spine is laid, where the architecture is developed and trained, and where it is honestly evaluated against held-out geology. Phase 1 produces a model. It does not produce a product, and the exit gate must say so explicitly, or the program will try to bank Phase 4 value on a Phase 1 artefact.
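As a rough illustration of how overlap alone multiplies a dataset, the sketch below counts the windows cut from one image strip at two strides. The strip height, window height, strides, and number of augmented copies are hypothetical; the source states only that overlap and augmentation were used, not their settings.

```python
def window_starts(n_rows: int, window: int, stride: int) -> list[int]:
    """Start rows of every full window of height `window`, stepping by `stride`."""
    if n_rows < window:
        return []
    return list(range(0, n_rows - window + 1, stride))


# Hypothetical numbers: a 4,000-row image strip cut into 512-row windows.
no_overlap = window_starts(4000, 512, stride=512)
with_overlap = window_starts(4000, 512, stride=64)

# Hypothetical augmentation: each window is kept once and augmented 3 times.
augmented_copies = 3
total_samples = len(with_overlap) * (1 + augmented_copies)

print(len(no_overlap), len(with_overlap), total_samples)
```

A smaller stride yields more training windows from the same interpreted log, but neighbouring windows share most of their pixels, so the held-out evaluation should be split by well or by depth interval rather than by window.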

Phase 2 — Data pre-processing at scale, multi-well correlation, and AI model integration. Here the work crosses from one-well demonstrations to the field. The pre-processing pipeline hardens, the model integrates with the operator's tooling, and — the defining Phase 2 capability in our engagement — correlation scales from a handful of wells toward the full field, a span we scoped as 2 to 80 wells. This is also where the program first feels its data dependency: trustworthy multi-well correlation needed a minimum of roughly ten to fifteen wells with consistent picks before the results held up.

Phase 3 — Production CI/CD, model serving and rollback, and an MVP for new business areas. Phase 3 is industrialisation. The model gets a serving layer, a continuous-integration pipeline, the ability to roll a bad version back, and a minimum-viable footprint in a new part of the business. In our engagement the productised tools that emerged here — automated fracture and vug interpretation, and a well-to-well correlation tool — delivered roughly 5x faster interpretation, and on correlation a +60% lift in interpreter productivity, +75% improvement in interpretation accuracy, 95% precision against target horizons, and 90% stratigraphic correlation success. None of those numbers came from a new loss function; they came from the serving, CI/CD, and evaluation layers built in Phase 3.
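The rollback capability described above can be sketched as a small in-memory registry that records which model version is live and steps back to the previous one on request. The class name, method names, and version strings are hypothetical; a real serving layer would persist this state and swap the deployed artefact as well.

```python
from typing import Optional


class ModelRegistry:
    """Tracks which model version is live and allows stepping back one version."""

    def __init__(self) -> None:
        self._history = []  # promoted version labels, oldest first

    def promote(self, version: str) -> None:
        """Make `version` the live model."""
        self._history.append(version)

    @property
    def live(self) -> Optional[str]:
        """The currently served version, or None if nothing has been promoted."""
        return self._history[-1] if self._history else None

    def rollback(self) -> str:
        """Retire the live version and return the one that is live afterwards."""
        if len(self._history) < 2:
            raise RuntimeError("no earlier version to roll back to")
        self._history.pop()
        return self._history[-1]


registry = ModelRegistry()
registry.promote("fracture-picker-1.0")   # hypothetical version labels
registry.promote("fracture-picker-1.1")
restored = registry.rollback()            # 1.1 misbehaves, step back to 1.0
print(restored)
```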

Phases 4 and 5 — Commercialisation, an AI fund, and scale-out. The final rungs convert a working capability into a reusable, fundable asset: a standing budget to keep models fresh, new growth areas, and scale-out or spin-out of the platform beyond its first business unit. These are the phases most programs gesture at in a kickoff deck and never reach — because they require everything below to have actually shipped.
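The ladder above can be sketched as a small data structure in which each phase carries an exit gate, and value may be claimed only for phases the program has actually reached. The phase names follow the text; the exit-gate wording, the split of the final rungs into separate Phase 4 and Phase 5 entries, and the function names are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    number: int
    name: str
    exit_gate: str            # what must be true before the next phase starts
    gate_passed: bool = False


# Exit-gate wording is illustrative, not the engagement's actual gate text.
LADDER = [
    Phase(0, "Governance, scoping, EDA", "Data-sharing terms and scope agreed"),
    Phase(1, "Dataset, MLOps foundation, model", "Model evaluated on held-out geology"),
    Phase(2, "Pre-processing at scale, multi-well correlation", "Correlation holds across the field"),
    Phase(3, "Production CI/CD, serving, rollback", "Served model with a tested rollback"),
    Phase(4, "Commercialisation and AI fund", "Standing budget approved"),
    Phase(5, "Scale-out", "Platform running beyond its first business unit"),
]


def current_phase(ladder: list) -> int:
    """The program stands in the first phase whose exit gate has not passed."""
    for phase in ladder:
        if not phase.gate_passed:
            return phase.number
    return ladder[-1].number


def can_claim_value(ladder: list, phase_number: int) -> bool:
    """Value may be claimed only for phases the program has reached."""
    return phase_number <= current_phase(ladder)


LADDER[0].gate_passed = True   # Phase 0 gate passed, so the program stands in Phase 1
print(current_phase(LADDER), can_claim_value(LADDER, 4))
```

Encoding the gates this way makes the failure mode explicit: a program standing in Phase 1 gets `False` when it tries to bank Phase 4 value.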

Figure: Where the productivity dividend goes. The naive read of a 10× productivity jump is "cut 90% of the team"; the position argued here is the opposite: at the same headcount, the dividend buys ten times the interpreted acreage. Spend the dividend on headcount cuts and acreage stays at 1×; hold the team and acreage climbs toward 10×. The ~10× throughput multiple (6–18 weeks → hours) is sourced; the headcount-retained axis is the arithmetic inverse of 10× and is illustrative, since no specific headcount figure is stated.

The handover is the phase nobody budgets

If the ladder has a single load-bearing rung, it is the transition out of R&D and into the operator's own ICT team — the move from Phase 3 into a steady operational state. This is where programs stall, and they stall for a reason that has nothing to do with model quality: a system that was engineered as research cannot be handed over, because handover is the acceptance test for everything built before it. If the data cannot be re-versioned, the pipeline cannot be re-run, the model cannot be rebuilt from a frozen checkpoint, and the runbooks live only in a researcher's head, then there is nothing to hand over — only a demo to admire.
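One way to turn "handover is the acceptance test" into something checkable is sketched below, using two hypothetical helpers: a content fingerprint that shows whether a dataset snapshot can be reproduced exactly, and a checklist that reports which of the four conditions named above is still unmet. The function names and checklist keys are assumptions made for illustration.

```python
import hashlib

# The four handover conditions named in the text; the key names are illustrative.
REQUIRED = [
    "data_versioned",
    "pipeline_rerunnable",
    "checkpoint_rebuildable",
    "runbooks_written",
]


def fingerprint(files: dict[str, bytes]) -> str:
    """SHA-256 over file names and contents, independent of insertion order."""
    digest = hashlib.sha256()
    for name in sorted(files):
        digest.update(name.encode("utf-8"))
        digest.update(b"\0")
        digest.update(hashlib.sha256(files[name]).digest())
    return digest.hexdigest()


def handover_gaps(evidence: dict[str, bool]) -> list[str]:
    """Return the required items that are missing or not yet satisfied."""
    return [item for item in REQUIRED if not evidence.get(item, False)]


# Hypothetical snapshot: the same files listed in a different order.
snapshot_a = {"well_01.dlis": b"raw-bytes-1", "well_02.dlis": b"raw-bytes-2"}
snapshot_b = {"well_02.dlis": b"raw-bytes-2", "well_01.dlis": b"raw-bytes-1"}
same_data = fingerprint(snapshot_a) == fingerprint(snapshot_b)

gaps = handover_gaps({"data_versioned": True, "pipeline_rerunnable": True})
print(same_data, gaps)
```

An empty `gaps` list is the minimum bar for calling something a handover rather than a demo.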

A real handover is a deliberate choice among operating models, each with a different cost and a different degree of operator autonomy. In our engagement we framed three. At one extreme, the operator runs everything in-house on its own hardware behind its own firewall: maximum sovereignty, where retraining is a same-day, minutes-to-hours operation once the stack is local. In the middle sits a managed off-prem arrangement, where a retrain cycle runs on the order of two to three weeks. At the other extreme is a fully operator-driven build-it-yourself path, measured in days of setup but offering maximal independence. These are not just deployment topologies; they are different answers to the question "who keeps this alive after we leave?", and the investment envelope we put in front of the operator tracked them, scoped in three tiers of roughly USD 250-350K, USD 650-800K, and USD 1.5-4M, depending on how much of the platform the operator owned outright.

The handover also has a human dimension that the technical funnel does not show. A model that lands inside an operator with no one trained to run it is a model with a half-life. In our engagement the program deliberately built local capability — a trained cohort of roughly 55 young professionals, of whom 15 were national hires — so that the people maintaining the system after transition were not a flight risk on a single contractor's roster. Capability transfer is part of the handover, not a separate goodwill exercise.

Continuous AI is a posture, not a milestone

The word "Continuous" in Continuous AI is doing real work. A model is not a deliverable you ship once; it is an asset that decays. Logging tools change, new fields come online, the geology under the next licence area is not the geology under the last one, and a model trained on yesterday's wells drifts on tomorrow's. The reason Phases 4 and 5 include an AI fund rather than a closing invoice is that keeping a deployed model honest is a standing operating cost, not a capital event.
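A minimal sketch of what "watching for drift" can look like, using the population stability index (PSI) to compare a feature's distribution at training time with its distribution on newly logged wells. PSI is a stand-in chosen for illustration; the source does not say which drift measure the program used, and the feature, bin count, and any alert threshold are assumptions the team would need to set.

```python
from math import log


def psi(expected: list[float], actual: list[float], bins: int = 10) -> float:
    """Population stability index of `actual` against `expected`.

    Bins are equal-width over the range of `expected`; values outside that
    range are clamped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0   # guard against a constant reference

    def proportions(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical feature, e.g. a normalised image statistic per logged interval.
training = [i / 100 for i in range(100)]
new_wells = [v + 0.5 for v in training]   # the same shape, shifted upward

print(psi(training, training), psi(training, new_wells))
```

Identical distributions score zero and the score grows as the new data departs from the training data, which gives the standing budget something measurable to trigger a retrain.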

This is also where the well-maturity view earns its place in the steering deck. In our engagement we tracked the field as a maturity gradient — early on, roughly 3 of 80 wells were model-ready, then 8 of 80, with a near-term target of 20 to 25 of 80. That single ratio reframed the whole conversation away from "is the model finished?" toward the right question: "how much of the field is the system trusted to interpret, and how fast is that coverage growing?" A program that answers the second question is in Continuous AI. A program still arguing about the first never left Phase 1.
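The coverage ratio is simple arithmetic, sketched here with the figures quoted above; the snapshot labels are illustrative.

```python
FIELD_WELLS = 80

# Model-ready well counts quoted in the text; the labels are illustrative.
model_ready = {"early": 3, "later": 8, "target_low": 20, "target_high": 25}

coverage = {label: ready / FIELD_WELLS for label, ready in model_ready.items()}

for label, share in coverage.items():
    print(f"{label}: {share:.2%} of the field")
```

Tracked over time, the change in this share between reviews answers "how fast is coverage growing?" directly.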

What this means for a program sponsor

If you are sponsoring an oil-and-gas AI program, the operating model is your real risk-management instrument, more than any single technical choice. Three things follow. Name your phases and their exit gates before kickoff, and refuse to bank a later phase's value from an earlier phase's artefact; the most expensive failures are programs that declared victory at Phase 1 and budgeted as if they were at Phase 4. Fund the handover as a phase, with versioned data, reproducible pipelines, rebuildable checkpoints, written runbooks, and trained people; it is the acceptance test for everything upstream. And treat Continuous AI as a posture you capitalise, not a finish line you cross. Do those three things and the model, the part everyone obsesses over, becomes the easy part.

Key takeaways

Most O&G AI programs stall at the ICT handover, not the model. Budget the transition into production as a funded engineering phase, with its own deliverable and decision gate.
Use an explicit Phase 0-5 ladder: governance/EDA -> dataset+MLOps -> multi-well correlation (2-80 wells) -> production CI/CD/serving -> AI fund and scale-out. Know which phase you are in and refuse to claim a later phase's value from an earlier phase's artefact.
The model is a small fraction of the stack. HPC, data engineering, data unification, MLOps, and a serving platform all sit beneath it; skip any layer and the pilot detaches into purgatory.
Phase 3 industrialisation — serving, CI/CD, rollback — is where value appears: roughly 5x faster interpretation and, on well-to-well correlation, +60% productivity, +75% accuracy, 95% target precision, 90% stratigraphic success. None of it came from a new architecture.
Continuous AI is a funded posture, not a finish line. Pick a hosting and retraining model deliberately (we framed three, from fully operator-run to managed off-prem) and capitalise it via a standing AI fund so models stay fresh after handover.
