Localisation by Design: Building Local AI Capability Inside an O&G Engagement

There is a particular slide near the end of almost every applied-AI proposal to a national oil company, and it is almost always the weakest one in the deck. It is the localisation slide: training local talent, building national capacity and, in the Gulf, supporting the national workforce-localisation programme or its equivalent. It is weak because everyone in the room treats it as the part that does not have to be true. The models are the product; the people are the courtesy. The metrics get a results section; the talent gets a photo of a graduation. This piece argues that the priority is exactly backwards, and that the reasons are engineering ones, not just political ones. In a roughly twenty-month subsurface-AI engagement with a mid-sized Middle East carbonate operator we partnered with, the trained cohort was not the soft deliverable. It was the load-bearing one: a 55-person cohort, 15 of them young national hires from regional universities, was what converted a one-off productivity gain into a capability the operator could actually keep. Localising AI talent was a line item in the return, not a footnote in the sustainability report.

The throughput dividend is real, and it is the whole argument

Start with the number that makes the case, because without it "build local capability" is just a slogan. The two computer-vision tools we built on the operator’s borehole resistivity-image logs, one for fracture and bedding-plane detection with dip and azimuth regression and one for vug detection and sizing, interpreted a metre of log in around 30 seconds, roughly 5x faster than the manual pick-and-fit workflow they replaced. The cross-well correlation tool lifted interpretation productivity by about 60% and interpretation accuracy by about 75%, reaching 95% target-location precision and 90% stratigraphic-correlation success on the operator’s own wells.

A productivity multiple like that has two readings, and the naive one is corrosive: if a machine interprets five times faster, you need a fraction of the interpreters. That read turns a localisation programme into theatre — you train people for a system being sold, on the same slide, as the thing that makes them redundant. The read we built the programme around is the inverse: hold the team, and the dividend buys multiples of the coverage the same team could never have reached by hand. The acreage interpreted goes up; the headcount does not come down. The trained cohort is not competing with the dividend — it is the only thing that can spend it.

Figure (interactive ThroughputDividendAllocator): the naive read of a ~5x productivity jump is "cut four-fifths of the team". Spend the dividend on headcount cuts and interpreted acreage stays at 1x; hold the team and acreage climbs toward 5x. The ~5x throughput multiple is sourced; the headcount-retained axis is its arithmetic inverse and is illustrative, since the engagement states no specific headcount figure.

That distinction decides what the cohort is for. If the plan is to cut, you train nobody and outsource the residual. If the plan is to cover more ground with the people you have, you need those people fluent in the system — and the training budget becomes the mechanism that realises the dividend, not a cost subtracted from it.
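The arithmetic behind "hold the team" fits in a few lines. This is a minimal sketch that uses the two sourced figures above (about 30 seconds per metre for the automated tools, roughly 5x faster than manual) and treats everything else as illustrative: the implied manual rate of about 150 s/m, the 500 m interval, the headcount fractions and the function names are assumptions, not engagement data.

```python
# Sketch of the throughput-dividend arithmetic. The 30 s/m AI rate and ~5x speed-up
# come from the engagement; the interval length and headcount fractions are illustrative.

AI_SECONDS_PER_METRE = 30.0   # sourced: automated interpretation rate
SPEEDUP = 5.0                 # sourced: ~5x faster than manual pick-and-fit


def manual_seconds_per_metre(ai_s_per_m: float = AI_SECONDS_PER_METRE,
                             speedup: float = SPEEDUP) -> float:
    """Manual rate implied by the two sourced figures (about 150 s/m)."""
    return ai_s_per_m * speedup


def interval_hours(metres: float, seconds_per_metre: float) -> float:
    """Interpretation time for a log interval, in hours."""
    return metres * seconds_per_metre / 3600.0


def coverage_multiple(headcount_retained: float, speedup: float = SPEEDUP) -> float:
    """Interpreted coverage relative to today's manual baseline (1.0 = status quo).

    Assumes coverage scales with people x per-person throughput, so keeping a
    fraction `headcount_retained` of the team yields speedup * headcount_retained.
    """
    if not 0.0 <= headcount_retained <= 1.0:
        raise ValueError("headcount_retained must be in [0, 1]")
    return speedup * headcount_retained


if __name__ == "__main__":
    metres = 500.0  # hypothetical interval
    print(f"AI:     {interval_hours(metres, AI_SECONDS_PER_METRE):.1f} h")
    print(f"Manual: {interval_hours(metres, manual_seconds_per_metre()):.1f} h")
    print(f"Cut to 1/5 of the team: {coverage_multiple(1 / SPEEDUP):.1f}x coverage")
    print(f"Hold the team:          {coverage_multiple(1.0):.1f}x coverage")
```

Run as a script, it reports 4.2 hours against 20.8 hours for the 500 m interval, and 1.0x coverage if the team is cut to a fifth against 5.0x if it is held. People times per-person throughput is the simplest model that captures the point; real gains would also depend on review and QC time.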

What "trained" actually meant: the human layer of an engineering stack

The talent slide is usually weak because "trained" is left undefined. Trained on what? To do what? Here the answer was specific, and it maps onto a real engineering substrate rather than a workshop attendance sheet. A subsurface-AI tool is not a checkpoint file. It is a layered stack (high-performance compute, data engineering, the model itself, an MLOps layer and a serving application), and every layer around the model needs a human who can operate it when next year’s wells arrive. The cohort was trained on the operator’s own data, with the operator’s own tools, across three of those layers:

Data engineering. The detector models were trained on a corpus grown from roughly 900 image-and-ground-truth pairs to more than 55,000, a 65x expansion achieved through overlapped patch extraction and geometry-preserving augmentation. A data engineer who knows why two wells in an early ten-well intake were excluded before training for abnormal static-image ranges is worth more, at the next intake, than the trained weights are.
Model and computer vision. The fracture detector is a Detection Transformer — a DETR-style set-prediction model with a from-scratch ResNet-10 backbone, trained for 100 epochs at batch size 256. An interpreter who understands that the model emits a set of sinusoids — each with a regressed depth, dip and azimuth — can sanity-check a predicted pick against the geology. One who treats the output as an oracle cannot.
MLOps. Experiment tracking, data versioning, an orchestration layer and a lightweight serving app were handed over as code and runbooks. An MLOps engineer who can read a retraining run and tell genuine drift from sampling noise is the difference between a system that improves with each well and one that quietly rots at the first distribution shift.
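To make the data-engineering and model layers concrete, here is a minimal sketch of how one sinusoid pick (depth, dip, azimuth) maps onto the unrolled borehole image, how overlapped depth windows multiply training patches, and why an azimuthal roll is a geometry-preserving augmentation. It assumes a vertical, circular borehole of known radius, for which a plane's trace on the wall is z(φ) = z0 + r·tan(dip)·cos(φ − azimuth). The class and function names, window sizes and the "whole trace must fit" rule are illustrative choices, not the operator's actual pipeline.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SinusoidPick:
    """One planar feature, as a single element of the detector's output set."""
    depth_m: float      # depth of the sinusoid's centre line
    dip_deg: float      # dip of the plane, 0 <= dip < 90 (vertical well assumed)
    azimuth_deg: float  # dip azimuth, degrees clockwise from north, in [0, 360)


def amplitude_m(pick: SinusoidPick, radius_m: float) -> float:
    """Half the peak-to-trough height of the trace: r * tan(dip)."""
    return radius_m * math.tan(math.radians(pick.dip_deg))


def trace_depth(pick: SinusoidPick, phi_deg: float, radius_m: float) -> float:
    """Depth of the plane's trace on the borehole wall at azimuth phi.

    z(phi) = z0 + r * tan(dip) * cos(phi - azimuth), so the deepest point of
    the sinusoid lies at the dip azimuth.
    """
    return pick.depth_m + amplitude_m(pick, radius_m) * math.cos(
        math.radians(phi_deg - pick.azimuth_deg))


def dip_from_amplitude(amp_m: float, radius_m: float) -> float:
    """Invert the relation: a measured half-amplitude gives the dip in degrees."""
    return math.degrees(math.atan(amp_m / radius_m))


def overlapped_windows(top_m, bottom_m, window_m, stride_m, picks, radius_m):
    """Cut a log interval into overlapping depth windows (stride < window).

    A pick is assigned to a window only if its whole trace fits inside it, so
    each training patch carries complete sinusoids, with depths re-expressed
    relative to the window top. (Hypothetical rule for this sketch.)
    """
    n_windows = math.floor((bottom_m - top_m - window_m) / stride_m + 1e-9) + 1
    windows = []
    for i in range(max(n_windows, 0)):
        start = top_m + i * stride_m
        end = start + window_m
        local = [
            SinusoidPick(p.depth_m - start, p.dip_deg, p.azimuth_deg)
            for p in picks
            if start <= p.depth_m - amplitude_m(p, radius_m)
            and p.depth_m + amplitude_m(p, radius_m) <= end
        ]
        windows.append((start, end, local))
    return windows


def roll_azimuth(picks, shift_deg):
    """Rotating the unrolled image around the borehole axis (a circular shift of
    its columns) shifts every dip azimuth by the same angle; dip and depth are
    unchanged, so the labels stay exact."""
    return [SinusoidPick(p.depth_m, p.dip_deg, (p.azimuth_deg + shift_deg) % 360.0)
            for p in picks]
```

With a 0.1 m radius, a 45° pick has a half-amplitude of about 0.1 m, so its trace spans roughly 20 cm of depth. Stepping a 2 m window every 1 m puts each interior depth into two windows, one simple way overlap multiplies training patches without new wells; combined with azimuthal rolls, each labelled sinusoid yields several exact training examples. The same relation lets an interpreter turn a predicted dip back into an expected trace height and check it against the image.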

This is why the localisation layer is the one no GPU provides. Hardware can be procured in weeks. A team that trusts, runs, retrains and improves a computer-vision pipeline (against confidential data, under an NDA, on a fractured carbonate play with its own quirks) cannot. That takes months of work on the operator’s own geology, and it is the only part of the handover that cannot be bought as a SKU.

Capability is not a workshop

The failure mode of every localisation programme is treating "training" as a generic upskilling event detached from the system being handed over. Capability transfer that sticks is trained on the operator’s data, against the operator’s tools, across the specific engineering layers the operator will have to run — data engineering, the model, and MLOps. The cohort that learned the pipeline on someone else’s benchmark dataset has learned a course; the cohort that retrained the operator’s own detector has learned a job.

Why this is ROI, not charity

The political case for national workforce localisation is well understood: national content requirements, sovereign technical capacity, a young workforce a state is investing in. None of that is the argument here, because none of it makes a steering committee fund a training budget over a procurement budget. The argument is that the trained cohort is the asset that makes every other asset durable.

Consider the alternative. An operator buys excellent models and the hardware to run them, and trains nobody. The models work beautifully on the day of acceptance. Six months later a new tranche of wells arrives with a slightly different tool string, the input distribution shifts and accuracy drifts, and no one in the building can diagnose it, retrain the detector, or even tell whether the drift is in the data or the model. The 5x dividend evaporates, not because the model failed but because the capability to keep it was never transferred. The operator slides back to the service-company invoice it was trying to retire, and now owns idle GPUs as well.
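One way an MLOps engineer can separate genuine input drift from sampling noise is a two-sample test on an image statistic, comparing the training wells with a new tranche. The sketch below computes the two-sample Kolmogorov-Smirnov statistic and a permutation p-value using only the standard library. The choice of statistic (say, a per-metre static-image amplitude), the test itself and the function names are illustrative assumptions, not the engagement's monitoring stack. Pairing a check like this with accuracy on a few freshly interpreted wells is one way to tell whether a drop comes from the data or the model.

```python
import random
from typing import Sequence, Tuple


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest gap between the two empirical CDFs (two-sample Kolmogorov-Smirnov D)."""
    xs, ys = sorted(a), sorted(b)
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    i = j = 0
    d = 0.0
    while i < len(xs) and j < len(ys):
        x = min(xs[i], ys[j])
        # Step both CDFs past every value equal to x before comparing them.
        while i < len(xs) and xs[i] <= x:
            i += 1
        while j < len(ys) and ys[j] <= x:
            j += 1
        d = max(d, abs(i / len(xs) - j / len(ys)))
    return d


def drift_test(train: Sequence[float], new: Sequence[float],
               n_perm: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """Return (D, p), where p is a permutation p-value for 'same distribution'.

    Shuffling the pooled samples shows how large D gets from sampling noise
    alone; a small p suggests the new tranche genuinely differs from training.
    """
    rng = random.Random(seed)
    observed = ks_statistic(train, new)
    pooled = list(train) + list(new)
    n = len(train)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if ks_statistic(pooled[:n], pooled[n:]) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = random.Random(42)
    train = [rng.gauss(0.0, 1.0) for _ in range(300)]    # hypothetical feature values
    same = [rng.gauss(0.0, 1.0) for _ in range(300)]
    shifted = [rng.gauss(0.5, 1.0) for _ in range(300)]  # e.g. a new tool string
    print("same tool string:", drift_test(train, same))
    print("new tool string: ", drift_test(train, shifted))
```

On this synthetic data the shifted tranche should return a p-value near the floor of 1/(n_perm + 1), while the same-distribution tranche typically does not. In practice a team would track several statistics per well and set alert thresholds from historical tranches.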

The trained cohort is the insurance against exactly that. Fifteen young national hires and forty other young MENA professionals who can operate the data pipeline, read a retraining run, and validate a pick against the geology are what convert a point-in-time productivity number into a standing in-house function. That is why we counted the cohort as a deliverable with the same seriousness as the models — and why the localisation slide should be the strongest in the proposal, not the weakest. Across our engagements with operators in the Middle East and the United States, the operators who keep their AI running are consistently the ones whose own people were in the loop before the vendor left, not the ones who bought the best model and the most compute. The transfer of know-how is the product; the model is the medium it travels on.

How to budget it like you mean it

If capability transfer is ROI, it has to be budgeted like ROI — costed, scheduled, and acceptance-tested — not assumed into existence at the close. A few principles from this engagement:

Make the cohort a deliverable with an acceptance test. "55 trained professionals" is a vanity number unless the test is operational: can this team, unaided, retrain the detector on a new well intake and validate the output? That is the only definition of "trained" that survives the year after launch — and it means training on the operator’s data and tools, on the layers the operator will run, not on a generic course built around a public benchmark.
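An operational acceptance test needs a mechanical definition of "validate the output". A minimal sketch, assuming picks are (depth, dip, azimuth) tuples and that a predicted pick counts as correct only if it falls within hypothetical tolerances of an interpreter's pick: find the one-to-one matching that pairs the most picks at the lowest cost, in the spirit of the bipartite matching DETR-style models use in training, then report precision and recall. The tolerances, names and exhaustive search are illustrative; an evaluation over many picks would use a Hungarian-style solver such as SciPy's `linear_sum_assignment` instead.

```python
import itertools
import math
from typing import Dict, List, Sequence, Tuple

Pick = Tuple[float, float, float]  # (depth_m, dip_deg, azimuth_deg)

# Hypothetical acceptance tolerances; to be agreed with the operator's interpreters.
TOLERANCES = {"depth_m": 0.05, "dip_deg": 10.0, "azimuth_deg": 20.0}


def azimuth_diff(a: float, b: float) -> float:
    """Smallest angle between two azimuths, respecting the 0/360 wrap."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def pair_cost(pred: Pick, truth: Pick, tol: Dict[str, float]) -> float:
    """Sum of tolerance-normalised errors, or inf if any error exceeds its tolerance."""
    errs = (abs(pred[0] - truth[0]) / tol["depth_m"],
            abs(pred[1] - truth[1]) / tol["dip_deg"],
            azimuth_diff(pred[2], truth[2]) / tol["azimuth_deg"])
    return math.inf if max(errs) > 1.0 else sum(errs)


def match_picks(preds: Sequence[Pick], truths: Sequence[Pick],
                tol: Dict[str, float] = TOLERANCES
                ) -> Tuple[List[Tuple[int, int]], float, float]:
    """Exhaustive one-to-one matching: most matches first, then lowest total cost.

    Each interpreter pick gets a distinct prediction or None; pairs outside
    tolerance count as unmatched. Small sets only: the search grows factorially.
    """
    slots = list(range(len(preds))) + [None] * len(truths)
    best_key, best_pairs = (0, 0.0), []  # the empty matching
    for assign in itertools.permutations(slots, len(truths)):
        costs = [(p, t, pair_cost(preds[p], truths[t], tol))
                 for t, p in enumerate(assign) if p is not None]
        pairs = [(p, t) for p, t, c in costs if math.isfinite(c)]
        key = (-len(pairs), sum(c for _, _, c in costs if math.isfinite(c)))
        if key < best_key:
            best_key, best_pairs = key, pairs
    matched = len(best_pairs)
    precision = matched / len(preds) if preds else 0.0
    recall = matched / len(truths) if truths else 0.0
    return sorted(best_pairs), precision, recall


if __name__ == "__main__":
    truths = [(1000.50, 45.0, 90.0), (1001.20, 30.0, 180.0)]  # interpreter picks
    preds = [(1000.52, 43.0, 95.0), (1001.18, 32.0, 175.0), (1002.00, 20.0, 10.0)]
    pairs, precision, recall = match_picks(preds, truths)
    print(pairs, round(precision, 3), recall)  # -> [(0, 0), (1, 1)] 0.667 1.0
```

The acceptance criterion then becomes a pair of thresholds on precision and recall, measured on a well the cohort retrained against without vendor help.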

Pull the universities in early, not at graduation. The 15 young national hires came through regional universities, and the value of that pipeline is in the months of hands-on work on real subsurface data, not the photograph at the end. A regional AI-education partner can scale the cohort; the operator’s own geology is what makes the training stick. Frame it as a multi-cohort pipeline feeding a standing function, not a one-time class.

Size the cohort to the dividend, not to the optics. The right cohort size is set by the coverage you intend the throughput dividend to unlock (how much more acreage, how many more wells per cycle) and by the stack layers that need a named owner: a data engineer, an MLOps engineer, an interpreter who can challenge the model. Across this multi-phase programme, that worked out to a cohort large enough to staff every layer around the model, with depth to spare. Staff to the function you are building, not to the headcount on a slide.

Get this right and the localisation slide stops being the part of the deck nobody believes. It becomes the part that explains why the operator will still be running its own subsurface AI long after the engagement that built it has closed.

Why localising AI talent is the deliverable, not the footnote

The throughput dividend is the whole argument: AI interpretation ran ~5x faster (about 30 s/m) and the correlation tool lifted productivity ~60% and accuracy ~75% (95% target precision, 90% stratigraphic success). The naive read is 'cut the team'; the correct read is 'hold the team and multiply the acreage' — and the trained cohort is the only thing that can spend the dividend.
'Trained' has to mean trained on the operator's own data, against the operator's own tools, across the real engineering layers — data engineering (a corpus grown 900 → 55,000 image-GT pairs, a 65x expansion), the DETR-style computer-vision model, and the MLOps/retraining layer — not a generic upskilling workshop.
The cohort is the insurance that makes every other asset durable: without people who can diagnose drift and retrain on new wells, the 5x dividend evaporates at the first distribution shift and the operator slides back to the service-company invoice it was trying to retire.
Budget capability transfer like ROI: 55 trained MENA professionals including 15 young national hires from regional universities, made a deliverable with an operational acceptance test, sized to the coverage the dividend unlocks and to the stack layers that need a named owner — not to the optics of a slide.

References

Three-phase formation-evaluation engagement (December 2021 kickoff – July 2023 close) with a mid-sized Middle East carbonate operator we partnered with. Talent, throughput, dataset and architecture figures derived from internal Phase 3 progress and transition materials. Operator identity, well, field and partner names withheld under confidentiality.
ThroughputDividendAllocator reflects the programme’s own framing: the ~5x interpretation dividend is the sourced quantity; the headcount axis is the arithmetic inverse and is labelled illustrative, since the engagement states no specific headcount figure.

Continuous AI for explorers

About Earthscan