The Quiet Rise of In-House AI Teams at Oil and Gas Operators

Operators used to treat subsurface AI as something you licensed, one seat at a time, from a vendor whose model you never got to open. That is changing. A quieter pattern is taking hold, where a Texas onshore operator stands up a small in-house capability instead, and owns both the model and the number on the invoice. This is a build-versus-buy note from the operator's side of the table: what a fixed-price, phased build actually costs, how an accelerated track trades money for time against a standard one, and where the per-seat licence stops being the cheaper option. The technical story of VeerNet lives elsewhere; this is the story of the decision to build it in the first place, told through the two numbers that decide it, the outlay and the seat count.

By Quamer Nasim, Tannistha Maiti. 8 min read.
EarthScan insight

For years the default answer to subsurface AI at an operator was to buy it. A vendor had a tool, the tool had a per-seat price, and the model inside it was a sealed box you were not invited to open. That arrangement is quietly losing ground. Across the operators we work with, and specifically at the Texas onshore operator whose log-digitisation work sits behind VeerNet, the decision has been going the other way: stand up a small in-house capability, own the model, and own the number on the invoice. This note is not about VeerNet's architecture, which we have written about elsewhere. It is about the decision that comes before any architecture exists, the build-versus-buy call, and the two numbers that actually settle it.

Why the sealed box stopped being enough

The pitch for buying is real and worth stating fairly. A licensed tool is live on the day you sign, someone else carries the maintenance, and the cost is a predictable line per user. For a capability an operator uses lightly and generically, that is often the right call and we say so.

Subsurface work is where the pitch frays. The problems are specific to a basin, a vintage of paper logs, a set of curve conventions that a general vendor has no reason to have tuned for. When the model is sealed, an operator cannot see why a curve came out wrong, cannot retrain it on their own archive, and cannot carry the capability into the next problem next door. Koroteev and Tekic make the same point for the upstream as a whole: the data is proprietary and the problems are specialised, which is exactly the combination that resists a one-size-fits-all outsourced stack [3]. What you are really renting, in that case, is not a solution but a dependency.

There is a second, quieter cost that does not show on the licence. The part of a machine-learning system that is the model is small; the plumbing around it, the data pipelines, the retraining, the monitoring, the integration into how the operator actually works, is most of the real system and most of the real cost. Sculley and colleagues named this the hidden technical debt of machine learning, and their point cuts against the naive read of build-versus-buy [1]. Buying does not remove that debt. It relocates it to a vendor and hands you the bill as a recurring fee, and when the fee scales per seat, it scales with your success rather than with the work.

What a build actually costs, without the mystique

The reason operators historically bought is that building sounded unbounded. The corrective is a fixed-price, phased structure, because a number you can put in a budget is a number a manager can defend. The work we scoped for the onshore operator was four blocks, delivered on one of two tracks. The accelerated track runs 16 weeks with 6 people at 180,000 EUR. The standard track runs 32 weeks with 4 people at 100,000 EUR. Compute is a separate, small line, a GPU rented between 750 and 1,800 EUR per month depending on tier, and paid only while the work runs.

The two tracks are the same scope bought at different speeds. Accelerated pays more, and puts more people on it, to have the capability live in 16 weeks rather than 32. Standard trades that speed for a lower outlay when the timeline is not the binding constraint. Neither track is a discount on the other; they are a straight exchange of money for time, which is the honest way to price delivery.
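The exchange of money for time can be made explicit with a few lines of arithmetic. The sketch below uses only the track figures quoted above; the `Track` class and the function name are illustrative and not part of any engagement tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    """One delivery track, using the figures quoted in the text."""
    name: str
    weeks: int
    people: int
    price_eur: int


STANDARD = Track("standard", weeks=32, people=4, price_eur=100_000)
ACCELERATED = Track("accelerated", weeks=16, people=6, price_eur=180_000)


def premium_per_week_saved(fast: Track, slow: Track) -> float:
    """Extra EUR paid for each week of delivery time the faster track saves."""
    extra_cost = fast.price_eur - slow.price_eur
    weeks_saved = slow.weeks - fast.weeks
    return extra_cost / weeks_saved


if __name__ == "__main__":
    print(premium_per_week_saved(ACCELERATED, STANDARD))  # 5000.0
```

On these numbers the accelerated track costs 80,000 EUR more to arrive 16 weeks sooner, or 5,000 EUR per week saved. Whether that is worth paying depends on what an earlier go-live is worth to the operator.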

What matters for the build-versus-buy call is that both tracks are a one-off. You pay the outlay, you keep the model, and the marginal cost of the next user inside the operator is close to nothing, because one trained model serves the whole team. That is the structural difference from a per-seat licence, and it is the whole argument.

The two lines that cross

Put the two options on one axis and the decision stops being a matter of taste. The buy line rises with headcount: every seat is another 1,200 USD per year, so a licence for a growing team is a bill that grows with it. The build line is flat for the year: the fixed track price plus a year of compute, and then the same number no matter how many people inside the operator use it. Two lines with different slopes cross exactly once, and that crossing is the decision.
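Written out, the crossing is a one-line inequality. The symbols are introduced here for illustration: F is the fixed track price, C the year of compute, p the licence price per seat per year, and n the seat count, with EUR and USD treated at parity as the exhibit does.

```latex
\[
\underbrace{p\,n}_{\text{buy, one year}} \;>\; \underbrace{F + C}_{\text{build, one year}}
\quad\Longleftrightarrow\quad
n \;>\; \frac{F + C}{p},
\qquad
n^{*} = \left\lfloor \frac{F + C}{p} \right\rfloor + 1 .
\]
```

For the standard track with low-tier compute, F = 100,000, C = 12 × 750 = 9,000 and p = 1,200, so (F + C)/p is about 90.8 and the first seat count at which the build is cheaper is 91.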

[Interactive exhibit: In-house capability, build once vs licence per seat. Panel A, delivery track: standard (100k EUR, 32 wk, 4 FTE) or accelerated (180k EUR, 16 wk, 6 FTE), shipped in 4 phased blocks; build target on curve extraction: peak R-squared 0.9891, best MAE 0.0132. Panel B, one-year cost vs seats licensed: buy at 1,200 per seat per year against a build fixed for the year, with break-even at 91 seats on the standard track (build outlay 100,000 EUR plus 9,000 EUR for a year of compute). At 90 seats the licence is still cheaper for year one: build 109k, buy 108k. The 12-month horizon and EUR-USD parity are illustrative.]
An operator's build-vs-buy reader for standing up in-house AI capability. Lever A toggles the delivery track: standard is 32 weeks with 4 FTE at 100,000 EUR, accelerated is 16 weeks with 6 FTE at 180,000 EUR, each shipped in four phased blocks and each targeting the same measured quality on curve extraction, a peak R-squared of 0.9891 and a best MAE of 0.0132. Lever B drags the seat count the operator would otherwise licence at 1,200 USD per seat per year. The plot puts one year of each option on a single axis: the buy line rises with headcount while the build line is a flat one-off, so they cross at a break-even seat count. The orange marker is the only element that argues: it marks the seat count at which owning the capability becomes the cheaper choice. Every price, timeline, FTE, compute band, and quality figure is sourced from the engagement archive; the one-year comparison horizon and the EUR-to-USD parity used to place both lines on one axis are illustrative framing, not a quote.

The exhibit lets you set the two levers and read the crossover. Toggle the track to set the one-off outlay, drag the seat count to set the licence you would otherwise pay, and the orange marker reports the break-even, the seat count past which owning is cheaper than renting for the first year alone. On the standard track the crossover sits at 91 seats; on the accelerated track, which costs more up front, it sits higher, at about 158 on the same low-tier compute and parity assumptions. Below the crossover the licence genuinely is the cheaper choice, and the honest instrument shows that rather than hiding it. Above it, and especially in year two when the build has no repeated outlay and the licence bills again in full, the build pulls away.
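The crossover for each track can be reproduced in a few lines. This is a minimal sketch of the first-year comparison, assuming low-tier compute for twelve months and EUR-USD parity, as the exhibit does; the function and constant names are illustrative.

```python
SEAT_PRICE_PER_YEAR = 1_200      # licence price per seat per year (USD, from the text)
GPU_LOW_TIER_PER_MONTH = 750     # low end of the compute band (EUR per month)
TRACK_PRICE = {"standard": 100_000, "accelerated": 180_000}  # fixed price, EUR


def first_year_build_cost(track: str, gpu_per_month: int = GPU_LOW_TIER_PER_MONTH) -> int:
    """Fixed track price plus twelve months of compute."""
    return TRACK_PRICE[track] + 12 * gpu_per_month


def break_even_seats(track: str) -> int:
    """Smallest seat count at which a year of licences costs more than the build.

    Assumption: EUR and USD are treated at parity.
    """
    return first_year_build_cost(track) // SEAT_PRICE_PER_YEAR + 1


for name in TRACK_PRICE:
    print(name, first_year_build_cost(name), break_even_seats(name))
# standard 109000 91
# accelerated 189000 158
```

At 90 seats the licence costs 108,000 against a build at 109,000, which is why the marker sits at 91 rather than 90.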

That framing is deliberately conservative. It compares a single year, where the build carries its entire cost and the licence carries only twelve months of fees. Stretch the horizon and the build looks better, not worse, because its cost was already spent. We prefer to argue from the year that flatters the licence, and let the crossover speak from there.
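To see how the horizon moves the crossover, the same comparison can be repeated over more years. The sketch assumes, for illustration only, that the build side stays at the first-year figure (standard track plus one year of low-tier compute) while the licence bills again each year; it ignores the running costs of an in-house team after delivery.

```python
BUILD_COST_YEAR_ONE = 100_000 + 12 * 750  # standard track + low-tier compute, EUR
SEAT_PRICE_PER_YEAR = 1_200               # USD, treated at parity with EUR


def break_even_seats(years: int) -> int:
    """Smallest seat count at which cumulative licence fees exceed the build cost.

    Assumption: no further build-side cost after year one.
    """
    cumulative_fee_per_seat = SEAT_PRICE_PER_YEAR * years
    return BUILD_COST_YEAR_ONE // cumulative_fee_per_seat + 1


for years in (1, 2, 3):
    print(years, break_even_seats(years))
# 1 91
# 2 46
# 3 31
```

Under that assumption the break-even seat count roughly halves by year two, which is the sense in which the one-year view flatters the licence.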

What ownership buys that the axis does not show

The break-even is the part you can put on a slide, but it undersells the case, because it prices only the fee. The measured quality is the other half. On curve extraction the build reaches a peak R-squared of 0.9891 and a best MAE of 0.0132, and the operator can see those numbers, reproduce them, and hold the next iteration to them. A sealed tool reports a result; an owned model reports a metric you can audit and a training set you can extend. Paleyes and colleagues catalogue how much of the real difficulty in deployed machine learning is maintenance, monitoring, and integration rather than the initial model [2], and every one of those is easier to carry when the team that carries it is your own and the model is not a black box.
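Auditing those two figures needs nothing more than the standard definitions of R-squared and mean absolute error. A minimal sketch follows. The sample values are made up for illustration and are not the engagement data, the function names are hypothetical, and how the engagement scaled its curves before scoring is not stated in this note.

```python
def r_squared(actual: list[float], predicted: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(actual: list[float], predicted: list[float]) -> float:
    """Average absolute difference between reference and extracted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Illustrative values only: a reference curve and a hypothetical extraction of it.
reference = [0.10, 0.25, 0.40, 0.55, 0.70]
extracted = [0.11, 0.24, 0.41, 0.54, 0.71]

print(r_squared(reference, extracted))
print(mean_absolute_error(reference, extracted))
```

With an owned model the same two functions can be run against a held-out slice of the operator's own archive, so each retraining is scored on the same footing as the last.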

This is the quiet part of the trend. Operators are not building in-house AI teams because building is fashionable. They are doing it because the per-seat licence turned a specialised, proprietary problem into a recurring dependency, and because a phased fixed-price build turned the alternative into a number a budget can hold. The template is not exotic. It is four blocks, a track chosen for how much the timeline is worth, a small compute line, and a crossover you can compute before you commit.

Limitations

The crossover in the exhibit is a first-year total-cost comparison, not a full total cost of ownership. It counts the fixed build outlay, a year of compute at the low tier, and the per-seat licence, and it deliberately omits the internal cost of running an in-house team past delivery, the vendor's own upgrade and support value, and any migration cost either way. The build prices, timelines, FTE counts, compute band, and the two quality figures are the real engagement numbers; the one-year horizon and the EUR-to-USD parity used to place both lines on one axis are framing choices, flagged as such, chosen because they are conservative rather than because they are precise. The 1,200 USD per seat is one buy alternative, not the market. And the numbers are from one operator's onshore log-digitisation problem, so the crossover seat count does not transfer as a constant to a different basin, a different scope, or a different data archive. The point that carries is the shape of the two lines, not the exact seat at which they meet.
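How far those framing choices move the crossover can be checked directly. The sketch below recomputes the first-year break-even for the standard track across the quoted compute band and for a few exchange rates. The rates other than parity are hypothetical values chosen only to show the direction of the effect.

```python
import math

TRACK_PRICE_EUR = 100_000     # standard track, from the text
SEAT_PRICE_USD = 1_200        # per seat per year, from the text


def break_even_seats(gpu_eur_per_month: float, usd_per_eur: float = 1.0) -> int:
    """Smallest seat count at which one year of licences exceeds the build cost."""
    build_cost_usd = (TRACK_PRICE_EUR + 12 * gpu_eur_per_month) * usd_per_eur
    return math.floor(build_cost_usd / SEAT_PRICE_USD) + 1


for gpu in (750, 1_800):              # the quoted compute band, EUR per month
    for rate in (0.9, 1.0, 1.1):      # hypothetical USD-per-EUR rates
        print(gpu, rate, break_even_seats(gpu, rate))
```

At parity the compute band alone moves the standard-track crossover from 91 seats at the low tier to 102 at the high tier, which is one concrete reason to read the seat count as a range rather than a constant.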

References

[1] Sculley, D., Holt, G., Golovin, D., Davydov, E., Phillips, T., Ebner, D., Chaudhary, V., Young, M., Crespo, J.-F., and Dennison, D. Hidden Technical Debt in Machine Learning Systems. Advances in Neural Information Processing Systems 28 (NIPS 2015). The model is a small fraction of a real machine-learning system; the surrounding plumbing is where long-run cost and lock-in accumulate. https://proceedings.neurips.cc/paper_files/paper/2015/hash/86df7dcfd896fcaf2674f757a2463eba-Abstract.html

[2] Paleyes, A., Urma, R.-G., and Lawrence, N. D. Challenges in Deploying Machine Learning: A Survey of Case Studies. ACM Computing Surveys 55, 6 (2022), Article 114. What actually makes deployed machine learning hard to operate, including the maintenance and integration burden a buyer inherits either way. https://dl.acm.org/doi/10.1145/3533378

[3] Koroteev, D., and Tekic, Z. Artificial Intelligence in Oil and Gas Upstream: Trends, Challenges, and Scenarios for the Future. Energy and AI 3 (2021), 100041. Data ownership and the specialised nature of subsurface problems as reasons operators cannot simply outsource the whole stack. https://www.sciencedirect.com/science/article/pii/S2666546820300410

© 2026 Copyright. Earthscan