For years the default answer to subsurface AI at an operator was to buy it. A vendor had a tool, the tool had a per-seat price, and the model inside it was a sealed box you were not invited to open. That arrangement is quietly losing ground. Across the operators we work with, and specifically at the Texas onshore operator whose log-digitisation work sits behind VeerNet, the decision has been going the other way: stand up a small in-house capability, own the model, and own the number on the invoice. This note is not about VeerNet's architecture, which we have written about elsewhere. It is about the decision that comes before any architecture exists, the build-versus-buy call, and the two numbers that actually settle it.
Why the sealed box stopped being enough
The pitch for buying is real and worth stating fairly. A licensed tool is live on the day you sign, someone else carries the maintenance, and the cost is a predictable line per user. For a capability an operator uses lightly and generically, that is often the right call and we say so.
Subsurface work is where the pitch frays. The problems are specific to a basin, to a vintage of paper logs, to a set of curve conventions that a general vendor has no reason to have tuned for. When the model is sealed, an operator cannot see why a curve came out wrong, cannot retrain it on their own archive, and cannot carry the capability over to the adjacent problem. Koroteev and Tekic make the upstream-wide version of this point: the data is proprietary and the problems are specialised, which is exactly the combination that resists a one-size-fits-all outsourced stack [3]. What you are really renting, in that case, is not a solution but a dependency.
There is a second, quieter cost that does not show on the licence. The model itself is a small part of a machine-learning system. The plumbing around it (the data pipelines, the retraining, the monitoring, the integration into how the operator actually works) is most of the real system and most of the real cost. Sculley and colleagues named this the hidden technical debt of machine learning, and their point cuts against the naive reading of build-versus-buy [1]. Buying does not remove that debt. It relocates it to a vendor and hands you the bill as a recurring fee, and when the fee scales per seat, it scales with your success rather than with the work.
What a build actually costs, without the mystique
The reason operators historically bought is that building sounded unbounded. The corrective is a fixed-price, phased structure, because a number you can put in a budget is a number a manager can defend. The work we scoped for the onshore operator was four blocks, delivered on one of two tracks. The accelerated track runs 16 weeks with 6 people at 180,000 EUR. The standard track runs 32 weeks with 4 people at 100,000 EUR. Compute is a separate, small line: a rented GPU at 750 to 1,800 EUR per month depending on tier, paid only while the work runs.
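As a rough check on the build side, the sketch below adds twelve months of compute at each end of the quoted GPU band to each track's fixed price. The twelve-month horizon is an assumption that matches the first-year framing used later. Because compute is paid only while the work runs, it overstates the compute line, most of all on the 16-week track. The names `TRACK_PRICE_EUR`, `GPU_EUR_PER_MONTH` and `first_year_build_cost` are our own. The labels "low" and "high" simply mark the two ends of the band.

```python
# Fixed prices and the compute band are the figures quoted in the text.
# "low" and "high" are our labels for the two ends of the 750-1,800 EUR band.
TRACK_PRICE_EUR = {"accelerated": 180_000, "standard": 100_000}
GPU_EUR_PER_MONTH = {"low": 750, "high": 1_800}


def first_year_build_cost(track: str, tier: str, compute_months: int = 12) -> int:
    """Fixed track price plus `compute_months` of rented GPU at `tier`, in EUR.

    Assumption: compute is billed for the whole year by default. This is
    conservative, since the text says it is paid only while the work runs.
    """
    return TRACK_PRICE_EUR[track] + compute_months * GPU_EUR_PER_MONTH[tier]


for track in TRACK_PRICE_EUR:
    low = first_year_build_cost(track, "low")
    high = first_year_build_cost(track, "high")
    print(f"{track}: {low:,} to {high:,} EUR")
```

On these assumptions the standard track lands between 109,000 and 121,600 EUR for the year, and the accelerated track between 189,000 and 201,600 EUR. The compute band moves either total by at most 12,600 EUR.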
The two tracks are the same scope bought at different speeds. The accelerated track costs more, and puts more people on the work, to have the capability live in 16 weeks rather than 32. The standard track trades that speed for a lower outlay when the timeline is not the binding constraint. Neither track is a discount on the other; they are a straight exchange of money for time, which is the honest way to price delivery.
What matters for the build-versus-buy call is that both tracks are a one-off. You pay the outlay, you keep the model, and the marginal cost of the next user inside the operator is close to nothing, because one trained model serves the whole team. That is the structural difference from a per-seat licence, and it is the whole argument.
The two lines that cross
Put the two options on one axis and the decision stops being a matter of taste. The buy line rises with headcount: every seat is another 1,200 USD per year, so a licence for a growing team is a bill that grows with it. The build line is flat for the year: the fixed track price plus a year of compute, converted at EUR-to-USD parity. It stays at that number no matter how many people inside the operator use it. Two lines with different slopes cross exactly once, and that crossing is the decision.
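The notation below is ours, not the source's. Let n be the seat count, p the licence price per seat per year (1,200 USD here), P the fixed track price and c the monthly compute cost, with EUR and USD taken at parity. The two first-year lines and their crossing are:

```latex
\begin{aligned}
C_{\mathrm{buy}}(n) &= p\,n \\
C_{\mathrm{build}} &= P + 12\,c \\
p\,n^{*} = P + 12\,c \quad&\Longrightarrow\quad n^{*} = \frac{P + 12\,c}{p}
\end{aligned}
```

Because p is positive and the build line does not depend on n, the buy line lies below the build line for every n below n* and above it for every n beyond n*. The crossing is therefore unique.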
The exhibit lets you set the two levers and read the crossover. Toggle the track to set the one-off outlay, drag the seat count to set the licence you would otherwise pay, and the orange marker reports the break-even, the seat count past which owning is cheaper than renting for the first year alone. On the standard track the crossover sits near ninety seats; on the accelerated track, which costs more up front, it sits higher, past a hundred and fifty. Below the crossover the licence genuinely is the cheaper choice, and the honest instrument shows that rather than hiding it. Above it, and especially in year two when the build has no repeated outlay and the licence bills again in full, the build pulls away.
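The exhibit itself is not reproduced here, but its readout can be approximated in a few lines. This sketch uses the inputs the Limitations section lists: each track's fixed price, twelve months of low-tier compute, 1,200 USD per seat, and EUR-to-USD parity. The function names are hypothetical. We also do not know how the exhibit rounds, so here the break-even is taken as the first whole seat count at which a year of licences costs strictly more than the build.

```python
import math

PRICE_PER_SEAT_USD = 1_200            # per seat, per year, from the text
TRACK_PRICE_EUR = {"standard": 100_000, "accelerated": 180_000}
GPU_LOW_TIER_EUR_PER_MONTH = 750      # low end of the quoted compute band
USD_PER_EUR = 1.0                     # parity: the framing choice the text flags


def build_cost_usd(track: str) -> float:
    """Flat first-year build line: track price plus 12 months of low-tier compute."""
    return (TRACK_PRICE_EUR[track] + 12 * GPU_LOW_TIER_EUR_PER_MONTH) * USD_PER_EUR


def licence_cost_usd(seats: int) -> int:
    """Rising buy line: one year of per-seat licences."""
    return seats * PRICE_PER_SEAT_USD


def break_even_seats(track: str) -> int:
    """First whole seat count at which the licence costs strictly more than the build."""
    return math.floor(build_cost_usd(track) / PRICE_PER_SEAT_USD) + 1


def cheaper_option(track: str, seats: int) -> str:
    """Readout for one setting of the two levers; a tie counts as 'buy'."""
    return "build" if licence_cost_usd(seats) > build_cost_usd(track) else "buy"


for track in TRACK_PRICE_EUR:
    print(track, break_even_seats(track))
# standard 91
# accelerated 158
```

The 91 and 158 are consistent with the "near ninety" and "past a hundred and fifty" quoted above. Below those counts, `cheaper_option` reports "buy", as the exhibit does.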
That framing is deliberately conservative. It compares a single year, where the build carries its entire cost and the licence carries only twelve months of fees. Stretch the horizon and the build looks better, not worse, because its cost was already spent. We prefer to argue from the year that flatters the licence, and let the crossover speak from there.
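To see the horizon effect, the sketch below extends the same comparison to several years. It is an illustration under our own assumptions, not the engagement's figures:

- licences bill 1,200 USD per seat every year;
- the track price is paid once;
- EUR and USD are at parity;
- by default, low-tier compute is billed every year, which is the less flattering choice for the build.

Like the exhibit, it leaves out the cost of running an in-house team after delivery, which a real multi-year comparison would have to add back.

```python
import math

PRICE_PER_SEAT_USD = 1_200            # per seat, per year, from the text
TRACK_PRICE_EUR = {"standard": 100_000, "accelerated": 180_000}
GPU_LOW_TIER_EUR_PER_MONTH = 750      # low end of the quoted compute band


def cumulative_build_eur(track: str, years: int, compute_every_year: bool = True) -> int:
    """One-off track price plus compute over the horizon.

    Assumption: compute is billed in every year by default (less flattering to
    the build); compute_every_year=False bills it in the first year only.
    Post-delivery team cost is not included, as in the exhibit.
    """
    compute_years = years if compute_every_year else 1
    return TRACK_PRICE_EUR[track] + 12 * GPU_LOW_TIER_EUR_PER_MONTH * compute_years


def break_even_seats_over(track: str, years: int, compute_every_year: bool = True) -> int:
    """First whole seat count at which `years` of licences cost more than the build,
    with EUR and USD taken at parity."""
    licence_per_seat = PRICE_PER_SEAT_USD * years
    build = cumulative_build_eur(track, years, compute_every_year)
    return math.floor(build / licence_per_seat) + 1


for years in (1, 2, 3):
    print(years, break_even_seats_over("standard", years),
          break_even_seats_over("accelerated", years))
# 1 91 158
# 2 50 83
# 3 36 58
```

Under these assumptions the standard-track crossover falls from 91 seats in year one to 50 over two years and 36 over three. This is the sense in which stretching the horizon favours the build.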
What ownership buys that the axis does not show
The break-even is the part you can put on a slide, but it undersells the case, because it prices only the fee. The measured quality is the other half. On curve extraction the build reaches a peak R-squared of 0.9891 and a best MAE of 0.0132, and the operator can see those numbers, reproduce them, and hold the next iteration to them. A sealed tool reports a result; an owned model reports a metric you can audit and a training set you can extend. Paleyes and colleagues catalogue how much of the real difficulty in deployed machine learning is maintenance, monitoring, and integration rather than the initial model [2], and every one of those is easier to carry when the team that carries it is your own and the model is not a black box.
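Part of what makes an owned metric auditable is that it can be recomputed from held-out data. A minimal sketch of the two standard definitions follows. The text does not say how curves were scaled, how samples were pooled across wells or depth intervals, or what "peak" and "best" range over, so those choices are left to the caller. The sample values are toy numbers, not the engagement's data. scikit-learn's `sklearn.metrics.r2_score` and `sklearn.metrics.mean_absolute_error` compute the same quantities and make a useful cross-check.

```python
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    """Reject empty or mismatched inputs before computing a metric."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty sequences of equal length")


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error between reference and extracted curve samples."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    _check(y_true, y_pred)
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("reference curve is constant; R-squared is undefined")
    return 1.0 - ss_res / ss_tot


# Toy reference vs. extracted samples (illustrative only, not engagement data).
reference = [0.2, 0.4, 0.6, 0.8]
extracted = [0.21, 0.39, 0.62, 0.78]
print(round(mae(reference, extracted), 4), round(r_squared(reference, extracted), 4))
```

On the toy samples this prints 0.015 and 0.995. The point is not the values but that anyone holding the held-out set can reproduce them.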
This is the quiet part of the trend. Operators are not building in-house AI teams because building is fashionable. They are doing it because the per-seat licence turned a specialised, proprietary problem into a recurring dependency, and because a phased fixed-price build turned the alternative into a number a budget can hold. The template is not exotic. It is four blocks, a track chosen for how much the timeline is worth, a small compute line, and a crossover you can compute before you commit.
Limitations
The crossover in the exhibit is a first-year total-cost comparison, not a full total cost of ownership. It counts the fixed build outlay, a year of compute at the low tier, and the per-seat licence, and it deliberately omits the internal cost of running an in-house team past delivery, the vendor's own upgrade and support value, and any migration cost either way. The build prices, timelines, FTE counts, compute band, and the two quality figures are the real engagement numbers; the one-year horizon and the EUR-to-USD parity used to place both lines on one axis are framing choices, flagged as such, chosen because they are conservative rather than because they are precise. The 1,200 USD per seat is one buy alternative, not the market. And the numbers are from one operator's onshore log-digitisation problem, so the crossover seat count does not transfer as a constant to a different basin, a different scope, or a different data archive. The point that carries is the shape of the two lines, not the exact seat at which they meet.
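Because parity is a framing choice, it is worth seeing how far it moves the first-year crossover. The sketch below rescales the build line by an assumed USD-per-EUR rate. The rates in the loop are arbitrary illustrations, not market quotes. Everything else follows the exhibit: fixed price plus twelve months of low-tier compute, and 1,200 USD per seat. The function name is hypothetical.

```python
import math

PRICE_PER_SEAT_USD = 1_200
# First-year build outlay in EUR: fixed track price + 12 months of low-tier compute.
FIRST_YEAR_BUILD_EUR = {"standard": 100_000 + 12 * 750, "accelerated": 180_000 + 12 * 750}


def break_even_at_rate(track: str, usd_per_eur: float) -> int:
    """First whole seat count at which a year of licences costs more than the build."""
    build_usd = FIRST_YEAR_BUILD_EUR[track] * usd_per_eur
    return math.floor(build_usd / PRICE_PER_SEAT_USD) + 1


# Illustrative exchange rates only; parity (1.0) is the exhibit's framing.
for rate in (0.90, 1.00, 1.10):
    print(rate, break_even_at_rate("standard", rate), break_even_at_rate("accelerated", rate))
```

On these illustrative rates the standard-track crossover moves between 82 and 100 seats, and the accelerated one between 142 and 174. The seat count shifts, but the two lines still cross exactly once.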
References
[1] Sculley, D., Holt, G., Golovin, D., Davydov, E., Phillips, T., Ebner, D., Chaudhary, V., Young, M., Crespo, J.-F., and Dennison, D. Hidden Technical Debt in Machine Learning Systems. Advances in Neural Information Processing Systems 28 (NIPS 2015). The model is a small fraction of a real machine-learning system; the surrounding plumbing is where long-run cost and lock-in accumulate. https://proceedings.neurips.cc/paper_files/paper/2015/hash/86df7dcfd896fcaf2674f757a2463eba-Abstract.html
[2] Paleyes, A., Urma, R.-G., and Lawrence, N. D. Challenges in Deploying Machine Learning: A Survey of Case Studies. ACM Computing Surveys 55, 6 (2022), Article 114. What actually makes deployed machine learning hard to operate, including the maintenance and integration burden a buyer inherits either way. https://dl.acm.org/doi/10.1145/3533378
[3] Koroteev, D., and Tekic, Z. Artificial Intelligence in Oil and Gas Upstream: Trends, Challenges, and Scenarios for the Future. Energy and AI 3 (2021), 100041. Data ownership and the specialised nature of subsurface problems as reasons operators cannot simply outsource the whole stack. https://www.sciencedirect.com/science/article/pii/S2666546820300410