What a Three-Phase Subsurface AI Programme Really Costs in GPU-Hours

Abstract

A subsurface AI engagement is usually sold as a fixed price, and a fixed price hides the one number that would let a buyer sanity-check it: how much compute the work actually burns. That number is not a fixed quantity a vendor can quote from a template. It is the product of two levers, how many training runs a phase holds concurrently and how long each run takes, and once you read the telemetry through that concurrency-times-run-length lens the whole cost curve becomes legible. We publish that telemetry. Across a three-phase programme built on a mid-sized Middle East carbonate operator's borehole-image logs, we tracked GPU-hours budgeted against GPU-hours spent, phase by phase, and priced the result against a three-point rate card. A single training run takes 6 to 18 hours, and the cluster holds 80 to 90 percent utilization across 28 GPU-partition containers, so a phase's burn tracks its concurrency directly. Phase 1 came in on budget at 1,200 GPU-hours running a single supervised track. Phase 2 spent 2,600 hours against a 1,200-hour budget, a 2.2x overrun that appeared the moment a supervised fracture track and an unsupervised bedding-and-vug track ran at the same time and roughly doubled the number of concurrent runs. Phase 3, field-scale well-to-well correlation, forecasts 7,500 hours against 5,800 budgeted; this is where the concurrent-run count is projected to grow five to seven times over the pilot, to a peak of 60 to 90 runs at once. The programme totals 11,300 GPU-hours, actual plus forecast, against 8,200 budgeted. Because cost is concurrency times run-length, the steepest proportional overrun sits in the phase that doubled its concurrency, and the largest line sits in the phase that multiplies it again. Priced at 30, 24.60, and 17.22 US dollars per GPU-hour, the telemetry becomes a unit-economics benchmark an operator can measure a vendor quote against, which is exactly what the market does not publish.

The number that is missing from every quote

Ask three vendors to build a fracture-and-bedding detector on your image logs and you will get three fixed prices. Ask any of them how many GPU-hours the training will take, and the question tends to stop the conversation. The compute budget is treated as the vendor's internal business, folded into a margin, and never surfaced as a line an operator can inspect. That is a problem for the buyer, because compute is the part of the cost that scales with the difficulty of the actual data rather than with the confidence of the sales deck. A quote you cannot decompose into hours is a quote you cannot check.

We are in a position to publish the decomposition because we kept it. The programme this piece draws on ran in three phases against a mid-sized Middle East carbonate operator's borehole-image logs, and the compute ledger was maintained alongside the technical work, not reconstructed after the fact. The point of publishing it is not to confess an overrun. It is that a real budget-versus-actual record, tied to run-level parameters, is the only honest way to build a unit-economics benchmark for this kind of work, and we have not seen one in the open.

Reading the ledger phase by phase

Phase 1 was a bounded pilot: define the datasets, fit the first supervised model on borehole-image logs, and prove the approach on a handful of wells. It was budgeted at 1,200 GPU-hours and it spent 1,200. Landing on budget is worth stating plainly, because it is what makes the later overrun legible rather than alarming. The pilot ran one kind of workload, a single supervised track, and a single track is cheap to forecast.

Phase 2 is where the number moves. The budget was again 1,200 GPU-hours; the actual was 2,600, a 2.2x overrun. The cause was not a blown estimate on any one model. It was structural. Phase 2 split into two tracks that ran at the same time: a supervised track for fracture detection and an unsupervised track for beddings and vugs. Running them in parallel roughly doubled the number of concurrent runs on the cluster, and because compute is billed by the hour a run holds a card, doubling the concurrent runs doubled the burn rate for the weeks the two tracks overlapped. The overrun is the signature of parallelism, not of waste.

Phase 3 is a forecast rather than a settled actual, and it is the largest line. Field-scale well-to-well correlation across the operator's wells is budgeted at 5,800 GPU-hours and forecast at 7,500. The scale-up is expected because correlation multiplies the work: instead of training a model per well, the phase runs many models across many wells and reconciles them, and the concurrent-run count is projected to grow five to seven times over the pilot. The mechanics of that correlation step are their own subject, treated in our field-scale correlation write-up, and we do not re-derive them here; the point for the ledger is only that the hour count grows with the number of concurrent models, exactly as the earlier phases predicted it would.

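Summed, the programme is budgeted at 8,200 GPU-hours and runs to 11,300 actual and forecast. That 3,100-hour gap is not spread evenly. Phase 1 contributes none of it, Phase 2 contributes 1,400 hours on a 1,200-hour budget, and Phase 3 contributes a forecast 1,700 hours on a 5,800-hour budget. In absolute hours the Phase 3 miss is slightly larger, but in proportion the overrun concentrates in the phase that ran two tracks at once: 117 percent over budget, against 29 percent for Phase 3. It is that concentration, not the total, that a buyer should learn to look for.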
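The phase figures can be tallied directly. The Python sketch below encodes the three ledger rows quoted above and derives each phase's overrun and the programme totals; the `Phase` dataclass and its field names are an illustrative schema, not the ledger's own format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    """One row of the compute ledger (illustrative schema, not the ledger's own)."""
    name: str
    budget_h: int  # budgeted GPU-hours
    actual_h: int  # actual GPU-hours (Phase 3 is a forecast)

    @property
    def overrun_h(self) -> int:
        return self.actual_h - self.budget_h

    @property
    def ratio(self) -> float:
        return self.actual_h / self.budget_h


LEDGER = [
    Phase("Phase 1: supervised pilot", 1_200, 1_200),
    Phase("Phase 2: parallel tracks", 1_200, 2_600),
    Phase("Phase 3: field-scale correlation (forecast)", 5_800, 7_500),
]

total_budget = sum(p.budget_h for p in LEDGER)
total_actual = sum(p.actual_h for p in LEDGER)

for p in LEDGER:
    print(f"{p.name:45s} +{p.overrun_h:>5,d} h  {p.ratio:.2f}x")
print(f"Programme: {total_actual:,d} h vs {total_budget:,d} h budgeted, "
      f"gap {total_actual - total_budget:,d} h")
```

The ratio column separates the two readings of "where the overrun is": Phase 2 is the only phase more than double its budget, while Phase 3 carries the larger absolute number.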

Why the cost is concurrency times run-length

The reason the overrun is predictable is that a phase's compute cost reduces to two levers. The first is run-length: a single training run on this data takes 6 to 18 hours, depending on the model and the dataset size. The second is concurrency: at peak, 60 to 90 runs execute at the same time, spread across 28 GPU-partition containers on the cluster, with utilization held at 80 to 90 percent so the cards are rarely idle. A peak wave of work therefore burns close to the product of those two numbers, and the phase that pushes concurrency up pushes the burn up with it.

$$\mathrm{GPU\text{-}hours}_{\mathrm{wave}} = N_{\mathrm{concurrent}} \times t_{\mathrm{run}}, \qquad N_{\mathrm{concurrent}} \in [60, 90],\; t_{\mathrm{run}} \in [6, 18]\ \mathrm{h}$$

That equation is the whole argument in one line. It says the cost of a phase is not a fixed quantity a vendor can quote from a template; it is a function of how many runs the phase holds concurrently and how long each holds a card. Phase 2 overran because it doubled the first term. Phase 3 is forecast large because it multiplies the first term again. The rate card only converts hours to money; it does not change the shape of the curve.
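The equation translates directly into code. The sketch below is a minimal Python rendering that assumes, as the equation does, that every run in a wave holds one card for the same run-length; the function name `wave_gpu_hours` and its range check are illustrative, not part of any tooling from the engagement.

```python
N_RANGE = (60, 90)     # peak concurrent runs, from the engagement ledger
T_RANGE = (6.0, 18.0)  # hours per training run, from the engagement ledger


def wave_gpu_hours(n_concurrent: int, t_run_h: float) -> float:
    """GPU-hours burned by one peak wave: concurrency times run-length.

    Assumes every run in the wave holds one card for the same t_run_h hours.
    Raises ValueError outside the ranges observed in the ledger.
    """
    if not N_RANGE[0] <= n_concurrent <= N_RANGE[1]:
        raise ValueError(f"concurrency {n_concurrent} outside observed range {N_RANGE}")
    if not T_RANGE[0] <= t_run_h <= T_RANGE[1]:
        raise ValueError(f"run-length {t_run_h} h outside observed range {T_RANGE}")
    return float(n_concurrent * t_run_h)


# Corners of the observed envelope.
low = wave_gpu_hours(60, 6.0)    # 360.0 GPU-hours
high = wave_gpu_hours(90, 18.0)  # 1620.0 GPU-hours
print(f"One peak wave burns between {low:,.0f} and {high:,.0f} GPU-hours")
```

The range check is a design choice: outside 60 to 90 runs and 6 to 18 hours the product is still valid arithmetic, but it is no longer grounded in what the ledger recorded.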

The instrument below makes the two facts sit next to each other. On the left it plots budget against actual for each phase, with the Phase 2 overrun drawn as the one element that carries the point. On the right it lets you drive the levers directly: drag the concurrency from 60 to 90 and toggle the run-length between 6, 12, and 18 hours, and watch the GPU-hours a single peak wave burns, priced at each point on the rate card.

Real budget-vs-actual GPU-hour telemetry from a three-phase subsurface AI programme, read as a unit-economics benchmark. Left: Phase 1 landed on budget at 1,200 GPU-hours; Phase 2 ran 2,600 hours against a 1,200-hour budget, a 2.2x overrun (the orange delta) once the supervised and unsupervised tracks ran in parallel; Phase 3 forecasts 7,500 hours against 5,800 budgeted, for programme totals of 11,300 actual and forecast vs 8,200 budgeted. Right: the instrument that explains the miss. A single peak wave costs concurrency times run-length, so drag the concurrency lever (60-90 concurrent runs, the peak Phase 3 projects as its count grows 5-7x over the pilot) and toggle run-length (6, 12, or 18 hours) to see the GPU-hours one wave burns and its cost at the three rate-card points, 30, 24.60, and 17.22 USD per GPU-hour. Utilization ran 80-90% sustained across 28 MIG (GPU-partition) containers. Sourced from the engagement's additional-budget request and MLOps cost ledger: the phase-level hours, programme totals, per-run range, concurrency range, utilization band, MIG count, and rate card. The single-wave GPU-hour figure the instrument derives is concurrency times run-length, an arithmetic read of those sourced levers rather than an independently sourced total, and the costs shown are compute unit economics only, not a quoted price.
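The right-hand panel's arithmetic is small enough to reproduce. The sketch below prices one wave at the three rate-card points, under the same single-run-length simplification as the equation; the `RATE_CARD` dictionary and `price_wave` helper are hypothetical names, and the output is compute unit economics only, not a quoted price.

```python
RATE_CARD = {  # US dollars per GPU-hour, from the engagement's rate card
    "industry baseline": 30.00,
    "standard": 24.60,
    "offered": 17.22,
}


def price_wave(n_concurrent: int, t_run_h: float) -> dict[str, float]:
    """Compute cost of one peak wave at each rate-card point, rounded to cents."""
    gpu_hours = n_concurrent * t_run_h
    return {name: round(gpu_hours * rate, 2) for name, rate in RATE_CARD.items()}


# Toggle run-length as the instrument does, at the top of the concurrency range.
for t in (6, 12, 18):
    costs = price_wave(90, t)
    row = ", ".join(f"{name} ${usd:,.2f}" for name, usd in costs.items())
    print(f"90 runs x {t:>2} h = {90 * t:>5,} GPU-h: {row}")
```

Rounding to cents happens only at the end, so the three prices of a wave stay in the exact ratio of the rate card.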

Drive the concurrency lever up and the cost rises linearly, which is the honest shape of the thing: there is no economy of scale hiding inside a wave of parallel runs, because each run holds its own card for its own 6 to 18 hours. The only way the per-wave number falls is if run-length falls, and run-length is set by the model and the data, not by the schedule. This is why forecasting compute for a parallel-track phase is genuinely harder than for a single-track pilot, and why a fixed price that ignores the distinction is quietly betting that the hard phase behaves like the easy one.
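The no-economy-of-scale point can be made concrete. Under the concurrency-times-run-length model, each additional concurrent run adds exactly one run-length of GPU-hours, so the marginal cost of a run is run-length times the rate and does not shrink as the wave grows. The Python sketch below checks that slope across the observed concurrency range; `wave_cost` and `marginal_costs` are hypothetical helpers, and the model ignores any scheduling or queueing effects.

```python
def wave_cost(n_concurrent: int, t_run_h: float, usd_per_gpu_h: float) -> float:
    """Compute cost of one wave: N x t_run GPU-hours at a flat hourly rate."""
    return n_concurrent * t_run_h * usd_per_gpu_h


def marginal_costs(t_run_h: float, usd_per_gpu_h: float,
                   n_lo: int = 60, n_hi: int = 90) -> set[float]:
    """Distinct costs of adding one more concurrent run across [n_lo, n_hi].

    Values are rounded to 6 decimals to absorb floating-point noise.
    """
    return {
        round(wave_cost(n + 1, t_run_h, usd_per_gpu_h)
              - wave_cost(n, t_run_h, usd_per_gpu_h), 6)
        for n in range(n_lo, n_hi)
    }


# A single distinct value means the cost curve is a straight line in N.
for t in (6, 12, 18):
    (slope,) = marginal_costs(t, 30.00)  # unpacking fails if the slope varies
    print(f"t_run = {t:>2} h: each extra concurrent run adds ${slope:,.2f} at $30/GPU-h")
```

The slope depends only on run-length and rate, which is why shortening runs, not enlarging or shrinking the wave, is the lever that lowers the cost per run.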

Turning telemetry into a benchmark a buyer can use

The rate card turns the hours into a defensible dollar figure. We priced the ledger against three points: an industry baseline of 30 US dollars per GPU-hour, a standard rate of 24.60, and an offered rate of 17.22. Those are not the operator's confidential field economics; they are the compute unit prices, and they are the numbers that let an outside buyer do the arithmetic a vendor will not do for them. Multiply the programme's 11,300 actual hours by any of the three and you have a compute cost that can be laid against a quoted fixed price. If the quote is far below the baseline-rate compute cost, the vendor is either running cheaper hardware than they imply or absorbing the difference somewhere; if it is far above, the buyer is paying for margin, not for hours.
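The multiplication that paragraph asks for takes a few lines. The sketch below prices the 11,300-hour programme at each rate-card point and places a quote against the baseline-rate figure. The `quote_position` helper and its 25 percent tolerance band are illustrative assumptions standing in for "far below" and "far above", not thresholds from the engagement, and the comparison is only meaningful against the compute portion of a quote where a vendor breaks it out.

```python
PROGRAMME_GPU_HOURS = 11_300  # actual hours plus the Phase 3 forecast
RATE_CARD = {"industry baseline": 30.00, "standard": 24.60, "offered": 17.22}  # USD/GPU-h

compute_cost = {name: round(PROGRAMME_GPU_HOURS * rate, 2) for name, rate in RATE_CARD.items()}


def quote_position(quote_usd: float, tolerance: float = 0.25) -> str:
    """Place a quote against the baseline-rate compute cost.

    The 25 percent band is a hypothetical stand-in for "far below" / "far above".
    This is compute cost only: labour and data preparation are not included.
    """
    baseline = compute_cost["industry baseline"]
    if quote_usd < baseline * (1 - tolerance):
        return "far below: cheaper hardware than implied, or cost absorbed elsewhere"
    if quote_usd > baseline * (1 + tolerance):
        return "far above: paying for margin, not hours"
    return "within band of baseline-rate compute cost"


for name, usd in compute_cost.items():
    print(f"{name:>17}: ${usd:,.2f}")
print(quote_position(150_000))  # a hypothetical quote, for illustration only
```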

The benchmark also reframes what an overrun means in this work. In a fixed-price frame, a 2.2x compute overrun reads as a failure to estimate. In a unit-economics frame, it reads as the expected cost of parallelism, and the useful question shifts from "why did it overrun" to "which phase carries the parallel tracks, and did the budget account for the concurrency there." A buyer armed with the concurrency-times-run-length model can ask that question during procurement rather than discovering the answer during Phase 2.
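That procurement question can be turned into a crude screen. The sketch below assumes, loudly, that a phase's hours scale with its concurrency multiple over the pilot while run-length and phase duration stay at pilot values; the ledger does not record phase durations, so this is a first-pass flag, not a forecast. The helpers are hypothetical, and the multiples (about 2x for Phase 2's parallel tracks, 5 to 7x for Phase 3) are readings of the figures given in the text.

```python
PILOT_GPU_HOURS = 1_200  # Phase 1 actual: one supervised track


def concurrency_scaled_range(multiple_lo: float, multiple_hi: float) -> tuple[float, float]:
    """Hours a phase would burn if only concurrency changed relative to the pilot.

    Assumption: run-length and phase duration match the pilot, which the ledger
    does not confirm.
    """
    return PILOT_GPU_HOURS * multiple_lo, PILOT_GPU_HOURS * multiple_hi


def flag_phase(budget_h: float, multiple_lo: float, multiple_hi: float) -> bool:
    """True if the budget sits below even the low end of the concurrency-scaled range."""
    lo, _ = concurrency_scaled_range(multiple_lo, multiple_hi)
    return budget_h < lo


phases = {
    "Phase 2": (1_200, 2.0, 2.0),  # two parallel tracks: roughly double the pilot's concurrency
    "Phase 3": (5_800, 5.0, 7.0),  # concurrency projected at 5-7x the pilot
}
for name, (budget, m_lo, m_hi) in phases.items():
    lo, hi = concurrency_scaled_range(m_lo, m_hi)
    status = "FLAG" if flag_phase(budget, m_lo, m_hi) else "ok"
    print(f"{name}: budget {budget:,} h vs scaled {lo:,.0f}-{hi:,.0f} h -> {status}")
```

A flag here only says the budget did not scale with concurrency; whether it should have depends on phase durations the screen does not see.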

None of this replaces the money-side view of a programme, which we treat separately: the six-month burn-rate ledger of an accelerated delivery, and the serving-side unit cost of digitising one more log, both live in their own pieces and answer different questions. This one is narrower and, we think, more useful to a technical buyer for being narrow. It is the compute the work costs, in hours, before anyone marks it up.

Limitations

The ledger is honest about what is measured and what is inferred. The Phase 1 and Phase 2 hour counts are settled actuals; the Phase 3 figure of 7,500 GPU-hours is a forecast against a 5,800-hour budget, and a forecast can move, particularly since field-scale correlation is the least-exercised of the three workloads. The concurrency range of 60 to 90 runs and the run-length range of 6 to 18 hours are real, sourced bounds from the engagement, but the single-wave GPU-hour figure the instrument derives is their arithmetic product, an illustration of the mechanism rather than an independently measured wave total; a real wave mixes run-lengths and does not sit at a single concurrency for its whole duration. The rate card of 30, 24.60, and 17.22 US dollars per GPU-hour is a compute unit price, not a market survey, and it reflects single-cluster pricing; larger multi-node configurations carry different economics that we do not model here. The dollar figures are therefore a compute cost only, not a price and not a total programme cost, which would add labour, data preparation, and everything the hours do not capture. Finally, this is the telemetry of one three-phase programme on one operator's carbonate image logs; the concurrency-times-run-length model should generalise, but the specific hour counts are ours and are offered as a worked benchmark, not as an industry average.
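The gap between the instrument's single product and a real wave can be sketched as well. Summing per-run hours for a wave whose runs have different lengths gives the same total as concurrency times the mean run-length, so the equation still holds if the run-length term is read as an average. The run-lengths below are invented for illustration, drawn only from the sourced 6 to 18 hour range, and the sketch still ignores concurrency that changes during the wave.

```python
from statistics import fmean

# Hypothetical wave: 60 runs with mixed run-lengths inside the sourced 6-18 h range.
run_lengths_h = [6.0] * 20 + [12.0] * 25 + [18.0] * 15

n_concurrent = len(run_lengths_h)
summed = sum(run_lengths_h)                    # GPU-hours, run by run
product = n_concurrent * fmean(run_lengths_h)  # concurrency x mean run-length

print(f"{n_concurrent} runs: summed {summed:.1f} GPU-h, N x mean t_run {product:.1f} GPU-h")
print(f"Single-length bounds for this N: {n_concurrent * 6:.0f} to {n_concurrent * 18:.0f} GPU-h")
```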

References

This piece draws on the compute ledger and additional-budget record of the engagement it describes, and on our companion write-ups of the field-scale correlation step, the six-month burn-rate ledger, and the serving-side marginal cost of a digitised log. It cites no external published work, so no external reference list applies.

