Most compute budgets for an applied-AI programme are written before anyone has trained a model, and they encode a belief about where the hard work sits. The usual belief is that data collection is the expensive, unpredictable part, and that once the dataset exists, training is a bounded exercise you can size on a spreadsheet. Our compute ledger for a roughly twenty-month engagement with a mid-sized Middle East carbonate operator says the opposite, in numbers we had to defend to the client mid-engagement. We budgeted 8,200 GPU-hours across three phases and consumed 11,300. The phase that landed exactly on budget was the data phase. The phase that ran more than double was model development, and it was the smallest allocation on the sheet.
This is the compute-economics companion to the throughput and unit-cost work we have written elsewhere. Those priced a single digitised log at serve time; this one prices the whole programme's training budget and shows which phase breaks it.
The three-phase allocation, before anything ran
The engagement was scoped in three phases, each with its own compute budget. Phase 1 covered data collection and dataset construction: pulling raw borehole-image logs, cleaning them, building the labelled sets. Phase 2 was supervised model development, the phase where we would find out whether the idea held. Phase 3 was correlation and industrialisation across many wells, the heaviest lift by design.
The GPU-hour allocation followed that intuition. Phase 1 and Phase 2 each got 1,200 hours. Phase 3 got 5,800, because scaling a validated pipeline across dozens of wells is where the compute obviously piles up. At the sourced rate the ledger carried, that came to USD 141,204 of budgeted compute. On paper the risk sat in Phase 3: the most hours, the most wells, the most moving parts. If any line was going to slip, planning intuition put it there.
What the meter actually recorded
Phase 1 closed at 1,200 hours against 1,200 budgeted. Exactly on plan. Data collection is slow and frustrating for reasons that have nothing to do with GPUs, and its slowness misleads you: because the calendar time is long, you assume the compute time is large. It is not. Ingesting, cleaning, and labelling raster logs is bounded and cheap in GPU-hours. The phase that dominates a Gantt chart barely touches the compute meter.
Phase 2 closed at 2,600 hours against 1,200 budgeted. That is a 2.2x overrun in the phase we had allocated the fewest hours to, and it added roughly USD 24,000 of unplanned compute on its own. The reason is specific to how model development proceeds, and it is the whole argument. A single training run took six to eighteen hours. Model development is not one such run; it is dozens, run in parallel, most of which you throw away. Here the supervised and unsupervised tracks split into two compute paths that both had to be explored, because unsupervised methods won for vugs and beddings while supervised regression won for fractures. Every architecture you test, every augmentation ratio you sweep, every learning-rate schedule you abandon is compute spent to buy information, not a finished model. That search is the cost, and it does not appear in a plan that budgets for the model you keep, because at planning time you do not know how many you will discard to find it.
Phase 3 closed at 7,500 hours against 5,800 budgeted, a 1.29x overrun. Real, but proportionate: the phase was correctly sized as the largest, and it overshot by less than a third. The industrialisation phase, the one everyone worries about, was the best-behaved relative to its own budget.
Add the three phases and the programme consumed 11,300 GPU-hours against 8,200 budgeted, USD 194,586 of compute against USD 141,204, with the gap recorded as an overrun ask of USD 53,382. Note where that gap was manufactured. Phase 3's absolute overshoot in hours was larger, but Phase 3 was budgeted to be large and behaved like it. Phase 2 broke its own frame: the smallest allocation, given because the phase was assumed to be bounded, running at more than twice its budget because model development is a search and search does not size cleanly. Carry the ratio out of this ledger, not the total. On budget, 1.29x, 2.2x. The overrun concentrates in the phase intuition treats as safe.
Pricing the gap, and why the rate you pick is a conversation
The 3,100 overrun hours are only half the funding conversation. The other half is the rate you bill them at, and the ledger carried three: an industry baseline of USD 30 per GPU-hour for this class of MLOps infrastructure and services, our standard rate of USD 24.60, and the rate we had actually offered on this engagement, USD 17.22, a 30% discount struck at proposal time. The instrument above slides across those three points and reprices the same 3,100 hours.
That spread is the frame for the overrun ask. When a programme runs over its compute budget, the reflexive client reading is that the vendor mismanaged the estimate. The rate card reframes it: the overrun hours were consumed at a rate already discounted 30% below the vendor's own standard and well under the industry baseline, so the unplanned compute was still cheaper per hour than a client re-procuring it on the open market would pay. The conversation stops being about a broken estimate and becomes about how many additional discounted hours the search required. That is a conversation an engagement can survive.
How to structure the overrun ask without derailing the engagement
The failure mode is not the overrun; it is the surprise. A 2.2x line item that lands at phase close with no prior signal reads as loss of control, whatever the rate. Three things kept this one from derailing the work.
Separate the on-plan phases from the runaway one out loud. The instinct under pressure is to defend the total, to argue that 11,300 against 8,200 is a 38% overrun and 38% is defensible. The honest argument is stronger: two of three phases were on or near plan and a single phase, for a nameable reason, ran over. A client can act on that. They can fund the specific gap or constrain the search that caused it. They cannot act on a diffuse 38%.
Attach the ask to a reason the client already believes. Model development ran over because both tracks had to be explored and because a single run cost six to eighteen hours, and both facts predate the overrun and were visible in the reporting. The energy market did the rest of the framing: this engagement ran through a period when hosting a DGX-class box moved from roughly USD 12,000 to 15,000 a month to north of USD 20,000, so a compute-hour ask landed against a backdrop the client was already living. When the reason for an overrun is something the client has independently watched happen, the ask is a reconciliation, not a renegotiation.
Put the discounted rate on the same page as the overrun hours. The single most useful line in the whole ledger was not the USD 53,382 total; it was the USD 17.22 next to it. An overrun priced at a rate the client can verify is below market is an overrun that reads as a shared cost of a research programme, which is what it was, rather than a vendor error passed downstream.
The planning point sits under all three. If you allocate the most compute to the phase with the most calendar time and the most deliverables, you will under-fund model development every time, because its cost is the search you cannot see at planning time and cannot enumerate until you are inside it. Budget the model phase for the experiments you will discard, not the model you will keep. The data phase lands where you put it. The model phase is where the meter runs.
Limitations
This is one programme's ledger, a research engagement rather than a productionised deployment, so the phase shape is specific to it. The 2.2x figure is the ratio of actual to budgeted GPU-hours for the model-development phase here; it is not a benchmark for supervised training in general, and a programme that front-loads architecture search into a discovery phase, or one with a narrower model scope, would distribute its overrun differently. The dollar figures follow directly from the GPU-hour counts and the sourced rate; they are compute cost only and exclude staff effort, data acquisition, and licensing, which the programme accounted for separately. The rate-card lever prices the overrun at three sourced points and does not model volume or multi-node pricing, which the ledger noted moved to multiples of the single-box rate. The energy-market figures are context from the same period, not a controlled attribution of the overrun to any one cause.