A 1,000,000x Speedup Is a Marketing Number Until You Define the Baseline

A speedup is a fraction, and a fraction has a denominator. When a vendor deck or a survey abstract says a model made something a million times faster, the eye lands on the million and skips the part that actually carries the meaning, which is faster than what. That missing denominator is where the entire honesty of the claim lives. Change it and the same true result reads as spectacular or as ordinary, and both readings can be defensible at once, because a multiplier is not a property of a model. It is a property of a comparison, and the comparison is a choice. This note is about making that choice visible, using a single published survey of AI in upstream oil and gas that, read carefully, contains its own correction: the same paper reports a 1,000,000x acceleration for one task and figures three to four orders of magnitude smaller for the tasks that resemble what EarthScan actually does, which is lift curves off scanned raster well logs. The gap between those numbers is not a contradiction to explain away. It is the lesson.

We should be clear about what this is not. It is not the VeerNet whitepaper, and it is not a claim about VeerNet's own throughput. It is the reading discipline you should bring to any acceleration headline, including ours, before you let it move a decision. VeerNet has a speedup story of its own against manual curve picking, and that story is only worth telling if we are willing to state its baseline as plainly as we are about to ask of everyone else's. So treat the numbers below as a worked example of a method, not as a scoreboard. The method is: take the multiplier apart into the thing measured and the thing compared against, and then ask whether the baseline was a fair opponent or a strawman chosen because it made the ratio large.

The denominator does the arguing

Koroteev and Tekic's survey of upstream AI collects acceleration figures across several tasks, and the spread across them is the whole point [1]. A deep neural network used for geological assessment is reported at around 1,000,000x against manual mapping. Reservoir-engineering surrogates, where a learned model stands in for a conventional numerical simulator, come in at 200x to 2,000x. A gradient-boosted production forecast lands near 100x. Alongside these the same literature reports a 20 percent cost saving from drilling optimisation, which is a different kind of number entirely, and elsewhere a CO2-storage classifier at 90.476 percent accuracy [2], which is not a speedup at all.

Line them up and the first thing to notice is that the largest multiple has the weakest baseline. The million-fold figure is measured against a human geologist doing mapping by hand, an activity paced by attention and coffee and the working day. Any automated pass over the same task will be enormous against that denominator, because the denominator is slow for reasons that have nothing to do with the difficulty of the computation. The 200x to 2,000x reservoir figures, by contrast, are measured against a conventional simulator, which is itself an optimised numerical method running on a computer. Beating an already-fast baseline by 200x is a far stronger statement about the model than beating a human by a million, and yet it is the smaller, less quotable number. The 100x gradient-boosting figure is smaller still and, for our purposes, the most relevant, because a boosted forecast over structured inputs is closer in spirit to a learned pass over a scan than a geological survey is.

Which of these is even the same kind of claim

Before comparing multiples you have to check they are multiples of the same thing, and here they are not. A speedup compares two ways of producing the same output and reports the ratio of their times. The 1,000,000x, the 200x to 2,000x, and the 100x are all of that kind, time against time, however different their baselines. But the 20 percent drilling saving is a cost ratio, not a time ratio, and putting it on the same axis as a speedup is a category error that a hurried slide will happily commit. The 90.476 percent CO2-storage figure is a classification accuracy, whose baseline is a held-out test set and whose honest comparison is against a naive predictor, not against a slower method [2]. Three kinds of number, three different denominators, two of which are not denominators of time at all. A claim that quietly mixes them, a bar chart with a speedup and a cost saving and an accuracy sharing one axis, is not lying about any single figure. It is lying about their commensurability, which is harder to catch and does more damage.
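The category check can be made mechanical. What follows is a minimal sketch, not anything the survey or EarthScan ships: the `Kind`, `Claim`, and `ratio` names are ours, invented for illustration. The values are the figures quoted in this note, and a baseline is marked as unstated wherever this note does not give one.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    """The kind of quantity a headline number actually is."""
    TIME_RATIO = "time ratio (speedup)"
    COST_SAVING = "cost saving"
    ACCURACY = "classification accuracy"


@dataclass(frozen=True)
class Claim:
    label: str
    kind: Kind
    value: float   # a multiple for speedups, a percentage otherwise
    baseline: str  # what the figure is measured against, as this note describes it


# Figures as quoted in this note; baselines the note does not give are marked as such.
CLAIMS = [
    Claim("geological assessment (deep network)", Kind.TIME_RATIO, 1_000_000, "manual mapping"),
    Claim("reservoir surrogate, low end", Kind.TIME_RATIO, 200, "conventional numerical simulator"),
    Claim("reservoir surrogate, high end", Kind.TIME_RATIO, 2_000, "conventional numerical simulator"),
    Claim("production forecast (gradient boosting)", Kind.TIME_RATIO, 100, "not stated in this note"),
    Claim("drilling optimisation", Kind.COST_SAVING, 20, "not stated in this note"),
    Claim("CO2-storage classifier", Kind.ACCURACY, 90.476, "held-out test set"),
]


def ratio(a: Claim, b: Claim) -> float:
    """Return a.value / b.value, refusing to compare claims of different kinds."""
    if a.kind is not b.kind:
        raise ValueError(f"not commensurable: {a.kind.value} vs {b.kind.value}")
    return a.value / b.value


def speedups(claims):
    """Keep only the claims that belong on a speedup axis."""
    return [c for c in claims if c.kind is Kind.TIME_RATIO]
```

Calling `ratio(CLAIMS[0], CLAIMS[2])` compares two time ratios and returns 500.0, while `ratio(CLAIMS[0], CLAIMS[4])` raises a `ValueError` instead of producing a number nobody could interpret. Passing the same-kind check does not make the baselines equally strong; it only says the two figures belong on the same axis.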

The instrument below is that discipline made visual. It plots the sourced acceleration claims on a single logarithmic speed axis, so the orders of magnitude between them are legible at a glance rather than flattened by a linear scale that would bury everything under the million. It draws a band across the tier of claims that resemble curve digitisation, the 100x to 2,000x range, and lets you toggle which baseline you read the field by: its single loudest number, or that band. The verdict at the top swaps as you toggle, without any plotted datum moving, because nothing about the data changed, only the baseline you agreed to read it by. The one orange element is the bracket that measures the gap between the headline and the band, and that gap, 500x from the band's ceiling and 10,000x from its floor, is the size of the marketing premium you pay for the loudest baseline.

Acceleration claims from one survey of AI in upstream oil and gas, plotted on a log speedup axis so the spread is legible. The loudest claim, a deep network doing geological assessment against manual mapping, sits at 1,000,000x. The tasks that actually resemble lifting curves off a scanned log, a learned surrogate standing in for a numerical reservoir simulator (200x to 2,000x) and a gradient-boosted production forecast (100x), sit in a band roughly three to four decades lower. Toggle the baseline you read the field by, its single loudest number or the band for digitisation-like work, and the verdict at top swaps without any datum moving. The one orange element is the collapse bracket: the multiple, 500x from the band ceiling and 10,000x from its floor, by which the headline overstates the work at hand. Every plotted rung is a sourced figure (Koroteev and Tekic 2021). The 20% drilling cost saving and the 90.476% CO2-storage accuracy are shown as context because they are not speedups, and nothing on the plot is illustrative.
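The bracket arithmetic is easy to check by hand, and easier to keep honest in code. The sketch below uses only the three endpoints quoted in this note. The constant names and the `gap` helper are ours, and the decades are simply the base-10 logarithm of each multiple.

```python
import math
from typing import Tuple

HEADLINE = 1_000_000   # geological assessment against manual mapping
BAND_FLOOR = 100       # gradient-boosted production forecast
BAND_CEILING = 2_000   # top of the reservoir-surrogate range


def gap(headline: float, reference: float) -> Tuple[float, float]:
    """Return (multiple, decades) by which the headline exceeds the reference."""
    multiple = headline / reference
    return multiple, math.log10(multiple)


gap_ceiling = gap(HEADLINE, BAND_CEILING)
gap_floor = gap(HEADLINE, BAND_FLOOR)

print(f"headline vs band ceiling: {gap_ceiling[0]:,.0f}x ({gap_ceiling[1]:.1f} decades)")
print(f"headline vs band floor:   {gap_floor[0]:,.0f}x ({gap_floor[1]:.1f} decades)")
```

This prints 500x at about 2.7 decades for the ceiling and 10,000x at 4.0 decades for the floor. Neither is a measurement. Both are distances between reported claims, which is all the bracket on the plot shows.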

The general rule, and why it protects you

The pattern is not specific to oil and gas, and the machine-learning community has named it more than once. Sculley and colleagues called it the winner's curse: reported gains often track how hard the winning system was tuned and how weak the baseline was allowed to be, not any real advance in method. The fix is procedural: state what you compared against and give the baseline a fair budget [3]. Dehghani and colleagues made the structural version of the point: the choice of comparison, not only the model, decides which system looks best, so a number is only interpretable once its baseline is fixed [4]. A million-fold against a human, a 100x from a boosted forecast, and a 200x against a simulator are not competing claims to be ranked. They are three readings whose only honest ranking is by the strength of the opponent each beat.

For a buyer of subsurface ML this is a cheap filter. When a claim arrives without its denominator, that is the tell, and the response is to ask the one question the headline omitted: faster than what, run by whom, on what hardware, producing which output. A vendor who answers crisply, faster than a two-day manual pass by a log analyst, on a single GPU, producing the same depth-indexed curve, has given you a claim you can price. One who can only repeat the multiple has given you a number that means whatever you let it mean. The larger the bare multiple, the more suspicious you should be, because the easiest way to make a ratio enormous is to pick a slow enough baseline.
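The filter is simple enough to write down as a checklist. This is a sketch under our own naming, with nothing taken from a vendor API: `SpeedupClaim` holds the multiple plus the four answers the question asks for, and `missing_denominator` lists whichever ones the claim left blank. The multiple on the crisp example is a placeholder, not a measured figure.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class SpeedupClaim:
    multiple: float
    faster_than: Optional[str] = None  # the baseline method
    run_by: Optional[str] = None       # who or what ran the baseline
    hardware: Optional[str] = None     # what the new method ran on
    output: Optional[str] = None       # what both methods produce


def missing_denominator(claim: SpeedupClaim) -> List[str]:
    """List the parts of 'faster than what' that the claim leaves unanswered."""
    return [
        f.name
        for f in fields(claim)
        if f.name != "multiple" and not getattr(claim, f.name)
    ]


def can_price(claim: SpeedupClaim) -> bool:
    """A claim can be priced only when every part of its baseline is stated."""
    return not missing_denominator(claim)


# A bare multiple, as it arrives on a slide.
bare = SpeedupClaim(multiple=1_000_000)

# The crisp answer from the paragraph above; the multiple is a placeholder.
crisp = SpeedupClaim(
    multiple=1_000,
    faster_than="two-day manual pass",
    run_by="log analyst",
    hardware="single GPU",
    output="depth-indexed curve",
)

print(missing_denominator(bare))
print(can_price(crisp))
```

The bare claim comes back with all four fields missing and the crisp one with none. The check says nothing about whether the stated baseline was a fair opponent. It only establishes that there is a baseline to argue about.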

What this means for how we talk about VeerNet

We hold ourselves to the same test, which is why this note stands on its own rather than being folded into the architecture writeup. VeerNet's honest comparison is against manual curve picking by an analyst, and against that baseline the acceleration is real and large, in the same family as the geological-mapping figure because it shares the same slow human denominator. So we are obliged to state that denominator every time we quote it, and to resist letting the bare multiple stand where a reader would assume the baseline was another automated method rather than a person with a ruler. The survey tasks closest to ours, the 100x boosted forecast and the 200x simulator surrogate, are the fairer reference for what a strong learned system does against an already-fast opponent, and they are the numbers a careful buyer should hold us to, not the million.

Limitations

The acceleration figures read here are as reported in a single survey and inherit whatever their original studies assumed about hardware, problem size, and what counted as the baseline task; the survey does not always give those conditions in full, so the exact denominators behind the 1,000,000x, the 200x to 2,000x, and the 100x are known in kind but not in every detail, and the reading above depends only on that kind, not on the precise figure [1]. The 500x and 10,000x gaps the instrument draws are arithmetic on the sourced endpoints, not new measurements, and they describe the distance between reported claims, not a controlled head-to-head we ran. The 20 percent cost saving and the 90.476 percent accuracy are shown only as examples of non-speedup claims and are not comparable to the time ratios on any shared axis [2]. Finally, this is a note about how to read a multiplier, not a validation of any specific model; a claim can pass the baseline test, be honestly stated, and still describe a system that produces a curve no petrophysicist would sign off on, which is a separate question that no speedup number answers.

The one question to keep

The habit worth keeping is small and almost rude in a meeting: when someone quotes a speedup, ask what it was faster than, and do not move on until the denominator is on the table. A million-fold number with a human baseline and a hundred-fold number with a machine baseline can both be true and both be honest, and the smaller one is often the stronger claim. The multiple is the part that markets itself. The baseline is the part that tells you whether the multiple is worth anything, and it is the part a careful reader supplies out loud when the headline leaves it off.

References

[1] Koroteev, D., and Tekic, Z. Artificial Intelligence in Oil and Gas Upstream: Trends, Challenges, and Scenarios for the Future. Energy and AI 3 (2021), 100041. The source of the acceleration figures read here, including the roughly 1,000,000x geological-assessment claim, the 200x to 2,000x reservoir surrogate range, the roughly 100x boosted production forecast, and the 20 percent drilling cost saving. https://www.sciencedirect.com/science/article/pii/S2666546820300410

[2] Buah, E., Linnanen, L., and Wu, H. Machine Learning Prediction of the CO2 Storage Engagement of Countries. Energies 13, 23 (2020), 6259. The source of the 90.476 percent accuracy quoted as context and an example of a claim whose baseline is a held-out test set rather than an alternative method. https://www.mdpi.com/1996-1073/13/23/6259

[3] Sculley, D., Snoek, J., Wiltschko, A., and Rahimi, A. Winner's Curse? On Pace, Progress, and Empirical Rigor. ICLR 2018 Workshop. The argument that headline gains often reflect tuning budget and weak baselines rather than method. https://openreview.net/forum?id=rJWF0Fywf

[4] Dehghani, M., Tay, Y., Gritsenko, A. A., et al. The Benchmark Lottery. arXiv:2107.07002 (2021). On how the choice of benchmark and comparison point determines which system looks best, and why a multiplier is only interpretable against a stated baseline. https://arxiv.org/abs/2107.07002
