Once, a number left one of our drafts because a reviewer took it apart in revision. It was a speed claim, under 30 seconds to process a metre of borehole imagery for the vug detector, and we could not defend the comparison it rested on, so we removed it. That specific retraction, and what retracting a live number costs, is told in The Number We Deleted. This piece is not that story. It is the discipline we built afterward so that the next shaky number gets caught by us, before it ever reaches a reviewer, and not by a stranger at the worst possible moment.
The discipline has a name we use internally: the self-scrub. Before any number goes into a document, a slide, or a procurement answer, it runs through two gates, in order. First, can you state its scope in one sentence a stranger would accept? Second, does it trace to a source someone outside the room can find? A number that clears both is a claim. A number that fails either is an impression wearing a decimal point, and the honest move is to hold it back until it earns its place, or to let it go. The rest of this piece is about why those two gates catch the numbers that matter, why the loud figure is usually the one that fails the first gate, and how to run the scrub on yourself when the number in question is one you would rather keep.
Gate one: can you state the scope in one sentence
The vug figure is the clean worked example of a scope failure, so it is worth being precise about what went wrong. A metre of imagery goes through several stages: reading and converting the raw log, handling missing samples, the adaptive-thresholding and mode-extraction pass, contour extraction, the geometric refinement that rejects false positives, and writing the per-interval vug statistics. Some of those stages were in our timing. Some were not. The under-30-seconds figure appeared twice in a draft, and neither occurrence pinned down the boundary. Was it wall-clock on one machine or amortised across a batch? Did it include the one-time cost of establishing the per-well base thresholds, or only the steady-state run? And for the prior baseline we claimed to beat, were both numbers measured on the same hardware, at the same image resolution, with the same definition of "done"?
We did not have crisp answers to all of those. That is the whole point of gate one. A speed claim is only as good as the scope you can attach to it, and a metric whose scope you cannot state in one sentence is not a measurement. It is an impression. The gate is not asking whether the number is impressive or even whether it is true. It is asking a narrower question: could a stranger read your one-sentence scope and know exactly what was and was not counted. For the vug figure the honest answer was no, and a no there means the number does not ship, however good it looks.
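One way to make gate one concrete is to write the scope down as a record before quoting the number. The sketch below is an illustration, not our actual tooling: the TimingScope class, its field names, and the stage labels are hypothetical, chosen to mirror the questions we could not answer for the vug figure (which stages were timed, wall-clock or amortised, whether per-well setup is counted). The values in the usage are placeholders.

```python
from dataclasses import dataclass

# Hypothetical stage labels mirroring the pipeline described in the text.
STAGES = (
    "read and convert raw log",
    "handle missing samples",
    "adaptive thresholding and mode extraction",
    "contour extraction",
    "geometric refinement",
    "write per-interval statistics",
)


@dataclass(frozen=True)
class TimingScope:
    """What a timing figure did and did not count (illustrative fields)."""
    interval: str            # e.g. "1 m of borehole imagery"
    timed_stages: frozenset  # subset of STAGES covered by the stopwatch
    wall_clock: bool         # True: one machine, one run; False: amortised over a batch
    includes_setup: bool     # True if one-time per-well setup is counted
    hardware: str            # empty string means "not recorded"
    resolution: str          # empty string means "not recorded"

    def gaps(self):
        """Return the scope questions that still have no answer."""
        problems = []
        if not self.timed_stages:
            problems.append("no timed stages listed")
        unknown = sorted(self.timed_stages - set(STAGES))
        if unknown:
            problems.append("unknown stages: " + ", ".join(unknown))
        if not self.hardware:
            problems.append("hardware not recorded")
        if not self.resolution:
            problems.append("image resolution not recorded")
        return problems

    def sentence(self):
        """Render the one-sentence scope, or raise if it cannot be stated."""
        problems = self.gaps()
        if problems:
            raise ValueError("scope cannot be stated: " + "; ".join(problems))
        included = [s for s in STAGES if s in self.timed_stages]
        excluded = [s for s in STAGES if s not in self.timed_stages]
        mode = "wall-clock on one machine" if self.wall_clock else "amortised across a batch"
        setup = "including" if self.includes_setup else "excluding"
        text = (
            f"Time for {self.interval}, {mode} ({self.hardware}, {self.resolution}), "
            f"{setup} one-time per-well setup, covering: {', '.join(included)}"
        )
        if excluded:
            text += f"; not covering: {', '.join(excluded)}"
        return text + "."


# Placeholder usage: a draft figure whose hardware and resolution were never written down.
draft = TimingScope(
    interval="1 m of borehole imagery",
    timed_stages=frozenset(STAGES[2:5]),
    wall_clock=True,
    includes_setup=False,
    hardware="",
    resolution="",
)
print(draft.gaps())
```

The record cannot judge whether a stranger would accept the sentence. It only refuses to produce one while an answer is missing, which is the part of the gate that is easy to skip.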
What makes this gate worth running before submission rather than after is that the loud number is usually the one that fails it. The figure that made the demo land is almost always the one with an unstated boundary doing quiet work: a stage left out of the timing, a baseline tuned differently on each side, a per-well setup cost amortised away. The excitement and the scope failure tend to share a cause. So the gate that feels most pedantic on your best-looking number is exactly the gate that number most needs.
Gate two: does it trace to a source outside the room
A number can pass gate one and still fail here. Suppose you can state the scope cleanly but the only evidence for it is a run you did on your laptop last spring that nobody logged. That is a private conviction, not a public claim. Gate two asks for a source someone outside the room can find: a named poster, a numbered table in a deck, a logged benchmark with a date on it. The test is not whether you remember the number being true. It is whether a skeptic could go and look.
The two gates are ordered on purpose. Scope first, because a number whose scope you cannot state cannot be sourced in any useful way, since you would not know what the source was even supposed to show. Provenance second, because a well-scoped number with no traceable origin is a claim you are asking people to take on trust, and the whole discipline is about not doing that. Run them in order and most of the temptation resolves itself, because the impressive-but-shaky figure fails gate one and never reaches gate two.
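The ordering can be written as a short check. This is a minimal sketch with hypothetical names (Claim, self_scrub); whether a scope sentence would satisfy a stranger, or whether a source is really findable, remains a human judgement that the code only records.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Claim:
    """A number proposed for a document (field names are illustrative)."""
    value: str                    # the figure as it would be quoted
    scope: Optional[str] = None   # one-sentence scope, or None if it cannot be stated
    source: Optional[str] = None  # artifact an outsider could find, or None


def self_scrub(claim: Claim) -> str:
    """Apply gate one (scope), then gate two (provenance), in that order."""
    # Gate one: without a stated scope there is nothing for a source to confirm.
    if not claim.scope or not claim.scope.strip():
        return "hold: fails gate one (scope not stated)"
    # Gate two: only well-scoped numbers get this far.
    if not claim.source or not claim.source.strip():
        return "hold: fails gate two (no traceable source)"
    return "keep: clears both gates"


vug = Claim(value="under 30 s per metre")
unlogged = Claim(
    value="a hypothetical laptop timing",
    scope="a cleanly stated scope",
    source=None,  # the run was never logged
)
fracture = Claim(
    value="under 10 s per 5 m section",
    scope="one model, one pass over a fixed 5 m section",
    source="World Petroleum Technology Congress poster, April 2024",
)

for claim in (vug, unlogged, fracture):
    print(claim.value, "->", self_scrub(claim))
```

The early return on gate one is the point of the design: the unscoped figure never has its source examined at all.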
Why scrub before submission, not after
The reason to run the self-scrub yourself, in advance, is that the alternative is having a reviewer or a buyer run it for you, in public, on your loudest number. When we could not defend the vug speed figure, the reviewer's objection was not that under 30 seconds was implausible. It was that the comparison we drew against the methods we cited was unfair, because the two sides were not doing equivalent work per metre. That is precisely a gate-one failure surfacing in the room instead of at the desk, and by then repairing it in place was not an option: it would have meant re-running a controlled benchmark we had not designed and had no time to design before the revision was due.
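The reviewer's objection amounts to checking that both sides of a comparison were measured under the same scope. A rough sketch of that check follows, assuming each measurement is described by a small dictionary. The field names and every value in the usage are placeholders, since the actual hardware, resolution, and stage lists are not recorded here.

```python
def scope_mismatches(ours: dict, baseline: dict) -> list:
    """List every scope field on which two measurements disagree or are unrecorded."""
    # Hypothetical field names; adapt to whatever the benchmark actually logs.
    fields = ("hardware", "resolution", "stages", "definition_of_done")
    mismatches = []
    for name in fields:
        a, b = ours.get(name), baseline.get(name)
        if a is None or b is None:
            mismatches.append(f"{name}: not recorded on at least one side")
        elif a != b:
            mismatches.append(f"{name}: {a!r} vs {b!r}")
    return mismatches


ours = {
    "hardware": "machine A",
    "resolution": "resolution X",
    "stages": ("thresholding", "contours", "refinement"),
    "definition_of_done": "statistics written",
}
baseline = {
    "hardware": "machine A",
    "resolution": "resolution X",
    "stages": ("thresholding", "contours"),
    # definition_of_done was never written down for the baseline
}

for problem in scope_mismatches(ours, baseline):
    print(problem)
```

An empty list means the comparison is fair on the recorded dimensions only. Anything else is a question to settle at the desk before a "faster than" line goes into a draft.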
That is the whole case for scrubbing before submission. A number you catch at your own desk costs you a good-sounding line in a draft. The same number caught by a buyer in a procurement meeting does not just take itself down. It puts a question mark over every other figure in the deck, including the ones that were solid, because the reader has just learned that at least one of your numbers does not survive a single question. An unscopable claim is not neutral. It is contagious, and the self-scrub exists to quarantine it before it can spread.
Two numbers that pass both gates
The discipline is only credible if it also tells you what to keep, not just what to cut. So it is worth walking two of our own surviving numbers through the same two gates, because they show what passing looks like.
The first is a speed figure for the fracture and bed detection model: under 10 seconds to interpret a 5 metre section. Gate one: the scope states in one sentence, because the section length is fixed and named, so "per section" is not ambiguous, and it is one model doing one pass over a defined interval rather than a blend of stages you would have to itemise. Gate two: it traces, because it went on the World Petroleum Technology Congress poster in April 2024, a specific artifact at a named venue anyone can point to. Notice what sets this figure apart. It is not that it is louder than the withdrawn one. Both are upper bounds on a short interval, and they come from different models doing different work, so a metre-for-metre ratio between them (under 2 seconds per metre against under 30) is not a comparison we would defend either. What this figure has is that it is more defensible, which is the property the gates select for, and a different and better one than loud.
The second is the accuracy record we keep beside it, because a speed claim with no quality number invites the obvious rejoinder that you went fast by doing less. The fracture model reached 75 percent recall at a 5 cm depth band across 16 wells. Gate one: every term is load-bearing and states its own scope. The 5 cm is the depth-matching tolerance the metric is calibrated against rather than a free parameter chosen to flatter the result, and the 16 wells are the actual number of wells it was measured on. Gate two: it traces to a documented progression from a much smaller and much worse starting point, and it had been through its own review and held. It belongs on the claim wall precisely because it clears the same two gates the speed number could not.
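For readers unfamiliar with recall at a depth band, the sketch below shows one common way such a metric can be computed: a labelled fracture counts as found if a not-yet-used predicted depth lies within the tolerance. The greedy one-to-one matching, the function name, and the depths in the usage are assumptions for illustration. The project's actual matching rule is not described here and may differ.

```python
def recall_at_tolerance(true_depths, predicted_depths, tolerance_m=0.05):
    """Fraction of labelled depths matched by a distinct prediction within tolerance_m.

    Greedy nearest-neighbour matching (an assumption, not necessarily optimal):
    each prediction can satisfy at most one label.
    """
    if not true_depths:
        raise ValueError("recall is undefined with no labelled depths")
    remaining = sorted(predicted_depths)
    matched = 0
    for depth in sorted(true_depths):
        # Nearest still-unused prediction to this label, if any are left.
        best = min(remaining, key=lambda p: abs(p - depth), default=None)
        if best is not None and abs(best - depth) <= tolerance_m:
            matched += 1
            remaining.remove(best)
    return matched / len(true_depths)


# Made-up depths in metres, purely to exercise the function.
labels = [100.00, 101.50, 103.20, 104.80]
predictions = [100.03, 101.60, 103.18, 107.00]
print(recall_at_tolerance(labels, predictions))
```

Widening tolerance_m can only raise the result, which is why the tolerance has to be fixed and stated before the number is quoted.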
Running the scrub in practice
The self-scrub is cheap to run and easy to skip, and the failure mode is always the same: you run it on the numbers you already trust and wave through the one you love. So the practical rule is to run gate one hardest on your best-looking figure, because in our experience that is the one most likely to be hiding an unstated boundary. If you cannot write its scope in a sentence a stranger would accept, you do not own the number, and its having been true when you measured it does not change that.
The temptation to skip is strongest exactly when the stakes are highest, in the draft going to a reviewer or the deck going to a buyer, because that is when the loud number feels most load-bearing. That is backwards. The higher the stakes, the more a scope failure costs when someone else finds it, and the more the ten minutes of self-scrub is worth. We would rather retire our own number at our own desk than have a reviewer or a buyer retire it in the room. A figure caught at the desk costs a good-sounding line in one draft. Waving it through costs the credibility of every number next to it the first time someone asks what the seconds included. That trade is not close.
Limitations
The two-gate self-scrub is a working heuristic drawn from our own practice, not a validated protocol with a controlled study behind it, and it will not catch every kind of bad number: a figure can state its scope cleanly and trace to a source and still be wrong for reasons neither gate tests, such as a flawed measurement everyone agreed to. The gates screen for defensibility under challenge, which is a narrower thing than truth. The two surviving figures used as worked examples carry the usual caveats of any timing or accuracy number. The under-10-seconds-per-5-metre speed figure is a poster-reported number for the fracture and bed model; it depends on hardware and image characteristics we have anonymised here, and it is defensible mainly because its scope is stated, not because it has been independently reproduced across sites. The 75 percent recall at 5 cm is measured on the wells available to the project and inherits their coverage and label conventions; it is not a claim about performance on basins or tools outside that set. The broader point, that an unscopable number is contagious, is an argument from experience rather than a measured effect, and readers should weight it accordingly.