The 1-Pixel = 3 cm Rule: The Resolution Floor Every Depth Metric Hits

Every accuracy number you read about an image-log model is a fraction with a hidden denominator, and the denominator is the resolution of the pixel grid it was scored on. This is easy to forget, because we talk about depth as if it were a continuous quantity measured in metres and centimetres. On the wireline header it is. But the moment an image from either of the two microresistivity imaging tools we worked with is written to a binary wireline log file and rasterised into the grid of pixels that a convolutional backbone and a transformer decoder actually see, depth stops being continuous. It is quantised. In the log files from our roughly twenty-month engagement with a mid-sized Middle East carbonate operator, one pixel of vertical extent corresponds to about 3 centimetres of true depth.

That is the whole post in one sentence: 1 pixel ≈ 3 cm, so a ±3 cm depth error is baked in before any model runs. Everything that follows is an argument for why that fact should change how you read — and how you write — a depth accuracy claim.

Where the 3 cm comes from

A borehole image tool measures formation response on an array of pads and flaps wrapped around the wellbore. As the tool moves up the hole, the readings from the button electrodes on those pads and flaps are sampled at a fixed depth increment and binned into rows of an image. Each row is a pixel of depth. The button spacing and the logging speed set how fine that increment is, and once the data is written to the binary wireline log file, the increment is frozen. You can interpolate, you can upsample for display, but you cannot recover information the tool never sampled.

On this dataset the vertical sampling worked out to roughly 3 cm per pixel row. That number is not a property of our model, our labels, or our augmentation; it is a property of the raw acquisition. It would be exactly the same for a hand interpretation done by the most experienced petrophysicist in the country, because they, too, are clicking on a pixel grid in their interpretation software. When a geologist places a sinusoid pick, they are choosing a row. The "true" depth of that pick is, at best, the centre of a 3 cm box.
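A minimal Python sketch of that quantisation, assuming a uniform 3 cm row pitch and a known depth for the top row of the image. The names `PIXEL_PITCH_M`, `depth_to_row` and `row_centre_depth` are hypothetical, chosen for illustration; they are not part of any logging or interpretation package.

```python
# Minimal sketch: how a continuous depth collapses onto a pixel-row grid.
# Assumes a uniform vertical sampling of 3 cm per row (the pitch reported
# for this dataset) and a known depth for the top row of the image.

PIXEL_PITCH_M = 0.03  # assumed vertical extent of one pixel row, in metres


def depth_to_row(depth_m: float, top_m: float, pitch_m: float = PIXEL_PITCH_M) -> int:
    """Index of the pixel row whose box contains depth_m."""
    return int((depth_m - top_m) // pitch_m)


def row_centre_depth(row: int, top_m: float, pitch_m: float = PIXEL_PITCH_M) -> float:
    """Best depth a single row can express: the centre of its box."""
    return top_m + (row + 0.5) * pitch_m


if __name__ == "__main__":
    top = 2500.00  # hypothetical top-of-image depth, metres
    for true_depth in (2500.301, 2500.315, 2500.329):
        row = depth_to_row(true_depth, top)
        centre = row_centre_depth(row, top)
        print(f"{true_depth:.3f} m -> row {row}, reported {centre:.3f} m, "
              f"error {abs(true_depth - centre) * 100:.1f} cm")
```

Every depth inside the same 3 cm box maps to the same row, so a single reported depth can sit up to half a pitch (1.5 cm) from the truth, and two independent picks of the same feature that land in adjacent rows sit a full pitch (3 cm) apart.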

The error floor is physical, not statistical

A ±3 cm depth uncertainty on this data is not a modelling error you can train away with more wells or a better backbone. It is the discretisation of the measurement itself. The best any depth-localisation model — or any human picker — can do is land in the right pixel. Score it tighter than the pixel and you are no longer measuring the model; you are measuring rounding.

Why this is an engineering problem, not a geology footnote

It is tempting to file "1 pixel = 3 cm" under domain trivia and move on. That would be a mistake, because the number propagates directly into three decisions that shape the whole computer-vision pipeline.

It sets the regression target's precision. Our fracture and bedding model — internally GeoBFDT — is a customised Detection Transformer that regresses each sinusoid's depth, dip, and azimuth directly, in one forward pass, with an L1 loss on the parameters. The depth term is normalised against the patch height — a patch is 800 pixels, or 2.2 m of hole — so the smallest depth difference the loss can meaningfully resolve is one pixel-row. Asking the L1 term to chase sub-pixel precision is asking it to fit quantisation noise.
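To see why sub-pixel precision is out of reach, it helps to express one pixel row in the normalised units the depth term works in. This is a hedged illustration, not the GeoBFDT loss: the 800-row patch height comes from the text, while `normalise_depth`, `l1_sinusoid_loss`, the [0, 1] scaling of dip and azimuth, and the unweighted sum are assumptions.

```python
# Minimal sketch of a per-sinusoid L1 regression term in which depth is
# normalised by patch height. The 800-row patch height comes from the text;
# the function names, the dip/azimuth scaling and the unweighted sum are
# illustrative assumptions, not the GeoBFDT loss itself.

PATCH_HEIGHT_PX = 800  # patch height in pixel rows, as stated in the text


def normalise_depth(row: float, patch_height_px: int = PATCH_HEIGHT_PX) -> float:
    """Map a depth in pixel rows to the [0, 1] range used as the target."""
    return row / patch_height_px


def l1_sinusoid_loss(pred: tuple[float, float, float],
                     target: tuple[float, float, float]) -> float:
    """Sum of absolute errors over (normalised depth, dip, azimuth).

    Dip and azimuth are assumed to be pre-scaled to [0, 1]
    (for example dip / 90 and azimuth / 360).
    """
    return sum(abs(p - t) for p, t in zip(pred, target))


# The smallest depth step a labelled row can express, in normalised units.
ONE_ROW = normalise_depth(1)  # 1 / 800 = 0.00125
```

Because labelled depths are whole rows, any two depth targets differ by a multiple of `ONE_ROW` (0.00125); a depth residual smaller than that is below what the labels themselves can distinguish.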

It sets the evaluation tolerance. Standard object detectors score themselves with IoU and average precision, both defined on overlapping boxes. A sinusoid has no box, so we replaced IoU with depth thresholding: a predicted sinusoid counts as a true positive if its picked depth lands within a tolerance band of a real one, and only then are dip and azimuth graded on the matched picks. The entire question of this post is what that band should be — and the answer is anchored to the 3 cm pixel, not to a number that looks impressive in an abstract.
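The depth-thresholding rule can be sketched as a one-to-one matching between predicted and labelled sinusoids. This is a minimal sketch under stated assumptions: the `Pick` record, the requirement that classes agree, the greedy closest-first pairing and the inclusive band edge are our choices for illustration; the text does not specify how competing matches were resolved.

```python
# Minimal sketch of depth-thresholded detection scoring. A prediction is a
# true positive if it can be paired with an unused label of the same class
# whose depth lies within tol_cm; each label is used at most once. The greedy
# closest-first pairing is an assumption; the text does not specify one.

from dataclasses import dataclass


@dataclass(frozen=True)
class Pick:
    cls: str            # "fracture" or "bedding"
    depth_cm: float     # picked depth
    dip_deg: float
    azimuth_deg: float


def match_by_depth(preds, labels, tol_cm):
    """Return (matched pairs, false positives, false negatives)."""
    candidates = sorted(
        (abs(p.depth_cm - l.depth_cm), i, j)
        for i, p in enumerate(preds)
        for j, l in enumerate(labels)
        if p.cls == l.cls and abs(p.depth_cm - l.depth_cm) <= tol_cm
    )
    used_p, used_l, pairs = set(), set(), []
    for _, i, j in candidates:  # closest candidate pairs claimed first
        if i not in used_p and j not in used_l:
            used_p.add(i)
            used_l.add(j)
            pairs.append((preds[i], labels[j]))
    fp = len(preds) - len(used_p)   # predictions with no label in the band
    fn = len(labels) - len(used_l)  # labels no prediction reached
    return pairs, fp, fn


def f1(tp, fp, fn):
    """Detection F1 = 2TP / (2TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0
```

Dip and azimuth would then be graded only on the returned `pairs`, matching the order of operations described above.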

It sets where you spend your modelling budget. Once you accept that depth is pixel-limited, you stop chasing precision the data cannot give and redirect effort to the axes that do have headroom: dip and azimuth, where the ground truth is continuous. The same floor-first view shaped the depth work we did do. Late in the project we built a separate keypoint-detection variant to fight a residual vertical-shift artifact in depth localisation, because depth was the binding axis and the pixel floor made every centimetre count.

The case against a 2 cm tolerance

Here is the concrete tension. When we formalised the true-positive / false-positive / false-negative bookkeeping for the keypoint-detection model, one of the depth-threshold definitions we examined was a tight 2 cm band. It is a seductive number — it reads like rigour, like you are holding the model to a high standard.

But 2 cm is smaller than a single pixel. A model that picks the geologically correct row — the best outcome physically available — can still be scored wrong, because the centre-to-centre distance between its pick and the labelled pick can be a full pixel, and a full pixel is 3 cm, outside a 2 cm window. You are not penalising the model for missing the fracture; you are penalising the label and the prediction for sitting in adjacent boxes of a grid whose box size you do not control. Tighten the screw far enough and you can drive any depth metric to zero while the model does everything right.
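If both the prediction and the label sit on row centres, their separation can only be 0, 3, 6, ... cm. A minimal sketch, assuming exactly 3 cm per row and an inclusive band edge (the text does not say which), shows which row offsets each tolerance accepts; `accepted` is a hypothetical helper.

```python
# Minimal sketch: which row offsets between prediction and label pass a
# depth tolerance, when both are snapped to row centres 3 cm apart.

PIXEL_PITCH_CM = 3.0  # assumed vertical extent of one pixel row


def accepted(row_offset: int, tol_cm: float, pitch_cm: float = PIXEL_PITCH_CM) -> bool:
    """Does a pick `row_offset` rows away from its label pass the depth test?"""
    return abs(row_offset) * pitch_cm <= tol_cm


if __name__ == "__main__":
    for tol in (2.0, 3.0):
        passing = [k for k in range(4) if accepted(k, tol)]
        print(f"tolerance {tol} cm accepts row offsets {passing}")
```

Under these assumptions any tolerance below 3 cm, including 2 cm, behaves exactly like a 0 cm tolerance: only an exact-row hit counts, and a pick one row away is rejected however correct it is geologically.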

“A 2 cm tolerance scores the pixel grid, not the network. The first scoring band that gives the model credit for landing in the correct row is one that is at least as wide as the pixel itself, and that band is 3 cm.”

— The honest reading

This is why 3 cm is the honest floor. It is the tightest tolerance at which a correct pick is guaranteed to be counted as correct, because it equals one pixel of depth. Score at 3 cm and you are asking the only fair question: did the model find the fracture in the right place, to the limit of what the data can resolve? Anything tighter is measuring the raster.

What the numbers look like once you read them this way

Frame the metrics against the pixel floor and the model's depth performance suddenly makes sense as a curve rather than a single damning percentage. At the honest 3 cm floor, the combined model's detection F1 is about 65% for fractures and 63% for beddings — genuinely marginal, and we have never pretended otherwise. Loosen the tolerance by less than one extra pixel, to 5 cm, and detection F1 climbs to roughly 75% for fractures and 69% for beddings: the same predictions, re-scored against a band that is still well within the range two human interpreters would disagree by, now clear the bar for structural work.
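Re-scoring the same predictions at a wider band is purely a change to the matching step. The sketch below sweeps tolerances over one fixed set of depths; the greedy closest-first matcher, the function name `f1_at_tolerance` and the toy depths are hypothetical and are not the article's data or evaluation code.

```python
# Minimal sketch: score one fixed set of predicted depths against one fixed
# set of labelled depths at several tolerances. Depths are in cm; the toy
# values are hypothetical and are not the article's data.


def f1_at_tolerance(pred_cm, label_cm, tol_cm):
    """Greedy closest-first one-to-one matching on depth, then F1."""
    pairs = sorted(
        (abs(p - l), i, j)
        for i, p in enumerate(pred_cm)
        for j, l in enumerate(label_cm)
        if abs(p - l) <= tol_cm
    )
    used_p, used_l = set(), set()
    for _, i, j in pairs:
        if i not in used_p and j not in used_l:
            used_p.add(i)
            used_l.add(j)
    tp = len(used_p)
    fp = len(pred_cm) - tp
    fn = len(label_cm) - tp
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


if __name__ == "__main__":
    labels = [30.0, 90.0, 150.0, 210.0]
    preds = [31.0, 92.5, 154.0, 260.0]  # the same predictions at every band
    for tol in (2.0, 3.0, 5.0, 10.0):
        print(f"tolerance {tol:>4} cm  F1 {f1_at_tolerance(preds, labels, tol):.2f}")
```

With these toy values F1 steps from 0.25 at 2 cm to 0.50 at 3 cm and 0.75 at 5 cm, without a single prediction changing.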

The geometric axes tell the opposite, reassuring story. Because dip and azimuth are continuous regression targets with no pixel grid underneath them, they are already strong at tight tolerance: dip accuracy near 90% at a 3° window and azimuth near 92% for fractures and 84% for beddings at 15°, all recovered in the same single forward pass. Across the three axes, depth is the lever you have to loosen, while dip and azimuth are already past the line.

GeoBFDT emits the whole (class, depth, dip, azimuth) tuple in one forward pass, but the three axes are not equally hard. Detection along the depth axis is the binding constraint: at a tight 3 cm window fracture F1 is only ~65% (beddings ~63%), and it only clears the useful regime for structural work once tolerance loosens to 5 cm (~75% / ~69%); horizontal wells hold ~55% at 4 cm. The geometric axes the interpreter actually fits sinusoids for are already strong at tight tolerance: dip ~90% at 3°, azimuth ~92% (fractures) / ~84% (beddings) at 15°. Depth is the lever you loosen; dip and azimuth are already past the line. All accuracies and tolerances are the article's own; the dashed ~70% 'useful regime' line is an illustrative reading aid (the article names no exact F1 cutoff).
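Dip and azimuth are graded as angular hit rates on matched picks. A minimal sketch, with one assumption worth stating loudly: azimuth is treated as circular, so 358° and 2° are 4° apart. The text does not say how wrap-around was handled, and the helper names and angle pairs are hypothetical.

```python
# Minimal sketch: share of matched picks whose dip and azimuth fall within an
# angular tolerance. Treating azimuth as circular (358 and 2 degrees are 4
# degrees apart) is our assumption, not a detail given in the text.


def azimuth_error_deg(a: float, b: float) -> float:
    """Smallest angle between two azimuths, in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def within_rate(errors, tol_deg):
    """Share of errors at or below tol_deg (0.0 for an empty list)."""
    return sum(e <= tol_deg for e in errors) / len(errors) if errors else 0.0


if __name__ == "__main__":
    # Hypothetical (predicted, labelled) angles for matched sinusoids.
    dips = [(42.0, 40.0), (10.0, 15.0), (71.0, 70.5)]
    azimuths = [(358.0, 2.0), (120.0, 140.0), (45.0, 50.0)]
    dip_acc = within_rate([abs(p - t) for p, t in dips], tol_deg=3.0)
    az_acc = within_rate([azimuth_error_deg(p, t) for p, t in azimuths], tol_deg=15.0)
    print(f"dip within 3 deg: {dip_acc:.2f}, azimuth within 15 deg: {az_acc:.2f}")
```

Without the wrap-around, the first azimuth pair would count as a 356° miss instead of a 4° hit.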

The same lens explains a result that looks alarming out of context: on the five horizontal wells, fracture detection scores only about 55% at a 4 cm tolerance, a band that, on the stretched and patchier horizontal logs, is again fighting the pixel floor. Scored at the tolerance that matters in practice, the model still delivers a depth recall of around 90% within a 10 cm offset, the band an interpreter actually cares about when correlating a fracture between wells.

The MAE footnote that makes the argument concrete

One more number closes the loop, and it is the one I point people to first. At the 3 cm scoring tolerance, the model's mean absolute depth error is about 1.0–1.2 cm — roughly 1.04–1.06 cm for beddings and 1.19–1.20 cm for fractures.

Sit with that. The average miss is under half a pixel (1.5 cm); on average, the model is landing inside the correct 3 cm box. The reason F1 at 3 cm still reads "only" 65% is not that the picks are far away (they are sub-pixel close) but that a hard threshold turns a near-miss across a pixel boundary into a binary failure. The MAE is the continuous truth; the thresholded F1 is its discretised shadow. Together they say what the pixel pitch said at the start: this model operates at the resolution floor of the measurement, and you cannot fairly ask it to go below the floor.
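The relationship between the two summaries can be sketched on a single list of absolute depth errors. Assumptions: MAE is taken over matched pairs only (our reading of "mean absolute depth error at the 3 cm scoring tolerance"), the hit rate stands in for the recall side of F1 and ignores false positives, and the error values and the `summarise` helper are hypothetical.

```python
# Minimal sketch: the same depth errors summarised two ways. MAE is taken
# over matched pairs only (errors within the tolerance); the hit rate is the
# share of picks that pass the hard threshold. Error values are hypothetical.

TOL_CM = 3.0


def summarise(errors_cm, tol_cm=TOL_CM):
    """Return (MAE over matches, hit rate) for absolute depth errors in cm."""
    matched = [e for e in errors_cm if e <= tol_cm]
    mae = sum(matched) / len(matched) if matched else float("nan")
    hit_rate = len(matched) / len(errors_cm) if errors_cm else 0.0
    return mae, hit_rate


if __name__ == "__main__":
    # Four sub-pixel hits and two near-misses just past the 3 cm band edge.
    errors = [0.4, 0.9, 1.2, 1.6, 3.1, 3.3]
    mae, hit_rate = summarise(errors)
    print(f"MAE over matches: {mae:.2f} cm, hit rate: {hit_rate:.2f}")
```

Here the matched MAE is about 1.0 cm, yet only four of six picks count: the misses at 3.1 and 3.3 cm are barely past the band edge, and the threshold scores them exactly like picks a metre away.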

How to read — and write — a depth accuracy claim

If you take one operating rule from this, make it this: before you quote a depth accuracy number, ask what one pixel is worth in centimetres, and refuse to take seriously any tolerance tighter than that. It is the fastest way to tell whether a claim is honest or whether someone has quietly scored their model against rounding error to make a curve look steep.
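The check is simple enough to automate. A minimal sketch, assuming you know an image's depth span and row count; `pixel_pitch_cm` and `is_honest_tolerance` are hypothetical helpers, and the 30 m / 1000-row example is illustrative rather than taken from this dataset.

```python
# Minimal sketch of the "pixel first" check: work out what one pixel row is
# worth in cm, then flag any quoted depth tolerance tighter than that.


def pixel_pitch_cm(span_m: float, rows: int) -> float:
    """Vertical extent of one pixel row, in cm, for an image covering span_m metres."""
    return span_m * 100.0 / rows


def is_honest_tolerance(tol_cm: float, pitch_cm: float) -> bool:
    """A depth tolerance is only meaningful if it spans at least one pixel row."""
    return tol_cm >= pitch_cm


if __name__ == "__main__":
    pitch = pixel_pitch_cm(30.0, 1000)  # 3.0 cm per row
    for tol in (2.0, 3.0, 5.0):
        verdict = "ok" if is_honest_tolerance(tol, pitch) else "tighter than one pixel"
        print(f"{tol} cm tolerance: {verdict}")
```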

This is not special pleading for our model — it is the same discipline you apply to any quantised measurement. You do not report a length to the micron off a ruler marked in millimetres. A borehole image is a ruler marked in 3 cm pixels; report depth to that, score depth to that, and the metrics start telling you about geology and the network instead of about the grid. We apply the same floor-first reasoning across the operators we have worked with, in the Middle East and the United States, for one reason: the cheapest way to lose trust in a subsurface model is to advertise a precision the data was never capable of.

Key takeaways

On the binary wireline log image files in this engagement, one pixel of depth ≈ 3 cm, so a ±3 cm depth uncertainty is baked into the raw data before any model runs. It is a property of acquisition and quantisation, not of the model — a human picker on the same grid faces it too.
A 2 cm scoring tolerance is smaller than one pixel: a model that picks the correct row can still be scored wrong because the prediction and label sit in adjacent 3 cm boxes. Sub-pixel tolerances measure the raster, not the network.
3 cm is the honest floor — the tightest tolerance at which a correct pick is guaranteed to count, because it equals one pixel. Depth was evaluated in 3/6/9 cm bands for exactly this reason.
Read this way the curve makes sense: combined-model detection F1 is ~65% (fractures) / ~63% (beddings) at 3 cm and climbs to ~75% / ~69% at 5 cm — same predictions, fairer band — while dip (~90% @3°) and azimuth (~92%/84% @15°) are already strong at tight tolerance because they have no pixel grid beneath them.
The mean absolute depth error at 3 cm is ~1.0–1.2 cm, under half a pixel. The thresholded F1 is the discretised shadow of a sub-pixel-accurate regressor. Before quoting any depth accuracy, ask what one pixel is worth in centimetres and discard any tolerance tighter than that.


Continuous AI for explorers

About Earthscan
