Static vs Dynamic Image Logs: A 25x Class-Error Drop from Data Choice Alone

There is a comfortable story machine-learning teams like to tell, in which a hard problem yields to a better architecture: a deeper backbone, a smarter attention mechanism, one more clever loss term, and the metric moves. It is also, for a large class of subsurface problems, wrong. In our roughly twenty-month engagement with a mid-sized Middle East carbonate operator, the single largest jump in fracture-detection accuracy we ever measured came from changing nothing in the model at all. Same Detection-Transformer pipeline, same ResNet-10 backbone, same four-layer encoder and decoder, same Hungarian matching loss, same learning rate. We swapped the input from a static image-log representation to a dynamic one, retrained, and watched classification error fall from 63.45% to 2.536% — roughly a twenty-five-fold improvement, from a model that was effectively guessing to one that was deployable. This piece is about why one curve on a high-resolution borehole image log can carry so much of a model's fate, and why the same swap done for vugs would have made things worse.

Two normalisations of the same physics

A high-resolution borehole imaging tool does not record a picture. It records resistivity — an array of button-electrode measurements pressed against the borehole wall — and what reaches a petrophysicist is a normalised false-colour image derived from those raw curves. There are two standard normalisations, and the distinction is not cosmetic.

The static image applies a single colour scale across the whole logged interval — one mapping from resistivity to colour, fixed for hundreds of metres of well. It preserves absolute, formation-scale contrast: a tight, resistive carbonate versus a conductive shale streak reads correctly across the entire log. What it sacrifices is local detail. A hairline fracture inside an already-resistive bed is a small perturbation on top of a large absolute value, and a global colour scale flattens it into the background.

The dynamic image applies a sliding normalisation — a moving window, typically on the order of a metre or two, that rescales contrast locally as it travels down the borehole. Within each window the full colour range is stretched across whatever resistivity span is present. Absolute comparability is lost; what you gain is exactly what a global scale destroys — sharp, local, high-frequency contrast. A faint sinusoid that was a rounding error on the static image becomes a crisp, traceable curve on the dynamic one.
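The difference between the two normalisations can be sketched on a one-dimensional trace. This is a minimal illustration, not the tool vendor's processing chain: it assumes plain min-max scaling (production workflows often use histogram-based equalisation), and the function names, the five-sample window, and the resistivity values are all hypothetical.

```python
def static_normalise(values, out_max=255.0):
    """Map the whole interval to [0, out_max] with one global min-max scale."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero on a flat trace
    return [(v - lo) / span * out_max for v in values]


def dynamic_normalise(values, window=5, out_max=255.0):
    """Rescale each sample against the min/max of a centred sliding window."""
    half = window // 2
    out = []
    for i, v in enumerate(values):
        local = values[max(0, i - half): i + half + 1]
        lo, hi = min(local), max(local)
        span = (hi - lo) or 1.0
        out.append((v - lo) / span * out_max)
    return out


# Hypothetical trace: a conductive bed (about 10), then a resistive bed
# (about 1000) with one small dip standing in for a hairline fracture.
trace = [10.0, 11.0, 10.0, 12.0, 1000.0, 1002.0, 1001.0, 990.0, 1001.0, 1003.0, 1000.0]

static = static_normalise(trace)
dynamic = dynamic_normalise(trace, window=5)

fracture = 7  # index of the small dip inside the resistive bed
static_contrast = static[fracture - 1] - static[fracture]
dynamic_contrast = dynamic[fracture - 1] - dynamic[fracture]
print(f"static contrast:  {static_contrast:.1f} of 255")
print(f"dynamic contrast: {dynamic_contrast:.1f} of 255")
```

On this toy trace the dip occupies only a few grey levels under the global scale, because the bed-to-bed contrast consumes almost the whole range, while the sliding window stretches the same dip across most of it.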

For a human interpreter this is a known trade-off, and good petrophysicists flip between both views by habit. For a machine-learning model it is more consequential: it is a choice of what information the network is even able to see. And in the raw numbers the two representations are not subtle variants of each other. In our data the dynamic channel lived in a clean, bounded 0–255 range; the static curves, before normalisation, sprawled across roughly −10⁴ to 10⁴. These are two different signals with two different dynamic ranges, and a model keys on whichever contrast it is handed.

Why the detector cares so much

The fracture model is a DETR-style set-prediction detector. Its decoder takes a fixed set of learned queries, and each query regresses the depth, dip, and azimuth of one sinusoid and decides whether a sinusoid is there at all. The whole apparatus rests on the model finding the trace of the sine wave in the image: its features, its attention, and its matching cost are all downstream of contrast along that curve.
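For readers less familiar with why a fracture appears as a sinusoid: a plane cutting a cylindrical borehole traces one period of a cosine on the unrolled wall image, with peak-to-trough height equal to the borehole diameter times the tangent of the dip. The sketch below assumes depth increases downward and that the deepest point of the trace sits at the dip azimuth; the function name, the 0.1 m radius, and the example values are hypothetical.

```python
import math


def fracture_sinusoid(depth_m, dip_deg, azimuth_deg, borehole_radius_m, n_points=360):
    """Return (angle_deg, depth_m) samples of a planar feature on the unrolled borehole wall.

    Assumed convention: depth increases downward and the trace is deepest
    at the dip azimuth.
    """
    amplitude = borehole_radius_m * math.tan(math.radians(dip_deg))
    trace = []
    for k in range(n_points):
        theta_deg = 360.0 * k / n_points
        offset = amplitude * math.cos(math.radians(theta_deg - azimuth_deg))
        trace.append((theta_deg, depth_m + offset))
    return trace


# Hypothetical fracture: centred at 2500 m, dipping 45 degrees toward azimuth 090.
trace = fracture_sinusoid(depth_m=2500.0, dip_deg=45.0, azimuth_deg=90.0, borehole_radius_m=0.1)
depths = [z for _, z in trace]
peak_to_trough = max(depths) - min(depths)       # diameter * tan(dip)
deepest_theta = max(trace, key=lambda p: p[1])[0]
print(f"peak-to-trough: {peak_to_trough:.3f} m, deepest at {deepest_theta:.0f} degrees")
```

The three numbers each query regresses are exactly the three parameters of this curve, which is why the detector's accuracy depends on how visibly the curve stands out from its surroundings.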

Hand that detector a static image and you have asked it to find a low-amplitude signal buried in a representation that, by construction, suppresses local amplitude. The matching loss can only assign queries to fractures it can resolve; the focal classification term can only separate foreground from background where there is contrast to separate on. With the sinusoid faded into the formation, the model hallucinates and misses in roughly equal measure — which is precisely what a 63.45% classification error describes. Hand it the dynamic image, where every fracture is locally stretched to full contrast, and the same network has a clean signal to lock onto. The error collapses to 2.536%.
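The query-to-fracture assignment mentioned above can be illustrated with a toy example. This sketch is not the production loss: it uses only an L1 cost on the three parameters, assumes they are normalised to [0, 1], ignores azimuth wrap-around, and swaps the Hungarian algorithm for a brute-force search over permutations, which finds the same minimum-cost assignment on small sets (in practice `scipy.optimize.linear_sum_assignment` solves this efficiently). All names and values are hypothetical.

```python
from itertools import permutations


def l1_cost(pred, target):
    """Sum of absolute differences over (depth, dip, azimuth)."""
    return sum(abs(p - t) for p, t in zip(pred, target))


def match_queries(predictions, targets):
    """Return (assignment, total_cost); assignment[j] is the query index matched to target j.

    Brute-force stand-in for Hungarian matching; requires
    len(predictions) >= len(targets).
    """
    best_cost, best_assignment = float("inf"), ()
    for candidate in permutations(range(len(predictions)), len(targets)):
        cost = sum(l1_cost(predictions[i], targets[j]) for j, i in enumerate(candidate))
        if cost < best_cost:
            best_cost, best_assignment = cost, candidate
    return best_assignment, best_cost


# Hypothetical normalised (depth, dip, azimuth) triples: four queries, two fractures.
predictions = [(0.80, 0.30, 0.10), (0.21, 0.52, 0.68), (0.50, 0.50, 0.50), (0.61, 0.18, 0.33)]
targets = [(0.20, 0.50, 0.70), (0.60, 0.20, 0.30)]

assignment, total_cost = match_queries(predictions, targets)
unmatched = sorted(set(range(len(predictions))) - set(assignment))
print("matched query per target:", assignment)
print("queries treated as background:", unmatched)
```

Queries left unmatched are the ones the classification term must learn to label as "no fracture". When the input hides the sinusoid, the predictions scatter, the matching cost stays high, and the foreground/background decision has little to separate on.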

This generalises far beyond borehole image logs: the form of the input representation can dominate model capacity entirely. An architecture can only recover the structure its input preserves. When the representation throws away the signal, no depth of backbone buys it back; when it surfaces the signal cleanly, even a deliberately small model converges. The instrument below makes the principle tangible — degrade the input and watch the recovered curve fall apart while the architecture sits untouched. The bound moves with the data, not the model.

Interactive figure: segmentation accuracy on raster logs is bounded by source-scan quality and by curve type, not by the architecture. As the scan degrades from a clean studio scan to a 4th-generation photocopy of microfiche, noise and fade build on the raster and the recovered curve degrades: the smooth Gamma Ray trace stays largely locked, while the sharp Caliper trace fragments first and worst. The published F1 = 35% (10K mixed archive) is a floor that cleaner inputs lift by +12 F1, not a ceiling. The F1 35% / IoU 30% figures, Pearson r = 0.62 on GR, the +12-curated point, and +5K-CALI are the whitepaper's own; the schematic raster, the recovered curve, and the F1 interpolation are illustrative.

It was not noise, and it was not a fluke

A twenty-five-fold swing invites suspicion, so it is worth being precise. Static-versus-dynamic was a controlled ablation — one variable changed, everything else frozen — and the headline classification error was not the only thing that moved. The Hungarian matching loss, which scores how well predicted sinusoids pair to ground-truth ones, fell from 0.119 to 0.015, and the L1 parameter loss on depth, dip, and azimuth fell from 0.462 to 0.059. Every component of the objective improved together — the whole loss landscape responding to a cleaner signal, exactly as it would if dynamic was genuinely surfacing the geometry the model regresses.
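The size of each swing follows directly from the figures quoted above. The short calculation below simply divides the static value by the dynamic one for each reported quantity; the dictionary layout is ours, the numbers are the ones in the text.

```python
# (static, dynamic) values as reported for the controlled ablation.
ablation = {
    "classification error (%)": (63.45, 2.536),
    "Hungarian matching loss": (0.119, 0.015),
    "L1 parameter loss": (0.462, 0.059),
}

# Ratio of static to dynamic: how many times lower the value is on dynamic input.
ratios = {name: static / dynamic for name, (static, dynamic) in ablation.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x lower on dynamic")
```

The classification error drops by about 25x, while the matching and parameter losses each drop by a little under 8x. The components do not move by the same factor, but they all move in the same direction and by a large margin.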

It is also worth situating the result against the other knobs we swept with the same discipline. The well-count sweep was steepest: classification error was 93.115% at three wells, 18.370% at six, 1.055% at nine, and 0.817% at eleven, with 2.536% for the fractures-only model on the full fourteen-well set. Augmentation was nearly as decisive: switched off, error pinned at a useless 100%; switched on, 2.618%. The static-versus-dynamic swing sits right alongside these, in the same conversation as "how many wells do you have", not in the footnotes with second-decimal architecture tuning.

The trap: dynamic is not universally better

Here is the part that separates a real data-centric practitioner from someone who has memorised "dynamic wins." It does not always win. The right representation is a function of the feature you are detecting, and for one of the other targets in the same programme the conclusion inverts.

Consider vugs — the dissolution pores that give carbonate reservoirs much of their secondary porosity. A vug is not a thin trace; it is a roughly circular region of low resistivity, an area with an interior. What matters for quantifying it is the contrast of the whole patch against its surroundings and the integrity of its shape. The dynamic normalisation does to a vug exactly what it does to a fracture — except now that is the wrong thing. Its aggressive local stretching amplifies every small-scale texture nearby, including incipient fractures and bedding edges, dressing them up as competing high-contrast features and adding noise around the vug's boundary. For the fracture detector that amplification is the whole point. For the vug pipeline it manufactures false positives and chews at the very circularity the detector relies on.

So in the same engagement, on the same wells, our vug-quantification work deliberately ran on static imagery. The global colour scale that hides a hairline fracture is precisely what keeps a vug reading as a clean, contiguous low-resistivity blob against a stable background. Two features, two physics, two opposite right answers — from identical raw tool data.

What this means for an ML pipeline

The practical lesson is not "use dynamic image logs." It is that input representation is a first-class hyperparameter — arguably the first one you should tune — chosen per feature, from the physics, before you touch the architecture. Three things follow for any team building subsurface CV models.

First, audit the representation before you blame the model. Our largest single accuracy gain was a data decision masquerading, until we ablated it, as a modelling problem. "Is the signal even present in this input?" should come before "is my backbone deep enough?"

Second, treat the representation choice as a controlled ablation, not a default. We know that dynamic beats static by twenty-five-fold for fractures, and that static beats dynamic for vugs, only because both were swept with everything else held fixed. A pipeline that hard-codes one normalisation out of habit silently leaves a first-order metric on the table.

Third, watch the raw ranges. The 0–255 dynamic span versus the −10⁴-to-10⁴ static sprawl was a tell: when two representations of the same physics differ in span by roughly two orders of magnitude, and only one of them is signed, any normalisation or imputation step downstream behaves completely differently on each. Several of our early data-quality flags traced straight back to a static curve silently fed where a dynamic one was assumed.
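A cheap guard against that failure mode is to check the value range before a curve enters the pipeline. The sketch below assumes the dynamic channel is expected to stay within 0–255, as it did in our data; the function name and the sample values are hypothetical, and a real loader would likely check channel metadata as well.

```python
def check_dynamic_range(values, lo=0.0, hi=255.0):
    """Return (min, max) of the samples, or raise ValueError if any fall outside [lo, hi]."""
    observed_lo, observed_hi = min(values), max(values)
    if observed_lo < lo or observed_hi > hi:
        raise ValueError(
            f"expected values in [{lo}, {hi}] but found [{observed_lo}, {observed_hi}]; "
            "this may be a static curve fed where a dynamic one was assumed"
        )
    return observed_lo, observed_hi


# Hypothetical samples: one plausible dynamic channel, one raw static-like curve.
dynamic_like = [0.0, 37.0, 128.0, 255.0]
static_like = [-9800.0, 40.0, 10000.0]

print(check_dynamic_range(dynamic_like))

try:
    check_dynamic_range(static_like)
except ValueError as err:
    print("rejected:", err)
```

Failing loudly at load time is a design choice: a static curve that slips through unnoticed would still train, just badly, and the symptom would surface much later as a modelling problem.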

Across our subsurface engagements — image-log AI for operators in the Middle East and beyond — the pattern holds: the teams that treat data representation as the substance of the problem, not the plumbing beneath it, are the ones whose models ship. The architecture is necessary. The representation is decisive.

Key takeaways

With the architecture, loss, and training recipe held fixed and the model retrained, switching the input from static to dynamic borehole image logs cut fracture classification error from 63.45% to 2.536%, a roughly 25x improvement driven entirely by the data representation.
Static normalisation uses one global colour scale (preserves formation-scale contrast, hides hairline sinusoids); dynamic normalisation uses a sliding local window (surfaces sharp local contrast). The detector keys on the sinusoid trace, so dynamic gives it a signal to lock onto.
The swing was not an artefact of one metric: the Hungarian matching loss fell 0.119→0.015 and the L1 parameter loss 0.462→0.059 in lockstep — the whole objective improved, as expected only if dynamic genuinely surfaces the geometry.
Representation sits among the first-order knobs: comparable in magnitude to the well-count sweep (93.1%→2.5% across 3→14 wells) and augmentation (100%→2.618%), not the second-decimal architecture tuning.
Dynamic is not universally better. Vug quantification deliberately ran on static imagery, because dynamic's local contrast stretching amplifies surrounding fractures and bedding into false positives and erodes vug circularity. Choose the representation per feature, from the physics.
