Every team that trains models on borehole image logs has a version of this story. Ours, from a roughly twenty-month engagement with a mid-sized Middle East carbonate operator, is unusually clean because the failure was caught at exactly the right moment: before any gradient had been computed, by a histogram that took thirty seconds to plot. Two wells in the batch carried their high-resolution borehole image log pixel intensities in the range 0–15 instead of the expected 0–255. (A high-resolution borehole image log is produced by a wireline tool that builds a high-resolution electrical image of the borehole wall by measuring microresistivity across an array of pad-mounted electrodes. The output is a raster image whose pixel intensities encode resistivity contrast: bright for resistive rock, dark for conductive fractures and fluids.) It is a small number, a factor of sixteen, and it is exactly the kind of thing that destroys a model silently if you do not look for it.
This is a field note about that bug: what it was, why it is so dangerous precisely because it is invisible downstream, how exploratory data analysis surfaced it, and why we let it shrink the dataset rather than paper over it. The headline is unglamorous and worth internalising anyway. In a deep-learning pipeline for image logs, the most expensive defects are not in the model. They are in the rasters you feed it, and they are caught by QC or not at all.
What a borehole image-log raster is supposed to look like
Start with the contract. A high-resolution borehole image log, once processed, is a raster image: a tall strip roughly 690,000 pixels deep by 360 pixels around, where the horizontal axis is azimuth around the borehole and each pixel intensity encodes the resistivity contrast of the rock at that point. A single well's image is around 1.5 GB. By convention — and by the requirements of every pretrained vision backbone you might want to use — those intensities live on an 8-bit scale, 0 to 255: 0 is the most conductive (a fracture, a vug, brine), 255 the most resistive (tight matrix). Our pipeline's normalization step explicitly rescales image-log intensities to that 0–255 range before anything else touches them, because the rest of the stack — the patch extractor, the augmentation chain, the ImageNet-statistics normalizer feeding the convolutional backbone — assumes it.
That assumption is the whole point. Once a raster is on the 0–255 scale, the entire downstream pipeline is invariant to which well it came from. A patch from one interval looks, statistically, like a patch from another. The model learns geology, not gain settings.
Now break the contract. Suppose two wells arrive with their intensities living on 0–15 instead. Visually, on a casual glance, the images can still look like image logs — dark where it should be dark, banded where it should be banded — because the relative structure survives. But the dynamic range has been crushed into the bottom 6% of the scale. Sixteen distinct intensity levels are doing the work of two hundred and fifty-six.
Why this is a silent killer, not a loud one
Here is the trap, and it is worth stating precisely because it is the reason QC has to be upstream of training rather than inferred from results.
A 16× range compression does not throw an exception. It does not produce NaNs. It does not crash the data loader. The patch extractor happily tiles a 0–15 raster into the same overlapping windows it would cut from a clean one. The augmentation chain (color jitter, blur, sharpness, noise) still runs; it just operates on a near-flat input. The backbone's normalizer subtracts the ImageNet mean and divides by its standard deviation as usual, except now it is normalizing a signal with almost no variance, so the values that reach the first convolution are nearly constant and pinned at the dark end of the range. The model trains. The loss goes down. Nothing in the run log looks wrong.
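A minimal sketch of that normalizer step, in Python with NumPy, shows what "almost no variance" means numerically. The mean and standard deviation are the widely used ImageNet channel statistics; dividing by 255 and replicating the grayscale patch to three channels are assumptions about how the backbone is fed, and both patches are synthetic stand-ins rather than real image-log data.

```python
import numpy as np

# Widely used ImageNet channel statistics, defined for inputs scaled to [0, 1].
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])


def imagenet_normalize(patch_u8: np.ndarray) -> np.ndarray:
    """Standardize a grayscale patch that is assumed to honour the 0-255 contract."""
    scaled = patch_u8.astype(np.float64) / 255.0      # assumes an 8-bit scale
    rgb = np.repeat(scaled[..., None], 3, axis=-1)    # H x W -> H x W x 3 (assumption)
    return (rgb - IMAGENET_MEAN) / IMAGENET_STD


rng = np.random.default_rng(0)
clean = rng.integers(0, 256, size=(256, 256), dtype=np.uint8)   # synthetic full-range patch
crushed = rng.integers(0, 16, size=(256, 256), dtype=np.uint8)  # synthetic 0-15 patch

clean_n = imagenet_normalize(clean)
crushed_n = imagenet_normalize(crushed)

# Compare the first channel so the per-channel offsets do not inflate the spread.
for name, arr in (("clean", clean_n[..., 0]), ("crushed", crushed_n[..., 0])):
    print(f"{name:8s} min={arr.min():+.3f} max={arr.max():+.3f} std={arr.std():.3f}")
```

On these synthetic inputs every normalized value of the crushed patch is negative, and its spread is roughly sixteen times smaller than that of the full-range patch. Nothing in that computation raises an error, which is the point.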
What you have actually done is hand the network two wells of near-zero-contrast imagery and label them as if they were full-contrast geology. The sinusoid edges that a fracture-detection model keys on — the bright-to-dark transitions that trace a fracture's path around the borehole — are quantised into a handful of levels and largely washed out. The model cannot learn the feature from those wells, and worse, it is penalised for failing to find fractures the labels insist are there. Two corrupted wells do not just contribute nothing; they inject a consistent, wrong signal into the gradient. In a ten-well batch where data is the scarce resource, that is a meaningful fraction of your supervision quietly pulling the wrong way.
And you would not find out from the metrics. The aggregate F1 would dip a little, the per-well breakdown might look noisy, and a tired team would shrug and blame "hard wells." This is the general law of image-log machine learning: scan and source quality, not architecture, sets the ceiling on what any model can recover. The same dynamic shows up whenever raster quality varies across a dataset — drag the input from a clean source to a degraded one and watch the recoverable accuracy fall, regardless of how good the model is.
The instrument above makes the point with a different curve type, but the mechanism is identical to our pixel-range bug: degrade the raster and the model's achievable accuracy drops, and no amount of architecture search buys it back. A 0–15 well is just a particularly severe, particularly invisible form of "degraded scan."
How EDA caught it: look at the histogram before the loss curve
The fix is embarrassingly simple, which is the lesson. Before any patching, augmentation, or training, our exploratory data analysis stage computes a per-well intensity histogram and the min/max of every raster. It is the first thing the pipeline does after ingest, and it exists precisely to make range violations loud instead of silent.
On the two bad wells, the histogram told the whole story at a glance: every pixel intensity sat between 0 and 15, with a hard ceiling at 15, while the other eight wells in the batch filled the full 0–255 span. There was no ambiguity, no judgement call, no "is this a hard well?" The static image-log range was wrong by construction — sixteen times too narrow, not 0–255 normalized — and a one-line range check flagged it.
Two engineering details make this kind of check trustworthy rather than fragile:
- Mask the sentinels first. Image logs do not cover the full borehole wall — coverage tops out around 80%, and the unmeasured pixels are coded with a −9999 sentinel that the pipeline converts to NaN. If you compute a naive min/max without masking those out, the −9999 floor swamps everything and the histogram lies. Range QC has to run after NaN handling, not before.
- Check the range, not just the appearance. You cannot eyeball a 0–15 well reliably: relative structure survives compression, so a thumbnail can look plausible. The histogram and the explicit max == 15 signature are what make the defect unmissable. Trust the numbers over the rendering.
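The two details above can be sketched as a single per-well report. This is an illustrative outline, not our production code: `range_report`, its `min_span` threshold, and the synthetic rasters are hypothetical, and the randomly scattered gaps are a simplification of real coverage gaps.

```python
import numpy as np

SENTINEL = -9999.0     # missing-coverage code described in the text
EXPECTED_MAX = 255.0   # top of the 8-bit contract


def range_report(raster: np.ndarray, min_span: float = 0.5) -> dict:
    """Mask sentinels first, then summarize one well's intensity range."""
    data = raster.astype(np.float64)          # astype copies, so the input is untouched
    data[data == SENTINEL] = np.nan           # sentinel -> NaN before any statistics
    valid = data[~np.isnan(data)]
    if valid.size == 0:
        raise ValueError("raster has no valid pixels")
    lo, hi = float(valid.min()), float(valid.max())
    counts, _ = np.histogram(valid, bins=256, range=(0.0, 256.0))
    return {
        "min": lo,
        "max": hi,
        "coverage": valid.size / data.size,
        "occupied_bins": int(np.count_nonzero(counts)),
        # Hypothetical rule: in band, and spanning at least min_span of the scale.
        "ok": lo >= 0.0 and hi <= EXPECTED_MAX and (hi - lo) >= min_span * EXPECTED_MAX,
    }


rng = np.random.default_rng(0)
shape = (2000, 360)
gap = rng.random(shape) < 0.2                 # ~20% unmeasured pixels (synthetic)

clean = rng.integers(0, 256, size=shape).astype(np.float64)
crushed = rng.integers(0, 16, size=shape).astype(np.float64)
clean[gap] = SENTINEL
crushed[gap] = SENTINEL

print("naive min:", crushed.min())            # the sentinel dominates the unmasked minimum
print("clean   :", range_report(clean))
print("crushed :", range_report(crushed))
```

The unmasked minimum is the sentinel itself, which is why the statistics have to come after masking. Once masked, the crushed well reports a maximum of 15 and only sixteen occupied histogram bins, which is the numeric signature a thumbnail cannot show.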
This is why we treat EDA as a gate, not a courtesy. It is the cheapest line of defence in the entire stack and it catches the most expensive class of error.
Why we excluded rather than rescaled
The obvious objection: a factor of sixteen is invertible. Multiply the 0–15 raster up to 0–255 and carry on, right? We did not, and the reasoning is worth spelling out because "just rescale it" is a tempting trap.
A 0–15 raster is not a 0–255 raster that someone divided by sixteen. It is a raster that was quantised to sixteen levels somewhere upstream — in export, in a tool conversion, in a processing step we did not control. Rescaling restores the axis labels but not the information. You cannot recover 256 levels of resistivity contrast from 16; the intermediate values are gone, and upsampling just spreads sixteen plateaus across a wider axis. Feeding that to the model is still feeding it crushed geology, now wearing the right units. It would defeat the QC that just caught it.
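A small numerical sketch illustrates why rescaling does not undo quantisation. The integer division by 16 is a hypothetical stand-in for whatever upstream step crushed the range (we never established the real cause), and the raster is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
original = rng.integers(0, 256, size=(512, 360), dtype=np.uint8)  # synthetic 8-bit raster

# Hypothetical upstream step: quantise to 16 levels on a 0-15 scale.
crushed = original // 16

# "Just rescale it": stretch 0-15 back onto 0-255 (15 * 17 == 255).
rescaled = crushed.astype(np.int32) * 17

levels_before = np.unique(original).size   # distinct intensities in the original
levels_after = np.unique(rescaled).size    # distinct intensities after rescaling
max_abs_error = int(np.abs(rescaled - original.astype(np.int32)).max())

print("levels before:", levels_before)
print("levels after :", levels_after)
print("max abs error:", max_abs_error)
```

The rescaled raster spans the full axis again, but it still holds only sixteen distinct values, and it differs from the original wherever the discarded low-order detail used to be.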
There was also a provenance problem, which is the deeper reason. When two wells arrive with a range that the other eight do not share, the range anomaly is a symptom. We did not know — at QC time — what else about those exports might be off. The conservative, reproducible move in a regulated subsurface workflow is to exclude the wells with the unverified provenance, document why, and proceed on the clean subset. So the batch went from ten wells received to eight usable, and the model was trained on eight rather than letting two corrupt ones contaminate the gradient.
One of the excluded wells earned a second life: because it sat cleanly outside the training distribution, we later reserved it as a transfer-learning robustness test — exactly the held-out, slightly-different well a ten-well batch can never spare from training. The defect that disqualified it from training made it useful for evaluation. That is the kind of upside you only get if QC flags the anomaly explicitly instead of quietly absorbing it.
The general rule for image-log pipelines
This was not a one-off. Across our subsurface engagements — with operators in the Middle East and the United States — the same class of defect recurs because image logs pass through long, multi-vendor processing chains before they reach a model: two different microresistivity imaging tools, different binary wireline log-file exports, different conventions for scale, sentinels, and coverage. At a digital log-format resolution where one pixel is about 3 cm — a built-in ±3 cm depth floor before any model error — you cannot afford to also carry an intensity-axis error you never checked for.
So the rule we enforce, and the one this incident hard-wired into the pipeline, is short:
- Range QC is a build gate, not a notebook. Every raster's intensity range is checked against the expected 0–255 before it is allowed into patching. A well outside the band fails the build; it does not get a warning buried in a log.
- QC runs after sentinel masking. −9999/NaN handling first, statistics second.
- Exclude on unverified provenance. A range anomaly is a symptom; quarantine the well, document it, and train on the clean subset rather than rescaling around the problem.
- The defect is in the data, so the test is on the data. This is the data-engineering layer of an ML system — and it deserves the same test discipline as the model code. A histogram assertion is a unit test for your geology.
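As a sketch of what "build gate" can mean in practice, the rules above reduce to a function that raises unless every well entering patching either passes the range check or has been explicitly quarantined. The names `RangeGateError`, `check_well`, and `enforce_range_gate`, the `min_span` threshold, and the well identifiers are all hypothetical, chosen for illustration.

```python
import numpy as np

SENTINEL = -9999.0


class RangeGateError(RuntimeError):
    """Raised when a raster outside the expected band reaches the gate."""


def check_well(raster, lo=0.0, hi=255.0, min_span=0.5):
    """Return None if the raster honours the 0-255 contract, else a reason string."""
    valid = raster[(raster != SENTINEL) & ~np.isnan(raster)]   # mask first, then measure
    if valid.size == 0:
        return "no valid pixels"
    vmin, vmax = float(valid.min()), float(valid.max())
    if vmin < lo or vmax > hi:
        return f"values outside [{lo:g}, {hi:g}]: min={vmin:g}, max={vmax:g}"
    if (vmax - vmin) < min_span * (hi - lo):
        return f"range too narrow: min={vmin:g}, max={vmax:g}"
    return None


def enforce_range_gate(wells, quarantined=frozenset()):
    """Fail the build unless every non-quarantined well passes; return usable names."""
    failures = {}
    for name, raster in wells.items():
        if name in quarantined:
            continue                      # excluded and documented elsewhere
        reason = check_well(raster)
        if reason is not None:
            failures[name] = reason
    if failures:
        raise RangeGateError(f"range QC failed: {failures}")
    return [name for name in wells if name not in quarantined]


# Synthetic ten-well batch: eight full-range wells and two crushed to 0-15.
rng = np.random.default_rng(0)
wells = {f"well_{i:02d}": rng.integers(0, 256, size=(200, 360)).astype(float) for i in range(8)}
wells["well_08"] = rng.integers(0, 16, size=(200, 360)).astype(float)
wells["well_09"] = rng.integers(0, 16, size=(200, 360)).astype(float)

try:
    enforce_range_gate(wells)
except RangeGateError as exc:
    print("build failed:", exc)

usable = enforce_range_gate(wells, quarantined={"well_08", "well_09"})
print("usable wells:", len(usable))
```

The design choice worth noting is that quarantine is an explicit argument rather than a silent filter: the build only passes once someone has named the excluded wells, which keeps the exclusion documented and reviewable.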
The augmentation chain that turned a few hundred labelled patches into thousands — in our case roughly 236 patches expanded to 4,212 — is downstream of all of this. Augmentation multiplies whatever you give it. Give it eight clean wells and it manufactures useful variety; give it ten wells, two of them crushed to 0–15, and it faithfully manufactures thousands of corrupt patches with confident-looking labels. Garbage in, at scale.
The thirty-second histogram is the cheapest insurance you will ever buy against it.
Key takeaways
- Two wells shipped with image-log pixel intensities on a 0–15 scale instead of the expected 0–255 — a 16× dynamic-range compression that crushes sinusoid contrast while leaving the image superficially plausible. The pipeline normalizes image-log intensities to 0–255 precisely so every downstream stage is invariant to the source well; a 0–15 raster silently breaks that contract.
- The defect is dangerous because it is silent: no exception, no NaN, no crash. The data loader, augmentation chain, and backbone normalizer all run normally, the loss still falls, and two corrupted wells inject a consistent wrong signal into the gradient — undiscoverable from metrics alone.
- Exploratory data analysis caught it before training: a per-well intensity histogram with a hard ceiling at 15 flagged both wells against the eight clean ones. Run range QC AFTER masking the −9999 / NaN sentinels (image logs cover only ~80% of the borehole) and trust the histogram over the thumbnail.
- We excluded rather than rescaled — a 0–15 raster is quantised to 16 levels, not merely divided by 16, so the lost contrast is unrecoverable, and the range anomaly signals unverified export provenance. The batch went from 10 wells received to 8 usable; one excluded well was later repurposed as a transfer-learning robustness test.
- Make range QC a build gate, not a notebook cell. The data-engineering layer deserves the same test discipline as model code: a one-line histogram assertion is a unit test for your geology, and augmentation (here ~236 → 4,212 patches) multiplies whatever you feed it — clean or corrupt.