A borehole image log is a photograph of the rock taken from the inside of the hole, and like most photographs taken in a hurry it has missing strips. The tool that makes the picture, a high-resolution borehole imaging tool or its slimmer compact microresistivity cousin, presses a ring of electrode pads against the wall and reads conductivity. The pads do not wrap the full circumference. Wherever the wall falls between pads, and more so where the borehole is wider than the pad array can reach, the log records nothing: vertical bands of null data running the length of the well. Before any detector, classifier, or dip-picking algorithm touches that image, somebody has to decide what goes in those bands. Working a 14-well program logged with two different microresistivity imaging tools in a Middle East carbonate field, we tested four answers to that question, and the one that won was the one nobody writes papers about.
Why the gaps are there, and why they are not noise
The gaps in an image log are not measurement noise to be denoised. They are a geometry problem. A high-resolution borehole imaging tool carries 192 electrodes arranged in two rows across its pads and flaps — 24 copper electrodes per pad — and covers roughly 80% of the borehole wall when the hole is in gauge. The remaining circumference is simply never measured. When the hole washes out, or the tool is run in a larger diameter than its pads were sized for, the unsampled fraction grows and the null bands widen.
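The geometry can be made concrete with a back-of-envelope sketch. It assumes the pads cover a fixed arc length regardless of hole size, so coverage falls in inverse proportion to diameter. The 8.5 in gauge diameter is a hypothetical reference value chosen for illustration, not a figure from the program or a tool specification.

```python
def wall_coverage(diameter_in, gauge_diameter_in=8.5, gauge_coverage=0.80):
    """Fraction of the borehole circumference sampled by the pads.

    Assumes total pad width is fixed, so the covered arc length is constant
    and coverage scales as 1/diameter. gauge_diameter_in is a hypothetical
    reference value, not a tool specification.
    """
    # Covered arc is proportional to coverage * diameter (the factor pi cancels).
    covered_arc = gauge_coverage * gauge_diameter_in
    return min(1.0, covered_arc / diameter_in)


for d in (8.5, 10.0, 12.25):
    cov = wall_coverage(d)
    print(f"{d:5.2f} in hole: {cov:.0%} covered, {1 - cov:.0%} null")
```

Under these assumptions, any enlargement of the hole converts directly into wider null bands, which is why washed-out intervals are the hardest ones to fill.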
High-resolution borehole image log: a wireline electrical borehole imaging tool. A ring of pad-mounted electrodes measures micro-resistivity around the borehole wall, producing a high-resolution unrolled image of the rock fabric. A compact microresistivity variant is slimmer, with fewer pads, for smaller holes.
This matters because the features we care about in a carbonate play are bedding planes, open and healed fractures, and vugs, and the planar ones appear in the unrolled image as sinusoids. A planar feature cutting a cylindrical borehole projects to a sine wave when you flatten the cylinder into a 2D image; the amplitude encodes the dip and the phase encodes the azimuth. A single fracture is one continuous sinusoid sweeping across the full width of the image. When a pad gap punches a vertical hole through that image, it cuts the sinusoid into pieces. Whatever you fill the gap with is no longer cosmetic: it is now part of the curve a detector will try to trace, and part of the dip and azimuth a geologist will eventually read off.
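The amplitude and phase relationship can be sketched in a few lines. For a borehole of radius r, a plane with dip δ and dip azimuth α traces depth = depth0 + r·tan(δ)·cos(θ − α) on the unrolled wall, with depth positive downward so the deepest point sits at the dip azimuth. The function names, the 0.108 m radius, and the use of a plain linear least-squares fit on an already-traced curve are illustrative assumptions, not the pipeline used in the program.

```python
import numpy as np


def plane_to_sinusoid(theta, depth0, dip_deg, azimuth_deg, radius):
    """Depth trace of a plane on the unrolled wall (depth positive downward)."""
    amplitude = radius * np.tan(np.radians(dip_deg))
    return depth0 + amplitude * np.cos(theta - np.radians(azimuth_deg))


def fit_sinusoid(theta, depth, radius):
    """Least-squares fit of depth = c + a*cos(theta) + b*sin(theta).

    Returns (dip_deg, azimuth_deg): amplitude gives dip, phase gives azimuth.
    """
    design = np.column_stack([np.ones_like(theta), np.cos(theta), np.sin(theta)])
    (c, a, b), *_ = np.linalg.lstsq(design, depth, rcond=None)
    dip = np.degrees(np.arctan(np.hypot(a, b) / radius))
    azimuth = np.degrees(np.arctan2(b, a)) % 360.0
    return dip, azimuth


# Round trip on a noise-free trace; radius is a hypothetical value in metres.
theta = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
trace = plane_to_sinusoid(theta, depth0=2697.17, dip_deg=74.89,
                          azimuth_deg=307.47, radius=0.108)
dip, azimuth = fit_sinusoid(theta, trace, radius=0.108)
print(f"dip {dip:.2f} deg, azimuth {azimuth:.2f} deg")
```

Because dip depends on amplitude through an arctangent, a fill that flattens or stretches the curve inside a gap shifts the recovered dip, and a fill that displaces the curve sideways shifts the azimuth.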
So the imputation question is unusually well-posed. We are not asking "what is the most plausible pixel value." We are asking "what fill keeps a sine wave a sine wave across a vertical cut." That single criterion — sinusoid continuity — is what separated the four methods.
The four candidates
We benchmarked four families on the dynamic image channel across the early wells in the dataset, a set of 14 vertical wells in a Middle East carbonate field imaged with two different microresistivity tools:
- 1D linear interpolation: fill each gap row by linearly interpolating between the last valid pixel on the left and the first valid pixel on the right. Cheap, embarrassingly parallel, no training.
- KNN imputation: scikit-learn's KNNImputer with n_neighbors = 5. Each missing pixel is filled from the mean of its nearest neighbours in feature space, where the neighbours are other rows with similar valid pixels.
- Iterative imputation: scikit-learn's IterativeImputer, which models each feature with missing values as a regression on the others and cycles until convergence.
- GAN inpainting: a generative adversarial fill (the GAIN formulation), trained to hallucinate plausible texture into the masked region the way image-inpainting networks fill scratched photographs.
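The two cheapest candidates in the list above can be sketched on a synthetic image. Everything about the toy data is an assumption made for illustration: the image size, the single sinusoid, and gap bands that drift with depth to mimic tool rotation. The drift matters for this sketch because KNNImputer needs at least one observed value in every column; it is not a description of how the program's logs were windowed.

```python
import numpy as np
from sklearn.impute import KNNImputer


def make_toy_log(n_rows=120, n_cols=72, gap_width=4, n_pads=4, seed=0):
    """Synthetic unrolled image with one sinusoid and drifting pad gaps (NaN)."""
    rng = np.random.default_rng(seed)
    theta = np.linspace(0.0, 2.0 * np.pi, n_cols, endpoint=False)
    rows = np.arange(n_rows)[:, None]
    curve = n_rows / 2 + 20.0 * np.cos(theta - 1.0)     # row index of the feature
    image = np.exp(-0.5 * ((rows - curve) / 2.0) ** 2)   # bright band along the curve
    image += 0.05 * rng.standard_normal(image.shape)
    # Gap start columns shift by one column every four rows.
    starts = (np.arange(n_pads) * (n_cols // n_pads))[None, :] + rows // 4
    cols = (starts[:, :, None] + np.arange(gap_width)) % n_cols
    gapped = image.copy()
    gapped[rows[:, :, None], cols] = np.nan
    return image, gapped


def fill_linear(gapped):
    """Row-wise 1D linear interpolation between the valid pixels flanking each gap."""
    filled = gapped.copy()
    x = np.arange(gapped.shape[1])
    for row in filled:                      # each row is a view, edited in place
        missing = np.isnan(row)
        row[missing] = np.interp(x[missing], x[~missing], row[~missing])
    return filled


def fill_knn(gapped, n_neighbors=5):
    """KNN imputation with rows as samples and azimuth columns as features."""
    return KNNImputer(n_neighbors=n_neighbors).fit_transform(gapped)


truth, gapped = make_toy_log()
for name, fill in (("linear", fill_linear), ("knn", fill_knn)):
    filled = fill(gapped)
    print(f"{name}: shape {filled.shape}, NaNs left {int(np.isnan(filled).sum())}")
```

IterativeImputer can be dropped into the same harness, with the caveat that scikit-learn still treats it as experimental and requires `from sklearn.experimental import enable_iterative_imputer` before it can be imported from `sklearn.impute`.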
On paper the GAN is the sophisticated choice. Inpainting is a solved-looking problem in computer vision, and a generator that has seen enough rock should be able to paint convincing conductivity into a gap. That intuition is exactly where the experiment got interesting.
What the GAN actually did
The GAN filled the gaps with texture that looked right and was wrong. Conductivity patches inside the null band came out locally plausible — speckle, contrast, the visual signature of carbonate — but the fill did not respect the one constraint that mattered. Where a sinusoid entered the gap on the left at one phase and should have exited on the right at the continuing phase, the GAN had no notion that those two stubs belonged to the same curve. It inpainted each gap as an independent texture-completion problem. The sine wave went in, generic rock came out, and the curve was broken on the far side.
This is not a tuning failure; it is a mismatch between the loss and the task. An adversarial inpainting loss rewards local realism — fool a discriminator that judges patches. Sinusoid continuity is a global geometric constraint that spans the entire image width and ties the two sides of every gap together. Nothing in the GAIN objective encoded "the thing crossing this gap is a single planar feature whose phase is fixed by its azimuth." So the GAN produced the most realistic-looking fills and the least usable curves. We parked it.
Why KNN won
KNN imputation preserved sinusoid continuity at the lowest compute cost of any method that worked. The reason is almost banal: by filling each missing pixel from the mean of its five nearest neighbours — rows that already carry the local trend of the curve — KNN interpolates along the structure rather than inventing new structure. It has no generative ambition. Where a sinusoid passes through, the neighbours on either side of the gap encode where the curve is going, and the imputed pixels land on that trajectory. The sine wave stays a sine wave.
It is also cheap enough to be operational, which the iterative imputer was not. That method preserved continuity acceptably, but it cycles regressions to convergence, and on a roughly 1.5 GB image log that is a non-starter for a per-well pipeline. 1D linear interpolation was by far the fastest: on a four-metre interval it averaged 0.115 s against KNN's 2.625 s, and it ran a whole well in roughly 11 s, where KNN never finished a whole-well pass at all. But speed bought us artifacts. Linear fills stretch and flatten the curve through wide gaps and leave the vertical-line signature that downstream detectors mistake for real edges.
The quantitative tell showed up when we pushed an imputed image through the dip-and-azimuth pipeline and compared against the operator's ground-truth interpretation. At one fracture at 2697.17 m, the KNN-imputed image yielded a predicted dip/azimuth of 65.88° / 271.55° against a ground truth of 74.89° / 307.47° — close on dip, with a real azimuth gap. The alternatives at the same depth were worse: a 1D-interpolated fill drove the estimate to −19.75° / 331.16° and the iterative fill to −37.17° / 260.81° — negative dips, which are physically meaningless and a direct symptom of the fill bending the curve. KNN was the only fill that kept the geometry inside the realm of the believable.
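The size of each miss follows directly from the figures quoted above. The sketch below computes the absolute dip error and the azimuth error wrapped onto the circle, and flags dips outside 0 to 90 degrees. The function name and the dictionary keys are illustrative; the numbers are the ones reported for the 2697.17 m fracture.

```python
def dip_azimuth_error(pred, truth):
    """Absolute dip error and wrapped azimuth error, both in degrees."""
    dip_err = abs(pred[0] - truth[0])
    # Wrap the difference into [-180, 180) so 359 vs 1 counts as 2 degrees apart.
    az_err = abs((pred[1] - truth[1] + 180.0) % 360.0 - 180.0)
    return dip_err, az_err


truth = (74.89, 307.47)
fills = {
    "knn": (65.88, 271.55),
    "linear_1d": (-19.75, 331.16),
    "iterative": (-37.17, 260.81),
}
for name, pred in fills.items():
    dip_err, az_err = dip_azimuth_error(pred, truth)
    flag = "" if 0.0 <= pred[0] <= 90.0 else "  (dip outside 0-90 deg)"
    print(f"{name:10s} dip error {dip_err:6.2f}  azimuth error {az_err:6.2f}{flag}")
```

On these numbers KNN misses the dip by about 9 degrees and the azimuth by about 36 degrees, while the other two fills miss the dip by more than 90 degrees.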
The twist: the best fill was sometimes no fill
There is a coda that complicates the clean story. Once we moved from the unsupervised, classical pipeline to a supervised transformer detector for sinusoids, we ran a head-to-head between KNN-imputed input and a non-imputed input where the nulls were simply set to zero — left as an honest "no data here" marker. The non-imputed input won.
Why zeros beat fills: a neural detector learns to ignore a consistently-marked sentinel value far more reliably than it learns to discount a plausible-but-fabricated fill. Zeros at the gaps read as 'absence'; a KNN fill reads as 'rock', and the model has no way to know that rock is fictional.
This is not a contradiction. It is a statement about who consumes the fill. The classical sinusoid-fitting pipeline needs a continuous curve to trace with a Hough transform and a least-squares fit; it cannot fit a sine wave that has holes in it, so it needs KNN. A supervised detector with enough labelled examples learns the gap structure itself and is better off seeing an unambiguous "missing" token than a confident fabrication it has to second-guess. For vug detection we landed on a third answer: filling nulls with the local median colour rather than interpolating, specifically to avoid the imputation generating false vugs at the gap edges. The right fill is a function of the downstream task, not a property of the image.
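The two non-interpolating fills are simple enough to sketch. In this illustration, "local" is taken to mean a depth window of a few rows either side of the pixel being filled; that window definition, the function names, and the toy image are assumptions, not the settings used in the program.

```python
import numpy as np


def fill_zero(gapped):
    """Mark gaps with a constant sentinel (0) instead of imputing values."""
    return np.where(np.isnan(gapped), 0.0, gapped)


def fill_local_median(gapped, half_window=10):
    """Fill each gap pixel with the median of valid pixels in nearby rows.

    The +/- half_window depth window is a hypothetical definition of 'local'.
    """
    filled = gapped.copy()
    n_rows = gapped.shape[0]
    for i in range(n_rows):
        lo, hi = max(0, i - half_window), min(n_rows, i + half_window + 1)
        local_median = np.nanmedian(gapped[lo:hi])
        filled[i, np.isnan(gapped[i])] = local_median
    return filled


# Toy image with a single vertical pad gap.
rng = np.random.default_rng(0)
image = rng.random((40, 36))
image[:, 8:12] = np.nan

zeros = fill_zero(image)
median = fill_local_median(image)
print(zeros[0, 8:12], median[0, 8:12])
```

A median fill is flat within each row, so it creates no dark or bright blobs inside the gap of the kind a vug detector keys on, although the gap edges can still contrast with their neighbours. A zero sentinel is only unambiguous if measured pixels rarely take the value zero after normalisation.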
Key takeaways
- Pad gaps on these microresistivity imaging tools are a geometry problem, not noise: the tool only covers ~80% of the borehole wall, so null bands cut through every sinusoid. The fill becomes part of the curve a detector traces.
- Score imputation on the downstream feature, not on pixel realism. GAN inpainting produced the most realistic-looking fills and the least usable curves — its adversarial loss rewards local realism, but sinusoid continuity is a global geometric constraint it never encoded.
- KNN imputation (n_neighbors=5) won for the classical pipeline: it interpolates along existing structure, preserves sinusoid continuity, and is cheap. The iterative imputer also preserved continuity but was too slow for per-well runs; 1D linear interpolation was fastest (~11 s whole-well vs KNN never finishing) but stretched curves and left vertical-line artifacts.
- The quantitative tell: at a 2697.17 m fracture, KNN-imputed dip/azimuth (65.88°/271.55°) stayed physically plausible against ground truth (74.89°/307.47°), while 1D and iterative fills produced negative — meaningless — dips.
- The best fill depends on the consumer. For the supervised transformer detector, leaving gaps as zeros beat KNN imputation; for vug detection, a local-median fill avoided false vugs. There is no universal imputation; there is only the right fill for the next stage.
What we would build next
The honest limitation of every method here is that none of them knows about sinusoids. KNN won by accident of locality, not by design. The principled next step — flagged in our Phase 1 reporting but not yet built at the time — is a masked-autoencoder approach in the spirit of He et al. (2021): mask the pad gaps deliberately during self-supervised pretraining and let the network learn to reconstruct the rock fabric, including the continuation of planar features, from the surrounding context. A model trained to reconstruct masked image-log patches would, unlike the GAN, have a reconstruction objective that directly penalises a broken sinusoid. That is the version of "fill the gap" worth building — but it earns its keep only against a baseline, and the baseline to beat is not the GAN. It is KNN.
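One way the data preparation for such pretraining might look is sketched below: hide extra vertical bands of measured pixels, and compute the reconstruction loss only where a measured pixel was hidden, so the real pad gaps never act as targets. This is a hypothetical sketch of the masking step only, not a masked autoencoder. The band-shaped masks are an assumption adapted to the pad-gap geometry (He et al. mask random patches), and all names and sizes are illustrative.

```python
import numpy as np


def make_pretraining_pair(image, band_width=4, n_bands=2, rng=None):
    """Hide extra vertical bands of measured pixels to build a reconstruction task.

    Returns (model_input, target, loss_mask). loss_mask is True only where a
    measured pixel was hidden, so real pad gaps (NaN) never enter the loss.
    """
    rng = np.random.default_rng() if rng is None else rng
    measured = ~np.isnan(image)
    hidden = np.zeros_like(measured)
    starts = rng.integers(0, image.shape[1] - band_width + 1, size=n_bands)
    for start in starts:
        hidden[:, start:start + band_width] = True
    loss_mask = hidden & measured
    model_input = np.where(measured & ~hidden, image, 0.0)  # 0 marks "not visible"
    target = np.where(measured, image, 0.0)
    return model_input, target, loss_mask


rng = np.random.default_rng(0)
image = rng.random((40, 36))
image[:, 8:12] = np.nan                      # a real pad gap
model_input, target, loss_mask = make_pretraining_pair(image, rng=rng)
print(model_input.shape, int(loss_mask.sum()))
```

A network trained on pairs like these would be scored on pixels whose true values are known, which is what would let a reconstruction loss penalise a sinusoid that fails to continue across a band.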
References
[1] K. He, X. Chen, S. Xie, Y. Li, P. Dollár, R. Girshick. Masked Autoencoders Are Scalable Vision Learners (2021). arXiv:2111.06377. The self-supervised masked-reconstruction recipe referenced as the principled successor to KNN/GAN gap-filling for image logs. https://arxiv.org/abs/2111.06377