Skip to main content

Blog

From One Well and 32 Sinusoids to a Production Fracture Detector

A Detection Transformer that picks fractures on borehole image logs is the visible 15% of the work. The other 85% is data engineering. This is the build log of how a fracture detector grew from a single well with 32 hand-picked sinusoids and 236 patches to a 75,000-patch training corpus — a 132x expansion driven not by a new architecture but by augmentation and overlapping-window sampling. The lesson for any applied-AI team: the dataset, not the paper, was the product.

Quamer Nasimby Quamer Nasim9 min read
EarthScan insight

Every applied-AI team has a moment where the architecture stops being the hard part. Ours arrived early. In a roughly twenty-month engagement with a mid-sized Middle East carbonate operator we partnered with, the model that would eventually pick fractures across kilometres of borehole imagery — a Detection Transformer regressing the depth, dip, and azimuth of each sinusoid — was conceptually settled long before it worked. What stood between a clever notebook and a deployable detector was not a better backbone or a cleverer loss. It was data: how to manufacture, from a single well that a geologist had labelled with 32 sinusoids, enough supervised signal to train a set-prediction transformer that generalises across a field. This piece is the build log of that expansion, and the argument it makes is blunt — the dataset, not the paper, was the product.

The cold-start problem: one well, 32 sinusoids, and a brutal imbalance

Start where we started. The first labelled reservoir interval we had was a single vertical well, a high-resolution borehole image log over roughly sixty metres of carbonate. A geoscientist had hand-picked every fracture in that interval. The total: 32 sinusoids.

To turn an image log into something a detector can train on, you tile it into patches. Our first cut produced 236 patches of 100×360 pixels over that interval — and of those 236, only 19 contained any sinusoid at all. That is the cold-start problem stated in numbers. A set-prediction model has to learn what a fracture looks like and learn to confidently say "nothing here" on the empty patches — but with 217 of 236 patches empty, the foreground-background imbalance is so severe that a naïvely trained model collapses to predicting "no fracture" everywhere and scores beautifully on a metric that means nothing.

There is no architectural trick that fixes 32 examples. You cannot regularise your way out of having seen almost no fractures. The transformer was never the bottleneck; the supervised signal was. And that reframes the whole project. The visible deliverable — the model — is the small part. The load-bearing work is the engineering stack underneath it that turns scarce, confidential, hand-labelled geology into a training corpus large and varied enough for the matcher to converge.

THE 85% UNDER THE MODEL · 6-LAYER STACK~50%of pilots never ship3 / 6 layers load-bearingBuild the stack up — the model is only the capA model is only as production-ready as the weakest layer below it.production ceilingModel — ~15% of the journey⤓ detached — POC purgatoryHPCbuilt · load-bearingData engineeringbuilt · load-bearingData unificationbuilt · load-bearingAI / MLdrift watch is decorativeAgentsunauditablePlatform & deploymentoutside the perimeterbuild linedrag the build line ↑ · column sizing schematicWHY THE PILOT STALLS3 layers missing below the model.Lowest gap — AI / ML:drift watch is decorative.The model can't reach production overan incomplete stack. It joins the ~50%that never ship — a failure of plumbing.The working model is ~15% of the journey.The other ~85% is the six-layer stack —and pilots die where the stack has seams.Own the stack: data + weights stay in your perimeter.~15% model / ~85% stack, the six named layers & the ~50%-never-ship figure are the whitepaper's own · column sizing schematic
Pilots don't stall because the model is weak. The working model is only ~15% of the journey; the other ~85% is a six-layer engineering stack (HPC → Data engineering → Data unification → AI/ML → Agents → Platform/deployment), and a project ships only when every layer below the model is built to production grade. Drag the build line up the load-bearing column: with all six built the model reaches the production ceiling; with any gap below it the model detaches into POC purgatory — the ~50% that never ship. The ~15%/~85% split, the six layers and the ~50% figure are the whitepaper's own; the equal-sixths column sizing is schematic.

Multiplier one: augmentation that respects the geology

The first multiplier was augmentation, and the discipline here is what separated a useful expansion from a noisy one. The naïve move is to augment every patch uniformly. That would have been a mistake: augmenting 217 empty patches ten times over just deepens the imbalance with ten thousand variations of "nothing." So we augmented only the sinusoid-bearing patches — the 19 that actually carried signal — and re-cut the patch geometry to 100×270 pixels (about 0.25 m of borehole) with a vertical stride of 20 pixels, generating 10 augmented variants per sinusoid patch.

The transforms were chosen to model the real variability of image logs, not to invent it: ColorJitter, GaussianBlur, Sharpen, GaussNoise, Emboss, and MedianBlur. These are the perturbations a sinusoid genuinely undergoes across tools, mud systems, and acquisition conditions — brightness and contrast shifts, focus and resolution changes, sensor noise, edge emphasis. Augmentation here is not data-faking; it is a learned invariance prior. We are telling the model: a fracture is still a fracture under all of these, so do not key on any of them.

The arithmetic of that single move:

  • Total patches: 236 → 4,212
  • Sinusoid-bearing patches: 19 → 2,046
  • Individual sinusoids: 32 → 3,565

A greater-than-tenfold expansion of the usable signal, and — because we only multiplied the foreground — a class balance that finally let the Hungarian matching loss find real fractures to assign instead of drowning in no-object patches. The augmentation ablation later quantified exactly how load-bearing this was: with augmentation switched off, the model's classification error pinned at 100% — it learned nothing usable; switched on, it fell to 2.6%. That is not a tuning knob. That is the difference between a model and a non-model.

Multiplier two: overlapping windows, and the 132x jump

Augmentation got us off zero on one well. It did not get us to a field-scale corpus, and it could not, because tiling a log into non-overlapping patches is wasteful in a way that matters when labels are this scarce. A fracture that straddles a patch boundary appears truncated in both halves — the model sees two partial sinusoids instead of one whole one, and a partial sinusoid is an ambiguous training target. Worse, every metre of expensively labelled borehole yields only one view of itself.

The fix was overlapping-window sampling. Instead of stepping the patch window by its full height, we slid it with heavy overlap — initially 90%, later relaxed to 80% — so that every fracture appears, fully framed and centred, in many patches rather than once. We also grew the window itself, from the original 0.25 m (100 px) to 1.8 m (685 px), and ultimately to a 2.2 m (800 px) patch chosen so that more than 95% of fractures and bedding sinusoids fit entirely inside a single patch. The window size was not arbitrary; it was fitted to the distribution of fracture heights in the reservoir, so that the dominant case is always a whole sinusoid, never a clipped one.

This is data engineering at production scale, and it is unglamorous. Each high-resolution borehole image is roughly 690,000 × 360 pixels and about 1.5 GB on disk; overlapping-window sampling at a stride of tens of pixels multiplies the read, decode, normalise, and patch-extract load by more than an order of magnitude. The pipeline that did this — binary wireline log-file ingestion, depth-registration, dynamic-image normalisation, windowed patch extraction, and on-the-fly augmentation — is the system. The model is a config file by comparison.

The payoff is the headline number of this whole arc. Across the experiment progression, the training corpus grew from the original 235 patches on one well to 75,000 patches spanning three wells — a 132-fold expansion — driven entirely by augmentation and overlapping-window sampling rather than by any change to the network. The earliest experiment was a single well, 235 patches, KNN-imputed gaps, a 0.25 m window, no augmentation. The latest was three wells, 75,000 patches, a simpler and faster zero-fill imputation, a 1.8 m window at stride 20, and 5–10× augmentation. Same architecture. Two orders of magnitude more signal.

What the expansion bought, and what it didn't

The data work paid off in convergence, and the well-count ablation shows the curve precisely. Sweeping the number of training wells, classification error falls off a cliff as geology accumulates: 93.1% at 3 wells (essentially guessing), 18.4% at 6 wells, 1.06% at 9 wells. The model is not learning slowly — it is learning suddenly, the moment the corpus crosses a threshold of variety. That non-linearity is the entire justification for the engineering: set prediction is data-hungry, and the difference between a useless detector and a deployable one was a handful of wells' worth of well-engineered patches.

It is worth being precise about what scaled and what did not. More patches from the same well is augmentation — it teaches invariance. More wells is generalisation — it teaches the actual diversity of fractures in the reservoir. The 132x patch expansion bought us a model that converges and a class balance that lets the matching loss work; it did not, by itself, buy out-of-distribution robustness. That came from widening the well count and, later, from the non-overlapping evaluation corpus we held back honestly — 2,291 images across 14 wells for the fractures-only model, 1,492 from 11 wells for the combined model — so that the numbers we reported were measured on patches the augmentation pipeline had never touched. Train on overlapping, augmented patches; evaluate on clean, non-overlapping ones. Conflating the two is the most common way an image-log model flatters itself, and we engineered the split specifically to avoid it.

The lesson for applied-AI teams

If there is one transferable lesson from this build — and it has held across our engagements with operators in the Middle East and the United States — it is to budget for the dataset as the deliverable. The architecture was a known quantity within weeks. The remaining months were a data-engineering programme: imputation strategy (and the decision to drop expensive KNN filling for fast zero-fill once we proved it didn't hurt), patch geometry fitted to the physical fracture-height distribution, augmentation confined to the foreground class, overlapping-window sampling tuned for whole-sinusoid framing, and a clean train/eval split that keeps augmented patches out of the test set.

None of that is in the model card. All of it is why the model works. A Detection Transformer is the visible 15% of an applied-AI build; the 85% beneath it — ingestion, normalisation, augmentation, sampling, evaluation hygiene — is the part that determines whether the project ships or stalls in proof-of-concept. We started with 32 sinusoids. We finished with a production fracture detector. The distance between those two facts is data engineering, not deep learning.

Key takeaways

  1. The cold start was brutal: one well, 32 hand-picked sinusoids, 236 patches of which only 19 carried any signal. No architecture fixes that — the bottleneck was supervised data, not the model.
  2. Augmentation confined to the foreground class (10 variants per sinusoid patch via ColorJitter/GaussianBlur/Sharpen/GaussNoise/Emboss/MedianBlur) grew the set >10x — 236→4,212 patches, 19→2,046 sinusoid patches, 32→3,565 sinusoids — and fixed the class imbalance instead of deepening it.
  3. Overlapping-window sampling (90%→80% overlap) with a window fitted to fracture height (0.25 m→1.8 m→2.2 m/800 px, where >95% of sinusoids fit) scaled the corpus 235→75,000 patches, a 132x expansion with no change to the network.
  4. The well-count ablation shows why it mattered: classification error fell 93.1% (3 wells) → 1.06% (9 wells), and augmentation off-vs-on was 100% → 2.6%. Set prediction is data-hungry; convergence is sudden once variety crosses a threshold.
  5. Evaluation hygiene was engineered in: train on overlapping, augmented patches; report on clean non-overlapping corpora (2,291 images/14 wells fractures-only; 1,492/11 wells combined). The dataset, not the paper, was the product — the visible model is ~15% of an applied-AI build.
Go to Top

© 2026 Copyright. Earthscan