Case Study

Deep Learning for Fracture Detection in Borehole Images

A DETR-based model achieved 85% sensitivity for fracture detection in Middle East carbonate reservoirs, reducing manual interpretation to the review of validator-ready output.

by Tannistha Maiti

A global energy operator in the Middle East turned 14 wells of manually picked borehole image logs into a DETR-based fracture and bed detection model that achieved approximately 85% fracture-detection sensitivity at reservoir scale. The model replaces per-well manual sinusoid picking with end-to-end transformer inference that predicts depth, dip, and azimuth in a single forward pass.

At a glance

Three metrics frame the shift from manual interpretation to transformer-based detection.

Fracture detection sensitivity: ~85% at 8 cm depth threshold

Dip accuracy: >95% at 4° threshold

Training corpus: 2,291 fracture patches, 14 wells

The challenge

Manual identification of fractures and bedding surfaces from borehole image logs (FMI and CMI data) is the bottleneck in carbonate reservoir characterisation workflows. Each sinusoid crossing an unwrapped image represents a planar feature intersecting the wellbore; extracting its depth, dip, and azimuth requires an interpreter to pick the trace, fit a curve, and calculate the geometry. Across hundreds of metres and dozens of wells, this is slow, subjective, and inconsistent.
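To make the pick-fit-calculate step concrete: in a vertical well, a plane with dip θ and dip azimuth φ that cuts a cylindrical borehole of radius r traces depth(α) = d0 + r·tan(θ)·cos(α − φ) on the unwrapped image, where α is the azimuth around the borehole wall. The Python sketch below recovers depth, dip, and azimuth from picked trace points by linear least squares. The function names, the 0.108 m radius, and the sample values are illustrative assumptions, and depth is taken to increase downward.

```python
import numpy as np

def sinusoid_depth(azimuth_deg, centre_depth_m, dip_deg, dip_azimuth_deg, radius_m):
    """Depth of a planar feature on the borehole wall at each azimuth (vertical well)."""
    amplitude = radius_m * np.tan(np.radians(dip_deg))
    return centre_depth_m + amplitude * np.cos(np.radians(azimuth_deg - dip_azimuth_deg))

def fit_sinusoid(azimuth_deg, depth_m, radius_m):
    """Fit depth = d0 + a*cos(az) + b*sin(az); return (d0, dip_deg, dip_azimuth_deg)."""
    az = np.radians(np.asarray(azimuth_deg, dtype=float))
    design = np.column_stack([np.ones_like(az), np.cos(az), np.sin(az)])
    (d0, a, b), *_ = np.linalg.lstsq(design, np.asarray(depth_m, dtype=float), rcond=None)
    amplitude = np.hypot(a, b)                       # r * tan(dip)
    dip_deg = np.degrees(np.arctan2(amplitude, radius_m))
    dip_azimuth_deg = np.degrees(np.arctan2(b, a)) % 360.0
    return float(d0), float(dip_deg), float(dip_azimuth_deg)

# Hypothetical picks: one point every 10 degrees around the borehole wall.
azimuths = np.arange(0.0, 360.0, 10.0)
picks = sinusoid_depth(azimuths, 1523.40, 35.0, 120.0, radius_m=0.108)
d0, dip, az = fit_sinusoid(azimuths, picks, radius_m=0.108)
```

The fit is linear because A·cos(α − φ) expands to a·cos(α) + b·sin(α), so amplitude and phase fall out of the two coefficients without iterative optimisation.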

The operator faced a mature Oman carbonate reservoir with 14 vertical wells and no scalable path to consistent dip and azimuth extraction at reservoir scale. Existing supervised learning approaches — Mask R-CNN, U-Net segmentation — struggled with overlapping fractures in a single image patch and required mask creation as a preprocessing step. The team needed an end-to-end architecture that could handle multiple intersecting sinusoids, predict geometry directly, and generalise beyond the training wells without per-interpreter calibration.

What we did

We built a Detection Transformer (DETR) model with a ResNet-10 backbone and transformer encoder-decoder that processes borehole image patches and directly predicts sinusoid depth, dip, and azimuth — no mask creation, no post-processing curve fits, no heuristic fusion of overlapping detections.

The training corpus comprised 2,291 image patches for fractures and 1,492 for beds, extracted from 14 vertical wells in the Oman carbonate reservoir. To overcome data scarcity, we applied augmentation via colour jitter, Gaussian noise, blur, and sharpness transforms — expanding the effective training set while preserving sinusoid geometry.
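Photometric transforms of this kind change pixel values but never move pixels, which is why the depth, dip, and azimuth labels remain valid after augmentation. A minimal NumPy sketch follows, assuming a single-channel patch scaled to [0, 1]; the function name and every parameter range are hypothetical, and brightness/contrast jitter stands in for colour jitter on a grey-scale image.

```python
import numpy as np

def augment_patch(patch, rng):
    """Apply photometric-only transforms to a 2-D patch with values in [0, 1].

    No pixel is moved, so sinusoid labels stay valid.
    All parameter ranges are illustrative assumptions.
    """
    out = patch.astype(np.float64)

    # Brightness/contrast jitter (grey-scale analogue of colour jitter).
    contrast = rng.uniform(0.8, 1.2)
    brightness = rng.uniform(-0.1, 0.1)
    mean = out.mean()
    out = (out - mean) * contrast + mean + brightness

    # Additive Gaussian noise.
    out = out + rng.normal(0.0, 0.02, size=out.shape)

    # 3x3 box blur with edge padding, used for both blur and sharpening.
    rows, cols = out.shape
    padded = np.pad(out, 1, mode="edge")
    blurred = sum(padded[i:i + rows, j:j + cols]
                  for i in range(3) for j in range(3)) / 9.0

    # weight > 0 blends towards the blurred image; weight < 0 is an unsharp mask.
    weight = rng.uniform(-0.5, 0.5)
    out = out + weight * (blurred - out)

    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(0)
patch = rng.random((64, 90))          # stand-in for a real image patch
augmented = augment_patch(patch, rng)
```

Geometric transforms such as rotation or vertical flips are deliberately absent: they would change the dip or azimuth that the labels describe.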

DETR fracture detection pipeline

  1. Image patch extraction

    2,291 fracture patches, 1,492 bed patches from 14 wells

  2. Augmentation

    Colour jitter, noise, blur, sharpness — geometry-preserving transforms

  3. ResNet-10 backbone

    Feature extraction from image unwrap

  4. Transformer encoder-decoder

    Object queries predict depth, dip, azimuth per sinusoid

  5. Bipartite matching

    Hungarian assignment to ground-truth picks during training
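Step 1 of the pipeline can be sketched as a sliding window over the unwrapped image. The patch height, stride, and sample spacing below are hypothetical values, since the case study does not state them, and the function name is illustrative.

```python
import numpy as np

def extract_patches(image, top_depth_m, sample_spacing_m, patch_rows=128, stride_rows=64):
    """Yield (patch_top_depth_m, patch) windows from an unwrapped image log.

    image: 2-D array, rows = depth samples, columns = azimuth bins.
    patch_rows and stride_rows are assumptions, not the values used in the study.
    """
    last_start = image.shape[0] - patch_rows
    for start in range(0, last_start + 1, stride_rows):
        yield top_depth_m + start * sample_spacing_m, image[start:start + patch_rows]

# Hypothetical log: 320 depth samples at 2.5 mm spacing, 180 azimuth bins.
image = np.zeros((320, 180))
patches = list(extract_patches(image, top_depth_m=1500.0, sample_spacing_m=0.0025))
```

Overlapping windows (stride smaller than the patch height) reduce the chance that a steep sinusoid is cut in half at a patch boundary, at the cost of duplicate detections that have to be reconciled afterwards.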

The DETR architecture handles multiple intersecting fractures in a single image patch by design. The transformer decoder takes a fixed set of learned object queries, lets each attend to the full feature map, and emits one prediction per query: a bounding box (depth extent) plus regression outputs for dip and azimuth. Bipartite matching during training assigns each ground-truth sinusoid to exactly one query and treats the remaining queries as "no object", eliminating the need for non-maximum suppression or hand-tuned overlap thresholds.
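The matching step can be illustrated on a toy patch with two crossing fractures and three query outputs. The sketch below is an assumption-laden stand-in: the cost weights are hypothetical, the class-probability term of a real DETR matching cost is omitted, and a brute-force search over permutations replaces the Hungarian algorithm, which solves the same minimum-cost assignment in polynomial time.

```python
from itertools import permutations

def angular_diff_deg(a, b):
    """Smallest absolute difference between two azimuths, in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def pair_cost(pred, truth, w_depth=1.0, w_dip=0.05, w_az=0.01):
    """Weighted L1 cost between two (depth_m, dip_deg, azimuth_deg) tuples.

    The weights are hypothetical and only balance the different units.
    """
    return (w_depth * abs(pred[0] - truth[0])
            + w_dip * abs(pred[1] - truth[1])
            + w_az * angular_diff_deg(pred[2], truth[2]))

def match_queries(preds, truths):
    """Return {truth_index: pred_index} minimising total cost.

    Requires len(preds) >= len(truths). Brute force is only practical for
    a handful of sinusoids per patch.
    """
    best_cost, best_assignment = float("inf"), ()
    for chosen in permutations(range(len(preds)), len(truths)):
        cost = sum(pair_cost(preds[p], truths[t]) for t, p in enumerate(chosen))
        if cost < best_cost:
            best_cost, best_assignment = cost, chosen
    return dict(enumerate(best_assignment))

# Two ground-truth fractures that cross near 10.5 m, three query outputs.
truths = [(10.50, 30.0, 90.0), (10.51, 60.0, 2.0)]
preds = [(10.52, 61.0, 358.0), (10.90, 10.0, 180.0), (10.49, 28.0, 95.0)]
assignment = match_queries(preds, truths)   # query 1 stays unmatched ("no object")
```

Because every ground-truth sinusoid is paired with exactly one query, two fractures at nearly the same depth are separated by their dip and azimuth rather than merged or suppressed. For realistic set sizes, `scipy.optimize.linear_sum_assignment` is a common way to solve the same assignment problem.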

We trained separate models for fractures and beds, measuring performance at multiple depth and angle thresholds to capture interpreter tolerance. At an 8 cm depth threshold — the distance within which a predicted sinusoid must fall to count as a true positive — the fracture model achieved approximately 85% sensitivity. Dip accuracy exceeded 95% at a 4° threshold, and azimuth accuracy reached 93% at 20°, outperforming prior Mask R-CNN baselines limited to single-fracture-per-image detection.
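The threshold-based metrics can be sketched as follows. This is one plausible reading rather than the validation code behind the reported figures: it assumes a greedy one-to-one match on depth, and it assumes dip and azimuth accuracy are computed over the matched pairs only. All names and sample values are hypothetical.

```python
def angular_diff_deg(a, b):
    """Smallest absolute difference between two azimuths, in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def evaluate(preds, truths, depth_tol_m=0.08, dip_tol_deg=4.0, az_tol_deg=20.0):
    """Score (depth_m, dip_deg, azimuth_deg) predictions against manual picks."""
    unused = set(range(len(preds)))
    pairs = []
    for truth in truths:
        # Candidate predictions inside the depth window that are still unmatched.
        candidates = [i for i in unused if abs(preds[i][0] - truth[0]) <= depth_tol_m]
        if candidates:
            best = min(candidates, key=lambda i: abs(preds[i][0] - truth[0]))
            unused.remove(best)
            pairs.append((preds[best], truth))

    n = len(pairs)
    dip_hits = sum(abs(p[1] - t[1]) <= dip_tol_deg for p, t in pairs)
    az_hits = sum(angular_diff_deg(p[2], t[2]) <= az_tol_deg for p, t in pairs)
    return {
        "sensitivity": n / len(truths) if truths else 0.0,   # TP / (TP + FN)
        "dip_accuracy": dip_hits / n if n else 0.0,
        "azimuth_accuracy": az_hits / n if n else 0.0,
    }

truths = [(100.00, 30, 90), (100.50, 45, 350), (101.20, 10, 180), (102.00, 60, 270)]
preds = [(100.03, 32, 95), (100.56, 52, 5), (101.35, 11, 182), (102.01, 58, 300)]
scores = evaluate(preds, truths)
```

In this toy case three of four picks fall inside the 8 cm window, so sensitivity is 0.75; of those three, two meet the dip threshold and two meet the azimuth threshold.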

Bed detection reached an F1 score of approximately 72% at a 6 cm depth threshold, with azimuth mean absolute error of around 20° and dip accuracy stabilising at 96% beyond a 3° threshold. The lower F1 reflects the sparser labelling of bedding surfaces in the training corpus — fractures were the primary interpretation target — but the geometric accuracy remains within interpreter tolerance for reservoir modelling.
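Unlike sensitivity, F1 also penalises false detections, and an azimuth error has to respect the wrap-around at 360°. A short sketch of both calculations follows; the counts and angles are made-up examples, not values from the validation set.

```python
def f1_score(true_pos, false_pos, false_neg):
    """Harmonic mean of precision and recall, from detection counts."""
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)

def azimuth_mae_deg(pred_az, true_az):
    """Mean absolute azimuth error using the shortest arc between angles."""
    errors = []
    for p, t in zip(pred_az, true_az):
        d = abs(p - t) % 360.0
        errors.append(min(d, 360.0 - d))
    return sum(errors) / len(errors)

# Hypothetical counts: 72 matched beds, 28 spurious detections, 28 missed beds.
f1 = f1_score(true_pos=72, false_pos=28, false_neg=28)

# 350 deg vs 10 deg is a 20 deg error, not 340 deg.
mae = azimuth_mae_deg([350.0, 10.0, 100.0], [10.0, 350.0, 130.0])
```

A naive absolute difference would report 340° for the first two pairs and inflate the mean error to more than 230°, which is why the circular form matters for near-north azimuths.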

The outcome

At an 8 cm depth threshold, the window within which interpreters consider a pick correct, the fracture model delivered approximately 85% sensitivity, meaning it recalled roughly 85 of every 100 manually labelled fractures. Dip accuracy crossed 95% at a 4° threshold, and azimuth accuracy reached 93% at 20°, both well within the tolerance bands that feed reservoir models and stress field inversions.

Fracture sensitivity: ~85% at 8 cm depth threshold

Dip accuracy: >95% at 4° threshold

Azimuth accuracy: ~93% at 20° threshold

Bed detection reached an F1 of approximately 72% at a 6 cm depth threshold, with an azimuth mean absolute error of around 20° and dip accuracy of 96% beyond 3°. The sparser bed labels in the training set, a reflection of the interpretation focus on fracture networks, limited recall, but the geometric predictions remain fit for purpose.

Critically, the DETR architecture handled multiple intersecting fractures in a single image patch without mask preprocessing or heuristic fusion. Where Mask R-CNN required one fracture per crop and post-processing to merge overlapping detections, the transformer decoder emitted a set of sinusoid predictions in a single forward pass, each with depth, dip, and azimuth ready for geological interpretation.

Why DETR beats segmentation for sinusoids

Set-based prediction via transformer object queries eliminates the mask-creation bottleneck and handles overlapping fractures natively — no post-processing, no heuristic merge logic, no single-fracture-per-crop constraint.

What this unlocked

The immediate unlock was speed: inference runs at sub-second latency per image patch on GPU, turning a days-long manual picking campaign into a validator-reviewed output in hours. The asset team now processes new wells as they come online, feeding fracture and bed picks into reservoir models within the same interpretation cycle instead of waiting weeks for senior interpreter availability.

Consistency is the second unlock. Manual picking varies interpreter-to-interpreter, well-to-well, even shift-to-shift for the same person. The DETR model applies a uniform decision boundary trained on 14 wells of consensus labels, eliminating the drift that complicates cross-well correlation and stress field inversion.

The third unlock is scalability beyond the training reservoir. Transfer learning from the Oman carbonate model to a second field — different lithology, different image acquisition settings — required fine-tuning on fewer than 500 patches. The transformer architecture's attention mechanism generalises sinusoid geometry across image quality variations better than convolutional baselines, reducing the labelling cost for every new deployment.

Lessons and next steps

Three lessons emerged from moving DETR into production on borehole image interpretation.

  1. Geometry-preserving augmentation (colour jitter, noise, blur) expands small training sets without distorting sinusoid shape.
  2. End-to-end prediction of depth, dip, and azimuth eliminates segmentation masks, curve fitting, and post-processing fusion — each a source of error and tuning debt.
  3. Interpreter tolerance (8 cm depth, 4° dip) defines the validator-ready threshold; hitting it at 85% sensitivity and 95% dip accuracy moves the model from research to production.

First, data scarcity is tractable when augmentation preserves geometry. Colour jitter, noise, and blur expanded the effective training set without distorting sinusoid shape, letting a 2,291-patch corpus generalise across 14 wells and transfer to adjacent fields with minimal fine-tuning.

Second, end-to-end prediction beats segmentation-then-fit pipelines for structured geological features. Predicting depth, dip, and azimuth directly from transformer object queries eliminated mask creation, curve fitting, and post-processing fusion — each a source of error propagation and hyperparameter tuning debt.

Third, interpreter tolerance defines the performance threshold that matters. An 8 cm depth window and 4° dip threshold are not arbitrary: they reflect the geological uncertainty baked into reservoir models. Hitting 85% sensitivity and 95% dip accuracy within those windows means the model output is validator-ready, not research-stage.

Interpretation workflow: manual vs DETR

Before: days per well

After: hours per well

Next steps focus on multi-well joint inference — treating a pad or field as a single graph where cross-well fracture correlations constrain per-well predictions — and active learning loops that flag low-confidence picks for human review, turning every correction into a training sample. The bottleneck in borehole image interpretation isn't data volume — it's the assumption that a human must sit between every image and every insight. End-to-end transformer models break that assumption, turning sinusoid picking from a per-well manual task into a scalable, validator-ready output.
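The review loop described in the next steps could start as a simple confidence triage. The sketch below assumes each pick carries a confidence score (for example, the per-query class probability); the 0.6 cut-off, the field names, and the sample picks are hypothetical.

```python
def triage_picks(picks, review_below=0.6):
    """Split picks into auto-accepted and flagged-for-review lists.

    Flagged picks are sorted least confident first, so reviewers see the
    most doubtful sinusoids at the top of the queue.
    """
    accepted, flagged = [], []
    for pick in picks:
        (flagged if pick["confidence"] < review_below else accepted).append(pick)
    flagged.sort(key=lambda pick: pick["confidence"])
    return accepted, flagged

picks = [
    {"depth_m": 1502.31, "dip_deg": 42.0, "azimuth_deg": 118.0, "confidence": 0.93},
    {"depth_m": 1502.87, "dip_deg": 15.0, "azimuth_deg": 301.0, "confidence": 0.41},
    {"depth_m": 1503.40, "dip_deg": 63.0, "azimuth_deg": 75.0, "confidence": 0.58},
]
accepted, flagged = triage_picks(picks)
```

Each corrected pick from the flagged queue can then be appended to the training set as a new labelled sample.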

References

1 Carion et al. (2020). End-to-End Object Detection with Transformers. ECCV 2020. https://arxiv.org/abs/2005.12872

2 He et al. (2017). Mask R-CNN. ICCV 2017. https://arxiv.org/abs/1703.06870

3 Benchmark performance metrics (85% sensitivity, 95% dip accuracy, 93% azimuth accuracy) derived from internal validation on 14-well Oman carbonate dataset, 2,291 fracture patches.

© 2026 Copyright. Earthscan