Why DBSCAN and Hierarchical Clustering Couldn't Separate Sinusoids — and Supervised DETR Could

A fracture on a borehole image log is a sinusoid. Unroll the cylindrical wall of a well into a flat strip and a planar feature cutting the borehole traces a clean sine wave — amplitude encodes dip, phase encodes azimuth. So when a machine-learning team first looks at the problem, the obvious framing is unsupervised: the bright and dark pixels that make up each sinusoid are self-similar, they form coherent curves against a noisy carbonate background, and clustering is exactly the family of algorithms built to find coherent groups in unlabelled data without anyone hand-drawing a single training mask. No labels, no annotation budget, no waiting on a geologist. It is the cheapest possible hypothesis, and it is the right one to test first.
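The amplitude and phase relationship can be made concrete with a short sketch. It assumes a vertical well, a perfectly planar feature, depth increasing downward, and dip measured from the horizontal. The function name, the borehole radius, and the convention that column index equals azimuth in degrees are illustrative assumptions, not details taken from the project.

```python
import math

def sinusoid_trace(centre_depth_m, dip_deg, dip_azimuth_deg,
                   borehole_radius_m, n_columns=360):
    """Depth of a planar feature at each azimuthal column of the unrolled wall."""
    # Amplitude grows with dip: a horizontal plane (dip 0) traces a flat line.
    amplitude_m = borehole_radius_m * math.tan(math.radians(dip_deg))
    trace = []
    for column in range(n_columns):
        azimuth_deg = column * 360.0 / n_columns
        # Phase: the trace is deepest where the plane dips, i.e. at the dip azimuth.
        phase = math.radians(azimuth_deg - dip_azimuth_deg)
        trace.append(centre_depth_m + amplitude_m * math.cos(phase))
    return trace

# Illustrative values only: a 45 degree plane dipping due east in a 0.1 m radius hole.
trace = sinusoid_trace(centre_depth_m=1500.0, dip_deg=45.0,
                       dip_azimuth_deg=90.0, borehole_radius_m=0.1)
```

Under these assumptions the peak-to-trough height of the trace is the borehole diameter times the tangent of the dip, and the column of the deepest point gives the dip azimuth. Those are the continuous quantities a picker ultimately has to report.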

In a roughly twenty-month engagement with a mid-sized Middle East NOC carbonate operator we partnered with, that is precisely the hypothesis we tested in Phase 1. The whole program was staged deliberately: Phase 1 unsupervised, Phase 2 supervised, Phase 3 well-to-well correlation. The point of an unsupervised Phase 1 was to find out, honestly and early, whether the geology would yield to a label-free approach before committing to the expense of a labelled supervised pipeline. It did not. This piece is the engineering post-mortem of why DBSCAN and hierarchical clustering could not reliably separate sinusoids on high-resolution borehole resistivity imagery — and why their failure is what justified, rather than merely preceded, the pivot to a supervised Detection Transformer.

Framing the image as points: the DBSCAN setup

The first move in any clustering approach is to decide what a "data point" is. For density-based clustering with DBSCAN, we embedded each image-log pixel as a three-dimensional point: an azimuthal coordinate X in [0, 359] (the 360 columns of the unrolled borehole), a vertical coordinate Y running [0, size_of_image] down the well, and the pixel intensity Z in [0, 255]. DBSCAN then grows clusters by walking from a seed point to every neighbour within a radius eps that has at least min_sample neighbours of its own — no cluster count required up front, and an explicit notion of "noise" for points that belong to nothing. On paper this is a near-perfect match for sinusoids: a fracture is a dense, connected ribbon of similar-intensity pixels snaking across the strip, and everything else is background.
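As a minimal sketch of that embedding, assuming the image log is held as a list of rows of 0-255 intensities with 360 columns per row (the function name and the toy patch are hypothetical):

```python
def embed_pixels(image):
    """Turn an unrolled image log (rows of 0-255 intensities) into (X, Y, Z) points."""
    points = []
    for y, row in enumerate(image):            # Y: row index down the well
        for x, intensity in enumerate(row):    # X: azimuthal column, 0..359
            points.append((x, y, intensity))   # Z: pixel intensity, 0..255
    return points

# Hypothetical 4-row patch: dull background with one bright pixel per row.
patch = [[40] * 360 for _ in range(4)]
for y in range(4):
    patch[y][100 + y] = 220

points = embed_pixels(patch)
```

Two consequences follow from the embedding as described. Because eps is a single Euclidean radius over all three axes, one column, one row, and one grey level all count as the same distance unless the axes are rescaled, and the source does not say whether they were. And columns 0 and 359 are neighbours on the borehole wall but sit 359 units apart in this coordinate system.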

The trouble is that the entire behaviour of DBSCAN lives in those two hyperparameters, and on borehole imagery the two have no stable joint setting. Set eps slightly too small and a single sinusoid shatters into a dozen disconnected fragments wherever the conductive trace momentarily thins or the imager pad loses contact; set it slightly larger and the sinusoid fuses with the lithological background or with the next sinusoid crossing it. Carbonate image logs are heterogeneous along the well — texture, contrast, and fracture density all drift with depth — so an eps/min_sample pair tuned on one interval falls apart on the next.
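The fragment-or-fuse behaviour can be reproduced on a toy example. The sketch below is a small pure-Python DBSCAN written for illustration, not the team's implementation. Two straight ribbons stand in for sinusoids, one of them has a two-pixel gap to mimic a lost pad contact, and all coordinates and eps values are invented for the demonstration. The neighbour count includes the point itself.

```python
from math import dist

def dbscan(points, eps, min_sample):
    """Label each point with a cluster id, or -1 for noise."""
    n = len(points)
    neighbours = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                  for i in range(n)]
    labels = [None] * n                  # None = not yet visited
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_sample:
            labels[i] = -1               # provisional noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        frontier = list(neighbours[i])
        while frontier:
            j = frontier.pop()
            if labels[j] == -1:
                labels[j] = cluster      # noise reached from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_sample:
                frontier.extend(neighbours[j])   # only core points keep expanding
    return labels

def count_clusters(labels):
    return len(set(labels) - {-1})

# Ribbon A at row 10 with a gap at columns 10 and 11; ribbon B four rows below.
trace_a = [(x, 10, 200) for x in range(20) if x not in (10, 11)]
trace_b = [(x, 14, 200) for x in range(20)]
points = trace_a + trace_b

results = {eps: count_clusters(dbscan(points, eps, min_sample=3))
           for eps in (1.5, 3.5, 4.5)}
```

In this toy, the smallest radius splits ribbon A at its gap, the middle radius recovers the two ribbons, and the largest fuses them into one cluster. The workable window is set by the gap width and the ribbon spacing, and both of those change along a real well.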

We did not infer this from a handful of runs. The team trained and tested more than 10,000 different parameter combinations searching for a setting that generalised, and the fine-tuning of eps and min_sample was, in the contemporaneous engineering notes, described bluntly as "very unstable." That phrase is the whole story: not "underperforming," but unstable — small parameter moves producing wildly different segmentations, with no basin of attraction around a good answer. An algorithm whose output is that sensitive to hyperparameters you cannot fix in advance is not a production candidate. It is a research curiosity.

Hierarchical clustering: the count you cannot know, and the memory you do not have

If density-based clustering is too unstable, the textbook next stop is hierarchical (agglomerative) clustering — build a tree of merges and cut it at the level that yields the structure you want. We implemented it as a graph problem, merging via a Kruskal-style minimum-spanning-tree procedure down to k connected components. On a clean toy example it behaves: with k = 2, it cleanly resolves two sinusoids in a patch.

Two hard problems killed it for production. The first is the k problem, and it is fundamental rather than incidental. Hierarchical clustering needs a stopping rule before you can read off an answer, and in our implementation that rule is the number of clusters k. But the number of sinusoids in a patch is exactly the unknown the model is supposed to discover. Guess k too low and two distinct fractures are forced into one component; guess it too high and a single sinusoid is broken into more than one region. Every patch down a well has a different true count, so there is no global k, and no unsupervised way to set it per patch without already knowing the answer. We saw the same pathology with k-means on the same data: at k = 4 a clustered patch would split one sinusoid across multiple labels. The method cannot count, and counting is the job.
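The merge procedure and its dependence on k can be sketched in a few lines. This is an illustrative Kruskal-style implementation with a union-find structure, assuming Euclidean distance between 2-D points. The function names and the toy patch of two well-separated sinusoids are hypothetical.

```python
import math
from itertools import combinations

def kruskal_clusters(points, k):
    """Merge the closest pairs first and stop when k connected components remain."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]    # path halving
            i = parent[i]
        return i

    # Every pairwise edge, shortest first. This list is quadratic in the point count.
    edges = sorted((math.dist(points[a], points[b]), a, b)
                   for a, b in combinations(range(len(points)), 2))
    components = len(points)
    for _, a, b in edges:
        if components <= k:
            break
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            components -= 1

    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(len(points))]

def toy_patch():
    """Two sinusoids 40 rows apart, sampled every 10 columns."""
    upper = [(x, 20 + 5 * math.sin(math.radians(x))) for x in range(0, 360, 10)]
    lower = [(x, 60 + 5 * math.sin(math.radians(x))) for x in range(0, 360, 10)]
    return upper + lower

points = toy_patch()
labels_by_k = {k: kruskal_clusters(points, k) for k in (1, 2, 3)}
```

On this clean patch k = 2 recovers the two curves, k = 1 forces them into one component, and k = 3 cuts one curve into two regions. Nothing in the procedure indicates which of the three answers is the right one.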

The second problem was brutally practical: it does not scale. Agglomerative clustering on full-resolution image-log patches is quadratic in the number of points, and the borehole pixel grid is enormous — a single image log often runs more than 200 metres down the well at 360 columns wide, so the point count balloons. The implementation ran out of memory even on a 50 GB Google Colab machine, and where it did run, it only ever managed small segments and resisted any attempt to automate it across a whole well. A method that cannot fit a single well into 50 GB of RAM, and that needs a parameter you cannot know, is not a pipeline you can run over 80-plus wells.
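A back-of-envelope calculation shows how quickly the quadratic term bites. The sketch assumes one 8-byte distance per unordered pair of pixels and a vertical sampling of 400 rows per metre (2.5 mm). The sampling figure is an assumption made for illustration and does not come from the project.

```python
def pairwise_bytes(n_points, bytes_per_distance=8):
    """Bytes needed to hold one distance per unordered pair of points."""
    return n_points * (n_points - 1) // 2 * bytes_per_distance

COLUMNS = 360          # from the article: 360 azimuthal columns
ROWS_PER_METRE = 400   # ASSUMPTION: 2.5 mm vertical sampling, not from the source

for interval_m in (1, 10, 200):
    n_points = COLUMNS * ROWS_PER_METRE * interval_m
    gigabytes = pairwise_bytes(n_points) / 1e9
    print(f"{interval_m:>4} m  {n_points:>12,} points  {gigabytes:,.1f} GB")
```

Under that assumption a single metre of log is 144,000 points and roughly 83 GB of pairwise distances, already past a 50 GB machine. The exact figures scale with the square of whatever the true sampling is.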

For completeness, the team also tried CNN-based unsupervised segmentation — a differentiable feature-clustering network with a spatial-continuity loss, run for 1,000 iterations with the label-continuity weight mu set to 1. It was better-behaved than DBSCAN, but it bottomed out at a ceiling of two to three segmented classes — vugs, sinusoids, and background lithology — which is enough to highlight where features are but nowhere near enough to separate and parameterise individual fractures with their own dip and azimuth. The unanimous Phase-1 conclusion was that complex geological features were simply not separable by unsupervised methods, and the program should move to supervised learning. That conclusion is what makes the pivot defensible rather than fashionable.

Why these are not tuning failures

It is tempting to read all of this as "they didn't tune hard enough." More than 10,000 DBSCAN runs is a direct rebuttal — the instability is structural, not a search-budget artifact. The deeper point is that all three unsupervised methods fail for the same reason, and it is a reason about the problem, not the implementation.

Unsupervised clustering optimises for pixel-space similarity: it groups pixels that look alike. But a fracture is not defined by pixel similarity — it is defined by geological semantics and geometry. A sinusoid is "one fracture" because it is one planar surface intersecting the borehole, even though its pixels vary in brightness, even though it is crossed by other fractures, even though it momentarily disappears behind a tool artifact. Two crossing conjugate fractures are two objects to a geologist and one tangled blob to a density cluster. Clustering has no concept of "object," no concept of "count," and no way to regress the continuous physical parameters — depth, dip, azimuth — that a petrophysicist actually reports. It can find texture. It cannot find geology.

That gap is not closed by better hyperparameters. It is closed by changing the learning paradigm: give the model examples of what a geologist means by "a fracture" and let it learn the mapping from pixels to parameterised objects. That is supervised object detection — and it is exactly the bridge the project crossed at the Phase 1 to Phase 2 boundary.

The pivot: supervised set prediction with a Detection Transformer

Phase 2 reframed sinusoid picking as supervised set prediction and built the fracture model — internally, GeoBFDT — on the Detection Transformer (DETR) architecture: a ResNet backbone feeding a transformer encoder–decoder that emits a fixed-size set of object queries, each resolving to one candidate sinusoid with its own class, depth, dip, and azimuth. Crucially, DETR's bipartite (Hungarian) matching loss does the one thing every unsupervised method could not — it lets the model count. The number of real fractures per patch is learned, not assumed; surplus queries are trained to predict "no object," so a patch with three sinusoids and a patch with zero pass through identical machinery and come out correctly populated. The k problem that sank hierarchical clustering simply does not exist when the model learns object cardinality end-to-end.
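The matching step can be illustrated on a handful of queries. The sketch below is not GeoBFDT's loss. It uses a brute-force search over assignments in place of the Hungarian algorithm (same optimum, only viable for tiny sets), and the cost function, a normalised L1 distance over depth, dip, and azimuth, is a hypothetical stand-in for the real matching cost. All numbers are invented.

```python
from itertools import permutations

def azimuth_gap(a_deg, b_deg):
    """Smallest angular difference, so 350 and 2 degrees are 12 apart, not 348."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)

def match_cost(pred, truth):
    # Hypothetical cost: each term scaled to a comparable range.
    return (abs(pred["depth"] - truth["depth"])
            + abs(pred["dip"] - truth["dip"]) / 90.0
            + azimuth_gap(pred["azimuth"], truth["azimuth"]) / 180.0)

def match_queries(preds, truths):
    """Map each query index to a ground-truth index, or None for 'no object'."""
    best_cost, best = float("inf"), ()
    for chosen in permutations(range(len(preds)), len(truths)):
        cost = sum(match_cost(preds[q], truths[t]) for t, q in enumerate(chosen))
        if cost < best_cost:
            best_cost, best = cost, chosen
    assignment = {q: None for q in range(len(preds))}
    for t, q in enumerate(best):
        assignment[q] = t
    return assignment

# Four object queries, two labelled sinusoids (depth is relative to the patch).
preds = [
    {"depth": 0.20, "dip": 10.0, "azimuth": 350.0},
    {"depth": 0.80, "dip": 60.0, "azimuth": 120.0},
    {"depth": 0.50, "dip": 30.0, "azimuth": 200.0},
    {"depth": 0.21, "dip": 12.0, "azimuth": 5.0},
]
truths = [
    {"depth": 0.80, "dip": 58.0, "azimuth": 125.0},
    {"depth": 0.20, "dip": 11.0, "azimuth": 2.0},
]

assignment = match_queries(preds, truths)
empty_patch = match_queries(preds, [])
```

Each labelled sinusoid claims exactly one query, and the queries left unmatched become the "no object" targets during training. A patch with no fractures goes through the same function and leaves every query unmatched, which is the sense in which cardinality is learned and not supplied.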

The difference is not subtle, and the ablation makes it concrete. Sweeping the number of labelled training wells, classification error collapses as geology accumulates: with only 3 wells the model is essentially guessing, at 93.1% error; by 9 wells it is 1.06%; and the full 14-well fractures-only model lands at 2.54%, slightly above the 9-well figure but still in the low single digits. There is no unsupervised analogue to that curve: clustering does not get better as you feed it more wells, because it never learns what a fracture is. Supervised DETR does, and the only ingredient that buys the improvement is labelled data. That is precisely the cost an unsupervised Phase 1 was right to try to avoid, and right to accept once the geology proved the label-free route could not work.

This is the engineering lesson worth carrying out of the engagement: the cost of an unsupervised attempt is not wasted when it is run as a genuine experiment with a falsifiable exit. Phase 1 paid for itself by proving — across more than 10,000 DBSCAN configurations, a memory-bound hierarchical implementation, and a class-capped CNN segmenter — that the label-free hypothesis could not be made to work, which is exactly the evidence you need to justify the annotation budget that supervised detection demands. The pivot was not a retreat. It was the result of the experiment.

Key takeaways

Sinusoid picking looks unsupervised — fractures are self-similar pixel ribbons — so clustering is the correct first hypothesis to test. In this engagement it was tested seriously and falsified.
DBSCAN was unworkable: with each image-log pixel embedded as (X∈[0,359], Y∈[0,image size], Z∈[0,255]), more than 10,000 eps/min_sample combinations still produced 'very unstable' segmentations because carbonate image-log texture drifts down the well, so no global hyperparameter generalises.
Hierarchical clustering failed on two fundamentals: it needs the cluster count k chosen in advance (but the per-patch sinusoid count is the unknown — wrong k breaks one sinusoid into multiple regions), and it is quadratic in pixels, running out of memory even on a 50 GB Google Colab machine. CNN-based unsupervised segmentation capped at 2–3 classes — enough to highlight features, not separate and parameterise them.
All three fail for one reason: clustering optimises pixel-space similarity, but a fracture is defined by geological semantics and geometry. Clustering has no concept of object, count, or continuous dip/azimuth parameters. That is a paradigm gap, not a tuning gap — which is why 10,000+ DBSCAN runs could not close it.
The pivot to a supervised Detection Transformer (GeoBFDT) closed the gap by learning object cardinality and physical parameters end-to-end via Hungarian set-prediction. Classification error fell from 93.1% with 3 training wells to 1.06% with 9, and the full 14-well fractures-only model lands at 2.54%; clustering has no analogue to that curve. Running the unsupervised phase as a falsifiable experiment is what justified the labelling budget supervised detection required.
