Eleven Classes in the Field, Four in the Model: Designing a Label Schema From an Operator's Taxonomy

The interpretation report we were trying to replace ran to forty-two pages for a single vertical well. Somewhere past the speed-correction QC and the histogram-equalisation notes sits a legend: eleven dip categories, each with its own colour-coded tadpole. Bedding, Deformed Bedding, Erosional Surface, then the fracture families the operator's geologists were mandated to distinguish by hand, CCF, DCF, CRF, DRF, and the fault and stress classes, RF, CF, PF, Breakout, Induced. Every pick carried one of those eleven labels. That legend is the schema the client paid for, and the obvious first instinct on a machine-learning team is to inherit it: match the taxonomy the domain expert uses, and the model speaks the operator's language.

We did not inherit it. The model we shipped trained on four classes, and the reason is arithmetic, not opinion.

Reading the taxonomy off a single well

Take the census the vendor report gives for one well. Across three carbonate formations the interpreter placed 910 bedding dips, split 164, 248 and 498. That is a healthy population; bedding is everywhere, and a bedding class trains fine. The fracture census is where the schema starts to break. In that same well the report logs one continuous conductive fracture, 335 discontinuous conductive fractures, three continuous resistive fractures, and 67 discontinuous resistive fractures, 406 fractures in total. Then nineteen faults: seven resistive, nine conductive, three possible.

Look at the two continuity extremes. One continuous conductive fracture. Three continuous resistive fractures. Those are single-digit populations in the well that carries the densest human labelling in the whole dataset, and we had only fourteen vertical wells, eleven usable for the combined bedding-plus-fracture models. A category producing one example per well produces a handful across the programme. You cannot hold out a test well for it, you cannot augment your way to a decision boundary, and a detector that has seen a class four times will memorise those four crops and predict it almost nowhere.

The rule underneath is unglamorous. Fourteen wells cannot support an eleven-way taxonomy: the number of trainable classes is set by the rarest class you insist on keeping, and if you keep continuous resistive fractures, the rarest class has three members. The nomenclature was designed for a human reading one well at a time, where telling a continuous resistive fracture from a discontinuous one is a matter of eyesight and costs nothing. Ported to a budget of fourteen wells, the same distinction is a class-imbalance trap.
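The arithmetic is easy to check against the one-well census quoted above. What follows is a minimal sketch in Python, not code from the project: the dictionary keys spell out the fracture and fault categories as the text describes them, and `rarest` is a hypothetical helper name.

```python
# One-well fracture and fault census, as quoted from the vendor report.
fracture_census = {
    "continuous conductive": 1,
    "discontinuous conductive": 335,
    "continuous resistive": 3,
    "discontinuous resistive": 67,
}
fault_census = {"resistive": 7, "conductive": 9, "possible": 3}


def rarest(census):
    """Return (name, count) of the least-populated category."""
    return min(census.items(), key=lambda item: item[1])


total_fractures = sum(fracture_census.values())
total_faults = sum(fault_census.values())
rarest_name, rarest_count = rarest(fracture_census)

print(total_fractures, total_faults)  # 406 19
print(rarest_name, rarest_count)      # continuous conductive 1
```

Running the same check on a whole-programme census, rather than one well, is the number that should drive the class count.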

The fold that made the schema trainable

So we collapsed. The eleven field categories fold into four classes the data can actually populate.

Bedding, Deformed Bedding and Erosional Surface become one bedding class, backed by those 910 dips. The four fracture categories fold along one axis rather than two: we keep the conductive-versus-resistive split, which is physically meaningful and separately populated, and drop the continuity split that produced the single-digit counts. Continuous and discontinuous conductive fractures become one conductive-fracture class with 336 examples; the resistive pair become one class with 70. Faults and the stress-indicator categories, Breakout and Induced, fold into a fourth class. Four classes, each with a population a network can learn a boundary from, instead of eleven with a long tail that exists on paper and nowhere in the gradient.
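The fracture part of that fold can be written down as a lookup table. The sketch below is illustrative only: it assumes CCF, DCF, CRF and DRF abbreviate the continuous and discontinuous conductive and resistive fracture categories, and the class names on the right-hand side are made up for the example rather than taken from the shipped label set.

```python
from collections import Counter

# One-well fracture census from the vendor report (counts quoted in the text).
fracture_census = {"CCF": 1, "DCF": 335, "CRF": 3, "DRF": 67}

# Fold along conductivity only; the continuity split is dropped from the label.
# Target class names are hypothetical.
FOLD = {
    "CCF": "conductive_fracture",
    "DCF": "conductive_fracture",
    "CRF": "resistive_fracture",
    "DRF": "resistive_fracture",
}


def fold_counts(census, fold):
    """Pool per-category counts into their trainable classes."""
    pooled = Counter()
    for category, count in census.items():
        pooled[fold[category]] += count
    return dict(pooled)


pooled = fold_counts(fracture_census, FOLD)
print(pooled)  # {'conductive_fracture': 336, 'resistive_fracture': 70}
```

Keeping the fold as data rather than as branching logic makes it cheap to recompute the pooled populations whenever the census changes.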

The instrument below is that collapse drawn as a map. Left column: the eleven mandated categories with the one-well census beside each. Right column: the four trainable classes. The strands carry each category into its destination, sized by count, and the two that sit below any sensible trainability floor, the continuous conductive fracture at one and the continuous resistive at three, are the ones the exhibit is built to call out.

The operator's mandated dip nomenclature distinguishes 11 categories; the model trained on four. The left column lists each mandated category with its one-well census count from the vendor borehole-image interpretation report, and each strand folds it into the trainable class on the right, sized by that count. Two fracture categories, continuous conductive (N=1) and continuous resistive (N=3), sit far below any trainability floor on a 14-well dataset and are flagged orange: inheriting them as standalone classes would have starved the model. Folding continuous and discontinuous fractures of the same conductivity into one class pools 336 conductive and 70 resistive examples instead. The toggle reveals what each fold recovers after the fact: conductive versus resistive is readable from image intensity, continuous versus discontinuous from trace completeness, so the distinctions the geologist drew by hand are recoverable from the pixels rather than permanently lost. Every census count is sourced from the vendor report; the trainability floor is an editorial reading of that census, not a fixed threshold.

The map makes the imbalance impossible to wave away. The bedding strand is a rope; the continuous-resistive strand is a thread. If those threads had stayed as standalone labels, the classification head would have carried an output for a class it saw three times, the loss would have ignored it, and every metric we reported would have quietly been an average over classes that were never learnable in the first place. Collapsing is not a compromise on fidelity. It is the schema that lets the fidelity numbers mean something.

What the fold actually costs

The honest question a reviewer asks next is what we threw away. The operator's geologists distinguished continuous from discontinuous fractures, and conductive from resistive, for real reasons: those axes carry information about connectivity and fill. If the model only ever emits four classes, is that information gone? Mostly it is not, and that is what made the collapse defensible rather than lossy. Both axes the fold appears to erase are recoverable from the image after detection, without a per-class label.

Conductive versus resistive is a property of intensity. On a resistivity image, conductive features read dark and resistive features read bright; that is the physics of the measurement, not a labelling convention. Once the model has localised a fracture as a sinusoid, the mean intensity along that trace tells you which side of the split it sits on. We kept that axis as a class because it was cheaply separable and separately populated, but even had we not, it was recoverable from the pixels the sinusoid runs through.
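A post-hoc intensity read of this kind could look roughly like the sketch below. It is not the production code: the sinusoid parametrisation (`centre`, `amplitude`, `phase`, in pixel rows and radians), the use of the image median as the reference level, and the toy image are all assumptions made for illustration.

```python
import math
from statistics import mean, median


def trace_rows(n_cols, centre, amplitude, phase):
    """Row index of the sinusoid at each azimuth column (assumed parametrisation)."""
    return [
        round(centre + amplitude * math.cos(2 * math.pi * c / n_cols - phase))
        for c in range(n_cols)
    ]


def conductivity_read(image, centre, amplitude, phase):
    """Label a fitted trace 'conductive' (dark) or 'resistive' (bright).

    Compares the mean intensity along the trace with the image median;
    the median as reference level is an assumption of this sketch.
    """
    n_rows, n_cols = len(image), len(image[0])
    rows = trace_rows(n_cols, centre, amplitude, phase)
    samples = [image[r][c] for c, r in enumerate(rows) if 0 <= r < n_rows]
    if not samples:
        raise ValueError("trace lies entirely outside the image")
    background = median(v for row in image for v in row)
    return "conductive" if mean(samples) < background else "resistive"


def toy_image(value, n_rows=20, n_cols=16, centre=10, amplitude=4, phase=0.0):
    """Flat 0.5 background with one sinusoid drawn at the given intensity."""
    image = [[0.5] * n_cols for _ in range(n_rows)]
    for c, r in enumerate(trace_rows(n_cols, centre, amplitude, phase)):
        image[r][c] = value
    return image


print(conductivity_read(toy_image(0.1), 10, 4, 0.0))  # conductive
print(conductivity_read(toy_image(0.9), 10, 4, 0.0))  # resistive
```

On real logs the reference level would more plausibly be a local background around the trace than a whole-image median, since formation resistivity drifts with depth.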

Continuity is a property of trace completeness. A continuous fracture draws a full sinusoid; a discontinuous one draws a broken arc with gaps. The fraction of the azimuthal sweep that carries signal measures that directly off the fitted trace. We folded continuity out of the label schema precisely because it did not need to be a label; it is readable from the geometry the detector already produces.
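As a rough illustration, the completeness read can be reduced to a fraction of azimuth bins. The sketch below assumes the intensities along the fitted trace have already been sampled; the `min_contrast` and `full_fraction` thresholds are hypothetical values chosen for the example, not tuned settings.

```python
def trace_completeness(samples, background, min_contrast):
    """Fraction of azimuth bins where the trace stands out from the background."""
    if not samples:
        raise ValueError("no samples along the trace")
    hits = sum(1 for v in samples if abs(v - background) >= min_contrast)
    return hits / len(samples)


def continuity_read(samples, background, min_contrast=0.2, full_fraction=0.9):
    """Label a trace 'continuous' or 'discontinuous' (thresholds are assumptions)."""
    fraction = trace_completeness(samples, background, min_contrast)
    return "continuous" if fraction >= full_fraction else "discontinuous"


full_trace = [0.1] * 16                 # signal at every azimuth bin
broken_trace = [0.1] * 8 + [0.5] * 8    # half the sweep fades into background

print(trace_completeness(full_trace, 0.5, 0.2))    # 1.0
print(trace_completeness(broken_trace, 0.5, 0.2))  # 0.5
print(continuity_read(full_trace, 0.5))            # continuous
print(continuity_read(broken_trace, 0.5))          # discontinuous
```

Reporting the fraction itself, and not only the thresholded label, keeps the read useful where a geologist would place the cut-off differently.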

Breakout and induced fractures are the one genuinely different case, and they fold cleanly for another reason: they are not sinusoids. Both run parallel to the borehole axis rather than crossing it, so they do not fit the sinusoid form the picker regresses. Keeping them out of the fracture classes is correct on its own terms, not a data-budget concession.

So the fold from eleven to four is reversible where it matters. The two fracture axes the geologist drew by hand survive as post-hoc reads on intensity and trace completeness; the stress indicators leave the sinusoid vocabulary because they were never sinusoids. What remains is a class schema sized to the data, with the fine distinctions available on demand rather than baked into labels too sparse to train.

Why inheriting the nomenclature would have sunk the model

The failure mode we avoided is a common one. Had we shipped the eleven-class taxonomy verbatim, the model would have looked reasonable in code and failed in the way that is hardest to catch. Training would run and the loss would fall. The confusion matrix would show near-perfect performance on bedding and the two populous fracture classes, and empty or noisy rows for the rare ones. Averaged over examples, the headline accuracy would still read well, because the rare classes barely move a mean. A reviewer skimming the top-line number would sign it off, and the first time the model met a continuous resistive fracture that mattered, it would have nothing to say.
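To see how little the rare classes move a headline number, take a deliberately extreme hypothetical built on the one-well census: every example of a populous class is classified correctly and every example of a rare class is missed. The recall values below are invented for the illustration and are not measured results.

```python
# Supports come from the one-well census; bedding is the pooled 910 dips.
support = {
    "bedding (pooled)": 910,
    "discontinuous conductive": 335,
    "discontinuous resistive": 67,
    "continuous conductive": 1,
    "continuous resistive": 3,
}
# Hypothetical per-class recall: perfect on populous classes, zero on rare ones.
recall = {
    "bedding (pooled)": 1.0,
    "discontinuous conductive": 1.0,
    "discontinuous resistive": 1.0,
    "continuous conductive": 0.0,
    "continuous resistive": 0.0,
}


def overall_accuracy(support, recall):
    """Example-weighted accuracy: correct predictions over all examples."""
    correct = sum(support[name] * recall[name] for name in support)
    return correct / sum(support.values())


def macro_recall(recall):
    """Unweighted mean of per-class recall."""
    return sum(recall.values()) / len(recall)


print(round(overall_accuracy(support, recall), 4))  # 0.997
print(macro_recall(recall))                         # 0.6
```

The example-weighted figure stays above 99 percent while two of five classes are never predicted, which is why per-class rows deserve a look before any top-line number is signed off.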

The schema decides which questions the model is even allowed to get wrong. Choosing four classes over eleven was not a loss of ambition; it was refusing to claim a resolution the fourteen-well dataset could not support. The operator's nomenclature was the right tool for a human interpreting one well for a fee, and the wrong target vocabulary for a model that has to generalise across a field from a few hundred fractures. The census made that unarguable before we wrote a line of training code.

Limitations

The census counts anchoring this piece come from one vertical well's interpretation report, the most densely labelled well in the programme; other wells carry different and generally sparser fracture populations, so the single-well numbers illustrate the imbalance rather than define the whole-dataset distribution. The trainability floor drawn in the instrument is an editorial reading of that census, not a fixed statistical threshold; the true minimum viable class size depends on augmentation, the choice of ResNet backbone, and how much cross-well transfer you can rely on. The claim that conductive-versus-resistive and continuity are recoverable post-hoc holds where the image quality supports an intensity read and the trace is cleanly fitted; on corrupted or low-dynamic-range logs, where pixel values arrive quantised, neither read is reliable and the fold does lose information until the log is repaired. Finally, the four-class schema is specific to this dataset size and this taxonomy; a larger multi-well programme could justify a finer vocabulary, and the right number of classes should always be recomputed against the census, not inherited from ours any more than from the operator's.

