
Case Study

Binary to Multiclass: Rethinking the Segmentation Target

We spent weeks pushing a curve segmenter that would not push. The target was a stack of independent binary masks, one per curve, each scored on its own, and the best of them stalled at an F1 of 0.55 while the other two sat at 0.37 and 0.26. The wall was not the optimiser or the data volume. It was the shape of the label. This is our account of the reformulation that broke the plateau: replacing three separate binary masks with a single three-class target so that every pixel belonged to exactly one class, the curves competed for it, and the boundary between two traces became something the loss could hold instead of ignore.


The segmenter had stopped improving and we had run out of things to blame. VeerNet was reading scanned well logs, tracing the curves off the paper into vectors, and on the curve-heavy sheets it had settled onto a number and refused to move. The best of the curve masks scored an F1 of 0.55. The other two sat at 0.37 and 0.26. We had cycled the optimiser, reweighted the loss, and grown the training set, and the plateau held through all of it. The mistake we eventually found was not in any of those knobs. It was in the shape of the thing we were asking the network to predict.

We wrote up the diagnosis of that plateau separately, in the companion account of how the recall-precision signature revealed the binary framing to be the wrong problem statement, and we will not re-run the metric autopsy here. This piece takes the diagnosis as settled and asks the next question: given that the target shape was the defect, what exactly does a stack of binary masks get wrong, and what does the fix actually cost? The answer is a story about loss geometry, about who owns the boundary between two traces, and about the training budget we spent to buy a target the objective could hold.

The defect was in the geometry of the loss

The first version framed segmentation as a set of independent binary decisions. For a sheet with two curves we produced three masks, one per class of stroke, and each mask answered a single yes-or-no question at every pixel: is this pixel part of my curve or not? Three masks, three separate sigmoid outputs, three separate binary losses summed together. On paper this is a reasonable place to start, and it matches how many detection pipelines decompose a scene into per-object masks [1]. Reweighting the thin foreground bought recall, but the masks stayed capped at 0.55, 0.37, and 0.26; the companion piece walks through why that trade is forced. What matters here is the geometry underneath the trade.
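To make the binary objective concrete, here is a minimal NumPy sketch of "three sigmoid outputs, three binary losses summed together". It assumes logits and targets laid out as (K, H, W), one channel per mask, and uses plain mean binary cross-entropy per mask; the function name `binary_stack_loss` is hypothetical and this is not the VeerNet training code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_stack_loss(logits, masks, eps=1e-7):
    """Sum of independent per-mask binary cross-entropies (illustrative sketch).

    logits: float array (K, H, W), one sigmoid channel per mask.
    masks:  {0, 1} array (K, H, W), one binary target per mask.
    Returns the summed loss and the per-mask terms.
    """
    p = np.clip(sigmoid(logits), eps, 1.0 - eps)
    bce = -(masks * np.log(p) + (1 - masks) * np.log(1 - p))
    per_mask = bce.mean(axis=(1, 2))  # each mask's term is computed in isolation
    return per_mask.sum(), per_mask

rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 8, 8))                   # three sigmoid channels
masks = (rng.random((3, 8, 8)) > 0.9).astype(float)  # sparse, thin foreground
total, per_mask = binary_stack_loss(logits, masks)
# Dropping masks 1 and 2 leaves mask 0's term exactly as it was.
alone, _ = binary_stack_loss(logits[:1], masks[:1])
```

The last line is the point: each mask's term can be evaluated with the other masks deleted, and nothing changes.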

Three independent binary masks share no rule about who owns a pixel. Each mask lives in its own loss with its own background, and the objective is a sum of three terms that never reference one another. Nothing in that sum says a pixel belongs to at most one curve. Two masks can both light up on the same stroke, and because each term is computed as if the other mask did not exist, neither is charged for the overlap. The loss surface has no ridge separating curve one from curve two, because the two never appear in the same expression. They are parallel problems that happen to share a backbone, not a single problem with a shared constraint.
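The absence of a shared expression shows up directly in the gradient. In this sketch (same assumed (K, H, W) layout, hypothetical names), the derivative of each mask's mean BCE with respect to its logits is `sigmoid(z) - y` over the pixel count, so a second mask firing confidently on the first curve's pixels changes nothing in the first mask's gradient.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_stack_grad(logits, masks):
    """Gradient of the summed per-mask mean BCE with respect to the logits.

    For BCE on a sigmoid output, d(loss)/d(logit) = sigmoid(z) - y; it is divided
    by the pixel count because each mask's loss is a mean over its own pixels.
    The gradient for mask k reads only logits[k] and masks[k].
    """
    n_pixels = logits.shape[1] * logits.shape[2]
    return (sigmoid(logits) - masks) / n_pixels

rng = np.random.default_rng(1)
logits = rng.normal(size=(3, 4, 4))
masks = np.zeros((3, 4, 4))
masks[0, 2, :] = 1.0  # the first curve runs along row 2

g_before = binary_stack_grad(logits, masks)

# Make the second mask fire confidently on the first curve's row.
logits_after = logits.copy()
logits_after[1, 2, :] += 10.0
g_after = binary_stack_grad(logits_after, masks)

# The second mask's own term reacts to its false positives,
# but the first mask's gradient is untouched by the double claim.
print(np.allclose(g_before[0], g_after[0]))  # True
```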

That absent constraint is not a tuning gap; it is a missing term. You cannot reweight your way to a rule the loss does not contain. The places where two curves run close together, cross, or nearly touch, which are exactly the places that matter for reading a log correctly, are precisely where the summed binary objective is silent. Where the traces separate cleanly, the three masks agree by default and the framing looks fine. Where they collide, the framing has nothing to say, and the collision is the entire job.
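Because the summed objective never sees a double claim, the only way to find one is to measure it after the fact. A cheap diagnostic, sketched here under assumed inputs (per-mask sigmoid probabilities shaped (K, H, W) and a 0.5 threshold, both our choices), counts pixels that two or more masks claim at once. The helper `contested_pixels` is hypothetical.

```python
import numpy as np

def contested_pixels(probs, threshold=0.5):
    """Count pixels that two or more binary masks claim at the same time.

    probs: array (K, H, W) holding each mask's sigmoid probabilities.
    Returns the number of contested pixels and a boolean map of where they sit.
    """
    claims = (probs >= threshold).sum(axis=0)  # masks firing at each pixel
    contested = claims >= 2
    return int(contested.sum()), contested

# Two traces on a 5 x 5 patch: one horizontal, one vertical, crossing at (2, 2).
probs = np.zeros((2, 5, 5))
probs[0, 2, :] = 0.9
probs[1, :, 2] = 0.8
count, where = contested_pixels(probs)  # count == 1, where[2, 2] is True
```

Nothing in the binary loss reacts to this count; it is a readout of the seam the objective cannot see.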

One question with three answers

The reformulation was to stop predicting a stack of independent masks and predict a single target instead. Rather than three binary maps, the network now produces one map with three mutually exclusive classes: background, curve one, curve two. Every pixel is assigned to exactly one of them through a softmax, and the loss scores the whole assignment at once rather than three yes-or-no calls in isolation. This is the ordinary multiclass framing of dense prediction, the same target shape that fully convolutional segmentation and the encoder-decoder architectures built on it assume [1] [2], and moving to it changed the objective in a way that no amount of reweighting on the binary side could.
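A minimal NumPy sketch of the single-target loss, assuming (C, H, W) logits with C = 3 (background, curve-1, curve-2) and an (H, W) integer label map. The names are hypothetical, and the sketch stands in for whatever framework loss a real pipeline would use. It also returns the gradient, which for softmax cross-entropy is `p - onehot`, so every class's logit receives a signal from every pixel.

```python
import numpy as np

def softmax(logits, axis=0):
    z = logits - logits.max(axis=axis, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multiclass_ce(logits, labels):
    """Mean per-pixel softmax cross-entropy over mutually exclusive classes.

    logits: (C, H, W) scores; here C = 3 for background, curve-1, curve-2.
    labels: (H, W) integer map holding exactly one class index per pixel.
    Returns the loss and its gradient with respect to the logits.
    """
    C, H, W = logits.shape
    p = softmax(logits, axis=0)
    onehot = np.eye(C)[labels].transpose(2, 0, 1)  # (C, H, W)
    loss = -np.log(np.clip((p * onehot).sum(axis=0), 1e-12, None)).mean()
    # Softmax-CE gradient is p - onehot: every class's logit gets a signal
    # from every pixel, not just the class the pixel belongs to.
    grad = (p - onehot) / (H * W)
    return loss, grad

# One pixel that belongs to curve-1, with no preference yet among the classes.
logits = np.zeros((3, 1, 1))
labels = np.array([[1]])
loss, grad = multiclass_ce(logits, labels)
# grad[:, 0, 0] is positive for background and curve-2 (pushed down)
# and negative for curve-1 (pushed up): one decision, three consequences.
```

In a framework this is the stock multiclass loss; PyTorch's `torch.nn.functional.cross_entropy`, for instance, accepts (N, C, H, W) logits with an (N, H, W) integer target.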

The change is that the classes now compete. Because a pixel can go to only one class, giving it to curve one is the same act as taking it away from curve two and from background, and the loss feels all three consequences of a single decision. The boundary between two adjacent traces stops being invisible and becomes a thing the objective has a stake in, because putting the boundary in the wrong place is now a cost paid on both sides at once. That is the property the independent masks never had and could not be tuned into having. The imbalance did not vanish, and we still leaned on region-overlap losses of the Dice and Tversky family, and on focal weighting, to keep the thin foreground classes from being swamped [3] [4]. But those tools were now working with the grain of the target instead of against it: they were shaping how a single coherent assignment traded errors, not trying to reconcile three separate assignments after the fact.
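The text names the loss families but not the exact formulation, weights, or how they were combined, so the following is only one plausible sketch: a soft Tversky term over the two curve classes plus a multiclass focal cross-entropy, both computed on the same softmax. The unweighted sum, the choice to leave background out of the region term, and every default value are our assumptions; `tversky_focal_loss` is a hypothetical name.

```python
import numpy as np

def softmax(logits, axis=0):
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def tversky_focal_loss(logits, labels, alpha=0.5, beta=0.5, gamma=2.0,
                       fg_classes=(1, 2), eps=1e-7):
    """Soft Tversky loss on the curve classes plus a focal cross-entropy term.

    alpha weights false positives and beta weights false negatives;
    alpha = beta = 0.5 reduces the Tversky index to the Dice coefficient.
    gamma = 0 reduces the focal term to plain cross-entropy.
    Every default here is an illustrative assumption, not a tuned value.
    """
    C = logits.shape[0]
    p = softmax(logits, axis=0)
    onehot = np.eye(C)[labels].transpose(2, 0, 1)

    # Region term: computed per foreground class on the shared softmax, so the
    # curves are scored on one coherent assignment rather than separate ones.
    region_terms = []
    for c in fg_classes:
        tp = (p[c] * onehot[c]).sum()
        fp = (p[c] * (1 - onehot[c])).sum()
        fn = ((1 - p[c]) * onehot[c]).sum()
        region_terms.append(1 - (tp + eps) / (tp + alpha * fp + beta * fn + eps))
    region = float(np.mean(region_terms))

    # Focal term: down-weights pixels the model already gets right,
    # which in a log scan is mostly the background.
    p_true = np.clip((p * onehot).sum(axis=0), eps, 1.0)
    focal = float(np.mean((1 - p_true) ** gamma * -np.log(p_true)))
    return region + focal, region, focal

rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=(6, 6))
logits = rng.normal(size=(3, 6, 6))
total, region, focal = tversky_focal_loss(logits, labels, gamma=0.0)
```

Raising beta above alpha penalises missed curve pixels more than spurious ones, which is the usual lever when the foreground is thin.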

The exhibit below is the whole argument in one frame. On the left, the three binary masks sit at or below the plateau line at 0.55, and no mask clears it. On the right is the reframed target: one column split into three exclusive classes, one softmax answering one question. Under the old framing, the ceiling was real.

[Figure: Segmentation target design, why the F1 stopped moving. Left panel, binary regime (three masks, each scored alone): F1 of 0.55, 0.37, and 0.26 against a plateau line at 0.55. Right panel, multiclass regime (one target, three exclusive classes): background, curve-1, curve-2. Below: training cost of the reframed target, 15,000 multiclass instances and 10 hr at 50 epochs.]
The segmentation target, redrawn. LEFT: the binary regime scored three curve masks independently, and the best of them reached F1 0.55 while the other two sat at 0.37 and 0.26. The orange line is the plateau at that best mask, and it is the only element that argues: no mask clears it, because masks scored apart share no rule for which curve owns a given pixel. RIGHT: the reframed target is a single softmax with three mutually exclusive classes (background, curve-1, curve-2), drawn as one column partitioned three ways. Because every pixel now belongs to exactly one class, the two curves compete for each pixel and the boundary between them becomes something the loss can hold, which is what let the scores separate and stay separated rather than stall. Beneath sits the training cost of that regime: 15,000 multiclass instances trained for 10 hours across 50 epochs. Every plotted number traces to the engagement archive; there are no illustrative inputs.

What it cost and what it bought

Nothing about this was free, and the ledger is worth reading in full. The binary regime trained on two thousand instances and finished fifty epochs in about two hours. The single three-class target ran on fifteen thousand instances and took about ten hours for the same fifty epochs. That is seven and a half times the data and five times the wall-clock, and the two figures move together because the reformulation is what makes the larger corpus worth having: separating one curve from another, rather than each curve from background alone, needs far more crossings and near-misses in the training set for the objective to learn the seam. We paid that willingly, because the binary plateau was not a budget problem, and no amount of spending could have moved it. More epochs and more data on the old target shape would have bought a slightly better version of the same stalled model. The reformulation bought a different model, and the cost line is the price of a target the loss could actually hold.
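The ledger arithmetic, written out so the ratios can be rechecked or rerun against other figures. The inputs are the reported ones; the per-epoch minutes are derived averages that assume wall-clock was spread evenly across epochs, which the text does not state.

```python
# Training ledger as reported in the text; the hour figures were approximate.
runs = {
    "binary":     {"instances": 2_000,  "hours": 2.0,  "epochs": 50},
    "multiclass": {"instances": 15_000, "hours": 10.0, "epochs": 50},
}

def ratio(field):
    """Multiclass figure divided by binary figure for one ledger field."""
    return runs["multiclass"][field] / runs["binary"][field]

def minutes_per_epoch(name):
    """Average wall-clock minutes per epoch, assuming an even spread."""
    run = runs[name]
    return run["hours"] * 60 / run["epochs"]

data_ratio = ratio("instances")                      # 7.5x the data
time_ratio = ratio("hours")                          # 5.0x the wall-clock
binary_epoch = minutes_per_epoch("binary")           # about 2.4 minutes
multiclass_epoch = minutes_per_epoch("multiclass")   # about 12 minutes
```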

What we got for the cost was separation that stayed separated. Under the old framing, improvement on one mask tended to come at the expense of another, because the masks were coupled only through the shared backbone and not through the loss, so there was no consistent pressure holding the curves apart where they met. Under the single target, the per-class scores stopped behaving like three unrelated numbers stranded at a wall and started behaving like one system that could be pushed, because the objective finally had a reason to keep curve one and curve two distinct rather than letting both drift toward whatever the shared features found easiest.
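Per-class scores on an exclusive label map have a property the stacked masks lacked: a pixel given to the wrong curve is a false positive for one class and a false negative for the other, so the error is paid on both sides at once. The text does not say how its F1 was computed (pixel-level or with a distance tolerance); this sketch assumes plain pixel-level F1, and `per_class_f1` is a hypothetical helper.

```python
import numpy as np

def per_class_f1(pred, truth, num_classes=3):
    """Pixel-level F1 for each class of an exclusive label map.

    pred, truth: integer arrays (H, W), one class index per pixel.
    Returns a list of F1 scores, index 0 = background.
    A class absent from both maps scores nan rather than a misleading 0 or 1.
    """
    scores = []
    for c in range(num_classes):
        tp = np.sum((pred == c) & (truth == c))
        fp = np.sum((pred == c) & (truth != c))
        fn = np.sum((pred != c) & (truth == c))
        denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
        scores.append(float(2 * tp / denom) if denom else float("nan"))
    return scores

# One curve-2 pixel predicted as curve-1: curve-1 gains a false positive
# and curve-2 gains a false negative from the same mistake.
truth = np.array([[0, 1, 2], [0, 1, 2]])
pred = np.array([[0, 1, 1], [0, 1, 2]])
scores = per_class_f1(pred, truth)  # background 1.0, curve-1 0.8, curve-2 about 0.667
```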

Why the target shape is the real design decision

The lesson is not specific to well logs. When a segmentation model plateaus and the usual levers do nothing, the first suspect should be the shape of the label, not the capacity of the network. A target made of independent binary masks quietly encodes an assumption that the classes do not interact, and for anything where the interesting behaviour lives at the boundaries between classes, that assumption is false in the one place it matters most. Curves in a well log interact constantly. They cross, they crowd, they run parallel a hair apart, and the entire value of the digitiser is getting those exact situations right.
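Changing the shape of the label usually starts with the annotations already on hand. A minimal sketch, assuming one binary mask per curve stacked as (K, H, W): collapse the stack into one exclusive label map with background as class 0. Crossings, where two masks claim the same pixel, have no single owner; the text does not say how they were labelled, so this sketch marks them with an ignore value (255, an arbitrary choice) rather than inventing a winner. `masks_to_label_map` is a hypothetical name.

```python
import numpy as np

def masks_to_label_map(masks, ignore_index=255):
    """Collapse a (K, H, W) stack of binary curve masks into one exclusive label map.

    Class 0 is background; mask k becomes class k + 1. Pixels claimed by more
    than one mask are set to ignore_index instead of being given to a winner.
    Returns the label map and the number of contested pixels.
    """
    masks = masks.astype(bool)
    claims = masks.sum(axis=0)
    labels = np.zeros(masks.shape[1:], dtype=np.int64)
    single = claims == 1
    labels[single] = np.argmax(masks, axis=0)[single] + 1  # index of the one claiming mask
    labels[claims > 1] = ignore_index
    return labels, int((claims > 1).sum())

# Curve-1 runs along row 1, curve-2 down column 1; they cross at (1, 1).
masks = np.zeros((2, 3, 3), dtype=bool)
masks[0, 1, :] = True
masks[1, :, 1] = True
labels, n_contested = masks_to_label_map(masks)
```

The contested count is worth logging on its own: it is a direct measure of how much of the data lives at the boundaries the binary framing ignored.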

Framing the target as a single multiclass assignment did not add a clever module or a new loss nobody had seen. It changed what question the model was being asked, from three separate calls that could not see each other into one call that had to be internally consistent. The plateau at 0.55 was the sound of three answers that never had to agree. Once the target forced them to be one answer, the wall stopped being a wall.

Limitations

This account describes a target reformulation on one operator's raster-log archive, and the specific numbers, the 0.55 plateau on the best binary mask and the 0.37 and 0.26 on the others, are from that engagement and should not be read as benchmarks for segmentation in general. The multiclass framing here used three classes because the sheets in scope carried at most two curves plus background; a sheet with more overlapping traces would need more classes and would not necessarily behave the same way, since mutual exclusivity gets harder to satisfy as classes crowd. The training figures, two thousand instances at roughly two hours for the binary regime and fifteen thousand instances at roughly ten hours for the multiclass regime, both across fifty epochs, reflect our data and hardware at the time and are not a claim about what the reformulation costs anywhere else. Finally, the point of this piece is the target shape, not the surrounding pipeline. Reassembly of multi-page scans, curve tracing from the mask, and depth indexing are separate stages with their own failure modes, and moving from binary to multiclass fixed the plateau in the segmenter without touching any of them.

References

  1. Long, J., Shelhamer, E., and Darrell, T. (2015). Fully Convolutional Networks for Semantic Segmentation. CVPR 2015. https://arxiv.org/abs/1411.4038

  2. Ronneberger, O., Fischer, P., and Brox, T. (2015). U-Net: Convolutional Networks for Biomedical Image Segmentation. MICCAI 2015. https://arxiv.org/abs/1505.04597

  3. Milletari, F., Navab, N., and Ahmadi, S.-A. (2016). V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation. 3DV 2016. https://arxiv.org/abs/1606.04797

  4. Lin, T.-Y., Goyal, P., Girshick, R., He, K., and Dollár, P. (2017). Focal Loss for Dense Object Detection. ICCV 2017. https://arxiv.org/abs/1708.02002


© 2026 Copyright. Earthscan