Mask to Centreline: The Geometry of Reading a Curve's Position

A segmentation network does not hand you a curve. It hands you a probability band a few pixels wide at every depth row, and the deliverable a petrophysicist actually reads is one horizontal pixel coordinate per row. This piece is about the reduction in between: how a thick predicted mask collapses to a single position per depth, why argmax-of-column with a local centroid refinement is the right operator for a one-to-three-pixel ink trace, and why per-column position resolution, not how much of the mask overlaps the truth, is the thing that decides whether the recovered curve is usable. The sharpest evidence is a number that looks like failure and is not: a peak intersection-over-union of 0.51 on the multiclass set sitting next to a position error of 0.0277 on curve 1.

There is a quiet assumption buried in most segmentation write-ups for line and curve extraction, and it is worth dragging into the light before anything else. The assumption is that a good mask is the goal. For a well log it is not. The petrophysicist downstream does not consume a mask; they consume a curve, which is to say one horizontal pixel coordinate for every depth row on the scan. A mask is a band, two or three pixels thick on a clean trace and fatter than that wherever the network hedged. Somewhere between the band the network predicts and the coordinate the geologist reads, a reduction has to happen. That reduction is the actual product of the pipeline, and it is where the interesting geometry lives.

This piece is about that step, the collapse from a thick predicted mask to a single position per depth, in the context of VeerNet, the encoder-decoder EarthScan built to digitise raster well logs from scanned paper. The reason it deserves its own treatment is that the step is usually skipped. The literature on dense prediction is enormous and excellent, and we will lean on it, but it overwhelmingly stops at the mask and reports overlap. The half-step from mask to centreline is treated as plumbing. On a one-to-three-pixel ink trace it is not plumbing. It is the part of the system that determines whether the overlap number you are proud of, or ashamed of, translates into a curve anyone can use.

What a segmentation head actually emits

A convolutional segmentation network, whether the classic fully convolutional design [2] or the encoder-decoder with skip connections that became the default for thin structures [1], does not emit a line. It emits a per-pixel probability map, one channel per class. For our multiclass setting that means three channels, background and two curves, and at every pixel a softmax over them. Read one curve's channel across the columns of a single depth row and you get a little profile: probability rising as you cross into the predicted trace, peaking somewhere near where the ink is, falling away on the other side. The width of that profile is the width of the band. Its shape is the network's confidence about exactly where the curve sits in that row.
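As a rough illustration of what that output looks like in practice, here is a minimal NumPy sketch. The array shape, the channel order (0 for background, 1 and 2 for the curves) and the logit values are all assumptions made up for the example; they are not taken from VeerNet.

```python
import numpy as np

def softmax(logits, axis=0):
    """Numerically stable softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

# Hypothetical head output: (classes, depth rows, columns) = (3, 4, 9).
logits = np.zeros((3, 4, 9))
logits[0] += 2.0                                   # background favoured by default
logits[1, :, 3:6] += np.array([3.0, 5.0, 3.0])     # pretend curve 1 sits near column 4

probs = softmax(logits, axis=0)    # per-pixel probabilities over the three classes
row_profile = probs[1, 2, :]       # curve-1 probability across the columns of row 2
print(np.round(row_profile, 3))    # rises into the band, peaks, falls away
```

The profile for one row is a 1-D array over columns, and that is the object every reduction discussed here operates on.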

Nothing about that map is a coordinate yet. The map is a field of beliefs over a two-dimensional grid, and the curve is a one-dimensional object embedded in it. Recovering the curve means choosing, for each depth row, the single column the curve passes through. The naive answer, take every pixel the network labels as foreground, gives you back the band, not the line. You have to pick.
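A small sketch makes the point about the naive answer concrete. The probabilities and the 0.5 threshold below are invented for illustration only.

```python
import numpy as np

# Hypothetical curve-channel probabilities: 3 depth rows by 6 columns.
curve_prob = np.array([
    [0.0, 0.2, 0.7, 0.9, 0.6, 0.1],
    [0.0, 0.1, 0.8, 0.8, 0.2, 0.0],
    [0.0, 0.0, 0.3, 0.9, 0.4, 0.0],
])

foreground = curve_prob > 0.5      # "every pixel the network labels as foreground"
widths = foreground.sum(axis=1)    # number of foreground columns in each row
print(widths.tolist())             # several columns per row: a band, not a line
```

Whenever a row has more than one foreground column, thresholding alone has not chosen a position; something still has to pick.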

The dense probability map is the input to the reduction, and the quality of the reduction is a separate question from the quality of the map. You can have a fuzzy, fat, low-overlap map and still read a clean position out of it, or a crisp map and read a wobbly one, depending entirely on how you do the picking. That separation is the whole reason the half-step earns a discussion.

Argmax is the floor, and a centroid is the polish

The simplest defensible reduction is an argmax across the columns of each row: for a depth row, take the column with the highest foreground probability and call that the curve's position. This is exactly the convention the pose-estimation community settled on for turning a predicted heatmap into a keypoint, where the location of a joint is read off as the argmax of its heatmap [3]. It is robust, it is cheap, and it has one obvious flaw on our problem. Argmax is quantised to the pixel grid. The true curve does not respect pixel boundaries; it can pass at a sub-pixel position between two columns, and the argmax can only ever return one of them. On a log that is sampled and rescaled downstream, that quantisation error, up to half a pixel in every row, carries straight through into the values the curve is read at.
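A minimal sketch of the hard argmax, assuming the curve's channel is available as a (rows, columns) array; the function name and the numbers are hypothetical.

```python
import numpy as np

def argmax_positions(curve_prob):
    """For each depth row, return the column with the highest probability."""
    return np.argmax(curve_prob, axis=1)

curve_prob = np.array([
    [0.0, 0.1, 0.7, 0.9, 0.2, 0.0],
    [0.0, 0.2, 0.8, 0.8, 0.1, 0.0],   # mass is centred between columns 2 and 3
    [0.0, 0.0, 0.3, 0.9, 0.6, 0.0],
])
positions = argmax_positions(curve_prob)
print(positions.tolist())
```

The middle row shows the quantisation: the mass is balanced around 2.5, but the argmax has to return a whole column, and NumPy breaks the tie by returning the first one.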

The fix is to refine the argmax with a local probability-weighted centroid: take the argmax column, look at a small window of columns around it, and compute the centre of mass of the probability in that window. The result lands between pixels, tracking the sub-pixel position the band's mass actually implies. This is the same instinct that motivated soft-argmax in pose regression, where the hard argmax is replaced by a probability-weighted expectation over positions so the output is differentiable and continuous rather than snapped to the grid [4]. We are not training through it the way that work does; we apply it as a deterministic post-step. The shared idea is that the band carries sub-pixel information in the distribution of its mass, and a centroid reads that information where a hard maximum throws it away.
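The refinement can be sketched in a few lines. This is an illustrative version, not the production code: the function name, the default window half-width of two columns and the fallback for an empty window are all assumptions.

```python
import numpy as np

def refine_centroid(profile, half_window=2):
    """Argmax of a 1-D row profile, refined by a local probability-weighted centroid."""
    profile = np.asarray(profile, dtype=float)
    peak = int(np.argmax(profile))
    lo = max(peak - half_window, 0)                  # clip the window at the image edge
    hi = min(peak + half_window + 1, profile.size)
    cols = np.arange(lo, hi)
    weights = profile[lo:hi]
    total = weights.sum()
    if total <= 0.0:                                 # no mass at all: keep the argmax
        return float(peak)
    return float((cols * weights).sum() / total)

profile = [0.0, 0.2, 0.8, 0.8, 0.2, 0.0]   # mass balanced between columns 2 and 3
x = refine_centroid(profile)
print(x)
```

On this profile the hard argmax returns column 2, while the centroid returns 2.5, the position the mass implies. One design caveat: a window clipped at the image edge, or one that is asymmetric about the true ridge, biases the centroid, so the half-width is a parameter worth choosing against the expected band width.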

There is older lineage here too, from outside deep learning entirely. Classical curvilinear-structure detection extracted a line's centre at sub-pixel precision from the curvature of the intensity profile across it, fitting the position from the second-derivative response rather than thresholding [5]. The deep-learning band is softer and learned rather than hand-derived, but the geometric problem it solves is identical: a thick response has to be collapsed to its ridge, and the ridge sits where the profile's structure says it does, not at an arbitrary pixel.

[Interactive figure: Mask band per row, resolved to one position. Poor overlap, clean centreline. Left panel: the predicted band plotted over depth rows (shallow to deep) against scan columns (horizontal pixel position), with the column probability profile for the selected row and the argmax marked in orange; in the example row the band is 12 columns wide and the reduction lands within 2.3 columns of the truth tick. Right panel: position error on the recovered curve (mean absolute error of the resolved position, lower is better), 0.0277 for curve 1 and 0.1241 for curve 2, under Tversky loss on the multiclass set, where peak IoU was 0.51. The position MAE values and the peak IoU are sourced; the band geometry and per-row profiles are illustrative.]
A thick predicted mask is not a curve. The segmentation head emits a probability band a few columns wide at every depth row, and the deliverable a petrophysicist needs is a single horizontal position per row. Walk the slider down the depth rows: for the selected row the band view shows the predicted band (teal), the illustrative ground-truth column (the pale tick), and the one position the reduction lands on (orange marker), found by taking the argmax of the per-column probability profile and refining it with a local probability-weighted centroid. The dashed orange line is the resolved centreline through every row, which is the thing that ships. The strip below the slider is the column probability profile for the selected row, with the argmax column in orange. The right-hand panel reads the consequence on the metric that matters: the mean absolute error of the recovered position, measured at 0.0277 for curve 1 and 0.1241 for curve 2 under the Tversky loss on the multiclass set, where the peak intersection-over-union was only 0.51. That pairing is the whole argument: an IoU of 0.51 means band and truth overlap poorly, yet the per-row position error stays small, because what fixes a curve's position is where the band's probability mass sits, not how much of the band overlaps. The two MAE values and the IoU figure are sourced from the engagement archive; the band geometry, the per-row profiles and the row count are illustrative.

Why overlap is the wrong yardstick for this band

The instrument above is built to make one comparison concrete, because it is the comparison that surprises people. Intersection-over-union, the overlap between the predicted band and the ground-truth band, was a sober 0.51 at its peak on our multiclass set. Read as a segmentation grade that is poor. Nearly half the union of the two bands is not shared. If you stopped at the mask and reported that number you would conclude the model had failed.

It had not, and the reason is geometric. IoU on a thin structure is brutally sensitive to a defect that does not move the curve's position at all. A predicted band that is one pixel too fat on each side, or shifted by a single pixel, loses a large fraction of its overlap because the foreground is so small that every pixel is a large share of the union. Walk the slider down the rows in the instrument and watch it directly: the band overlaps the ground-truth tick poorly, yet the resolved position, the argmax refined by the local centroid, lands within a couple of columns of the truth. The error that ruins IoU is band thickness and band wobble. The quantity that determines a usable curve is where the band's centre of mass sits. Those are different quantities, and on a one-pixel trace they come apart violently.
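The arithmetic is easy to check on a single row. The sketch below uses a made-up three-pixel trace and treats each band as a plain set of foreground columns with uniform weight; the column numbers and helper names are illustrative assumptions.

```python
def band_iou(pred_cols, true_cols):
    """IoU of two sets of foreground columns within one depth row."""
    pred, true = set(pred_cols), set(true_cols)
    return len(pred & true) / len(pred | true)

def centre(cols):
    """Centre of a uniformly weighted band: the mean of its columns."""
    cols = list(cols)
    return sum(cols) / len(cols)

truth = range(10, 13)      # 3-pixel trace on columns 10, 11, 12
fat = range(9, 14)         # one pixel too fat on each side
shifted = range(11, 14)    # same width, shifted right by one pixel

iou_fat = band_iou(fat, truth)            # 3 shared of 5 in the union
iou_shifted = band_iou(shifted, truth)    # 2 shared of 4 in the union
print(iou_fat, centre(fat) - centre(truth))
print(iou_shifted, centre(shifted) - centre(truth))
```

The fat band drops to an IoU of 0.6 while its centre does not move at all, and a single-pixel shift halves the overlap for a position error of exactly one column. For a band of width w shifted by one pixel the ratio is (w - 1) / (w + 1), which is why the penalty is so steep when w is small.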

That is why the metric we actually live or die by is not overlap but position error: the mean absolute error of the recovered position against the reference curve. Under the Tversky-trained model on the multiclass set, that error was 0.0277 on curve 1 and 0.1241 on curve 2, on a normalised scale where the full plotted range is one. A position error under three percent of the track width, sitting next to an IoU of 0.51, is the entire argument of this piece in two numbers. The mask looks mediocre and the centreline is clean, because the reduction reads position from mass and is largely indifferent to the band fat that the overlap metric punishes.
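For concreteness, a sketch of a position error on a normalised scale. It assumes the normalisation is a division by the track width in pixels, which is one plausible reading of "the full plotted range is one" and not a confirmed detail of the evaluation; the function name and the sample values are invented.

```python
import numpy as np

def position_mae(pred_cols, true_cols, track_width):
    """Mean absolute position error, expressed as a fraction of the track width."""
    pred = np.asarray(pred_cols, dtype=float)
    true = np.asarray(true_cols, dtype=float)
    return float(np.mean(np.abs(pred - true)) / track_width)

# Hypothetical resolved positions for four depth rows on a 100-pixel-wide track.
pred = [50.5, 52.0, 49.0, 51.5]
true = [50.0, 50.0, 50.0, 50.0]
err = position_mae(pred, true, track_width=100)
print(err)   # mean of 0.5, 2.0, 1.0, 1.5 pixels, divided by 100
```

Computed once per curve rather than pooled, this is the number that plays the role IoU plays in a mask-centred evaluation.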

Where the second curve gives the geometry away

The gap between the two curves is itself instructive, and it is honest to dwell on it rather than average it away. Curve 1 resolved to 0.0277 and curve 2 to 0.1241, more than four times worse on the same reduction with the same model. Nothing about the argmax-plus-centroid operator changed between them. What changed is the band. Where two curves run close together, cross, or where one is fainter ink on the original scan, the predicted profile for the weaker curve broadens and can develop a second, competing lobe from the neighbour. Argmax then occasionally locks onto the wrong lobe, and the local centroid, computed in a window that now contains mass from both, gets dragged toward the interloper. The position error is not telling you the reduction is bad. It is telling you the band for curve 2 is ambiguous in exactly the rows where two traces fight for the same columns, and the reduction faithfully reports that ambiguity instead of hiding it.
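Both failure modes can be reproduced on toy profiles. The values below are invented, the true position is assumed to be column 5, and the two-column half-window is the same illustrative choice as before; none of this is measured data.

```python
import numpy as np

def reduce_row(profile, half_window=2):
    """Return (argmax column, centroid-refined position) for one row profile.

    Assumes the window around the argmax contains some probability mass.
    """
    profile = np.asarray(profile, dtype=float)
    peak = int(np.argmax(profile))
    lo = max(peak - half_window, 0)
    hi = min(peak + half_window + 1, profile.size)
    cols = np.arange(lo, hi)
    weights = profile[lo:hi]
    return peak, float((cols * weights).sum() / weights.sum())

# Case A: the right lobe wins the argmax, but a neighbour's lobe at column 7
# sits inside the centroid window and pulls the estimate toward it.
dragged = [0, 0, 0, 0.1, 0.4, 0.6, 0.3, 0.5, 0.2, 0, 0]
peak_a, x_a = reduce_row(dragged)

# Case B: faint ink, so the neighbour's lobe at column 8 is the global maximum
# and the argmax locks onto the wrong curve entirely.
wrong_lobe = [0, 0, 0, 0.1, 0.3, 0.4, 0.3, 0.35, 0.6, 0.3, 0, 0]
peak_b, x_b = reduce_row(wrong_lobe)

print(peak_a, round(x_a, 3))
print(peak_b, round(x_b, 3))
```

In case A the operator is off by a fraction of a column; in case B it is off by more than two. The operator is identical in both, which is the point: the error tracks the ambiguity in the band, not a defect in the reduction.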

This is the useful read of a per-curve position metric that a single overlap number cannot give you. IoU collapses both curves into one overlap figure and loses the fact that one curve is essentially solved and the other is solved everywhere except the contested rows. The position error, split per curve, points you straight at the failure mode worth fixing, which is band separation where curves converge, not the reduction operator and not the segmentation in the easy rows.

The half-step deserves its own engineering

This work changed how we build these pipelines in one concrete way: we stopped treating the mask-to-curve reduction as an afterthought to be bolted on once the segmentation was tuned. The reduction has its own design space, its own failure modes, and its own metric, and it is the stage closest to the thing the customer reads. A network that produces a fat, low-overlap band can still yield a production-grade curve if the reduction reads position from the band's mass with sub-pixel care; a network that produces a tighter band can still yield a wobbly curve if the reduction snaps to the grid and gets confused where curves converge. Tuning the segmentation to chase a higher IoU, when the IoU ceiling on a one-pixel trace is structurally low, is effort spent on the wrong number. The leverage was in the half-step, and naming it as a first-class stage with its own argmax-plus-centroid reduction and its own per-curve position error is what let us trust a 0.51 overlap and ship the curve underneath it.

What the geometry forced us to change

  1. A segmentation head emits a dense per-pixel probability map, not a curve. The deliverable a petrophysicist reads is one horizontal pixel position per depth row, so a separate reduction step has to collapse a multi-pixel predicted band to a single coordinate; that reduction is the actual product of the pipeline, not plumbing.
  2. The right operator for a one-to-three-pixel ink trace is an argmax across the columns of each row for robustness, refined by a local probability-weighted centroid for sub-pixel position. Argmax is quantised to the grid; the centroid reads the sub-pixel position the band's mass implies, the same instinct behind soft-argmax in pose regression and classical sub-pixel ridge extraction.
  3. Intersection-over-union is the wrong yardstick for a thin band. Peak IoU was only 0.51 on the multiclass set, which reads as failure, yet the recovered position error was 0.0277 on curve 1, under three percent of the track width. IoU punishes band thickness and one-pixel shift; position depends on where the band's centre of mass sits, and on a one-pixel trace those quantities come apart.
  4. Per-curve position error exposes what a single overlap number hides. Curve 1 at 0.0277 versus curve 2 at 0.1241 on the same reduction means curve 2's band is ambiguous in the rows where two traces converge, dragging the argmax and centroid toward a competing lobe; the metric points at band separation as the failure to fix, not the reduction operator.
  5. Chasing a higher IoU is chasing the wrong number when the overlap ceiling on a one-pixel trace is structurally low. The leverage is in the mask-to-curve reduction, which deserves to be engineered as a first-class stage with its own operator and its own per-curve position metric.

References

[1] Ronneberger, O., Fischer, P., and Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. MICCAI (2015). The encoder-decoder with skip connections that became the default for segmenting thin structures. https://arxiv.org/abs/1505.04597

[2] Long, J., Shelhamer, E., and Darrell, T. Fully Convolutional Networks for Semantic Segmentation. CVPR (2015). The dense per-pixel prediction formulation a segmentation head inherits. https://arxiv.org/abs/1411.4038

[3] Newell, A., Yang, K., and Deng, J. Stacked Hourglass Networks for Human Pose Estimation. ECCV (2016). The heatmap-argmax convention for turning a dense map into a single coordinate. https://arxiv.org/abs/1603.06937

[4] Sun, X., Xiao, B., Wei, F., Liang, S., and Wei, Y. Integral Human Pose Regression. ECCV (2018). Soft-argmax as a differentiable, sub-pixel replacement for the hard argmax over a heatmap. https://arxiv.org/abs/1711.08229

[5] Steger, C. An Unbiased Detector of Curvilinear Structures. IEEE TPAMI (1998). Sub-pixel centreline extraction from the second-derivative response of a line profile. https://ieeexplore.ieee.org/document/659930

© 2026 Copyright. Earthscan