What Curve Digitisation Taught Us About Every Chart-Extraction Problem

For most of an engagement we described our work, to ourselves and to the operator, as well-log digitisation. The brief was concrete: take an onshore operator's archive of scanned raster well logs, the kind that live as flat image files instead of numeric curves, and turn the plotted lines back into depth-indexed data a petrophysicist can actually compute on. We built a system for it, we called the architecture VeerNet, and we trained it on synthetic logs plus public scans, including the large public corpus of imaged logs that NeuraLog and the Texas Railroad Commission archive have made available to anyone who asks. It worked. The thing we did not expect is that, partway through, the problem stopped looking like a well-log problem at all.

What we were really doing was lifting a plotted curve off a picture and reconstructing the numbers behind it. A well log is one instance of that. So is a scanned scientific figure, a photographed stock chart, a fever curve on a paper medical record, a topographic contour, an ECG strip, a sensor trace someone printed in 2003 and threw in a filing cabinet. The shapes differ, the axes differ, the failure modes differ, but the spine of the pipeline is the same. This is a look back at that spine: the four steps our well-log work decomposed into, and an honest account of which of our results carry to any raster-to-data task and which are specific to the well log we happened to be reading.

The pipeline, factored into four moves

Strip away the petrophysics and the VeerNet plumbing and what is left is four operations, performed in order, each feeding the next.

The first is segmentation: decide, for every pixel, whether it belongs to a curve or to the background of grid lines, labels, smudges, and paper texture. This is the step where the learning lives, and it is where the field's modern lineage runs straight through the encoder-decoder with skip connections [6]. The second is mask-to-centreline reduction: a segmentation mask is a blob a few pixels thick, but a curve is a function, one value per horizontal position, so the blob has to collapse to a single trace. The third is axis indexing: a centreline in pixel coordinates is meaningless until it is anchored to the plot's axes, which for a well log means a depth axis down the page and a measurement scale across it. The fourth is validation on a shared grid: a reconstructed curve and a reference curve almost never land on the same sample positions, so before you can score one against the other you have to resample both onto a common set of points.

We did not invent that decomposition. The classical raster-to-vector world has had a version of it for decades, from Hough's parameter-space line voting [1] through the polar parameterisation that made it practical [2], through the thinning algorithms that reduce a stroke to a one-pixel skeleton [3]. The map-processing community wrote down essentially the same factoring for scanned maps [5]. What changed under our hands was which rung the difficulty migrated to once a learned segmenter replaced the hand-tuned front end, and that migration is the most transferable thing we learned.

Rung one: segmentation is where the metric lies to you

Here is the result that surprised us most and that we now expect to see on any chart-extraction problem. The segmentation rung, measured the obvious way, looks weak. Our curve-mask intersection-over-union (IoU) peaked at 0.51. On a natural-image benchmark a 0.51 IoU would be a mediocre result and you would go back and retrain. On a hairline curve it is something else entirely.

The reason is geometry, not modelling. A well-log curve is roughly one pixel wide. IoU is the area of the predicted-and-true overlap divided by the area of the predicted-or-true union, and when the true region is a near-zero-area filament, being off by a single pixel laterally drops the overlap toward zero even though the curve is, to any human eye, correctly traced. The metric is punishing the model for a positional error so small it has no consequence downstream. We watched a mask that looked perfect against the paper post an IoU that, read cold, would have failed a review.
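The geometry is easy to reproduce on a toy mask. The sketch below is illustrative only and is not the engagement code: the image size, the sine-shaped trace, and the helper names `iou` and `trace_mask` are all assumptions made for the example. It draws a one-pixel trace, shifts an identical copy by one pixel across the trace, and scores the two against each other.

```python
import numpy as np

def iou(pred: np.ndarray, true: np.ndarray) -> float:
    """Intersection-over-union of two boolean masks."""
    inter = np.logical_and(pred, true).sum()
    union = np.logical_or(pred, true).sum()
    return float(inter) / float(union) if union else 1.0

def trace_mask(n_rows: int, rows_per_col: np.ndarray) -> np.ndarray:
    """Boolean mask holding a one-pixel-wide trace: one curve pixel per column."""
    mask = np.zeros((n_rows, len(rows_per_col)), dtype=bool)
    mask[rows_per_col, np.arange(len(rows_per_col))] = True
    return mask

# Hypothetical 64 x 200 image with a gently wandering hairline curve.
n_rows, n_cols = 64, 200
true_rows = (32 + 10 * np.sin(np.arange(n_cols) / 15.0)).round().astype(int)

true_mask = trace_mask(n_rows, true_rows)
shifted_mask = trace_mask(n_rows, true_rows + 1)  # same shape, one pixel off

iou_exact = iou(true_mask, true_mask)       # perfect agreement
iou_shifted = iou(shifted_mask, true_mask)  # no pixel in common
print(iou_exact, iou_shifted)
```

The shifted trace shares no pixels with the original, so its IoU is zero even though the two curves would be indistinguishable once converted to values. Real masks are a little thicker and only partly offset, which is how a well-traced curve ends up in the middle of the scale rather than at either end.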

That is the first lesson, and it travels completely: on thin-structure extraction, pixel-overlap scores understate the model and you must not optimise them directly. The recipe transfers to any chart with thin plotted lines. The specific number, 0.51, does not; it is a property of how thin our particular curves were and how little area a hairline leaves for the overlap. Two teams extracting different chart styles will get different IoUs from equally good models, purely because of line width. Reporting the IoU without that caveat is how good chart extractors get killed in review for the wrong reason.
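The line-width dependence can be made concrete with a back-of-envelope model. As a simplifying assumption, treat the curve locally as a straight band `line_width` pixels thick and the prediction as the same band displaced by `offset` pixels across its width; the function name and the widths tried are hypothetical.

```python
def offset_iou(line_width: int, offset: int = 1) -> float:
    """IoU of a straight band against a copy of itself shifted across its width.

    Overlap is the part of the band both copies cover; union is everything
    either copy covers. Both are measured in pixels across the band.
    """
    overlap = max(line_width - offset, 0)
    union = line_width + min(offset, line_width)
    return overlap / union

# Same one-pixel positional error, different line widths.
scores = {width: round(offset_iou(width), 3) for width in (1, 3, 5, 9)}
print(scores)
```

Under this model a one-pixel error scores 0.0 on a one-pixel line, 0.5 on a three-pixel line, and 0.8 on a nine-pixel line. The positional accuracy is identical in every case; only the ink thickness changed.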

Rung two: the most portable step is the dullest one

Reducing a thick mask to one value per column is the rung that looks least glamorous and travels best. We took, for each horizontal position, the vertical location the model was most confident was curve, which for a single-curve track is close to a per-column argmax over the mask, and for multi-curve tracks becomes a per-class version of the same idea. The output is a clean function: one depth, one value, no thickness.
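A minimal sketch of the single-curve case, assuming the segmenter emits a `(rows, cols)` array of per-pixel curve probabilities. The function name, the `min_conf` threshold, and the synthetic probability map are illustrative assumptions, not the VeerNet implementation.

```python
import numpy as np

def centreline_from_probs(probs: np.ndarray, min_conf: float = 0.5) -> np.ndarray:
    """Collapse a (rows, cols) probability map to one row index per column.

    Columns whose best probability falls below min_conf are returned as NaN
    so that gaps stay visible instead of being filled with a guess.
    """
    best_row = probs.argmax(axis=0).astype(float)
    best_conf = probs.max(axis=0)
    best_row[best_conf < min_conf] = np.nan
    return best_row

# Synthetic probability map: a blurred ridge around a known trace.
rows, cols = 50, 40
true_rows = (25 + 8 * np.sin(np.arange(cols) / 6.0)).round().astype(int)
r = np.arange(rows)[:, None]
probs = np.exp(-0.5 * ((r - true_rows[None, :]) / 1.5) ** 2)
probs[:, 10] = 0.1  # a faded column where the model sees no curve

centre = centreline_from_probs(probs)
```

The result is a one-dimensional array indexed by column, with NaN wherever the model had nothing confident to say. A multi-curve track would apply the same reduction to each class channel in turn.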

Nothing about that operation is well-log-specific. It is the learned descendant of the thinning algorithms the vectorisation literature has argued about for forty years, where the open question was whether to thin the mask before tracing it or trace the thick mask directly [4]. We landed on the soft version, reduce by confidence rather than by morphological thinning, and it carries to any raster chart whose curves are functions of the horizontal axis. The one place it does not carry is to charts where a line doubles back on itself, a closed contour or a phase-space loop, where there is no single value per column to take. For plots that are single-valued in x, which covers most scientific and instrument charts, this rung is essentially free transfer.
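Whether a given mask qualifies can be checked before the reduction is attempted. The sketch below is one possible test, not something the engagement pipeline is known to contain: it counts the separate vertical runs of curve pixels in each column and flags any mask where a column holds more than one. The function names and the two toy masks are assumptions for illustration.

```python
import numpy as np

def runs_per_column(mask: np.ndarray) -> np.ndarray:
    """Count separate vertical runs of True pixels in each column of a boolean mask."""
    top_pad = np.zeros((1, mask.shape[1]), dtype=bool)
    padded = np.vstack([top_pad, mask])
    run_starts = padded[1:] & ~padded[:-1]  # True where a run begins
    return run_starts.sum(axis=0)

def is_single_valued(mask: np.ndarray) -> bool:
    """True when no column contains more than one run of curve pixels."""
    return bool((runs_per_column(mask) <= 1).all())

# A trace that is a function of the horizontal axis.
trace = np.zeros((41, 41), dtype=bool)
trace[20 + np.arange(41) // 4, np.arange(41)] = True

# A closed contour: a thin ring, which crosses most columns twice.
yy, xx = np.mgrid[0:41, 0:41]
ring = np.abs(np.hypot(yy - 20, xx - 20) - 12) < 1.0

print(is_single_valued(trace), is_single_valued(ring))
```

A single curve that passes this check is safe to reduce column by column. A mask that fails it, like the ring, needs a tracing approach that follows the stroke rather than the axis.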

Rung three: the axis is where the domain sneaks back in

The third rung is where the general problem stops being general. A centreline in pixels is just a shape; to recover numbers you have to know what the axes mean. For a well log this is unusually kind: the depth axis is monotone, runs top to bottom, and is the same physical quantity on every log in the archive. We could lean on that structure hard, and we did.
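In the kind case the indexing step can be as small as a two-point linear calibration. This is a sketch under stated assumptions: the depth axis is linear, two tick marks have been located and read, and the pixel positions and depths used here are made-up values, not figures from the archive.

```python
import numpy as np

def pixel_to_depth(pixel_pos, px_a, depth_a, px_b, depth_b):
    """Map pixel positions along the depth axis to depths.

    Uses two labelled ticks (px_a -> depth_a, px_b -> depth_b) and assumes
    the axis is linear between and beyond them.
    """
    pixel_pos = np.asarray(pixel_pos, dtype=float)
    scale = (depth_b - depth_a) / (px_b - px_a)  # depth units per pixel
    return depth_a + (pixel_pos - px_a) * scale

# Hypothetical ticks: pixel 100 reads 1000.0, pixel 900 reads 1200.0.
pixels = np.arange(100, 901, 100)
depths = pixel_to_depth(pixels, 100, 1000.0, 900, 1200.0)

midpoint = float(pixel_to_depth(500, 100, 1000.0, 900, 1200.0))
print(midpoint)
```

Because the depth axis is monotone, a quick sanity check is that the recovered depths increase strictly with pixel position; a calibration that fails that check has usually picked up a misread tick label.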

That kindness is exactly the part that does not transfer. A general raster chart can have a logarithmic axis, a date axis with irregular ticks, a broken axis, two different y-axes on the same frame, or a legend that has to be read before any line can be attributed to a series. Hough's old line-voting machinery [1] and the digital-map surveys [5] both spend most of their effort precisely here, on georeferencing and axis recovery, because that is where the world's irregularity lives. Our depth-indexing code is competent and almost useless to a team reading a financial chart, because their axis problem is harder and shaped nothing like ours. When we abstract our pipeline, this is the rung we tell people to budget the most for and reuse the least.
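The logarithmic axis is the simplest of those cases to illustrate. The sketch below assumes two calibrated ticks and a flag saying whether the axis is logarithmic; the function name and the tick values are hypothetical, and in practice deciding that flag is itself part of the axis problem.

```python
def pixel_to_value(pixel, px_a, val_a, px_b, val_b, log_scale=False):
    """Map a pixel position to a data value using two calibrated ticks.

    With log_scale=True the interpolation is linear in log(value), which is
    what equal pixel spacing means on a logarithmic axis.
    """
    t = (pixel - px_a) / (px_b - px_a)  # fractional position between ticks
    if log_scale:
        return val_a * (val_b / val_a) ** t
    return val_a + t * (val_b - val_a)

# Hypothetical axis: pixel 0 reads 1, pixel 300 reads 1000.
linear_mid = pixel_to_value(150, 0, 1.0, 300, 1000.0)
log_mid = pixel_to_value(150, 0, 1.0, 300, 1000.0, log_scale=True)
print(linear_mid, log_mid)
```

The same pixel reads as 500.5 under a linear assumption and about 31.6 under a logarithmic one. A centreline that is perfect in pixel space is off by more than an order of magnitude if the axis type is guessed wrong, which is why this rung absorbs so much of the budget.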

Rung four: validate on a grid both curves agree on

The last rung is the one we are most confident hands over wholesale, and it is also where our headline accuracy numbers come from. A predicted curve and a reference curve are sampled at different depths, so a naive point-by-point comparison is comparing values that do not correspond to the same place. Our fix was deliberately boring: interpolate both curves onto a common set of 300 depth points spanning the overlap, then compute error point-for-point on that shared grid. That single discipline is what made our validation honest, and it is what let us report a peak R-squared of 0.9891 and a lowest mean absolute error of 0.0132 without those numbers being an artefact of lucky sample alignment.
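The shared-grid discipline can be sketched in a few lines. Assumptions to note: this version uses linear interpolation via `np.interp` for simplicity, which may differ from the interpolation used in the engagement; depths are assumed to be sorted in increasing order; and the function name and synthetic curves are illustrative, with only the 300-point grid taken from the text.

```python
import numpy as np

def score_on_shared_grid(depth_pred, val_pred, depth_ref, val_ref, n_points=300):
    """Resample two curves onto a common depth grid, then compute MAE and R-squared.

    The grid spans only the depth interval that both curves cover.
    Depth arrays must be increasing, as np.interp requires.
    """
    lo = max(depth_pred.min(), depth_ref.min())
    hi = min(depth_pred.max(), depth_ref.max())
    if hi <= lo:
        raise ValueError("curves do not overlap in depth")
    grid = np.linspace(lo, hi, n_points)
    pred = np.interp(grid, depth_pred, val_pred)
    ref = np.interp(grid, depth_ref, val_ref)
    mae = float(np.mean(np.abs(pred - ref)))
    ss_res = float(np.sum((ref - pred) ** 2))
    ss_tot = float(np.sum((ref - ref.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    return grid, mae, r2

# Synthetic curves sampled at different depths, with a known 0.01 offset.
depth_ref = np.arange(0.0, 100.5, 0.5)
val_ref = 2.0 * depth_ref + 1.0
depth_pred = np.linspace(3.0, 97.0, 120)
val_pred = 2.0 * depth_pred + 1.0 + 0.01

grid, mae, r2 = score_on_shared_grid(depth_pred, val_pred, depth_ref, val_ref)
print(len(grid), mae, r2)
```

Neither input array shares a single sample position with the other, yet the score recovers the 0.01 offset that was built in, because every comparison is made at a depth both curves have been evaluated at.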

The instrument below lays the four rungs out as a ladder and lets you slide from our specific well-log instance toward any raster chart, watching which rung's result holds its confidence and which falls away. It is the argument of this whole retrospective in one control: the shape-level rungs travel, the domain-level rungs do not.

The raster well-log pipeline factors into four rungs that recur in any chart-extraction problem: segment the ink, reduce each mask to a one-pixel centreline, index that centreline against the plot axes, and validate the reconstructed series on a shared grid. Each rung carries a real result from the engagement: segmentation peaked at 0.51 IoU on a hairline curve, spline validation reached a peak R-squared of 0.9891 and a lowest MAE of 0.0132, and predicted and reference curves were resampled onto 300 shared depth points for an apples-to-apples comparison. Pick a rung to read its number, then drag the generality lever from this well-log instance toward any raster chart. The shape-level rungs (centreline, validation) hold their confidence as you generalise; the rungs that lean on the well log's specifics (pixel overlap on a hairline curve, a clean monotone depth axis) lose confidence first, which is exactly why the recipe travels while some of the numbers stay local. The four numbers and the 300 grid points are sourced from the engagement archive; the generality lever and the per-rung transfer confidences are an illustrative reading aid, not a measured quantity.

The validation discipline transfers to every chart-extraction problem we can think of, because every one of them eventually has to compare a reconstruction to a reference and every one of them has the same sample-misalignment trap. ChartOCR and the general chart-reading systems that surround it [8] make the same move under different names; resampling onto a shared support before scoring is not a well-log idea, it is a measurement idea. The number of grid points is a tuning knob, not a transferable constant, but the rule is iron: never report a reconstruction error computed on mismatched samples, because it will be optimistic and you will not know by how much.

What the abstraction is worth

The practical payoff of seeing our work as four rungs rather than one well-log pipeline is that it tells a new team where to spend. Two of the four rungs, centreline reduction and grid-based validation, are close to free transfer; lift them almost as written. One rung, segmentation, transfers as a recipe but not as a number, and the single most common mistake we now warn against is judging a thin-curve segmenter by its raw IoU. One rung, axis indexing, barely transfers at all and is where each new chart family will demand fresh work. A plan that treats all four as equally reusable will over-invest in the portable steps and get ambushed by the axis.

The classical baseline we replaced, the gridlines-elimination digitiser built specifically for well-log graphs [7], is a useful mirror here. It was excellent at exactly one chart family and embodied its axis assumptions so deeply that it could not be pointed at anything else. Our learned pipeline is the opposite shape: three of its rungs are domain-agnostic and one is domain-bound, which is why it generalises where the classical tool could not, and also why the axis rung is the part we keep rewriting. If we had understood that division of labour at the start, we would have structured the codebase around it on day one. We understand it now because a well log taught it to us, one hairline curve at a time.

Limitations

This retrospective is a generalisation from a single engagement, and generalisations from one case earn only so much trust. The four-rung factoring is our reading of our own pipeline, not a result anyone measured, and a different chart-extraction problem could surface a fifth rung we never needed, legend parsing and series attribution being the obvious candidate for charts more crowded than a well log. The transfer verdicts in the instrument are an illustrative reading aid, a model of how much of a lesson we believe travels, not a quantity we computed; only the printed engagement numbers, the 0.51 IoU, the 0.9891 R-squared, the 0.0132 mean absolute error, and the 300 grid points, are measured. Those numbers themselves come from our own validation regime on our own data and should be read as the ceiling we reached on this archive, not as benchmarks anyone else can expect to match on a different chart family with different line widths and a different axis. Finally, the claim that segmentation IoU understates thin-curve models is well supported by the geometry but is a statement about a metric, not a guarantee about a model; a genuinely bad segmenter will also post a low IoU, and telling the two cases apart still requires looking at the reconstructed curve, not just the score.

References

[1] Hough, P. V. C. Method and Means for Recognizing Complex Patterns. U.S. Patent 3,069,654 (1962). https://patents.google.com/patent/US3069654A/en

[2] Duda, R. O., and Hart, P. E. Use of the Hough Transformation to Detect Lines and Curves in Pictures. Communications of the ACM, 15(1), 11-15 (1972). https://dl.acm.org/doi/10.1145/361237.361242

[3] Zhang, T. Y., and Suen, C. Y. A Fast Parallel Algorithm for Thinning Digital Patterns. Communications of the ACM, 27(3), 236-239 (1984). https://dl.acm.org/doi/10.1145/357994.358023

[4] Tombre, K., and Tabbone, S. Vectorization in Graphics Recognition: To Thin or not to Thin. International Conference on Pattern Recognition (ICPR) (2000). https://members.loria.fr/KTombre/tombre-icpr00.pdf

[5] Chiang, Y.-Y., Leyk, S., and Knoblock, C. A. A Survey of Digital Map Processing Techniques. ACM Computing Surveys, 47(1), Article 1 (2014). https://dl.acm.org/doi/10.1145/2557423

[6] Ronneberger, O., Fischer, P., and Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. MICCAI (2015). https://arxiv.org/abs/1505.04597

[7] Yuan, B., and Yang, Q. Digitization of Well-Logging Parameter Graphs Based on a Gridlines-Elimination Approach. Journal of Petroleum Exploration and Production Technology (2019). https://doi.org/10.1007/s13202-019-0625-x

[8] Luo, J., Li, Z., Wang, J., and Lin, C.-Y. ChartOCR: Data Extraction from Charts Images via a Deep Hybrid Framework. IEEE Winter Conference on Applications of Computer Vision (WACV) 2021. https://openaccess.thecvf.com/content/WACV2021/html/Luo_ChartOCR_Data_Extraction_From_Charts_Images_via_a_Deep_Hybrid_WACV_2021_paper.html
