Engineering Drawing Interpretation: A Survey of Symbol and Line Recognition

Abstract

An engineering drawing is two documents on one sheet. One is a set of discrete marks that stand for things: a valve, a fitting, a tag, a title-block glyph, each a bounded shape drawn from a known vocabulary. The other is a web of continuous lines that connect those marks: pipes, wires, leaders, section traces, each a long thin run of foreground pixels with no fixed extent. This survey reads the graphics-recognition literature as the study of those two documents and keeps the split deliberate, because the two halves have different natural methods and different failure modes. The symbol half runs from template matching and constraint graphs to learned detectors; the line half runs from vector tracing to pixel-level segmentation. The claim we build toward is narrow: the line half is the same thin-structure segmentation problem our own raster well-log digitiser was built to solve, where a few-pixel curve must be recovered against a background that is easy precisely because it is everything else. We anchor that claim to one measurement from our multiclass log run, in which the background mask reaches an intersection-over-union of 0.94 while the two thin curves reach only 0.26 and 0.21 under the same three-class model trained with an overlap loss. The survey positions our thin-structure work next to the drawing-recognition field rather than re-benchmarking it.

Graphics recognition named symbols as a problem in their own right well before deep learning arrived. Cordella and Vento's survey collected the techniques and framed the task in the classification sense: given a bounded region known to hold a mark from a vocabulary, decide which mark it is [1]. Llados and colleagues extended that framing and, importantly for us, drew the same line this survey draws, separating symbol recognition from the analysis of the lines and structures that hold the symbols together [2]. The continuous-line half has an equally old and separate lineage. Fletcher and Kasturi's method for separating text strings from mixed text-and-graphics images is a canonical statement that the two layers must be pulled apart before either can be read [3], and Ah-Soon and Tombre's architectural work shows how tightly they couple in practice: their constraint network recognises a symbol partly by the wall lines it interrupts [4]. The coupling makes the split a methodological convenience rather than a physical separation, but it does not make the split unnecessary.

When dense learning arrived it landed hardest on the line half. Fully convolutional networks made per-pixel labelling an end-to-end learned task [5], and the U-Net encoder-decoder became the default backbone for recovering fine detail, because its skip connections carry the high-resolution information a thin structure needs back across the bottleneck [6]. Both are the natural tools for tracing a line as a segmentation mask. That formulation then meets the imbalance that defines the problem: a pipe or a curve covers a small fraction of the pixels, so a network trained with a per-pixel loss can score well by predicting background almost everywhere. Sudre and colleagues addressed this with a generalised overlap loss for highly unbalanced segmentation [7], and the convolutional block attention module of Woo and colleagues reweights features across channels and space so the network emphasises the thin foreground it would otherwise smooth over [8]; in our own pipeline we run it with a channel reduction ratio of 8 and a spatial kernel of 7. The engineering-drawing community kept the split explicit throughout: Moreno-Garcia and colleagues treat symbol detection and connector tracing as separate sub-problems with separate evaluations [9], the same partition this survey adopts.
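To make the two attention settings concrete, here is a minimal NumPy forward-pass sketch of a CBAM block in the order Woo and colleagues describe: channel attention from a shared two-layer MLP over average- and max-pooled descriptors, then spatial attention from a 7x7 convolution over channel-wise mean and max maps. It is not our pipeline's code. The function names are hypothetical, bias terms are omitted, and random weights stand in for learned ones; the point is only to show what the reduction ratio (the MLP bottleneck width, C/8) and the spatial kernel size control.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    """x: (C, H, W). Shared MLP (C -> C/r -> C, ReLU) on avg- and max-pooled descriptors."""
    avg = x.mean(axis=(1, 2))                 # (C,)
    mx = x.max(axis=(1, 2))                   # (C,)

    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)

    return sigmoid(mlp(avg) + mlp(mx))        # one weight in (0, 1) per channel

def spatial_attention(x, kernel):
    """kernel: (2, k, k) weights of a 'same'-padded conv over [mean, max] channel maps."""
    feat = np.stack([x.mean(axis=0), x.max(axis=0)])      # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(feat, ((0, 0), (pad, pad), (pad, pad)))
    H, W = x.shape[1:]
    out = np.zeros((H, W))
    for i in range(k):                         # accumulate the cross-correlation tap by tap
        for j in range(k):
            out += (kernel[:, i, j, None, None] * padded[:, i:i + H, j:j + W]).sum(axis=0)
    return sigmoid(out)                        # one weight in (0, 1) per location

def cbam(x, reduction=8, kernel_size=7, seed=0):
    """Hypothetical CBAM forward pass with random (untrained) weights."""
    C = x.shape[0]
    rng = np.random.default_rng(seed)
    hidden = max(C // reduction, 1)            # reduction ratio sets the MLP bottleneck
    w1 = rng.normal(scale=0.1, size=(hidden, C))
    w2 = rng.normal(scale=0.1, size=(C, hidden))
    kernel = rng.normal(scale=0.1, size=(2, kernel_size, kernel_size))
    x = x * channel_attention(x, w1, w2)[:, None, None]    # reweight channels first
    return x * spatial_attention(x, kernel)[None, :, :]     # then reweight locations

features = np.random.default_rng(1).normal(size=(32, 64, 64))
print(cbam(features).shape)   # (32, 64, 64): attention rescales features, never reshapes them
```

With 32 channels and a ratio of 8, the shared MLP squeezes through a 4-unit bottleneck; the 7x7 kernel lets each spatial weight look three pixels either side, which is wide relative to a curve only a few pixels across.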

Method

This is a structured reading of the published graphics-recognition and dense-segmentation literature, organised so the survey's one claim stays testable. We fixed a partition first: every method is placed by its target, discrete symbols or continuous lines, and by its era, classical rule-and-vector methods or learned segmentation. For the symbol half we read the classical statements of the problem [1] [2] [3] [4] and treat learned detection as the later era of the same target. For the line half we read the dense-prediction backbones that made segmentation the natural formulation [5] [6], the loss and attention machinery that make it work against imbalance [7] [8], and the engineering-drawing survey that keeps connectors as their own sub-problem [9]. For each method we asked what the target is, what is matched or labelled to recover it, and how it behaves when the foreground is thin.

To anchor the line half to a real task we read it against one measured reference from our own archive: a multiclass segmentation of a scanned well log into three classes, a background and two thin curves, trained with an overlap loss. The numbers we quote, a background-mask intersection-over-union of 0.94 against curve-mask values of 0.26 and 0.21, are real and used as a worked illustration of the thin-structure penalty, not as a benchmark of any drawing-recognition method. The exhibit below is built on the same footing: the era-by-target placement of each method family is an illustrative taxonomy, while the IoU bars it argues from are the sourced anchor.
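For readers who want to reproduce the kind of per-class figure quoted above on their own data, the sketch below computes intersection-over-union for each class of an integer label map from a confusion matrix, IoU = TP / (TP + FP + FN). It assumes a hypothetical coding of 0 for background and 1, 2 for the two curves; it is an illustration of the metric, not the archive's evaluation code, and the tiny arrays are made up.

```python
import numpy as np

def per_class_iou(pred, truth, num_classes):
    """Per-class IoU from a confusion matrix whose rows are truth and columns are prediction."""
    cm = np.bincount(num_classes * truth.ravel() + pred.ravel(),
                     minlength=num_classes ** 2).reshape(num_classes, num_classes)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp    # predicted as class k but labelled otherwise
    fn = cm.sum(axis=1) - tp    # labelled class k but predicted otherwise
    union = tp + fp + fn
    # Classes absent from both maps get NaN rather than a division-by-zero warning.
    return np.divide(tp, union, out=np.full(num_classes, np.nan), where=union > 0)

# Hypothetical 4x4 example: row 1 is curve 1, row 3 is curve 2, the rest background.
truth = np.array([[0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0], [2, 2, 2, 2]])
pred = np.array([[0, 0, 0, 0], [1, 1, 0, 0], [0, 1, 0, 0], [2, 2, 2, 2]])
print(per_class_iou(pred, truth, 3))
```

In this toy case the background scores 0.7, the half-missed curve 0.4, and the exact curve 1.0; each class is scored against its own union, which is why a small class moves so much per misplaced pixel.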

Two documents on one sheet

The split is worth insisting on because the two halves reward opposite instincts. A symbol is compact and drawn from a closed vocabulary, so the useful prior is shape identity: match the region against templates, or describe it with a graph of primitives and recognise the graph [1] [2] [4]. Error on a symbol is a local misclassification, one wrong label on one bounded object. A line has no fixed extent and no vocabulary; it is defined by continuity. The useful prior is connectivity, not identity, and error on a line propagates: a pipe traced through the wrong junction connects the wrong equipment, and the mistake travels. That is why the classical line methods are tracers and linkers rather than classifiers [3], and the learned line methods are dense segmenters that label every pixel [5] [6]. The coupling between the halves is real, and a whole-drawing reader must reconcile the marks with the network they sit in [4], but it does not collapse the split: the two targets are recovered by different operations and judged by different metrics, which is why the surveys keep them apart [1] [2] [9].
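The claim that a pixel metric judges a line poorly can be checked on a toy mask. In the sketch below (hypothetical helper names, a synthetic one-pixel pipe, no real drawing data), a single missed pixel in a 200-pixel run costs only half a percent of IoU, yet the traced pipe splits into two disconnected pieces, which is exactly the connectivity error a pipe-network reader cannot tolerate.

```python
import numpy as np
from collections import deque

def iou(pred, truth):
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    return inter / union if union else 1.0

def count_components(mask):
    """Count 8-connected foreground components with a breadth-first flood fill."""
    seen = np.zeros(mask.shape, dtype=bool)
    H, W = mask.shape
    count = 0
    for r, c in zip(*np.nonzero(mask)):
        if seen[r, c]:
            continue
        count += 1
        seen[r, c] = True
        queue = deque([(r, c)])
        while queue:
            y, x = queue.popleft()
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < H and 0 <= nx < W and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
    return count

truth = np.zeros((5, 200), dtype=bool)
truth[2, :] = True            # one pipe, one pixel wide, 200 pixels long
pred = truth.copy()
pred[2, 100] = False          # a single missed pixel mid-run

print(round(iou(pred, truth), 3), count_components(truth), count_components(pred))  # 0.995 1 2
```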

The line is a thin-structure problem

Here is the load-bearing observation. Once the line half is posed as segmentation, it is the same problem as recovering a curve from a raster well log. In both cases the foreground is a long thin run of pixels, a fraction of a percent of the image; in both cases the background is easy because it is simply the absence of the line; and in both cases the naive loss is happy to declare almost everything background and still score well. The engineering-drawing literature meets this with connector-specific handling [9], and the segmentation literature meets it with imbalance-aware losses [7] and attention that lifts the thin foreground [8]. These are not two toolkits. They are one toolkit applied to two drawings.
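The "naive loss is happy" point can be shown numerically. Below is a minimal NumPy sketch of the generalised Dice loss as Sudre and colleagues define it, with each class weighted by the inverse square of its reference volume, applied to a hypothetical 100x100 log with two one-pixel curves (not our data). Predicting background everywhere earns 98% pixel accuracy, yet the generalised Dice loss stays near its maximum of 1 because the weighting gives each thin curve as much say as the background. The small `eps` is our own guard against empty classes, not part of the published formula.

```python
import numpy as np

def generalised_dice_loss(probs, onehot, eps=1e-7):
    """Generalised Dice loss for (L, N) arrays: L classes by N pixels.

    Weights w_l = 1 / (sum_n r_ln)^2 stop a large class from dominating the ratio.
    """
    w = 1.0 / (onehot.sum(axis=1) ** 2 + eps)
    num = (w * (onehot * probs).sum(axis=1)).sum()
    den = (w * (onehot + probs).sum(axis=1)).sum()
    return 1.0 - 2.0 * num / den

# Hypothetical log: background (0) plus two one-pixel-wide curves (1 and 2).
labels = np.zeros((100, 100), dtype=int)
labels[30, :] = 1
labels[70, :] = 2
onehot = np.stack([(labels == k).ravel() for k in range(3)]).astype(float)

# The lazy answer: background everywhere.
lazy = np.zeros_like(onehot)
lazy[0] = 1.0

accuracy = (lazy.argmax(axis=0) == labels.ravel()).mean()
print(round(accuracy, 2), round(generalised_dice_loss(lazy, onehot), 2))  # 0.98 0.99
```

A per-pixel criterion rewards the lazy answer with 0.98; the overlap loss scores it at about 0.99 on a 0-to-1 loss scale, so the gradient keeps pushing toward the curves.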

Exhibit. A survey map of engineering-drawing interpretation, laid out as two rows. The upper row is discrete symbol recognition, a detection and classification task on bounded shapes; the lower row is continuous-line tracing, a thin-structure segmentation task. Era runs left to right, from classical vector tracing to learned segmentation, and the plotted method families cluster along that progression. The difficulty bars below the map carry the argument with the one sourced anchor: on our own multiclass well-log run, the fat background mask reaches an intersection-over-union of 0.94, while the two thin curves reach only 0.26 and 0.21, a gap of about 0.71 against the mean of the two curve masks (0.94 minus 0.235) that is the thin-structure penalty in a single number. An IoU threshold set across the bars is cleared by the fat background long before either thin curve clears it, which is the same imbalance an engineering drawing's leader lines and pipes present to a tracer. Highlighted in orange are the elements that carry the argument: the thin-curve bars and the threshold they struggle to reach. The three-class formulation, the two curves per log, the background and curve IoU values, and the CBAM reduction ratio of 8 with a spatial kernel of 7 are sourced from the engagement archive; the era-by-target placement of each method family is an illustrative taxonomy of the published field, not a measured ranking.

The exhibit lays the field out as two rows, symbols above and continuous lines below, with era running left to right, and then shows the thin-structure penalty as a measured gap. The background mask stands at an intersection-over-union of 0.94 and the two thin curves at 0.26 and 0.21, so a difficulty threshold moved across the bars is cleared by the background while both curves are still failing. The era-by-target placement of the method families is an illustrative taxonomy, flagged as such in the exhibit; only the IoU values, the three-class formulation, the two-curve count, and the attention settings are sourced.
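The gap and the threshold reading are simple arithmetic on the three sourced values, sketched below. The class names are hypothetical labels for the two curves; the numbers are the ones quoted from our run. The unrounded gap is 0.705, which the exhibit rounds to 0.71.

```python
# Per-class IoU from the multiclass well-log run quoted in the text.
ious = {"background": 0.94, "curve_a": 0.26, "curve_b": 0.21}

curve_mean = (ious["curve_a"] + ious["curve_b"]) / 2
gap = ious["background"] - curve_mean
print(f"mean curve IoU {curve_mean:.3f}, gap {gap:.3f}")   # mean curve IoU 0.235, gap 0.705

def classes_clearing(ious, threshold):
    """Classes whose IoU meets or exceeds a given threshold."""
    return [name for name, value in ious.items() if value >= threshold]

for threshold in (0.2, 0.25, 0.5, 0.9):
    print(threshold, classes_clearing(ious, threshold))
```

Any threshold above 0.26 is met by the background alone, which is the picture the difficulty bars draw.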

The gap, not the absolute numbers, is the finding. A background IoU of 0.94 is what an easy, everywhere class scores almost by default. The curve values of 0.26 and 0.21 are the real difficulty, and they sit low for a structural reason: a thin foreground carries little area to overlap, so every pixel of slop costs a large share of the score. An engineering drawing's leader lines and pipes present the identical arithmetic to a tracer. What should travel between the two domains is not our curve IoU, which is specific to our data and loss, but the shape of the penalty, which is a property of thin structures wherever they are drawn.
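The "little area to overlap" argument has a closed form for the simplest case. If a horizontal band w pixels thick is predicted one row off, the band's IoU is (w - 1)/(w + 1): a 2-pixel line falls to 1/3 while a 50-pixel band keeps about 0.96. The sketch below checks this on synthetic masks (hypothetical sizes and helper names, not log or drawing data) and reports the background IoU alongside, which barely moves.

```python
import numpy as np

def iou(pred, truth):
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    return inter / union

def shifted_band_iou(width, shift=1, size=200):
    """IoU of a horizontal band of `width` rows against a copy shifted down by `shift` rows.

    Returns (band IoU, background IoU) for the same prediction.
    """
    truth = np.zeros((size, size), dtype=bool)
    pred = np.zeros_like(truth)
    top = size // 2 - width // 2
    truth[top:top + width] = True
    pred[top + shift:top + shift + width] = True
    return iou(pred, truth), iou(~pred, ~truth)

for width in (2, 4, 10, 50):
    band, background = shifted_band_iou(width)
    print(f"width {width:>2}: band IoU {band:.3f}, background IoU {background:.3f}")
```

The same one-row slop costs the thin band two thirds of its score and the background about one percent, which is the shape of the penalty rather than any particular dataset's number.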

What the split predicts for practice

Reading the two halves side by side yields a division of effort that the surveys imply and that our own well-log work bears out on the line half. On the symbol half the leverage is the vocabulary and the shape prior, and the classical framing survives into the learned era largely intact [1] [2]. On the line half the leverage is entirely in imbalance and continuity: the backbone must preserve fine detail [6], the loss must refuse to let the background dominate [7], and attention must keep the thin foreground from being averaged out [8]. A team that ports a symbol-detection mindset onto the line half, or the reverse, misallocates its effort, because the two targets do not fail the same way.
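Why the backbone must preserve fine detail can be seen with a toy bottleneck. In the sketch below, two one-pixel lines three rows apart become identical after 4x4 max pooling (equivalent to U-Net's two successive 2x2 max poolings), and upsampling the coarse map smears the line across four rows. Passing the full-resolution map alongside, as a U-Net skip connection does, keeps the exact row available to the decoder. There are no learned weights here; the helper names and sizes are hypothetical, and the stack simply stands in for the concatenation a real decoder would convolve.

```python
import numpy as np

def max_pool(x, f):
    """Non-overlapping f-by-f max pooling."""
    H, W = x.shape
    return x.reshape(H // f, f, W // f, f).max(axis=(1, 3))

def upsample(x, f):
    """Nearest-neighbour upsampling back to full resolution."""
    return np.repeat(np.repeat(x, f, axis=0), f, axis=1)

def iou(pred, truth):
    return np.logical_and(pred, truth).sum() / np.logical_or(pred, truth).sum()

line_a = np.zeros((16, 16), dtype=bool)
line_a[4, :] = True                  # one-pixel line on row 4
line_b = np.zeros((16, 16), dtype=bool)
line_b[7, :] = True                  # the same line three rows lower

coarse_a, coarse_b = max_pool(line_a, 4), max_pool(line_b, 4)
print(np.array_equal(coarse_a, coarse_b))   # True: the bottleneck cannot tell them apart
print(iou(upsample(coarse_a, 4), line_a))   # 0.25: the coarse path smears one row into four

# A U-Net-style skip hands the decoder the full-resolution map next to the coarse one.
decoder_input = np.stack([upsample(coarse_a, 4), line_a])
print(np.nonzero(decoder_input[1].any(axis=1))[0])   # [4]: row position survives the skip
```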

For us the payoff is transfer in the correct direction. Our thin-structure segmentation of well-log curves and the continuous-line half of drawing recognition are the same problem, so the losses, the attention settings, and the imbalance-aware evaluation we built for one are the right starting point for the other. The symbol half is genuinely a different task and should be treated as one. Naming the split is what lets a practitioner reuse what transfers and rebuild only what does not.

Limitations

This is a survey and inherits a survey's limits. It synthesises how the published literature treats symbols and continuous lines and does not re-implement or re-measure any drawing-recognition method it discusses. Where it quotes numbers, those are the real metrics of a single multiclass run from one engagement and one architecture, a background-mask intersection-over-union of 0.94 against curve-mask values of 0.26 and 0.21 in a three-class, two-curve formulation trained with an overlap loss, used as a worked illustration of the thin-structure penalty rather than as a benchmark of drawing line tracing, which we did not run. The claim that the well-log line and the drawing line are the same problem is an argument from the shared structure of the task, a thin foreground against an easy background, not a cross-domain experiment; we have not trained our model on drawings or a drawing model on logs. The exhibit's era-by-target placement of method families is an illustrative taxonomy, flagged as such in the exhibit, while only the IoU anchor and the formulation facts are sourced. The survey scopes itself to the canonical statements of the two halves and the dense-prediction machinery the line half depends on, and it stops at its own date of writing, so later refinements are out of frame. Take this as a map of why the two halves want different methods, and why one of them is already a problem the thin-structure segmentation community has been solving, not as a substitute for measuring either task on your own data.

References

[1] Cordella, L. P., and Vento, M. Symbol recognition in documents: a collection of techniques? International Journal on Document Analysis and Recognition, 3(2), 73-88 (2000). A survey framing symbol recognition as a distinct task within graphics recognition. https://doi.org/10.1007/s100320000036

[2] Llados, J., Valveny, E., Sanchez, G., and Marti, E. Symbol Recognition: Current Advances and Perspectives. Graphics Recognition Algorithms and Applications, LNCS 2390, Springer, 104-128 (2002). Positions symbol recognition against line and structure analysis in technical drawings. https://doi.org/10.1007/3-540-45868-9_9

[3] Fletcher, L. A., and Kasturi, R. A robust algorithm for text string separation from mixed text/graphics images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 10(6), 910-918 (1988). A classical method for separating text strings from the continuous-line graphics around them. https://doi.org/10.1109/34.9112

[4] Ah-Soon, C., and Tombre, K. Architectural symbol recognition using a network of constraints. Pattern Recognition Letters, 22(2), 231-248 (2001). A constraint-graph approach that reads symbols against the line structure of architectural drawings. https://doi.org/10.1016/S0167-8655(00)00097-1

[5] Long, J., Shelhamer, E., and Darrell, T. Fully Convolutional Networks for Semantic Segmentation. CVPR (2015). The shift that turned dense pixel labelling, including thin-structure labelling, into an end-to-end learned task. https://arxiv.org/abs/1411.4038

[6] Ronneberger, O., Fischer, P., and Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. MICCAI (2015). The encoder-decoder with skip connections that most thin-structure segmentation still builds on. https://arxiv.org/abs/1505.04597

[7] Sudre, C. H., Li, W., Vercauteren, T., Ourselin, S., and Cardoso, M. J. Generalised Dice overlap as a deep learning loss function for highly unbalanced segmentations. DLMIA/ML-CDS at MICCAI, LNCS 10553, 240-248 (2017). An overlap loss that addresses the thin-foreground versus fat-background imbalance directly. https://arxiv.org/abs/1707.03237

[8] Woo, S., Park, J., Lee, J.-Y., and Kweon, I. S. CBAM: Convolutional Block Attention Module. ECCV (2018). A channel-and-spatial attention block that helps a network emphasise the thin foreground it would otherwise average away. https://arxiv.org/abs/1807.06521

[9] Moreno-Garcia, C. F., Elyan, E., and Jayne, C. New trends on digitisation of complex engineering drawings. Neural Computing and Applications, 31, 1695-1712 (2019). A survey that keeps symbol detection and connector tracing as separate sub-problems. https://doi.org/10.1007/s00521-018-3583-1
