A customer rarely arrives with a problem that sounds like ours. They arrive with a folder. Inside it are scanned well logs, often greyscale, often skewed, sometimes a photograph of a photocopy, and every one of them is a picture of numbers rather than the numbers themselves. What they want back is mundane and exact: a LAS file, the depth-indexed text format that every petrophysical tool will open without complaint [1]. Between the folder they upload and the file they trust sits a process, and the most honest way to describe that process is not to quote a single accuracy figure. It is to draw the funnel.
We learned to talk about the deployment this way because a single number always flatters or maligns the wrong thing. Tell an operator the model hits a peak R-squared of 0.9891 and they hear a promise about every scan; the truth is that the peak is a best case on a clean curve, and plenty of scans never reach the step where that number is even measured. The funnel keeps everyone honest. It says: of the scans you uploaded, this fraction had a readable header, this fraction held a curve we could segment, this fraction vectorised into a clean trace, and this fraction cleared a validation bar strict enough that you would stake a decision on the result. Reading a deployment stage by stage rather than as one conversion rate is older than our product and not our idea [2]; it is simply the right idea for a tool that turns one file format into another.
The four buttons, and what each one can reject
The product surface is deliberately small. An interpreter clicks four things: upload a scan, run segmentation, vectorise the mask into a curve, and validate then export the LAS. The four-step dashboard is the whole motion, and it is self-serve on purpose, because the alternative is a services engagement that does not scale past the first customer. What matters for the funnel is that each of those four steps is also a place a scan can fall out, and the reasons differ at each one.
The first step detects the track layout from the header. A raster log is organised into tracks, the vertical lanes that carry the gamma ray, the resistivity curves, the density and neutron porosity, and the model has to read the header well enough to know which lane is which before it segments anything. A scan with a torn header, a non-standard layout, or ink so faded the labels are unreadable can stall right here. Nothing downstream is salvageable if the tool cannot tell a porosity curve from a caliper, so this is the cheapest place to reject a scan and the most frustrating, because the curve data underneath may be perfectly intact.
The second step segments the curve pixels. This is the part most people picture when they imagine the work: the model looks at the greyscale image and produces a mask that says, these pixels are curve one, these are curve two, these are grid and noise. Thin one-to-three-pixel traces against a busy background are genuinely hard, and a scan that survived header detection can still lose a curve here if the trace is broken, overprinted, or buried under a grid the model cannot subtract. Segmentation is where the accuracy debate usually lives, and it is exactly one stage of four.
The third step vectorises the mask into a depth-indexed trace. A mask is a region of lit pixels; a LAS file needs a single value at each depth. Turning a band of pixels into one continuous line means resolving the centreline, bridging small gaps, and rejecting the places where the mask forked or vanished. A scan can pass segmentation with a plausible-looking mask and still fail to vectorise cleanly if the trace crosses itself or the depth axis is too distorted to register.
The fourth step is the one we treat as a dial rather than a gate with a fixed setting. It validates the vectorised trace against a quality bar and, if it clears, exports the LAS. This is where the real metrics surface for the user to judge, not where they are hidden until they are perfect, and the strictness of the bar is a choice. Set it loose and more scans become LAS files, some of them weaker fits. Set it strict and fewer files come out, but every one of them sits near the best fit we can produce. The discipline of catching bad values before a model output is trusted is not optional in petrophysics [4], and the validation step is where we make that discipline visible and adjustable.
Walk the funnel, then drag the last step
The instrument below puts the four steps in order and shows where an operator's uploaded batch thins out as it moves through them. The first three stages carry a fixed, illustrative retention shape; the fourth stage is live. Drag the validation acceptance bar, or use the arrow keys, and watch two numbers move in opposite directions: the share of the uploaded batch that becomes a delivered LAS file, and the fit that every delivered file is guaranteed to clear.
The thing the funnel makes visible is the trade nobody likes to say out loud. A digitisation tool can be configured to deliver almost everything or to deliver only what it is sure of, and those are different products dressed in the same four buttons. Push the bar toward strict and the delivered fraction shrinks while the quality floor rises toward the peak R-squared of 0.9891 and the lowest mean absolute error of 0.0132 we measured on the cleanest curves. Pull it toward loose and the funnel widens at the bottom, but some of the files that drop out carry fits that an interpreter would have wanted to catch. There is no setting that gives you both, and pretending otherwise is how a digitisation vendor loses a customer on the second batch.
The last step is a dial, and the customer should hold it
The first three stages of the funnel are mostly about whether the tool can read a scan at all: a readable header, a segmentable curve, a vectorisable mask. The fourth stage is different in kind, because it is a policy, not a capability. How strict the validation bar is decides how much of the batch becomes a trusted LAS file and how high the fit on every delivered file has to be, and those two move against each other. The right setting is not a fixed property of the model; it is a choice the customer makes about how much they are willing to review by hand versus how much they are willing to leave behind, and the honest product hands them that dial instead of fixing it silently.
Eight scans, three letters, and a very large archive behind them
Numbers in a funnel are only as good as the evidence under them, so it is worth being precise about what is measured and what is illustrative. The four-step flow is the product's real shape. The validation-step quality figures, the peak R-squared of 0.9891 and the lowest mean absolute error of 0.0132, are measured results. The reference archive the system was built against holds 7,781 LAS files paired with 136,771 scanned TIF images, which is the asymmetry that makes the whole exercise worth doing: there is roughly seventeen times as much paper as there is digitised text, and the funnel is the machine for closing that gap one upload at a time.
What we deliberately do not inflate is the count of scans we have personally walked end to end. Eight real scans were validated through the full pipeline, and three pilot customers signed letters of intent on the strength of that demonstration. Eight is a small number, and saying so is the point. The retention shape in the instrument is an illustrative schematic of where scans drop off in onboarding, not a claim that we have processed thousands of customer files; the corner figures, the eight, the three, the 7,781 and the 136,771, are the ones we stand behind. A funnel drawn on eight validated scans and a reference archive is an argument about the shape of the process, and it is honest precisely because it does not borrow volume it has not earned.
What the operator hears when you say it this way
The reason we onboard customers through a funnel rather than a headline accuracy number is that it changes the conversation from a promise to a plan. An operator who is told the tool is ninety-nine percent accurate will measure it against that on the first folder and be disappointed by the scan with the torn header that never made it to segmentation. An operator who is shown the funnel knows before they upload that some scans will be rejected early for reasons that have nothing to do with the model's skill at reading a curve, that the middle stages are where the hard pixel work happens, and that the final bar is theirs to set. The same eight validated scans become evidence for a process rather than a benchmark to be beaten, and the gap between 136,771 scanned images and 7,781 digitised files becomes a roadmap instead of a reproach.
The survey work on machine learning across upstream oil and gas keeps making the same quiet point, which is that the analytics everyone wants sits downstream of the data preparation almost nobody budgets for [3]. Raster-log digitisation is that unglamorous preparation, and the funnel is how we keep it honest. It does not let a peak number stand in for a process, it does not let a vendor hide the strictness dial, and it does not pretend that eight validated scans are eight thousand. A folder of scanned PDFs goes in, a smaller set of LAS files the customer will actually trust comes out, and the four steps between them are the entire story, told at the resolution it deserves.
Reading a digitisation deployment as a funnel
- A digitisation deployment is honestly described as a four-step funnel, not a single accuracy number: of the raster scans an operator uploads, one fraction clears header track detection, one clears pixel segmentation, one vectorises into a clean depth trace, and one clears the validation bar that turns it into a LAS file the customer trusts.
- Each of the four steps can reject a scan for a different reason. Step one fails on unreadable headers, step two on broken or overprinted thin curves, step three on masks that will not resolve to a single centreline, and step four on fits that do not clear the validation bar. The accuracy debate that dominates most conversations is only the second of four stages.
- The final step is a dial, not a fixed gate. A stricter validation bar delivers fewer LAS files but raises the fit every delivered file must clear, toward the measured peak R-squared of 0.9891 and lowest mean absolute error of 0.0132; a looser bar delivers more and lets weaker fits through. No setting gives both, and the customer should hold the dial.
- The evidence is stated precisely and not inflated: a four-step dashboard, eight real scans validated end to end, three letters of intent from pilot customers, and a reference archive of 7,781 LAS files against 136,771 scanned TIF images. The per-stage retention shape is illustrative; the corner figures are measured.
- Onboarding through a funnel changes the customer conversation from a promise to a plan. It names where scans drop off before they upload, frames the seventeen-to-one gap between scanned images and digitised files as a roadmap, and turns a small set of validated scans into evidence for a process rather than a benchmark to be beaten.
A scanned log is a picture of a decision someone made decades ago, and the customer wants the decision back as numbers, not as an image of numbers. The funnel is simply the most truthful way to tell them how much of their folder will make that journey, and to let them decide, with the bar in their own hand, how sure they want each file to be before it carries their name into a workflow.
References
[1] Canadian Well Logging Society. LAS Version 2.0: A Floppy Disk Standard for Log Data. CWLS Floppy Disk Committee (1992, with later 3.0 revisions). The community specification for the Log ASCII Standard, the depth-indexed text format that a digitised raster log must become before any petrophysical workflow will read it, which is why the end of the onboarding funnel is a LAS file and not just a curve overlay. https://www.cwls.org/products/
[2] McClure, D. Startup Metrics for Pirates: AARRR. 500 Startups (2007, widely reprinted). The acquisition, activation, retention, referral and revenue funnel framing that a deployment should be read stage by stage, with a conversion rate measured at each step rather than collapsed into one number, which is the lens this piece applies to scans clearing each digitisation stage. https://500hats.typepad.com/500blogs/2007/09/startup-metrics.html
[3] Koroteev, D., and Tekic, Z. Artificial Intelligence in Oil and Gas Upstream: Trends, Challenges, and Scenarios for the Future. Energy and AI, Volume 3 (2021). The survey of where machine learning is being adopted across upstream oil and gas, which situates raster-log digitisation among the data-preparation tasks an operator must clear before the analytics it actually wants. https://www.sciencedirect.com/science/article/pii/S2666546820300446
[4] McDonald, A. Data Quality Considerations for Petrophysical Machine Learning Models. Petrophysics, SPWLA (2021). The practitioner argument that missing, mislabelled and out-of-range values in well-log data must be caught before a model is trusted, which is the discipline the validation step at the end of the funnel enforces on every delivered LAS file. https://onepetro.org/petrophysics/article/62/06/585/471941