Beyond the wellbore: an AI feasibility note on lignin and bio-inspired materials

A research note on how deep learning can quantify biomass heating values and discover bio-inspired materials at the atomic scale — from a feasibility study EarthScan's research arm explored on lignin, the second-most-abundant biopolymer on Earth.

by Tarry Singh · 7 min read
  • machine-learning
  • deep-learning
  • biomass
  • lignin
  • materials-science
  • ai-research

Abstract

Most of EarthScan's published work is in the wellbore — raster log digitisation, well-to-well correlation, and curve correction at scale. This is a note about a parallel research thread: applying deep learning to atomic-scale modelling of lignin, the second-most-abundant biopolymer on Earth and a structurally complex feedstock that has been chronically under-utilised in the energy transition.

This is a feasibility note, not a results paper — written from the perspective of having scoped the problem and decided where the leverage is. If you're a CTO, VP of technology, or research lead asking "where should AI sit in our biomass / bio-materials roadmap?", this should clarify the technical surface area and the data realities you'll hit.

Why lignin

Lignin is what holds wood together. After cellulose, it's the most abundant natural polymer on the planet — roughly 100 million tonnes per year are produced as a waste stream from paper and ethanol manufacturing. The market hovers around $0.9–1.0 billion today and is forecast to grow at a low-single-digit CAGR through 2026.

Most of that lignin is incinerated for low-grade heat. Less than 2% is sold as a structural input — into dispersants, adhesives, surfactants, and a small (but growing) sliver of advanced uses: high-performance carbon fibre, vanillin, phenolic resins, polyurethane substitutes.

The economics tell you something is broken. The chemistry tells you why: lignin is a heterogeneous, branched polyaromatic macromolecule whose structure depends on species, soil, climate, age, and processing route (kraft, organosolv, lignosulphonate). There is no single "lignin" to design around — there's a vast structural family, and the properties you care about for any given downstream application sit in the tail of that distribution.

That heterogeneity has historically been the moat. It's also exactly where AI starts to have a real edge.

Where AI fits — two concrete tasks

We scoped two specific tasks that, if cracked, change the lignin economics:

1. Quantify Higher Heating Value (HHV) without a calorimeter

HHV is the energy released per unit mass of fuel on complete combustion, including the latent heat of the water vapour formed — and it answers the single most-asked question about any biomass: how much energy does this batch hold? Today the answer comes from a bomb calorimeter run, which is precise but slow, expensive, and per-sample.

The shortcut, demonstrated repeatedly in the literature, is to predict HHV directly from proximate and ultimate analyses (volatile matter, fixed carbon, ash, ultimate C/H/N/O/S composition) — analyses that are cheaper, faster, and more readily available across an existing processing pipeline. Classical regressions get to acceptable error bands; deep learning consistently does better, especially in the long tail of unusual feedstocks.
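To make the classical baseline concrete, the widely cited Channiwala–Parikh correlation maps ultimate analysis straight to HHV; a deep model plays the same role with a learned, nonlinear function of the same inputs. A minimal sketch — the example composition is an illustrative, hypothetical lignin sample, not a measured one:

```python
def hhv_channiwala_parikh(c, h, s, o, n, ash):
    """Channiwala–Parikh correlation: HHV in MJ/kg (dry basis)
    from ultimate analysis in wt%."""
    return (0.3491 * c + 1.1783 * h + 0.1005 * s
            - 0.1034 * o - 0.0151 * n - 0.0211 * ash)

# Illustrative (hypothetical) kraft-lignin composition, wt% dry basis
hhv = hhv_channiwala_parikh(c=63.0, h=6.0, s=1.5, o=28.0, n=0.2, ash=1.3)
print(f"estimated HHV ≈ {hhv:.1f} MJ/kg")  # ≈ 26.3 MJ/kg
```

A deep model replaces this fixed linear form with a learned mapping over the same inputs plus proximate analysis, which is where the gains in the long tail of unusual feedstocks come from.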

For us, the interesting variant is temporal HHV prediction — using the time series of a gasification process to predict CH₄, CO, CO₂, H₂, and HHV outputs concurrently. That turns HHV from a static lab metric into a control-room signal.
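The input/output contract of that temporal variant can be sketched as a lag window of process sensors mapped to the five outputs at once. Everything here — dimensions, sensor count, and the closed-form ridge readout standing in for the model — is illustrative; a production system would use a recurrent or temporal-convolution network over the same shapes:

```python
import numpy as np

# Toy sketch: predict [CH4, CO, CO2, H2, HHV] one step ahead from a lag
# window of gasifier sensors, with a ridge-regularised linear readout.
rng = np.random.default_rng(0)
T, n_sensors, lags, n_targets = 500, 8, 10, 5

X_raw = rng.normal(size=(T, n_sensors))                # sensor time series
W_true = rng.normal(size=(n_sensors * lags, n_targets))

def lag_window(X, lags):
    """Stack the previous `lags` sensor readings into one feature row."""
    return np.stack([X[t - lags:t].ravel() for t in range(lags, len(X))])

F = lag_window(X_raw, lags)                            # (T - lags, n_sensors * lags)
Y = F @ W_true + 0.1 * rng.normal(size=(len(F), n_targets))  # synthetic targets

lam = 1e-2                                             # ridge penalty, closed form
W = np.linalg.solve(F.T @ F + lam * np.eye(F.shape[1]), F.T @ Y)
pred = F @ W                                           # all five outputs per step
r2 = 1.0 - ((Y - pred) ** 2).sum() / ((Y - Y.mean(axis=0)) ** 2).sum()
```

The point is the shape of the problem: one window in, five concurrent signals out, refreshed every control-loop tick rather than once per lab sample.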

2. Atomic-scale design of bio-inspired materials

This is the more speculative half, and the one with the bigger ceiling.

Lignin's value beyond combustion comes from its functional groups — the polyaromatic backbone is a starting structure that, with the right modifications, becomes a polyurethane substitute, a carbon-fibre precursor, a controlled-release pesticide carrier. Discovering those modifications has historically meant atomic-scale modelling of potential energy surfaces (PES) — costly, slow, and prone to combinatorial blow-up.

The shift in the last few years is that deep neural networks can approximate the PES function directly. Once you have a fast neural-network surrogate, the inner loop of materials design (propose candidate → evaluate energetics → iterate) collapses by orders of magnitude. Add multimodal inputs — quantum-chemistry datasets, microstructural images, and processing-condition logs — and you have something that resembles an actual research accelerator rather than a tool that makes existing chemists marginally faster.

What's actually hard

The brochure version of "AI for materials" rarely survives contact with the data. Three honest constraints from the scoping work:

Data is the bottleneck, not models. The published lignin datasets are small, fragmented across labs and processing routes, and inconsistent in their analytical methodology. Any model trained on one lab's organosolv lignin generalises poorly to another lab's kraft lignin without thoughtful domain adaptation. The expensive part of this work is not the deep learning — it's the data pipeline that ingests, normalises, and labels samples across multiple sources of truth.

The expensive parts of materials science are still expensive. Neural surrogates for PES are fast at evaluation time, but training them still requires high-quality quantum-chemistry data that someone had to produce. The cost shifts; it doesn't vanish. A realistic deployment buys you 10–100× speedup in the inner loop, conditional on having paid the upstream training cost (or borrowed it from a foundation model that someone else trained).

Validation isn't optional. A neural network can suggest a candidate material; that candidate still has to be synthesised and characterised in a lab. Closing that loop — propose → synthesise → characterise → feed back — is where pilots become products. Most projects stall because the synthesis side runs out of budget before the model has seen enough negative examples.

Why this matters for the energy transition

The EU's renewable-energy targets are not negotiable; the path to hitting them is. Forest biomass, agricultural residues, and pulp-mill side streams are abundant, geographically distributed, and politically easier to deploy than new offshore wind. The bottleneck is value extraction — turning a heterogeneous waste stream into a predictable, priceable input for downstream chemistry.

That's a data-and-decision problem before it's a chemistry problem.

If a paper mill could (a) characterise its lignin output stream in real time, (b) predict the HHV and chemical fingerprint of each batch, and (c) route batches to the right downstream buyer (carbon-fibre, adhesives, biofuels) automatically — the unit economics of the entire biorefinery shift. None of that requires a single new molecule. It requires the data plumbing and the prediction layer that sits between process sensors and commercial routing.
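The routing layer in (c) can start as a rules-plus-scores function over the predicted batch fingerprint. Everything in this sketch — the thresholds, the buyer categories, the fingerprint fields — is hypothetical and for illustration only:

```python
def route_batch(hhv_mj_kg: float, purity_pct: float, sulphur_pct: float) -> str:
    """Route a predicted lignin batch to a downstream buyer.
    All thresholds are illustrative placeholders, not process data."""
    if purity_pct >= 95 and sulphur_pct < 1.0:
        return "carbon-fibre"        # highest value, tightest spec
    if purity_pct >= 85:
        return "adhesives"
    if hhv_mj_kg >= 24:
        return "biofuel"
    return "combustion"              # fallback: burn for process heat

print(route_batch(hhv_mj_kg=26.3, purity_pct=96.5, sulphur_pct=0.4))
# → carbon-fibre
```

The prediction layer's job is to make the inputs to a function like this trustworthy per batch and in real time; the routing logic itself is commercial, not machine learning.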

Which sounds, frankly, exactly like what we've spent the last few years building for the wellbore.

Where this leaves us

Project MIMEA, the feasibility study this note draws from, was explicitly a go / no-go exercise. We scoped the technical surface, the data realities, and the commercial fit — and concluded that the opportunity is real but the right sequencing is HHV-first (narrow, measurable, deployable in 6–12 months) and bio-inspired materials second (broader ceiling but longer cycle and synthesis-dependent).

We don't have a lignin product on the EarthScan roadmap today. We do have a research thread, and the in-flight portfolio (subsurface AI, multimodal processing, agentic workflows for energy operators) is closer to this work than it might look. Most of the techniques that matter for biomass — multimodal models, neural surrogates for expensive simulators, careful domain adaptation across heterogeneous data sources — are exactly the techniques we use on raster logs and well-to-well correlation.

If you're at an energy operator, biorefinery, or pulp/paper company thinking about where AI fits in your biomass roadmap, the conversation is one we'd genuinely enjoy having.

Notes & references

This note draws on a 2021 SNN feasibility study by Real AI B.V. (the research arm in DeepKapha's group, which also produces EarthScan). A more comprehensive whitepaper covering the methodology and the full go/no-go reasoning is in preparation. Key external references for the technical framing:

  1. Ghugare, S. B., Tiwary, S., Elangovan, V., & Tambe, S. S. (2014). Prediction of higher heating value of solid biomass fuels using artificial intelligence formalisms. BioEnergy Research, 7(2), 681–692.
  2. Valim, I. C., Rego, A. S., Queiroz, A., Brant, V., Neto, A. A., Vilani, C., & Santos, B. F. (2018). Use of Artificial Intelligence to Experimental Conditions Identification in the Process of Delignification of Sugarcane Bagasse from Supercritical Carbon Dioxide. In Computer Aided Chemical Engineering (Vol. 43, pp. 1469–1474).
  3. Hiraide, K., Hirayama, K., Endo, K., & Muramatsu, M. (2021). Application of deep learning to inverse design of phase separation structure in polymer alloy. Computational Materials Science, 190, 110278.
  4. Zhai, C., Li, T., Shi, H., & Yeo, J. (2020). Discovery and design of soft polymeric bio-inspired materials with multiscale simulations and artificial intelligence. Journal of Materials Chemistry B, 8(31), 6562–6587.
  5. Lignin market sizing — Global Market Insights (2020), Europe Lignin Market Research Report (2020).

info@earthscan.io
