Operators are spending $2.4 billion annually on subsurface studies that take 6–18 weeks — seismic foundation models now compress that to hours, on the same hardware.
Why this matters
For three decades, seismic interpretation has been a per-basin, per-project, per-interpreter craft. A team licenses a workstation, loads a 3D survey, trains a bespoke classifier on a few thousand hand-picked labels, and ships a horizon map six months later. The work is good. It is also unrepeatable, uninspectable, and expensive — Wood Mackenzie pegs the global subsurface-study spend at roughly $2.4 billion per year [1].
That economic model is breaking. Pre-trained transformers on 3D seismic volumes are doing for the subsurface what GPT-class models did for text: one model, many tasks, fine-tuned in hours instead of trained from scratch in months. The interesting wrinkle for the executive reader is that the bottleneck has shifted. It is no longer the geoscience. It is the data pipeline — and the person who controls the pipeline is the Chief Data Officer.
The CDO who controls the seismic training pipeline controls the exploration edge — that is now a data infrastructure decision, not a geoscience one.
The current state
The numbers tell a consistent story. Subsurface studies still dominate exploration budgets, and the cycle time has barely moved in twenty years. Interpreters spend weeks on tasks that the underlying math could do in minutes if the data were addressable at cloud scale — which, until recently, it was not.
The subsurface-study economics, today
Annual global subsurface-study spend [1]
Typical interpretation cycle per survey [2]
Interpreter FTE reduction per project, AI-assisted [3]
Raised by seismic-AI vendors in 2024 [4]
Seismic data is stored in SEG-Y, a tape-era format from 1975 that assumes sequential read on a single machine. Loading a single 3D survey into a distributed training job means custom shims, brittle ETL, and I/O costs that swamp the GPU bill. That is why, despite every major operator running an AI program for the better part of a decade, almost none have trained a foundation-scale model on their own seismic library.
What changed
Three things happened in parallel, and they only matter together.
First, pre-trained transformers proved out on 3D seismic volumes. Rather than training a fresh CNN per basin, teams now fine-tune a single backbone — pre-trained on hundreds of public and licensed surveys — for horizon picking, fault detection, facies classification, and salt-body segmentation. Generalisation across basins, which used to be the wall every academic paper hit, is now the default behaviour.
Second, MDIO — a cloud-native, chunked, Zarr-backed seismic format — replaced SEG-Y for training workloads. The I/O bottleneck that made distributed seismic training economically impossible has been solved at the file-system layer. A 1 TB survey that took hours to stage on a workstation now streams to a GPU cluster on demand.
Third, global fine-tune workflows have collapsed interpretation cycles from weeks to hours. Equinor's 2025 pilot reported sub-four-hour cycle time on surveys that previously took six to eighteen weeks [2]. The fine-tune happens on a fraction of the labels a bespoke model would have needed, because the backbone already knows what a fault looks like.
The new seismic interpretation stack
Pre-trained backbone
Transformer trained on hundreds of 3D volumes — basin-agnostic features.
MDIO storage layer
Cloud-native, chunked seismic — streams directly to GPU.
Global fine-tune
Hours, not weeks. Small label budget. Reproducible.
Asset-team delivery
Horizons, faults, facies, salt — single model, multiple heads.
Implications for the executive
The implication is uncomfortable for a lot of org charts. If a foundation model fine-tuned on your seismic library out-performs a bespoke per-basin classifier — and it does — then the competitive moat is no longer the interpretation team. It is the training corpus, the labelling discipline, and the pipeline that gets data from tape archive to GPU. All three of those are CDO problems, not chief-geoscientist problems.

The reframe
This does not eliminate the geoscientist. It re-scopes the role. dGB Earth Sciences reports a 40% reduction in interpreter FTE per project on AI-assisted workflows [3] — the remaining 60% spend their time on geological reasoning, prospect ranking, and edge cases the model flags as low-confidence. That is a higher-leverage job, and a smaller team doing it.
The capital is following. Seismic-AI vendors raised roughly $180M in 2024 [4], and the round sizes are accelerating into 2025 as operators move from pilots to production. The operators who win the next exploration cycle will be the ones who treated their seismic archive as a training asset rather than a project artifact.
What happens next
The CDO who controls the seismic training pipeline controls the exploration edge — and that is a data infrastructure decision, not a geoscience one. Three concrete moves separate the operators who will benefit from the operators who will watch from the sidelines.
What happens next
Inventory the corpus
Catalog every 3D survey, every interpretation, every label. Treat them as training assets, not project deliverables.
Migrate to MDIO
Move active surveys off SEG-Y and onto a cloud-native, chunked format. The I/O cost difference at training scale is order-of-magnitude.
Run a fine-tune pilot
Pick one basin, one task, one backbone. Benchmark cycle time against your current bespoke workflow. The result will tell you what the next budget cycle looks like.
References
[1] Wood Mackenzie. Global subsurface study spend, upstream exploration benchmark (2024).
[2] Equinor. Seismic foundation-model pilot, internal cycle-time report (2025).
[3] dGB Earth Sciences. AI-assisted interpretation productivity benchmark.
[4] Pitchbook. Seismic-AI vendor funding totals (2024).