Case Study

From weeks to days: an NOC's agentic seismic-interpretation loop

A Southeast Asian national oil company compressed exploration interpretation from six weeks to four days using a foundation-model agent that proposes horizon picks, flags uncertainty, and learns from geoscientist corrections.

By Tannistha Maiti

A national oil company in Southeast Asia cut exploration interpretation cycle time from six weeks per prospect to four days — and reduced geoscientist hours by 72% — by deploying an agentic interpretation loop that proposes horizon picks, flags low-confidence zones, and learns from every correction. The bottleneck wasn't compute; it was turning scarce expert judgment into a scalable, consistent asset.

At a glance

Three metrics tell the story of what changed when a human-in-the-loop agent joined the interpretation workflow.

6 weeks → 4 days: interpretation cycle time

−72%: geoscientist hours per prospect

+38 pts: cross-team pick consistency

The challenge

The national oil company operates a diverse exploration portfolio across six basins. Every new prospect begins with a seismic-interpretation exercise: a geoscientist loads a 3D volume, picks key horizons, maps fault networks, and estimates structure. The work is deliberate, detail-heavy, and resistant to shortcuts. A typical prospect took six to eight weeks from acquisition handover to a completed interpretation ready for portfolio ranking.

Time was the obvious problem. With a pipeline of twenty-plus prospects in queue and a geoscience team of eleven, the backlog stretched into quarters. But the deeper issue was consistency. Three geoscientists interpreting the same volume would produce three subtly different fault maps and horizon surfaces. When leadership tried to rank prospects basin-wide, the variance in structural closure estimates made apples-to-apples comparison impossible.

The exploration manager framed the brief plainly: we need to interpret faster, and we need the interpretations to agree with each other when the geology is the same. Manual quality control wasn't scaling, and hiring more geoscientists wouldn't solve the consistency gap.

What we did

We built an agentic interpretation loop: a foundation model fine-tuned on the basin's own labeled seismic volumes, wrapped in a review agent that proposes picks, assesses its own confidence, and routes uncertain sections back to a human geoscientist.

The system works in three passes. First, the model ingests a 3D seismic volume and generates a complete interpretation: horizon surfaces for five key stratigraphic markers and a fault-network skeleton. Second, the review agent scores every inline and crossline pick for geologic plausibility — does the horizon stay continuous across faults, does the dip magnitude match adjacent sections, are there polarity flips that suggest mis-ties. Any segment scoring below a confidence threshold gets flagged. Third, a geoscientist reviews only the flagged zones, accepts or corrects the picks, and commits the edits. Those corrections flow back into the training loop, so the model learns the interpreter's judgment over time.
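One of the plausibility checks described above, agreement with adjacent sections, can be illustrated with a minimal Python sketch. It is not the deployed review agent: the function names, the 8 ms decay scale, and the 0.5 threshold are hypothetical choices made for illustration, and the score only measures how far each pick sits from the mean of its four neighbors.

```python
import numpy as np

def score_picks(horizon_ms, scale_ms=8.0):
    """Return a confidence in (0, 1] for every pick on a horizon grid.

    horizon_ms: 2D array (inline x crossline) of picked two-way times in ms.
    scale_ms:   hypothetical tuning constant; larger values tolerate bigger
                departures from neighboring picks before confidence decays.
    """
    h = np.asarray(horizon_ms, dtype=float)
    # Repeat edge values so border picks still have four neighbors.
    p = np.pad(h, 1, mode="edge")
    neighbor_mean = (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]) / 4.0
    # Residual: how far each pick departs from its local neighborhood.
    residual = np.abs(h - neighbor_mean)
    return np.exp(-residual / scale_ms)

def flag_low_confidence(horizon_ms, threshold=0.5, scale_ms=8.0):
    """Boolean mask of picks whose confidence falls below the threshold."""
    return score_picks(horizon_ms, scale_ms) < threshold

# A flat horizon with one 40 ms spike: the spike and its four neighbors are flagged.
horizon = np.full((5, 5), 1000.0)
horizon[2, 2] = 1040.0
flags = flag_low_confidence(horizon)
```

A real scorer would also have to recognise genuine fault offsets as legitimate discontinuities, which this sketch would flag, and would combine several checks (continuity, dip, polarity) rather than one.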

The fine-tuning corpus came from twelve legacy prospects the team had already interpreted by hand, with roughly 140 labeled inlines and 95 labeled crosslines per volume. We augmented it with synthetic fault geometries to teach the model how to handle throw variability and relay ramps, features under-represented in the labeled set.
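To make the idea of a synthetic fault with variable throw concrete, here is a small Python sketch that offsets one side of a labeled horizon. It is an illustration only: the half-sine throw profile, the straight fault trace along a single crossline, and the function name are assumptions, and the sketch deforms the horizon label alone, not the seismic amplitudes that an actual augmentation pipeline would also need to shift.

```python
import numpy as np

def add_synthetic_fault(horizon_ms, fault_xline, max_throw_ms):
    """Offset the hanging wall of a horizon by a throw that varies along strike.

    horizon_ms:   2D array (inline x crossline) of two-way times in ms.
    fault_xline:  crossline index of the (assumed straight) fault trace.
    max_throw_ms: peak vertical offset, reached midway along the fault.
    Returns the faulted copy and the per-inline throw profile.
    """
    faulted = np.array(horizon_ms, dtype=float)  # copy; input is left untouched
    n_inline = faulted.shape[0]
    # Assumed half-sine profile: zero throw at both fault tips, peak in the middle.
    along_strike = np.linspace(0.0, 1.0, n_inline)
    throw = max_throw_ms * np.sin(np.pi * along_strike)
    # Push the hanging wall (crosslines at and beyond the fault) down in time.
    faulted[:, fault_xline:] += throw[:, None]
    return faulted, throw

base = np.zeros((5, 4))
faulted, throw = add_synthetic_fault(base, fault_xline=2, max_throw_ms=10.0)
```

Relay ramps, where two overlapping fault segments transfer displacement between them, are not modeled here; they would need at least two such segments with offset traces.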

The three-pass interpretation loop

  1. Model proposes full interpretation

    Horizons + fault network for the entire 3D volume

  2. Review agent scores confidence

    Flags low-plausibility picks and routes uncertain zones to human review

  3. Geoscientist corrects flagged zones

    Edits feed back into fine-tuning for the next prospect
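The three passes can be sketched as a small driver function. This is a hedged outline, not the production system: `propose`, `score`, and `review` are hypothetical callables standing in for the fine-tuned model, the review agent, and the geoscientist, and the 0.5 threshold is an arbitrary placeholder.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Pick = Tuple[int, int]  # (inline, crossline) position of a horizon pick

@dataclass
class LoopResult:
    picks: Dict[Pick, float]      # final pick times in ms
    flagged: List[Pick]           # picks routed to human review
    corrections: Dict[Pick, float] = field(default_factory=dict)

def run_interpretation_loop(
    propose: Callable[[], Dict[Pick, float]],
    score: Callable[[Pick, Dict[Pick, float]], float],
    review: Callable[[Pick, float], float],
    threshold: float = 0.5,
) -> LoopResult:
    # Pass 1: the model proposes a full interpretation.
    picks = dict(propose())
    # Pass 2: the review agent scores every pick and flags the uncertain ones.
    flagged = [p for p in picks if score(p, picks) < threshold]
    # Pass 3: a human reviews only the flagged picks; edits are recorded.
    corrections: Dict[Pick, float] = {}
    for p in flagged:
        reviewed = review(p, picks[p])
        if reviewed != picks[p]:
            corrections[p] = reviewed
            picks[p] = reviewed
    return LoopResult(picks, flagged, corrections)

# Toy run with stub callables in place of the model, agent, and interpreter.
proposed = {(0, 0): 1000.0, (0, 1): 1002.0, (0, 2): 1500.0}
confidence = {(0, 0): 0.9, (0, 1): 0.8, (0, 2): 0.1}
result = run_interpretation_loop(
    propose=lambda: proposed,
    score=lambda p, picks: confidence[p],
    review=lambda p, t: 1004.0,
)
```

In this outline the `corrections` mapping is what would be handed to the next fine-tuning run; how the corrections are actually batched and weighted is not described in the case study.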

The outcome

The first production prospect took nine days end-to-end — the geoscientist spent four hours reviewing flags, made corrections on 11% of the proposed picks, and signed off. By the third prospect, cycle time dropped to four days and review hours to ninety minutes. The model had learned the interpreter's style: where she expected horizons to pinch out near salt diapirs, how she handled ambiguous fault terminations.

Cross-team consistency improved by 38 percentage points, measured as the mean Jaccard index (intersection over union) between independently reviewed interpretations of the same test volume. The variance that used to make portfolio ranking subjective now sat within the noise floor of seismic resolution itself.
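A consistency score of this kind can be computed as the mean pairwise Jaccard index across interpreters. The Python sketch below shows one way to do it; how the team actually converted horizon surfaces into comparable sets is not stated, so the voxel quantisation and the 4 ms sample interval are assumptions.

```python
from itertools import combinations
import numpy as np

def horizon_to_voxels(horizon_ms, sample_ms=4.0):
    """Quantise a horizon grid into a set of (inline, crossline, sample) voxels."""
    h = np.asarray(horizon_ms, dtype=float)
    samples = np.rint(h / sample_ms).astype(int)
    return {
        (i, j, int(samples[i, j]))
        for i in range(h.shape[0])
        for j in range(h.shape[1])
    }

def jaccard(a, b):
    """Intersection over union of two sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def mean_pairwise_jaccard(horizons, sample_ms=4.0):
    """Average Jaccard index over every pair of interpretations."""
    if len(horizons) < 2:
        raise ValueError("need at least two interpretations to compare")
    sets = [horizon_to_voxels(h, sample_ms) for h in horizons]
    pairs = list(combinations(sets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Three toy interpretations of a 2 x 2 grid; the second differs at one pick.
a = np.full((2, 2), 1000.0)
b = a.copy()
b[0, 0] = 1008.0
c = a.copy()
consistency = mean_pairwise_jaccard([a, b, c])
```

Under this definition, a gain in percentage points is the difference between the score after deployment and the score before it, multiplied by 100.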

Geoscientist hours per prospect fell 72%. The senior interpreter described the change as freeing her to do geology again instead of digitizing reflectors. She now spends her time on the 20% of picks where geologic judgment matters — near-salt deformation, subtle stratigraphic onlap — and lets the agent handle the mechanical 80%.

Where the time went

The 72% reduction in geoscientist hours didn't come from faster clicking. It came from eliminating the first-pass drudgery — loading the volume, scrolling through 400 inlines, placing seed points, interpolating between lines — and starting the human review with a complete, plausible interpretation already on screen.

What this unlocked

Faster interpretation was the visible win, but the downstream changes mattered more. With cycle time compressed, the exploration team could run sensitivity cases — re-interpret a prospect under different velocity models, test alternative fault correlations — without burning weeks of calendar time. Portfolio ranking moved from annual to quarterly, and the geoscience team could feed updated risk assessments into capital allocation before drilling schedules locked.

The consistency gains made cross-basin comparison credible for the first time. Leadership could finally ask which basin offers the best risk-adjusted return and trust that the structural-closure estimates feeding that answer were built on the same interpretation rules.

The learning loop meant the model improved with use. Six months in, the share of proposed picks needing correction had dropped from 11% to 4%. The agent had internalized not just seismic character but the interpreter's decision heuristics: when to honor a weak reflector, when to call a fault sealed. The geoscientist's corrections became the training data that made her own reviews faster next time.

Lessons and next steps

Three lessons stand out. First, confidence scoring is not optional. An agent that proposes picks without flagging uncertainty forces the human to review everything, which collapses the workflow back into manual QC at scale. The review agent's ability to say "I'm unsure here" is what made the time savings real.
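The trade-off behind this lesson can be shown with a short Python sketch: for each candidate threshold it reports how much of the interpretation a human would have to review and what share of the genuinely wrong picks would be caught. The function name and the toy numbers are illustrative assumptions, not figures from the project.

```python
import numpy as np

def review_tradeoff(confidence, is_wrong, thresholds):
    """For each threshold, return (threshold, review_fraction, caught_fraction).

    confidence: per-pick confidence scores in [0, 1].
    is_wrong:   boolean per pick, True where the proposed pick was incorrect
                (known only on a labeled validation volume).
    """
    confidence = np.asarray(confidence, dtype=float)
    is_wrong = np.asarray(is_wrong, dtype=bool)
    rows = []
    for t in thresholds:
        flagged = confidence < t
        # Share of all picks the geoscientist would have to look at.
        review_fraction = float(flagged.mean())
        # Share of wrong picks that land in the review queue.
        if is_wrong.any():
            caught = float((flagged & is_wrong).sum() / is_wrong.sum())
        else:
            caught = 1.0
        rows.append((t, review_fraction, caught))
    return rows

# Toy example: four picks, the two least confident ones are wrong.
rows = review_tradeoff(
    confidence=[0.1, 0.4, 0.6, 0.9],
    is_wrong=[True, True, False, False],
    thresholds=[0.0, 0.5, 1.01],
)
```

A threshold of zero flags nothing and catches nothing, while a threshold above the maximum score sends every pick to review, which is the collapse into full manual QC described above; a useful threshold sits between the two.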

Second, the fine-tuning corpus needs to match the production task. Seismic foundation models pre-trained on global datasets are fluent in seismic texture, but they don't know how this basin's carbonates behave near basement highs or how this team handles fault-shadow artifacts. Twelve hand-labeled prospects were enough to teach basin-specific and interpreter-specific priors.

Third, the win wasn't replacing the geoscientist. It was giving them a tireless first-pass partner so their judgment went to the 20% that needed it. The senior interpreter still signs every interpretation. She just doesn't spend forty hours getting to the signature line anymore.

The exploration team is now extending the agent to well ties — correlating seismic horizons to formation tops in wellbore data — and to amplitude-variation-with-offset attribute analysis. The same human-in-the-loop pattern applies: the agent proposes, flags uncertainty, and learns from corrections. The bottleneck is shifting from how fast we can interpret to how fast we can decide what to drill.

Before and after

The metric that matters most to the exploration manager is simple: how long from seismic handover to a portfolio-ready interpretation. The agentic loop collapsed that timeline by an order of magnitude, and the time saved went straight into better decisions.

Interpretation cycle time per prospect

Before: 6 weeks

After: 4 days

References

1 Outcome metrics (interpretation cycle time, geoscientist hours, consistency improvement) provided by client exploration team during post-deployment review, Q3 2024.

2 Cross-team consistency measured as the mean pairwise Jaccard index (intersection over union) of horizon surfaces interpreted independently by three geoscientists on a 20 km² test volume, before and after agent deployment.

EarthScan
Continuous AI for explorers

info@earthscan.io

© 2026 Copyright. Earthscan