Shipping a Product Website and Analytics While the Model Trained

The mistake we almost made was one of ordering. On a sixteen-week accelerated envelope, split into four project blocks, it is very easy to decide that the website is what you build once the model works. The model was VeerNet, our curve-segmentation network for raster well logs, and its multiclass builds trained in cycles of about ten hours: five hundred and fifty minutes for fifty epochs on the fifteen-thousand-log synthetic set. If the marketing site and its analytics were a downstream task of the model, then every one of those training cycles was something the go-to-market surface had to wait behind. Add up four blocks of that, plus the reruns you always need when a loss function surprises you, and the site would have started late, shipped later, and arrived at launch as an afterthought that nobody had measured. We chose instead to make the site its own track, with its own owners and its own schedule, running beside the training the whole way. This is the account of that choice and what it actually took.

Why the website could not be a downstream task

There is a version of a product launch where the sequence is clean: prove the model, then present it. It falls apart the moment the clock is real. Training is not a step you clear once; it is a recurring, GPU-bound event that reappears every block, and each recurrence is long enough to feel in a schedule. If the site inherits that cadence, it inherits the waiting. Worse, the waiting is not even productive, because the model getting better does not teach you how to explain it, how to lay out a pricing page, or which events on a landing page predict that a reader will ask for a pilot. Those are separate problems with separate expertise, and they do not get solved faster by staring at a training curve.

So we treated the go-to-market surface as a parallel deliverable with a hard rule attached: nothing on the site track was allowed to declare a dependency on a training run finishing. The site needed real copy, a working information architecture, performance that did not embarrass us, and analytics that could tell a true story about who arrived and what they did. Not one of those needs the model to have converged. They need decisions, and decisions do not queue behind epochs.
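One way to make that rule checkable rather than aspirational is to write the plan down as a task graph and reject it if any site-track task reaches a training run, even indirectly. The sketch below assumes a hypothetical plan format; the task names, the `track` and `kind` fields, and the `find_violations` helper are all illustrative and are not drawn from the engagement.

```python
# Hypothetical plan: every task name, track label, and dependency is illustrative.
PLAN = {
    "train_block_1": {"track": "model", "kind": "training", "depends_on": []},
    "train_block_2": {"track": "model", "kind": "training", "depends_on": ["train_block_1"]},
    "site_copy": {"track": "site", "kind": "content", "depends_on": []},
    "site_ia": {"track": "site", "kind": "design", "depends_on": ["site_copy"]},
    "tracking_plan": {"track": "site", "kind": "analytics", "depends_on": ["site_ia"]},
}


def reaches_training(plan, task, seen=None):
    """Return True if `task` depends, directly or transitively, on a training task."""
    seen = set() if seen is None else seen
    for dep in plan[task]["depends_on"]:
        if dep in seen:
            continue  # already explored without finding a training task
        seen.add(dep)
        if plan[dep]["kind"] == "training" or reaches_training(plan, dep, seen):
            return True
    return False


def find_violations(plan):
    """List site-track tasks that would end up waiting on a training run."""
    return sorted(
        name
        for name, task in plan.items()
        if task["track"] == "site" and reaches_training(plan, name)
    )


print(find_violations(PLAN))  # an empty list means the rule holds for this plan
```

A check like this could run whenever the plan changes, so that a dependency on a training run has to be argued for explicitly instead of slipping in through an intermediate task.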

Two tracks, one launch

The exhibit below is the schedule as we ran it, and as we could have run it. Two swimlanes cross the sixteen-week envelope. The upper lane is the model: four blocks, each carrying its ten-hour training cycle, the long serial events. The lower lane is the site and its analytics instrumentation, staffed at six full-time people on the accelerated track, running continuously across the same weeks. The lever is the whole argument. Leave it on Parallel and the two lanes converge on one launch line at week sixteen, which is what happened. Flip it to Serial, force the site to wait until the last training cycle clears, and the launch line slides out to the right. The orange band that opens is the go-to-market time we would have burned had we let the website be a serial dependency of the model instead of a track of its own.

Exhibit: two swimlanes over the 16-week accelerated envelope. The MODEL lane carries the four project blocks and their discrete ~10-hour (550-minute) multiclass training cycles, the long GPU-bound events. The SITE lane carries the marketing-site build and its analytics event instrumentation, staffed at 6 FTE, running continuously and independently across the same weeks. In Parallel, the two lanes run side by side and converge on one launch line at week 16, which is the real schedule. In Serial, the site work queues until the last training cycle has cleared, and the orange band that opens measures the go-to-market weeks that would have been lost had the site been a serial dependency of the model rather than its own track. The 16-week envelope, the four blocks, the 6 FTE, and the 550-minute training cycle are sourced from the engagement archive; the within-week placement of each block and each site segment, and the size of the serial spill, are illustrative scheduling shape, not measured durations.

The point of the picture is not that parallelism is clever. It is that the two kinds of work are genuinely independent, and the only thing that ties them together is the launch date. Recognising that early is what let six people build a real site without ever asking the GPU queue for permission.
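The lever reduces to a line of arithmetic. In Parallel, launch lands when the slower lane finishes; in Serial, the site's duration is added after the model's. What follows is a minimal sketch under stated assumptions: only the 16-week envelope comes from the archive, while the site build duration and the week at which the last training cycle clears are hypothetical values chosen to show the shape, so the spill it computes is illustrative in the same way the orange band is.

```python
ENVELOPE_WEEKS = 16                 # from the engagement archive
MODEL_DONE_WEEK = ENVELOPE_WEEKS    # assumption: last training cycle clears at the end of the envelope
SITE_BUILD_WEEKS = 10               # hypothetical duration, for illustration only


def launch_week(mode, model_done=MODEL_DONE_WEEK, site_weeks=SITE_BUILD_WEEKS):
    """Week at which both the model and the site are ready."""
    if mode == "parallel":
        # Both lanes start at week 0, so launch waits only for the slower one.
        return max(model_done, site_weeks)
    if mode == "serial":
        # Site work cannot start until the model lane has finished.
        return model_done + site_weeks
    raise ValueError(f"unknown mode: {mode!r}")


def serial_spill(model_done=MODEL_DONE_WEEK, site_weeks=SITE_BUILD_WEEKS):
    """Weeks of go-to-market time lost by running the site after the model."""
    return launch_week("serial", model_done, site_weeks) - launch_week(
        "parallel", model_done, site_weeks
    )


print(launch_week("parallel"), launch_week("serial"), serial_spill())
```

In this simple model the spill always equals the shorter of the two durations, which is why the cost of serialising grows with every week of site work that could have overlapped with training.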

What shipped on the site track

The website was not a placeholder. It carried the honest version of what VeerNet did: raster well logs in, depth-indexed curves out, with the real performance numbers where a buyer would look for them rather than buried. Writing that copy was work we could do the day we understood the pipeline, months before the final checkpoint existed, because the story of the product does not change when the fourth decimal of an accuracy metric does. When later training blocks moved a number, we changed a number on a page. The structure around it, the page that argued the product was worth a pilot, had been standing and load-tested for weeks.

Performance was a first-class item on this track, not a cleanup at the end. A site that a technical buyer bounces off because it is slow has failed regardless of how good the model behind it is, so we held the front end to the field-oriented web performance metrics that were becoming the shared language for a healthy page: the largest contentful paint, the responsiveness to input, and the visual stability of the layout as it loaded [1]. Those are engineering targets you can hit in week four. They do not improve by waiting for a model.
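A performance budget of this kind can be expressed as a small check. The sketch below assumes the thresholds published in the Web Vitals guidance [1], assessed at the 75th percentile of field samples as that guidance recommends. First Input Delay stands in for responsiveness, as it did in 2020; it has since been superseded by Interaction to Next Paint, which uses different thresholds. The function names and the sample values in the usage line are invented for illustration.

```python
import math

# (good_max, needs_improvement_max) per metric, per the Web Vitals guidance [1].
THRESHOLDS = {
    "lcp_ms": (2500, 4000),  # Largest Contentful Paint, milliseconds
    "fid_ms": (100, 300),    # First Input Delay, milliseconds (2020 responsiveness metric)
    "cls": (0.1, 0.25),      # Cumulative Layout Shift, unitless
}


def percentile_75(samples):
    """Nearest-rank 75th percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = math.ceil(0.75 * len(ordered))
    return ordered[rank - 1]


def rate(metric, samples):
    """Classify a metric as 'good', 'needs improvement', or 'poor' at p75."""
    value = percentile_75(samples)
    good_max, needs_improvement_max = THRESHOLDS[metric]
    if value <= good_max:
        return "good"
    if value <= needs_improvement_max:
        return "needs improvement"
    return "poor"


# Invented field samples, for illustration only.
print(rate("lcp_ms", [1800, 2100, 2400, 3900]))
```

Rating on the 75th percentile rather than the mean keeps a handful of fast loads from hiding a slow experience for a quarter of visitors.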

Instrumenting before there was traffic to measure

The analytics were the part most people expect to defer, and deferring them is the classic own goal. If you wait to instrument until you have traffic, your first weeks of the most interesting visitors arrive unmeasured, and you spend the launch reconstructing what you should have simply recorded. We built the event instrumentation as a deliverable in its own right, in the middle stretch of the site track, well before there was meaningful traffic to observe.

The discipline that mattered was not the tooling; it was the taxonomy. An analytics layer is only as trustworthy as the definitions underneath it, and a sloppy event schema produces numbers that look precise and mean nothing. We wrote a small, deliberate tracking plan first: a named set of events for the moments that actually signalled intent on a page like ours (a page view, a scroll to the performance section, a click into the request-a-pilot flow), each with a fixed set of properties and one agreed definition, so that a count meant the same thing every time it was read [3]. Naming those events forced the harder conversation about what a good outcome on the site even was, and having that conversation in week eight rather than after launch is most of the value.
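A tracking plan of this shape can be written as data and enforced before any event is recorded. The sketch below is an assumption about form, not a copy of our plan: the event names, property names, and the `validate_event` helper are hypothetical, chosen to mirror the three moments described above.

```python
# Hypothetical tracking plan: event and property names are illustrative.
TRACKING_PLAN = {
    "page_viewed": {
        "definition": "A page finished loading in the visitor's browser.",
        "properties": {"page_path": str, "referrer": str},
    },
    "performance_section_reached": {
        "definition": "The performance section first scrolled into view.",
        "properties": {"page_path": str},
    },
    "pilot_request_clicked": {
        "definition": "The visitor clicked into the request-a-pilot flow.",
        "properties": {"page_path": str, "cta_location": str},
    },
}


def validate_event(name, properties, plan=TRACKING_PLAN):
    """Return a list of problems; an empty list means the event matches the plan."""
    if name not in plan:
        return [f"unknown event: {name}"]
    expected = plan[name]["properties"]
    problems = []
    for key in sorted(expected.keys() - properties.keys()):
        problems.append(f"missing property: {key}")
    for key in sorted(properties.keys() - expected.keys()):
        problems.append(f"unexpected property: {key}")
    for key in sorted(expected.keys() & properties.keys()):
        if not isinstance(properties[key], expected[key]):
            problems.append(f"wrong type for {key}: expected {expected[key].__name__}")
    return problems


print(validate_event("pilot_request_clicked", {"page_path": "/"}))
```

Rejecting unknown events and unexpected properties is the design choice that matters here: it keeps the schema from drifting one convenient field at a time.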

Two decisions kept the instrumentation honest. First, we treated the event stream as the source of truth and derived every report from it, rather than logging pre-aggregated counts we could never re-cut later; when you keep the raw events and compute views on top, a new question is a new query instead of a lost month [4]. Second, we built the plan so that the moment we did want to compare two versions of a page, the measurement was already trustworthy, because the events and their definitions had been fixed in advance rather than invented to fit a result [2]. We were not running experiments at launch, but we refused to instrument in a way that would have made honest experiments impossible later.
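The first decision is easy to show in miniature. In the sketch below, the raw events are the only stored data and every report is a function over them, so re-cutting by a new dimension is one more function call. The events, visitor identifiers, and referrer values are invented, and the event names are the same hypothetical ones a tracking plan might define; note that this simple version counts distinct visitors per step and does not enforce the order in which steps occurred.

```python
# Hypothetical raw events; in practice these would be read from the event store.
RAW_EVENTS = [
    {"visitor": "v1", "event": "page_viewed", "referrer": "search"},
    {"visitor": "v1", "event": "performance_section_reached", "referrer": "search"},
    {"visitor": "v1", "event": "pilot_request_clicked", "referrer": "search"},
    {"visitor": "v2", "event": "page_viewed", "referrer": "newsletter"},
    {"visitor": "v2", "event": "performance_section_reached", "referrer": "newsletter"},
    {"visitor": "v3", "event": "page_viewed", "referrer": "search"},
]

FUNNEL = ["page_viewed", "performance_section_reached", "pilot_request_clicked"]


def funnel_counts(events, steps=FUNNEL):
    """Count distinct visitors who fired each step (ordering is not enforced)."""
    visitors = {step: set() for step in steps}
    for e in events:
        if e["event"] in visitors:
            visitors[e["event"]].add(e["visitor"])
    return {step: len(visitors[step]) for step in steps}


def funnel_by(events, dimension, steps=FUNNEL):
    """The same funnel, re-cut by any property recorded on the raw events."""
    groups = {}
    for e in events:
        groups.setdefault(e[dimension], []).append(e)
    return {value: funnel_counts(group, steps) for value, group in groups.items()}


print(funnel_counts(RAW_EVENTS))
print(funnel_by(RAW_EVENTS, "referrer"))
```

Had only the aggregate counts been logged, the per-referrer view would be unrecoverable; with the raw events kept, it is a second query over the same data.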

What the parallel track cost, and what it bought

Running the site as its own track is not free. It needs its own people, and those six were not available to accelerate the model; that is a real allocation, made on purpose, because a converged model that nobody can find or evaluate is not a shipped product. It also needs coordination discipline, since the one genuine coupling, the launch date, has to be defended from both sides. The model track cannot quietly slip the date because a loss function misbehaved, and the site track cannot quietly slip it because copy took longer than expected. Keeping that single shared commitment honest was more of the management work than anything technical.

What it bought was that launch week was not a scramble. The model finished its last block and there was a site to point it at, a page that told the truth about it, a front end fast enough to keep a technical reader, and an analytics layer already recording the visitors who mattered from the first hour. None of that had waited on a training run, because none of it ever could have been improved by waiting. The order we almost accepted, prove the model and then present it, would have delivered a good model into a launch with no measured way for the market to reach or judge it. Two tracks converging on one date delivered both at once, and the only reason it worked is that we stopped treating the website as the thing that comes after.

Limitations

This is a delivery account, not a controlled study, and it should be read as one. The sixteen-week envelope, the four project blocks, the six-person accelerated staffing, and the roughly ten-hour training cycle are drawn from the engagement archive; the within-week placement of each block and each site segment in the exhibit is illustrative scheduling shape, chosen to make the parallel structure legible, not a logged Gantt chart. The serial spill the lever shows is a counterfactual, an estimate of what waiting would have cost, not a schedule we actually ran, so its exact width is illustrative rather than measured. We did not run a formal A/B test at launch, so no experimental effect sizes are claimed here; the argument for the analytics work is that it made trustworthy measurement possible, not that we have a lift number to report. Finally, this piece is scoped to the marketing site and its instrumentation. The self-serve digitisation product and the shared internal demo were their own efforts on their own timelines, and nothing here should be read as describing those surfaces.

References

[1] Google. (2020). Web Vitals: essential metrics for a healthy site. web.dev. https://web.dev/articles/vitals
[2] Kohavi, R., Tang, D., and Xu, Y. (2020). Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing. Cambridge University Press. https://experimentguide.com/
[3] Amplitude. (2021). The Amplitude Guide to Behavioral Analytics and Event Taxonomy Design. https://amplitude.com/blog/data-taxonomy-playbook
[4] Kleppmann, M. (2017). Designing Data-Intensive Applications. O'Reilly Media. https://dataintensive.net/
