There is a kind of review that does not reject your paper and does not accept it either. It hands you a list. Reviewer 3 of our fracture-detection manuscript, the Detection-Transformer work from a roughly twenty-month engagement with a mid-sized Middle East carbonate operator, sent back five essential revisions and thirty-three additional comments. Read once, it felt like death by a thousand cuts. Read a second time, it was the most useful letter the paper received, because almost every comment pointed at one failure: we had written for people who already knew what we knew.
A separate reviewer on the same paper recommended outright rejection over validation design and closed code, and reversing that vote was its own story, told in our peer-review rescue write-up. This piece is about the other reviewer, the one who never used the word unpublishable and did more to improve the manuscript than anyone. Reviewer 3 read every line, and the thirty-three comments sort cleanly into three demands.
Demand one: I cannot read your figures
The first essential revision was blunt. The figure text was impossible to read, the labels were unclear, and the styling changed from one figure to the next as if several people had made them at different times, which is exactly what had happened. A reviewer who cannot read a figure cannot check the claim the figure is supposed to support, so an illegible figure is not a style problem. It is an evidence problem.
We rebuilt every figure in high definition, made the axis and panel labels legible at print size, and forced one consistent style across all of them. None of that changed a single result. All of it changed whether a result could be verified by someone who was not in the room when we made it. A figure is a load-bearing part of the argument, and if the reader has to squint, the argument is not on the page yet.
Demand two: be quantitative
The second demand arrived not as one comment but as the same two words returned again and again. Be quantitative. Wherever we had written that one configuration performed better, or that the method handled a case well, or that results were promising, Reviewer 3 asked for the number. It is a humbling exercise to grep your own manuscript for adjectives and find how many claims rest on them.
The conclusion took the heaviest rewrite. We had summarised the ablation in prose; the reviewer wanted the finding stated as a measured fact. So the conclusion now says the specific thing the numbers say: a fractures-only dataset outperforms the combined fracture-and-bedding dataset for fracture detection, while bedding detection is better on the combined data. That is a directional, checkable claim, not a warm sentence about promise.
The sharpest edge of this demand was a deletion. The abstract had carried a line asserting that increasing the number of wells improves performance. It read well and it was not earned. The ablation did not establish a monotone well-count relationship at the scale we had, and the sensitivity test that could have supported it needs a peak our small dataset would not produce, which we noted as future work rather than a present claim. So the line came out. Being quantitative sometimes means adding a number, and sometimes it means removing a sentence you cannot back.
Demand three: this is elementary, why is it relevant
The third demand felt personal and turned out to be the most valuable. Against several passages the reviewer wrote a version of the same question: this is elementary, why is it relevant. Underneath the sting was a precise instruction. We had left the connective tissue of the method implicit, assuming a reader would fill it in, and a good reviewer refuses to. Every place that comment landed, a disclosure was missing.
The largest gap was the input itself. The paper detects fractures from borehole-image logs, but a fracture reveals itself in more than the image, and we had never named the indirect indicators the interpretation leans on. So we named them as a set: gamma ray, formation density, sonic, flushed-zone resistivity, and caliper.
Naming that set changed how relevant the method looked, because it made explicit what a human interpreter actually reads. Two more disclosures followed from the same pressure. We stated plainly that the method works on both conductive and resistive fractures, and that because the mud type is not present in the data, the method is mud-type-agnostic by design: it keys on the sinusoidal discontinuity pattern of a fracture rather than on any assumption about the drilling fluid. Both facts were true of the method all along. Neither was on the page until a reviewer asked why the elementary parts mattered.
Then came the two disclosures I am most glad we were forced to make, because both were the difference between a number and an explained number.
The patch height was one. Our pipeline slices each image log into overlapping patches of a fixed height, and we had used 800 pixels without saying why. Reviewer 3 wanted the reason. The reason is physical: 800 pixels corresponds to about 2.2 metres of depth, and nearly all fracture sinusoid heights fall within 2 metres, so a patch that tall almost always contains a whole feature rather than a clipped one. Once written down, an arbitrary-looking constant became a design decision anyone could check against the sinusoid-height distribution.
The second was a spike we had shown but not explained. One figure carried a set of amplitude spikes at shifted values that looked, fairly, like a bug. It was a data problem, and the reviewer's pressure made us diagnose it in print. In that one problematic well, converting the static image to a dynamic one introduced an information loss: the pixel values spanned only 0 to 15 instead of the expected 0 to 255. A range that narrow is corruption, and the spike was its signature. Naming the mechanism turned a suspicious figure into a documented quality-control finding.
Behind all of this sat the biggest single ask: rebuild the methodology section so a reader could reproduce it. Not a code drop, which a confidentiality agreement forbids, but the full written specification. The section now carries the complete architecture, the training and inference workflow end to end, every loss-function equation rather than a reference to them, and the hyperparameter list. That is the sense in which thirty-three comments improve a paper. They do not soften it. They make it say what it was quietly assuming.
What adversarial review is actually for
It is easy to read a thirty-three-comment letter as hostility. It is more accurate to read it as free labour from the one person paid nothing to care whether your paper is right. Reviewer 3 never argued that our model was wrong. Every comment argued that our manuscript was underspecified, and the fixes were disclosures, quantifications, and figure rebuilds, not retractions.
The paper that came out the other side is stronger in a way that has nothing to do with the model weights. The figures can be read. The claims carry numbers or they were cut. The inputs are named, the design constants are justified, and a corrupted well is explained instead of hidden. For industrial AI work done behind a confidentiality wall, where you cannot lean on an open-code remedy to buy a reviewer's trust, this is most of what you have: a manuscript that survives a line-by-line read by someone determined to find where you waved your hands.
Limitations
This is one reviewer's letter on one paper, not evidence about peer review in general. The three-demand grouping is our own reading of Reviewer 3's comments, useful for telling the story but not a formal taxonomy. The fixes described here improved the manuscript's clarity and completeness; they did not change the underlying results, and a reader should not read figure rebuilds or added disclosures as new experiments. The account is anonymised under the same agreement that governs the work: the operator, the field, and the specific wells are held back, while the log names, patch dimensions, pixel-range values, and comment counts are reported as they stood in the review correspondence.