
Research

Domain Randomization Since 2017: What Transferred and What Did Not

A field retrospective on domain randomization as a sim-to-real strategy, five years after the idea was named. We survey which randomization axes generalised across robotics, autonomous driving, and document and scientific imaging, and which stayed domain-specific, crediting the public literature from 2017 onward. We then ground the survey in our own procedural well-log generator, which renders synthetic raster logs across a 3,200 to 12,800 pixel width range and reads curves off real scans it never saw in training. The pattern that holds: randomize the nuisance, anchor the signal.

By The EarthScan Team, 12 min read

Abstract

Domain randomization is the idea that you can train a model on cheap synthetic data and have it work on real inputs, provided the simulator varies its nuisance parameters widely enough that the real world looks like just another sample from the training distribution. The phrase was coined in robotics in 2017, and in the five years since it has been tried, with uneven success, across autonomous driving, manipulation, document understanding, and scientific imaging. This note asks a narrow question of that body of work: which randomization axes actually transferred across domains, and which turned out to be specific to the field that introduced them. We survey the public literature from 2017 onward, sort the axes into ones that generalised and ones that did not, and credit each idea to its origin. We then ground the survey in our own procedural well-log generator, which renders synthetic raster logs across a 3,200 to 12,800 pixel width range and 480 to 640 pixel height range, produces 20,000 two-curve logs and a final 15,000-curve training set, and reads curves off scanned paper logs it never encountered. The recurring rule across every domain is the same: randomize the nuisance hard, leave the signal alone.

The name comes from a 2017 robotics paper that trained an object detector entirely in a simulator with randomized textures, lighting, camera position, and clutter, then transferred it to a real robot arm without a single real training image (Tobin et al., 2017). The argument was almost philosophical: if the simulator varies its appearance so wildly that no two renders look alike, the network stops treating any particular rendering style as signal and learns only the structure that survives the variation. The real world, with its one fixed but unseen appearance, becomes one more draw from a distribution the network has already learned to ignore.

The idea had a near-simultaneous twin in autonomous flight, where a quadrotor learned a collision-avoidance policy from randomized synthetic hallways and flew through real corridors it had never seen (Sadeghi and Levine, 2017). Within a year the recipe crossed into perception for driving, where randomized synthetic scenes with varied object pose, texture, and background trained car detectors that held up on real road imagery (Tremblay et al., 2018). These three are the visual-appearance lineage: randomize what the pixels look like, and leave the identity and shape of the thing you care about alone.

A second lineage randomized something the appearance papers did not touch: the physics. Sim-to-real control transfer randomized the dynamics of the simulated robot itself, its masses, friction coefficients, and actuator delays, so the learned policy became robust to the inevitable gap between a simulator's physics and a real motor's (Peng et al., 2018). The dexterous-manipulation work randomized physics and appearance together (OpenAI et al., 2018), and its Rubik's Cube follow-up made the randomization itself adaptive, widening each parameter's range automatically as the policy got stronger rather than fixing the ranges by hand. That move, treating the randomization schedule as a thing to be learned, is the most portable single idea in the whole literature, and we return to it below.
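The adaptive-schedule idea can be illustrated with a minimal sketch. This is not the published algorithm: the class `AdaptiveRange`, the function `update_schedule`, the 0.9 performance threshold, and the friction numbers are all hypothetical names and values chosen for illustration. The only thing the sketch takes from the text is the rule itself: widen a parameter's range once the policy is performing well on the current one.

```python
import random
from dataclasses import dataclass
from typing import Sequence


@dataclass
class AdaptiveRange:
    """One randomization parameter whose sampling interval can grow."""
    low: float
    high: float
    step: float      # how far each bound moves per widening (assumed constant)
    min_low: float   # hard limits so the range cannot grow without bound
    max_high: float

    def sample(self, rng: random.Random) -> float:
        """Draw one value for the next simulated episode."""
        return rng.uniform(self.low, self.high)

    def widen(self) -> None:
        """Push both bounds outward, clipped to the hard limits."""
        self.low = max(self.min_low, self.low - self.step)
        self.high = min(self.max_high, self.high + self.step)


def update_schedule(param: AdaptiveRange, recent_scores: Sequence[float],
                    threshold: float = 0.9) -> bool:
    """Widen the range only when mean recent performance clears the threshold."""
    if recent_scores and sum(recent_scores) / len(recent_scores) >= threshold:
        param.widen()
        return True
    return False


# Hypothetical friction multiplier, starting narrow around the nominal value.
friction = AdaptiveRange(low=0.9, high=1.1, step=0.1, min_low=0.5, max_high=1.5)
widened = update_schedule(friction, [0.95, 0.92, 0.97])     # strong policy: widen
not_widened = update_schedule(friction, [0.4, 0.5, 0.6])    # weak policy: hold
```

The design choice worth noting is that the range only ever grows in response to measured competence, so the difficulty of the training distribution tracks the model rather than a hand-set guess.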

An older root, predating the name, is in image segmentation. The U-Net paper trained on a handful of biomedical images and manufactured the variation it lacked through elastic deformation, randomizing the geometry of the training images to teach invariance to the deformations a microscope introduces (Ronneberger et al., 2015). Document understanding made the synthetic-source argument cleanest of all: render text from corpora through randomized fonts, colours, and backgrounds, and you get an inexhaustible, perfectly labelled detection corpus because you drew every glyph yourself (Gupta et al., 2016). Scientific-document imagery, the well log included, sits squarely in this document category: the artefact is itself a rendering of structured source data through a known drawing process, so the supervision can be rendered rather than collected.

Method

We surveyed the sim-to-real literature published through the first half of 2022 and classified the randomization axes each paper varied into a small set of categories shared across domains: geometry of the target object, sensor or scanning noise, lighting and contrast, perspective and camera placement, background and clutter, and underlying physics or dynamics. For each axis we recorded whether the paper's transfer result depended on it and whether other domains reported the same dependence. An axis "transferred" in our reading when at least two unrelated domains found that randomizing it was necessary for sim-to-real success; it stayed domain-specific when only the originating field needed it.
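The classification rule reduces to a count over domains. The sketch below is one way to write it down, under the assumption that distinct domain labels can be treated as unrelated. The function name `classify_axis` and the evidence table are illustrative, with the domain labels taken from the examples this note gives for each axis rather than from a full tabulation of the survey.

```python
def classify_axis(domains_needing_axis, min_domains=2):
    """Label an axis by how many distinct domains needed it for transfer.

    Assumption: two different labels count as two unrelated domains.
    """
    if len(set(domains_needing_axis)) >= min_domains:
        return "transferred"
    return "domain-specific"


# Illustrative evidence table: axis -> domains that reported depending on it.
evidence = {
    "geometry of target": ["robot grasping", "text detection", "well-log imaging"],
    "physics / dynamics": ["robotic control"],
    "background / clutter": ["natural-scene detection"],
}

verdicts = {axis: classify_axis(domains) for axis, domains in evidence.items()}
for axis, verdict in verdicts.items():
    print(f"{axis}: {verdict}")
```

Where two domains disagree about an axis, the count alone does not settle the verdict, which is why the rule is a reading aid rather than a substitute for judgement.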

We then instantiated the survey's conclusions in a concrete generator of our own so the abstract axes have measured ground truth behind them. The generator is a procedural well-log renderer. It samples a curve geometry, draws it onto a synthetic paper grid at print resolution, applies a degradation model that imitates the scanning channel, and emits the rendered raster together with the pixel-exact mask of every curve. Nothing is hand-traced. The renderer spans the real archive's dimensional range deliberately: synthetic widths from 3,200 to 12,800 pixels and heights from 480 to 640 pixels, the ragged scales a real scan bench actually produces rather than tidy fixed tiles. We rendered a 20,000-log two-curve multiclass corpus and distilled from it a final training set of 15,000 synthetic curves, with two curves per log and three output classes counting background. Throughout, 80 percent of the data went to training and the remainder to validation. The model reading these renders is VeerNet, our encoder-decoder convolutional network with a transformer attention stage on the bottleneck; here it serves only as the instrument that measures whether each randomization axis paid off.
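The four stages of the renderer (sample, draw, degrade, emit) can be sketched in plain Python. This is a toy stand-in, not our production generator: the sum-of-sinusoids curve family, the 64-pixel grid pitch, the grey levels, and the salt-and-pepper noise model are all assumptions made for the example. Only the dimension ranges, the two curves per log, and the three mask classes come from the description above.

```python
import math
import random

BACKGROUND, CURVE_A, CURVE_B = 0, 1, 2  # three classes counting background


def sample_curve(rng, width, height):
    """Sample y(x) as a sum of three random sinusoids (an assumed curve family)."""
    centre = rng.uniform(0.3, 0.7) * height
    terms = [(rng.uniform(0.02, 0.12) * height,  # amplitude in pixels
              rng.uniform(1.0, 8.0),             # cycles across the log
              rng.uniform(0.0, 2.0 * math.pi))   # phase
             for _ in range(3)]
    ys = []
    for x in range(width):
        t = x / width
        y = centre + sum(a * math.sin(2.0 * math.pi * f * t + p) for a, f, p in terms)
        ys.append(min(height - 1, max(0, round(y))))  # clamp to the raster
    return ys


def render_log(rng, width=None, height=None, grid=64, noise_rate=0.002):
    """Return (image, mask, (width, height)) as row-major byte buffers."""
    width = width or rng.randint(3200, 12800)
    height = height or rng.randint(480, 640)
    image = bytearray([255]) * (width * height)  # white paper
    mask = bytearray(width * height)             # all background

    # Printed grid: light grey horizontal and vertical rules.
    for y in range(0, height, grid):
        image[y * width:(y + 1) * width] = bytes([200]) * width
    for x in range(0, width, grid):
        for y in range(height):
            image[y * width + x] = 200

    # Two curves per log; where they cross, the later label wins in the mask.
    for label in (CURVE_A, CURVE_B):
        ys = sample_curve(rng, width, height)
        prev = ys[0]
        for x, y in enumerate(ys):
            for yy in range(min(prev, y), max(prev, y) + 1):  # bridge steep steps
                image[yy * width + x] = 0
                mask[yy * width + x] = label
            prev = y

    # Degradation touches the image only, so the mask stays pixel-exact.
    for _ in range(int(noise_rate * width * height)):
        image[rng.randrange(width * height)] = rng.choice((0, 255))
    return image, mask, (width, height)


image, mask, (w, h) = render_log(random.Random(0))
```

Because the mask is written in the same loop that draws the curve, the label is exact by construction, which is the property that makes rendered supervision cheaper than traced supervision.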

The classification is the survey instrument's spine. Each axis carries a transfer-likelihood we read off the cross-domain evidence, and each is anchored to the concrete range our own generator varies it across. The instrument below lets a reader weight the axes by how much randomization budget to spend on each and watch the resulting transfer-likelihood respond, with every range pinned to the generator's real numbers.

Results

The survey separates cleanly into axes that crossed domains and axes that stayed home. Three transferred almost universally. Geometry randomization, varying the shape and placement of the target, was load-bearing everywhere from robot grasping to text detection to our own curve rendering, because the geometry of the target is precisely the thing the model must learn to recognise under any appearance. Sensor and scanning noise transferred because every acquisition channel, a camera, a lidar, a flatbed scanner, imposes its own corruption, and a model that has only seen clean renders fails the instant it meets a real sensor. Perspective and lighting transferred in every vision domain for the same reason: they are the canonical nuisances, varying wildly in the real world while carrying no information about the target.

Two axes stayed domain-specific. Physics and dynamics randomization was decisive for control policies (Peng et al., 2018) and irrelevant to a static-image segmenter, because a scanned log has no dynamics to get wrong. Heavy background and clutter randomization mattered for detection in cluttered scenes and far less for document imagery, where the background is a known printed grid rather than an unconstrained natural scene. The one meta-axis that transferred best of all was not an appearance parameter at all but the scheduling idea from automatic domain randomization (OpenAI et al., 2018): widen the ranges as the model strengthens. Holding the geometry and noise ranges wide while keeping the signal definition fixed is exactly what let our generator's renders stand in for real scans.

The instrument makes the trade-off operable. Each row is a randomization axis; weighting it spends synthetic budget on that axis, and the aggregate transfer-likelihood bar responds, anchored to the ranges our generator actually sweeps.

[Interactive figure: "Domain Randomization Since 2017: What Transferred". A lever per randomization axis sets its budget weight from none to full, and a bar shows the aggregate transfer-likelihood (78 out of 100 at the weights shown). Transferred axes: geometry of target, scanning noise, perspective / lighting, adaptive schedule. Domain-specific axes: background / clutter, physics / dynamics.]
A survey instrument, not a result of ours. Five years of sim-to-real work converged on a small set of randomization axes, and only part of that portfolio crossed domains. Each row is one axis with a base cross-domain transfer-likelihood read off the public literature from 2017 onward: geometry of the target, scanning noise, perspective and lighting, and the adaptive-schedule meta-strategy transferred almost everywhere, while background and clutter and physics and dynamics stayed specific to robotics and natural-scene detection. Weight how much synthetic budget to spend on each axis with its lever, and the aggregate transfer-likelihood bar responds: spending on a transferred axis lifts it, spending on a domain-specific one barely moves it. Every axis is anchored to the concrete range our own procedural well-log generator sweeps it across. Sourced: the generator's width range of 3,200 to 12,800 pixels, height range of 480 to 640 pixels, 20,000 two-curve logs generated, 15,000 final synthetic curves, 2 curves per log, and the 80 percent train and validation split. The per-axis base likelihoods and the weight-to-aggregate mapping encode the survey's cross-domain verdict and are schematic, flagged on the canvas.
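The weight-to-aggregate mapping is schematic and is not specified in this note, so the following is only one plausible form, assumed for illustration: each axis contributes its weight times its base likelihood, and the total is normalised by the score at full weight on every axis. The base likelihoods in `BASE` are hypothetical placeholders, not the values used on the canvas.

```python
# Hypothetical base likelihoods per axis (placeholders, not the canvas values).
BASE = {
    "geometry of target": 0.90,
    "scanning noise": 0.80,
    "perspective / lighting": 0.70,
    "adaptive schedule": 0.85,
    "background / clutter": 0.20,
    "physics / dynamics": 0.05,
}


def aggregate(weights, base=BASE):
    """Score from 0 to 100: weighted likelihood relative to full weight everywhere.

    weights maps axis -> budget weight in [0, 1]; missing axes count as 0.
    """
    total = sum(base.values())
    spent = sum(weights.get(axis, 0.0) * p for axis, p in base.items())
    return 100.0 * spent / total


full = aggregate({axis: 1.0 for axis in BASE})
lift_geometry = aggregate({"geometry of target": 1.0}) - aggregate({})
lift_physics = aggregate({"physics / dynamics": 1.0}) - aggregate({})
```

Under this assumed form, the lift from an axis is proportional to its base likelihood, which reproduces the qualitative behaviour described in the caption: budget on a transferred axis moves the bar, budget on a domain-specific one barely does.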

Grounded in our own numbers, the pattern holds. A VeerNet trained only on these procedurally randomized logs reads curves off real raster scans at high fidelity, with the best per-curve reconstruction reaching a coefficient of determination of 0.9891. The generalisation did not come from rendering prettier logs; it came from covering the full 3,200 to 12,800 pixel width range and 480 to 640 pixel height range, so that a real scan's particular dimensions were already inside the training support. That is the 2017 argument (Tobin et al., 2017), restated for scientific documents: randomize the nuisance until the real input is unremarkable.
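For readers who want the metric spelled out, the coefficient of determination compares the squared error of the reconstructed curve against the variance of the true curve. The sketch below assumes the standard definition, one minus the ratio of residual to total sum of squares; the function name `r_squared` and the five-point example are illustrative and are not our evaluation data.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred):
        raise ValueError("series must have the same length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined for a constant reference curve")
    return 1.0 - ss_res / ss_tot


# Toy example: a short reference curve and a slightly noisy reconstruction.
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
reconstruction = [1.1, 1.9, 3.2, 3.9, 5.0]
score = r_squared(reference, reconstruction)  # SS_res = 0.07, SS_tot = 10
```

A score near 1 means the reconstruction explains almost all of the reference curve's variation, so the metric rewards getting the curve's values right rather than getting every pixel to coincide.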

| Randomization axis | Cross-domain verdict | Our generator's anchor |
| --- | --- | --- |
| Geometry of target | Transferred | curve shape sampled per log |
| Sensor / scanning noise | Transferred | degradation model imitates the scan |
| Perspective / lighting | Transferred | contrast and skew varied per render |
| Background / clutter | Domain-specific | printed grid, low clutter |
| Physics / dynamics | Domain-specific | not applicable to static images |
| Adaptive schedule (meta) | Transferred best | wide ranges held over 20,000 logs |

Discussion

The five-year retrospective lands on a distinction the original papers gestured at but rarely stated outright: domain randomization is not one technique but a portfolio, and only part of the portfolio crosses domains. The transferable part is the appearance and acquisition nuisance (geometry, noise, perspective, lighting) plus the meta-strategy of widening ranges as competence grows. The non-transferable part is whatever is physically specific to the source domain: dynamics for robots, clutter for natural scenes. A team adopting domain randomization in a new field should expect to import the nuisance axes wholesale and to re-derive the domain-specific ones from scratch.

Where our work sits in that map is at the document and scientific-imaging end, which is the friendliest possible setting for the strategy because the artefact is synthetic by construction. We did not have to imitate an open-ended natural world; we had to imitate a flatbed scanner reading a printed grid, a far narrower and more knowable channel. That is why the appearance axes transferred so cleanly into our generator and why we could skip the physics and clutter axes entirely. The honest framing is that the well log is an easy case for sim-to-real, and the value of saying so is that it tells the next team where the strategy is cheapest to adopt: anywhere the input is itself a rendering of known source data.

The one caution the survey raises for our own setting is the registration-versus-reconstruction split familiar from thin-structure segmentation. Wide randomization buys robust curve reconstruction, the axis the petrophysicist is graded on, far more reliably than it buys pixel-exact mask overlap on a one-pixel-wide curve. Domain randomization fixes the reality gap of appearance; it does not repeal the geometry of measuring overlap on near-zero-area targets. The two failure modes are independent, and conflating them is the most common misreading of a synthetic-data result we encountered in the survey.
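The split is easy to demonstrate on a toy case. In the sketch below a one-pixel-wide curve is predicted with a constant one-pixel vertical offset. The sinusoidal curve, the offset, and the helper names `curve_pixels`, `iou`, and `r_squared` are assumptions made for the example; the overlap metric is plain intersection over union and the reconstruction metric is the standard coefficient of determination.

```python
import math


def curve_pixels(ys):
    """Pixel set of a one-pixel-wide curve with one sample per column."""
    return {(x, y) for x, y in enumerate(ys)}


def iou(a, b):
    """Intersection over union of two pixel sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


width = 1000
true_ys = [round(100 + 50 * math.sin(2 * math.pi * x / 250)) for x in range(width)]
pred_ys = [y + 1 for y in true_ys]  # every column is off by exactly one pixel

overlap = iou(curve_pixels(true_ys), curve_pixels(pred_ys))  # 0.0: no shared pixel
fit = r_squared(true_ys, pred_ys)                            # stays close to 1
```

The prediction shares no pixel with the reference, so the mask overlap is zero, while the curve values are almost perfectly reconstructed. The two numbers measure different things, and a low overlap score on a thin target says little about reconstruction quality.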

Limitations

This is a survey, and a survey inherits its sources' blind spots. The cross-domain verdicts above are our reading of the public literature through the first half of 2022, not a controlled meta-analysis, and the line between an axis that transferred and one that stayed local is a judgement call where two domains disagree. The grounding numbers come from a single procedural generator with one fixed degradation model, so the axes it exercises are the ones we chose to render; corruptions we never wrote down (torn margins, coffee stains, hand annotations across a curve) sit outside the synthetic support no matter how wide the ranges. And the well log is, by our own argument, an easy case. The verdicts may read differently for a domain whose real inputs are not renderings of known source data.

References

[1] J. Tobin, R. Fong, A. Ray, J. Schneider, W. Zaremba, P. Abbeel. Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World. IROS 2017. https://arxiv.org/abs/1703.06907

[2] F. Sadeghi, S. Levine. CAD2RL: Real Single-Image Flight without a Single Real Image. RSS 2017. https://arxiv.org/abs/1611.04201

[3] J. Tremblay et al. Training Deep Networks with Synthetic Data: Bridging the Reality Gap by Domain Randomization. CVPR Workshops 2018. https://arxiv.org/abs/1804.06516

[4] X. B. Peng, M. Andrychowicz, W. Zaremba, P. Abbeel. Sim-to-Real Transfer of Robotic Control with Dynamics Randomization. ICRA 2018. https://arxiv.org/abs/1710.06537

[5] OpenAI et al. Learning Dexterous In-Hand Manipulation. IJRR 2020 (automatic domain randomization, preprint 2018). https://arxiv.org/abs/1808.00177

[6] O. Ronneberger, P. Fischer, T. Brox. U-Net: Convolutional Networks for Biomedical Image Segmentation. MICCAI 2015. https://arxiv.org/abs/1505.04597

[7] A. Gupta, A. Vedaldi, A. Zisserman. Synthetic Data for Text Localisation in Natural Images. CVPR 2016. https://arxiv.org/abs/1604.06646

[8] B. Yuan, Q. Yang. Digitization of Well-Logging Parameter Graphs Based on a Gridlines-Elimination Approach. J. Pet. Explor. Prod. Technol., 2019. http://www.jsoftware.us/show-409-JSW15423.html

[9] A. McDonald. Using the missingno Python Library to Identify and Visualise Missing Data Prior to Machine Learning. Towards Data Science, 2021. https://towardsdatascience.com/using-the-missingno-python-library-to-identify-and-visualise-missing-data-prior-to-machine-learning-34c8c5b5f009

