sFWI: physics-informed score-based generative modeling for robust full waveform inversion


If you've ever gone in for an ultrasound, you'll know how incredibly difficult the images can be to interpret: a shifting mass of gray lines and curves, light areas and dark areas, and only the faintest hint of organs or shapes that a human eye can naturally recognize. This is why it takes a skilled technician to tell you what you're looking at: that's the fluid pooling in your knee, that's the left ventricle of your heart, or that's your baby's right arm. But even for an experienced ultrasound tech, there is still interpretation involved, and two specialists can often disagree about what they're seeing on the screen.

The ambiguity comes from the nature of the device. It works by sending high-frequency sound waves into the body, with the frequency chosen so that the waves reach the tissue being examined without being absorbed too quickly or passing straight through without a useful reflection. As those waves move through you, they hit boundaries between different materials: fluid, muscle, bone, fat, air, and soft tissue. Some of the wave energy keeps traveling forward, some of it scatters, and some of it reflects back toward the probe. All of those returning echoes are then measured, timed, and converted into estimates of where the reflections must have come from. What you're left with is a grainy, black-and-white image, and the ambiguity on the screen reflects the uncertainty and imprecision of the estimates that created it.

Now, imagine that instead of a doctor trying to find a torn tendon, you're a geophysicist trying to figure out whether a particular tract of land is sitting on something valuable. You want to know if, deep beneath the surface somewhere, there's a reservoir of oil or gas. How would you figure that out?

Drill a hole as a test? Too expensive. And too limited, because one hole only tells you what is happening in one narrow column of earth.
Drill lots of holes? Even more expensive.
Look at the surface geology? Useful, but not enough.

No, if you're like most exploration teams, you'll run a seismic survey and interpret it with what we call FWI: full waveform inversion. The whole approach works on much the same principle as ultrasound, except that instead of sending high-frequency sound waves into a body, you're sending much lower-frequency seismic waves into the ground. And instead of reflecting off organs and tissue, those waves travel through the Earth and reflect off layers of rock, sediment, salt, faults, fluids, and other underground structures. But it's still the same basic idea.

And just like ultrasound, the results are open to interpretation. When you perform this kind of seismic survey, you get back a mountain of data: waveforms recorded across many receivers, showing when the waves arrived, how strong they were, how their phase shifted, how they reflected, how they bent, and how they scattered as they moved through the subsurface. And it's your job to figure out what it all means. What's under there? And is it worth further exploration?

In this paper, the authors are arguing that traditional FWI is doing this part wrong. The waves are going into the ground correctly. They're coming back and being sensed correctly. But the interpretation of the data is off. Just like a second ultrasound tech coming in and contradicting what the first tech said, these authors are coming in and saying that there's a better, more accurate way to interpret the "picture" that FWI is producing. On today's episode, we'll walk through how the original FWI works, where the authors say it's lacking, and what they're suggesting to improve it. Let's dive in.

Traditional FWI begins with a provisional map of the subsurface, usually a velocity model that says how quickly seismic waves are expected to move through each part of the volume. That starting model might come from older surveys, well logs, lower-resolution seismic processing, or just geologic assumptions. From there, it asks a question:

'If this were a real patch of earth, and if we fired seismic waves into it from known locations, what would the receivers at the surface record?'

To answer that, it numerically solves the wave equation through the provisional map. A simulated wavefield travels through faster and slower regions, bends as velocity changes, reflects at sharp contrasts, scatters from complex structures, and eventually produces synthetic seismic traces. Those traces are the algorithm's prediction of what the survey should measure if the current provisional guess is actually correct. Then FWI measures how different these synthetic traces are from the actual observed waveforms. And it looks for places that don't line up. This mismatch could be in the form of timing errors, amplitude errors, phase errors, or differences in waveform shape.
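To make that forward step concrete, here is a minimal sketch in Python with NumPy, assuming a 1D acoustic earth (real FWI works in 2D or 3D, often with more complete physics). It solves the wave equation through a provisional velocity map with finite differences, records one synthetic trace, and measures the least-squares mismatch against "observed" data. The names `forward_1d`, `ricker`, and `l2_misfit`, the grid spacing, and the two-layer model are all illustrative assumptions, not the paper's code.

```python
import numpy as np

def ricker(t, f0):
    """Ricker wavelet with peak frequency f0 (Hz), delayed by 1/f0 s."""
    a = (np.pi * f0 * (t - 1.0 / f0)) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)

def forward_1d(v, dx, dt, nt, src_idx, rec_idx, f0=10.0):
    """Solve the 1D acoustic wave equation u_tt = v(x)^2 u_xx and record one trace.

    Second-order finite differences in space and time; both end points are held
    at zero (fixed ends); a point source is injected at src_idx.
    """
    courant = v.max() * dt / dx
    if courant > 1.0:
        raise ValueError(f"unstable grid: Courant number {courant:.2f} > 1")
    c2 = (v * dt / dx) ** 2
    wavelet = ricker(np.arange(nt) * dt, f0)
    u_prev = np.zeros_like(v, dtype=float)
    u_curr = np.zeros_like(v, dtype=float)
    trace = np.zeros(nt)
    for it in range(nt):
        lap = np.zeros_like(u_curr)
        lap[1:-1] = u_curr[2:] - 2.0 * u_curr[1:-1] + u_curr[:-2]
        u_next = 2.0 * u_curr - u_prev + c2 * lap
        u_next[src_idx] += wavelet[it] * dt**2       # inject the source term
        u_prev, u_curr = u_curr, u_next
        trace[it] = u_curr[rec_idx]                  # what the receiver "hears"
    return trace

def l2_misfit(synthetic, observed):
    """Least-squares waveform misfit: half the sum of squared sample differences."""
    return 0.5 * np.sum((synthetic - observed) ** 2)

# Hypothetical two-layer "true" earth vs. a homogeneous provisional map.
dx, dt, nt = 5.0, 1e-3, 1000
v_true = np.full(300, 1500.0)
v_true[100:] = 2500.0                 # faster layer below 500 m
v_start = np.full(300, 1500.0)        # provisional map: no layer at all
observed = forward_1d(v_true, dx, dt, nt, src_idx=10, rec_idx=20)
synthetic = forward_1d(v_start, dx, dt, nt, src_idx=10, rec_idx=20)
print(l2_misfit(synthetic, observed))  # > 0: the provisional map misses the reflection
```

The early part of both traces (the direct arrival) is identical; they only diverge once the reflection from the layer that exists in `v_true` reaches the receiver, which is exactly the kind of mismatch FWI goes looking for.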

You're only going to get one of two outcomes: either the synthetic traces and the observed waveforms line up (within a margin of error), or they don't.

If they do, then your provisional map is consistent with what the receivers recorded, and you can treat it as your picture of what's under the ground.
If they don't (that is, if the differences are meaningful), then you need to adjust your provisional model to better reflect reality: nudge it in the right direction and run your computations again.

And this process just loops. It applies a small update to the provisional model, runs the simulation again, compares the new synthetic data to the real data, and repeats, over and over, with each iteration moving the model closer to a structure whose simulated waveforms match the observed recordings.
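A toy version of that loop, assuming a single unknown (the velocity above one reflector at a hypothetical 600 m depth) and a forward model that simply places a Ricker wavelet at the two-way travel time. Real FWI updates millions of cells and usually gets its gradient from an adjoint simulation; the central-difference gradient, the step size, and all names here are illustrative stand-ins.

```python
import numpy as np

T = np.linspace(0.0, 1.0, 1001)   # recording times, s (1 ms sampling)
DEPTH = 600.0                     # hypothetical reflector depth, m
F0 = 8.0                          # hypothetical wavelet peak frequency, Hz

def ricker(t, f0=F0):
    """Zero-phase Ricker wavelet centred on t = 0."""
    a = (np.pi * f0 * t) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)

def simulate(v):
    """Toy forward model: one reflection arriving at two-way time 2 * DEPTH / v."""
    return ricker(T - 2.0 * DEPTH / v)

def misfit(v, observed):
    """Least-squares waveform misfit between synthetic and observed traces."""
    return 0.5 * np.sum((simulate(v) - observed) ** 2)

def invert(v0, observed, step=50.0, n_iter=100, h=1.0):
    """Simulate, compare, nudge, repeat: gradient descent from a starting guess."""
    v = float(v0)
    history = [v]
    for _ in range(n_iter):
        # central-difference gradient of the misfit with respect to velocity
        grad = (misfit(v + h, observed) - misfit(v - h, observed)) / (2.0 * h)
        v -= step * grad                 # small update toward a better fit
        history.append(v)
    return v, history

observed = simulate(2000.0)              # pretend the true velocity is 2000 m/s
v_est, history = invert(1900.0, observed)  # start from a nearby provisional guess
print(v_est)
```

Starting close to the truth, the loop settles on about 2000 m/s; the same loop started much farther away is where trouble begins.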

In this paper, the authors' central criticism is that this traditional loop depends too much on the initial model. FWI does not search the entire space of possible underground structures equally. It begins from one guess and then follows local gradient information from that starting point. If the starting model is already close to reality, this can work well. But if the starting model is too far away, the simulated waves may be misaligned with the observed waves by more than half a cycle. And at that point, the inversion can match the wrong part of the waveform, reduce the numerical mismatch in a misleading way, and converge toward an incorrect structure. This is called the "cycle-skipping" problem. It's one of the reasons that FWI is described as nonlinear and "non-convex". The error surface does not look like one smooth bowl with the true answer at the bottom. It contains many local minima, many places where the algorithm can get stuck even though a better global explanation exists elsewhere. This paper is their fix.
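The local-minimum structure behind cycle skipping can be seen numerically. This sketch, assuming a single 8 Hz Ricker wavelet as the whole "waveform", slides the predicted arrival away from the observed one and records the least-squares misfit at each time shift, then locates the valleys and the edge of the correct valley. The wavelet, frequency, and time window are hypothetical choices for illustration.

```python
import numpy as np

T = np.linspace(0.0, 1.2, 1201)   # recording times, s (1 ms sampling)
F0 = 8.0                          # hypothetical peak frequency, Hz (period 0.125 s)
T_OBS = 0.6                       # hypothetical observed arrival time, s

def ricker(t, f0=F0):
    """Zero-phase Ricker wavelet centred on t = 0."""
    a = (np.pi * f0 * t) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)

observed = ricker(T - T_OBS)

# Slide the predicted arrival from 0.3 s early to 0.3 s late and record the misfit.
shifts = np.round(np.arange(-0.3, 0.3005, 0.001), 3)
misfits = np.array([0.5 * np.sum((ricker(T - T_OBS - s) - observed) ** 2) for s in shifts])

inner, left, right = misfits[1:-1], misfits[:-2], misfits[2:]
local_minima = shifts[1:-1][(inner < left) & (inner <= right)]
local_maxima = shifts[1:-1][(inner > left) & (inner >= right)]
basin_edge = local_maxima[local_maxima > 0].min()   # end of the correct valley

print("local minima at shifts (s):", local_minima)
print("basin edge (s):", basin_edge, " half period (s):", 1.0 / (2.0 * F0))
```

In this toy, the valley around the correct alignment ends at a shift of roughly 0.054 s, a little under half of the 0.125 s period, and wrong-answer minima sit near ±0.114 s, just under a full period away. A gradient method that starts beyond the valley edge slides into one of those wrong minima and reports a confident, incorrect answer.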

So what do they do differently? Well, their proposed solution is to stop treating FWI as a search for one answer and instead treat it as a sampling problem. Rather than finding the one true structure, the goal becomes estimating the range of plausible subsurface models that are consistent with the observed data. To do that, their solution (called sFWI) uses a score-based diffusion model as a learned geological prior. During training, the model learns the structure of plausible velocity models: the kinds of layers, contrasts, textures, and spatial patterns that occur in the training distribution. And then at inference time, it starts from random noise and gradually transforms that noise into candidate models that look geologically plausible.
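Here is a minimal sketch of the generate-from-noise step, assuming a Gaussian prior over 1D velocity profiles so that the noise-conditioned score has a closed form. In sFWI the score comes from a trained network over realistic velocity models; the annealed Langevin sampler below is one standard way to turn a score into samples and is only a simplified stand-in. The correlation model, background trend, noise schedule, and the names `se_correlation` and `annealed_langevin` are hypothetical.

```python
import numpy as np

def se_correlation(n, length_scale):
    """Hypothetical prior: smooth, unit-variance deviations from a background profile."""
    idx = np.arange(n)
    return np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / length_scale) ** 2)

def annealed_langevin(cov, n_samples, sigmas, steps_per_level, r, rng):
    """Turn pure noise into samples of N(0, cov) using only noise-conditioned scores.

    A trained score network would supply grad log p_sigma(x); for a Gaussian
    prior it is known exactly: -(cov + sigma^2 I)^{-1} x.
    """
    n = cov.shape[0]
    x = sigmas[0] * rng.standard_normal((n_samples, n))   # start from pure noise
    for sigma in sigmas:                                    # large -> small noise
        precision = np.linalg.inv(cov + sigma**2 * np.eye(n))
        eps = r * sigma**2                                  # step shrinks with sigma
        for _ in range(steps_per_level):
            score = -x @ precision                          # precision is symmetric
            x = x + eps * score + np.sqrt(2.0 * eps) * rng.standard_normal(x.shape)
    return x

rng = np.random.default_rng(0)
n_depth = 32
cov = se_correlation(n_depth, length_scale=4.0)
x = annealed_langevin(cov, 500, np.geomspace(10.0, 0.01, 20), 200, 0.2, rng)
background = np.linspace(1500.0, 3500.0, n_depth)   # hypothetical trend, m/s
velocity_models = background + 200.0 * x            # each row: one candidate profile
print(velocity_models.shape)
```

Each row starts as noise and ends as a smooth profile with the correlations the prior encodes; swapping the closed-form score for a learned one is what lets a real model produce layers and contrasts that a Gaussian cannot.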

But geological plausibility alone is not enough. So their method also combines the learned prior with wave physics. Candidate models are passed through a forward wave simulator to generate synthetic seismic records, and those records are compared against the observed data. The sampling process is then guided toward models that remain plausible under the learned geological prior while also producing waveforms that agree with the measurement. This is the key conceptual shift:

We go from a single reconstruction pipeline that tries to converge on one point,
to a diffusion model that supplies a realistic model space, paired with a physics operator that supplies the observational constraint.
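The prior-plus-physics combination can be sketched as Langevin sampling driven by the sum of two gradients: the prior score ("what geology looks like") and the gradient of the data log-likelihood ("what the receivers saw"). To keep it checkable, this sketch assumes a Gaussian prior, a linear forward operator `A` as a hypothetical stand-in for the wave simulator, and no noise annealing. Under those assumptions the posterior is Gaussian, so the sampler's average can be compared with the exact answer. This is not the paper's GSS or DAPS-FWI procedure.

```python
import numpy as np

def ar1_covariance(n, rho):
    """Hypothetical prior covariance: correlation rho**|i - j| between depth cells."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def guided_langevin(A, d, prior_cov, noise_std, n_chains, n_steps, eps, rng):
    """Langevin sampling driven by prior score + data-consistency gradient.

    Each row of m is an independent chain. The total score is
    -prior_cov^{-1} m + A^T (d - A m) / noise_std^2.
    """
    prior_precision = np.linalg.inv(prior_cov)
    m = rng.standard_normal((n_chains, prior_cov.shape[0]))
    for _ in range(n_steps):
        prior_score = -m @ prior_precision               # pull toward plausible models
        data_score = (d - m @ A.T) @ A / noise_std**2    # pull toward the observations
        m = m + eps * (prior_score + data_score) + np.sqrt(2.0 * eps) * rng.standard_normal(m.shape)
    return m

rng = np.random.default_rng(1)
n = 16
prior_cov = ar1_covariance(n, 0.8)
A = np.tril(np.ones((n, n)))[3::4]       # hypothetical: 4 cumulative "travel-time" sums
m_true = np.linalg.cholesky(prior_cov) @ rng.standard_normal(n)
d = A @ m_true + rng.standard_normal(A.shape[0])   # noise_std = 1.0
samples = guided_langevin(A, d, prior_cov, 1.0, 1000, 5000, 0.01, rng)

# Closed-form Gaussian posterior, for comparison with the sampler
post_prec = np.linalg.inv(prior_cov) + A.T @ A
post_mean = np.linalg.solve(post_prec, A.T @ d)
print(np.abs(samples.mean(axis=0) - post_mean).max())
```

The output is a whole cloud of models rather than one; their spread across the 1000 chains is the uncertainty estimate, which grows in the depth cells the data constrain poorly.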

The authors then add two mechanisms to make this more practical. First is group score search (GSS), which is used as the global search step. Instead of blindly generating random candidate models until one happens to match the data, GSS organizes plausible velocity models into structurally similar groups, compares how those groups behave in the seismic data domain, and directs the search toward the group most likely to explain the observation. This is how the method avoids the fragile initial-model dependency of traditional FWI. Second, DAPS-FWI performs the refinement step. Once GSS has found candidates in a plausible basin, DAPS-FWI alternates between diffusion-based denoising and physics-informed updates, repeatedly pulling the candidate model toward both the learned geological prior and the observed waveform data. The final output, therefore, is not just one reconstruction presented as the answer. It's a basket of plausible models. This makes uncertainty an explicit part of the result, which is why they describe sFWI broadly as a move from deterministic inversion toward uncertainty-aware sampling.
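A loose, hypothetical illustration of the two-stage idea, reusing a one-parameter toy (velocity above a reflector at an assumed 600 m depth): candidate velocities are binned into groups, each group is judged in the data domain by the misfit of its mean, and the winning group's members are refined by alternating a misfit-gradient step with a pull toward the group mean. The binning rule, the `pull` shrinkage (a crude stand-in for diffusion-based denoising), and every name and parameter here are assumptions; this is not the authors' GSS or DAPS-FWI algorithm.

```python
import numpy as np

T = np.linspace(0.0, 1.0, 1001)   # recording times, s
DEPTH = 600.0                     # hypothetical reflector depth, m
F0 = 8.0                          # hypothetical wavelet peak frequency, Hz

def ricker(t, f0=F0):
    """Zero-phase Ricker wavelet centred on t = 0."""
    a = (np.pi * f0 * t) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)

def misfit(v, observed):
    """Waveform misfit for a one-reflector toy model with velocity v."""
    return 0.5 * np.sum((ricker(T - 2.0 * DEPTH / v) - observed) ** 2)

def grad(v, observed, h=1.0):
    """Central-difference derivative of the misfit with respect to v."""
    return (misfit(v + h, observed) - misfit(v - h, observed)) / (2.0 * h)

def group_search(candidates, observed, width=100.0):
    """Hypothetical global step: bin candidates, keep the group whose mean fits best."""
    labels = np.floor(candidates / width).astype(int)
    scores = {g: misfit(candidates[labels == g].mean(), observed) for g in np.unique(labels)}
    best = min(scores, key=scores.get)
    return candidates[labels == best]

def refine(members, observed, n_iter=100, step=50.0, pull=0.05):
    """Hypothetical local step: alternate a physics update with a pull toward the group.

    The pull toward the group mean is a crude stand-in for a denoising step.
    """
    prior_mean = members.mean()
    v = np.array(members, dtype=float)
    for _ in range(n_iter):
        v = v - step * np.array([grad(vi, observed) for vi in v])   # match the data
        v = (v + pull * prior_mean) / (1.0 + pull)                 # stay near the prior
    return v

rng = np.random.default_rng(0)
observed = ricker(T - 2.0 * DEPTH / 2000.0)          # "true" velocity: 2000 m/s
candidates = rng.uniform(1500.0, 2600.0, size=200)  # stand-in for prior samples
best = group_search(candidates, observed)
basket = refine(best, observed)
print(best.min(), best.max(), basket.mean(), basket.std())
```

Nothing here depends on a single starting guess: the group comparison lands in the valley around 2000 m/s before any gradient step is taken. Because the toy data are noise-free and the refinement is deterministic, the refined members collapse to nearly one value; in a stochastic sampler, injected noise would keep a spread of plausible models.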

If you want to go deeper, make sure you download the paper. The authors include a full mathematical derivation of their refinement algorithm, the architecture of their score network, the protocol of the benchmarks they ran against different baselines, and their complete statistical analysis of the results.