Signal Processing · Phase-Based Analysis · Algorithm · 9 April 2026

How Vibration Amplification Works: Phase, Pyramids
and the Science of Seeing Motion

VibraVizja® produces amplified videos in which surface vibration at selected frequencies is made visible to the human eye. This article explains, step by step, exactly how the algorithm achieves that — from the first mathematical transformation applied to each video frame to the final reconstructed output. No prior knowledge of image processing is assumed.

Santiago Pighin
Lead Scientist · VibraVizja®

Most vibration measurement tools produce a number: an overall level, a spectrum, a trend over time. VibraVizja® produces something different — a video in which the motion of every visible surface has been amplified to a degree that makes it perceptible to the human eye, at the specific frequencies you select. The engineering result is familiar: a frequency spectrum, a resonance frequency, an operational deflection shape. What is different is that all of this is extracted from video, without contact, and resolved spatially across the full field of view.

The method is called phase-based video motion amplification. It builds on foundational work by Wadhwa, Rubinstein, Durand and Freeman at MIT, published in 2013, and on Simoncelli's complex steerable pyramid transform developed in the early 1990s. The core insight of that research — which the VibraVizja® algorithm applies to industrial vibration diagnostics — is this: instead of tracking how pixel brightness changes over time, which amplifies noise as aggressively as it amplifies signal, the algorithm tracks how the phase of local image structure changes. Phase turns out to be a far more robust signal for measuring displacement.

'Pixel brightness tells you what a surface looks like. Phase tells you where it is. If you want to measure how something moves, phase is the right signal to track.'

[Figure: the five-step pipeline. 1 DECOMPOSE: steerable pyramid, scales × orientations · 2 MEASURE: phase tracking, φ(x, y, t) per subband · 3 ANALYSE: FFT + temporal filter, select frequency band · 4 AMPLIFY: Δφ → α · Δφ · 5 RECONSTRUCT: inverse pyramid, reassemble frames]
The five-step pipeline: each frame is decomposed, phase is tracked and transformed, filtered to the frequency band of interest, amplified, then reconstructed

Step 1 — Decomposing the Image

Each frame of the video is first transformed using a structure called a complex steerable pyramid. Think of it as a bank of spatially tuned filters applied simultaneously at multiple scales and multiple orientations. Conceptually it is similar to an octave-band filter bank in acoustic analysis: just as an acoustic filter bank separates a sound into frequency bands, the steerable pyramid separates an image into spatial-frequency bands — but in two dimensions, capturing both scale (from fine detail to coarse structure) and orientation (horizontal, diagonal, vertical).

What makes this pyramid particularly useful is that its filters come in quadrature pairs — an even-symmetric (cosine-like) filter and an odd-symmetric (sine-like) filter at the same scale and orientation. Combining the output of these two filters at each pixel location produces a complex-valued number. That complex number has two components that carry fundamentally different kinds of information: its magnitude describes how strongly a particular spatial pattern is present at that location; its phase describes where that pattern is positioned in space. The phase is what the algorithm is interested in.
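To make the complex coefficient concrete, here is a minimal Python sketch of a quadrature filter pair, using a 1-D Gabor pair as a stand-in for one scale and orientation of the pyramid. The real steerable pyramid uses a different filter design, and nothing here is the VibraVizja® code; all names and values are illustrative.

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_pair(size=33, wavelength=8.0, sigma=4.0):
    """Even (cosine-like) and odd (sine-like) filters at one scale."""
    x = np.arange(size) - size // 2
    envelope = np.exp(-x**2 / (2.0 * sigma**2))
    return (envelope * np.cos(2 * np.pi * x / wavelength),
            envelope * np.sin(2 * np.pi * x / wavelength))

# One row of a synthetic frame: a single bright ridge near pixel 128.
row = np.exp(-((np.arange(256) - 128.0) ** 2) / 8.0)

even, odd = gabor_pair()
coeff = (fftconvolve(row, even, mode="same")
         + 1j * fftconvolve(row, odd, mode="same"))

amplitude = np.abs(coeff)   # how strongly the pattern is present here
phase = np.angle(coeff)     # where the pattern sits within the filter's period
```

In the full transform the same operation runs in two dimensions, at every scale and orientation, for every frame.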

Step 2 — Why Phase Encodes Motion

The connection between phase and displacement comes from a well-established result in signal processing called the Fourier shift theorem. It states that shifting a signal in space is equivalent to adding a linear ramp to its phase in frequency space. In plain terms: if the pattern captured by a filter at a given scale and orientation moves by a small distance, the phase of the corresponding complex pyramid coefficient changes by an amount proportional to that distance. Measuring the change in phase is equivalent to measuring displacement — without computing optical flow, and without explicitly tracking any point.
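In symbols, under the standard Fourier transform convention (a sketch; signs flip with the opposite convention):

```latex
% Shift theorem: a spatial shift becomes a linear phase ramp in frequency.
f(x - \delta) \;\longleftrightarrow\; e^{-i\omega\delta}\,\hat{f}(\omega)

% For a subband coefficient A\,e^{i\phi} tuned to spatial frequency \omega_0,
% a small local shift \delta of the underlying pattern therefore gives
\Delta\phi \approx -\,\omega_0\,\delta
\quad\Longrightarrow\quad
\delta \approx -\,\frac{\Delta\phi}{\omega_0}
```

Reading off the displacement is then a division by the filter's known tuning frequency; no correspondence problem has to be solved.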

The practical consequence is significant. The pyramid coefficients can detect motion far smaller than a single pixel in the image. Displacements of a fraction of a pixel — corresponding to physical surface movements in the range of tens of micrometres at normal working distances — produce measurable phase changes in the pyramid subbands. This is the sub-pixel sensitivity that makes it possible to amplify vibrations that are genuinely invisible in the raw video.
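A quick numerical check of the sub-pixel claim, reusing the Gabor-pair sketch from Step 1 on a sinusoidal surface texture matched to the filter's tuning (all values are illustrative, and the sign follows from the conventions used here): shift the texture by 0.3 pixels and recover that displacement from phase alone.

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_pair(size=33, wavelength=8.0, sigma=4.0):
    x = np.arange(size) - size // 2
    env = np.exp(-x**2 / (2.0 * sigma**2))
    return (env * np.cos(2 * np.pi * x / wavelength),
            env * np.sin(2 * np.pi * x / wavelength))

def complex_response(signal, wavelength=8.0):
    even, odd = gabor_pair(wavelength=wavelength)
    return (fftconvolve(signal, even, mode="same")
            + 1j * fftconvolve(signal, odd, mode="same"))

def fourier_shift(signal, delta):
    """Shift a 1-D signal by a fractional number of pixels."""
    k = np.fft.fftfreq(signal.size)
    return np.fft.ifft(np.fft.fft(signal) * np.exp(-2j * np.pi * k * delta)).real

wavelength = 8.0
texture = np.cos(2 * np.pi * np.arange(256) / wavelength)  # surface texture
moved = fourier_shift(texture, 0.3)                        # sub-pixel shift: 0.3 px

r0 = complex_response(texture, wavelength)
r1 = complex_response(moved, wavelength)
dphi = np.angle(r1[128] * np.conj(r0[128]))  # phase change at one mid-image pixel
print(-dphi * wavelength / (2 * np.pi))      # ≈ 0.3 px, recovered from phase alone
```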

[Figure: an input frame decomposed into complex steerable pyramid subbands at scale 1 (fine), scale 2 (medium) and scale 3 (coarse), each at multiple orientations. Each complex coefficient: A = pattern strength, φ = pattern position; dφ/dt over time = surface velocity at that location, so the phase change rate gives motion with no optical flow computation required]
Each frame is split into subbands at multiple scales and orientations — each subband coefficient carries an amplitude (pattern strength) and a phase (pattern position). Changes in phase over time are the motion signal.

Step 3 — Building the Vibration Spectrum

Once the pyramid decomposition is computed for every frame of the video, the algorithm has a time series of phase values at each spatial location, for each scale and orientation in the pyramid. A surface vibrating at 25 Hz will produce a phase time series that oscillates at 25 Hz. The amplitude of that oscillation in the phase domain is proportional to the physical displacement amplitude of the surface at that point.

To identify which frequencies are present in the motion, the algorithm applies a Fast Fourier Transform to each of these phase time series — exactly the same FFT that a vibration analyst applies to an accelerometer time signal. The result is a vibration spectrum, but one that exists at every pixel simultaneously. This is the spatial resolution that a point sensor fundamentally cannot provide: not one spectrum per measurement point, but a continuous spatial map of vibration frequency content across the entire field of view.
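A toy version of this step on synthetic data (the frame rate, clip length, and tiny 4 × 4 spatial grid are assumptions for the example): generate a subband phase time series containing a 25 Hz oscillation plus incoherent noise, then FFT along the time axis at every location.

```python
import numpy as np

fps, T, H, W = 240.0, 480, 4, 4          # assumed camera rate and clip size
t = np.arange(T) / fps
rng = np.random.default_rng(0)

# Synthetic subband phase: a 25 Hz oscillation plus incoherent noise.
phase = (0.05 * np.sin(2 * np.pi * 25.0 * t)[:, None, None]
         + 0.01 * rng.standard_normal((T, H, W)))

deviation = np.unwrap(phase, axis=0)      # remove any 2π wraps along time
deviation -= deviation.mean(axis=0)       # oscillation about the rest position

spectrum = 2.0 / T * np.abs(np.fft.rfft(deviation, axis=0))
freqs = np.fft.rfftfreq(T, d=1.0 / fps)
print(freqs[spectrum[:, 0, 0].argmax()])  # 25.0 Hz, and the same at every pixel
```

On real footage each location gets its own spectrum, so a resonance shows up not as one number but as a spatial map of where the structure responds at that frequency.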

Step 4 — Selecting the Frequency Band

With the full motion spectrum available spatially, the algorithm applies a temporal bandpass filter to the phase time series at each location. The analyst selects a frequency range — for example, 10 Hz to 50 Hz to cover a machine's fundamental and second harmonic, or a narrow band around a known resonance peak. The filter suppresses phase variations outside that range: low-frequency structural drift, rigid body motion, and high-frequency noise are removed. What remains is the phase signal corresponding exclusively to motion at the frequencies of interest.
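A minimal sketch of the band selection, assuming SciPy and a (time × height × width) phase-deviation array like the one above; the zero-phase Butterworth filter here is a plausible stand-in, not necessarily the filter the production algorithm uses.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def bandpass_phase(deviation, fps, lo_hz, hi_hz, order=4):
    """Zero-phase Butterworth bandpass applied along the time axis."""
    sos = butter(order, [lo_hz, hi_hz], btype="bandpass", fs=fps, output="sos")
    return sosfiltfilt(sos, deviation, axis=0)

# e.g. keep only the machine's fundamental and second harmonic:
# band = bandpass_phase(deviation, fps=240.0, lo_hz=10.0, hi_hz=50.0)
```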

This step maps directly onto classical vibration analysis. Selecting a frequency band in the phase domain is the same operation as focusing on a specific region of an accelerometer spectrum. The difference is that the filter operates simultaneously on every pixel in the spatial field, preserving the full structural distribution of the vibration energy at the selected frequencies.

Step 5 — Amplifying the Phase

Once the filtered phase signal is isolated at each pixel and subband, the algorithm multiplies the phase deviations by an amplification factor — typically between 10 and 100, depending on the magnitude of the motion and the quality of the video. Because phase shift is proportional to spatial displacement, multiplying the phase by a factor of 50 produces a video in which all motion at the selected frequencies appears 50 times larger than it physically is.
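In coefficient form, the amplification is a complex rotation. A sketch, where `coeff` is assumed to hold a subband's complex coefficients over time and `band` the bandpassed phase deviation from the previous step (illustrative names, not the VibraVizja® API):

```python
import numpy as np

def amplify_subband(coeff, band, alpha=50.0):
    """Rotate each coefficient by (alpha - 1) * band: the coefficient already
    carries the deviation once, so the in-band phase motion becomes alpha times larger."""
    return coeff * np.exp(1j * (alpha - 1.0) * band)
```

Only the phase is touched. Amplitudes, and phase content outside the selected band, pass through unchanged, which is why the reconstructed frames still look like the original scene.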

This is where the noise advantage of the phase-based approach becomes critical. In a raw video, noise appears as random brightness fluctuations. In the phase domain, this noise is incoherent: it has no preferred direction and no systematic temporal structure, so the temporal bandpass filter rejects most of it before any amplification takes place. Physical vibration, by contrast, produces coherent phase variations that repeat at the vibration frequency and pass through the filter intact. There is a second effect as well: amplifying phase shifts image content spatially rather than scaling pixel intensities, so the residual in-band noise is merely displaced slightly instead of being brightened. This is why phase-based amplification supports factors of ×30 to ×100 without the severe noise degradation that would destroy any method based on amplifying brightness differences directly.

[Figure: physical vibration vs image noise under amplification. Physical vibration (coherent phase): before amplification, small but repeating at the same frequency every cycle; after × α (e.g. ×50), motion ×50 and now visible. Image noise (incoherent phase): before amplification, random with no preferred direction or frequency; after the same × α, still incoherent with no coherent build-up]
Physical vibration produces coherent phase oscillations that grow with amplification — image noise is incoherent in phase and does not accumulate. This allows amplification factors of ×50 or more without noise dominating the output.

Step 6 — Reconstruction

With the phase modified, the algorithm applies the inverse complex steerable pyramid: it reassembles all the subbands — with the amplified phase at the selected frequencies, and the unmodified amplitude and phase everywhere else — back into a complete video frame. This is done independently for each frame. The result is a video that looks largely like the original scene, except that motions at the selected frequency band are spatially amplified and visually clear. Colour, texture, and static structure are preserved.

One refinement in the VibraVizja® implementation is amplitude-weighted spatial smoothing of the phase signal before the amplification step. In image regions where the pyramid subband amplitude is low — for example, a uniformly lit flat surface with no spatial texture — the phase estimate is inherently noisy. Weighting the phase spatially by the local subband amplitude before amplification suppresses these unreliable estimates without sacrificing spatial resolution in textured regions. The result is a cleaner amplified output, particularly in low-contrast areas of the scene.
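One way such a weighting can be implemented, assuming SciPy and per-frame `phase` and `amplitude` arrays for a single subband (a simplified stand-in for the production step; it also assumes small bandpassed phase deviations, so wrap-around can be ignored):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def smooth_phase(phase, amplitude, sigma=2.0, eps=1e-9):
    """Amplitude-weighted Gaussian smoothing: textureless (low-amplitude)
    pixels inherit their phase estimate from reliable neighbours."""
    weighted = gaussian_filter(amplitude * phase, sigma)
    weights = gaussian_filter(amplitude, sigma) + eps
    return weighted / weights
```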

Why This Matters for Vibration Diagnostics

The engineering value of the algorithm follows directly from its properties. Because displacement is encoded in phase, the measurement is sensitive to sub-pixel motion: structural vibrations far too small to be seen by a camera can be detected and amplified. Because the temporal filter operates in the same frequency domain as classical vibration analysis, the output is immediately interpretable: the amplified video at a resonance frequency is the operational deflection shape of the structure at that frequency — without instrumentation, without a finite element model, and without the network of accelerometers that would be needed to reconstruct it from point measurements.

Because the method is camera-based and contactless, it resolves vibration spatially across the full field of view in a single recording: every visible surface, every connection point, every span of pipe or structural member, simultaneously. The spatial picture that point sensors can only approximate is captured in its entirety. This is what makes vibration amplification a complement to accelerometer-based monitoring rather than a competitor — accelerometers provide continuous, high-frequency, time-resolved data at fixed points; vibration amplification provides the spatial context that connects those points and reveals how the structure moves between them.

See the Algorithm Applied to Your Equipment

A VibraVizja® measurement session produces the amplified video, the vibration frequency content at each spatial point, and the operational deflection shapes at the frequencies you select — from a single camera recording, without contact.

Request a Free On-Site Trial