How Vibration Amplification Works: Phase, Pyramids and the Science of Seeing Motion
VibraVizja® produces amplified videos in which surface vibration at selected frequencies is made visible to the human eye. This article explains, step by step, exactly how the algorithm achieves that — from the first mathematical transformation applied to each video frame to the final reconstructed output. No prior knowledge of image processing is assumed.
Most vibration measurement tools produce a number: an overall level, a spectrum, a trend over time. VibraVizja® produces something different — a video in which the motion of every visible surface has been amplified to a degree that makes it perceptible to the human eye, at the specific frequencies you select. The engineering result is familiar: a frequency spectrum, a resonance frequency, an operational deflection shape. What is different is that all of this is extracted from video, without contact, and resolved spatially across the full field of view.
The method is called phase-based video motion amplification. It builds on foundational work by Wadhwa, Rubinstein, Durand and Freeman at MIT, published in 2013, and on Simoncelli's complex steerable pyramid transform developed in the early 1990s. The core insight of that research, which the VibraVizja® algorithm applies to industrial vibration diagnostics, is that motion can be measured by tracking how the phase of local image structure changes from frame to frame. Phase turns out to be a robust signal for measuring displacement.
'Phase tells you where local structures are. If you want to measure how something moves, phase is the right signal to track.'
Step 1 — Decomposing the Image
Each frame of the video is first transformed using a structure called a complex steerable pyramid. Think of it as a bank of spatially-tuned filters applied simultaneously at multiple scales and multiple orientations. Conceptually it is similar to an octave-band filter bank in acoustic analysis: just as an acoustic filter bank separates a sound into frequency bands, the steerable pyramid separates an image into spatial-frequency bands — but in two dimensions, capturing both scale (from fine detail to coarse structure) and orientation (horizontal, diagonal, vertical).
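A full complex steerable pyramid is too long to reproduce here, but the idea of splitting a frame by scale and orientation can be sketched with a simplified frequency-domain filter bank. The Python sketch below is an illustration under stated assumptions, not the VibraVizja® implementation: the function name `oriented_bands`, the log-Gaussian radial window, the cosine-power angular window and the absence of downsampling are all choices made for this example.

```python
import numpy as np

def oriented_bands(frame, n_scales=3, n_orients=4):
    """Split a grayscale frame into complex subbands indexed by (scale, orientation)."""
    h, w = frame.shape
    fy = np.fft.fftfreq(h)[:, None]            # vertical spatial frequency, cycles/pixel
    fx = np.fft.fftfreq(w)[None, :]            # horizontal spatial frequency, cycles/pixel
    radius = np.hypot(fx, fy)
    radius[0, 0] = 1.0                         # avoid log(0); the DC term is zeroed below
    angle = np.arctan2(fy, fx)
    spectrum = np.fft.fft2(frame)

    bands = {}
    for s in range(n_scales):
        centre = 0.25 / 2 ** s                 # centre frequency halves at each coarser scale
        # Log-Gaussian radial window (width chosen for illustration only)
        radial = np.exp(-0.5 * (np.log2(radius / centre) / 0.5) ** 2)
        radial[0, 0] = 0.0
        for o in range(n_orients):
            theta = np.pi * o / n_orients
            # One-sided angular window: only half of the frequency plane is kept,
            # which is what makes the resulting coefficients complex
            d = np.cos(angle - theta)
            angular = np.where(d > 0, d ** (n_orients - 1), 0.0)
            bands[(s, o)] = np.fft.ifft2(spectrum * radial * angular)
    return bands

# Usage: vertical stripes at 0.25 cycles/pixel fall in the finest scale, orientation 0
x = np.arange(64)
frame = np.tile(np.cos(2 * np.pi * 0.25 * x), (64, 1))
bands = oriented_bands(frame)
```

Each entry of `bands` is a complex image: its magnitude says how strongly that scale and orientation is present at each location, and its phase is the quantity the following steps track over time.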
Step 2 — Why Phase Encodes Motion
The connection between phase and displacement comes from a well-established result in signal processing called the Fourier shift theorem. It states that shifting a signal in space leaves the magnitude of each of its frequency components unchanged and changes the phase of each component by an amount proportional to the shift. In plain terms: if the pattern captured by a filter at a given scale and orientation moves by a small distance, the phase of the corresponding complex pyramid coefficient changes by an amount proportional to that distance. Measuring the change in phase is therefore equivalent to measuring displacement, without computing optical flow and without explicitly tracking any point.
The practical consequence is significant. The pyramid coefficients can detect motion far smaller than a single pixel in the image. Displacements of a fraction of a pixel — corresponding to physical surface movements in the range of tens of micrometres at normal working distances — produce measurable phase changes in the pyramid subbands. This is the sub-pixel sensitivity that makes it possible to amplify vibrations that are genuinely invisible in the raw video.
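The relationship between phase and sub-pixel displacement can be checked numerically in one dimension. The sketch below is a toy example, not part of the product: the signal length, the spatial frequency and the 0.05-pixel shift are arbitrary values chosen for illustration.

```python
import numpy as np

n = 256
x = np.arange(n)
k = 8                                   # spatial frequency bin: 8 cycles across the signal
omega = 2 * np.pi * k / n               # spatial frequency in radians per pixel
shift = 0.05                            # true displacement in pixels (well below one pixel)

reference = np.cos(omega * x)
moved = np.cos(omega * (x - shift))

# Phase of the same frequency component before and after the shift
phase_ref = np.angle(np.fft.fft(reference)[k])
phase_moved = np.angle(np.fft.fft(moved)[k])

# Shift theorem: phase change = -omega * shift, so the shift follows from the phase
estimated_shift = -(phase_moved - phase_ref) / omega
```

In this noise-free case `estimated_shift` recovers the 0.05-pixel displacement to numerical precision, even though no pixel-level feature has visibly moved.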
Step 3 — Building the Vibration Spectrum
Once the pyramid decomposition is computed for every frame of the video, the algorithm has a time series of phase values at each spatial location, for each scale and orientation in the pyramid. A surface vibrating at 25 Hz will produce a phase time series that oscillates at 25 Hz. The amplitude of that oscillation in the phase domain is related to the physical displacement amplitude of the surface at that point.
To identify which frequencies are present in the motion, the algorithm applies a Fast Fourier Transform to each of these phase time series — exactly the same FFT that a vibration analyst applies to an accelerometer time signal. The result is a vibration spectrum, but one that exists at every pixel simultaneously. This is the spatial resolution that a point sensor fundamentally cannot provide: not one spectrum per measurement point, but a spatial map of vibration frequency content across the entire field of view.
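As a minimal sketch of this step, the NumPy snippet below computes an amplitude spectrum for every pixel of a small synthetic phase stack. The frame rate, the recording length, the per-pixel amplitudes and the noise level are all hypothetical values chosen for the example.

```python
import numpy as np

fps = 200.0                                        # assumed camera frame rate, frames/s
t = np.arange(400) / fps                           # 2 s of video
rng = np.random.default_rng(0)

# Hypothetical phase stack for one subband, shape (frames, rows, cols):
# every pixel vibrates at 25 Hz, with a different amplitude at each location
amp_map = np.linspace(0.01, 0.04, 12).reshape(3, 4)           # radians
phase_stack = amp_map * np.sin(2 * np.pi * 25.0 * t)[:, None, None]
phase_stack = phase_stack + 0.002 * rng.standard_normal(phase_stack.shape)

# One FFT along the time axis gives a spectrum at every pixel at once
spectra = np.abs(np.fft.rfft(phase_stack, axis=0)) * 2 / t.size
freqs = np.fft.rfftfreq(t.size, d=1.0 / fps)

peak_hz = freqs[np.argmax(spectra, axis=0)]        # dominant frequency per pixel
peak_amp = spectra.max(axis=0)                     # phase amplitude at that frequency
```

Here `peak_hz` is a map of dominant frequency and `peak_amp` a map of phase amplitude, which is the spatially resolved counterpart of a single accelerometer spectrum.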
Step 4 — Selecting the Frequency Band
With the full motion spectrum available, the algorithm applies a temporal bandpass filter to the phase time series at each location. The analyst selects a frequency range — for example, 10 Hz to 50 Hz to cover a machine's fundamental and second harmonic, or a narrow band around a known resonance peak. The filter suppresses phase variations outside that range: low-frequency structural drift, rigid body motion, and high-frequency noise are removed. What remains is the phase signal corresponding exclusively to motion at the frequencies of interest.
This step maps directly onto classical vibration analysis. Selecting a frequency band in the phase domain is the same operation as focusing on a specific region of an accelerometer spectrum. The difference is that the filter operates simultaneously on every pixel in the spatial field, preserving the full structural distribution of the vibration energy at the selected frequencies.
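The band selection can be sketched as a temporal filter applied along the frame axis. The example below uses an ideal (brick-wall) FFT mask purely for brevity; this is an assumption of the sketch, and a practical implementation might well use an FIR or IIR band-pass filter instead. The function name `bandpass` and the test signal are hypothetical.

```python
import numpy as np

def bandpass(phase_stack, fps, low_hz, high_hz):
    """Ideal temporal band-pass along axis 0 of an array shaped (frames, ...)."""
    n_frames = phase_stack.shape[0]
    freqs = np.fft.rfftfreq(n_frames, d=1.0 / fps)
    keep = (freqs >= low_hz) & (freqs <= high_hz)
    spectrum = np.fft.rfft(phase_stack, axis=0)
    spectrum[~keep] = 0.0                          # discard everything outside the band
    return np.fft.irfft(spectrum, n=n_frames, axis=0)

fps = 200.0
t = np.arange(400) / fps
wanted = 0.02 * np.sin(2 * np.pi * 25.0 * t)       # vibration of interest
drift = 0.50 * np.sin(2 * np.pi * 1.0 * t)         # slow structural drift
hiss = 0.01 * np.sin(2 * np.pi * 80.0 * t)         # high-frequency disturbance

# A single-pixel stack, shape (frames, 1), filtered to the 10-50 Hz band
filtered = bandpass((wanted + drift + hiss)[:, None], fps, 10.0, 50.0)[:, 0]
```

Because the same call accepts an array of shape (frames, rows, cols), the filter is applied to every pixel in one operation, which mirrors the point made above.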
Step 5 — Amplifying the Phase
Once the filtered phase signal is isolated at each pixel and subband, the algorithm multiplies the phase deviations by an amplification factor, typically between 10 and 100, depending on the magnitude of the motion and the quality of the video. Because phase shift is proportional to spatial displacement, multiplying the phase deviations by this factor produces a video in which all motion at the selected frequencies appears larger than it physically is.
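The effect of scaling the phase can be sketched on a single one-dimensional subband. This is a simplified illustration, not the production code: the function name `amplify` is hypothetical, the phase deviation is measured against a reference frame, and the temporal filter is omitted (in the full pipeline the deviation would be the band-passed phase signal).

```python
import numpy as np

def amplify(coeffs, ref_coeffs, alpha):
    """Scale the phase deviation from a reference frame by alpha; keep the amplitude."""
    deviation = np.angle(coeffs * np.conj(ref_coeffs))     # phase change, wrapped to (-pi, pi]
    new_phase = np.angle(ref_coeffs) + alpha * deviation
    return np.abs(coeffs) * np.exp(1j * new_phase)

x = np.arange(256)
omega = 2 * np.pi * 8 / 256            # spatial frequency of the subband, radians per pixel
shift = 0.02                           # true displacement in pixels
alpha = 20                             # amplification factor

ref_coeffs = np.exp(1j * omega * x)                 # subband coefficients, reference frame
coeffs = np.exp(1j * omega * (x - shift))           # same subband after a 0.02-pixel shift
amplified = amplify(coeffs, ref_coeffs, alpha)
```

The amplified coefficients are those of a pattern displaced by `alpha * shift` = 0.4 pixels, twenty times the true motion. In practice the usable factor is limited, because each subband's amplitude envelope stays in place while its phase is shifted.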
This is where the noise advantage of the phase-based approach becomes critical. In a raw video, noise appears as random brightness fluctuations. In the phase domain, this noise is incoherent: it has no preferred direction and no systematic temporal structure. Physical vibration, by contrast, produces coherent phase variations that repeat at the vibration frequency. When the temporal filter is applied, the incoherent component is strongly attenuated, so the amplification factor acts mainly on the coherent component of the vibration signal. This is why phase-based amplification supports factors of ×30 to ×100 without severe noise degradation.
Step 6 — Reconstruction
With the phase modified, the algorithm applies the inverse complex steerable pyramid: it reassembles all the subbands, with the amplified phase at the selected frequencies and the unmodified amplitude, back into a complete video frame. This is done independently for each frame. The result is a video that looks largely like the original scene, except that motion in the selected frequency band is spatially amplified and clearly visible. Colour, texture, and static structure are preserved.
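The principle of reassembling subbands into a signal can be shown with a one-dimensional analogue. The sketch below is not the inverse steerable pyramid: it is a toy filter bank with hypothetical function names `split_bands` and `reconstruct`, non-overlapping frequency bands and an even-length signal, all chosen to keep the example short.

```python
import numpy as np

def split_bands(signal, edges):
    """Split a real, even-length 1-D signal into complex subbands plus a real residual."""
    n = signal.size
    spectrum = np.fft.fft(signal)
    bins = np.arange(n)
    # Each band keeps a range of positive-frequency bins only, so it is complex-valued
    bands = [np.fft.ifft(spectrum * ((bins >= lo) & (bins < hi)))
             for lo, hi in zip(edges[:-1], edges[1:])]
    # Residual: the DC and Nyquist bins, which carry no usable phase
    residual = np.fft.ifft(spectrum * ((bins == 0) | (bins == n // 2))).real
    return bands, residual

def reconstruct(bands, residual):
    """Invert split_bands: the negative frequencies are the conjugate mirror image."""
    return residual + 2.0 * np.real(sum(bands))

rng = np.random.default_rng(1)
signal = rng.standard_normal(256)
bands, residual = split_bands(signal, edges=[1, 4, 16, 64, 128])
restored = reconstruct(bands, residual)
```

With unmodified subbands the round trip returns the original signal. Changing the phase of selected subbands before calling `reconstruct` is the one-dimensional counterpart of the reconstruction step described above.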
One refinement in the algorithm is amplitude-weighted spatial smoothing of the phase signal before the amplification step. In image regions where the pyramid subband amplitude is low (for example, a uniformly lit flat surface with no spatial texture), the phase estimate is inherently noisy. Weighting the phase spatially by the local subband amplitude before amplification suppresses these unreliable estimates without sacrificing spatial resolution in textured regions. The result is a cleaner amplified output, particularly in low-contrast areas of the scene.
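A common way to express amplitude-weighted smoothing is to blur the product of phase and amplitude and divide by the blurred amplitude. The sketch below follows that form as an assumption; the function names, the Gaussian kernel and its width are illustrative choices, not details confirmed for the VibraVizja® implementation.

```python
import numpy as np

def gaussian_blur(image, sigma):
    """Separable Gaussian blur built from 1-D convolutions (zero-padded at the edges)."""
    radius = int(3 * sigma)
    taps = np.exp(-0.5 * (np.arange(-radius, radius + 1) / sigma) ** 2)
    taps /= taps.sum()

    def blur_1d(values):
        return np.convolve(values, taps, mode="same")

    return np.apply_along_axis(blur_1d, 1, np.apply_along_axis(blur_1d, 0, image))

def amplitude_weighted_smooth(phase, amplitude, sigma=2.0, eps=1e-12):
    """Smooth phase spatially, trusting each pixel in proportion to its subband amplitude."""
    weighted = gaussian_blur(phase * amplitude, sigma)
    return weighted / (gaussian_blur(amplitude, sigma) + eps)

# Usage: one pixel has a wild phase estimate but almost no local texture
phase = np.full((32, 32), 0.1)
amplitude = np.ones((32, 32))
phase[16, 16] = 3.0
amplitude[16, 16] = 1e-6
smoothed = amplitude_weighted_smooth(phase, amplitude)
```

The unreliable pixel is effectively replaced by the phase of its well-textured neighbours, while a region of uniformly high amplitude and constant phase passes through unchanged.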
Why This Matters for Vibration Diagnostics
The engineering value of the algorithm follows directly from its properties. Because the algorithm is sensitive to sub-pixel motion, structural vibrations far too small to be seen by the naked eye can be detected and amplified. Because the temporal filter operates in the same frequency domain as classical vibration analysis, the output is immediately interpretable: the amplified video at a resonance frequency shows the operational deflection shape of the structure at that frequency, without instrumentation, without a finite element model, and without the network of accelerometers that would be needed to reconstruct it from point measurements.
Because the method is camera-based and contactless, it resolves vibration spatially across the full field of view in a single recording: every visible surface, every connection point, every span of pipe or structural member, simultaneously. The spatial picture, which point sensors can only sample at discrete locations, is captured completely. This is what makes vibration amplification a complement to accelerometer-based monitoring rather than a competitor: accelerometers provide high-frequency, time-resolved data at fixed points, while vibration amplification provides the spatial context that connects those points and reveals how the structure moves between them.
See the Algorithm Applied to Your Equipment
A VibraVizja® measurement session produces the amplified video, the vibration frequency content at each spatial point, and the operational deflection shapes at the frequencies you select — from a single camera recording, without contact.
Request a Free On-Site Trial