Video Synchronization Options in FPGAs

When dealing with live video, especially in FPGAs or ASICs, you’ll inevitably encounter synchronization challenges. These can range from simple refresh rate conversions and multi-stream synchronization to video alignment tasks, like line or frame alignment.

For small, arbitrary data transfers, on-chip memory resources are typically sufficient for clock domain crossing or synchronization. However, this approach isn’t feasible for live video, as even a single frame is too large for the on-chip memory of most FPGAs. Let’s do a quick calculation: a single Full HD (FHD) frame with 32-bit color depth requires $1920 \times 1080 \times 32 / (36 \times 1024) = 1800$ BRAM36 blocks. Considering that mid-range FPGAs often have only a few hundred of these blocks, storing a full uncompressed frame on-chip is not possible.
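The same arithmetic can be wrapped in a small helper to try other resolutions. This is only a sketch: the function name and the 4K example are mine, and it assumes every bit of each 36 Kb block (including the parity bits) can be used for pixel data.

```python
import math

def bram36_for_frame(width, height, bits_per_pixel, bram_bits=36 * 1024):
    """Number of 36 Kb block RAMs needed to hold one uncompressed frame."""
    frame_bits = width * height * bits_per_pixel
    return math.ceil(frame_bits / bram_bits)

print(bram36_for_frame(1920, 1080, 32))  # FHD: 1800
print(bram36_for_frame(3840, 2160, 32))  # 4K UHD: 7200
```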

This is where the term “Video Buffer” comes in. In simple terms, it’s a dedicated space in external DDR memory (like DDR3, DDR4, or LPDDR5) used to store incoming video frames. Depending on the application’s requirements, a system commonly uses two (dual-buffering) or three (triple-buffering) frame slots per video stream. This is precisely the purpose of my own Video Buffer IP. A simplified connection scheme might look as follows:

A video buffer is a generic unit used to reconcile different refresh rates between the receive (RX) and transmit (TX) sides of a system. It can also absorb back-pressure from a video processing pipeline, which might include tasks like CRC calculation, video blending, scaling, or compression.

Problems arise when video buffers cannot be used, yet there’s a requirement to synchronize a received video stream with a transmitted one. The core issue is that the RX and TX clocks are never perfectly identical. Even if both interfaces are set to a nominal refresh rate of, for example, 60 Hz, they will inevitably differ slightly due to the stability tolerance of their respective clock oscillators. In practice, the RX rate might be $59.998$ Hz while the TX rate is $60.003$ Hz.

This discrepancy is why many high-speed protocols, like PCIe, tolerate minor clock offsets (measured in ppm) and use special symbols to compensate for rate differences. While the timing error from a single frame is negligible, these errors accumulate. After hundreds of frames, the cumulative difference can become too large for a standard FIFO buffer to manage, leading to buffer underruns or overruns. This synchronization challenge can be addressed by using an external jitter attenuator, such as the Si5392.
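To put numbers on the accumulating error described above, here is a rough sketch using the example rates of 59.998 Hz and 60.003 Hz. The function names are mine, the frame size of 2200 x 1125 clock cycles is the standard FHD total, and the FIFO margin of 2200 pixels (one total line of headroom on either side of half-full) is purely an assumption for illustration.

```python
def drift_pixels_per_frame(rx_hz, tx_hz, pixels_per_frame):
    """Pixels gained (+) or lost (-) by the FIFO during one TX frame period."""
    # During one TX frame, RX writes rx_hz / tx_hz frames while TX reads exactly one.
    return (rx_hz - tx_hz) / tx_hz * pixels_per_frame

def frames_until_fifo_fails(rx_hz, tx_hz, pixels_per_frame, margin_pixels):
    """Frames until a FIFO with the given headroom overruns or underruns."""
    per_frame = abs(drift_pixels_per_frame(rx_hz, tx_hz, pixels_per_frame))
    return float("inf") if per_frame == 0 else margin_pixels / per_frame

FRAME = 2200 * 1125  # total FHD frame size in pixel clocks
MARGIN = 2200        # hypothetical headroom: one total video line

drift = drift_pixels_per_frame(59.998, 60.003, FRAME)
print(f"{drift:.1f} pixels per frame")  # about -206 (TX is faster, FIFO drains)
print(f"{frames_until_fifo_fails(59.998, 60.003, FRAME, MARGIN):.1f} frames")  # about 10.7
```

Under these assumptions the example offset (roughly 83 ppm) uses up a full line of headroom in about eleven frames. A 5 ppm offset would need roughly 178 frames to do the same.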

A jitter attenuator works by taking the recovered RX video clock as a reference, tracking its frequency deviations, and generating a TX clock that mirrors those same deviations. This propagates any clock drift from the RX side to the TX side, making their frequencies practically equivalent (for example, both running at $59.9993$ Hz).

However, there is an alternative to using an external component. The AMD UltraScale+ QPLLs (specifically QPLL0 and QPLL1) can perform this same function internally using their fractional divider mode, also known as Sigma-Delta Modulation (SDM). This feature is detailed in the GTH (UG576) and GTY (UG578) transceiver user guides.
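To get a feel for how fine such a fractional adjustment can be, the sketch below uses a generic, idealized fractional-N PLL model. It is not the exact QPLL equation: the function names, the 148.5 MHz reference, the integer divider of 40, and the 24-bit fractional width are all assumptions for illustration. The real formula and the supported widths should be taken from UG576/UG578.

```python
def fractional_n_output_hz(ref_hz, n_int, frac_word, frac_bits, m=1):
    """Output of an idealized fractional-N PLL: ref * (N + frac / 2**bits) / M."""
    return ref_hz * (n_int + frac_word / 2 ** frac_bits) / m

def ppm_per_lsb(n_int, frac_word, frac_bits):
    """Relative frequency change, in ppm, caused by one LSB of the fractional word."""
    n_total = n_int + frac_word / 2 ** frac_bits
    return 1e6 / (n_total * 2 ** frac_bits)

# Hypothetical values, chosen only to show the order of magnitude.
REF_HZ, N_INT, FRAC_BITS = 148.5e6, 40, 24

f0 = fractional_n_output_hz(REF_HZ, N_INT, 0, FRAC_BITS)
f1 = fractional_n_output_hz(REF_HZ, N_INT, 1, FRAC_BITS)
print(f"one LSB moves the output by {f1 - f0:.2f} Hz")
print(f"that is {ppm_per_lsb(N_INT, 0, FRAC_BITS):.5f} ppm per LSB")
```

In this model a single LSB is worth far less than 1 ppm, so offsets of a few tens of ppm between two oscillators sit comfortably within the adjustment range of the fractional word.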

To make this work, one must implement a digital feedback loop using a “deep enough” FIFO. This FIFO needs to be large enough to absorb clock rate mismatches over a single video frame. A depth equivalent to “a couple of video lines” is a good rule of thumb.

The control mechanism is a state machine that continuously adjusts the TX clock based on the FIFO’s fill level.

Initialization: First, reset the FIFO. Then, wait for it to become exactly half-full. This half-full point is your target set-point.
Synchronization Start: Once the FIFO reaches this set-point, you can start generating the TX video timing signals (active video, blanking, syncs, etc.).
Periodic Check & Adjustment: The key is to check the FIFO’s fill level at a precise interval—exactly once per frame.
- For an FHD resolution at 60 Hz with a total frame size of $2200 \times 1125 = 2{,}475{,}000$ clock cycles, you would perform this check every $2{,}475{,}000$ clock cycles.
- If the FIFO fill level is above the half-full set-point, it means the RX rate is faster than the TX rate. You need to speed up the TX clock.
- If the FIFO fill level is below the half-full set-point, the TX rate is faster. You need to slow down the TX clock.
- Clock Adjustment: The QPLL fractional mode makes this adjustment simple. To speed up the clock, you increment the fractional part of the divider; to slow down, you decrement it. A major advantage of this method is that these changes can be made without using the Dynamic Reconfiguration Port (DRP).
Error Handling: If the FIFO runs completely full or empty (which can happen during initial lock or after a large frequency shift), the correct action is to reset the synchronization logic and restart the synchronization process, perhaps using the last known-good fractional value as a starting point. The increment/decrement value can be static, or it can scale with how far the fill level is from the set-point, so that small deviations lead to minor adjustments while large deviations trigger more aggressive corrections.
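The steps above can be sketched as a small behavioral simulation. This is a simplified software model, not RTL and not a definitive implementation: the function name, the FIFO depth of 4096 pixels, the step size of 0.01 ppm per fractional LSB, and the gains `kp` and `kd` are all assumptions. Note also that the sketch weighs in the change in fill level since the previous check (the `trend` term), which the description above does not mention. In this idealized model, a correction based on the error alone keeps oscillating around the set-point, so some damping term is needed.

```python
def simulate_loop(rx_offset_ppm, frames=200, pixels_per_frame=2200 * 1125,
                  fifo_depth=4096, lsb_ppm=0.01, kp=4.0, kd=20.0):
    """Simulate the once-per-frame FIFO check.

    Returns (tx_offset_ppm, fill_level, reset_count) after the given frames.
    """
    setpoint = fifo_depth / 2
    fill = prev_fill = setpoint  # TX timing starts once the FIFO is half full
    frac = 0                     # signed offset added to the fractional divider word
    resets = 0
    for _ in range(frames):
        tx_offset_ppm = frac * lsb_ppm
        # Pixels gained (RX faster) or lost (TX faster) during this frame.
        fill += pixels_per_frame * (rx_offset_ppm - tx_offset_ppm) * 1e-6
        if fill <= 0 or fill >= fifo_depth:
            # Error handling: restart at the set-point, keep the last fractional value.
            fill = prev_fill = setpoint
            resets += 1
            continue
        error = fill - setpoint   # above set-point -> positive -> speed up TX
        trend = fill - prev_fill  # damping term (an addition, see the lead-in)
        frac += round(kp * error + kd * trend)
        prev_fill = fill
    return frac * lsb_ppm, fill, resets

tx_ppm, fill, resets = simulate_loop(rx_offset_ppm=80.0)
print(f"TX offset {tx_ppm:.2f} ppm, fill level {fill:.1f}, resets {resets}")
```

With the assumed 80 ppm RX offset, the model settles with the TX offset close to the RX offset and the fill level back near the half-full set-point, without triggering a reset. In hardware, the gains and step size would have to be tuned against the actual divider resolution and FIFO depth.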

Depending on the difference between the RX and TX clocks, the reset might be re-triggered several times, causing the external video sink to flicker. If the feedback loop is working properly, however, it will eventually stabilize. It’s worth mentioning that moving the QPLL VCO frequency far away from its base operating frequency might require the PLL to be reset. Always make sure that your adjustments stay within the limits documented in the respective user guide or vendor documentation.