Video Buffer IP

Mar 19, 2024

Overview

The Video Buffer IP buffers video frames in external DDR memory. It is designed specifically around AMD’s Memory Interface IP (PG150) and its UI (user interface) port. The IP is written and verified entirely in modern VHDL-2008 and has been tested in hardware. The number of video channels is configurable through generics. The IP also supports an auxiliary communication channel that lets third-party IPs read from and write to the DDR memory. By default, each channel uses 3 video buffers, which keeps the response time short and reduces frame latency to a minimum. The video buffer arbiter always gives the read side the most recently completed video buffer, which means that in theory some frames might never be read. It always gives the write side the oldest video buffer.
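The buffer selection policy can be made concrete with a small behavioral model in Python. This is an illustrative sketch, not the IP's VHDL. The class name, method names and the completion-stamp bookkeeping are hypothetical. The writer always takes the oldest buffer the reader is not holding. The reader always takes the newest completed buffer. As a result, a frame that is overtaken by a newer one before the reader switches is simply skipped.

```python
from typing import List, Optional


class TripleBufferArbiter:
    """Hypothetical behavioral model of the buffer-selection policy.

    Writer: takes the oldest buffer that the reader is not currently holding.
    Reader: takes the most recently completed buffer.
    """

    def __init__(self, num_buffers: int = 3) -> None:
        if num_buffers < 2:
            raise ValueError("need at least 2 buffers")
        self.num_buffers = num_buffers
        # Completion "timestamp" per buffer; -1 means the buffer was never written.
        self.stamps: List[int] = [-1] * num_buffers
        self.reading: Optional[int] = None
        self.writing: Optional[int] = None
        self._frame_counter = 0

    def start_write(self) -> int:
        # Oldest = smallest completion stamp among buffers the reader is not holding.
        candidates = [b for b in range(self.num_buffers) if b != self.reading]
        self.writing = min(candidates, key=lambda b: self.stamps[b])
        return self.writing

    def finish_write(self) -> None:
        if self.writing is None:
            raise RuntimeError("no write in progress")
        self._frame_counter += 1
        self.stamps[self.writing] = self._frame_counter
        self.writing = None

    def start_read(self) -> Optional[int]:
        # Newest completed buffer that is not being overwritten right now.
        completed = [b for b in range(self.num_buffers)
                     if self.stamps[b] >= 0 and b != self.writing]
        if not completed:
            return None
        self.reading = max(completed, key=lambda b: self.stamps[b])
        return self.reading

    def finish_read(self) -> None:
        self.reading = None
```

Consider a run that starts with empty buffers. The first two frames land in buffers 0 and 1, and the reader picks buffer 1. The next two frames go to buffers 2 and then 0. When the reader switches again, it gets buffer 0, so the frame in buffer 2 is never read. This is the skipped-frame case described above.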

All data input and output ports are AXI4-Stream video compliant. Various status and report signals are available per channel and per buffer, including detection of a stalled video channel (a channel on which no new data is being received). To maximize transfer efficiency, all transactions to and from the memory interface are issued as bursts, which minimizes random accesses. The provided AXI4-Lite control and status interface can be used to monitor the available and used bandwidth of the DDR interface.
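One way to picture stall detection is as a per-channel watchdog. The sketch below is purely hypothetical and does not describe how the IP implements it. The class, the `timeout_cycles` parameter, and the choice to count clock cycles since the last accepted TVALID/TREADY beat are all assumptions made for illustration.

```python
class StallDetector:
    """Hypothetical per-channel stall watchdog (illustration only).

    Counts clock cycles since the last accepted beat (TVALID and TREADY both
    high) and raises the stall flag once the count reaches timeout_cycles.
    """

    def __init__(self, timeout_cycles: int) -> None:
        if timeout_cycles <= 0:
            raise ValueError("timeout_cycles must be positive")
        self.timeout_cycles = timeout_cycles
        self.idle_cycles = 0
        self.stalled = False

    def clock(self, tvalid: bool, tready: bool) -> bool:
        """Advance one clock cycle and return the current stall flag."""
        if tvalid and tready:
            # A beat was accepted: the channel is alive again.
            self.idle_cycles = 0
            self.stalled = False
        else:
            # Saturating counter, so it never wraps during very long stalls.
            self.idle_cycles = min(self.idle_cycles + 1, self.timeout_cycles)
            self.stalled = self.idle_cycles >= self.timeout_cycles
        return self.stalled
```

A saturating counter keeps the flag asserted for any stall length, and the first accepted beat clears it immediately.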

Features

  • Arbitrary number of video channels
  • Configurable pixel data width
  • Configurable pixel packing and unpacking factors
  • 3 video buffers per channel by default
  • RX / TX statistics on a per-buffer and per-channel basis
  • Optional auxiliary communication channel
  • Stalled-video detection on a per-channel basis
  • AXI4-Lite status and configuration interface
  • Pure VHDL-2008, vendor-independent code
FAQ
  • Is the IP vendor-independent?
    Yes. The IP is currently vendor-independent and has been tested in hardware on an AMD FPGA. The asynchronous FIFOs used for clock-domain crossing (CDC) can be configured to use either AMD AXI Stream FIFO macros or pure RTL code. The RTL option handles the CDC entirely on its own. The number of CDC stages is also configurable. However, the DDR side expects to be connected to the UI interface described in AMD’s PG150. An AXI4 memory interface is currently not supported, which improves performance and reduces the IP’s resource usage.
  • How many video channels are supported?
    The number of video channels is arbitrary and, in theory, unlimited; it does not even have to be a power of 2. For practical reasons, however, make sure that the DDR interface can serve all required channels. If the memory interface lacks bandwidth, the video sinks will lose data or the video sources will be starved of data. Note also that the AXI4-Lite interface addressing limits the maximum video channel count to 16.
  • What is the purpose of the AUX channel?
    The AUX channel is simply an auxiliary data channel that can read and write data at arbitrary locations in the RAM (i.e., it can also overwrite the video buffer locations). The video buffer locations are intended to be static, but they can be changed dynamically while the channels are not operating.
  • Can the Video Sources and Sinks be used for non-video data?
    Generally, yes. The IP neither cares about nor controls the data it buffers. The data scheme should, however, follow video frame conventions, mostly the TLAST / TUSER signaling that delimits a “data packet”. Also note that sending a “single long line of data” might require increasing the bit width of the pixel/line counters. I suggest that you discuss the required functionality with me before using the IP this way.
  • Can we ignore the S_AXI_TREADY (backpressure) signal on the write side?
    This is a tricky question. If the memory interface is overloaded, TREADY is deasserted, and a source that ignores it will lose data. I therefore strongly encourage you not to ignore this port in your design. TREADY may also be deasserted for a few clock cycles when the video sink switches to another video buffer. You may be able to guarantee that the interface will never be overloaded and that each video frame has a reasonable VSYNC (vertical blanking) time. In that case, TREADY can generally be ignored, at your own risk.
  • What are the maximum and minimum video refresh rates?
    There is no minimum or maximum refresh rate; sending one frame per year is still a valid scenario. The maximum refresh rate is bounded by the speed of the memory interface and the video resolution, not by the IP itself.
  • What are the color mappings on S_AXI_TDATA / M_AXI_TDATA?
    The IP is agnostic to the mapping. The user is free to choose which color components map to which TDATA bit positions.
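The bandwidth check recommended above for multi-channel designs can be estimated with simple arithmetic. Each channel writes every input frame to DDR and reads every output frame back, so its traffic is the frame size multiplied by the sum of the write and read frame rates. The Python sketch below derates the peak DDR bandwidth by the 80% efficiency figure recommended later in this article. The memory configuration (64-bit DDR4-2400) and the 4 bytes per stored pixel are hypothetical example values. The model also ignores refresh, page misses and AUX-channel traffic.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class VideoChannel:
    width: int              # active pixels per line
    height: int             # active lines per frame
    write_fps: float        # frames per second written into DDR (input side)
    read_fps: float         # frames per second read from DDR (output side)
    bytes_per_pixel: float  # bytes each pixel occupies in DDR after packing

    def bytes_per_second(self) -> float:
        frame_bytes = self.width * self.height * self.bytes_per_pixel
        return frame_bytes * (self.write_fps + self.read_fps)


def usable_ddr_bandwidth(data_rate_mtps: float, bus_width_bits: int,
                         efficiency: float = 0.8) -> float:
    """Peak DDR bandwidth in bytes/s, derated by the expected efficiency."""
    peak = data_rate_mtps * 1e6 * bus_width_bits / 8
    return peak * efficiency


def fits_budget(channels: Iterable[VideoChannel], budget: float) -> bool:
    """True if the summed channel traffic stays within the budget."""
    return sum(ch.bytes_per_second() for ch in channels) <= budget


def max_identical_channels(channel: VideoChannel, budget: float) -> int:
    """How many copies of one channel configuration fit into the budget."""
    return int(budget // channel.bytes_per_second())
```

With these example values, one 1080p60 channel (60 fps in and 60 fps out) needs about 1.0 GB/s. The derated budget of 15.36 GB/s then fits 15 such channels, but not 16.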

NOTE: The IP’s resource estimates and FMAX timing performance were evaluated with Vivado 2024.2. The target was either the xcku5p-ffvb676-2-e device (KCU116, Kintex UltraScale+) or the xcu50-fsvh2104-2-e device (Alveo U50, Virtex UltraScale+). All evaluations used the default tool settings for synthesis, optimization, and place & route. Because the IP has so many possible configuration settings, only a selected subset was evaluated:

  • 128-bit UI: Packing of 4 Pixels / Word 
  • 256-bit UI: Packing of 8 Pixels / Word
  • 512-bit UI: Packing of 16 Pixels / Word

All evaluated configurations use the default pixel size of 24 bits, with all other settings kept at their default values.
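The packing ratios above (4, 8 and 16 pixels per 128-, 256- and 512-bit word) work out to 32 bits of UI word per 24-bit pixel. The IP's actual bit layout is not documented here. The sketch below assumes each pixel sits in the low bits of its own 32-bit slot, with pixel 0 in the least significant slot. Under that assumption, packing and unpacking could be modeled as follows. The function names are hypothetical.

```python
from typing import List, Sequence


def _slot_bits(pixel_bits: int, ui_width_bits: int, pixels_per_word: int) -> int:
    """Width of the slot each pixel occupies inside one UI word."""
    slot = ui_width_bits // pixels_per_word
    if slot < pixel_bits:
        raise ValueError("pixels do not fit into the UI word")
    return slot


def pack_pixels(pixels: Sequence[int], pixel_bits: int,
                ui_width_bits: int, pixels_per_word: int) -> int:
    """Pack pixels into one UI word, pixel 0 in the least significant slot."""
    if len(pixels) != pixels_per_word:
        raise ValueError("expected exactly pixels_per_word pixels")
    slot = _slot_bits(pixel_bits, ui_width_bits, pixels_per_word)
    mask = (1 << pixel_bits) - 1
    word = 0
    for i, px in enumerate(pixels):
        if px & ~mask:  # also rejects negative values
            raise ValueError(f"pixel {i} does not fit in {pixel_bits} bits")
        word |= px << (i * slot)
    return word


def unpack_pixels(word: int, pixel_bits: int,
                  ui_width_bits: int, pixels_per_word: int) -> List[int]:
    """Inverse of pack_pixels: extract each pixel from its slot."""
    slot = _slot_bits(pixel_bits, ui_width_bits, pixels_per_word)
    mask = (1 << pixel_bits) - 1
    return [(word >> (i * slot)) & mask for i in range(pixels_per_word)]


def payload_efficiency(pixel_bits: int, ui_width_bits: int,
                       pixels_per_word: int) -> float:
    """Fraction of UI data bits that carry pixel data."""
    return pixel_bits * pixels_per_word / ui_width_bits
```

Under this assumption, 75% of each UI word carries pixel data in all three evaluated configurations. This matters when estimating how much DDR bandwidth the video traffic really consumes.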

Resources (128-bit UI, 4 pixels / word)
Channels  LUTs [K]  FFs [K]  BRAM36  DSP48  FMAX [MHz]
1         2.5       4.6      3       0      450
2         4.2       7.9      6       0      450
3         6.2       11.1     9       0      450
4         7.9       14.3     12      0      450
5         9.5       17.6     15      0      450
6         11.1      20.7     18      0      450
7         12.6      24       21      0      450
8         14.2      27.1     24      0      450

 

Resources (256-bit UI, 8 pixels / word)
Channels  LUTs [K]  FFs [K]  BRAM36  DSP48  FMAX [MHz]
1         3.4       6.8      6       0      400
2         5.7       11.7     12      0      400
3         8         16.5     18      0      400
4         10.8      21.3     24      0      400
5         13.3      26.2     30      0      400
6         15.4      31       36      0      400
7         19        35.8     42      0      400
8         21.4      40.6     48      0      400

 

Resources (512-bit UI, 16 pixels / word)
Channels  LUTs [K]  FFs [K]  BRAM36  DSP48  FMAX [MHz]
1         4.5       11.2     12      0      350
2         8.5       19.3     24      0      350
3         12        27.4     36      0      350
4         16        35.4     48      0      350
5         17.8      43.4     60      0      350
6         20.8      51.5     72      0      350
7         24        59.5     84      0      350
8         26.5      67.5     96      0      350

 

The IP is intended primarily for use with the Video Mixer IP and/or other video processing pipelines, such as video scalers and video croppers. However, any application that involves “data framing” is suitable. To meet high performance requirements (especially in terms of memory bandwidth), the IP supports arbitrary UI data and address widths as well as arbitrary pixel sizes. The S_AXIS and M_AXIS ports support external asynchronous clocks and process up to 1 pixel per clock cycle.
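Because the UI data width, address width and pixel size are all configurable, the memory footprint of the video buffers follows from a few parameters. The sketch below does three things. It computes the frame size when each line is padded to whole UI words, gives each channel/buffer pair a static, aligned base address, and reports the byte-address width needed to cover them all. The contiguous layout, the 4096-byte alignment and the function names are hypothetical. The results are plain byte addresses, not the UI address format.

```python
from typing import List


def frame_bytes(width: int, height: int, pixels_per_word: int,
                ui_width_bits: int) -> int:
    """Bytes per frame when every line is padded to whole UI words."""
    words_per_line = -(-width // pixels_per_word)  # ceiling division
    return words_per_line * height * (ui_width_bits // 8)


def buffer_base_addresses(num_channels: int, num_buffers: int,
                          frame_size: int, base: int = 0,
                          align: int = 4096) -> List[List[int]]:
    """Static, non-overlapping base address for each (channel, buffer) pair."""
    stride = -(-frame_size // align) * align  # round frame size up to alignment
    return [[base + (ch * num_buffers + buf) * stride
             for buf in range(num_buffers)]
            for ch in range(num_channels)]


def required_address_bits(total_bytes: int) -> int:
    """Smallest byte-address width that can reach total_bytes locations."""
    return max(1, (total_bytes - 1).bit_length())
```

For example, a 1920×1080 frame packed 4 pixels per 128-bit word takes 8,294,400 bytes. Two channels with three buffers each then span 49,766,400 bytes, which needs 26 address bits.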

To achieve the highest performance, read and write accesses to the memory interface are interleaved. All channels also follow a fair-use bandwidth distribution policy, i.e., no video channel has priority over the others when accessing the memory. A custom burst length further helps reach the highest achievable memory interface efficiency. For practical reasons, however, I strongly advise against expecting the DDR memory interface to exceed 80% efficiency.
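The interleaving and fair-use policy described above can be illustrated with a simple scheduler model. Grants alternate between write and read bursts whenever both have pending work. Within each direction, the channels are served round-robin, starting after the channel that was served last. This is a hypothetical sketch, not the IP's actual arbiter. The function name, the one-burst grant granularity and the strict alternation are assumptions.

```python
from typing import List, Sequence, Tuple


def schedule_bursts(write_requests: Sequence[int],
                    read_requests: Sequence[int]) -> List[Tuple[str, int]]:
    """Return the grant order as (direction, channel) tuples.

    write_requests / read_requests hold the number of pending bursts per
    channel. Directions alternate while both have work; within a direction,
    channels are served round-robin.
    """
    if len(write_requests) != len(read_requests):
        raise ValueError("write and read request lists must have equal length")
    if any(n < 0 for n in list(write_requests) + list(read_requests)):
        raise ValueError("request counts must be non-negative")
    pending = {"W": list(write_requests), "R": list(read_requests)}
    next_channel = {"W": 0, "R": 0}  # round-robin pointer per direction
    num_channels = len(write_requests)
    grants: List[Tuple[str, int]] = []
    direction = "W"

    while any(pending["W"]) or any(pending["R"]):
        if not any(pending[direction]):
            # Nothing pending in this direction: give the slot to the other one.
            direction = "R" if direction == "W" else "W"
        queue = pending[direction]
        for offset in range(num_channels):
            ch = (next_channel[direction] + offset) % num_channels
            if queue[ch] > 0:
                queue[ch] -= 1
                grants.append((direction, ch))
                next_channel[direction] = (ch + 1) % num_channels
                break
        # Interleave: the other direction gets the next opportunity.
        direction = "R" if direction == "W" else "W"

    return grants
```

Suppose channel 0 has two write bursts pending and channel 1 has one, and each channel has one read burst pending. The grant order is then W0, R0, W1, R1, W0, so no channel waits behind another channel's backlog.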

For more information, see the Video Mixer IP demo, which uses this IP on the KCU116 development platform.
Detailed documentation is available upon request; please contact me for details.

Vojtech Ters

FPGA Engineer, IrisCores.com
