FFT IP

by | Sep 7, 2024

Overview

The FFT IP implements, by default, the Cooley-Tukey Decimation in Frequency (DIF) FFT algorithm, an efficient and performance-optimized implementation of the DFT (Discrete Fourier Transform), for transform sizes of up to 65536 samples. The core supports both fixed-point and single-precision floating-point arithmetic; floating-point math, however, requires external IPs. Both forward and inverse transforms are supported. The core can also be reconfigured at runtime and provides various status signals, including invalid FFT packet detection and math overflow. The IP supports both Base-Radix and Pipelined architectures. The Base-Radix architecture computes one FFT butterfly (Radix-2, Radix-4, or Radix-8) per clock cycle. The Pipelined architecture computes at the same rate in all FFT stages in parallel (i.e., it is fully pipelined, at the cost of extra resources). Mixed-radix transforms are also supported. The core has a standardized AXI4-Lite status and control interface with an optional CDC (clock domain crossing), so the control and processing clock domains can be asynchronous.
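
As a rough software illustration of the radix-2 DIF decomposition named above, the Python sketch below computes a transform and reorders the result into natural order. It is only a floating-point reference model, not the IP's RTL. The function names (`bit_reverse`, `fft_dif_radix2`) and the 1/N scaling of the inverse transform are assumptions made for this example.

```python
import cmath


def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def fft_dif_radix2(samples, inverse=False):
    """Radix-2 decimation-in-frequency FFT (software reference model only).

    Assumption for this sketch: the inverse transform is scaled by 1/N.
    """
    x = [complex(s) for s in samples]
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")

    sign = 1 if inverse else -1
    half = n // 2
    while half >= 1:                      # one loop iteration per FFT stage
        step = 2 * half
        for start in range(0, n, step):
            for k in range(half):
                w = cmath.exp(sign * 2j * cmath.pi * k / step)  # twiddle factor
                a = x[start + k]
                b = x[start + k + half]
                x[start + k] = a + b              # DIF butterfly: sum
                x[start + k + half] = (a - b) * w  # difference, then twiddle
        half //= 2

    # DIF leaves the result in bit-reversed order; restore natural order.
    bits = n.bit_length() - 1
    out = [0j] * n
    for i in range(n):
        out[bit_reverse(i, bits)] = x[i]

    if inverse:
        out = [v / n for v in out]
    return out


if __name__ == "__main__":
    spectrum = fft_dif_radix2([1, 2, 3, 4, 0, -1, -2, 0.5])
    restored = fft_dif_radix2(spectrum, inverse=True)
    print([round(v.real, 6) for v in restored])
```

Each pass of the outer `while` loop corresponds to one FFT stage, which is the unit a Base-2 configuration would iterate over.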

The Base-Radix IP configuration (R2, R4, or R8) implements a single FFT stage. To process the required transform, the stage is looped multiple times. The number of FFT stages is equal to ceil(log_Radix(FFT size)); e.g., for a 4096-point transform and a base radix of 4, 6 stages need to be processed. The number of stages required to compute the transform directly determines the core's latency. That is why a higher base radix reduces latency and increases the overall performance of the core. The difference between the Base-R and pipelined architectures is that, in the pipelined architecture, multiple transforms are processed in parallel. Both architectures still have the same latency, provided the base radix is the same. The core also supports both Decimation in Frequency (DIF) and Decimation in Time (DIT). There is, however, no benefit to using DIT over DIF: the results are equal in both cases, except for small rounding differences caused by the different order of the arithmetic operations.
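
The stage count formula above can be checked with a few lines of Python. This is a minimal sketch; the function name `fft_stages` is hypothetical and is not part of the IP. It uses integer arithmetic rather than a floating-point logarithm so that exact powers of the radix are not affected by rounding.

```python
def fft_stages(fft_size, radix):
    """Return ceil(log_radix(fft_size)) using integer arithmetic only."""
    if radix not in (2, 4, 8):
        raise ValueError("radix must be 2, 4 or 8")
    if fft_size < 2:
        raise ValueError("fft_size must be at least 2")
    stages = 0
    span = 1
    while span < fft_size:   # grow radix**stages until it covers fft_size
        span *= radix
        stages += 1
    return stages


if __name__ == "__main__":
    # 4096-point transform, base radix 4: the example from the text (6 stages).
    for radix in (2, 4, 8):
        print(f"4096-point, radix {radix}: {fft_stages(4096, radix)} stages")
```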

Features

  • FFT sizes from 2 to 65536
  • Dynamic reconfiguration
  • AXI-Stream & AXI4-Lite Compliant
  • Base-2 / Base-4 / Base-8 / Pipelined architectures
  • Mixed Radix (2 / 4 / 8)
  • Fixed-point and floating-point arithmetic
  • Variable data and twiddle factor widths
  • Natural output order by default
  • Both forward and inverse transforms supported
  • Full RTL-based VHDL-2008 source code without 3rd-party IPs and/or vendor dependencies

FAQ

  • How is the floating point math supported?
    When the IP is configured to implement floating-point math, it expects specific 3rd-party components to be available during synthesis. There should be 3 components: floating-point addition, subtraction, and multiplication. The exact interface description is given in the documentation, which is currently available on request.
  • What are the benefits of using higher-order radices?
    For the Base-R implementation, choosing a higher-order radix always leads to a performance boost at the cost of extra resource usage, because fewer stages are needed to compute the transform. It is therefore possible to balance performance against resource usage.
  • How does the base radix choice affect the pipelined version? 
    For the pipelined architecture, the base radix should always be set to 2. Higher-order radices are supported, but they bring no extra benefit: the core can receive and transmit only one sample at a time, and the default Base-2 pipelined architecture already keeps up with that rate.
  • Is the AXI4-Lite interface required?
    The core can be set to process a predefined transform at instantiation, so the AXI4-Lite interface is generally not required. It does, however, add features that can help when using the IP, such as math overflow status and invalid packet detection.
  • Are there any benefits of using the DIT over DIF?
    No, there are no benefits; it is recommended to keep the default DIF algorithm. DIT is nevertheless supported because upcoming releases of other IPs will use the DIT algorithm.
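
To put rough numbers on the radix trade-off for the Base-R architecture, the sketch below counts butterflies, assuming one butterfly per clock cycle as stated in the overview. It is a first-order estimate only: the function name `base_r_butterfly_cycles` is hypothetical, pipeline fill, memory access and I/O overhead are ignored, and the figures are not measurements of the core. It only handles sizes that are an exact power of the radix.

```python
def base_r_butterfly_cycles(fft_size, radix):
    """First-order cycle estimate: (N / R) butterflies per stage * log_R(N) stages.

    Assumption: one radix-R butterfly per clock cycle, no other overhead.
    """
    if radix not in (2, 4, 8):
        raise ValueError("radix must be 2, 4 or 8")
    if fft_size < 2:
        raise ValueError("fft_size must be at least 2")
    stages = 0
    span = 1
    while span < fft_size:
        span *= radix
        stages += 1
    if span != fft_size:
        # Mixed-radix sizes are not modelled in this sketch.
        raise ValueError("fft_size must be an exact power of radix")
    return (fft_size // radix) * stages


if __name__ == "__main__":
    for radix in (2, 4, 8):
        cycles = base_r_butterfly_cycles(4096, radix)
        print(f"4096-point, radix {radix}: about {cycles} butterfly cycles")
```

Under these assumptions a 4096-point transform needs 24576 butterfly cycles at radix 2, 6144 at radix 4, and 2048 at radix 8. The larger butterfly at each step is where the extra resource usage comes from.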

Please contact me for more details on documentation and additional features. The core can be used as a stand-alone IP inside a DSP processing chain or as an offload accelerator, in which case it could, for example, be used together with a DMA component. The DMA is also available in the portfolio. The image above is taken from an FFT demonstration application, where the core works in the "Accelerator" configuration along with the DMA, as shown below:

I initially started developing the FFT algorithm at the Czech Technical University in Prague, CZ, where I implemented it in Matlab and NVIDIA CUDA. Later on, I ported it to C++ (not officially released, though), and now I can proudly present a version intended for ASIC/FPGA platforms. If you are looking for an FFT expert, look no further ☄️!

I always guarantee feature-rich, bug-free functionality with the highest performance. For more information, such as latency/performance estimates and/or resource usage, please don't hesitate to contact me.

Vojtech Ters

IrisCores.com

Contact Form