Abstract
FPGAs have emerged as a promising platform for implementing neural networks due to their reconfigurability, parallelism, and low power consumption. Nonetheless, designing and optimizing FPGA-based neural network accelerators with register transfer level (RTL) languages is a complex and time-consuming task. High-level synthesis (HLS) tools provide a higher level of abstraction for FPGA design, enabling designers to concentrate on top-level design aspects, such as algorithms, rather than low-level hardware implementation details. The you only look once (YOLO) series is among the state-of-the-art object detection networks; it is built from several neural network techniques, including cross-stage connections and feature extraction structures such as feature pyramid networks. In this paper, we propose a method for implementing the YOLOv7-tiny network on FPGAs using HLS tools. We present a comprehensive analysis of the performance and resource utilization of FPGA-based neural network accelerators. Our method meets real-time application requirements such as latency. Specifically, our design reduces the usage of digital signal processing (DSP) units by 90% and saves up to 60% of flip-flops compared to state-of-the-art designs, while achieving competitive block RAM and look-up table usage. Additionally, the achieved design latency of 15 ms is well suited for real-time applications. We also propose a method for BRAM utilization and off-chip memory access.