Abstract
|
This paper presents a comprehensive approach to implementing Convolutional Neural Networks (CNNs) on Field-Programmable Gate Arrays (FPGAs). CNNs have become a cornerstone of numerous fields, enabling breakthroughs in areas such as computer vision, natural language processing, and speech recognition. A CNN comprises multiple layers that perform various computations. We propose a general methodology that uses high-level synthesis (HLS) tools to implement CNNs on FPGAs, and we provide several use cases demonstrating competitive FPGA resource utilization compared with state-of-the-art works. Our experimental results show a reduction of approximately 80% in DSP unit utilization compared with other neural network accelerators deployed on FPGAs. We also achieve a 50% reduction in Look-Up Table (LUT) usage compared with alternative accelerators, along with overall superior performance compared with CPU or GPU implementations.
|