Implementing FPGA-optimized Systolic Arrays using 2D Knapsack and Evolutionary Algorithms

Friday, November 26, 2021 6:00 pm - 6:00 pm EST (GMT -05:00)

Candidate: Harry Chan Chan
Title: Implementing FPGA-optimized Systolic Arrays using 2D Knapsack and Evolutionary Algorithms

Date: November 26, 2021
Time: 18:00
Place: MS Teams
Supervisor(s): Kapre, Nachiket

Abstract:
Underutilization of FPGA resources is a significant challenge in the deployment of FPGAs as neural network accelerators.
We propose an FPGA-optimized systolic array architecture that improves CNN inference throughput by orders of magnitude through
parallelism-aware partitioning of on-chip resources.
We fracture the FPGA into multiple square systolic arrays and formulate the placement of these arrays as a 2D knapsack problem.
Using SCALE-Sim, we simulate the cycle counts needed for each layer of the neural network across different systolic array sizes, and
we generate physical implementations, and obtain the operating frequencies, of systolic arrays placed at uniformly staggered locations
on Xilinx Virtex UltraScale+ VU37P and VU9P FPGAs.
We feed this information into an optimizer that couples the CMA-ES evolutionary algorithm with a simple 2D knapsack solver to
discover packable, routable partitioned designs that target high operating frequencies.
In our experiments, the largest performance improvements come from layers with large kernel sizes. We
demonstrate that inference throughput gains of 7-22.7× are possible at the cost of a 1.2-7.6× increase in latency.
Our optimization tool achieves throughput gains of up to roughly 8× on eight MLPerf benchmark network topologies. The tool
also generates designs spanning a range of latency and throughput combinations, offering a wide choice of design points.
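The abstract formulates the placement of square systolic arrays on the FPGA as a 2D knapsack problem. As a rough illustration only, the Python sketch below packs candidate squares onto a rectangular grid with a greedy shelf heuristic: candidates are ranked by value per unit area and placed left to right on horizontal shelves. It assumes the fabric can be treated as a uniform grid (real devices have heterogeneous DSP and BRAM columns), and the grid units, the `Candidate` type, its `value` field (standing in for something like a throughput estimate) and `pack_squares` are all hypothetical. A greedy heuristic gives no optimality guarantee, and this is not meant to reproduce the author's solver.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    side: int      # side length of a square systolic array, in hypothetical grid units
    value: float   # hypothetical benefit of placing it, e.g. a throughput estimate


def pack_squares(width, height, candidates):
    """Greedy shelf heuristic for a 2D knapsack of squares (no optimality guarantee).

    Candidates are ranked by value per unit area (larger side first on ties) and
    placed left to right on horizontal shelves. Returns (placements, total_value),
    where each placement is an (x, y, side) tuple for the square's lower-left corner.
    """
    ranked = sorted(candidates, key=lambda c: (-c.value / (c.side * c.side), -c.side))
    placements, total = [], 0.0
    shelf_y = shelf_h = cursor_x = 0
    for c in ranked:
        if c.side > width or c.side > height:
            continue                          # can never fit on this grid
        if cursor_x + c.side > width:         # current shelf is full: open a new one
            shelf_y, shelf_h, cursor_x = shelf_y + shelf_h, 0, 0
        if shelf_y + c.side > height:
            continue                          # not enough vertical room left: skip item
        placements.append((cursor_x, shelf_y, c.side))
        total += c.value
        cursor_x += c.side
        shelf_h = max(shelf_h, c.side)
    return placements, total


if __name__ == "__main__":
    cands = [Candidate(2, 4.0)] * 4 + [Candidate(4, 10.0)]
    print(pack_squares(4, 4, cands))
```

For example, on a 4×4 grid with four 2×2 candidates worth 4.0 each and one 4×4 candidate worth 10.0, the four small arrays rank first (1.0 per unit area versus 0.625) and fill the grid at (0, 0), (2, 0), (0, 2) and (2, 2) for a total value of 16.0, leaving no room for the large array.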
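The optimizer described in the abstract couples CMA-ES with a 2D knapsack solver. CMA-ES itself (covariance matrix adaptation) is too involved for a short example, so the sketch below swaps in a plain (1+λ) evolution strategy with a fixed mutation step purely to show the coupling: each real-valued genome is decoded into a count of arrays per size, a shelf-packing check rejects designs that do not fit, and feasible designs are scored. The array sizes, per-array throughput numbers, grid dimensions and every function name here are hypothetical placeholders, and the toy fitness ignores the operating-frequency and routability targets the actual tool considers.

```python
import random

# Hypothetical inputs: candidate square-array sides (grid units), an assumed
# throughput contribution per array, and the size of a uniform resource grid.
SIDES = [2, 4, 8]
THROUGHPUT = {2: 1.0, 4: 3.5, 8: 12.0}
GRID_W, GRID_H = 16, 16


def decode(genome):
    """Map a real-valued genome to a non-negative integer count per array size."""
    return [max(0, round(g)) for g in genome]


def shelf_fits(width, height, sides):
    """Return True if every square fits when packed tallest-first on shelves."""
    x = y = shelf_h = 0
    for s in sorted(sides, reverse=True):
        if x + s > width:            # current shelf is full: open a new one
            x, y, shelf_h = 0, y + shelf_h, 0
        if s > width or y + s > height:
            return False
        x += s
        shelf_h = max(shelf_h, s)
    return True


def fitness(genome):
    """Total assumed throughput of a design, or -inf if it cannot be packed."""
    counts = decode(genome)
    sides = [s for s, n in zip(SIDES, counts) for _ in range(n)]
    if not shelf_fits(GRID_W, GRID_H, sides):
        return float("-inf")
    return sum(THROUGHPUT[s] for s in sides)


def one_plus_lambda_es(dim, generations=200, lam=8, sigma=1.0, seed=0):
    """(1+lambda) evolution strategy with a fixed step size (stand-in for CMA-ES)."""
    rng = random.Random(seed)
    parent = [0.0] * dim
    best = fitness(parent)
    for _ in range(generations):
        # Sample lambda Gaussian mutations of the current parent.
        children = [[g + rng.gauss(0.0, sigma) for g in parent] for _ in range(lam)]
        score, child = max(((fitness(c), c) for c in children), key=lambda t: t[0])
        if score >= best:            # elitist: keep the parent unless a child is no worse
            parent, best = child, score
    return decode(parent), best


if __name__ == "__main__":
    counts, total = one_plus_lambda_es(len(SIDES))
    print(dict(zip(SIDES, counts)), total)
```

Rejecting unpackable designs with an infinite penalty keeps the search inside the feasible region; a real CMA-ES would additionally adapt the step size and the mutation covariance from the selected samples.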
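The throughput-latency trade-off reported in the abstract can be reasoned about with simple arithmetic. Assuming each smaller array runs its own inference independently and ignoring memory-bandwidth limits (both assumptions, not details from the source), replacing one large array with n smaller ones gives a throughput gain of n·L_big/L_small and a latency ratio of L_small/L_big, where each latency L is the layer cycle count divided by the clock frequency. The function name `partition_tradeoff` and the numbers below are hypothetical.

```python
def partition_tradeoff(big_cycles, big_freq_mhz, small_cycles, small_freq_mhz, num_small):
    """Throughput gain and latency ratio of replacing one large systolic array with
    num_small independent smaller arrays, each processing its own input.

    Cycles divided by a frequency in MHz gives a latency in microseconds.
    """
    big_latency = big_cycles / big_freq_mhz
    small_latency = small_cycles / small_freq_mhz
    throughput_gain = num_small * big_latency / small_latency
    latency_ratio = small_latency / big_latency
    return throughput_gain, latency_ratio


if __name__ == "__main__":
    print(partition_tradeoff(1000, 250, 3000, 300, 16))
```

With these made-up numbers, a large array needing 1000 cycles at 250 MHz (4 µs) is replaced by 16 small arrays each needing 3000 cycles at 300 MHz (10 µs): throughput rises 6.4× while per-inference latency grows 2.5×, the same kind of trade the abstract describes.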
