Candidate: Harry Chan Chan
Title: Implementing FPGA-optimized Systolic Arrays using 2D Knapsack and Evolutionary Algorithms
Date: November 26, 2021
Time: 18:00
Place: MS Teams
Supervisor(s): Kapre, Nachiket
Abstract:
Underutilization of FPGA resources is a significant challenge in the deployment of FPGAs as neural network accelerators. We propose an FPGA-optimized systolic array architecture that improves CNN inference throughput by orders of magnitude through parallelism-aware partitioning of on-chip resources. We fracture the FPGA into multiple square systolic arrays and formulate the placement of these arrays as a 2D knapsack problem. We simulate the cycle counts needed for each layer of the neural network at different systolic array sizes using SCALESim, and we generate physical implementations and operating frequencies for systolic arrays placed at uniformly staggered locations on the Xilinx VU37P and VU9P UltraScale+ platforms. We feed this information to an optimizer that couples the CMA-ES evolutionary algorithm with a simple 2D knapsack solver to discover packable and routable partitioned designs targeting high frequency. In our experiments, the most significant performance improvement comes from layers with large kernel sizes. We demonstrate that an inference throughput gain of 7-22.7$\times$ is possible at the cost of 1.2-7.6$\times$ higher latency. Our optimization tool can achieve up to ~8$\times$ higher throughput gain on eight MLPerf benchmark network topologies. Our tool also generates designs across various latency and throughput combinations, providing a wide range of design choices.
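To make the packing step concrete, the following is a minimal sketch of a packability check for square systolic arrays on a rectangular resource grid. It is not the solver used in the thesis: it swaps in a simple greedy shelf heuristic, and the function names (`shelf_pack`, `utilization`), the grid dimensions, and the unit of size (abstract grid cells) are all assumptions made for illustration. In a flow like the one described above, an outer search such as CMA-ES would propose the list of array sizes, and a check of this kind would report which arrays fit.

```python
from typing import List, Tuple

Placement = Tuple[int, int, int]  # (x, y, side) in abstract grid cells


def shelf_pack(sizes: List[int], width: int, height: int) -> List[Placement]:
    """Greedy shelf packing of squares into a width x height grid.

    Illustrative heuristic only (hypothetical helper, not the thesis solver).
    Squares are placed left to right on horizontal shelves, largest first;
    squares that do not fit are skipped.
    """
    placed: List[Placement] = []
    x = y = shelf_h = 0
    for s in sorted(sizes, reverse=True):
        if s > width:
            continue  # wider than the grid, can never fit
        if x + s > width:
            # Current shelf is full: open a new shelf above it.
            y += shelf_h
            x = 0
            shelf_h = 0
        if y + s > height:
            continue  # no vertical room for this square; try smaller ones
        placed.append((x, y, s))
        x += s
        shelf_h = max(shelf_h, s)
    return placed


def utilization(placed: List[Placement], width: int, height: int) -> float:
    """Fraction of grid cells covered by the placed squares."""
    return sum(s * s for _, _, s in placed) / (width * height)


if __name__ == "__main__":
    # Hypothetical candidate: two 4x4 arrays and three 2x2 arrays on an 8x6 grid.
    layout = shelf_pack([4, 4, 2, 2, 2], width=8, height=6)
    print(layout, utilization(layout, 8, 6))
```

A shelf heuristic is fast enough to call inside an optimization loop, but it can reject size lists that an exact 2D knapsack solver would pack, so it should be read as a lower bound on packability rather than a definitive answer.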
Friday, November 26, 2021, 6:00 pm EST (GMT -05:00)