Please note: This master’s thesis presentation will be given online.
Zahra Rezapour Siahgourabi, Master’s candidate
David R. Cheriton School of Computer Science
Supervisor: Professor Ali Mashtizadeh
Poor data locality continues to be a performance bottleneck in today’s popular applications. The hierarchy of caches existing in modern processors reduces the latency of accessing data compared with going to main memory. However, inefficient cache utilization results in data cache miss overhead. Applications often make frequent accesses to data that lie far apart in memory, which fails to exploit the locality offered by the memory hierarchy. One approach to boosting an application’s performance is to reorder structure fields so that they use the cache efficiently. Doing so requires extensive program-wide information about the access frequencies and access patterns of data.
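To make the idea of structure reordering concrete, the following is a minimal sketch that uses Python's `ctypes` to model two C structure layouts. The structure, its field names, the choice of "hot" fields, and the 64-byte cache line size are all assumptions made for illustration; none of them come from the thesis.

```python
import ctypes

CACHE_LINE = 64  # assumed cache line size in bytes

# Hypothetical set of frequently accessed ("hot") fields.
HOT_FIELDS = ("key", "hits")


class Original(ctypes.Structure):
    # The two hot fields are separated by a large, rarely used buffer.
    _fields_ = [
        ("key", ctypes.c_uint64),             # hot
        ("debug_name", ctypes.c_char * 120),  # cold
        ("hits", ctypes.c_uint64),            # hot
        ("created_at", ctypes.c_uint64),      # cold
    ]


class Reordered(ctypes.Structure):
    # Same fields, with the hot ones placed next to each other.
    _fields_ = [
        ("key", ctypes.c_uint64),             # hot
        ("hits", ctypes.c_uint64),            # hot
        ("created_at", ctypes.c_uint64),      # cold
        ("debug_name", ctypes.c_char * 120),  # cold
    ]


def lines_touched(struct_type, hot_fields):
    """Return the cache-line indices covered by the hot fields.

    Assumes each instance starts on a cache-line boundary.
    """
    lines = set()
    for name in hot_fields:
        field = getattr(struct_type, name)
        first = field.offset // CACHE_LINE
        last = (field.offset + field.size - 1) // CACHE_LINE
        lines.update(range(first, last + 1))
    return lines


if __name__ == "__main__":
    for layout in (Original, Reordered):
        touched = lines_touched(layout, HOT_FIELDS)
        print(layout.__name__, ctypes.sizeof(layout), sorted(touched))
```

Under these assumptions both layouts have the same size, but the hot fields span two cache lines in the original layout and only one after reordering.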
This thesis introduces AutoCPA, which exploits hardware performance monitoring counters to find optimization opportunities in target applications and provides guidance for structure reordering. The system is a low-overhead, easy-to-use toolchain that uses a sampling-based approach to collect and analyze memory traces. It generates a prioritized set of reorderings that can improve cache utilization and locality. Its recommendations for the optimal structure layout are obtained from multiple cache analysis algorithms implemented in AutoCPA. Performance results obtained by running AutoCPA on two widely used applications, Redis and Memcached, illustrate the benefit of the implementation. These results show a general performance improvement, with up to a 10% increase in instructions per cycle (IPC) for Redis commands.
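As a rough illustration of how sampled memory accesses could feed a reordering recommendation, the sketch below maps sampled byte offsets to structure fields and ranks the fields by access count. This is a simple frequency heuristic chosen for illustration, not a description of AutoCPA's analysis algorithms; the layout, the sample values, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical layout: (field name, start offset in bytes, size in bytes).
LAYOUT = [
    ("key", 0, 8),
    ("debug_name", 8, 120),
    ("hits", 128, 8),
    ("created_at", 136, 8),
]

# Hypothetical sampled offsets relative to the start of the structure,
# standing in for addresses collected by a sampling profiler.
SAMPLES = [0, 128, 4, 130, 0, 136, 128, 20]


def field_for_offset(layout, offset):
    """Return the name of the field containing the offset, or None."""
    for name, start, size in layout:
        if start <= offset < start + size:
            return name
    return None


def rank_fields(layout, sampled_offsets):
    """Order fields from most to least sampled.

    Ties keep the fields' original relative order.
    """
    counts = Counter()
    for offset in sampled_offsets:
        name = field_for_offset(layout, offset)
        if name is not None:
            counts[name] += 1
    position = {name: i for i, (name, _, _) in enumerate(layout)}
    return sorted(position, key=lambda n: (-counts[n], position[n]))


if __name__ == "__main__":
    print(rank_fields(LAYOUT, SAMPLES))
```

A real tool would also need to consider which fields are accessed together, field alignment, and padding, which this frequency-only ranking ignores.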