CHAPTER 6
Conclusions
In this book, we have summarized and organized research in cache replacement by defining a
new taxonomy of cache replacement policies. While we have not discussed every policy in the
literature, we hope that our taxonomy outlines strategic shifts in the design of cache replacement
solutions. For example, we have shown that while the first three or four decades of research in
cache replacement focused largely on Coarse-Grained policies, the last decade has seen a notable
shift toward Fine-Grained policies.
Cache Replacement Championship 2017 This trend toward Fine-Grained policies is apparent
from the results of the 2017 Cache Replacement Championship (CRC), which provides a
useful snapshot of the current state of the art.
The CRC is conducted every few years to compare state-of-the-art policies within a uni-
form evaluation framework. The most recent CRC was held in 2017 in conjunction with ISCA,
where submissions were compared on four configurations: (1) a single-core system without a
prefetcher, (2) a single-core system with L1/L2 prefetchers, (3) a four-core system without a
prefetcher, and (4) a four-core system with L1/L2 prefetchers. Evaluation was done using repre-
sentative regions of SPEC 2006 benchmarks, and simulations were run for 1 billion instructions
after warming up the caches for 200 million instructions.
Figures 6.1 and 6.2 summarize the performance of the top-three policies for each con-
figuration. For the two single-core configurations, we include the performance of MIN.[2]
The Hawkeye policy [Jain and Lin, 2016] won the CRC in 2017. However, a more detailed
analysis of the CRC results points to some interesting trends. First, while the difference
among individual submissions (including those not shown) is small, cache replacement poli-
cies have improved significantly, as the top three solutions show impressive improvements over
SHiP, which was proposed in 2011. Second, the benefit of intelligent cache replacement is
more pronounced on multi-core systems than single-core systems. On the 4-core configuration,
the winning solution improves performance by 9.6% in the absence of prefetching (vs. 5% for
single-core configuration) and by 7.2% in the presence of prefetching (vs. 2.5% for single-core
configuration).
[1] LIME is a version of Hawkeye produced by a group other than the original authors.
[2] MIN is provably optimal in the absence of prefetching (first configuration), but not in the presence of prefetching
(second configuration). Since MIN requires knowledge of the future, we run the simulation twice to estimate MIN's speedup.
The second simulation uses MIN's caching decisions from the first simulation. It is difficult to simulate MIN on multi-core
configurations because applications can interleave non-deterministically across different runs.
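Belady's MIN itself is simple to state: on a miss, evict the resident block whose next use lies farthest in the future. The following single-set sketch illustrates the offline computation described above (the function name and trace representation are ours, not part of the championship infrastructure):

```python
def belady_min_hits(trace, num_ways):
    """Count hits for Belady's MIN on one cache set.

    trace: the full sequence of block addresses, known in advance
    (MIN requires knowledge of the future, hence the two-pass
    simulation methodology described in the footnote above).
    """
    # Backward pass: next_use[i] = index of the next access to trace[i],
    # or infinity if the block is never accessed again.
    next_use = [float('inf')] * len(trace)
    last = {}
    for i in range(len(trace) - 1, -1, -1):
        next_use[i] = last.get(trace[i], float('inf'))
        last[trace[i]] = i

    cache = {}  # block address -> index of its next use
    hits = 0
    for i, addr in enumerate(trace):
        if addr in cache:
            hits += 1
        elif len(cache) == num_ways:
            # Evict the block whose next use is farthest in the future.
            victim = max(cache, key=cache.get)
            del cache[victim]
        cache[addr] = next_use[i]  # refresh the block's next-use time
    return hits
```

For the trace a, b, c, a, b, d, a with two ways, MIN keeps `a` resident across the whole trace and scores two hits, whereas LRU scores none.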
Figure 6.1: Cache replacement championship 2017 single-core results. IPC speedup over LRU (%):
  No prefetcher:          SHiP (baseline) 3.1 | MPPPB 4.4 | LIME[1] 4.7 | SHiP++ 5.0 | MIN 6.6
  With L1/L2 prefetchers: SHiP (baseline) 0.6 | Hawkeye 2.2 | MPPPB 2.2 | SHiP++ 2.5 | MIN 4.3
Figure 6.2: Cache replacement championship 2017 multi-core results. IPC speedup over LRU (%):
  No prefetcher:          SHiP (baseline) 6.9 | MPPPB 8.2 | ReD 9.0 | Hawkeye 9.6
  With L1/L2 prefetchers: SHiP (baseline) 2.9 | SHiP++ 6.4 | ReD 6.5 | Hawkeye 7.2
Trends and Challenges Our comprehensive look at cache replacement policies reveals some
clear trends, which point to interesting opportunities for future research.
First, Fine-Grained policies, particularly Classification-based policies, have been shown to
outperform traditional Coarse-Grained policies. The most recent Cache Replacement Champi-
onship publicized the top three finishers among its 15 submissions, and all three used Fine-
Grained policies. Of course, Fine-Grained policies leverage aging mechanisms from Coarse-
Grained policies, so one avenue of future work would be to explore aging schemes customized
for Fine-Grained policies.
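As a concrete illustration of the borrowed mechanism, the sketch below shows RRIP-style aging, a Coarse-Grained scheme that many Fine-Grained policies reuse as a fallback when their predictions are wrong (the function name is ours, and the RRPV width is an illustrative choice):

```python
def choose_victim(rrpvs, max_rrpv=7):
    """RRIP-style victim selection for one cache set.

    rrpvs: per-way re-reference prediction values (RRPVs), aged in place.
    A Fine-Grained policy might insert predicted-friendly lines at RRPV 0
    and predicted-averse lines at max_rrpv; this aging loop then evicts
    mispredicted lines once their RRPVs saturate.
    """
    while True:
        for way, value in enumerate(rrpvs):
            if value == max_rrpv:
                return way  # first saturated way is the victim
        # No line has saturated yet: age every line and rescan.
        rrpvs[:] = [value + 1 for value in rrpvs]
```

For example, a set with RRPVs [2, 5, 3] ages twice before way 1 saturates and is evicted; tailoring how quickly such aging overrides a confident prediction is exactly the kind of customization suggested above.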
Second, Fine-Grained policies learn from past behavior, but the key impediment to their
performance is their prediction accuracy and their ability to handle inaccurate predictions. Im-
provements in prediction accuracy are needed to bridge the gap between practical policies and
Belady's MIN. As we discuss in Section 4.2.3, state-of-the-art Fine-Grained policies now view
cache replacement as a supervised learning problem, which opens up the possibility of applying
machine learning to cache replacement.
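In the supervised-learning view, each cache access yields a training example: the load's PC is the feature, and whether the line was ultimately reused is the label. The sketch below captures this with a table of saturating counters, in the spirit of PC-based classifiers such as SHiP and Hawkeye (the table size, counter width, and class name are illustrative assumptions, not the published designs):

```python
class PCClassifier:
    """Sketch of a PC-indexed binary reuse predictor.

    Lines loaded by a PC whose counter is high are predicted
    cache-friendly; lines from a low-counter PC are predicted
    cache-averse and become preferred eviction candidates.
    """

    def __init__(self, table_bits=11, counter_max=7):
        self.size = 1 << table_bits
        self.counter_max = counter_max
        # Start every counter at the midpoint (no bias either way).
        self.table = [counter_max // 2] * self.size

    def _index(self, pc):
        return hash(pc) % self.size  # simple hash; real designs fold PC bits

    def train(self, pc, was_reused):
        """Nudge the PC's counter toward friendly or averse."""
        i = self._index(pc)
        if was_reused:
            self.table[i] = min(self.table[i] + 1, self.counter_max)
        else:
            self.table[i] = max(self.table[i] - 1, 0)

    def predict_friendly(self, pc):
        return self.table[self._index(pc)] > self.counter_max // 2
```

The training label is what distinguishes policies: SHiP-like designs learn from observed hits and evictions, while Hawkeye-like designs label examples by replaying MIN's decisions on past accesses.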
Third, as seen in Figure 6.1, there remains a considerable gap between state-of-the-art
replacement policies and MIN even for single-core configurations, which suggests that there is
still room to improve cache replacement for both single-core and multi-core configurations.
Fourth, recent advances have blurred the line between dead block predictors and Fine-
Grained cache replacement policies, and we show that even though traditional dead block pre-
dictors were designed for a different goal, they fit well in our cache replacement taxonomy.
Finally, there is room to improve cache replacement policies by considering richer factors,
including (1) cache replacement policies that cooperate with other components in the system,
such as the prefetcher, (2) policies that use performance goals that go beyond improved cache
hit rates, and (3) policies that take into account the cache architecture and new memory tech-
nologies. Unfortunately, accounting for richer factors within a cache replacement policy remains
a challenging problem because of the significantly larger design spaces and because of the lack
of efficient optimal algorithms to guide the design process. Nevertheless, with the low-hanging
fruit already plucked, these avenues remain promising for future cache replacement research.