

Javier Lira, Universitat Politècnica de Catalunya (UPC)

Timothy M. Jones, University of Cambridge

Carlos Molina, Universitat Rovira i Virgili

Antonio González, Intel Barcelona Research Center, Intel Labs – UPC

Dynamic NUCA (D-NUCA) is a flexible cache design that allows data to be mapped into multiple positions within the NUCA cache. It implements data migration to move accessed data closer to the requesting core, thereby decreasing the latency of future accesses to the same data.
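The effect of migration can be illustrated with a minimal sketch. It assumes a gradual promotion policy (one bank closer per hit), a single requesting core, and a hypothetical four-bank bankset; the poster does not specify any of these, and bank swaps and evictions are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class BankSet:
    """One bankset as seen from a single core: bank 0 is the closest."""
    num_banks: int = 4                             # hypothetical size
    location: dict = field(default_factory=dict)   # address -> bank index

    def access(self, addr: int) -> int:
        """Return the bank that served the access, then promote the line."""
        # Assumption: new data is first placed in the farthest bank.
        bank = self.location.setdefault(addr, self.num_banks - 1)
        if bank > 0:
            # Assumed policy: move one bank closer on every access.
            self.location[addr] = bank - 1
        return bank


bs = BankSet()
hits = [bs.access(0x40) for _ in range(5)]
print(hits)  # [3, 2, 1, 0, 0]
```

Under these assumptions, repeated accesses to the same line are served from progressively closer banks until the line settles in the closest one.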

[Figure: S-NUCA vs. D-NUCA]

The Migration Prefetcher

Although existing migration policies succeed in concentrating the most accessed data in the optimal banks, our experiments show that about 50% of hits in the NUCA cache are still resolved in non-optimal banks.

The migration prefetcher uses address correlation to anticipate data migrations in the NUCA cache.
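As a rough illustration of address correlation, the sketch below remembers which address followed each address the last time it was seen, and uses that as the prediction on the next access. The class name, the unbounded dictionary, and the single-successor policy are assumptions made for illustration, not the authors' design.

```python
class CorrelationPrefetcher:
    """Remembers, for each address, the address that followed it last time."""

    def __init__(self):
        self.next_addr = {}   # address -> address observed right after it
        self.last = None      # previously accessed address

    def access(self, addr):
        """Record the (last -> addr) pair; return the predicted successor of addr."""
        if self.last is not None:
            self.next_addr[self.last] = addr
        self.last = addr
        # In a D-NUCA, a prediction would be used to start migrating the
        # predicted line toward the requesting core before it is needed.
        return self.next_addr.get(addr)


pf = CorrelationPrefetcher()
stream = [0xA0, 0xB0, 0xC0, 0xA0, 0xB0]
predictions = [pf.access(a) for a in stream]
print(predictions)  # [None, None, None, 176, 192]
```

The first pass over the stream only trains the table; once the pattern repeats, each access yields the address that followed it previously.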

We evaluate a perfect approach and a realistic implementation. The main challenges are the data lookup algorithm, prefetcher accuracy, and the size of the NAT.

The realistic approach implements:

1) Access: Local + Last Responder.
2) Accuracy: 1 confidence bit.
3) NAT size: 12 addressable bits (29 Kbytes/prefetcher).
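The parameters listed above can be sanity-checked with a short sketch. Only the 12 index bits, the single confidence bit, and the 29 Kbytes figure come from the poster. Everything else is an assumption: the NAT is modeled as a direct-mapped, untagged table of predicted next addresses, lines are taken to be 64 bytes, "Kbytes" is read as 1024 bytes, and the confidence-update policy is one plausible choice among several. The lookup scheme (item 1) is not modeled.

```python
INDEX_BITS = 12                       # from the poster
ENTRIES = 1 << INDEX_BITS             # 4096 entries
TABLE_BYTES = 29 * 1024               # assumes 1 Kbyte = 1024 bytes
bits_per_entry = TABLE_BYTES * 8 / ENTRIES
print(ENTRIES, bits_per_entry)        # 4096 58.0


class NAT:
    """Hypothetical direct-mapped table: index -> [next_addr, confident]."""

    def __init__(self):
        self.entries = [None] * ENTRIES

    @staticmethod
    def index(addr, line_bits=6):     # assumed 64-byte cache lines
        return (addr >> line_bits) & (ENTRIES - 1)

    def update(self, addr, observed_next):
        i = self.index(addr)
        e = self.entries[i]
        if e is None:
            self.entries[i] = [observed_next, False]   # new, not yet trusted
        elif e[0] == observed_next:
            e[1] = True                # prediction confirmed: set the bit
        elif e[1]:
            e[1] = False               # first mismatch only clears the bit
        else:
            e[0] = observed_next       # second mismatch replaces the target

    def predict(self, addr):
        e = self.entries[self.index(addr)]
        return e[0] if e and e[1] else None


nat = NAT()
nat.update(0x1000, 0x2000)
first = nat.predict(0x1000)            # not confident yet
nat.update(0x1000, 0x2000)
second = nat.predict(0x1000)           # confirmed once, now predicted
```

Under these assumptions each entry has a budget of 58 bits, and a prediction is issued only after the same successor has been observed twice, which trades some coverage for fewer useless migrations.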

Access latency to the NUCA cache can be reduced by up to 25% when using the realistic implementation of the migration prefetcher.