Using Uncacheable Memory to Improve Unity Linux Performance

Ning Qu, Xiaogang Gou, Xu Cheng
Microprocessor Research and Development Center, Peking University

Post on 13-Jan-2016

Issues

Unity SoC architecture

[Figure: block diagram of the Unity SoC. Labeled blocks: UniCore32, UniCore-F64 (CP2), CP0, CP1, IMMU, I-Cache, DMMU, D-Cache, I_BUS, D_BUS, BIU, PCI Bridge, EMI, 10/100M MAC, 6-channel DMA, APB Bridge, SPI, IIC, UART0, UART1, 28 GPIO, RTC, INTC, PowerM., OST, ResetC, System Control Modules.]

[Figure: CPU with ICache, DCache, and TLB; DMA; Main Memory.]

No snooping: cache coherency problems everywhere!
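
The "no snooping" issue can be illustrated with a small user-space model. This is a minimal sketch, not Unity hardware or Unity Linux code: it assumes a single write-back cache line in front of a tiny memory, and every name in it (`toy_system`, `cpu_store`, `cache_clean`, `dma_read`, and so on) is hypothetical. It shows why a DMA engine that reads memory directly sees stale data until the CPU cleans the cache, and why the CPU sees stale data after a DMA write until it invalidates the line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy model (hypothetical, not Unity hardware): one write-back cache line
 * in front of a small "main memory". The DMA engine accesses memory
 * directly and, with no snooping, never looks into the cache. */
enum { LINE_BYTES = 16 };

typedef struct {
    uint8_t memory[LINE_BYTES]; /* what the DMA engine reads and writes */
    uint8_t line[LINE_BYTES];   /* CPU-side cached copy of that memory */
    bool valid;                 /* cached copy is present */
    bool dirty;                 /* cached copy is newer than memory */
} toy_system;

static void toy_init(toy_system *s) { memset(s, 0, sizeof *s); }

/* Fill the cache line from memory on a miss. */
static void toy_refill(toy_system *s) {
    if (!s->valid) {
        memcpy(s->line, s->memory, LINE_BYTES);
        s->valid = true;
    }
}

/* CPU store: write-allocate, write-back. Memory is NOT updated here. */
static void cpu_store(toy_system *s, size_t off, uint8_t v) {
    assert(off < LINE_BYTES);
    toy_refill(s);
    s->line[off] = v;
    s->dirty = true;
}

/* CPU load: served from the cache line once it is valid. */
static uint8_t cpu_load(toy_system *s, size_t off) {
    assert(off < LINE_BYTES);
    toy_refill(s);
    return s->line[off];
}

/* Clean: write a dirty line back to memory, keep it cached. */
static void cache_clean(toy_system *s) {
    if (s->valid && s->dirty) {
        memcpy(s->memory, s->line, LINE_BYTES);
        s->dirty = false;
    }
}

/* Invalidate: drop the cached copy without writing it back. */
static void cache_invalidate(toy_system *s) {
    s->valid = false;
    s->dirty = false;
}

/* DMA read (device transmit path): bypasses the cache entirely. */
static uint8_t dma_read(const toy_system *s, size_t off) {
    assert(off < LINE_BYTES);
    return s->memory[off];
}

/* DMA write (device receive path): bypasses the cache entirely. */
static void dma_write(toy_system *s, size_t off, uint8_t v) {
    assert(off < LINE_BYTES);
    s->memory[off] = v;
}
```

In this model a `cpu_store` followed directly by `dma_read` returns the old memory contents; inserting `cache_clean` between them fixes the send direction, and `cache_invalidate` after `dma_write` fixes the receive direction. A buffer mapped uncacheable would need neither call, which is the saving the slides argue for.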

Issues cont.

[Figure: two copies of the I/O data path. In each, data moves between a process I/O buffer (User Process), a kernel I/O buffer (Linux Kernel), and an I/O device buffer (I/O Device); each path carries a DMA label.]

Poor temporal locality!

Motivation

- Heavy cost of cache coherency operations
  - Many high-end embedded processors have a cache, but many of them offer very limited support for guaranteeing cache coherency
- Poor locality leads to more data cache pollution
  - A cache relies on the property of locality
  - Some programs have poor locality, for example TCP/IP processing
- How can these disadvantages be avoided?
- Uncacheable memory may be a solution!
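
The cache pollution argument can be made concrete with a toy direct-mapped cache simulator. This is only a sketch under stated assumptions: the cache geometry (64 lines of 32 bytes), the addresses, and all names (`dm_cache`, `dm_touch_range`, `misses_after_stream`) are hypothetical and are not taken from the Unity design. It warms a small working set, optionally streams a packet-sized buffer through the cache, and counts how many misses the working set suffers afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical geometry: 64 lines x 32 bytes = 2 KB direct-mapped cache. */
enum { NUM_LINES = 64, LINE_SIZE = 32 };

typedef struct {
    uintptr_t tag[NUM_LINES]; /* block number held by each line */
    bool valid[NUM_LINES];
    unsigned long misses;
} dm_cache;

static void dm_reset(dm_cache *c) {
    for (int i = 0; i < NUM_LINES; i++)
        c->valid[i] = false;
    c->misses = 0;
}

/* Access one address through the cache; count a miss when the line
 * has to be (re)filled. */
static void dm_access(dm_cache *c, uintptr_t addr) {
    uintptr_t block = addr / LINE_SIZE;
    size_t idx = (size_t)(block % NUM_LINES);
    if (!c->valid[idx] || c->tag[idx] != block) {
        c->valid[idx] = true;
        c->tag[idx] = block;
        c->misses++;
    }
}

/* Touch every cache line covered by [base, base + len). */
static void dm_touch_range(dm_cache *c, uintptr_t base, size_t len) {
    for (size_t off = 0; off < len; off += LINE_SIZE)
        dm_access(c, base + off);
}

/* Misses seen when re-touching a 1 KB working set after a 2 KB packet
 * buffer has been copied. If the buffer is uncacheable, the copy never
 * enters the cache and the working set survives. */
static unsigned long misses_after_stream(bool buffer_cacheable) {
    const uintptr_t ws_base = 0x10000;  /* hypothetical addresses */
    const uintptr_t buf_base = 0x80000;
    const size_t ws_len = 1024, buf_len = 2048;
    dm_cache c;

    dm_reset(&c);
    dm_touch_range(&c, ws_base, ws_len);       /* warm the working set */
    if (buffer_cacheable)
        dm_touch_range(&c, buf_base, buf_len); /* streaming data evicts it */
    c.misses = 0;
    dm_touch_range(&c, ws_base, ws_len);       /* re-use the working set */
    return c.misses;
}
```

With these assumed sizes, `misses_after_stream(true)` reports one miss for each of the 32 working-set lines, while `misses_after_stream(false)` reports none. Real caches are set-associative and real traffic is less regular, so the model shows the direction of the effect, not its size.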

Contributions

- Analyze the scenarios in which the cache does not perform well, and propose that uncacheable memory has two advantages:
  - It eliminates most cache coherency operations
  - It avoids cache pollution
- Apply uncacheable memory in Unity Linux to improve I/O performance. Some important aspects improve by 5% to 29%.

Outline

- Issues
- Motivation
- Contribution
- Uncacheable Memory
- Evaluation
- Related Work
- Conclusions

Recv Packet Flow

[Figure: receive path in four steps (step 1 to step 4) across the I/O Device, Kernel Space, and User Space, ending in the User Buffer. Operation labels: flush cache, DMA copy, simple data processing, CPU copy. The buffers are marked "using uncacheable memory".]

Send Packet Flow

[Figure: send path in four steps (step 1 to step 4) across User Space, Kernel Space, and the I/O Device, starting from the User Buffer. Operation labels: CPU copy, simple data processing, clean cache, DMA copy. The buffers are marked "using uncacheable memory".]

Cacheable vs. Uncacheable

DMA send and receive cost analysis

|               | Send | Receive |
|---------------|------|---------|
| CH processing | 1. copy from U to K; 2. clean data cache | 1. clean & invalidate data cache; 2. copy from K to U |
| NC processing | 1. copy from U to K(N) | 1. copy from K(N) to U |
| side effect   | 1. accessing uncacheable memory is slower; 2. no data cache pollution; 3. no cache clean operation | 1. accessing uncacheable memory is slower; 2. no data cache pollution; 3. no cache flush operation |
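
The trade-off in the cost analysis can be written as a simple per-line cost model. This is a hedged sketch: it reads CH as the cacheable path and NC as the uncacheable path (an interpretation of the slide title), the `cost_params` fields and function names are hypothetical, and the cycle counts used with it are illustrative placeholders, not measurements from the Unity platform. It also ignores the pollution benefit, so it understates the NC side.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-cache-line costs, in cycles. None of these values
 * come from the slides; they only parameterize the comparison. */
typedef struct {
    unsigned long cached_copy;   /* copy one line to or from a cacheable K buffer */
    unsigned long uncached_copy; /* copy one line to or from an uncacheable K(N) buffer */
    unsigned long maintenance;   /* clean (send) or clean & invalidate (receive) one line */
} cost_params;

/* CH path: the copy plus the cache maintenance operation. */
static unsigned long ch_cost(const cost_params *p, size_t lines) {
    assert(p != NULL);
    return (unsigned long)lines * (p->cached_copy + p->maintenance);
}

/* NC path: only the copy, but each access to uncacheable memory is slower. */
static unsigned long nc_cost(const cost_params *p, size_t lines) {
    assert(p != NULL);
    return (unsigned long)lines * p->uncached_copy;
}

/* NC wins when the extra cost of uncached accesses is smaller than the
 * cache maintenance cost it removes. */
static bool nc_is_cheaper(const cost_params *p, size_t lines) {
    return nc_cost(p, lines) < ch_cost(p, lines);
}
```

For example, with assumed costs of 10, 15, and 8 cycles per line, a 4-line transfer costs 72 cycles on the CH path and 60 on the NC path. If the uncached copy cost rises to 30 cycles per line, the CH path is cheaper again, which matches the "accessing uncacheable memory is slower" side effect.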

Cacheable vs. Uncacheable cont.

[Figure: cost breakdown for DMA Send and DMA Recv. Labels: cache clean cost, cache flush cost, load U into cache, load K into cache, store to K, load U into cache and store, load K.]

Cacheable vs. Uncacheable cont.

[Figure: Recv and Send Performance, CH vs NC]

Using Uncacheable Memory

- Implemented in Unity Linux, which is ported from Linux 2.4.17
- Uncacheable page table
  - eliminates cache coherency operations when modifying the page tables
- Uncacheable socket buffer for sending
  - eliminates cache coherency operations
  - avoids data cache pollution

Outline

- Motivation
- Issues
- Contribution
- Uncacheable Memory
- Evaluation
- Related Work
- Conclusions

Methodology

- Benchmarks: Netperf, Lmbench, and the Modified Andrew Benchmark
- Experimental environment
  - 160 MHz Unity network computer with 256 MB DRAM and a SoC built-in 10M/100M Ethernet card
  - Dell 4600 server with two Intel Xeon PIII 700 MHz processors, 4 GB DRAM, and a 1000M/100M Ethernet card
- All benchmarks are executed in single-user mode on NFS.

Netperf Benchmark Results

[Figure: Netperf TCP_STREAM Send Performance]

Netperf Benchmark Results cont.

[Figure: Netperf TCP_RR Performance]

Lmbench Benchmark Results

[Figure: Lmbench Performance]

Modified Andrew Benchmark Results

[Figure: Modified Andrew Benchmark]

Related Work

- Related work: accelerating uncacheable memory performance
  - New memory types
    - Intel write-combining
    - MIPS R10000: uncached-accelerated pages
  - New instructions
    - SPARC V9, ARM, Unity II: block move instructions
- Future work: support for a new memory type
  - Reads behave like an ordinary cache, with low pollution
  - Writes behave like write-combining, without write-allocate
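
The idea behind write-combining can be sketched with a toy model. This is an illustration only, not a description of Intel's or any other vendor's implementation: it assumes a single combining buffer that covers one aligned 32-byte line, and the names (`wc_buffer`, `wc_store`, `wc_copy_bus_writes`) are hypothetical. Consecutive stores to the same line are merged and leave the buffer as one burst, whereas plain uncached stores each cost one bus write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { WC_LINE = 32 }; /* hypothetical combining-buffer size in bytes */

typedef struct {
    uintptr_t line_addr;      /* aligned address of the line being combined */
    bool active;              /* buffer currently holds pending stores */
    unsigned long bus_writes; /* bursts actually sent to memory */
} wc_buffer;

static void wc_init(wc_buffer *b) {
    b->line_addr = 0;
    b->active = false;
    b->bus_writes = 0;
}

/* Drain the buffer: all pending stores leave as a single burst. */
static void wc_flush(wc_buffer *b) {
    if (b->active) {
        b->bus_writes++;
        b->active = false;
    }
}

/* Record a store. A store to a different line drains the buffer first. */
static void wc_store(wc_buffer *b, uintptr_t addr) {
    uintptr_t line = addr - (addr % WC_LINE);
    if (b->active && b->line_addr != line)
        wc_flush(b);
    b->line_addr = line;
    b->active = true;
}

/* Bus writes needed to fill len bytes at dst with store_size-byte stores
 * when the stores are combined. */
static unsigned long wc_copy_bus_writes(uintptr_t dst, size_t len,
                                        size_t store_size) {
    wc_buffer b;
    assert(store_size > 0);
    wc_init(&b);
    for (size_t off = 0; off < len; off += store_size)
        wc_store(&b, dst + off);
    wc_flush(&b);
    return b.bus_writes;
}

/* Bus writes for the same fill with plain uncached stores: one each. */
static unsigned long plain_uncached_bus_writes(size_t len, size_t store_size) {
    assert(store_size > 0);
    return (unsigned long)((len + store_size - 1) / store_size);
}
```

Under these assumptions, filling 128 bytes at an aligned address with 4-byte stores takes 32 bus writes uncached and 4 bursts with combining, and nothing is allocated in the data cache either way. That is the "write like write-combining without write-allocate" behaviour the future-work bullet asks for.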

Conclusions

- This paper focuses on the use of uncacheable memory.
  - Pros: it eliminates coherency operations and avoids data cache pollution
  - Cons: slow access time
- Uncacheable memory can perform well when it is applied with a careful design that takes the system's specific characteristics into account.

Thank You!

Questions?