7 eti pres
©2011 ET International, Inc
ETI SCC Baremetal Framework
Bandwidth and Power Findings
Rishi Khan
3/30/11
Outline
• SCC Framework Overview
• Bandwidth Findings
• Power Findings
• Software Access
SCC Framework Overview
Messaging Goals
• Asynchronous Communications
• Single Threaded
• Possibly Long Latency until data is received
• Maximize bandwidth
• Handle big and small messages
• Extensible layer that supports MPI, BSD sockets, etc.
Design Choices
• One channel per core-pair per direction
• Large window size (up to 1MB/channel)
• Fast polling of incoming data (use MPB)
• Circular buffer with 16 slots and read/write pointers
• Poll local pointers, signal remote pointers
• Use separate cache lines to avoid locking
  2 cache lines * 32 bytes * 48 channels = 3K per core
• Double map read and write pages
  Read – L2 cache enabled
  Write – L2 cache disabled (write back)
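The "3K per core" figure can be checked with a small layout sketch. This is only an illustration: the struct name, the field placement, and the idea that each pointer occupies one 32-byte cache line in the MPB are assumptions, not ETI's actual layout.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE   32   /* SCC core cache line size in bytes */
#define NUM_CHANNELS 48   /* one channel per peer core, from the slide */

/* Hypothetical per-channel MPB footprint: each pointer sits alone in its
 * own cache line, so the two cores never write the same line and no lock
 * is needed. */
struct mpb_channel_ptrs {
    alignas(CACHE_LINE) volatile uint32_t mpb_write; /* updated by the remote writer */
    alignas(CACHE_LINE) volatile uint32_t mpb_read;  /* updated by the remote reader */
};

/* Compile-time check that one channel costs exactly two cache lines. */
static_assert(sizeof(struct mpb_channel_ptrs) == 2 * CACHE_LINE,
              "each channel should occupy two cache lines");

/* Total MPB bytes one core spends on channel pointers. */
size_t mpb_pointer_bytes(void)
{
    return sizeof(struct mpb_channel_ptrs) * NUM_CHANNELS;
}
```

With these assumed sizes, `mpb_pointer_bytes()` evaluates to 3072 bytes, which matches the 3K on the slide.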
Circular Buffer Example
[Figure: circular buffer shared by a reader core and a writer core]
Core 0 (reader): Cache, MPB; Channel->local_read, Channel->mpb_write
Core 1 (writer): Cache, MPB; Channel->local_write, Channel->mpb_read
DRAM: Channel->body[]
Writer steps: Is there space? / Write the data (with length as first 2 bytes) / Update write pointer
Reader steps: Poll local write pointer / Read data / Update read pointer
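The exchange in the figure can be sketched as a single-producer, single-consumer ring. This is a host-side illustration under stated assumptions, not ETI's code: the field names come from the slide, but `chan_send`, `chan_recv`, the struct layout, and the 64-byte slot size are hypothetical, and plain memory stands in for the MPB and DRAM (real hardware would also need the appropriate cache flush and invalidate steps).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_SLOTS   16                /* from the slide */
#define SLOT_SIZE   64                /* assumed; not given in the slides */
#define MAX_PAYLOAD (SLOT_SIZE - 2)   /* 2 bytes are used for the length */

struct channel {
    /* Reader side (Core 0): own read index, plus the write index that the
     * writer signals into the reader's MPB. */
    uint32_t          local_read;
    volatile uint32_t mpb_write;
    /* Writer side (Core 1): own write index, plus the read index that the
     * reader signals into the writer's MPB. */
    uint32_t          local_write;
    volatile uint32_t mpb_read;
    /* Message slots; held in DRAM on the real system. */
    uint8_t body[NUM_SLOTS][SLOT_SIZE];
};

/* Writer: returns 0 on success, -1 if the ring is full or len is too big. */
int chan_send(struct channel *ch, const void *buf, uint16_t len)
{
    assert(ch != NULL && buf != NULL);
    if (len > MAX_PAYLOAD)
        return -1;
    /* Is there space? Poll the locally visible copy of the read pointer. */
    if (ch->local_write - ch->mpb_read >= NUM_SLOTS)
        return -1;
    uint8_t *slot = ch->body[ch->local_write % NUM_SLOTS];
    memcpy(slot, &len, 2);            /* length as first 2 bytes */
    memcpy(slot + 2, buf, len);
    ch->local_write++;
    ch->mpb_write = ch->local_write;  /* signal the remote (reader) pointer */
    return 0;
}

/* Reader: returns payload length, -1 if empty, -2 if buf is too small. */
int chan_recv(struct channel *ch, void *buf, size_t cap)
{
    assert(ch != NULL && buf != NULL);
    /* Poll the local write pointer. */
    if (ch->mpb_write == ch->local_read)
        return -1;
    const uint8_t *slot = ch->body[ch->local_read % NUM_SLOTS];
    uint16_t len;
    memcpy(&len, slot, 2);
    if (len > cap)
        return -2;
    memcpy(buf, slot + 2, len);
    ch->local_read++;
    ch->mpb_read = ch->local_read;    /* signal the remote (writer) pointer */
    return (int)len;
}
```

Because each index is written by exactly one side, neither function needs a lock; the indices are free-running counters reduced modulo 16 only when selecting a slot.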
Socket API
• int stream_recv(int nid, void *buf, size_t len, int nb);
• int stream_send(int nid, const void *buf, size_t len);
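A caller might wrap the non-blocking receive in a bounded polling loop, sketched below. The slides give only the two signatures, so the return-value convention (bytes transferred, 0 when nothing is available, negative on error), the meaning of `nb` as a non-blocking flag, the helper `recv_exact`, and the loopback stubs are all assumptions made so the example runs on a host.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Loopback stand-ins with the slide's signatures. NOT ETI's implementation:
 * they only move bytes through a local buffer. */
static unsigned char loop_buf[256];
static size_t loop_len;

int stream_send(int nid, const void *buf, size_t len)
{
    (void)nid;
    if (len > sizeof loop_buf - loop_len)
        len = sizeof loop_buf - loop_len;
    memcpy(loop_buf + loop_len, buf, len);
    loop_len += len;
    return (int)len;                 /* assumed: number of bytes accepted */
}

int stream_recv(int nid, void *buf, size_t len, int nb)
{
    (void)nid;
    (void)nb;                        /* the stub never blocks */
    if (len > loop_len)
        len = loop_len;
    memcpy(buf, loop_buf, len);
    memmove(loop_buf, loop_buf + len, loop_len - len);
    loop_len -= len;
    return (int)len;                 /* assumed: bytes delivered, 0 if none */
}

/* Hypothetical helper: collect exactly len bytes from node nid by polling
 * in non-blocking mode. Returns 0 on success, -1 if max_polls ran out,
 * or the negative error code from stream_recv. */
int recv_exact(int nid, void *buf, size_t len, int max_polls)
{
    assert(buf != NULL);
    size_t got = 0;
    while (got < len && max_polls-- > 0) {
        int n = stream_recv(nid, (char *)buf + got, len - got, 1);
        if (n < 0)
            return n;
        got += (size_t)n;
    }
    return got == len ? 0 : -1;
}
```

Bounding the number of polls keeps a single-threaded caller from spinning forever when the peer is slow, which fits the "possibly long latency" goal stated earlier.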
[Figure: Messaging Bandwidth (MB/sec, 0 to 120) vs. Message Size (bytes) for Intel RCCE, ETI Streams (DRAM, Blocking), ETI Streams (MPBs, Blocking), and ETI Streams (MPBs, Non-Blocking), with L1 and L2 markers]
MPI
[Figure: Messaging Bandwidth (MB/sec, 0 to 120) vs. Message Size (bytes) for Linux MPI (Intel, Blocking, TCP), Baremetal MPI (ETI, Blocking), Baremetal MPI (ETI, Non-blocking), and RCKMPI, with L1 and L2 markers]
Power Goals
• External monitoring of voltage and current
• Backend Power API
  Update time functions with frequency changes
  Keep chip under safe conditions!!
• Internal synchronization of clocks
• External synchronization of host and SCC
External Monitoring
• Read /opt/sccKit/systemSettings.ini
• Telnet BMC 5010
• Request Status / Parse Data
• Store timestamps
Backend Power API
• power_session scc_open_power(heap h);
• void scc_close_power(power_session ps);
• int scc_set_freq(power_session ps, u32 requested_frequency);
• int scc_set_voltage(power_session ps, u32 requested_millivolts);
• char* scc_error_string(status_code code);
Allowable Frequency (MHz, columns) by Voltage (V, rows)
      100  106  114  123  133  145  160  178  200  266  320  400  533  800
0.7    X    X    X    X    X    X    X    X    X    X    X    X
0.8    X    X    X    X    X    X    X    X    X    X    X    X    X
0.9    X    X    X    X    X    X    X    X    X    X    X    X    X
1.0    X    X    X    X    X    X    X    X    X    X    X    X    X
1.1    X    X    X    X    X    X    X    X    X    X    X    X    X    X
1.2    X    X    X    X    X    X    X    X    X    X    X    X    X    X
1.3    X    X    X    X    X    X    X    X    X    X    X    X    X    X
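A caller could consult such a table before requesting a new operating point. The sketch below is an assumption-laden illustration: `freq_allowed` is a hypothetical helper, not part of the API listed above, and it assumes that the X marks in each row cover the lowest frequencies, so a row with 12 marks allows up to 400 MHz. The extracted slide does not preserve which columns were blank.

```c
#include <assert.h>
#include <stddef.h>

/* Column headers of the slide table, in MHz. */
static const unsigned freqs_mhz[] = {
    100, 106, 114, 123, 133, 145, 160, 178, 200, 266, 320, 400, 533, 800
};
#define NUM_FREQS (sizeof freqs_mhz / sizeof freqs_mhz[0])
static_assert(NUM_FREQS == 14, "the slide lists 14 frequencies");

/* Number of X marks per voltage row, as counted on the slide.
 * ASSUMPTION: the marks fill the row from the lowest frequency upward. */
static const struct { unsigned millivolts; unsigned allowed; } rows[] = {
    { 700, 12}, { 800, 13}, { 900, 13}, {1000, 13},
    {1100, 14}, {1200, 14}, {1300, 14}
};

/* Returns 1 if the table permits mhz at millivolts, 0 otherwise
 * (including voltages or frequencies that are not in the table). */
int freq_allowed(unsigned millivolts, unsigned mhz)
{
    for (size_t r = 0; r < sizeof rows / sizeof rows[0]; r++) {
        if (rows[r].millivolts != millivolts)
            continue;
        for (unsigned i = 0; i < rows[r].allowed; i++)
            if (freqs_mhz[i] == mhz)
                return 1;
        return 0;
    }
    return 0;
}
```

A wrapper around `scc_set_freq` and `scc_set_voltage` could call a check like this first and raise the voltage before raising the frequency, so the chip never sits at a combination outside the table.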
Internal Synchronization
• Cores come out of sccReset in 20ms intervals
• Each core's clock starts at cycle 0 at reset
• Each core's frequency may be different
• Solution:
  Set all cores to 400MHz
  Barrier
  After Barrier, set internal integrator to 0
Formulas for Time
• Use this formula for time:
  count = scc_cycle_count() - _integral_cycle;
  ns = _integral_time_ns + count * _current_ns_in_cycles;
• Use this for frequency change:
  current_time = scc_cycle_count();
  _integral_time_ns += (current_time - _integral_cycle) * _current_ns_in_cycles;
  _integral_cycle = current_time;
  _current_ns_in_cycles = 1.0e9 / ((double)_global_clock / (double)freq_divider);
[Figure: timeline with labels Integral Time, Freq, scc_cycle_count(), and _integral_cycle]
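The two formulas can be exercised on a host with a simulated cycle counter. In this sketch the struct, the function names `clock_now_ns` and `clock_set_divider`, the `fake_cycles` counter, and the 1.6 GHz global clock are assumptions for illustration; the slide's leading-underscore variables are renamed because such names are reserved at file scope in C.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the hardware cycle counter so the sketch runs on a host. */
static uint64_t fake_cycles;
static uint64_t scc_cycle_count(void) { return fake_cycles; }

/* ASSUMPTION: 1.6 GHz global clock (the slide's _global_clock). */
#define GLOBAL_CLOCK_HZ 1.6e9

struct scc_clock {
    uint64_t integral_cycle;    /* cycle count at the last frequency change */
    double   integral_time_ns;  /* nanoseconds accumulated up to that point */
    double   ns_per_cycle;      /* the slide's _current_ns_in_cycles */
};

/* Time formula: accumulated time plus cycles elapsed at the current rate. */
double clock_now_ns(const struct scc_clock *c)
{
    uint64_t count = scc_cycle_count() - c->integral_cycle;
    return c->integral_time_ns + (double)count * c->ns_per_cycle;
}

/* Frequency change: fold the elapsed cycles into the integral at the OLD
 * rate, then switch to the new rate. The counter is read once so that the
 * integral and the new base cycle agree. */
void clock_set_divider(struct scc_clock *c, unsigned freq_divider)
{
    assert(freq_divider != 0);
    uint64_t now = scc_cycle_count();
    c->integral_time_ns += (double)(now - c->integral_cycle) * c->ns_per_cycle;
    c->integral_cycle = now;
    c->ns_per_cycle = 1.0e9 / (GLOBAL_CLOCK_HZ / (double)freq_divider);
}
```

For example, under these assumptions 400 cycles at divider 4 (2.5 ns per cycle) followed by 800 cycles at divider 2 (1.25 ns per cycle) add up to 2000 ns, even though the cycle rate changed in between.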
Syncing Front/Back
• Change voltage from 0.7 to 1.1 every 1 second
• Measure changes on frontend
• Cannot get synchronization better than 0.5 seconds
[Figure: two panels with x-axis 0 to 20: Amps (20 to 22.5) and Residuals (0 to 0.6)]
Bug in BMC Voltage Readings
• 3 power islands
• Drop voltage from 1.2 to 0.7 immediately
• Raise Voltage after 20 seconds
[Figure: two panels over Time (0 to 40): Voltage (0.7 to 1.2) for V0, V1, and V2, and Amps (17 to 23); annotations "20.5 Seconds" and "0.6 Seconds"]
Other SCC issues
• If more than 24 cores pound on one MPB, contention overtakes the system.
  Sleep required between polling
• Allowable voltage/freq are chip specific
• BMC telnet response is > 100ms
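One way to add the pause between polls is a backoff loop, sketched here. The slides only say that a sleep is required; the doubling delay, the constants, and the names `pause_cycles` and `poll_with_backoff` are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Busy-wait for roughly n loop iterations without touching shared memory.
 * A baremetal build might use a timer or the cycle counter instead. */
static void pause_cycles(unsigned n)
{
    for (volatile unsigned i = 0; i < n; i++) {
        /* intentionally empty */
    }
}

/* Poll *flag until it differs from last, pausing between reads and
 * doubling the pause up to a cap. Returns the number of reads it took,
 * or -1 if max_polls reads saw no change. */
int poll_with_backoff(const volatile uint32_t *flag, uint32_t last,
                      unsigned max_polls)
{
    assert(flag != NULL);
    unsigned delay = 16;             /* assumed initial pause */
    for (unsigned polls = 1; polls <= max_polls; polls++) {
        if (*flag != last)
            return (int)polls;
        pause_cycles(delay);         /* stay off the MPB while waiting */
        if (delay < 4096)
            delay *= 2;              /* assumed cap on the pause */
    }
    return -1;
}
```

Growing the pause keeps latency low when data arrives quickly while reducing traffic on a contended MPB when many cores are waiting.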
Future Work
• DARPA UHPC: Study how voltage/freq affect power dissipation
• Allan Snavely (UCSD)
  Systematically study loops over a number of parameters to find the best voltage/freq.
  Create formulas to approximate good power settings for unknown loops
Access to Software
• Email [email protected]
• Beta available
• Considering open sourcing SCC-specific portions of our work for others to test/learn/improve
Acknowledgements
• Mark Deazley (ETI)
• Eric Hoffman (ETI)
• Allan Snavely (UCSD)
• Intel:
  Tim Mattson
  Ted Kubaska
  Rob Noradki
  Wilf Pinfold, Shekhar Borkar (UHPC)