Lecture 1. Technology Trend
Prof. Taeweon Suh
Computer Science Education
Korea University
COM503 Parallel Computer Architecture & Programming
Korea Univ
Transistor Basics
• Digital chips are designed with transistors
• A transistor is a three-terminal, voltage-controlled switch
  – Two of the terminals are connected depending on the voltage on the third terminal
  – For example, in the switch below, the two terminals (d and s) are connected (ON) only when the third terminal (g) is 1
[Figure: switch symbol with terminals g, s, and d; OFF when g = 0, ON when g = 1]
Silicon
• Transistors are built out of silicon, a semiconductor
• Pure silicon is a poor conductor
• Doped silicon is a good conductor
  – n-type (free negative charges, electrons)
  – p-type (free positive charges, holes)
[Figure: silicon lattice on a wafer. Pure Si; n-type, doped with As, has a free electron (majority carriers: electrons, minority carriers: holes); p-type, doped with B, has a free hole (majority carriers: holes, minority carriers: electrons)]
Periodic Table of the Elements
MOS Transistors
• Metal-oxide-semiconductor (MOS) transistors:
  – Polysilicon (used to be metal) gate
  – Oxide (silicon dioxide) insulator
  – Doped silicon substrate and wells
[Figure: nMOS and pMOS cross-sections showing the polysilicon gate, SiO2 insulator, source, drain, and substrate]
MOS Transistors
• The MOS sandwich acts as a capacitor (two conductors with an insulator between them)
• When a voltage is applied to the gate, the opposite charge is attracted to the semiconductor on the other side of the insulator, which can form a channel of charge
[Figure: nMOS and pMOS cross-sections showing the polysilicon gate, SiO2 insulator, source, drain, and substrate]
nMOS Transistor
[Figure: two nMOS cross-sections. With the gate at GND there is no channel; with the gate at VDD a channel forms between source and drain]
• Gate = 0 (OFF): no connection between source and drain
• Gate = 1 (ON): connection between source and drain
Transistor Function
[Figure: nMOS and pMOS switch symbols with terminals g, s, and d]
        g = 0   g = 1
nMOS    OFF     ON
pMOS    ON      OFF
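The switch behavior of the two transistor types can be sketched in a few lines of Python. This is only a logic-level model: the function names `nmos_conducts` and `pmos_conducts` are made up for illustration, and nothing analog (threshold voltage, current) is represented.

```python
def nmos_conducts(g: int) -> bool:
    """nMOS: source and drain are connected when the gate is 1."""
    return g == 1


def pmos_conducts(g: int) -> bool:
    """pMOS: source and drain are connected when the gate is 0."""
    return g == 0


def state(conducts: bool) -> str:
    """Render a conducting/non-conducting switch as ON/OFF."""
    return "ON" if conducts else "OFF"


# Print one line per gate value, mirroring the ON/OFF table on the slide.
for g in (0, 1):
    print(f"g = {g}: nMOS {state(nmos_conducts(g))}, pMOS {state(pmos_conducts(g))}")
```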
CMOS (Complementary MOS)
• CMOS is used to build the vast majority of all transistors fabricated today
  – nMOS transistors pass good 0’s, so connect their source to GND
  – pMOS transistors pass good 1’s, so connect their source to VDD
[Figure: the inputs drive a pMOS pull-up network and an nMOS pull-down network, which share the output]
CMOS Layout
• Top view
• Cross-section
NOT Gate
$Y = \overline{A}$

A   Y
0   1
1   0

A   P1    N1    Y
0   ON    OFF   1
1   OFF   ON    0

[Figure: NOT gate symbol (input A, output Y), transistor schematic with P1 on the VDD side and N1 on the GND side, and layout (top view)]
NAND Gate
$Y = \overline{AB}$

A   B   Y
0   0   1
0   1   1
1   0   1
1   1   0

A   B   P1    P2    N1    N2    Y
0   0   ON    ON    OFF   OFF   1
0   1   ON    OFF   OFF   ON    1
1   0   OFF   ON    ON    OFF   1
1   1   OFF   OFF   ON    ON    0

[Figure: NAND gate symbol (inputs A and B, output Y), transistor schematic with P1, P2, N1, and N2, and layout]
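The NAND table can be reproduced from the pull-up and pull-down networks. The sketch below assumes the usual CMOS NAND arrangement, which the lost schematic presumably showed: P1 and P2 in parallel between VDD and Y, N1 and N2 in series between Y and GND, with A driving P1/N1 and B driving P2/N2 (this gate assignment matches the ON/OFF columns of the table). The helper names are illustrative.

```python
from itertools import product


def nmos(g: int) -> bool:
    """nMOS conducts when its gate is 1."""
    return g == 1


def pmos(g: int) -> bool:
    """pMOS conducts when its gate is 0."""
    return g == 0


def nand(a: int, b: int) -> int:
    # Pull-up network: P1 (gate A) and P2 (gate B) in parallel to VDD.
    pull_up = pmos(a) or pmos(b)
    # Pull-down network: N1 (gate A) and N2 (gate B) in series to GND.
    pull_down = nmos(a) and nmos(b)
    # Exactly one network should conduct; otherwise Y floats or shorts.
    assert pull_up != pull_down
    return 1 if pull_up else 0


truth_table = {(a, b): nand(a, b) for a, b in product((0, 1), repeat=2)}
for (a, b), y in truth_table.items():
    print(a, b, y)
```

The internal assertion checks the complementary property of CMOS: for every input, the output is driven by exactly one of the two networks.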
Now, Let’s Make an Inverter Chip
[Figure labels: Core 2 Duo, your inverter chip, die]
• Yield is the fraction of dies that work correctly after fabrication
(Semiconductor) Technology
• IC (Integrated Circuit) combined dozens to hundreds of transistors into a single chip
• VLSI (Very Large Scale Integration) is used to describe the tremendous increase in the number of transistors in a chip
• (Semiconductor) Technology: how small can you make a transistor?
  – 0.1 µm (100nm), 90nm, 65nm, 45nm, 32nm, 22nm technologies
[Figure: nMOS and pMOS cross-sections showing the polysilicon gate, SiO2 insulator, source, drain, and substrate]
x86?
• What is x86?
  – Generic term referring to processors from Intel, AMD, and VIA
  – Derived from the model numbers of the first few generations of processors: 8086, 80286, 80386, 80486 (hence x86)
  – Now it generally refers to processors from Intel, AMD, and VIA
    • x86-16: 16-bit processor
    • x86-32 (aka IA32): 32-bit processor
    • x86-64: 64-bit processor
• Intel takes about 80% of the PC market and AMD takes about 20%
  – Apple has also been shipping Intel-based Macs since 2006
* aka: also known as
* IA: Intel Architecture
x86 History (as of 2008)
x86 History (Cont.)
[Figure: x86 timeline, from 4-bit, 8-bit, and 16-bit processors through 32-bit (i386, i586, i686) to 64-bit (x86_64)]
• 2009: 1st Gen. Core i7 (Nehalem)
• 2011: 2nd Gen. Core i7 (Sandy Bridge)
• 2012: 3rd Gen. Core i7 (Ivy Bridge)
• 2013: 4th Gen. Core i7 (Haswell)
Moore’s Law
• Transistor count doubles about every 18 to 24 months
[Figure: exponential growth in transistor count, from 2,250 to 42 million to 1.7 billion (Montecito)]
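As a rough check of the doubling claim, the two endpoint counts from the chart (2,250 and 1.7 billion) can be turned into an implied doubling period. The years are an assumption on my part, since the chart's axis did not survive: 1971 for the 2,250-transistor chip and 2006 for Montecito. The function names are illustrative.

```python
import math


def doubling_period_months(count_start: float, count_end: float, years: float) -> float:
    """Average doubling period implied by growth from count_start to count_end."""
    doublings = math.log2(count_end / count_start)
    return years * 12 / doublings


def projected_count(count_start: float, years: float, period_months: float) -> float:
    """Transistor count after `years` if it doubles every `period_months`."""
    return count_start * 2 ** (years * 12 / period_months)


# Assumed dates (not in the transcript): 1971 and 2006.
period = doubling_period_months(2_250, 1.7e9, 2006 - 1971)
print(f"implied doubling period: {period:.1f} months")

# Projection example: a 1,000-transistor chip doubling every 18 months for 3 years.
print(f"after 3 years at 18 months per doubling: {projected_count(1_000, 3, 18):.0f}")
```

Under those assumed dates the implied period falls between 18 and 24 months.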
Feature Size (Technology) Trend
Power Dissipation
• Until the early 2000s, Intel and AMD made every effort to increase clock frequency to improve the performance of their CPUs
• But power consumption became the problem
$P \approx C V_{DD}^2 f$
  – C: capacitance
  – VDD: supply voltage
  – f: clock frequency
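A small numeric sketch shows why the power relation favors lowering the supply voltage: power falls with the square of VDD but only linearly with f. The capacitance, voltage, and frequency values below are made-up round numbers for illustration, not measurements of any real chip.

```python
def dynamic_power(c_farads: float, vdd_volts: float, f_hertz: float) -> float:
    """Dynamic power estimate P = C * VDD^2 * f, in watts."""
    return c_farads * vdd_volts ** 2 * f_hertz


# Hypothetical baseline: 1 nF switched capacitance, 1.2 V, 3 GHz.
base = dynamic_power(1e-9, 1.2, 3e9)
# Same chip with the supply voltage lowered to 0.9 V (25% lower).
lower_v = dynamic_power(1e-9, 0.9, 3e9)
# Same chip with the clock lowered to 2.25 GHz (25% lower).
lower_f = dynamic_power(1e-9, 1.2, 2.25e9)

print(f"baseline:          {base:.2f} W")
print(f"25% lower voltage: {lower_v:.2f} W ({lower_v / base:.0%} of baseline)")
print(f"25% lower clock:   {lower_f:.2f} W ({lower_f / base:.0%} of baseline)")
```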
Power Density Trend
Source: Intel Corp.
Watch this!
Slide from Prof. H.H. Lee at Georgia Tech
How to Reduce Power Consumption?
• Reduce the supply voltage with new technologies, i.e., by reducing transistor size
• Keep the clock frequency in a modest range
  – No longer increase the clock frequency
• Then… what would be the problem?
  – Performance
• So, the strategy is to integrate many simple CPUs in a chip
  – Dual core, quad core, …
Reality Check, circa 200x
• Conventional processor designs run out of steam
  – Power wall (thermal)
  – Complexity (verification)
  – Physics (CMOS scaling)
• Unanimous direction: multi-core
  – Simple cores (massive number)
  – Keep
    • Wire communication on leash
    • Gordon Moore happy (Moore’s Law)
  – Architects’ menace: kick the ball to the other side of the court?
Modified from Prof. Sean Lee at Georgia Tech
Multi-core Processor Gala
Slide from Prof. Sean Lee at Georgia Tech
Intel’s Core 2 Duo
• 2 cores on one chip
• Two levels of caches (L1, L2) on chip
• 291 million transistors in 143 mm² with 65nm technology
[Figure: die photo labeling Core0 and Core1, each with IL1 and DL1 caches, and the L2 cache]
Source: http://www.sandpile.org
Intel’s Core i7 (Nehalem)
• 4 cores on one chip
• Three levels of caches (L1, L2, L3) on chip
• 731 million transistors in 263 mm² with 45nm technology
Intel’s Core i7 (Sandy Bridge)
2nd Generation Core i7
• 995 million transistors in 216 mm² with 32nm technology
• L1: 32 KB
• L2: 256 KB
• L3: 8 MB
Intel’s Core i7 (Ivy Bridge)
3rd Generation Core i7
• 1.4 billion transistors in 160 mm² with 22nm technology
• L1: 64 KB
• L2: 256 KB
• L3: 8 MB
http://blog.mytechhelp.com/laptop-repair/the-ivy-bridge/
Intel’s Core i7 (Haswell)
4th Generation Core i7
• 1.6 billion transistors in 177 mm² with 22nm technology
• 2x graphics performance over Ivy Bridge
• L1: 64 KB
• L2: 256 KB
• L3: 8 MB
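The transistor counts and die areas quoted on the last five Intel slides can be combined into a transistor density, which makes the effect of each technology step visible. The sketch below uses only the figures from those slides; the list layout and function name are illustrative.

```python
# (name, transistors, die area in mm^2, technology in nm), as quoted on the slides.
chips = [
    ("Core 2 Duo", 291e6, 143, 65),
    ("Core i7 (Nehalem)", 731e6, 263, 45),
    ("Core i7 (Sandy Bridge)", 995e6, 216, 32),
    ("Core i7 (Ivy Bridge)", 1.4e9, 160, 22),
    ("Core i7 (Haswell)", 1.6e9, 177, 22),
]


def density_millions_per_mm2(transistors: float, area_mm2: float) -> float:
    """Transistor density in millions of transistors per square millimeter."""
    return transistors / area_mm2 / 1e6


densities = [density_millions_per_mm2(count, area) for _, count, area, _ in chips]

for (name, _, _, nm), density in zip(chips, densities):
    print(f"{name:24s} {nm:3d}nm  {density:5.2f} M transistors/mm^2")
```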
AMD’s Opteron – Barcelona (2007)
• 4 cores on one chip
• 1.9GHz clock
• 65nm technology
• Three levels of caches (L1, L2, L3) on chip
• Integrated North Bridge
Intel Teraflops Research Chip
• 80 CPU cores
• Delivers more than 1 trillion floating-point operations per second (1 teraflops) of performance
• Introduced in September 2006
Intel’s 48 Core Processor
• 48 x86 cores manufactured with 45nm technology
• Nicknamed “single-chip cloud computer”
• Debuted in December 2009
Tilera’s 100 cores (June 2011)
• Tilera has introduced a range of processors (64-bit Gx family: 36 cores, 64 cores and 100 cores), aiming to take on Intel in servers that handle high-throughput web applications
  – 64-bit cores running up to 1.5GHz
  – Manufactured in 40nm technology
[Figure: TILE Gx 3000 Series Overview]
IBM Bluegene/Q Processor
• The Bluegene/Q processors power the world #3 Sequoia supercomputer at Lawrence Livermore National Laboratory, boasting 16.32 petaflops
  – 1,572,864 cores
• Bluegene/Q has 18 cores
  – First processor supporting hardware transactional memory
  – Each core is a 64-bit, 4-way multithreaded PowerPC A2
  – 16 cores are used for running actual computations; one is used for running the operating system; the remaining one is used to improve chip reliability
  – 1.47 billion transistors
  – 1.6 GHz
[Figure: IBM’s Bluegene/Q Processor (2011)]
http://www.top500.org
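The Sequoia figures can be cross-checked with simple arithmetic. The sketch assumes that the 1,572,864 cores count only the 16 compute cores of each chip (not the operating-system or spare core); that reading is my assumption, supported only by the fact that the division comes out exact.

```python
TOTAL_CORES = 1_572_864          # from the slide
COMPUTE_CORES_PER_CHIP = 16      # from the slide
PETAFLOPS = 16.32                # from the slide

# Number of Bluegene/Q chips, assuming only compute cores are counted.
chips, leftover = divmod(TOTAL_CORES, COMPUTE_CORES_PER_CHIP)

# Average delivered performance per core, in gigaflops (1 petaflops = 1e6 gigaflops).
gflops_per_core = PETAFLOPS * 1e6 / TOTAL_CORES

print(f"chips: {chips} (leftover cores: {leftover})")
print(f"about {gflops_per_core:.1f} gigaflops per core")
```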
#1 Supercomputer (2013)
• Tianhe-2 (MilkyWay-2) at the National University of Defense Technology, China
  – Xeon Phi
  – 3,120,000 cores
  – 1,024 TB memory
  – 17,808 kW power consumption
  – 33 petaflops
http://www.top500.org
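Since power is the recurring theme of this lecture, it is worth turning the Tianhe-2 numbers into an energy-efficiency figure. The sketch assumes the power consumption is 17,808 kW, which is the unit the TOP500 list uses for this column; the performance value is the one on the slide.

```python
PETAFLOPS = 33          # from the slide
POWER_KW = 17_808       # assumed to be kilowatts, as on the TOP500 list

flops = PETAFLOPS * 1e15
watts = POWER_KW * 1e3

# Energy efficiency in gigaflops per watt.
gflops_per_watt = flops / watts / 1e9

print(f"about {gflops_per_watt:.2f} gigaflops per watt")
```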
Performance
• If you edit your MS Word document on a dual-core machine, would it run twice as fast?
  – No!
• The problem now is how to parallelize applications and efficiently use hardware resources (available cores)…
• “If you were plowing a field, which would you rather use: two strong oxen or 1024 chickens?” - Seymour Cray (the father of supercomputing)
  – Well, it is hard to say in the computing world
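One common way to see why a second core does not double the speed is Amdahl's law, which the slides do not name; it is brought in here only as an illustration. It assumes a fixed fraction `p` of the work can be parallelized perfectly and the rest stays serial. The fraction values below are hypothetical.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Speedup on n cores when a fraction p of the work is perfectly parallel."""
    return 1.0 / ((1.0 - p) + p / n)


# Hypothetical case: half of the work can run in parallel, two cores available.
print(f"p = 0.50, 2 cores:    {amdahl_speedup(0.5, 2):.2f}x")
# Oxen versus chickens: many cores help only if nearly all work is parallel.
print(f"p = 0.50, 1024 cores: {amdahl_speedup(0.5, 1024):.2f}x")
print(f"p = 0.99, 1024 cores: {amdahl_speedup(0.99, 1024):.2f}x")
```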
Parallel Programming Models
• Most widely adopted parallel programming models
  – OpenMP
    • Shared-memory programming model
    • Parallel constructs are added to a sequential program written in Fortran, C or C++
    • Comparatively simple to use, since the burden of working out the details of the parallel program is left to the compiler
  – Pthreads: POSIX (Portable Operating System Interface) Threads
    • Shared-memory programming model
    • Pthreads are defined as a set of C and C++ programming types and procedure calls
      – A collection of routines for creating, managing, and coordinating a collection of threads
      – So, it is a library
    • Programming with Pthreads is much more complex than with OpenMP
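The create/coordinate/join pattern of a thread library can be sketched with Python's standard `threading` module. This is an analogy, not Pthreads itself: `threading.Thread`, `start`, `join`, and `threading.Lock` play roughly the roles of thread creation, joining, and a mutex. Note that in standard CPython the global interpreter lock keeps such threads from running Python code on several cores at once, so the sketch shows the shared-memory programming style rather than a speedup.

```python
import threading

NUM_THREADS = 4
ITEMS = list(range(1, 101))

# Shared memory: every thread reads and writes the same dictionary.
shared = {"total": 0}
total_lock = threading.Lock()


def worker(chunk):
    """Sum one chunk privately, then add it to the shared total under a lock."""
    partial = sum(chunk)
    with total_lock:  # protect the shared variable from concurrent updates
        shared["total"] += partial


# Create one thread per interleaved slice of the data.
threads = [
    threading.Thread(target=worker, args=(ITEMS[i::NUM_THREADS],))
    for i in range(NUM_THREADS)
]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for every thread to finish

total = shared["total"]
print(total)
```

The programmer has to split the data, protect the shared variable, and join the threads by hand, which is the extra complexity the slide attributes to thread libraries.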
Parallel Programming Models
  – MPI: Message Passing Interface
    • Developed for distributed-memory architectures, where multiple processes execute independently and communicate data as needed by exchanging messages
    • Most widely used in the high-end technical computing community, where clusters are common
    • Most vendors of shared-memory systems also provide MPI implementations that leverage the shared address space
    • Most MPI implementations consist of a specific set of APIs callable from C, C++, Fortran or Java (Wikipedia)
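The message-passing style can be imitated in plain Python so that it runs in a single interpreter. This is a sketch of the style only, not MPI: the `send` and `recv` helpers are hypothetical, and threads with one `queue.Queue` inbox per rank stand in for what would be separate processes, possibly on different machines. The workers never touch a shared result variable; all data moves through explicit messages.

```python
import queue
import threading

NUM_WORKERS = 3
# One inbox per rank; rank 0 is the coordinator, ranks 1..NUM_WORKERS are workers.
inboxes = [queue.Queue() for _ in range(NUM_WORKERS + 1)]


def send(dest, message):
    """Deliver a message to the inbox of rank `dest` (hypothetical helper)."""
    inboxes[dest].put(message)


def recv(rank):
    """Block until a message arrives for `rank` (hypothetical helper)."""
    return inboxes[rank].get()


def worker(rank):
    chunk = recv(rank)       # wait for work from rank 0
    send(0, sum(chunk))      # reply with a partial sum


workers = [threading.Thread(target=worker, args=(r,)) for r in range(1, NUM_WORKERS + 1)]
for t in workers:
    t.start()

# Rank 0 distributes the data, then collects one partial sum per worker.
data = list(range(1, 101))
for r in range(1, NUM_WORKERS + 1):
    send(r, data[r - 1::NUM_WORKERS])

result = sum(recv(0) for _ in range(NUM_WORKERS))
for t in workers:
    t.join()
print(result)
```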