Buffer-to-Buffer Credits, Exchanges, and Urban Legends
Patty Driever, IBM
Howard L. Johnson, Brocade
3 August 2010 (9:30am – 10:30am)
Session 7320
Room 104
Legal Stuff
• Notice
  • IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to: IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A.
  • Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.
• Trademarks
  • The following terms are trademarks of the International Business Machines Corporation in the United States, other countries, or both: FICON® IBM® Redbooks™ System z10™ z/OS® zSeries® z10™
  • Other company, product, or service names may be trademarks or service marks of others.
Abstract
• Performance in a FICON network is influenced by the underlying flow control mechanisms of Fibre Channel. In this session, we examine how Buffer-to-Buffer credits flow from the channel to the control unit. We also look at how exchanges are used in FICON applications and how they change with the introduction of zHPF. During both examinations, we explore the role of the FICON Director in managing Buffer-to-Buffer credits and exchanges over a cascaded network. Throughout the session, we debunk the various FICON “Urban Legends” featuring credits and exchanges. Take the opportunity to learn from two of the FICON industry’s leading experts in channel and fabric development and join our session.
Agenda
• Buffer Credits
  • What are they and how do they work?
  • How do you fill the pipe?
  • What if you can’t fill the pipe?
  • What’s wrong with multiple senders and one receiver?
  • What happens when the pipes are different sizes?
  • What’s it like in the real world?
• Exchanges
  • What are they and how do they work?
  • What’s an Exchange?
  • How many exchanges are needed?
  • Can they be “reused”?
  • Can you have too many exchanges?
What is Buffer-to-Buffer Credit?
• The greater the BB Credit…
  • A. The faster frames can be sent
  • B. The farther apart the two ports can be
  • C. The larger the frames can be
  • D. None of the above
• Answer: B. The farther apart the two ports can be
Flow Control
• Related to the devices’ ability to receive and process frames
  • Manages when frames are coming faster than they can be processed
  • Dropped frames occur when frames are arriving too fast to be processed
• Frames can only be transmitted when the receiver is ready
  • Credit establishment communicates the number of frames a device can receive at a time
  • The credit value is exchanged at login
  • Transmission stops when credit runs out
  • The receiver indicates when it is ready to receive more frames
Credit exchange at Fabric Login
Host says, “I can receive 40 frames.”
Storage says, “I can receive 16 frames.”
Switch says, “I can receive 8 frames.”
Buffer Credit
• At initialization, the two ports establish credit
  • Each buffer credit corresponds to a frame (regardless of size)
  • Each side can support different values
• Credit Count
  • If a port doesn’t have credit, it can’t send a frame
  • Credit Count has reached zero
• Mechanism limits frame drops
Host thinks, “Good, I can send 8 frames without thinking about it.”
Storage thinks, “Good, I can send 8 frames without thinking about it.”
Switch thinks, “OK, I can send 40 frames that way and 16 frames this way, but I have to think about it.”
Credit Count: 8 (host toward switch)
Credit Count: 8 (storage toward switch)
Credit Count: 40 (switch toward host)
Credit Count: 16 (switch toward storage)
Credit accounting after Fabric Login
Receiver Ready (R_RDY)
• R_RDY
  • Used for link level flow control
  • Called buffer-to-buffer credit (BB Credit)
• R_RDY is not a frame
  • It is a “primitive” so it doesn’t consume a buffer
• Frame transmission
  • BB Credit is decremented
  • Once for each frame transmitted
• When BB Credit = 0
  • Transmission stops
• Frame received
  • R_RDY is sent
  • Causes transmitter to increment BB Credit
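The credit accounting described above can be sketched as a small counter. This is only an illustration: the class and method names are hypothetical, and the starting value of 8 is taken from the "Switch says, I can receive 8 frames" login example.

```python
class CreditedTransmitter:
    """Hypothetical model of the BB Credit a port holds toward its nearest neighbor."""

    def __init__(self, advertised_credit):
        # Value the neighbor advertised at login: how many frames it can buffer.
        self.bb_credit = advertised_credit

    def can_send(self):
        # A port with no credit cannot send a frame.
        return self.bb_credit > 0

    def send_frame(self):
        # One credit is consumed per frame, regardless of frame size.
        if not self.can_send():
            raise RuntimeError("BB Credit is 0: transmission must stop")
        self.bb_credit -= 1

    def receive_r_rdy(self):
        # R_RDY is a primitive, not a frame; it only returns one credit.
        self.bb_credit += 1


host = CreditedTransmitter(advertised_credit=8)  # switch advertised 8 buffers
for _ in range(8):
    host.send_frame()
stopped = not host.can_send()   # all 8 credits are in use
host.receive_r_rdy()            # switch freed one buffer
resumed = host.can_send()
```

Reaching zero is part of normal operation in this model: the transmitter pauses and resumes as soon as one R_RDY arrives.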
Credit Accounting during Transmission
Host says, “I can send 8 frames without stopping.”
Storage says, “I can send 8 frames without stopping.”
Switch says, “I can send 40 frames that way and 16 frames the other way without stopping.”
[Figure: the Credit Count on each link changes as frames are sent and R_RDYs are returned]
Urban Legend: Buffer Credits at Zero are a Problem
• Buffer credit determines DISTANCE
  • The distance two nodes can be apart and still maintain full link frame rate
• Buffer credit is the number of FRAME buffers
  • A port provides for its NEAREST neighbor for RECEIVING frames
  • Does NOT have to be symmetrical
• Buffer credit is a FRAME count
  • Not a data SIZE
  • A 1 byte frame consumes 1 buffer credit
  • A 2K byte frame consumes 1 buffer credit
• Number of credits needed is determined by:
  • Raw Link Speed
  • Speed of light thru a fiber
  • Distance between two adjacent nodes
Initial Conditions
“Perfect” Switch
Control Unit
Channel
Number of B-B Credits the switch advertised to the channel during link init: 20
NOTE: In these animations, both the frames and R_RDYs are numbered. This is for illustrative purposes only. In reality, neither the frames nor the R_RDYs are numbered. The arrival of an R_RDY only informs the receiver that A frame has been forwarded, not WHICH frame has been forwarded.
Suppose the switch is too far away from the channel for the B-B credit it advertised to the channel
[Animation: numbered frames travel from the channel through the “perfect” switch to the control unit while numbered R_RDYs travel back to the channel; the channel starts with a B-B credit of 20]
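The situation in this animation can be approximated with a slot-based simulation. It is a rough sketch under stated assumptions, not a model of any real channel or director: time is measured in frame-transmission slots, the switch is "perfect" (it forwards each frame and returns the R_RDY immediately), and `round_trip_slots` is a hypothetical parameter for the time between starting to send a frame and getting its credit back.

```python
from collections import deque


def sustained_fraction(credits, round_trip_slots, total_slots):
    """Fraction of slots in which the sender was able to transmit a frame."""
    available = credits
    returning = deque()  # slot numbers at which an R_RDY arrives back
    sent = 0
    for slot in range(total_slots):
        # Collect every R_RDY that has arrived by this slot.
        while returning and returning[0] <= slot:
            returning.popleft()
            available += 1
        if available > 0:
            # Spend one credit; it comes back one round trip later.
            available -= 1
            sent += 1
            returning.append(slot + round_trip_slots)
    return sent / total_slots


near = sustained_fraction(credits=20, round_trip_slots=20, total_slots=400)
far = sustained_fraction(credits=20, round_trip_slots=40, total_slots=400)
```

In this model the link stays full only while the credit count covers the round trip; when the round trip is twice the credit count, the sender idles half the time even though nothing is broken.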
Suppose there are multiple senders to one receiver
Each sender attempts to send at 100% link speed
[Animation: several senders, each with a B-B credit of 20 toward the switch, send numbered frames to a single receiver; the senders’ credit counts fall to 0 as the switch buffers fill]
More real-life example: two senders sending at 30% - 50% link rate to one receiver
[Animation: two senders, each with a B-B credit of 20, sending to one receiver]
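The two fan-in cases (every sender at 100%, and senders at 30% and 50%) can be compared with a toy simulation. Everything here is an assumption made for illustration: zero propagation delay, one egress frame per slot, round-robin service between ingress ports, an R_RDY returned the moment a frame is forwarded, and the hypothetical function name `fan_in`.

```python
def fan_in(offered_percent, buffers=20, slots=1000):
    """Simulate senders sharing one egress link of the same speed.

    offered_percent: each sender's offered load as an integer percent of link rate.
    Returns (delivered fraction of link rate per sender, stalled slots per sender).
    """
    n = len(offered_percent)
    credit = [buffers] * n   # BB credit each sender holds toward the switch
    backlog = [0] * n        # offered load not yet sent, in percent-of-a-frame units
    queued = [0] * n         # frames sitting in switch buffers, per sender
    delivered = [0] * n
    stalled = [0] * n        # slots where a frame was ready but credit was 0
    next_port = 0
    for _ in range(slots):
        for i in range(n):
            backlog[i] += offered_percent[i]
            if backlog[i] >= 100:
                if credit[i] > 0:
                    credit[i] -= 1
                    queued[i] += 1
                    backlog[i] -= 100
                else:
                    stalled[i] += 1
        # Egress forwards at most one frame per slot, round robin across ports.
        for k in range(n):
            port = (next_port + k) % n
            if queued[port] > 0:
                queued[port] -= 1
                delivered[port] += 1
                credit[port] += 1          # forwarding the frame releases an R_RDY
                next_port = (port + 1) % n
                break
    return [d / slots for d in delivered], stalled


saturated_rates, saturated_stalls = fan_in([100, 100])
typical_rates, typical_stalls = fan_in([30, 50])
```

Under these assumptions, two saturating senders are each held to half the link rate by credit starvation, while the 30% and 50% senders never run out of credit because their combined load fits on the egress link.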
How much credit do I need?
• Good “Rule of thumb”
Number of credits needed = 1 + (Link speed in Gb/s * Distance in Km) / Frame Size in KB
Example: 20 Km at 1 Gb/s (2 KB frames): 1 + (1 * 20) / 2 = 11
Example: 10 Km at 4 Gb/s (2 KB frames): 1 + (4 * 10) / 2 = 21
[Chart: Percent Data Rate versus Distance (Km) for 4Gb with 32 Credits]
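The rule of thumb is easy to turn into a helper. The sketch below assumes 2 KB frames, which reproduces both slide examples (11 and 21 credits). The `percent_data_rate` function is an added assumption, not something stated in the slides: it estimates the achievable share of link rate as available credits divided by needed credits, capped at 100%.

```python
def credits_needed(link_speed_gbps, distance_km, frame_size_kb=2.0):
    """Rule of thumb: 1 + (link speed in Gb/s * distance in Km) / frame size in KB."""
    return 1 + (link_speed_gbps * distance_km) / frame_size_kb


def percent_data_rate(credits, link_speed_gbps, distance_km, frame_size_kb=2.0):
    """Assumed estimate: share of full link rate sustainable with the given credits."""
    needed = credits_needed(link_speed_gbps, distance_km, frame_size_kb)
    return min(100.0, 100.0 * credits / needed)


example_1g = credits_needed(1, 20)         # 20 Km at 1 Gb/s
example_4g = credits_needed(4, 10)         # 10 Km at 4 Gb/s
short_haul = percent_data_rate(32, 4, 10)  # 32 credits at 4 Gb/s over 10 Km
long_haul = percent_data_rate(32, 4, 100)  # same port stretched to 100 Km
```

Because the formula divides by frame size, smaller average frames raise the number of credits needed for the same distance.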
How “long” is a frame?
• Traveling at the speed of light, a frame can be very long
  • At 1G a frame is about 4Km in length
  • At 2G a frame is about 2Km in length
  • At 4G a frame is about 1Km in length
  • At 8G a frame is about 0.5Km in length
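These lengths can be roughly checked by multiplying the time a full frame takes to serialize by the speed of light in fiber. The constants are assumptions for this sketch: a full-size frame of 2148 bytes (SOF, header, 2112-byte payload, CRC, EOF), 8b/10b encoding, the nominal 1G to 8G Fibre Channel line rates, and light covering about 200,000 Km per second in fiber.

```python
FRAME_BYTES = 4 + 24 + 2112 + 4 + 4   # SOF + header + maximum payload + CRC + EOF
BITS_ON_WIRE_PER_BYTE = 10            # 8b/10b encoding (assumed for 1G through 8G)
FIBER_KM_PER_SECOND = 200_000         # roughly two thirds of c in vacuum (assumed)
LINE_RATE_GBAUD = {1: 1.0625, 2: 2.125, 4: 4.25, 8: 8.5}


def frame_length_km(speed_g):
    """Approximate length of fiber occupied by one full-size frame."""
    seconds_on_wire = FRAME_BYTES * BITS_ON_WIRE_PER_BYTE / (LINE_RATE_GBAUD[speed_g] * 1e9)
    return seconds_on_wire * FIBER_KM_PER_SECOND


lengths = {speed: frame_length_km(speed) for speed in (1, 2, 4, 8)}
```

Each doubling of link speed halves the frame length, which is why the same distance needs twice as many credits at twice the speed.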
The parts of a transmission
• Frame
  • Building block of a Fibre Channel connection
  • Contains the information to be transmitted
  • The address of the source and destination
  • Control information
• Sequence
  • A set of one or more related Frames
  • Transmitted unidirectionally from one port to another
• Exchange
  • An Exchange is one or more nonconcurrent sequences
  • A single operation
  • May be unidirectional or bidirectional
Fibre Channel Frame
The basic building block is the FRAME
SOF | Header | Payload | CRC | EOF
Size in bytes: 4 | 24 | 0 – 2048 (2112) | 4 | 4
• SOF: Identifies beginning of a frame
• Header: Source/Destination link addresses, Exchange ID, Protocol Type, Start or End of Sequence, Start or End of Exchange
• CRC: Protects Frame
• EOF: Identifies end of a frame
Ficon IU Examples
1 Frame IU to transfer a Read CCW
SOF | Header | Ficon Header | CRC | EOF
3 Frame IU to transfer 4K of data
SOF | Header | Ficon Header | CCW Data (2016 bytes) | CRC | EOF
SOF | Header | CCW Data (2048 bytes) | CRC | EOF
SOF | Header | CCW Data (32 bytes) | Ficon CRC | CRC | EOF
Urban Legend: FICON uses fewer Exchanges
• Fibre Channel Architecture defines an Exchange as
  • “A mechanism for identifying and managing an operation between two ports”
• All IUs (a.k.a. Sequences) that make up a single I/O operation are part of an Exchange
• In Ficon, each concurrent I/O operation uses two Exchanges
  • One unidirectional Exchange for IUs from the Channel to the CU
  • A different unidirectional Exchange for IUs from the CU to the Channel
• The PAIR is commonly known as a “Ficon Exchange”
Sequences and IUs
• Each Upper Layer Protocol (ULP) defines the contents and format of its own Information Units (IUs)
  • Commands
  • Data
  • Status
  • Control
  • Etc.
• Ficon IUs can be up to 8K (8192) in size
  • 8160 (8K-32) bytes of data
  • 32 bytes contain Ficon Header information
• 4 frames are needed for the largest IU
• The collection of frame(s) that make up an IU are called a Sequence
  • A Sequence may be as small as a single Frame
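The frame counts quoted for IUs follow from simple arithmetic. In this sketch, the 32-byte Ficon header shares the first frame's 2048-byte payload and the remaining data fills later frames. The function name is hypothetical, and the Ficon CRC shown in the 4K example is ignored for simplicity.

```python
FICON_HEADER_BYTES = 32    # Ficon Header carried in the first frame of the IU
MAX_FRAME_PAYLOAD = 2048   # payload bytes available per frame in these examples


def frames_for_iu(data_bytes):
    """Return the number of data bytes carried by each frame of one IU (Sequence)."""
    first = min(data_bytes, MAX_FRAME_PAYLOAD - FICON_HEADER_BYTES)
    frames = [first]
    remaining = data_bytes - first
    while remaining > 0:
        chunk = min(remaining, MAX_FRAME_PAYLOAD)
        frames.append(chunk)
        remaining -= chunk
    return frames


four_k = frames_for_iu(4096)      # the "3 Frame IU to transfer 4K of data" case
largest = frames_for_iu(8160)     # the largest IU: 8K minus the 32-byte header
```

Each of these frames consumes one buffer credit on every link it crosses, whether it carries 2048 bytes or 32.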
How many Exchanges do I need?
• Little’s Law states:
  • The number of “things” in a system can be determined by multiplying the average arrival rate of those “things” by the average time each “thing” stays in the system.
• Applied to Ficon:
  • The average number of Exchanges active at any given time = Average I/O rate * Average response time
• Example: 5000 Ficon I/Os / Second on a given channel with .4ms service time¹ needs 2 Active Exchanges (pairs) at any given time
¹ The amount of time the I/O is active in the channel
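The Little's Law example works out as follows. The function name is hypothetical. The final doubling applies the earlier statement that each concurrent Ficon I/O uses two unidirectional Exchanges.

```python
def active_exchange_pairs(io_per_second, service_time_seconds):
    """Little's Law: average number in system = arrival rate * time in system."""
    return io_per_second * service_time_seconds


pairs = active_exchange_pairs(5000, 0.0004)   # 5000 I/Os per second at 0.4 ms each
fibre_channel_exchanges = 2 * pairs           # one Exchange in each direction per I/O
```

The result is an average, so a channel needs headroom above it to absorb bursts.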
Summary
• Buffer Credits
  • Distance
  • Flow Control
• Exchanges
  • Unidirectional
  • Bidirectional
Speaker Biography
• Patty Driever
  • IBM
  • System z I/O and Networking Technologist
• Contact Information
  • [email protected]
Speaker Biography
• Howard L. Johnson
  • BROCADE
  • Technology Architect, FICON
  • 26 years technical development and management
• Contact Information
  • [email protected]
End to End Credit
• Device to Device Flow Control
  • Between source and destination
  • Not the links
• Similar to buffer-to-buffer flow control
  • At N_Port Login
  • Report available receive buffers (EE_Credit)
  • Transmitter counts buffers transmitted (EE_Credit_CNT)
  • Receiver acknowledges frame (ACK)
• ACK 1 (a single data frame in a sequence) – most common
• ACK n (several (N) consecutive data frames in a sequence)
• ACK 0 (all data frames in a sequence) – not used
Virtual Channels
• Technology to allocate BB_Credits to particular data flows
  • Class F traffic has one data flow
  • Assigned with Zoning by using special Zone names