Xilinx Answer 42368 - Debugging Link Training Issues in PCI Express Block Plus Core 1

Xilinx Answer 42368

Virtex-5 Integrated PCI Express Block Plus – Debugging Guide For Link Training Issues

Important Note: This downloadable PDF of an Answer Record is provided to enhance its usability and readability. It is important to note that Answer Records are Web-based content that are frequently updated as new information becomes available. You are reminded to visit the Xilinx Technical Support Website and review (Xilinx Answer 42368) for the latest version of this Answer.

Introduction

This document describes techniques to debug link training issues related to designs using the Virtex-5 FPGA Endpoint Block Plus core for PCI Express. A complete list of signals to capture in ChipScope Pro when debugging link training issues has been provided. ChipScope Pro screen captures illustrate how to analyze those signals and establish theories on potential reasons causing the problem.

There are two major sections in this guide. The first section provides an overview of link training including the Link Training and Status State Machine (LTSSM) states and TS1 and TS2 ordered sets. The second section focuses on using ChipScope Pro to capture the relevant signals on the GTP/GTX interface to identify potential problems during link training. This guide helps the user understand how the LTSSM progresses and what states the signals should be in during this progression. This debugging guide concludes with a checklist of common problems to address when having link training issues.

Link training failures usually fall into three categories. The first is a complete failure to establish a link of any width, indicated by the core output trn_lnk_up_n not asserting. The second is a link that trains to a lower width than intended, such as an x8 link training as x4. The third is a link that repeatedly enters the RECOVERY state. Link training problems are normally due to board signal integrity problems or improper GTP/GTX usage. The board must meet the electrical requirements set forth by both the GTP/GTX user guides and the PCI Express Base Specification.

Link Training Overview

After FPGA configuration, the two connected devices go through the link training process. The Link Training and Status State Machine (LTSSM) defines this process. Figure 1 shows the different states of the LTSSM. The main states to consider while debugging link training issues are DETECT, POLLING, CONFIGURATION, and L0. Detailed descriptions of the LTSSM states are found in Chapter 4 of the PCI Express Base Specification.

In the DETECT state, each lane performs receiver detection to determine if a link partner is present on that lane. Lanes that do not detect a link partner are not used, and the FPGA drives electrical idle on these lanes.

The second state entered during link training is the POLLING state. This is the first state in which the link partners exchange TS1 and TS2 ordered sets. During this state, bit lock, symbol lock, and lane polarity are established.

The CONFIGURATION state follows POLLING. During CONFIGURATION, link and lane numbers are exchanged through the training ordered sets and the link width is established.

Once CONFIGURATION completes, the next state is L0. The L0 state is the normal working state where data is transferred on the link. The core output signal trn_lnk_up_n is asserted during this state. Note that trn_lnk_up_n does not assert immediately upon entering L0; it asserts after the data link layer reaches the DL.ACTIVE state, meaning the initial flow control credits have been exchanged.


Figure 1: Link Training and Status State Machine (LTSSM)

During the link training process, the following are discovered and determined:

Lane Polarity

Link Data Rate

Link and Lane Numbers

Link Width

Lane Reversal

Overall, the link training process performs the following:

Link data rate negotiation

Bit lock per lane

Lane Polarity

Symbol lock per lane

Lane ordering within a link

Link width negotiation

Lane-to-Lane de-skew within a multi-lane link


Ordered Sets

During the link training process, the physical layer communicates by exchanging TS1 and TS2 ordered sets. Ordered sets are packets that originate and terminate in the physical layer. Ordered sets are not scrambled, so they are easily viewed using ChipScope Pro or in simulation. There are four different types of ordered sets: training sequence ordered sets (TS1s and TS2s), Electrical Idle ordered sets, SKP ordered sets, and FTS ordered sets. Link training uses TS1 and TS2 ordered sets to exchange the information needed to establish the link. Occasionally, a SKP ordered set is transmitted during link training, so it is necessary to be able to tell the two apart.

Training Sequence 1 and 2 (TS1 and TS2)

o TS1 and TS2 ordered sets consist of 16 symbols.

o The first symbol is COM, which is the K28.5 character. The receiver uses this character to achieve Bit Lock and Symbol Lock.

o TS1 and TS2 ordered sets contain information regarding link number, lane number, N_FTS, and training control (such as hot reset, disable link, and loopback). For detailed information on TS1 and TS2, refer to section 4.2.4 of the PCI Express Base Specification v1.1.

o A TS1 is identified by the D10.2 (4Ah) data character, or by D21.5 (B5h) on a polarity-reversed link.

o A TS2 is identified by the D5.2 (45h) data character, or by D26.5 (BAh) on a polarity-reversed link.

Table 1 describes each symbol in the TS1 ordered set. TS1s and TS2s are the same except for symbols 6-15, which carry the TS2 identifier in a TS2 ordered set.


Table 1: TS1 ordered set

Electrical Idle Ordered Set

o Consists of four symbols: COM, IDL, IDL, IDL = BC, 7C, 7C, 7C.

o The transmitter sends the Electrical Idle ordered set before driving electrical idle.

o After receiving the Electrical Idle ordered set, the link partner prepares for the link to transition to electrical idle.

SKP Ordered Set

o Consists of four symbols: COM, SKP, SKP, SKP = BC, 1C, 1C, 1C.

o Transmitted at regular intervals from the transmitter to the receiver.

o Used for clock tolerance compensation.

FTS Ordered Set

o Also consists of four symbols: COM, FTS, FTS, FTS = BC, 3C, 3C, 3C.

o A transmitter sends FTS ordered sets when exiting from L0s to L0, so that the receiver can regain bit and symbol lock.

o The number of required FTS ordered sets is agreed during link training and initialization.
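When reading exported capture data, it can help to classify each symbol run automatically instead of matching byte values by eye. The following Python sketch uses only the symbol values quoted above. The function name and the (value, is_k) input format are assumptions made for illustration; they are not part of ChipScope Pro or the core.

```python
# Symbol values as quoted in this guide. K symbols are only valid when the
# matching pipe_*_data_k bit is 1.
COM, SKP, FTS, IDL = 0xBC, 0x1C, 0x3C, 0x7C

# TS identifier bytes (symbols 6-15), including the complemented values seen
# on a lane whose polarity is inverted.
TS_IDENTIFIERS = {
    0x4A: "TS1",
    0x45: "TS2",
    0xB5: "TS1 (inverted polarity)",
    0xBA: "TS2 (inverted polarity)",
}

SHORT_SETS = {SKP: "SKP", FTS: "FTS", IDL: "Electrical Idle"}


def classify_ordered_set(symbols):
    """Classify a captured symbol run that starts at a COM symbol.

    symbols is a list of (value, is_k) tuples, one per captured byte, where
    is_k is the corresponding data_k bit (hypothetical input format).
    """
    symbols = list(symbols)
    if len(symbols) < 4 or symbols[0] != (COM, True):
        return "not an ordered set"

    # Four-symbol ordered sets: COM followed by three identical K symbols.
    second = symbols[1]
    if second[1] and second[0] in SHORT_SETS and symbols[1:4] == [second] * 3:
        return SHORT_SETS[second[0]]

    # Training sets: 16 symbols, with symbols 6-15 all equal to the identifier.
    if len(symbols) >= 16:
        ident = symbols[6]
        if (not ident[1] and ident[0] in TS_IDENTIFIERS
                and symbols[6:16] == [ident] * 10):
            return TS_IDENTIFIERS[ident[0]]

    return "unknown"
```

A run that begins with BC (K) and carries 4A in symbols 6-15 is reported as a TS1, while BC-1C-1C-1C is reported as a SKP ordered set.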


Capturing Signals in the ChipScope Pro tool

To capture signals in the ChipScope Pro tool, a user can use either the ChipScope Pro Inserter flow or the ChipScope Pro CORE Generator flow. In the Inserter flow, the user supplies the NGC file to the tool, and the tool automatically lists the signals available to select and capture. In the CORE Generator flow, the user must generate the ChipScope Pro cores in the CORE Generator software and instantiate them manually in the source file. The Inserter flow is easier, but the required signals might not be visible. In the CORE Generator flow, by contrast, the user can capture any signal in the source files. This section discusses the ChipScope Pro Inserter flow. Since the source files of the Block Plus wrapper for PCI Express are provided after v1.12 of the core, users might find it more flexible to capture the signals with the ChipScope Pro CORE Generator flow. For more details on both flows, see the ChipScope Pro User Guide (UG029).

In some cases, signals are optimized away during synthesis and therefore cannot be found in the ChipScope Pro Inserter. In this case, use the KEEP attribute to stop XST from optimizing away a particular signal.

In VHDL, declare the KEEP attribute in the architecture, before the begin keyword:

attribute keep : string;

After KEEP and the signal have been declared, specify the VHDL constraint as follows:

attribute keep of signal_name : signal is "true";

In Verilog, place the attribute immediately before the signal declaration:

(* KEEP = "TRUE" *) wire signal_name;

Below are the steps to capture signals with the ChipScope Pro Inserter flow.

1. After generating the core in the CORE Generator tool, modify the xst.scr script in the /implement directory to set KEEP_HIERARCHY to true.

run

-ifn xilinx_pci_exp_1_lane_ep_inc.xst

-ifmt Mixed

-p xc5vlx50t-ff1136-2

-bufg 0

-top xilinx_pci_exp_ep

-ofn xilinx_pci_exp_ep.ngc

-opt_mode SPEED

-opt_level 2

-ofmt NGC

-uc endpoint_blk_plus_v1_13.xcf

-keep_hierarchy YES

2. Run implement.bat or implement.sh, depending on the operating system you are using.

3. Once the synthesis is complete, the NGC file called xilinx_pci_exp_ep.ngc is generated in the /results directory inside the /implement directory.


4. Open the ChipScope Pro Core Inserter tool. Specify the location of the input design netlist. If you are using the same name for the output design netlist and the output directory you specify is where the original input design netlist is located, the ChipScope Pro Core Inserter will replace the input design netlist with the output design netlist. If you either rename the output design netlist or specify a different output directory, make sure you replace the input design netlist with the generated output design netlist.

Figure 2: ChipScope Pro Core Inserter - Device and Design Netlist Entry


5. Select USER1 in the Boundary Scan Chain.

Figure 3: ChipScope Pro Core Inserter - Boundary Scan Chain

6. Enter the Trigger Width as required.

Figure 4: ChipScope Pro Core Inserter - Integrated Logic Analyzer Options


7. Select the Data Width and the Data Depth as required:

Figure 5: ChipScope Pro Core Inserter - Data Width and Data Depth Selection

8. Double-click on any of the ports shown in red below:

Figure 6: ChipScope Pro Core Inserter - Net Connections


9. Click on the appropriate section of the structure hierarchy to select the signals.

Figure 7: ChipScope Pro Core Inserter - Selecting Data Signals

10. Select core_clk for the clock signal:

Figure 8: ChipScope Pro Core Inserter - Selecting Clock Signal


11. After the trigger, data, and the clock signals have been selected click OK and then click Insert.

Figure 9: ChipScope Pro Core Inserter - Final Step, Core Insertion

12. A new NGC file will be generated with the ChipScope Pro core inside the input NGC file. Before closing the ChipScope Pro Core Inserter, save the project. A CDC file will be generated. This CDC file is required to view the signals in the ChipScope Pro analyzer.

13. Re-implement the design by running implement.bat or implement.sh. Make sure that the section of the script containing the synthesis commands has been removed. Otherwise, synthesis runs again and replaces the NGC file that contains the ChipScope Pro core. The implementation script should contain only the following:

cd results

echo 'Running ngdbuild'

ngdbuild -verbose -uc ..\..\example_design\xilinx_pci_exp_blk_plus_1_lane_ep_xc5vlx50t-ff1136-1.ucf xilinx_pci_exp_ep.ngc -sd ..\..\..\

echo 'Running map'

map -timing -ol high -xe c -pr b -o mapped.ncd xilinx_pci_exp_ep.ngd mapped.pcf

echo 'Running par'

par -ol high -xe c -w mapped.ncd routed.ncd mapped.pcf

echo 'Running trce'

trce -u -v 100 routed.ncd mapped.pcf

echo 'Running design through netgen'

netgen -sim -ofmt vhdl -w -tm xilinx_pci_exp_ep routed.ncd

echo 'Running design through bitgen'

bitgen -w routed.ncd


Debug Signals for Link Training Issues

This section provides a list of debug signals that can be captured in the ChipScope Pro tool to diagnose where the problem is coming from. All the signals are related to the interface between the integrated block for PCI Express and the GTP/GTX transceivers.

Transceiver Interface - Transmit Side

Table 2 shows the transceiver interface transmit side debug signals. All these signals exist for each lane. pipe_tx_data and pipe_tx_data_k are required to analyze the TS1 and TS2 ordered sets transmitted by the core during POLLING and CONFIGURATION.

Table 2: Transceiver Interface - Transmit Side Debug Signals

Signal Name: Description

pipe_tx_data_k: Control bits for the transmit data. 0: Data byte. 1: Control byte.

pipe_tx_detect_rx_loopback: Causes the RocketIO transceiver on the selected lane to begin the receiver detection operation.

pipe_tx_data: Transmit data for selected lane.

pipe_tx_compliance: When 1, sets the running disparity for the selected lane to negative.

pipe_tx_elec_idle: Electrical idle requested on transmit channel of selected lane. When 1, selects electrical idle on the transmit channel of the selected lane. When 0, indicates that there is valid data on pipe_tx_data.


Transceiver Interface - Receive Side

Table 3 shows the transceiver interface receive side debug signals. These signals are required to analyze the TS1 and TS2 ordered sets coming downstream from the link partner to the endpoint during POLLING and CONFIGURATION. For further details on the signals listed in Table 3, refer to the Virtex-5 FPGA Integrated Endpoint Block for PCI Express Designs User Guide (UG197).

Table 3: Transceiver Interface - Receive Side Debug Signals

Signal Name: Description

pipe_rx_status: Encodes receiver status and error codes for the received data stream and receiver detection on selected lane.
  000: Data received OK
  001: One skip symbol (SKP) added
  010: One SKP removed
  011: Receiver detected
  100: 8B/10B decode error
  101: Elastic Buffer overflow
  110: Elastic Buffer underflow
  111: Receive disparity error

pipe_rx_phy_status: Communicates completion of RocketIO transceiver functions like power management, state transitions, and receiver detection on lane.

pipe_rx_data: Receive data.

pipe_rx_elec_idle: Electrical idle detected on receive channel of selected lane.

pipe_rx_polarity: When 1, tells the RocketIO transceiver on selected lane to do a polarity inversion (on the received data).

pipe_rx_chanaligned: Signal from the RocketIO transceiver elastic buffer. Stays High to denote that the channel is properly aligned with the master transceiver according to observed channel bonding sequences in the data stream.

pipe_rx_data_k: Control bit(s) for receive data. 0: Data byte. 1: Control byte.
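The pipe_rx_status encoding in Table 3 is easy to misread in a long capture. A small Python sketch such as the following can turn exported samples into a readable summary. The function names and the assumption that samples are available as a list of integers are illustrative only; treating codes 100 through 111 as the error group follows directly from the table.

```python
from collections import Counter

# Encoding of pipe_rx_status as listed in Table 3.
PIPE_RX_STATUS = {
    0b000: "Data received OK",
    0b001: "One SKP added",
    0b010: "One SKP removed",
    0b011: "Receiver detected",
    0b100: "8B/10B decode error",
    0b101: "Elastic buffer overflow",
    0b110: "Elastic buffer underflow",
    0b111: "Receive disparity error",
}


def decode_rx_status(value):
    """Return the Table 3 description for a 3-bit pipe_rx_status value."""
    return PIPE_RX_STATUS[value & 0b111]


def summarize_rx_status(samples):
    """Count how often each status appears in a list of captured values."""
    return Counter(decode_rx_status(v) for v in samples)


def error_fraction(samples):
    """Fraction of samples carrying one of the four error codes (100-111)."""
    if not samples:
        return 0.0
    errors = sum(1 for v in samples if (v & 0b111) >= 0b100)
    return errors / len(samples)
```

A high error fraction on one lane compared with its neighbors is a hint to look at that lane's signal integrity first.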


LTSSM States

The signal below indicates which LTSSM state the core is currently in. The core goes to the Recovery state to achieve bit lock and symbol lock. If the ChipScope Pro tool capture shows frequent transitions to the Recovery state, it normally indicates a noisy link.

l0_ltssm_state<3:0>

The states of the link training state machine (l0_ltssm_state) are encoded as shown in Table 4:

Table 4: LTSSM States

Advanced Transceiver Debug Signals

The signals below verify whether the different parts of the physical layer of the core are functioning correctly. For example, sync_done is an output of the synchronization module in the GTP wrapper. In a few cases, the core has been observed not to link up because this signal was not asserted. Investigation later found that the design had the fast_train_simulation_only signal set to 1. The fast_train_simulation_only signal should be set only in simulation. Make sure the states of the signals below agree with Figure 10. If you see any discrepancies, create a WebCase with Xilinx Technical Support and attach the VCD waveform from ChipScope Pro.

rxchanbondseq<7:0>

gt_deskew_lanes<7:0>

gtreset

clock_lock

TXENPMAPHASEALIGN

tx_sync_reset

TXPMASETPHASE

rxreset

rst_pcie

rxbyteisaligned<1:0>

rxchanbondseq<1:0>

mgt_txreset

icdrreset<0>

gt_pipe_reset<0>

sync_done

resetdone<7:0>

pcie_reset

gt_pipe_reset<7:0>

PLLLKDET_OUT

rxbyteisaligned<7:0>


LTSSM State Analysis

The first step in debugging a link training problem is to determine in which of the LTSSM states the problem is occurring. Remember, the LTSSM states to look at during link training are DETECT, POLLING, and CONFIGURATION. Once CONFIGURATION completes, the LTSSM moves into the normal operation state of L0, and trn_lnk_up_n is asserted once the data link layer reaches DL.ACTIVE. Use the signal pcie_blk l0_ltssm_state<3:0> to determine which state the LTSSM is in and, where useful, as a trigger to pinpoint potential problems.

Detect State Signal Analysis

Trigger the ChipScope Pro tool when l0_ltssm_state goes to DETECT. Figure 10 shows the relevant signals and how they toggle while entering the DETECT state.

Figure 10: Debug Signals State during entry to DETECT


During the DETECT state, receiver detection takes place on each lane. If the detection process is done correctly, the following sequence should be observed in the ChipScope Pro tool. Trigger on pipe_tx_detect_rx_loopback_l0.

1. The PCIe Hard Block asserts pipe_tx_detect_rx_loopback.

2. The GTP performs receiver detection.

3. After the receiver is detected, the GTP asserts pipe_rx_phy_status and puts 011 on pipe_rx_status to indicate that the receiver is present.

4. The PCIe Hard Block then de-asserts pipe_tx_detect_rx_loopback and pipe_tx_elec_idle.

The ChipScope Pro tool capture of the above sequence is shown in Figure 11. If you see any discrepancies, create a WebCase with Xilinx Technical Support and attach the VCD waveform from the ChipScope Pro tool.
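The receiver detection sequence described above can also be checked mechanically on exported samples for one lane. The Python sketch below is a minimal illustration, assuming each sample is a dictionary keyed by the signal names from Tables 2 and 3. That input format and the function names are hypothetical, not something ChipScope Pro produces directly.

```python
def first_index(samples, start, predicate):
    """Index of the first sample at or after start that satisfies predicate."""
    for i in range(start, len(samples)):
        if predicate(samples[i]):
            return i
    return None


def check_receiver_detect(samples):
    """Walk one lane's samples and report where the detect sequence stops."""
    # The hard block requests receiver detection.
    requested = first_index(
        samples, 0, lambda s: s["pipe_tx_detect_rx_loopback"] == 1)
    if requested is None:
        return "detect never requested"

    # The transceiver reports completion with status 011 (receiver detected).
    detected = first_index(
        samples, requested,
        lambda s: s["pipe_rx_phy_status"] == 1 and s["pipe_rx_status"] == 0b011)
    if detected is None:
        return "receiver not reported present"

    # The hard block releases the detect request and electrical idle.
    released = first_index(
        samples, detected + 1,
        lambda s: s["pipe_tx_detect_rx_loopback"] == 0
        and s["pipe_tx_elec_idle"] == 0)
    if released is None:
        return "detect request or electrical idle never released"

    return "ok"
```

The returned string names the first step that was not observed, which indicates where in the capture to zoom in.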

Figure 11: pipe_tx_detect_rx_loopback, pipe_rx_phy_status and pipe_tx_elec_idle during DETECT


Figure 12 shows a zoomed in view of Figure 11.

Figure 12: Zoomed in View


Polling State Signal Analysis

When each link partner enters POLLING, it begins transmitting TS1 ordered sets. However, the link partners might not enter POLLING at the same time, so the Xilinx endpoint might be transmitting TS1s on pipe_tx_data while still receiving 00h on pipe_rx_data. Hence, in the ChipScope Pro tool, pipe_rx_data might still be 00 when TS1s first appear on pipe_tx_data.

To check whether TS1 transmission has started, trigger when l0_ltssm_state enters POLLING. The screen captures below show the ChipScope Pro tool capture of the signals when the endpoint device enters POLLING. As seen in the captures, as soon as the device comes out of electrical idle (indicated by de-assertion of pipe_tx_elec_idle), the device starts to send TS1s. Note that the link and lane numbers are set to the PAD value, which is F7. A TS1 ends with 4A, whereas a TS2 ends with 45. According to the PCI Express Base Specification v1.1, both devices should send a minimum of 1024 TS1s, which takes approximately 64 µs, to achieve bit and symbol lock. If there is sufficient buffer space, it is possible to capture and verify whether 1024 TS1s are transmitted and received.
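The 64 µs figure and the buffer depth needed to hold 1024 TS1s can be estimated with a short calculation. The Python sketch below assumes the 2.5 Gb/s line rate of PCI Express v1.1 and 8B/10B encoding (10 bits per symbol). The symbols_per_sample parameter is a hypothetical stand-in for however many symbols one ChipScope Pro sample holds in a given design, which depends on the captured data width.

```python
SYMBOLS_PER_TS = 16      # a TS1 or TS2 ordered set is 16 symbols long
BITS_PER_SYMBOL = 10     # 8B/10B encoding on the wire
LINE_RATE_BPS = 2.5e9    # PCI Express v1.1 line rate per lane


def ts_burst_duration_us(ts_count):
    """Time in microseconds to send ts_count training sets back to back."""
    bits = ts_count * SYMBOLS_PER_TS * BITS_PER_SYMBOL
    return bits / LINE_RATE_BPS * 1e6


def samples_needed(ts_count, symbols_per_sample):
    """Capture samples needed per lane to hold ts_count training sets."""
    symbols = ts_count * SYMBOLS_PER_TS
    return -(-symbols // symbols_per_sample)  # ceiling division
```

For 1024 TS1s the duration works out to about 65.5 µs, which is in line with the approximate 64 µs quoted above. At one symbol per sample, that is 16384 samples per lane, so the Data Depth chosen in the Inserter determines whether the whole burst fits in one capture.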

Figure 13 shows the zoomed out view when ltssm_state enters POLLING.


Figure 13: Debug Signals state during entry to POLLING

Figure 14 shows the zoomed in view when the l0_ltssm_state transitions from DETECT to POLLING.


Figure 14: Debug Signals state during transition from DETECT to POLLING


After receiving eight consecutive TS2 ordered sets and transmitting 16 TS2 ordered sets (after receiving one TS2 ordered set), the device exits to the CONFIGURATION state. Devices at both ends of the link do not exit to CONFIGURATION at the same time. The ChipScope Pro tool capture below shows the endpoint entering the CONFIGURATION state after exchanging the required number of TS2s. During POLLING, a device exits to DETECT if it receives TS1/TS2s with the link and lane number fields set to a value other than PAD. This indicates a bad link, and signal integrity issues should be investigated.

Polarity inversion occurs in the POLLING state. If the core sees the complement of the TS1/TS2 identifiers (B5/BA), it has to invert the polarity of its differential input pair. The PCIe® Integrated Block core asserts the pipe_rx_polarity signal for the lane where the polarity is reversed, telling the GTP/GTX to invert the receive polarity for that lane. If a link training issue, such as an x8 core training down to x4, is seen on a board with polarity inverted on some of the lanes, trigger on the pipe_rx_polarity signal for the affected lane and check whether it asserts. If it does not, create a WebCase with Xilinx Technical Support. Polarity inversion is a mandatory feature described by the PCI Express Base Specification v1.1. All PCI Express compliant cores must support polarity inversion on all lanes independently.
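On a multi-lane capture, the polarity check described above can be applied to every lane at once. The following Python sketch flags lanes where complemented TS identifiers (B5 or BA) were received but pipe_rx_polarity never went High. The per-lane dictionary layout and the function name are assumptions made for this example.

```python
# Complemented TS1/TS2 identifier bytes seen on a polarity-inverted lane.
INVERTED_TS_IDENTIFIERS = {0xB5, 0xBA}


def lanes_missing_polarity_inversion(captures):
    """Return lanes that received inverted TS identifiers without a response.

    captures maps a lane number to a dict with two lists (hypothetical
    layout): "identifiers" holds TS identifier bytes seen on pipe_rx_data,
    and "rx_polarity" holds 0/1 samples of pipe_rx_polarity for that lane.
    """
    suspect = []
    for lane in sorted(captures):
        capture = captures[lane]
        saw_inverted = any(
            byte in INVERTED_TS_IDENTIFIERS for byte in capture["identifiers"])
        polarity_asserted = any(capture["rx_polarity"])
        if saw_inverted and not polarity_asserted:
            suspect.append(lane)
    return suspect
```

An empty result means every lane that saw inverted identifiers also had pipe_rx_polarity asserted at some point in the capture window. A lane in the result is a candidate for the WebCase mentioned above, provided the capture window was long enough to include the response.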


Figure 15: Entry to CONFIGURATION after exchanging required number of TS2s


Figure 16 is a zoomed in view of Figure 15. In the capture, BC-1C-1C-1C is also seen on pipe_tx_data. This is the SKP ordered set transmitted by the GTP as part of the clock correction sequence. SKP ordered sets compensate for the difference in bit rate between the two ends of a link and are transmitted periodically. For more information on SKP ordered sets, see section 4.2.7 of the PCI Express Base Specification v1.1.

Figure 16: SKP Ordered Set

Configuration State Signal Analysis

In CONFIGURATION, link numbers and lane numbers are negotiated. A downstream port proposes a link number to the link partner. The upstream port accepts the link number and returns TS1 ordered sets carrying that link number. Next, the downstream port sends the lane numbers. If the upstream port agrees with the proposed lane numbers, it replies with training ordered sets carrying the lane number on each lane instead of the PAD value.

If the link trains down (for example, from x8 to x4), capture rx_data and tx_data at the GTP interface in ChipScope Pro to see how the link and lane number fields in the ordered sets change. If the endpoint device is sending lane numbers on all 8 lanes but the link partner replies with lane numbers on only the first four lanes and with the PAD value on the rest, this potentially indicates a signal integrity issue on the link. The values that the endpoint sent on those lanes were probably not understood by the link partner because of the signal integrity issue.
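The train-down check described above amounts to comparing the lane number field of the transmitted and received training sets lane by lane. A minimal Python sketch follows, assuming the lane number field has already been extracted from one TS1 per lane in each direction. The function names and list layout are illustrative only.

```python
PAD = 0xF7  # PAD value used in the link and lane number fields


def lanes_left_at_pad(tx_lane_numbers, rx_lane_numbers):
    """Lanes where a lane number was sent but the partner still replies PAD.

    Both arguments are lists indexed by physical lane, holding the lane
    number field taken from a TS1 captured on that lane.
    """
    return [
        lane
        for lane, (tx, rx) in enumerate(zip(tx_lane_numbers, rx_lane_numbers))
        if tx != PAD and rx == PAD
    ]


def observed_width(rx_lane_numbers):
    """Number of lanes on which the partner returned a real lane number."""
    return sum(1 for value in rx_lane_numbers if value != PAD)
```

For an x8 endpoint whose partner answers on lanes 0 to 3 only, lanes_left_at_pad returns [4, 5, 6, 7] and observed_width returns 4. The lanes listed are the ones to examine first for signal integrity problems.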

Figure 17 shows the root complex sending TS1s with the link number assigned to 00. The endpoint agrees with this and starts transmitting TS1s on tx_data with 00 in the link number field.


Figure 17: Root Complex sending TS1s with Link Number assigned to ‘00’

After the link number has been negotiated, the root complex starts to send TS1s with lane numbers in the lane number field, as shown in Figure 18.

Figure 18: Root Complex sending TS1s with Lane Numbers


In response to transmission of TS1 with lane numbers from the link partner, the endpoint starts sending the same corresponding lane numbers on each lane, thus agreeing with the lane numbers to communicate with. This is shown in Figure 19.

Figure 19: Lane Number negotiation between the endpoint and the root complex

In CONFIGURATION, the N_FTS value is agreed. In the ChipScope Pro capture above, the endpoint is sending FF in the N_FTS field of its TS1s, indicating that the endpoint requires 255 FTS ordered sets to achieve bit and symbol lock when exiting from L0s to L0. The root complex, on the other hand, sends 32 in the N_FTS field of its TS1s, indicating that it requires only 32 FTS ordered sets to be transmitted by the endpoint when exiting from L0s to L0.

In the CONFIGURATION state, lane-to-lane deskew must occur. This is indicated by the assertion of gt_deskew_lanes, as shown in Figure 20. A user can trigger on this signal to check whether lane-to-lane deskew has been accomplished.


Figure 20: Lane-to-Lane Deskew Completion

PCIe Clocking

According to the PCI Express Base Specification v1.1, the reference clock must stay within a tolerance of +/- 300 ppm. In an asynchronous clocking system, the following must be observed:

The ports at the two ends of the link transmit data at a rate that is within 600 ppm of each other at all times.

Spread Spectrum Clocking (SSC) is turned off.
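The two tolerance figures above can be checked against measured reference clock frequencies with a short calculation. In the Python sketch below, the 100 MHz default for the nominal frequency is an assumption for illustration and should be replaced with the nominal reference clock frequency of the actual design. The function names are likewise hypothetical.

```python
def ppm_offset(measured_hz, nominal_hz):
    """Offset of a measured frequency from nominal, in parts per million."""
    return (measured_hz - nominal_hz) / nominal_hz * 1e6


def check_refclks(local_hz, partner_hz, nominal_hz=100e6):
    """Compare two measured reference clocks against the limits quoted above.

    nominal_hz defaults to 100 MHz as an assumption; pass the design's own
    nominal reference clock frequency if it differs.
    """
    local_ppm = ppm_offset(local_hz, nominal_hz)
    partner_ppm = ppm_offset(partner_hz, nominal_hz)
    return {
        "local_ppm": local_ppm,
        "partner_ppm": partner_ppm,
        # Each clock must stay within +/- 300 ppm of nominal.
        "local_ok": abs(local_ppm) <= 300,
        "partner_ok": abs(partner_ppm) <= 300,
        # The two ends must stay within 600 ppm of each other.
        "relative_ok": abs(local_ppm - partner_ppm) <= 600,
    }
```

For example, clocks measured at 100.02 MHz and 99.98 MHz sit at +200 ppm and -200 ppm, 400 ppm apart, so all three checks pass. A clock at 100.05 MHz is 500 ppm high and fails the per-clock limit.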

If the link is not training (i.e., trn_lnk_up_n is never asserted), verify that the clock source meets the PCIe Base Specification requirement. If the clock meets the specification requirement, probe the PLLLKDET signal. This port indicates that the VCO rate is within acceptable tolerances of the desired rate. In other words, the assertion of this signal indicates that the internal PLL has successfully locked on to the incoming reference clock. If PLLLKDET is not asserted, neither of the two GTP transceivers in the GTP_DUAL tile operates reliably. If this signal is de-asserted, check (Xilinx Answer 18329) to make sure the correct clocking infrastructure has been adopted. If the clocking infrastructure is correct and PLLLKDET remains continuously de-asserted, see the GTP-to-Board Interface chapter of the GTP Transceiver User Guide (UG196). The following points are excerpts from the detailed description provided in the user guide:

1. Verify that the power supply is noise free. The GTP has the following power supply pins: MGTAVCC, MGTAVCCPLL, MGTAVTTRX, MGTAVTTRXC and MGTAVTTTX. Among these, MGTAVCCPLL, MGTAVTTTX, MGTAVTTRX and MGTAVCC require a filter circuit to suppress high frequency noise. The Virtex-5 FPGA Data Sheet provides the exact voltage levels and tolerance ranges required for these analog supplies.

2. Each Virtex-5 LXT or SXT device requires one 50 ohm external precision (1%) resistor on the PCB (connected directly to the MGTRREF pin and to the closest MGTAVTTTX pin). Make sure that this requirement is correctly met.

3. The GTP Transceivers in the Virtex-5 LXT and SXT FPGAs use a calibration circuit to accurately determine the termination resistance for all transceivers in a column. This circuit is located in bank 112 for each device and utilizes a single reference resistor connected to MGTRREF_112. To correctly power this circuit and allow propagation of the calibration information to instantiated GTP_DUAL tiles, certain power guidelines must be followed. Check (Xilinx Answer 30915) for more details.

4. The reference clock to the GTP must fulfill certain requirements. Some of them are listed below. For more detailed information, refer to the GTP Transceiver User Guide (UG196):

a. There should be AC coupling between the clock source and the dedicated GTP_DUAL clock input pins.

b. A dedicated point-to-point connection is required between the oscillator and the GTP_DUAL clock input pins.

c. A GTP_DUAL tile that sources a reference clock must be instantiated, and REFCLKPWRDNB must be asserted High.

d. If a GTP_DUAL tile is used only for forwarding a reference clock, the user should meet the requirements in Table 5.


Table 5: GTP_DUAL_TILE Requirements when Forwarding only Reference Clock

| Pin or Pin Pair | Connect To | Filter |
|---|---|---|
| MGTRXP/MGTRXN | GND | - |
| MGTTXP/MGTTXN | Floating, no connection | - |
| MGTREFCLKP/MGTREFCLKN | Floating, no connection | - |
| MGTAVTTTX | GND | - |
| MGTAVTTRX | GND | - |
| MGTAVTTRXC | MGTAVTTRX | Y |
| MGTAVCCPLL | 1.2V dedicated supply | Y |
| MGTAVCC | Vccint | - |
| MGTRREF | MGTAVTTTX with resistor | - |

The link between the transmitter and the receiver is subject to several disruptive effects: jitter induced by link transmission, jitter due to dynamic data patterns on the link, noise induced into the signal pair, and signal attenuation due to the impedance of the transmission line. Users should take all of these into account and make sure that the input jitter requirement in section 4.3 of the PCI Express Base Specification v1.1 is met. Using a high-speed scope, capture the link eye diagram and verify that it meets the requirements defined in section 4.3.3.1 (Transmitter Compliance Eye Diagrams) and in section 4.3.4, which covers the minimum receiver eye timing and voltage compliance specification. Sections 4.3.3 and 4.3.4 of the PCI Express Base Specification v1.1 provide two tables:

a. Differential Transmitter (TX) Output Specifications

b. Differential Receiver (RX) Input Specifications

Users should verify that their design complies with the parameter values provided in these tables.

For proper high-speed operation, the GTP transceiver requires a high-quality, low-jitter reference clock. Using a high-speed scope, measure the input jitter on the provided reference clock. Verify that the measured jitter is within the jitter margins provided in the Virtex-5 FPGA Data Sheet.

AC Coupling

As defined in the specification, AC coupling capacitors are required on the differential signal pair of each transmitter lane. The capacitor value must be between 75 nF and 200 nF. Make sure that the PCI Express card has the AC coupling capacitors placed close to the transmitter lane and that the correct capacitor value has been used. Also check for cracked capacitors.

Pre-emphasis (or De-emphasis)

To reduce the effect of inter-symbol interference, PCI Express employs de-emphasis. Pre-emphasis and de-emphasis are two views of the same technique. If five consecutive bits are transmitted with the same polarity, the bits after the first bit are de-emphasized compared to the first bit. In other words, the first bit is pre-emphasized compared to the four bits that follow. Each GTP transceiver has a TXPREEMPHASIS port for controlling pre-emphasis. Table 7, from the GTP Transceiver User Guide (UG196), shows the percentage decrease of signal amplitude for de-emphasized bits at each TXPREEMPHASIS level. The higher the percentage, the more de-emphasis is applied. Use the pre-emphasis feature with care: too much pre-emphasis can result in signal distortion. If the link is training down, try different pre-emphasis values. The value can be configured in the CORE Generator interface during core generation, or changed directly in the source for the GTP/GTX wrapper.

The amplitude of the TX driver's differential swing can be controlled using the TXDIFFCTRL ports. TXDIFFCTRL controls the drive strength of the main pad driver and the pre-emphasis pad driver.
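The de-emphasis rule described in this section can be illustrated with a short host-side sketch. It is not part of the core or the transceiver; the function name is hypothetical, and it only marks which bits of a sequence would be sent de-emphasized under the rule stated above (the first bit after a polarity change is sent at full swing, following bits of the same polarity are de-emphasized).

```python
def deemphasis_mask(bits):
    """Return a list of booleans, True where a bit is sent de-emphasized.

    A bit is de-emphasized when it has the same polarity as the bit
    before it. The first bit overall, and the first bit after any
    polarity change, is sent at full swing.
    """
    mask = []
    previous = None
    for bit in bits:
        mask.append(previous is not None and bit == previous)
        previous = bit
    return mask


# Five consecutive ones: only the first is sent at full swing.
print(deemphasis_mask([1, 1, 1, 1, 1]))
```

The amount by which the marked bits are reduced is set by TXPREEMPHASIS, as listed in Table 7.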

Table 6 shows the differential output voltage for different settings at the port. Along with the pre-emphasis (Table 7), it is suggested to try different values for this port when having link training problems.

Table 6: Transmitter Output Swing

Port: TXDIFFCTRL0[2:0] = TXBUFDIFFCTRL0[2:0], TXDIFFCTRL1[2:0] = TXBUFDIFFCTRL1[2:0]

| Value | Transmitter Differential Swing (mV) |
|---|---|
| 000 | 1100 |
| 001 | 1050 |
| 010 | 1000 |
| 011 | 900 |
| 100 | 800 |
| 101 | 600 |
| 110 | 400 |
| 111 | 0 |
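When stepping through TXDIFFCTRL values during debug, it can help to keep Table 6 as a lookup on the host side. The sketch below simply encodes the table; the function names are hypothetical, and the sweep order (lowest non-zero swing first) is one possible choice, not a recommendation from the user guide.

```python
# Table 6: TXDIFFCTRL[2:0] setting -> transmitter differential swing (mV).
TX_SWING_MV = {
    0b000: 1100, 0b001: 1050, 0b010: 1000, 0b011: 900,
    0b100: 800, 0b101: 600, 0b110: 400, 0b111: 0,
}


def tx_swing_mv(txdiffctrl):
    """Look up the differential swing for a 3-bit TXDIFFCTRL value."""
    if txdiffctrl not in TX_SWING_MV:
        raise ValueError("TXDIFFCTRL must be a 3-bit value (0..7)")
    return TX_SWING_MV[txdiffctrl]


def settings_by_swing():
    """Return the settings with non-zero swing, lowest swing first."""
    usable = [code for code, mv in TX_SWING_MV.items() if mv > 0]
    return sorted(usable, key=TX_SWING_MV.get)


for code in settings_by_swing():
    print(format(code, "03b"), tx_swing_mv(code), "mV")
```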

Table 7: Transmitter Pre-emphasis Settings

Port: TXPREEMPHASIS0[2:0], TXPREEMPHASIS1[2:0]

| Value | Transmitter Pre-emphasis (%), Boost Off: TX_DIFF_BOOST = FALSE (Default Setting) | Transmitter Pre-emphasis (%), Boost On: TX_DIFF_BOOST = TRUE |
|---|---|---|
| 000 | 2 | 3 |
| 001 | 2 | 3 |
| 010 | 2.5 | 4 |
| 011 | 4.5 | 10.5 |
| 100 | 9.5 | 18.5 |
| 101 | 16 | 28 |
| 110 | 23 | 39 |
| 111 | 31 | 52 |
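Table 7 can be encoded the same way. The sketch below is a plain transcription of the table into a lookup; the function name and the `boost` argument are hypothetical, with `boost` standing for the TX_DIFF_BOOST attribute.

```python
# Table 7: TXPREEMPHASIS[2:0] -> (percent with boost off, percent with boost on).
PREEMPHASIS_PCT = {
    0b000: (2.0, 3.0),
    0b001: (2.0, 3.0),
    0b010: (2.5, 4.0),
    0b011: (4.5, 10.5),
    0b100: (9.5, 18.5),
    0b101: (16.0, 28.0),
    0b110: (23.0, 39.0),
    0b111: (31.0, 52.0),
}


def preemphasis_pct(txpreemphasis, boost=False):
    """Return the pre-emphasis percentage for a 3-bit TXPREEMPHASIS value.

    boost=False corresponds to TX_DIFF_BOOST = FALSE (the default setting),
    boost=True to TX_DIFF_BOOST = TRUE.
    """
    if txpreemphasis not in PREEMPHASIS_PCT:
        raise ValueError("TXPREEMPHASIS must be a 3-bit value (0..7)")
    off, on = PREEMPHASIS_PCT[txpreemphasis]
    return on if boost else off


print(preemphasis_pct(0b011), preemphasis_pct(0b011, boost=True))
```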


The Virtex-5 PCI Express Protocol Standard Characterization Test Report provides an example of an eye diagram that illustrates the effect of applying pre-emphasis.

Figure 21: Before applying pre-emphasis

Figure 22: After pre-emphasis is applied

Signal Integrity

Multi-lane designs can introduce crosstalk and noise onto the serial lanes. When having link training issues with multi-lane links, first try isolating the upper lanes and forcing the link to attempt to train as an x1. For add-in cards, this can be done by using an interposer or by placing tape over the upper lane pins on the connector. Use a tape similar to Scotch tape.

The signal to monitor to detect probable link issues is pipe_rx_status. Table 8 shows the possible values for pipe_rx_status and the interpretation for each value.

Table 8: pipe_rx_status values

| PIPERXSTATUSLn[2:0] | Meaning |
|---|---|
| 000 | Data received OK |
| 001 | One skip symbol (SKP) added |
| 010 | One SKP removed |
| 011 | Receiver detected |
| 100 | 8B/10B decode error |
| 101 | Elastic Buffer overflow |
| 110 | Elastic Buffer underflow |
| 111 | Receive disparity error |
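When reading captured pipe_rx_status values, a small decoder for the Table 8 encodings can save time. The sketch below is a host-side helper, not part of the core. The function names are hypothetical, and the multi-lane helper assumes that lane n occupies bits [3n+2:3n] of a concatenated status bus; verify that ordering against the actual wrapper before relying on it.

```python
# Table 8 encodings for PIPERXSTATUSLn[2:0].
RX_STATUS = {
    0b000: "Data received OK",
    0b001: "One skip symbol (SKP) added",
    0b010: "One SKP removed",
    0b011: "Receiver detected",
    0b100: "8B/10B decode error",
    0b101: "Elastic Buffer overflow",
    0b110: "Elastic Buffer underflow",
    0b111: "Receive disparity error",
}

# Codes that usually point at corrupted incoming data.
DATA_ERROR_CODES = {0b100, 0b111}


def decode_rx_status(code):
    """Return the Table 8 meaning of a 3-bit status value."""
    if not 0 <= code <= 0b111:
        raise ValueError("status must be a 3-bit value (0..7)")
    return RX_STATUS[code]


def split_lanes(bus_value, lanes):
    """Split a concatenated status bus into per-lane 3-bit codes.

    Assumption: lane n sits at bits [3n+2:3n] of the bus.
    """
    return [(bus_value >> (3 * n)) & 0b111 for n in range(lanes)]


# Three lanes: lane 0 reports a decode error, lane 2 a disparity error.
for lane, code in enumerate(split_lanes(0b111000100, 3)):
    flag = " <-- check signal integrity" if code in DATA_ERROR_CODES else ""
    print(f"lane {lane}: {decode_rx_status(code)}{flag}")
```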

What to check if an 8B/10B error is reported

When the incoming data is corrupted due to crosstalk or other forms of interference on the link, the pipe_rx_status signals normally indicate 8B/10B errors or disparity errors. When 8B/10B or disparity errors are seen, the following tests should be performed.


Measure the eye diagram of the receive direction using a high speed scope. Verify that the measurement meets the requirements of PCI Express Base specification v1.1. While using the high speed scope, probes should be placed as close as possible to the FPGA receive pads and before the AC capacitors.

Ensure that AC capacitors have been placed on the receive and transmit lanes as discussed in the AC Coupling section.

If pipe_rx_status reports errors followed by frequent LTSSM transitions to RECOVERY, this indicates possible signal integrity issues on the board. It is advisable to consult a signal integrity expert to debug such issues.

GTP Parameter Modification

When generating the Endpoint Block Plus Wrapper for PCI Express in a Virtex-5 FPGA, the GTP/GTX wrapper is generated with all recommended settings. Although changing the default parameters is not recommended, it might be necessary during debugging. If there is a problem with the link, such as trn_lnk_up_n not asserting or the link training down, change the TXDIFFCTRL and TXPREEMPHASIS values as discussed in the Pre-emphasis (or De-emphasis) section. This applies to the transmit side. For the receive side, the user can experiment with the receiver equalization values. For more information, refer to the Virtex-5 RocketIO GTP Transceiver User Guide (UG196). Table 9 (from the user guide) shows the ports related to receiver equalization.

Table 9: RX Termination and Equalization Ports

| Port | Dir | Clock Domain | Description |
|---|---|---|---|
| RXENEQB0, RXENEQB1 | In | Async | Active-Low port for enabling linear receive equalization. 0: Receiver equalization is enabled. 1: Receiver equalization is disabled. |
| RXEQMIX0[1:0], RXEQMIX1[1:0] | In | Async | This port controls the wideband/high-pass mix ratio for the RX Equalization circuit. The following ratios are available. 00: 50% wideband, 50% high-pass. 01: 62.5% wideband, 37.5% high-pass. 10: 75% wideband, 25% high-pass. 11: 37.5% wideband, 62.5% high-pass. |
| RXEQPOLE0[3:0], RXEQPOLE1[3:0] | In | Async | This port controls the location of the pole in the RX equalizer high-pass filter. Adjusting this value shifts the threshold for rejecting low-frequency signals. The following settings are available. 0xxx: Filter pole depends on resistor calibration. 1000: 0% nominal pole. 1001: -12.5%. 1010: -25.0%. 1011: -37.5%. 1100: +12.5%. 1101: +25.0%. 1110: +37.5%. 1111: +50.0%. |
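The RXEQMIX and RXEQPOLE encodings in Table 9 are easy to misread when entered as raw bits. The following sketch transcribes them into lookups; the function names are hypothetical, and returning `None` for the `0xxx` RXEQPOLE settings is an illustrative convention for "pole depends on resistor calibration".

```python
# RXEQMIX[1:0] -> (percent wideband, percent high-pass), from Table 9.
RXEQMIX_RATIO = {
    0b00: (50.0, 50.0),
    0b01: (62.5, 37.5),
    0b10: (75.0, 25.0),
    0b11: (37.5, 62.5),
}

# RXEQPOLE[3:0] with bit 3 set -> pole offset from nominal, in percent.
RXEQPOLE_OFFSET_PCT = {
    0b1000: 0.0, 0b1001: -12.5, 0b1010: -25.0, 0b1011: -37.5,
    0b1100: 12.5, 0b1101: 25.0, 0b1110: 37.5, 0b1111: 50.0,
}


def rxeqmix_ratio(rxeqmix):
    """Return (wideband %, high-pass %) for a 2-bit RXEQMIX value."""
    if rxeqmix not in RXEQMIX_RATIO:
        raise ValueError("RXEQMIX must be a 2-bit value (0..3)")
    return RXEQMIX_RATIO[rxeqmix]


def rxeqpole_offset_pct(rxeqpole):
    """Return the pole offset in percent for a 4-bit RXEQPOLE value.

    Returns None for 0xxx, where the pole depends on resistor calibration.
    """
    if not 0 <= rxeqpole <= 0b1111:
        raise ValueError("RXEQPOLE must be a 4-bit value (0..15)")
    return RXEQPOLE_OFFSET_PCT.get(rxeqpole)


print(rxeqmix_ratio(0b01), rxeqpole_offset_pct(0b1011))
```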


The following signals can be checked to verify that the GTP/GTX side is working correctly. They can be probed with the ChipScope Pro tool and appear in the ChipScope Pro capture shown in the LTSSM State Analysis section.

RESETDONE must be asserted

PLLLKDET must be asserted

RXBYTEREALIGN/RXBYTEISALIGNED must be asserted

RXCHANREALIGN/RXCHANISALIGNED must be asserted

The things to check if PLLLKDET is not asserted have already been discussed in the PCIe Clocking section. In addition, verify that an incoming clock is present. This can be checked by probing REFCLKOUT and TXOUTCLK. REFCLKOUT is the same as CLKIN. It is a free-running clock (that is, it operates before PLLLKDET is asserted). TXOUTCLK is not a free-running clock; it is only valid after PLLLKDET is asserted.
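The signal checks listed above can be applied to a set of sampled values with a few lines of host-side code. This is only a sketch: the function name and the dictionary-based input are hypothetical, and it assumes each signal has been read from a capture as 0 or 1 after link training has had time to complete.

```python
# Signals that must be asserted (1) for the GTP/GTX side to be healthy.
REQUIRED_HIGH = ["RESETDONE", "PLLLKDET", "RXBYTEISALIGNED", "RXCHANISALIGNED"]


def failing_signals(sample):
    """Return the required signals that are missing or not asserted.

    `sample` maps a signal name to its captured value (0 or 1).
    """
    return [name for name in REQUIRED_HIGH if sample.get(name, 0) != 1]


capture = {
    "RESETDONE": 1,
    "PLLLKDET": 0,
    "RXBYTEISALIGNED": 1,
    "RXCHANISALIGNED": 1,
}
print(failing_signals(capture))
```

If PLLLKDET appears in the result, start with the reference clock checks before looking at alignment.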

Changing GTP Settings with ChipScope Pro VIO

GTP/GTX settings such as TXDIFFCTRL, TXPREEMPHASIS, RXEQPOLE and RXEQMIX can be changed to figure out the suitable value for the particular system under test. It would be tedious to re-implement the design each time for different values. Instead, use the ChipScope Pro VIO core to dynamically change the values associated with these signals in order to find the best values for a given board.

The VIO core cannot be inserted using the ChipScope Pro Inserter. The CORE Generator flow must be used to implement a VIO core. Generate the ICON, ILA and VIO cores, and then integrate the generated top-level file in the wrapper where the signals are to be manipulated. The cores are generated separately in the CORE Generator software as shown in the screen captures below:

Figure 23: List of ChipScope Pro cores in CORE Generator


Before generating the cores, calculate the number of trigger signals required for the ILA core and the number of outputs required from the VIO core. In the example below, ltssm_state and trn_lnk_up_n were selected as the trigger signals, so a Trigger Port Width of 5 is selected in the ILA core generation (shown in Figure 24).

Figure 24: ChipScope Pro Integrated Logic Analyzer

For the VIO core, select the signals that are to be changed dynamically. In the example below, all four signals (TXDIFFCTRL, TXPREEMPHASIS, RXEQPOLE and RXEQMIX) have been selected, for a total of 24 output ports from the VIO core.
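The two widths used in this example follow from the port widths given in this guide: a 4-bit ltssm_state plus the single-bit trn_lnk_up_n for the trigger, and two copies (one per transceiver in the tile) of each controlled port for the VIO. A minimal sketch of that arithmetic, with hypothetical names:

```python
# (signal name, width in bits, number of copies)
ILA_TRIGGERS = [
    ("ltssm_state", 4, 1),
    ("trn_lnk_up_n", 1, 1),
]
VIO_OUTPUTS = [
    ("TXDIFFCTRL", 3, 2),      # TXDIFFCTRL0 and TXDIFFCTRL1
    ("TXPREEMPHASIS", 3, 2),   # TXPREEMPHASIS0 and TXPREEMPHASIS1
    ("RXEQPOLE", 4, 2),        # RXEQPOLE0 and RXEQPOLE1
    ("RXEQMIX", 2, 2),         # RXEQMIX0 and RXEQMIX1
]


def total_width(ports):
    """Sum width * copies over a list of (name, width, copies) entries."""
    return sum(width * copies for _name, width, copies in ports)


print("ILA trigger port width:", total_width(ILA_TRIGGERS))
print("VIO synchronous outputs:", total_width(VIO_OUTPUTS))
```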

Figure 25: ChipScope Pro Virtual Input/Output


In Figure 25, Enable Synchronous Output Port is selected because the output from the VIO core is provided as the input to the GTP. Select two control ports when generating the ICON core.

Figure 26: ChipScope Pro Integrated Controller (ICON)

After all the cores have been generated, modify the pcie_gt_wrapper.v file to add the following:

input wire [3:0] ltssm_state_gt_wrapper,

…………………………………………………………..

…………………………………………………………..

wire [4:0] trig0;

wire [35:0] control0;

wire [35:0] control1;

wire [23:0] sync_out;

assign trig0[4:1] = ltssm_state_gt_wrapper;

assign trig0[0] = trn_lnk_up_n;

…………………………………………………………..

…………………………………………………………..

.RXEQMIX0(sync_out[23:22]),

.RXEQMIX1(sync_out[21:20]),

.RXEQPOLE0(sync_out[19:16]),

.RXEQPOLE1(sync_out[15:12]),

//.RXEQMIX0(2'b01),

//.RXEQMIX1(2'b01),

//.RXEQPOLE0(4'b0000),

//.RXEQPOLE1(4'b0000),

…………………………………………………………..

…………………………………………………………..


.TXDIFFCTRL0(sync_out[5:3]), //3'b100

.TXDIFFCTRL1(sync_out[2:0]), //3'b100

…………………………………………………………..

…………………………………………………………..

//.TXDIFFCTRL0(gt_txdiffctrl_0), //3'b100

//.TXDIFFCTRL1(gt_txdiffctrl_1), //3'b100

…………………………………………………………..

…………………………………………………………..

.TXPREEMPHASIS0(sync_out[11:9]), //3'b111

.TXPREEMPHASIS1(sync_out[8:6]), //3'b111

…………………………………………………………..

…………………………………………………………..

//.TXPREEMPHASIS0(gt_txpreemphesis_0), //3'b111

//.TXPREEMPHASIS1(gt_txpreemphesis_1), //3'b111

…………………………………………………………..

…………………………………………………………..

//-----------------------------------------------------------------

// ILA core instance

//-----------------------------------------------------------------

chipscope_ila i_ila

(

.CLK(gtclk_bufg),

.CONTROL(control0),

.TRIG0(trig0)

);

//-----------------------------------------------------------------

// ICON core instance

//

//-----------------------------------------------------------------

chipscope_icon i_icon

(

.CONTROL0(control0),

.CONTROL1(control1)

);

//-----------------------------------------------------------------

// VIO core instance

//-----------------------------------------------------------------

chipscope_vio i_vio

(

.CLK(gtclk_bufg),

.CONTROL(control1),

.SYNC_OUT(sync_out)

);
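With the port connections above, each transceiver setting occupies a fixed slice of the 24-bit sync_out bus. The host-side sketch below packs and unpacks a sync_out value using exactly those slices, which can help when entering a value in the VIO console. The function names are hypothetical, and the example settings are taken from the values in the commented-out lines above.

```python
# (port name, least significant bit in sync_out, width), per the wrapper edits.
SYNC_OUT_FIELDS = [
    ("RXEQMIX0", 22, 2),
    ("RXEQMIX1", 20, 2),
    ("RXEQPOLE0", 16, 4),
    ("RXEQPOLE1", 12, 4),
    ("TXPREEMPHASIS0", 9, 3),
    ("TXPREEMPHASIS1", 6, 3),
    ("TXDIFFCTRL0", 3, 3),
    ("TXDIFFCTRL1", 0, 3),
]


def pack_sync_out(values):
    """Build the 24-bit sync_out word from a dict of port values."""
    word = 0
    for name, lsb, width in SYNC_OUT_FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word |= value << lsb
    return word


def unpack_sync_out(word):
    """Split a 24-bit sync_out word back into per-port values."""
    return {name: (word >> lsb) & ((1 << width) - 1)
            for name, lsb, width in SYNC_OUT_FIELDS}


# Values from the commented-out port connections in the listing above.
ORIGINAL = {
    "RXEQMIX0": 0b01, "RXEQMIX1": 0b01,
    "RXEQPOLE0": 0b0000, "RXEQPOLE1": 0b0000,
    "TXPREEMPHASIS0": 0b111, "TXPREEMPHASIS1": 0b111,
    "TXDIFFCTRL0": 0b100, "TXDIFFCTRL1": 0b100,
}
print(format(pack_sync_out(ORIGINAL), "06X"))
```

Starting a sweep from the packed original values gives a known baseline before any single field is changed.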


Modify the pcie_gt_wrapper_top.v file to add the lines shown below:

input wire [3:0] ltssm_state_gt_wrapper_top,

output wire [7:0] gt_rx_elec_idle,

output wire [23:0] gt_rx_status,

pcie_gt_wrapper_i

(

.ltssm_state_gt_wrapper (ltssm_state_gt_wrapper_top),

.gt_rx_elec_idle (gt_rx_elec_idle),

.gt_rx_status (gt_rx_status),

…………………………………………………………..

Implement the design after making the above modifications. If the design is implemented in the ISE tools, you should see the following hierarchy:


After making these modifications, implement the design and download the new bit file to the board. Open the ChipScope Pro analyzer. Initially, the signals appear without meaningful names, as shown in Figure 27:

Figure 27: ChipScope Pro Analyzer: Waveform, Trigger Setup and VIO Console

Rename the signals manually to make the display more readable, using the pcie_gt_wrapper.v file as the reference for the names. Figure 28 shows the signals after renaming.


Figure 28: ChipScope Pro Analyzer with Signals Renamed

Modify the parameters by selecting values in the VIO console. The trigger can be set up to fire at different LTSSM states.
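Because the wrapper edits place ltssm_state on trig0[4:1] and trn_lnk_up_n on trig0[0], the 5-bit trigger match value for a given LTSSM state can be computed rather than worked out by hand. The sketch below assumes the match value is entered as a binary string in which X marks a don't-care bit; the function name is hypothetical, and the LTSSM state encoding must be taken from the state table earlier in this guide.

```python
def trigger_pattern(ltssm_state, trn_lnk_up_n=None):
    """Return the 5-bit trig0 match string for an LTSSM state.

    trig0[4:1] carries ltssm_state and trig0[0] carries trn_lnk_up_n.
    Pass trn_lnk_up_n=None to leave that bit as a don't-care (X).
    """
    if not 0 <= ltssm_state <= 0b1111:
        raise ValueError("ltssm_state must be a 4-bit value (0..15)")
    if trn_lnk_up_n not in (None, 0, 1):
        raise ValueError("trn_lnk_up_n must be 0, 1 or None")
    low_bit = "X" if trn_lnk_up_n is None else str(trn_lnk_up_n)
    return format(ltssm_state, "04b") + low_bit


# Match a state value of 1010 while the link is still down (trn_lnk_up_n = 1).
print(trigger_pattern(0b1010, 1))
```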


Debugging Checklist

So far, different aspects of debugging link training issues have been discussed. This section provides a few debugging tips based on experience working with customers who ran into link training issues.

1. If the board has the polarity of some lanes reversed and the link is training down, capture pipe_rx_polarity in the ChipScope Pro tool. The pipe_rx_polarity signal should be asserted for the lanes that are polarity reversed. The assertion of this signal indicates that the core has detected the polarity inversion; by asserting pipe_rx_polarity, the core tells the transceiver to invert the polarity of the incoming signal.

2. Make sure the endpoint device is not in reset. This can be checked by capturing the sys_reset_n signal in the ChipScope Pro tool.

3. Probe the signal lines and make sure that the signal levels are within the level provided in the specification.

4. If the link is not consistently training, check the value of the fast_train_simulation_only core input. This should be set to 1 for simulation only. For hardware implementation, it should be set to 0.

5. The link up issue might occur if the MGTAVTTRCAL pin is not connected to its power supply. Make sure the pins are connected correctly. For more information, refer to the Virtex-5 RocketIO GTP Transceiver User Guide (UG196).

6. The signal trn_reset_n must deassert (go to logic 1) before the link can train (i.e., before trn_lnk_up_n is asserted). If trn_reset_n does not deassert, the probable reasons are as follows:

Fundamental reset (indicated by signal sys_reset_n) is still asserted.

Loss of transceiver PLL lock (indicated by signal plllkdet).

Loss of the fabric PLL lock (indicated by signal clock_lock).

Therefore, to debug trn_reset_n not being de-asserted, follow these steps:

Ensure that sys_reset_n is released (driven High).

Check that the transceiver PLLLKDET output is asserted. There is one output per lane.

Check that the fabric PLL (PLL or MMCM) is locked. Note that the reset to the PLL is tied to the PLLLKDET output of the transceivers; if that signal is not asserted, the PLL will not lock.

For more details, refer to (Xilinx Answer 34894).

7. Make sure that scrambling is not turned off. When you generate the core, there is an option in the CORE Generator interface to force no scrambling. For normal operation of the hardware, this option should not be checked.

8. Make sure there is no timing error reported in the design implementation.

9. If it is a custom board, the problem could be with the board itself. To rule out a board issue, try a demo board (e.g., ML555).
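The trn_reset_n checks in item 6 can be written as a small decision helper for captured signal values. This is a host-side sketch with hypothetical names, assuming sys_reset_n, plllkdet and clock_lock have each been sampled as 0 or 1. Because the fabric PLL reset is tied to PLLLKDET, the helper reports the fabric PLL only when the transceiver PLL is already locked.

```python
def trn_reset_n_blockers(sys_reset_n, plllkdet, clock_lock):
    """List the likely reasons trn_reset_n stays asserted (Low)."""
    reasons = []
    if sys_reset_n == 0:
        reasons.append("sys_reset_n is still asserted (Low)")
    if plllkdet == 0:
        # The fabric PLL cannot lock while PLLLKDET is Low, so stop here.
        reasons.append("transceiver PLL is not locked (PLLLKDET Low)")
    elif clock_lock == 0:
        reasons.append("fabric PLL is not locked (clock_lock Low)")
    return reasons


# Reset released, transceiver PLL locked, fabric PLL not locked.
print(trn_reset_n_blockers(sys_reset_n=1, plllkdet=1, clock_lock=0))
```

An empty result means none of the three conditions from item 6 explains the failure, and the remaining checklist items should be reviewed.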


Conclusion

This document presented different aspects of debugging link training issues in the Virtex-5 FPGA Endpoint Block Plus Core. Users experiencing link training issues are encouraged to work through this document and check the suggestions provided. With this document, the user should be able to capture the signals related to link training in the ChipScope Pro tool and analyze the captured waveforms to determine where the problem might be. If this document does not help to resolve the problem, please create a WebCase with Xilinx Technical Support. Attach all of the captured ChipScope Pro waveforms and the details of your investigation and analysis.

Revision History

07/19/2011 - Initial release