Improvement of the PCI pass-through

Jun Kamada <[email protected]>
Akio Takebe <[email protected]>

FUJITSU LIMITED

All Rights Reserved, Copyright (C) FUJITSU 2007 - 2009


Agenda

• Background
  • Why SCSI?
  • pvSCSI and PCI pass-through
• Part 1: Current status of pvSCSI enhancements
• Part 2: Booting a guest with PCI pass-through

Background: Why SCSI? (1/2)

[Figure: data on storage is backed up to and restored from a tape drive; tape cartridges are loaded and unloaded, then moved to a safety box and preserved.]

Backup to tape is a fundamental function for reliability and availability.

• Free to move (to a safe place)
• Long-term preservation

Tape drives are usually controlled through SCSI.

SCSI support in the guest VM is highly desirable in a virtualized environment (issuing SCSI commands from the guest VM).

Background: Why SCSI? (2/2)

• In the data center, reliability and availability (e.g. hardware snapshot, tape backup) are provided by SCSI features.
• These servers are consolidated onto a single server in a virtualized environment.

SCSI support in the guest VM is mandatory (issuing SCSI commands from the guest VM).

[Figure: a data center in which a DB server (DBMS) and a backup server are connected by LAN and, over a SAN, to RAID storage and a tape drive. SCSI commands drive hardware snapshots on the storage and load, unload, and reset on the tape drive.]

Background: pvSCSI and PCI pass-through

• We have developed the pvSCSI driver and will continue to enhance it.
  • Report on the current status of the enhancements. (Part 1)
• On the other hand, we also need to provide:
  • Reliability with hardware assist (e.g. PCIe AER, …)
  • Seamless movement between physical (P) and virtual (V) environments

We are focusing on SAN/PXE boot using VT-d/IOMMU.

• Report on enhancements to the guest BIOS that enable SAN/PXE boot. (Part 2)

Part 1: Current status of pvSCSI enhancements

Current implementation (Xen 3.3.0)

The pvSCSI driver for Xen 3.3.0 provides:

• LUN (Logical Unit Number) pass-through
• LUN hot-plug

[Figure: in Dom0, physical LUNs (0:0:0:1 and 0:1:2:3 on physical SCSI host 0, plus 1:0:1:3 on another physical SCSI tree) are mapped arbitrarily to virtual LUNs 2:0:0:0 and 2:0:0:1 on a virtual SCSI host (host=2). The guest domain sees the same virtual SCSI tree. Steps shown: (1) Add, (2) Attach, (3) Add, (4) Appear immediately.]
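The arbitrary mapping between virtual and physical LUNs can be pictured with a small lookup table. The following is only a toy model in Python, not the pvSCSI backend code: the `LunMap` class, its method names, and the particular virtual-to-physical pairing are assumptions made for illustration.

```python
# Toy model of pvSCSI "LUN mode": each virtual LUN address seen by the guest
# maps to an arbitrary physical LUN address in Dom0.
# Addresses use the host:channel:target:lun notation from the slide.
# LunMap and its methods are hypothetical names, not part of Xen.
from typing import Dict, Tuple

Hctl = Tuple[int, int, int, int]


def parse_hctl(text: str) -> Hctl:
    """Parse 'host:channel:target:lun' into a tuple of four integers."""
    parts = text.split(":")
    if len(parts) != 4:
        raise ValueError(f"expected host:channel:target:lun, got {text!r}")
    host, channel, target, lun = (int(p) for p in parts)
    return (host, channel, target, lun)


class LunMap:
    """Virtual-to-physical LUN table for one guest domain."""

    def __init__(self) -> None:
        self._v2p: Dict[Hctl, Hctl] = {}

    def attach(self, virtual: str, physical: str) -> None:
        """Hot-plug: make a physical LUN visible at a virtual address."""
        v = parse_hctl(virtual)
        if v in self._v2p:
            raise ValueError(f"virtual LUN {virtual} is already attached")
        self._v2p[v] = parse_hctl(physical)

    def detach(self, virtual: str) -> None:
        """Hot-unplug: remove a virtual LUN from the guest's tree."""
        del self._v2p[parse_hctl(virtual)]

    def translate(self, virtual: str) -> Hctl:
        """Return the physical address that backs a virtual LUN."""
        return self._v2p[parse_hctl(virtual)]
```

Because the table is arbitrary, the guest's view of the SCSI tree carries no information about the physical topology, which is the source of the issue described on the next slide.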

Issue of current implementation

The current implementation provides a completely virtualized (arbitrarily mapped) SCSI tree to the guest domain. It provides flexibility, but …

• Some SCSI commands (REPORT_LUN, EXTENDED_COPY, …) have to be emulated in the backend, because they depend on the physical topology of the SCSI tree.
• Implementing emulation logic for all such commands is a lot of work, so the current implementation supports only the mandatory commands.

It does not support full SCSI functionality. :-(
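To see why such commands need emulation, consider REPORT LUNS: the backend cannot forward the physical device's answer, because the guest must see the LUN numbers of its virtual tree. The sketch below builds the response data for a list of virtual LUNs. It is an illustration in Python, not the backend driver's code; the function name is hypothetical, and the data layout (4-byte big-endian list length, 4 reserved bytes, then 8-byte LUN entries using simple peripheral device addressing) follows the standard SCSI definition of the command.

```python
# Sketch: build REPORT LUNS parameter data for a guest's virtual LUNs.
# build_report_luns_data is a hypothetical helper, not a Xen function.
import struct
from typing import Iterable

REPORT_LUNS_OPCODE = 0xA0  # SCSI operation code for REPORT LUNS


def build_report_luns_data(virtual_luns: Iterable[int]) -> bytes:
    """Return the data-in buffer a backend could hand back to the guest."""
    luns = sorted(set(virtual_luns))
    for lun in luns:
        if not 0 <= lun < 256:
            raise ValueError("this sketch handles only LUN 0..255")
    # One 8-byte entry per LUN: byte 0 = 0x00 (peripheral device addressing),
    # byte 1 = LUN number, remaining bytes zero.
    entries = b"".join(bytes([0x00, lun, 0, 0, 0, 0, 0, 0]) for lun in luns)
    # Header: LUN list length in bytes (big-endian) plus 4 reserved bytes.
    header = struct.pack(">I4x", len(entries))
    return header + entries
```

Every topology-dependent command would need comparable hand-written logic, which is the workload the slide refers to.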

How to solve the issue

1. Implement all of the emulation logic step by step.
   • Hard work.
   • Possibly cannot support some vendor-specific commands.
2. "Add" a new mode that attaches a whole HBA to the guest domain. (This allows the backend driver to bypass SCSI command emulation.)
   • Easy to implement. (Details are shown on the following slides.)
   • Can support all vendor-specific commands.

We took the second approach.

Posted implementation (1/2)

The additional implementation provides:

• Host (HBA: Host Bus Adapter) pass-through

[Figure: in Dom0, physical LUNs 0:0:0:1 and 0:1:2:3 on the physical SCSI host (host=0) appear as virtual LUNs 2:0:0:1 and 2:1:2:3 on the virtual SCSI host (host=2). The channel:target:LUN part of each ID (underlined on the slide) is the same; only the host number differs. The guest domain sees the same virtual SCSI tree. Steps shown: (1) Create, (2) Attach.]

Posted implementation (2/2)

The following modifications were actually needed:

• Backend driver
  • LUN/Host mode identification flag for each virtual SCSI tree
  • Emulation bypass logic (used when the flag indicates "Host mode")
• Frontend driver
  • No modification needed
• xend
  • User interface (to specify "Host mode")
  • LUN scan logic (shorter processing time by using the "lsscsi" command, if it exists; requested by the community)
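The per-tree mode flag and the bypass decision can be sketched as follows. This is a minimal Python model under stated assumptions, not the posted patch: `Mode`, `VirtualTree`, `host_mode_translate`, and `needs_emulation` are hypothetical names, and the set of topology-dependent opcodes lists only the two commands named earlier (REPORT LUNS, 0xA0, and EXTENDED COPY, 0x83).

```python
# Sketch of the LUN/Host mode flag kept for each virtual SCSI tree.
# All names here are illustrative; they do not come from the Xen sources.
from dataclasses import dataclass
from enum import Enum
from typing import Tuple

Hctl = Tuple[int, int, int, int]  # host, channel, target, lun

TOPOLOGY_DEPENDENT_OPCODES = {0xA0, 0x83}  # REPORT LUNS, EXTENDED COPY


class Mode(Enum):
    LUN = "lun"    # arbitrary per-LUN mapping, emulation required
    HOST = "host"  # whole HBA attached, emulation bypassed


@dataclass
class VirtualTree:
    mode: Mode
    virtual_host: int
    physical_host: int


def host_mode_translate(tree: VirtualTree, virtual: Hctl) -> Hctl:
    """In Host mode only the host number changes; channel:target:lun is kept."""
    if tree.mode is not Mode.HOST:
        raise ValueError("tree is not in Host mode")
    host, channel, target, lun = virtual
    if host != tree.virtual_host:
        raise ValueError("address does not belong to this virtual host")
    return (tree.physical_host, channel, target, lun)


def needs_emulation(tree: VirtualTree, opcode: int) -> bool:
    """Emulate topology-dependent commands only for LUN-mode trees."""
    return tree.mode is Mode.LUN and opcode in TOPOLOGY_DEPENDENT_OPCODES
```

With the figure's example, a Host-mode tree with virtual host 2 and physical host 0 turns 2:1:2:3 into 0:1:2:3, and every command is passed through unmodified.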

Conclusion (Part 1)

• We posted a series of patches last week, and they have already been merged into the unstable tree. (Thanks!)
• Please try and evaluate them. Comments are very welcome.

Thanks

Part 2: Booting a guest with PCI pass-through

Introduction: What is the problem with booting from PCI pass-through?

[Figure: Before: the guest boots from an emulated disk (boot disk) provided by qemu in dom0 and uses the pass-through disk only as a data disk. After: the guest uses the pass-through disk as both boot disk and data disk.]

Contents of Part 2

• What is required for SAN/SAS boot
• Details of the requirements
• Status of the requirements
• Sample
• Other challenge (PXE boot)
• Some concerns

What is required for SAN/SAS boot?

• INT 0x13 handler of the pass-through device (support for the BCV-style calling convention)
• BIOS functions
  • PMM (POST Memory Manager) service
  • PnP runtime functions
• IPL/BCV table

BCV: Boot Connection Vector. It is typically used by SCSI controllers.

Details of the requirements

Calling PCI expansion ROM (1/3)

What is BCV style?

The BCV is a pointer to code inside the Expansion ROM. PCI cards that support the BCV style of the boot specification use this code to hook INT 0x13 during device initialization. The BIOS can then access the hard disk connected to the PCI card through that special INT 0x13 handler.

[Figure: the BIOS needs to read the MBR of the boot disk, so it issues INT 0x13, which is dispatched to the IDE disk handler, CD-ROM handler, FDD handler, or PnP device handler.]
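The hooking idea can be modeled in a few lines. The sketch below is a conceptual Python model, not BIOS code: the interrupt vector table is a plain dictionary, handlers are functions taking a drive number and a sector number, and `make_bcv`, `bios_disk_handler`, and the drive number 0x80 used in the usage note are assumptions for illustration.

```python
# Conceptual model of BCV-style hooking of INT 0x13.
# The names and the handler signature are illustrative only.
from typing import Callable, Dict

Handler = Callable[[int, int], bytes]  # (drive, lba) -> sector data


def bios_disk_handler(drive: int, lba: int) -> bytes:
    """Stand-in for the BIOS's original INT 0x13 handler (knows no drives)."""
    raise LookupError(f"no such drive {drive:#x}")


def make_bcv(sectors: Dict[int, bytes], drive: int) -> Callable[[Dict[int, Handler]], None]:
    """Return the 'BCV' of a card whose disk contents are given by sectors."""

    def bcv(ivt: Dict[int, Handler]) -> None:
        previous = ivt[0x13]  # remember the handler that was installed before

        def hooked(d: int, lba: int) -> bytes:
            if d == drive:
                return sectors[lba]      # request for the card's own disk
            return previous(d, lba)      # chain to the older handler

        ivt[0x13] = hooked               # hook INT 0x13

    return bcv
```

Usage: after the BIOS calls the card's BCV, reading sector 0 of drive 0x80 through `ivt[0x13]` returns the MBR held by the card, while requests for other drives still reach the previous handler.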

Calling PCI expansion ROM (2/3): How to initialize the Expansion ROM

1. hvmloader maps the Expansion ROM at 0xc0000 (the figure shows the region from 0xc0000 to 0xea000).
2. hvmloader and rombios check some of the header data.
3. rombios jumps to the entry point of the INIT function after loading the AX register with the bus:dev:function number.

ROM layout: ROM header, PCI data structure, PnP Expansion Header, image.

ROM header:

Offset  Field
0h      Signature (0xaa55)
2h      Image size
3h      Entry point for INIT function (jmp <address>)
6h      Reserved
18h     Pointer to PCI data structure
1Ah     Pointer to PnP Expansion Header
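The header checks in step 2 and the AX value in step 3 can be sketched like this. The code is an illustrative Python model, not hvmloader or rombios source; `parse_rom_header` and `encode_bdf_for_ax` are hypothetical names. Two details are taken from the PCI and PnP BIOS conventions rather than from the slide: the image size byte counts 512-byte units, and AX carries the bus number in AH with the device number in the upper five bits of AL and the function number in the lower three.

```python
# Sketch: validate an expansion ROM header and build the AX value for INIT.
# Function and type names are illustrative, not taken from hvmloader/rombios.
import struct
from typing import NamedTuple


class RomHeader(NamedTuple):
    image_size: int         # bytes
    init_entry_offset: int  # offset of the INIT entry point (jmp <address>)
    pci_data_offset: int    # pointer stored at 18h
    pnp_header_offset: int  # pointer stored at 1Ah (0 means none)


def parse_rom_header(rom: bytes) -> RomHeader:
    if len(rom) < 0x1C:
        raise ValueError("image too short to hold a ROM header")
    # The 16-bit signature 0xAA55 is stored little-endian: 55h, AAh.
    if rom[0:2] != b"\x55\xaa":
        raise ValueError("missing 0xAA55 signature")
    image_size = rom[2] * 512  # size byte is in 512-byte units
    pci_data, pnp_header = struct.unpack_from("<HH", rom, 0x18)
    return RomHeader(image_size, 0x03, pci_data, pnp_header)


def encode_bdf_for_ax(bus: int, dev: int, func: int) -> int:
    """Pack bus:dev.func into AX: AH = bus, AL = dev (5 bits) | func (3 bits)."""
    if not (0 <= bus <= 0xFF and 0 <= dev <= 0x1F and 0 <= func <= 0x7):
        raise ValueError("bus/dev/func out of range")
    return (bus << 8) | (dev << 3) | func
```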

Calling PCI expansion ROM (3/3): How to initialize the Expansion ROM

ROM layout (mapped at 0xc0000; the figure shows the region up to 0xea000): ROM header, PCI data structure, PnP Expansion Header, image.

ROM header:

Offset  Field
0h      Signature (0xaa55)
2h      Image size
3h      Entry point for INIT function (jmp <address>)
6h      Reserved
18h     Pointer to PCI data structure
1Ah     Pointer to PnP Expansion Header (0000h is none)

PnP Expansion Header:

Offset  Field
0h      Signature ($PnP)
…       …
06h     Offset of next header (next PnP Expansion Header)
09h     Checksum
…       …
16h     BCV (points to the code that hooks INT 0x13)
…       …

4. Jump to the BCV to hook INT 0x13.
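Locating the BCV amounts to following the pointer at 1Ah and walking the chain of PnP Expansion Headers. The following Python sketch shows one way to do that; it is not the rombios implementation, and `iter_bcv_offsets` is a hypothetical name. The offsets 06h, 09h, and 16h come from the slide; the header length byte at 05h (in 16-byte units), used here to verify the checksum, is taken from the Plug and Play BIOS specification.

```python
# Sketch: walk the PnP Expansion Header chain and yield each BCV offset.
# iter_bcv_offsets is an illustrative helper, not a rombios function.
import struct
from typing import Iterator, Tuple


def iter_bcv_offsets(rom: bytes) -> Iterator[Tuple[int, int]]:
    """Yield (header_offset, bcv_offset) for every header with a BCV."""
    (offset,) = struct.unpack_from("<H", rom, 0x1A)  # 0000h means none
    seen = set()
    while offset != 0 and offset not in seen:
        seen.add(offset)  # guard against a looping chain
        if rom[offset:offset + 4] != b"$PnP":
            raise ValueError(f"no $PnP signature at {offset:#x}")
        length = rom[offset + 5] * 16  # header length in bytes
        if length < 0x18 or offset + length > len(rom):
            raise ValueError("PnP Expansion Header has an invalid length")
        # All bytes of the header, including the checksum at 09h, sum to 0.
        if sum(rom[offset:offset + length]) & 0xFF != 0:
            raise ValueError("PnP Expansion Header checksum mismatch")
        (bcv,) = struct.unpack_from("<H", rom, offset + 0x16)
        if bcv != 0:
            yield offset, bcv
        (offset,) = struct.unpack_from("<H", rom, offset + 0x06)
```

The BIOS would then far-call each returned BCV offset within the ROM segment so that the card can hook INT 0x13.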

PMM service?

• The PMM provides memory allocation only during POST.
• PCI expansion ROMs use the PMM service. For example, an expansion ROM needs a memory block to decompress its code into and a data area that is used only during initialization.
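The POST-only lifetime of PMM allocations can be illustrated with a toy allocator. This Python sketch is not the PMM implementation in rombios; the class and method names are hypothetical, and it assumes sizes expressed in 16-byte paragraphs, caller-chosen handles, and a return value of 0 on failure.

```python
# Toy model of a POST Memory Manager: a bump allocator whose blocks are
# only valid until POST ends. Names and conventions are assumptions.
from typing import Dict


class PostMemoryManager:
    PARAGRAPH = 16  # allocation granularity in bytes

    def __init__(self, base: int, size: int) -> None:
        self._next = base
        self._end = base + size
        self._blocks: Dict[int, int] = {}  # handle -> address
        self._post_done = False

    def allocate(self, paragraphs: int, handle: int) -> int:
        """Return the block address, or 0 if the request cannot be met."""
        if self._post_done:
            raise RuntimeError("PMM is available only during POST")
        if handle in self._blocks:
            raise ValueError("handle is already in use")
        size = paragraphs * self.PARAGRAPH
        if self._next + size > self._end:
            return 0
        address = self._next
        self._next += size
        self._blocks[handle] = address
        return address

    def find(self, handle: int) -> int:
        """Return the address of a previously allocated block, or 0."""
        return self._blocks.get(handle, 0)

    def end_of_post(self) -> None:
        """After POST, all PMM blocks are considered released."""
        self._post_done = True
        self._blocks.clear()
```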

PnP runtime function?

• PnP runtime functions are used by the OS and by application programs. They allow access to BIOS features (get version, number of devices, …).
• A PCI expansion ROM may check only the PnP Installation Check structure to determine whether the system has a Plug and Play BIOS.
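The check an expansion ROM performs can be sketched as a scan for the Installation Check structure. This is an illustration in Python rather than real-mode code, and `find_pnp_installation_check` is a hypothetical name. The search rule follows the Plug and Play BIOS specification: the "$PnP" signature sits on a 16-byte boundary in the F0000h to FFFFFh region, the byte at offset 05h holds the structure length (21h), and all bytes of the structure sum to zero.

```python
# Sketch: locate the PnP Installation Check structure in a BIOS image.
# The function name is illustrative; the image is the F0000h-FFFFFh region.
from typing import Optional

PNP_STRUCT_MIN_LENGTH = 0x21


def find_pnp_installation_check(bios: bytes, base: int = 0xF0000) -> Optional[int]:
    """Return the physical address of the structure, or None if absent."""
    for off in range(0, len(bios) - PNP_STRUCT_MIN_LENGTH + 1, 16):
        if bios[off:off + 4] != b"$PnP":
            continue
        length = bios[off + 5]
        if length < PNP_STRUCT_MIN_LENGTH or off + length > len(bios):
            continue
        if sum(bios[off:off + length]) & 0xFF == 0:  # checksum must be zero
            return base + off
    return None
```

A guest BIOS with dummy PnP runtime functions only has to make this structure discoverable for such ROMs to proceed.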

IPL/BCV table?

• IPL table / IPL priority
  • The IPL table and IPL priority decide the order in which devices are selected for booting.
  • In Xen, they are configured with a setting like "boot=cda" in the guest configuration file.
• BCV table / BCV priority
  • The BCV table and BCV priority decide the order in which devices are selected for installing their INT 0x13 handlers.
  • This order affects the boot order.

Example of Boot Order

[Figure: the IPL table (HDD, Network, CD-ROM, Floppy) with an IPL priority for each entry, and the BCV table (IDE disk, additional PCI card such as a SCSI card) with a BCV priority for each entry. The resulting boot order is determined by the IPL priority and, among hard disks, by the BCV priority.]
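How the two tables combine can be sketched as follows. This is a simplified Python model, not the rombios logic: the function name, the table representation, and the priority values in the usage note are hypothetical, and it assumes that the hard-disk entry of the IPL table expands into the BCV devices in BCV priority order.

```python
# Sketch: derive a boot order from IPL priorities and BCV priorities.
# Lower numbers mean higher priority. All names and values are illustrative.
from typing import Dict, List


def boot_order(ipl: Dict[str, int], bcv: Dict[str, int],
               hdd_entry: str = "HDD") -> List[str]:
    order: List[str] = []
    for device in sorted(ipl, key=ipl.get):
        if device == hdd_entry:
            # INT 0x13 handlers are installed in BCV priority order, so the
            # first BCV device becomes the first hard disk.
            order.extend(sorted(bcv, key=bcv.get))
        else:
            order.append(device)
    return order
```

For example, with IPL priorities HDD=1, Network=2, CD-ROM=3, Floppy=4 and BCV priorities SCSI card=1, IDE disk=2, the guest would try the SCSI card first, then the IDE disk, then the network, CD-ROM, and floppy.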

Status of the requirements

Status (1/3)

Calling convention

• We support the BCV-style calling convention of the BIOS Boot Specification.

BCV covers not only PCI devices but also ISA devices. However, the IOMMU does not support ISA devices, so we support the BCV-style calling convention only for PCI devices.

Status (2/3)

BIOS function

• PMM is needed by some PCI cards.
• PnP runtime functions are not called (in my experience). However, we need to provide dummy PnP runtime functions, because some Expansion ROMs may check only whether PnP runtime functions are supported.

PMM support has already been implemented by Kouya Shimura.

Dummy PnP runtime functions are easy to support. In the Bochs community, Sebastian Herbszt has already posted a patch.

Status (3/3)

BCV table/BCV priority

How can BCV priority be supported for pass-through devices on Xen?

A) If there are no emulated disks, boot from a pass-through device.
B) If a pass-through device is specified as bootable, load the expansion ROM of only that device.
   For example: pci = [ "bb:dd.ff,boot=1" ]
C) Enhance the IPL table. If a pass-through device is specified in the boot order, the pass-through device with the boot=1 option is selected as the boot device.
   For example: boot="p"
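Options B and C can be combined in a small selection routine. The Python sketch below only illustrates the proposal on this slide: the `boot=1` option and the `p` boot letter are the proposed syntax, the function names are hypothetical, and the letters a, c, d, and n are assumed to mean floppy, hard disk, CD-ROM, and network.

```python
# Sketch: pick the first boot device from a guest configuration that uses
# the proposed "boot=1" PCI option and "p" boot letter. Illustrative only.
from typing import List, Optional

LEGACY_LETTERS = {"a": "floppy", "c": "hard disk", "d": "cd-rom", "n": "network"}


def bootable_passthrough(pci_entries: List[str]) -> List[str]:
    """Return the bb:dd.ff addresses of entries that carry boot=1."""
    result = []
    for entry in pci_entries:
        bdf, *options = [part.strip() for part in entry.split(",")]
        if "boot=1" in options:
            result.append(bdf)
    return result


def first_boot_device(boot: str, pci_entries: List[str]) -> Optional[str]:
    """Walk the boot string in order and return the first usable device."""
    for letter in boot:
        if letter == "p":
            candidates = bootable_passthrough(pci_entries)
            if candidates:
                return "pass-through " + candidates[0]
        elif letter in LEGACY_LETTERS:
            return LEGACY_LETTERS[letter]
    return None
```

With boot="pc" and pci = [ "01:00.0,boot=1", "02:00.0" ], the routine selects the pass-through device 01:00.0; without any boot=1 entry it falls back to the hard disk.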

Sample

Other challenge (PXE boot with PCI pass-through)

Issue of PXE boot

• Expansion ROM of Ethernet devices
  • Most PnP Ethernet devices do not carry an Expansion ROM image themselves.
  • So we tried to use gPXE to boot from a pass-through device.

Try PXE boot with gPXE

Configuration & Hack

• Comment out the device number check in hvmloader.
• Do not specify an emulated NIC.
• Specify only the NIC of the pass-through device.
• Recompile gPXE with the driver for the device and regenerate eb-roms.h.

Result…

• gPXE may not support the NIC.
• gPXE may check the device ID, vendor ID, and so on internally.
• More debugging is needed…

Some concerns

• Lack of I/O port space
  • Boot devices use I/O ports, but the I/O port space is only 64k.
• MMIO problem
  • See docs/misc/vtd.txt (Assigning devices to HVM domains)
• Dependency of multifunction devices
  • Some multifunction devices do not work when only a single function is passed to the guest.
• pci.hide option
  • If we use many pass-through devices, the pci.hide option becomes very long…

Q&A

Any questions?

This work was partly funded by the Ministry of Economy, Trade and Industry (METI) of Japan as the Secure Platform project of the Association of Super-Advanced Electronics Technologies (ASET).


Thank you