
An Oracle White Paper

February 2013

Implementing Root Domains with Oracle VM Server for SPARC

Contents

Introduction
Overview of Virtualization Technologies
Oracle VM Server for SPARC Overview
Introducing Root Domains
  I/O Domain Models
  Root Domains
Operational Model With Solaris Zones
  Zone Workloads
Operational Use Cases
  SPARC T4-4 (4 Socket)
  SPARC T4-4 (2 Socket)
  SPARC T4-2
  General Domaining Considerations
  One Domain
  Two Domains
  Four Domains
  Three Domains
  Hybrid Domains
  Populating PCIe Slots
Building Root Domains: How-To
  Preparing the Control Domain
  Building Root Domains
  Building Root Domains with virtual I/O for boot and networking


Introduction

This paper describes how to implement an increasingly useful virtualization feature known as root domains. It also describes an operational model in which root domains are used in conjunction with Solaris Zones for maximum performance and a high degree of flexibility.

Root domains are a type of logical domain characterized by the fact that they own one or more PCIe root complexes and therefore do not share I/O components with other logical domains. System resources are simply divided among the domains rather than being virtualized and shared. This type of virtualization is unique to the SPARC systems supported by Oracle VM Server for SPARC and offers a number of benefits, specifically bare-metal performance (i.e. without virtualization overhead) and lower interdependence among domains.

Zones are a feature of Oracle Solaris, and this model describes how to use them in a complementary, layered fashion together with root domains. In this model root domains form the lower layer, and Zones the second layer above the domains. The combination allows highly flexible and effective virtualization, resource management, workload isolation and mobility.

The main objective of this paper is to introduce the root domain model as one of several architectures that the Oracle VM Server for SPARC product supports. It has a number of benefits, as well as a number of restrictions. The architectural decision of how best to deploy Oracle VM Server for SPARC technology is driven by business needs and requirements, and may involve a combination or hybrid approach.


Overview of Virtualization Technologies

Type 1 Hypervisors

A hypervisor layer sits between the hardware and general-purpose operating system guests. It provides some level of platform virtualization and abstraction to these guests. It is generally said that the guests run on top of the hypervisor. This type of compute virtualization is primarily designed for servers. The way the hypervisor is implemented has a measurable impact on performance. This is discussed in more detail in the next section. Oracle VM Server for SPARC is a Type 1 hypervisor.

Type 2 Hypervisors

The hypervisor runs within a conventional operating system. Guest operating systems then run within this hypervisor. This type of compute virtualization is primarily designed for desktop users and is beyond the scope of this paper.

Oracle VM VirtualBox is a Type 2 hypervisor.

OS-based Virtualization

In this type of virtualization the general-purpose operating system itself provides a form of workload isolation within its own kernel. Only one operating system is actually running, even though it may appear that multiple instances exist simultaneously. A hypervisor is not required to implement OS-based virtualization. Solaris Zones is a mature, market-leading form of OS-based virtualization. In this paper we explain how to utilize Solaris Zones in conjunction with Oracle VM Server for SPARC in order to provide a highly performant and flexible virtualization environment.


Oracle VM Server for SPARC Overview

A traditional hypervisor virtualizes all system resources (CPU, memory and I/O). By design the hypervisor abstracts the underlying hardware platform from the guest operating systems. Most VM systems run guests in a constrained user mode, so every time the guest OS wants to execute a privileged instruction it must trap and context switch into the hypervisor, or the hypervisor must binary-rewrite the instruction. The hypervisor then executes or emulates the privileged instruction in the context of that virtual machine alone. This is necessary because a VM guest OS cannot be allowed to perform certain “privileged” operations, such as changing memory mappings or masking hardware interrupts, directly on the real hardware. Permitting virtual machines to do such operations would violate the separation of virtual machines from one another and compromise integrity. This level of virtualization often incurs a measurable performance overhead: operating systems frequently execute privileged instructions of the types mentioned above, and in a virtualized environment this can cause tens of thousands of context switches per second to emulate privileged operations, especially for I/O intensive workloads. This penalty manifests itself in two ways.

1. A portion of the resources must be dedicated to running the hypervisor (a virtualization tax).

2. Overall throughput is lower and latency is higher due to the overhead inherent in this type of virtualization.

Figure 1 - Traditional Hypervisor. The hypervisor sits directly on bare metal and owns all the hardware resources: CPU, memory and I/O. The guest domains running the workloads must involve the hypervisor for access to ALL resources.

The architecture of Oracle VM Server for SPARC, on the other hand, is modular and consists of two main components.

1. A firmware-based hypervisor, which allocates resources (CPU, memory and optionally I/O devices) directly to logical domains.

2. Software-based I/O virtualization, which runs in a service domain and provides virtual I/O services to the guest domains. This is an optional component.



Figure 2 – Oracle VM Server for SPARC Service Domain Model. The hypervisor is a feature of the SPARC T-series architecture and is controlled from the primary domain. A service domain owns all the I/O and is allocated CPU and memory for its own purposes. The guest domains running the workloads have direct access to the CPU and memory allocated to them, and I/O is provided as virtualized I/O via the service domain.

Unlike a traditional hypervisor, CPU threads and memory are allocated directly to the logical domains. The guest OS (Solaris) running within that logical domain runs on “bare metal” and can perform the aforementioned privileged operations directly, without the need to context switch into the hypervisor. This key differentiator illustrates why this logical domain model does not carry a “virtualization tax” from a CPU and memory overhead perspective as the conventional model does. Furthermore, in Oracle VM Server for SPARC using virtualized I/O is optional. It is possible to assign the I/O directly to the logical domains themselves, creating what we call the root domain model.

Figure 3 - Oracle VM Server for SPARC Root Domain Model. PCIe devices or an entire PCIe root complex can also be allocated directly to the guest domains. The guest domains running the workloads then have direct access to CPU, memory and I/O, and run with bare-metal performance.

In this model the hypervisor is simply performing the task of resource division and isolation. The hypervisor creates the logical domains and allocates resources to them but is not involved in the execution of instructions, in the allocation and access of memory, nor in performing I/O operations. It sets up the boundaries and gets out of the way. This further eliminates the “virtualization tax” from an I/O perspective.




In summary, Oracle VM Server for SPARC differs from traditional “thick” hypervisor designs in the following ways:

• Memory and CPU resources are assigned directly to logical domains and are neither virtualized nor oversubscribed (and therefore do not suffer the traditional inefficiencies of having to pass through a virtualization layer).

• Many I/O virtualization functions are offloaded to domains (known as service domains).

• I/O resources can also be assigned directly to guest domains, thus avoiding the need to virtualize I/O and the associated performance overhead.

This permits a simpler hypervisor design, which enhances reliability and security. It also reduces single points of failure by assigning responsibilities to multiple system components, which further improves reliability and security. In this architecture, management and I/O functionality are provided within domains. Oracle VM Server for SPARC does this by defining domain roles:

• Control domain - the management control point for virtualization of the server, used to configure domains and manage resources. It is the first domain to boot on power-up, is an I/O domain, and is usually a service domain as well. There can only be one control domain.

• I/O domain - has been assigned physical I/O devices: a PCIe root complex, a PCIe device, or an SR-IOV (Single-Root I/O Virtualization) function. It has native performance and functionality for the devices it owns, unmediated by any virtualization layer. There can be multiple I/O domains.

• Service domain - provides virtual network and disk devices to guest domains. There can be multiple service domains, although in practice there is usually one, sometimes two or more for redundancy. A service domain is always an I/O domain, as it must own physical I/O resources in order to virtualize them for guest domains.1

• Guest domain - a domain whose devices are all virtual rather than physical: virtual network and disk devices provided by one or more service domains. In common practice, this is where applications are run. There are usually multiple guest domains in a single system.

Domain roles may be combined; for example, a control domain can also be an I/O domain and a service domain.

1 There are exceptions: a service domain with no physical I/O could be used to provide a virtual switch for internal networking purposes, or could be configured to run the virtual console service.
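Domain roles on a running system can be inspected from the control domain. As a minimal illustration (run as root in the primary domain; output not shown here), the following ldm subcommands list the configured domains and the virtual services they offer:

# ldm list            (domains, their state, and the CPU/memory assigned to each)
# ldm list-services   (virtual console, disk and network switch services, by service domain)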

Historical Deployment Model

The typical deployment pattern has focused primarily on using a single service domain, which owns all of the physical I/O devices and provides virtual devices to multiple guest domains where the applications are run. This single service domain is also an I/O domain, and in practice it is usually also the control domain.

A variant of this model also exists where a second service domain can be added to provide redundant access to disk and network services for the guest domains. This allows guests to survive an outage (planned or unplanned) of one of the service domains, and can be used for non-disruptive rolling upgrades of service domains.

This model offers many benefits: a high degree of flexibility, the ability to run an arbitrary number of guest domains (limited by CPU and memory resources), the ability to easily clone domains (when their virtual boot disks are stored on ZFS) and, when appropriately configured, the ability to live migrate domains to another system. These benefits come at the expense of having to allocate CPU and memory resources to the service domains, thus reducing the amount of CPU and memory available to applications, plus the performance overhead associated with virtualized I/O. This expense is justified by the operational flexibility of using service domains.

This configuration is considered the “default” or baseline model due to its operational flexibility. The achieved performance is usually acceptable, and remains appropriate for many deployments. This model is well documented in numerous Oracle publications.

Hybrid models also exist where individual PCIe cards (Direct I/O) or virtual functions on a shared PCIe card (via SR-IOV technology) can be assigned to guest domains. These are beyond the scope of this paper.


Introducing Root Domains

The guest domain model has been widely and successfully used, but more configuration options are now available. Servers have become larger since the original T2000 class machines with two I/O buses, so there is often more I/O capacity that can be used for applications. Increased T-series server capacity, particularly in terms of single thread performance, has made it attractive to run more vertical applications, such as databases, with higher resource requirements than the "light" applications originally deployed on earlier generations of T-series. This has made it more attractive to run applications directly in I/O domains so they can get native, non-virtualized I/O performance.

This model is leveraged by the SPARC SuperCluster engineered system, announced in late 2011 at Oracle OpenWorld. In SPARC SuperCluster, root domains are used for high performance applications, with native I/O performance for disk and network and optimized access to the InfiniBand fabric.

I/O Domain Models

The I/O architecture of the SPARC T-series servers is a PCI Express (PCIe) system which consists of one or more PCIe root complex devices which connect the processor and memory subsystem to the PCIe slots via PCIe switches. In the SPARC T4 and earlier SPARC T-series systems, there is one PCIe root complex per socket. Each PCIe root complex is associated with a specific set of PCIe slots. This is a static mapping that varies depending on the number of populated CPU sockets. The diagrams in the Operational Use Cases section show the mapping of the PCIe root complexes to the I/O devices.

Using Oracle VM Server for SPARC, there are three ways in which I/O devices can be assigned to an I/O domain:

• Whole PCIe root complex – the I/O domain owns the entire PCIe root complex and all PCIe slots and cards assigned to it. This is the model most commonly implemented for a service domain. It is also the model discussed in this paper for root domains, the key difference being that there are multiple root domains and no service domain, and therefore no shared I/O resources.2

• Direct I/O – the I/O domain owns just one PCIe card. However, the domain does not own the actual PCIe root complex (PCIe bus) that the PCIe card sits on. In order to implement this model the hypervisor emulates a PCIe switch and its association with the PCIe card, and this structure is presented to the guest domain.

• SR-IOV – in this model a single PCIe card owned by one PCIe root complex is made to physically appear in multiple domains simultaneously. However, the virtualization is performed by virtual functions in the PCIe card itself, not by the hypervisor or a service domain, hence the name.

2 Certain resources such as console interfaces or management network interfaces can still be virtualized via a small service domain function. However, these are not on the data path for performance.
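To make the three models concrete, the following ldm command sketches show how each style of device is assigned. The domain names and device pseudonyms are hypothetical, and SR-IOV additionally requires supported cards and firmware:

(whole PCIe root complex: the domain receives the bus and everything behind it)
# ldm add-io pci@500 ldom1

(Direct I/O: a single PCIe slot, referenced by its pseudonym)
# ldm add-io /SYS/MB/PCIE5 ldom1

(SR-IOV: create a virtual function on a physical function, then assign it)
# ldm create-vf /SYS/MB/NET0/IOVNET.PF0
# ldm add-io /SYS/MB/NET0/IOVNET.PF0.VF0 ldom1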


In the Direct I/O and SR-IOV models the guest will have near-native I/O performance; however, its availability is dependent upon the I/O domain that actually owns the PCIe root complex in which the PCIe device physically resides. For the remainder of this paper we focus exclusively on a very specific implementation of the domain model, which we refer to as root domains.

Root Domains

So far we’ve discussed I/O domains primarily in the context of them also being service domains, where the sole purpose of owning I/O is to virtualize it for guest domains. However, the focus of this paper is the concept of an I/O domain hosting one or more applications directly, without relying on a service domain. Specifically, domain I/O boundaries are defined exactly by the scope of one or more PCIe root complexes. The other two methods (Direct I/O and SR-IOV) are not utilized in this model. This offers a number of key advantages over the other models available in the Oracle VM Server for SPARC technology, and in particular a distinct advantage over all other hypervisors, which use the traditional “thick” model of providing all services to guest VMs through software-based virtualization (at the expense of performance):

• Performance: All I/O is native (i.e. bare metal) with no virtualization overhead.

• Simplicity: The domain and its guest operating system own the entire PCIe root complex. There is no need to virtualize any I/O. Configuring this type of domain is significantly simpler than with the service domain model.

• I/O fault isolation: A root domain does not share I/O with any other domain. Therefore the failure of a PCIe card (e.g. a NIC or HBA) impacts only that domain. This is in contrast to the service domain, Direct I/O or SR-IOV models, where all domains that share those components are impacted by their failure.

• Improved security: There are fewer shared components or management points.

The number of root domains is limited by the number of PCIe root complexes available in the platform. With the SPARC T4 processor there is one PCIe root complex per SPARC T4 socket, so a SPARC T4-2 system has two PCIe root complexes and a SPARC T4-4 system has four. Thus the maximum number of root domains possible on a SPARC T4-2 and a SPARC T4-4 system is two and four respectively.

It is important to note that root domains are not the ideal solution for all workloads and use cases. In practice, a solution that comprises some systems with service domains as well as some systems with root domains may be appropriate. In fact, the same physical server could consist of two root domains running applications and two service domains providing services to a number of fully virtualized guest domains. This is discussed in more detail in the operational model section. Root domains have a number of restrictions that should be carefully considered:

• They are less flexible. With the guest domain model the number of domains can be changed dynamically, usually without impacting running workloads. Changing the I/O allocation of a root domain requires downtime; however, CPU and memory can still be allocated dynamically.

• The number of domains available is limited to the number of PCIe root complexes. On a T4-4 this means one, two, three or four root domains. Additional workload isolation can be performed using Solaris Zones, as discussed later in the operational model section.

• They cannot be live migrated. In general, any domain with direct access to physical I/O cannot be live migrated.

• They require more planning, particularly during the purchase phase, to ensure there are enough NICs and HBAs, as these components are not shared across domains.

• Some of the root domains may not have access to local disks or networks; these will need to be provided via PCIe cards, or by configuring virtual I/O devices.

Best Practices For Availability

While root domains provide a level of fault isolation, they do not provide full isolation against all failure modes. Indeed, the primary objective in creating root domains is performance, with a secondary benefit of improved isolation. Application or service availability should always be architected at a higher level. Best practices for high availability, such as clustering across compute nodes and remote replication for disaster recovery, should always be applied to the degree the business requirements warrant. For example, HA should not be implemented with both nodes of the cluster located within the same physical system. However, multiple tiers (e.g. web, app, db) can be placed in different domains and then replicated on other nodes using common clustering technology or horizontal redundancy.

Furthermore, the lack of live migration capability with root domains should not be viewed as an impediment to availability. Live migration does not solve availability problems: a domain that has failed cannot be live migrated. It’s not live anymore. Indeed, many of the objectives seen as desirable features of live migration (e.g. workload migration for service windows) can be obtained by architecting availability at a higher level so that taking one node down does not interrupt the actual service. Cold workload migration can then be implemented by booting domains from a SAN, or by the use of technologies such as Solaris Cluster, Oracle WebLogic Server, or Oracle RAC that provide high availability.

In the proposed operational model, Zones become the migration entities (boundaries). Zone shutdown and restart time is significantly faster than for the main Solaris instance, which further mitigates the need for live migration in this scenario. Oracle has published a number of papers that discuss these concepts further and specific to individual workloads; refer to the Maximum Availability Architecture and Optimized Solutions sections of OTN (the Oracle Technology Network).


Operational Model With Solaris Zones

Although the maximum number of root domains is limited to two on the T4-2 and four on the T4-4, this does not imply a limit on the number of workloads possible. Solaris Zones provide a very convenient, mobile and manageable workload containment mechanism.

Zone Workloads

Using Solaris Zones as workload containers inside root domains provides a number of benefits:

• Workload isolation. Applications can each have their own virtual Solaris environment.

• Administrative isolation. Each Zone can have a different administrator.

• Security isolation. Each Zone has its own security context, and compromising one Zone does not imply that other Zones are also compromised.

• Resource control. Solaris has very robust resource management capabilities that fit well with Zones. These include CPU capping (which can be used for Oracle license boundaries), the Fair Share Scheduler, memory capping, and network bandwidth monitoring and management.

• Workload mobility. Zones can be easily copied, cloned and moved, usually without the need to modify the hosted application. More importantly, Zone startup and shutdown is significantly quicker than traditional VM migrations.

We refer to this as “Zone Workloads” (ZW).

Figure 4 - Oracle VM Server for SPARC with Zone Workloads

The use of zone workloads on top of root domains is considered a best practice, even if you only intend to run a single workload. There is no performance or financial cost to running zones in the domain, and zones provide operational flexibility: they can be dynamically resized or migrated to another platform. However, applications can be installed directly within the root domains if so desired.
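As a minimal sketch of this layering (the zone name, path and CPU cap are hypothetical), a zone workload can be created inside a root domain with the standard Solaris 11 zone tooling, including a resource cap of the kind mentioned above:

# zonecfg -z appzone1
zonecfg:appzone1> create
zonecfg:appzone1> set zonepath=/zones/appzone1
zonecfg:appzone1> set autoboot=true
zonecfg:appzone1> add capped-cpu
zonecfg:appzone1:capped-cpu> set ncpus=8
zonecfg:appzone1:capped-cpu> end
zonecfg:appzone1> commit
zonecfg:appzone1> exit
# zoneadm -z appzone1 install
# zoneadm -z appzone1 boot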


Using guest domains for Test and Development

For development systems it is often more desirable to implement domains using the service domain model. This affords maximum flexibility where performance is generally not a concern. For maximum flexibility place the application or workload within a Zone inside the domain, as this becomes the entity that is promoted through the development and test lifecycle. Functional test can be implemented the same way. Zones can be copied, cloned and/or moved from development to test. Zones can further be cloned multiple times to test in parallel if desired.

Migrating Zones to the Production environment

When maximum performance is desired, production and pre-production can be implemented with root domains. Ideally, pre-production will be on hardware identical to the production system and with an identical root domain configuration. Workloads can then be migrated from development or functional test guest domains to the root domains that exist in the production and pre-production environments.
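A minimal sketch of such a cold migration, assuming Solaris 11, a zone on its own ZFS dataset, and a reachable target domain named proddom (all names hypothetical):

On the source domain:
# zoneadm -z appzone1 shutdown
# zoneadm -z appzone1 detach
# zfs snapshot -r rpool/zones/appzone1@move
# zfs send -R rpool/zones/appzone1@move | ssh proddom zfs receive rpool/zones/appzone1

On the target domain:
# zonecfg -z appzone1 create -a /zones/appzone1
# zoneadm -z appzone1 attach -u
# zoneadm -z appzone1 boot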


Operational Use Cases

When considering the use of root domains, the main limiting factor is the number of PCIe root complexes that are available in the physical server. In the case of the T4 series of servers there is one root complex per socket, therefore a T4-4 can have up to four root domains, and a T4-2 only two. One of the root domains will always serve as the control domain, and in the case of a hybrid model, the control domain could also function as a service domain for the virtualized guest domains if required.

Each root complex owns a number of specific I/O slots on the server, as well as specific on-board I/O devices. This mapping is hardcoded, and has a bearing on physical card placement and on which domains are capable of accessing on-board devices. The mapping of the PCIe root complexes to I/O devices for the relevant T4 platforms is shown below. This information will be used to define the best practices for creating the root domains in the next section.

In all the diagrams below, the solid lines represent connections to the pci@1 port of the root complex or the niu connection, and the dotted lines represent the connections to the pci@2 port. Additionally, whilst each 10GbE port is shown in the colour of the socket it is connected to, it can be individually assigned to any domain.

SPARC T4-4 (4 Socket)


The four possible root complexes are defined by the top level PCI path, and are as follows:

1. RC0: pci@400, green

2. RC1: pci@500, blue

3. RC2: pci@600, purple

4. RC3: pci@700, orange

It should be noted that the allocation of disks and on-board 1GbE ports is not spread evenly across all the root complexes: RC0 has half the disks and two 1GbE ports, RC2 has two 1GbE ports, and RC3 has the other half of the disks. This needs to be taken into consideration when creating root domains, as neither RC1 nor RC2 has access to local boot disks, and neither RC1 nor RC3 has access to local 1GbE ports. This is easily overcome by providing PCIe cards with this functionality, or by having the other domains provide virtual I/O services.
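The actual mapping on a given server can be confirmed from the control domain: ldm list-io prints each PCIe bus (e.g. pci@400), the slots and on-board devices behind it, and the domain each is currently assigned to:

# ldm list-io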

SPARC T4-4 (2 Socket)

When a T4-4 only has 2 CPUs, the number of root complexes is also reduced to 2.

The two possible root complexes are defined by the top level PCI path and are as follows:

1. RC0: pci@400, green

2. RC1: pci@500, blue

As per the 4 socket case, there is some asymmetry here as well: only RC0 has access to the onboard 1GbE ports, but the disks are split between the two root complexes.


SPARC T4-2

The T4-2 has 2 root complexes, as shown:

1. RC0: pci@400, green

2. RC1: pci@500, blue

In the case of the T4-2, while both root complexes have access to the onboard 1GbE ports, only the RC0 domain has visibility of the on-board disks.


General Domaining Considerations

As can be seen from the above diagrams, in all cases the RC0 root complex always has access to some internal disks and some onboard 1GbE Ethernet. For this reason, it is always recommended that the server be initially installed using only the I/O provided by the RC0 root complex, and that RC0 always remain assigned to the first, and therefore control, domain. This allows the remaining root complexes to be de-assigned from the control domain and assigned to the new root domains being created.

PCIe cards may need to be installed in the other root domains to provide access to external disks for booting, and for networking.

Given the use cases for these domains, it is expected that they will generally all be provisioned with HBAs and additional network cards, as high performance I/O is the main reason for creating these domains in the first place.

It should also be noted that this discussion focuses solely on the allocation of the PCIe root complexes. CPU and memory can be freely and dynamically allocated to any domain as per the usual Oracle VM Server for SPARC rules.3

3 When performance is critical, it is recommended that CPU be allocated in a minimum of one-core increments and, for optimal performance, on full socket boundaries.

One Domain

It is of course possible to use any of the T4 platforms with a single domain, and not make use of any of the Oracle VM Server for SPARC functionality. In this case, the domain simply has access to all attached I/O. The diagrams above can be used to ensure that the I/O cards are balanced evenly throughout the system.

Two Domains

This model creates 2 independent root domains by assigning one or more root complexes to a second domain, and leaving the remaining root complexes under the control of the control domain.

SPARC T4-2

In the simple case of the SPARC T4-2 servers there are only two root complexes available: RC0 will be attached to the control domain, and RC1 will be assigned to the new second root domain. As can be seen from the diagrams above, the second T4-2 domain will not have access to any on-board disks, and therefore needs to be given access to disk. This can be achieved either via an HBA in one of the odd-numbered slots, or by providing virtual disk access from the primary root domain. In this case, it is preferable to use directly attached storage for this domain, as the virtual disk route creates a dependency on the control domain, which is undesirable in this model.



SPARC T4-4 (2 Sockets)

Similarly, the 2 socket configuration of the T4-4 only has two root complexes: RC0 will be attached to the control domain, and RC1 assigned to the new second root domain. The second domain, while it has access to disk, does not have access to the on-board 1GbE ports. Again, in this case, the onboard 10GbE ports can be used, or an appropriate network card can be installed into one of the EM slots 8-15.

SPARC T4-4 (4 Sockets)

With four root complexes to deal with, there are a number of options which can be considered. In the simplest case, assigning RC0 and RC1 to the control domain, and RC2 and RC3 to the second domain, creates two fully independent domains, each with access to both internal disk and onboard 1GbE ports.

If an asymmetric I/O configuration is required, it is also possible to create a pair of domains with RC0 assigned to the control domain, and RC1, RC2 and RC3 assigned to the second domain. This still allows both domains access to the internal disks and 1GbE networks.

Any other combination will create a configuration which leaves one of the domains without access to either onboard disk or the onboard 1GbE ports, and should therefore be avoided.

Four Domains

The four root domain model is only applicable to the SPARC T4-4 (4 socket) configuration. Each of the four PCIe root complexes will be assigned to a domain: RC0 will be assigned to the control domain, and RC1, RC2 and RC3 will be assigned to the remaining three domains.

In this configuration, two of the domains will not have access to the internal disks, and an alternative way of booting them must be provided. The simplest way of dealing with this is, as before, placing an HBA into the relevant EM slot and booting from there. It is also possible to provide a pair of virtual disks from the RC0 and RC3 domains. This provides redundant boot disk access, and the dependent domain will only require one or the other of the service domains to be active for uninterrupted operation. This model does add operational complexity, and native access to boot disks is preferred (see the sketch below).

Similarly, two of the domains will not have access to the 1GbE ports on the server. In general, this is not viewed as an issue, as all domains still have native on-board 10GbE access, but if 1GbE network access is required, this can either be done via an express module card or, if necessary, by using virtual networking from the other domains.
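As a rough sketch of the redundant virtual boot disk arrangement (domain, volume and backend names are hypothetical, and both exporting domains must see the same physical LUN), the mpgroup option groups the two paths into a single multipathed virtual disk:

# ldm add-vds primary-vds0 primary
# ldm add-vds rc3-vds0 ldom3
# ldm add-vdsdev mpgroup=boot1 /dev/dsk/c2t0d0s2 ldom1boot@primary-vds0
# ldm add-vdsdev mpgroup=boot1 /dev/dsk/c2t0d0s2 ldom1boot@rc3-vds0
# ldm add-vdisk bootdisk ldom1boot@primary-vds0 ldom1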

Three Domains

The three domain case is similar to the four domain model, except that the domain that would have been assigned RC1 is not created, and RC1 is instead assigned to any of the other domains. RC1 is chosen in this case because it is the only root complex with access to neither the internal disks nor the internal 1GbE networks, which allows this model to be treated just like the four domain model.


Hybrid Domains

In any of the above models, nothing precludes you from using one or two of the domains as control and possibly I/O domains for the provision of normal guest domains, while leaving the remaining domains to act as pure root domains.

Populating PCIe Slots

The quantity and type of PCIe cards that will be used will vary depending on the requirements of the solution. However, there are some general rules which should be followed to ensure that I/O is spread among the internal PCIe switches to achieve both performance and redundancy where possible.

Given the use cases that demand this type of configuration, it is expected that each PCIe root complex will be populated with at least one HBA and one NIC. This allows direct access from the domain to SAN based disks and networking.

The diagrams above can be used to ensure that optimal card placements are made.


Building Root Domains: How-To

Creating root domains is conceptually no different from creating standard I/O domains, except that we simply don’t create virtual services in the I/O domains; we run them as pure bare-metal OS instances. When creating these domains, the initial control domain must be configured to use only the root complex that it will be left with after all the other domains are created. The remaining root complexes can then be allocated to the new root domains. The new root domains will then need to have Solaris installed, using either their directly attached disks and networks, or virtual disks and networks provided by the control domain.

In the example below, it is assumed that the PCIe slots have been populated with HBA and NIC devices, so that the newly created domains have access to boot disks and have sufficient network ports for connectivity to the datacenter infrastructure: e.g. production, backup and management networks.4

It is of course possible to provide virtual devices to the domains using the usual methods, so that access to disks and networks without high performance requirements can still be provided, but this removes some of the isolation and simplicity benefits that this model provides. The full details are provided in the Oracle VM Server for SPARC documentation, available here:

http://www.oracle.com/technetwork/documentation/vm-sparc-194287.html

Preparing the Control Domain

The first domain on the SPARC T-series server is always the control, or primary, domain. In the context of the root domain model, its main purpose is to host the LDoms5 manager. Under the root domain model, the control domain does not typically provide any virtual services other than the virtual console service. For this reason, it can usually be treated as a domain that can be used to run applications, provided the security concerns of access to the ldm commands have been satisfied. The easiest way to accomplish this is to run applications within zones on the control domain.

These instructions work on both Solaris 10 and Solaris 11 control domains, although the use of Solaris 11 is preferred. Ensure that you are running the latest SPARC T4 firmware and the latest updates/patches for the version of Solaris you are running. In the case of a Solaris 10 domain, you will need to install the Oracle VM Server for SPARC software. The control domain will ALWAYS be left with the pci@400 root complex at a minimum. Remove the root complexes that will be used for the other root domains as part of the initial configuration.

4 Console access to the domains is provided via the control domain, which should be connected to the management network. It may also be worth attaching the root domains to the management network to allow ssh access and for OS installation and updates.

5 LDoms stands for “Logical Domains” and was the product name superseded by Oracle VM Server for SPARC. However, the naming convention of the commands and processes has not changed.


The following steps are examples; the values for CPU and memory allocation and the root complexes may need to be changed to reflect the actual configuration.

Steps to perform the initial configuration of the control domain:

# ldm add-vcc port-range=5000-5100 primary-vcc0 primary
# svcadm enable svc:/ldoms/vntsd:default
# ldm start-reconf primary
# ldm set-vcpu 64 primary
# ldm set-memory 256G primary
# ldm remove-io pci@500 primary
(repeat for pci@600 and pci@700 as required)
# ldm add-spconfig initial
# reboot
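After the reboot, it is worth confirming that the saved configuration is active and that the removed buses are no longer bound to the primary domain:

# ldm list-spconfig
# ldm list-io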

Building Root Domains

Each root domain can now be created as required. The standard domain creation steps are used, except that we add a root complex to each domain as required. The following assumes that the root complex associated with the root domain contains disks suitable for installing Solaris, as well as network access:

Steps to create a root domain:

# ldm create ldom1
# ldm set-vcpu 64 ldom1
# ldm set-memory 256G ldom1
# ldm add-io pci@500 ldom1

Repeat as necessary for all other domains, then save the logical domain configuration to the service processor:

# ldm add-spconfig domains

(yes, it really is that simple)
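Before Solaris can be installed, each new domain must be bound and started; the domain console is then reachable through the virtual console service configured earlier (the port, assigned from the 5000-5100 range, is shown in the CONS column of ldm list):

# ldm bind ldom1
# ldm start ldom1
# telnet localhost 5000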

Building Root Domains with virtual I/O for boot and networking

It is possible to configure the root domains so that they use virtual disks and networks provided by other root domains that have direct access to internal disks and networks. In this case, it is simply a matter of configuring virtual disk and network services and adding the corresponding virtual devices to the logical domain configurations above.
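As a minimal sketch (service, volume and backend names are hypothetical, and the ZFS volume is assumed to have been created beforehand with zfs create -V), the control domain can export a virtual boot disk and a virtual network to a root domain as follows:

# ldm add-vds primary-vds0 primary
# ldm add-vsw net-dev=net0 primary-vsw0 primary
# ldm add-vdsdev /dev/zvol/dsk/rpool/ldom2boot vol2@primary-vds0
# ldm add-vdisk bootdisk vol2@primary-vds0 ldom2
# ldm add-vnet vnet0 primary-vsw0 ldom2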

For more information about Oracle's virtualization solutions, visit oracle.com/virtualization.


Implementing Root Domains with Oracle VM Server for SPARC

February 2013

Authors: Mikel Manitius, Michael Ramchand, Jeff Savit

Contributing Authors: Benoit Chaffanjon

Oracle Corporation

World Headquarters

500 Oracle Parkway

Redwood Shores, CA 94065

U.S.A.

Worldwide Inquiries:

Phone: +1.650.506.7000

Fax: +1.650.506.7200

oracle.com

Copyright © 2013, Oracle and/or its affiliates. All rights reserved.

This document is provided for information purposes only, and the contents hereof are subject to change without notice. This document is not warranted to be error-free, nor subject to any other warranties or conditions, whether expressed orally or implied in law, including implied warranties and conditions of merchantability or fitness for a particular purpose. We specifically disclaim any liability with respect to this document, and no contractual obligations are formed either directly or indirectly by this document. This document may not be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without our prior written permission.

Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. AMD, Opteron, the AMD logo, and the AMD Opteron logo are trademarks or registered trademarks of Advanced Micro Devices. UNIX is a registered trademark of The Open Group. 0113