
Managing a Scientific Computing Facility with OpenNebula

Sara Vallero, on behalf of the INFN-Torino computing team

OpenNebula Conf - December 2-4, 2014 - Berlin

The present work is partially funded under contract 20108T4XTM of Programmi di Ricerca Scientifica di Rilevante Interesse Nazionale (Italy).



The INFN Torino Computing Centre

• Storage resources: 1600 TB (gross) in total
• Computational resources: 69 hypervisors (KVM), 1200 job slots, 200 virtual machines
• LAN/WAN: 10 Gbps links
• The cloud project started in 2009

INFN, the Italian National Institute for Nuclear Physics, carries out fundamental-physics research through units in several major Italian cities; the computing centre described here is run by the Torino unit.

Stakeholders:
• WLCG grid Tier-2, primarily for the ALICE experiment at CERN
• Grid Tier-2 for the BESIII experiment at IHEP, Beijing
• Computing for upcoming experiments: PANDA at FAIR (Darmstadt) and Belle II at KEK (Tsukuba)
• Virtual Analysis Facility for ALICE (interactive analysis, elasticity)
• Medical Image Processing (local research group)
• Theory (local research group)
• Virtual farms on-demand

OpenNebula Conf - December 2-4, 2014 - Berlin S.Vallero

Two clusters for different VM classes

Services-class VMs ("pets"):
• provide critical services
• inbound/outbound connectivity
• live migration
• server-class hardware
• no particular local-disk I/O requirements
• shared image repository
• resiliency-optimized file system for the shared system disks (RAID 1)

Workers-class VMs ("cattle"):
• computational workforce (e.g. grid worker nodes)
• private IP only
• high storage I/O performance
• lower-class hardware
• locally cached image repository for fast start-up
• performance-optimized file system

Storage layout: the storage servers host the image-repository datastore; a Gluster replicated volume provides the shared datastore for running VMs, while the workers keep a local cache of the image repository for their own datastore.
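Purely as an illustration of how such a class split can be expressed in OpenNebula, the sketch below allocates a workers-class VM over the XML-RPC API and pins it to a cluster assumed to be named "workers"; the endpoint, credentials, image ID and network name are placeholders, not details given in the talk.

import xmlrpc.client

# All connection details and IDs below are placeholders (not from the talk).
ONE_ENDPOINT = "http://one-frontend.example.org:2633/RPC2"
SESSION = "oneadmin:password"

# Minimal workers-class template: private network only, image taken from the
# locally cached repository, and a scheduler requirement that keeps the VM
# on the assumed "workers" cluster.
TEMPLATE = """
NAME   = "grid-wn-01"
CPU    = 1
VCPU   = 4
MEMORY = 8192
DISK   = [ IMAGE_ID = 42 ]
NIC    = [ NETWORK = "private" ]
SCHED_REQUIREMENTS = "CLUSTER = \\"workers\\""
"""

server = xmlrpc.client.ServerProxy(ONE_ENDPOINT)
ok, result, _ = server.one.vm.allocate(SESSION, TEMPLATE, False)
if ok:
    print("created VM", result)
else:
    print("allocation failed:", result)

A services-class VM would differ mainly in the network and the scheduler requirement (public connectivity and placement on the services cluster with its shared Gluster datastore).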


Current and planned activities


1. Toolkit for virtual-farm on-demand provisioning
• virtual routers (OpenWRT appliances)
• elastic public IPs
• iSCSI datastore for persistent disk space
• EC2 interface
• CloudInit contextualisation
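The EC2 interface and CloudInit contextualisation suggest a client-side view like the sketch below, which requests a farm node from an EC2-compatible endpoint and hands it cloud-init user-data. The endpoint, port, keys and image ID are invented for illustration, and boto is used only as a convenient EC2 client.

import boto
from boto.ec2.regioninfo import RegionInfo

# Endpoint, port, keys and image ID are placeholders (not from the talk).
region = RegionInfo(name="opennebula", endpoint="cloud.example.org")
conn = boto.connect_ec2(
    aws_access_key_id="MY_ACCESS_KEY",
    aws_secret_access_key="MY_SECRET_KEY",
    is_secure=False,
    region=region,
    port=4567,
    path="/",
)

# cloud-init user-data used to contextualise the node at first boot.
USER_DATA = """#cloud-config
runcmd:
  - [ sh, -c, "echo 'virtual farm node ready' > /etc/motd" ]
"""

reservation = conn.run_instances(
    "ami-00000001",
    instance_type="m1.small",
    user_data=USER_DATA,
)
print("launched:", [inst.id for inst in reservation.instances])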

2. Elasticity
• automatic reallocation of VMs according to the application's needs (wherever appropriate)
• caveat: this only works in the infinite-resources approximation, whereas we usually run at saturation
• in place only for the Virtual Analysis Facility so far
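The talk does not describe how the elasticity loop is implemented; the sketch below is only a toy controller under invented assumptions (a pending_jobs() probe, a registered worker template with ID 7, fixed thresholds) that scales a pool of workers up and down through the XML-RPC API.

import time
import xmlrpc.client

ONE_ENDPOINT = "http://one-frontend.example.org:2633/RPC2"  # placeholder
SESSION = "oneadmin:password"                                # placeholder
WORKER_TEMPLATE_ID = 7                                       # placeholder

server = xmlrpc.client.ServerProxy(ONE_ENDPOINT)
workers = []  # IDs of the VMs this controller has started

def pending_jobs():
    """Placeholder for a real probe of the application's queue."""
    return 0

while True:
    backlog = pending_jobs()
    if backlog > 10:
        # Scale up: start one more worker from the registered template.
        ok, vm_id, _ = server.one.template.instantiate(
            SESSION, WORKER_TEMPLATE_ID, "elastic-worker", False, "")
        if ok:
            workers.append(vm_id)
    elif backlog == 0 and workers:
        # Scale down: retire a worker that is no longer needed.
        server.one.vm.action(SESSION, "shutdown", workers.pop())
    time.sleep(60)

In practice the saturation caveat above still applies: with no free resources, scale-up requests simply stay pending in the scheduler.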

3. National federated cloud for scientific computing
• upcoming INFN-wide project, mostly based on OpenStack
• need to interoperate with OpenStack-based geographical services (e.g. Keystone)
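Interoperating with Keystone mostly means being able to obtain (and later validate) tokens from the federation's identity service. A minimal sketch against a Keystone v2.0-style endpoint follows; the URL, tenant and credentials are placeholders.

import json
import urllib.request

KEYSTONE_URL = "https://keystone.example.org:5000/v2.0/tokens"  # placeholder

payload = {
    "auth": {
        "tenantName": "demo",
        "passwordCredentials": {"username": "demo", "password": "secret"},
    }
}

req = urllib.request.Request(
    KEYSTONE_URL,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    token = json.load(resp)["access"]["token"]["id"]

# The token would then be sent as X-Auth-Token to other OpenStack-based
# geographical services on behalf of the OpenNebula site.
print("obtained Keystone token:", token[:8] + "...")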

4. Monitoring as a service
• based on the ELK stack (Elasticsearch, Logstash, Kibana)
• uniform monitoring interface for applications and infrastructure
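As a rough sketch of the "monitoring as a service" idea, a probe could index an event document directly into Elasticsearch as below; the host, index and fields are invented, and in the ELK setup such events would normally flow through Logstash instead.

import json
import urllib.request
from datetime import datetime, timezone

# Placeholder Elasticsearch endpoint and index (not from the talk).
ES_URL = "http://monitoring.example.org:9200/infra-metrics/event"

doc = {
    "@timestamp": datetime.now(timezone.utc).isoformat(),
    "host": "hypervisor-01",
    "service": "opennebula",
    "metric": "running_vms",
    "value": 200,
}

req = urllib.request.Request(
    ES_URL,
    data=json.dumps(doc).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print("indexed, HTTP status:", resp.status)

Kibana dashboards on top of such indices then provide the uniform view across applications and infrastructure.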

Also planned: a move to the new OpenNebula tools.