Research Computing with Newton

Gerald Ragghianti

Nov. 12, 2010

What is the Newton Program?

• Community collaboration
• Supporting High Performance Computing
  • Infrastructure management
  • Consultation
  • Training
• Research objectives
  • Increase effectiveness
  • Reduce duplication
  • Enhance capability

Figure: the layers of the Newton service stack:
• User applications
• Computational environment (OS, cluster management, software)
• Computing hardware
• Computing infrastructure (space, network, power, cooling)
• Community organization (policies, membership)

Service features

• Technical
  • HPC cluster
  • High-performance storage
  • Remote data backup (time machine)
  • Computing environment management
  • Resource management (Grid Engine)

• Professional
  • Computing support
  • Technical advising
  • Consulting (proposal development)

The Newton cluster

• Linux-based distributed memory compute cluster
  • 295 computers
  • 2500 CPUs
  • 5 TB RAM
  • 40 Gbit/s InfiniBand
  • 80 TB storage

Figure: Newton cluster architecture (head node, interactive node, compute nodes, storage servers, and Lustre storage; external, InfiniBand, and Ethernet networks).
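To give a rough sense of scale, the headline totals on this slide can be turned into per-node and per-CPU averages. This is a minimal sketch, not a node specification: it assumes decimal units (1 TB = 1000 GB), takes the slide's "CPUs" count at face value, and treats every node as average, although the real nodes were likely not identical.

```python
# Headline totals from the slide (Newton cluster, Nov. 2010).
NODES = 295
CPUS = 2500
RAM_TB = 5


def per_unit_averages(nodes, cpus, ram_tb):
    """Return rough averages. ASSUMPTION: decimal units, 1 TB = 1000 GB."""
    ram_gb = ram_tb * 1000
    return {
        "cpus_per_node": cpus / nodes,
        "ram_gb_per_node": ram_gb / nodes,
        "ram_gb_per_cpu": ram_gb / cpus,
    }


if __name__ == "__main__":
    # Print each average rounded to two decimal places.
    for name, value in per_unit_averages(NODES, CPUS, RAM_TB).items():
        print(f"{name}: {value:.2f}")
```

Run as a script, this prints about 8.47 CPUs per node, 16.95 GB of RAM per node, and 2.00 GB of RAM per CPU.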

Advantages of working on Newton

• Teams of systems experts at no additional cost
  • System administrators
  • Research consultants
  • Network engineers
  • Service desk
  • Operations personnel (24-hour)
• Larger pool of potentially available resources
• Guaranteed future support
• No worries about:
  • Data centers, cooling, and electrical power
  • Physical and software security
  • Operating systems and software maintenance
  • Hardware acquisition, installation, and failures
  • Data integrity
• Lets you concentrate on the interesting stuff

What an investment gets me

• Contractual guarantee of service
• CPU investment:
  • We purchase compute nodes on your behalf
  • Highest-tier (Tier 1) access to the job queues
    • Tier 1 membership for three years after the investment
  • Guaranteed CPU allocations (ownership)
    • Proportional to the investment size
    • Equal to or greater than what the same investment would buy as an independent cluster

• Storage investment:
  • We purchase storage or allocate it from previously purchased storage
  • Fixed amount of dedicated storage for a 5-year term
  • Multiple performance levels available
  • Optional daily backup of data

How CPU allocations work

• CPU allocation is proportional to the size of the investment
  • As the cluster gets larger, this may go up
  • Determines how many CPUs you may use in the queues
• Your group will become Tier 1 priority
  • Always higher priority than Tiers 2 and 3
• Job queues (based on job runtime)
  • Long (> 24 hours): use up to the CPU allocation
  • Medium (< 24 hours): use up to 2x the CPU allocation
  • Short (< 2 hours): no limits
  • High priority means lower job wait time (median job wait is 14 s)
• PI can specify sub-allocations for group members
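The queue rules above can be read as a small lookup from a job's expected runtime to a queue and a CPU cap. The sketch below encodes them for a Tier 1 group. It is an illustration under stated assumptions, not Newton's scheduler configuration: the lowercase queue names are hypothetical labels taken from the slide, and the handling of jobs lasting exactly 2 or 24 hours (and placing each job in the shortest queue it fits) is assumed, since the slide does not say. On Grid Engine systems the expected runtime is commonly requested with the `h_rt` resource (for example `qsub -l h_rt=12:00:00`), but the slide does not state how Newton routes jobs.

```python
from typing import Optional, Tuple

# Boundaries from the slide: Short < 2 h, Medium < 24 h, Long > 24 h.
# ASSUMPTION: exactly 2 h counts as Medium and exactly 24 h as Long,
# and a job goes to the shortest queue it fits in.
SHORT_MAX_HOURS = 2
MEDIUM_MAX_HOURS = 24


def queue_for_job(runtime_hours: float,
                  cpu_allocation: int) -> Tuple[str, Optional[int]]:
    """Return (hypothetical queue name, CPU cap) for a Tier 1 group.

    A cap of None means the queue imposes no CPU limit.
    """
    if runtime_hours <= 0:
        raise ValueError("runtime must be positive")
    if runtime_hours < SHORT_MAX_HOURS:
        return "short", None                 # Short: no limits
    if runtime_hours < MEDIUM_MAX_HOURS:
        return "medium", 2 * cpu_allocation  # Medium: up to 2x allocation
    return "long", cpu_allocation            # Long: up to the allocation


if __name__ == "__main__":
    print(queue_for_job(1.5, 64))  # ('short', None)
    print(queue_for_job(12, 64))   # ('medium', 128)
    print(queue_for_job(48, 64))   # ('long', 64)
```

The example calls show why the Medium queue is attractive: a group with a 64-CPU allocation could run a 12-hour job on up to 128 CPUs, while a multi-day job is capped at the allocation itself.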

More Information

• Newton Program website: http://newton.utk.edu/
  • Program policies
  • Documentation
  • Meetings / support / consulting schedule
• HPC Mailing List: [email protected]