


GEORGIA ADVANCED COMPUTING RESOURCE CENTER (GACRC) RESOURCES

Enterprise Information Technology Services
Michael S. Lucas
December 5th, 2012


GACRC STAFF

Greg Derda - IT Manager/Bioinformatics Consultant
Yecheng Huang - Bioinformatics Consultant
Shan-Ho Tsai - Computational Physics and High Performance Computing Consultant
Paul Brunk - Principal Unix Systems Administrator
Curtis Combs - Principal Storage Administrator
Jason Stone - Servers Systems Administrator


GACRC HARDWARE RESOURCES

HPC Computers
• 230 compute nodes (2600 compute cores), 32 of them with InfiniBand connectivity. Job submission for all nodes is managed by Univa Grid Engine, a variant of the Sun Grid Engine queue management system (zcluster).
• An IBM p655 AIX cluster with a total of 32 eight-way nodes available to general users (pcluster).
• For large-memory jobs, there are:
  – Four 8-core, 192GB high-memory compute nodes
  – Ten 12-core, 256GB high-memory compute nodes
  – Two 32-core, 512GB high-memory compute nodes
• Six 32-core, 64GB GPU control nodes.
• One nVidia Tesla S1070 with four GPU cards (960 GPU cores) for programs written to use this architecture.
• One nVidia 2075 GPU processor (448 GPU cores).
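The node counts above can be tallied into aggregate capacity figures. The short Python sketch below does that arithmetic; the numbers are copied from the list, while the (cores, memory, count) grouping and the function name are illustrative choices, not anything GACRC publishes.

```python
# Figures copied from the hardware list above.
TOTAL_NODES, TOTAL_CORES = 230, 2600

# High-memory nodes as (cores per node, GB of RAM per node, number of nodes).
HIGH_MEM_NODES = [(8, 192, 4), (12, 256, 10), (32, 512, 2)]

# Tesla S1070: four GPU cards with 960 GPU cores in total.
S1070_CARDS, S1070_GPU_CORES = 4, 960


def high_mem_totals(groups):
    """Return (node count, total cores, total GB of RAM) for a list of node groups."""
    nodes = sum(count for _, _, count in groups)
    cores = sum(c * count for c, _, count in groups)
    mem_gb = sum(mem * count for _, mem, count in groups)
    return nodes, cores, mem_gb


if __name__ == "__main__":
    nodes, cores, mem_gb = high_mem_totals(HIGH_MEM_NODES)
    print(f"High-memory nodes: {nodes} nodes, {cores} cores, {mem_gb} GB RAM")
    print(f"Average cores per HPC node: {TOTAL_CORES / TOTAL_NODES:.1f}")
    print(f"GPU cores per S1070 card: {S1070_GPU_CORES // S1070_CARDS}")
```

The per-node average is only a rough indicator, since the 2600-core total covers nodes of different sizes.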


GACRC HARDWARE RESOURCES (CONT.)

Storage and Connectivity
The GACRC has a three-tiered storage architecture:
• Tier 1 = 150TB (usable) on a Panasas ActiveStor 12 storage cluster
• Tier 2 = 165TB (raw) on five Sun Fire X4500 Thumpers
• Tier 3 = 330TB (raw) on ten TEC Services ARCH storage subsystems

All of the GACRC's existing frames and storage subsystems are interconnected using Brocade switches over private networks and protected by a pair of Juniper firewall security appliances.
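The tiers are quoted on different bases (Tier 1 usable, Tiers 2 and 3 raw), so they should not simply be added together. A minimal Python sketch that keeps the basis next to each figure; the dictionary layout and function names are illustrative assumptions.

```python
# Tier name -> (capacity in TB, basis of the figure, number of hardware units)
TIERS = {
    "Tier 1": (150, "usable", 1),  # one Panasas ActiveStor 12 storage cluster
    "Tier 2": (165, "raw", 5),     # five Sun Fire X4500 Thumpers
    "Tier 3": (330, "raw", 10),    # ten TEC Services ARCH storage subsystems
}


def per_unit_tb(tier):
    """Average capacity per hardware unit within one tier."""
    capacity, _, units = TIERS[tier]
    return capacity / units


def total_tb(basis):
    """Sum only the tiers reported on the same basis ("usable" or "raw")."""
    return sum(cap for cap, b, _ in TIERS.values() if b == basis)


if __name__ == "__main__":
    for name in TIERS:
        print(f"{name}: {per_unit_tb(name):.0f} TB per unit")
    print(f"Raw total (Tiers 2 and 3): {total_tb('raw')} TB")
    print(f"Usable total (Tier 1): {total_tb('usable')} TB")
```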


GACRC HARDWARE RESOURCES (CONT.)

New Hadoop Cluster


GACRC SOFTWARE

Software selection, installation, maintenance and troubleshooting based on researchers' needs, using both open-source solutions and commercial offerings.

Over 300 applications (Bioinformatics, Computational Chemistry, Computational Physics, Statistics and more), including compilers, debuggers, and math libraries.


RESEARCH COMPUTING RESOURCE SURVEY

We will email you this link after the Lunch and Learn and would appreciate your feedback.

https://ugeorgia.qualtrics.com/SE/?SID=SV_dmnOtNRxfsI9ZbL


CONTACT INFORMATION

Name: Michael Lucas

Email: [email protected]

Web: eits.uga.edu


AMAZON WEB SERVICES

Enterprise Information Technology Services
Shawn Ellis
December 5th, 2012


INFRASTRUCTURE AS A SERVICE (IAAS)

• Customers still run their own servers
• Purely virtual datacenter: no racks, hardware, network cabling, etc.
• 17 worldwide locations
• UGA chooses which locations host UGA data and processing
• Elasticity
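Elasticity means capacity follows demand rather than being fixed by purchased hardware. The Python sketch below shows one possible scaling rule under stated assumptions: the per-instance capacity, the minimum and maximum bounds, the demand figures and the function names are all hypothetical, and nothing here is an AWS API.

```python
import math


def desired_instances(load, capacity_per_instance, min_instances=1, max_instances=20):
    """Hypothetical rule: run just enough instances for the current load, within bounds."""
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    needed = math.ceil(load / capacity_per_instance)
    return max(min_instances, min(max_instances, needed))


def instance_hours(hourly_load, capacity_per_instance, **bounds):
    """Instance-hours consumed if the fleet is resized once per hour."""
    return sum(desired_instances(load, capacity_per_instance, **bounds)
               for load in hourly_load)


if __name__ == "__main__":
    demand = [50, 250, 950, 400, 0]  # hypothetical jobs per hour
    print([desired_instances(d, 100) for d in demand])
    print("elastic:", instance_hours(demand, 100), "instance-hours")
    print("fixed for peak:", desired_instances(max(demand), 100) * len(demand), "instance-hours")
```

The comparison at the end illustrates the point of elasticity: a fleet sized for peak demand pays for idle capacity in the quiet hours.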

Friday, January 04, 2013


AMAZON WEB SERVICES ENTERPRISE AGREEMENT

Features
• Right to audit: FISMA, SOC1 and security standards
• Pricing, pay-ahead credit

Challenges
• UGA is one of the first public institutions to negotiate an Enterprise Agreement for AWS, and definitely the first in the USG.
• Indemnity, data privacy, FERPA, HIPAA, GLBA
• Bandwidth, latency
• New paradigms for system administration

Advantages
• Amazon is very interested in working with Higher Education partners now!
• We have received funding grants and access to top-level executives and technical people.


WHAT IS HAPPENING?

• The contract is being negotiated.
• EITS is working with researchers on testing Hadoop clustering and Hadoop with R.
• Open questions remain about how to communicate cost and capacity to existing and potential researchers in this model.


CONTACT INFORMATION

Name: Shawn Ellis

Email: [email protected]

Web: eits.uga.edu


BIG DATA: Why Should You Care and How Can You Deal with It?

Lakshmish Ramaswamy Dept. of Computer Science

[email protected]


You Should Care Because...

Big Data is Everywhere
• LSST: 30 TB/day
• LHC: 16 TB/day
• 1e+11 base pairs
• Health: 6 TB/day
• 260M Tr/day
• 500 TB/day
• 21 TB/day
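Daily rates understate how quickly these volumes accumulate. A small Python sketch converting the labelled rates above into yearly totals, assuming decimal units (1 PB = 1000 TB) and a 365-day year; the unlabelled figures are left out because their sources are not stated.

```python
# Daily volumes copied from the slide (only the labelled ones).
DAILY_TB = {"LSST": 30, "LHC": 16, "Health": 6}


def tb_per_day_to_pb_per_year(tb_per_day, days_per_year=365):
    """Convert a daily rate to a yearly total in decimal petabytes (1 PB = 1000 TB)."""
    return tb_per_day * days_per_year / 1000


if __name__ == "__main__":
    for source, tb in DAILY_TB.items():
        print(f"{source}: {tb} TB/day is about {tb_per_day_to_pb_per_year(tb):.2f} PB/year")
```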


What is Big Data?
• “Data whose size forces us to look beyond the tried and tested methods prevalent at that time”
• Currently, it is data that is too large to be placed in an RDBMS and analyzed with the help of a desktop-based statistics package
  – Requires parallel algorithms running on a server cluster
• The V3 View
  – Volume: Terabytes → Zettabytes
  – Velocity: Batch data → Streaming data
  – Variety: Structured → Structured, semi-structured, textual, multi-media, graphs


What is Causing Data Explosion?
• Proliferation of pervasive computing/communication devices
• Inexpensive data collection and storage
• Transformation to an e-based society
• Desire to monitor and harness micro-level characteristics and trends
  – Sampling is not a preferred option
• Data hoarding
  – The “my great-grand PhD student may find it useful” syndrome!!!


Where is the Data Coming From?
• End-user created data
  – Emails, content shared on SNAs, Wikipedia, photos/videos, tweets
• Data collected about people
  – Financial data, surveillance cameras, academic records
• Scientific data
  – Atmospheric monitoring, high-energy physics, oceanics, deep-space exploration
• Medical data
  – Diagnostics, physician opinions, genetic information, scans
• Business data
  – Stock market, currency market, company performance, logistics, retailing, inventory


Limitations of Traditional Technologies
• RDBMS, data warehousing, data mining, supercomputers
• DBMSs are designed for efficient transactions, not efficient analytics
• Data is increasingly unstructured
• Supercomputers are expensive, hard to program and hard to manage
• Data mining algorithms are centralized

It is easier to push data into the system than to get information out of it.


Big Data Computing Trends
• Clusters of commodity servers
• Distributed file system
• Simple and efficient data management
  – No schema, no indexing, no transactional support
• High-level programming interfaces
• Simplified infrastructure management
• Powerful fault tolerance
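One common way a distributed file system on commodity servers provides fault tolerance is to split files into fixed-size blocks and store each block on several nodes. The Python sketch below assumes a 128 MB block size, a replication factor of 3, five hypothetical node names and a simple round-robin placement; production systems such as HDFS use their own placement policies (HDFS, for example, can take rack topology into account).

```python
from itertools import combinations


def split_into_blocks(file_size_mb, block_size_mb=128):
    """Return (block_id, size_mb) pairs; the last block may be smaller."""
    return [(i, min(block_size_mb, file_size_mb - offset))
            for i, offset in enumerate(range(0, file_size_mb, block_size_mb))]


def place_replicas(block_ids, nodes, replication=3):
    """Round-robin placement: each block lands on `replication` distinct nodes."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
            for b in block_ids}


def readable_after_failures(placement, failed_nodes):
    """True if every block still has at least one replica on a live node."""
    return all(any(n not in failed_nodes for n in replicas)
               for replicas in placement.values())


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3", "node4", "node5"]  # hypothetical cluster
    blocks = split_into_blocks(1000)                        # a 1000 MB file
    placement = place_replicas([b for b, _ in blocks], nodes)
    print(blocks)
    # With 3 replicas on distinct nodes, any 2 simultaneous failures leave every block readable.
    print(all(readable_after_failures(placement, set(f)) for f in combinations(nodes, 2)))
```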


Big Data Technologies
• Hadoop (map-reduce)
  – Cluster-based parallel processing framework
• Pig and Pig Latin
  – SQL-like interface for creating map-reduce programs
• HBase and Apache Cassandra (BigTable)
  – Non-relational distributed databases, a.k.a. key-value stores
• MongoDB
  – High-performance document-oriented storage
• Giraph and GPS (Pregel)
  – Cluster-based graph processing engines
• Pegasus
  – Hadoop-based graph mining tool
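Giraph and GPS follow Pregel's vertex-centric model: computation proceeds in supersteps, each vertex processes the messages sent to it in the previous superstep, may update its value, and messages its neighbours; a vertex with nothing new to do stays inactive, and the job ends when no vertex changes. The single-process Python sketch below simulates that model for maximum-value propagation. It does not use the Giraph or GPS APIs, and the function name and example graph are illustrative.

```python
def pregel_max(graph, initial, max_supersteps=50):
    """Propagate the maximum vertex value through a graph, Pregel style.

    graph:   dict mapping each vertex to a list of its out-neighbours
    initial: dict mapping each vertex to its starting value
    """
    values = dict(initial)
    # Superstep 0: every vertex sends its value to its neighbours.
    inbox = {v: [] for v in graph}
    for v, neighbours in graph.items():
        for u in neighbours:
            inbox[u].append(values[v])
    for _ in range(max_supersteps):
        outbox = {v: [] for v in graph}
        changed = False
        for v, messages in inbox.items():
            # A vertex only does work if some message carries a larger value.
            if messages and max(messages) > values[v]:
                values[v] = max(messages)
                changed = True
                for u in graph[v]:
                    outbox[u].append(values[v])
        inbox = outbox
        if not changed:  # no vertex updated: every vertex has voted to halt
            break
    return values


if __name__ == "__main__":
    graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
    print(pregel_max(graph, {"a": 3, "b": 6, "c": 2, "d": 1}))
```

In a real engine each superstep runs in parallel across the cluster, with the inbox/outbox exchange becoming network messages between workers.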


Hadoop in Action

Input records (zip code, value): 30045 90, 30602 88, 30045 44, 30005 60, 30602 38, 30605 50, 30045 58, 30005 83, 30027 92, 30606 66, 30602 73, 30601 82, 30606 44

Example splits and MAPPER output, as (sum, count) per zip code:
• Split 1: 30045 90, 30602 88, 30045 44 → 30045 (134, 2), 30602 (88, 1)
• Split 2: 30005 60, 30602 38, 30605 50 → 30005 (60, 1), 30602 (38, 1), 30605 (50, 1)
• Split 3: 30606 66, 30602 73, 30601 82, 30606 44 → 30606 (110, 2), 30602 (73, 1), 30601 (82, 1)

REDUCER output (average per zip code):
• Reducer 1: 30602 66.3, 30005 60, 30605 50
• Reducer 2: 30045 67, 30606 55, 30601 82
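The example can be mirrored in plain Python, assuming the job computes the average value per zip code over the three splits shown. The sketch simulates the mapper, shuffle and reducer phases in a single process; it does not use the Hadoop API, and the function names are illustrative.

```python
from collections import defaultdict

# Input splits from the slide: each record is (zip_code, value).
SPLITS = [
    [(30045, 90), (30602, 88), (30045, 44)],
    [(30005, 60), (30602, 38), (30605, 50)],
    [(30606, 66), (30602, 73), (30601, 82), (30606, 44)],
]


def mapper(split):
    """Aggregate one split locally and emit (zip_code, (sum, count)) pairs."""
    partial = defaultdict(lambda: [0, 0])
    for zip_code, value in split:
        partial[zip_code][0] += value
        partial[zip_code][1] += 1
    return [(zip_code, (total, count)) for zip_code, (total, count) in partial.items()]


def shuffle(mapper_outputs):
    """Group the (sum, count) pairs from every mapper by zip code."""
    groups = defaultdict(list)
    for output in mapper_outputs:
        for zip_code, pair in output:
            groups[zip_code].append(pair)
    return groups


def reducer(zip_code, pairs):
    """Combine partial (sum, count) pairs into one average."""
    total = sum(s for s, _ in pairs)
    count = sum(c for _, c in pairs)
    return zip_code, total / count


def run_job(splits):
    grouped = shuffle(mapper(split) for split in splits)
    return dict(reducer(zip_code, pairs) for zip_code, pairs in sorted(grouped.items()))


if __name__ == "__main__":
    for zip_code, average in run_job(SPLITS).items():
        print(zip_code, round(average, 1))
```

The mappers emit (sum, count) pairs rather than averages because an average of per-split averages would weight the splits incorrectly; for 30602, averaging 88, 38 and 73 split by split only works if the reducer knows how many records each partial result covers.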


Current Research
• Resource-efficient, scalable and quality-aware data collection mechanisms
• High-speed networking
• Analytics on globally distributed data with globally distributed clusters
• Approximate analytics
• Scalable machine learning and data mining algorithms that can work in a distributed setting
• Security and privacy


Advanced Topics in Data Intensive Computing

• New course offered this semester
• Covers many of the Big Data technologies
• Requires Java programming and database experience

http://www.cs.uga.edu/~laks/courses/adic-fall2012.html


THANK YOU !!!


Business Analytics Concentration

MIS @ Terry

December 5th, 2012


MBA Business Analytics Concentration
• Five core classes:
  – Data Management
  – Business Process Management
  – Predictive Analytics
  – Data Warehousing and Mining
  – Emerging Analytical Technologies, Platforms & Applications
• Three electives:
  – Energy Informatics
  – Marketing Analytics and Decision-Making
  – Introductory Biostatistics
  – Introduction to Epidemiology
  – Etc.


Hadoop Implementation
• Robert Bearden, CEO of Hortonworks and Terry Entrepreneur-in-Residence
• Hortonworks provided UGA with education and support for installing a Hadoop cluster to enable big data education and research
• The cluster will be used in Spring 2013 in the Data Management & Energy Informatics classes


Big Data and The Changing Nature of Science

(…and the Importance of Cyberinfrastructure Centers)

Nick Berente Terry College of Business

University of Georgia


Traditional Science:

Deduction (abduction) Hypothesize-test

Presenter notes: Lone scientist in a lab

Computational Science: Still (largely) scientific method

Images: birth of a galaxy; hurricane simulation; heart muscle mitochondria (source of images: TACC)


Computational Science Cyberinfrastructure

Computational Resources: “Big Iron” – Condo model
• Cycles
• Memory
• Disciplinary / Interdisciplinary code
• Parallelized code
• Gateways & Workflows
• Visualization

Presenter notes: grad student writing spaghetti code -

Big Data Science: Observation in Natural Sciences

Induction! Pattern identification & matching

ALMA Telescope Array: 66 telescopes

Global Ocean Observing System: 3000+ sensors


Big Data Science: Observation in Social Sciences

social media User-generated content – social network analysis

Sequence Analysis: “Organizational Genetics”


Big Data Science Cyberinfrastructure

• Support for inductive analysis – pattern identification and matching
• Everything associated with computational science, plus increased focus on interpretation:
  – Visualization
  – Next-generation analytic methods
  – Unstructured / multi-source data
• And, of course: network throughput & storage


My Research: Next Generation Computational Science Centers

Presenter notes: Now globally distributed instruments and teams, infrastructural software, data, etc.

Centers

Center: “a facility providing a place for a particular activity or service” (Merriam-Webster)


Cyberinfrastructure Innovation Centers


Centers – Significant Value

For universities, regions, nations, and globally:
- Science
- Economic (local)
- Cross-disciplinary knowledge
- Technological innovation


Three NSF Research Projects

RCN: Managing Collaborative Centers (#1240160)

EAGER: Supporting Successful Management – CI Centers (#1148996)

CI-TEAM: “Science Executive” education (#1059153)


Series of Workshops & Reports
- Managing CI Centers – Oct. 2011 – UGA, Athens, GA
- Virtual Organizations – June 2012 – Case, Cleveland, OH
- Managing CI Centers – Feb. 2013 – UM, Ann Arbor, MI
- Science Executive Ed – May 2013 – UGA, Atlanta, GA
- Scientific Software – Oct. 2013 – UT, Austin, TX
- Virtual Organizations – May 2014 – UI, Urbana-Champaign, IL
- Scientific Software – May 2015 – CMU, Pittsburgh, PA
- Managing CI Centers – May 2016 – UGA, Atlanta, GA


Research Directions

Enabling sustained innovation (centers vs. projects)

Metrics & benchmarking

Funding & human resource issues

Software engineering

CI issues for unstructured data for social science


Thank you!
Nick Berente
[email protected]

NSF OCI # 1059153