
© 2014 IBM Corporation

Enterprise2014

GPFS with Flash840 on PureFlex and Power8 (AIX & Linux)

Chris Churchey – Principal ATS Group, LLC

[email protected] (610-574-0207)

October 2014


Why Monitor? (Clusters, Servers, Storage, Net, etc.)

Ensure the services and apps are available to our users (customers)

Ensure they perform optimally

Identify constraints, problems or configuration concerns

Learn from past behaviors and trends

Anticipate and avoid capacity constraints rather than “reacting” to them after users are impacted

It’s our job… I hope!


What to Monitor (for starters)

CPU

(User + System) >= 80%

Waiting on I/O >= 10%: possible I/O bottleneck

Memory

Paging: Page-In/Swap-In >= 5 per second

Scan/Free ratio >= 4: thrashing

Page/Swap space used >= 80% (warning), > 90% (critical)

Huge/Large pages allocated > 0 but used = 0: wasted memory

Network & Fiber Adapters

Running speed should equal the supported speed

Read/Write throughput >= 80% of running speed

Load balanced across adapters

HBA queue depth and transfer size settings can give huge gains
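The CPU and memory thresholds above translate directly into simple rule checks. The sketch below is a minimal illustration, assuming one sample of metrics per host per interval has already been collected; the `HostSample` field names and the `check_host` function are hypothetical, not part of any monitoring product's API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HostSample:
    """One interval of host metrics (hypothetical field names)."""
    cpu_user_pct: float
    cpu_sys_pct: float
    cpu_iowait_pct: float
    page_ins_per_sec: float
    scan_free_ratio: float
    paging_space_used_pct: float
    large_pages_allocated: int
    large_pages_used: int


def check_host(s: HostSample) -> List[str]:
    """Return one message per CPU/memory threshold that is crossed."""
    alerts = []
    if s.cpu_user_pct + s.cpu_sys_pct >= 80:
        alerts.append("CPU: user + system >= 80%")
    if s.cpu_iowait_pct >= 10:
        alerts.append("CPU: I/O wait >= 10% (possible I/O bottleneck)")
    if s.page_ins_per_sec >= 5:
        alerts.append("Memory: page-ins >= 5/s")
    if s.scan_free_ratio >= 4:
        alerts.append("Memory: scan/free ratio >= 4 (thrashing)")
    if s.paging_space_used_pct > 90:
        alerts.append("Memory: paging space > 90% used (critical)")
    elif s.paging_space_used_pct >= 80:
        alerts.append("Memory: paging space >= 80% used")
    if s.large_pages_allocated > 0 and s.large_pages_used == 0:
        alerts.append("Memory: large pages allocated but unused (waste)")
    return alerts
```

Keeping each rule as a separate `if` lets one sample raise several alerts at once, which matches how the slide lists the thresholds independently.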


What to Monitor (continued)

Filesystems

Space used >= 90%: traditional check

Space used >= 90% and free < 1GB: fewer false alerts

“/” and “/var” space used > 95% and free < 512MB: critical

I-nodes used >= 90%

Disks

Write size < 64KB and writes/s > 20: service time should be < 1ms

Today’s SAN storage with write cache should complete all small to medium-sized writes in < 1ms on average

Queue depth, algorithm and transfer size settings can give huge gains

Processes

High CPU and/or Memory consumers

Runaway long-running processes

Gradual memory growth in long-running processes (memory leak?)
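A minimal sketch of the filesystem, disk and process rules on this slide, assuming the metrics have already been collected (from df, iostat, ps, or a monitoring agent). The function names, the `DiskSample` fields, and the leak thresholds (at least 24 hours of samples, growth of 5 MB/hour or more) are hypothetical choices for illustration, not values from the original slides.

```python
from typing import List, NamedTuple, Sequence, Tuple

MB = 1024 ** 2
GB = 1024 ** 3


def fs_alerts(mount: str, size_bytes: int, free_bytes: int,
              inodes_used_pct: float) -> List[str]:
    """Filesystem rules: a bare '>= 90% used' check is noisy on big file
    systems, so also require that less than 1 GB is actually free."""
    alerts = []
    used_pct = 100.0 * (size_bytes - free_bytes) / size_bytes
    if mount in ("/", "/var") and used_pct > 95 and free_bytes < 512 * MB:
        alerts.append("critical: space")
    elif used_pct >= 90 and free_bytes < 1 * GB:
        alerts.append("warning: space")
    if inodes_used_pct >= 90:
        alerts.append("warning: inodes")
    return alerts


class DiskSample(NamedTuple):
    name: str
    writes_per_sec: float
    write_kb_per_sec: float
    avg_write_service_ms: float


def slow_small_writes(samples: Sequence[DiskSample]) -> List[str]:
    """Disks doing > 20 small (< 64KB) writes/s at an average >= 1ms."""
    flagged = []
    for d in samples:
        if d.writes_per_sec <= 20:
            continue  # too few writes for the average to mean much
        avg_write_kb = d.write_kb_per_sec / d.writes_per_sec
        if avg_write_kb < 64 and d.avg_write_service_ms >= 1.0:
            flagged.append(d.name)
    return flagged


def growth_per_hour(samples: Sequence[Tuple[float, float]]) -> float:
    """Least-squares slope of (hours, rss_mb) samples, in MB per hour."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("samples must span more than one point in time")
    return cov / var


def looks_like_leak(samples: Sequence[Tuple[float, float]],
                    min_hours: float = 24.0, mb_per_hour: float = 5.0) -> bool:
    """Long observation window plus steady growth: a leak candidate.
    Samples are assumed to be sorted by time."""
    span = samples[-1][0] - samples[0][0]
    return span >= min_hours and growth_per_hour(samples) >= mb_per_hour
```

Fitting a slope over a long window, rather than comparing two points, keeps a single burst of allocation from being mistaken for gradual growth.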


What to Monitor (continued)

GPFS

All of the above, plus:

NSDs are distributed equally and balanced across NSD servers, unless you have designated specific roles to NSD server pairs

GPFS-specific node and file system statistics on server and client nodes (mmpmon, etc.)

Special tuning cases arise with large clusters, millions to billions of files, and mixed large and small files; the access “behavior” against them often determines special design considerations

Use metadata-only NSDs on dedicated SSD or flash disks, with dedicated adapters, to keep small, IOPS-intensive access away from large-throughput I/O

Contact IBM or the Galileo Performance team for assistance

Worker Threads
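mmpmon can be driven by piping requests to it, e.g. `echo fs_io_s | mmpmon -p`, which prints one machine-parseable line per mounted file system with cumulative counters (such as `_br_` bytes read and `_bw_` bytes written, plus `_t_`/`_tu_` timestamps). The sketch below assumes that alternating `_key_ value` layout and turns two successive samples into MB/s; verify the field names against the mmpmon documentation for your GPFS release before relying on it. The function names are hypothetical.

```python
from typing import Dict, Tuple


def parse_mmpmon_p(line: str) -> Dict[str, str]:
    """Parse one `mmpmon -p` response line into {key: value}.

    Assumed layout: a leading request marker such as `_fs_io_s_`, followed
    by alternating `_key_ value` pairs (e.g. `_fs_ gpfs1 _br_ 1048576 ...`).
    """
    tokens = line.split()
    record = {"request": tokens[0].strip("_")}
    for key, value in zip(tokens[1::2], tokens[2::2]):
        record[key.strip("_")] = value
    return record


def fs_throughput_mb_s(prev: Dict[str, str],
                       curr: Dict[str, str]) -> Tuple[float, float]:
    """Read and write MB/s (1 MB = 10**6 bytes) between two cumulative
    samples taken for the same node and file system."""
    t0 = int(prev["t"]) + int(prev["tu"]) / 1e6
    t1 = int(curr["t"]) + int(curr["tu"]) / 1e6
    dt = t1 - t0
    read = (int(curr["br"]) - int(prev["br"])) / dt / 1e6
    write = (int(curr["bw"]) - int(prev["bw"])) / dt / 1e6
    return read, write
```

Because the counters are cumulative, rates come from differences between samples; a counter reset between samples would show up as a negative rate and should be discarded.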


Daily Monitoring Steps (Methodology)

1. Cluster view – Check the Dashboard

2. Identify candidates to investigate (e.g., using the “What to Monitor” thresholds)

3. Follow the data: charts, views, etc.

4. View over a period of time

5. Determine the usage mix and observed peaks

* Make it easy with the Galileo Performance Explorer GPFS and storage agents, and the new automated analytics capability!
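Steps 4 and 5 (viewing a metric over a period of time and determining the usage mix and peaks) come down to summarizing each time series so that short spikes are not hidden by averages. A minimal sketch, assuming samples arrive as (timestamp, value) pairs; `summarize` is a hypothetical helper, and the 95th percentile is just one reasonable choice for a "typical busy" level.

```python
import statistics
from typing import Dict, Sequence, Tuple


def summarize(series: Sequence[Tuple[float, float]]) -> Dict[str, float]:
    """Mean, 95th percentile and peak (with its timestamp) of a series."""
    values = [v for _, v in series]
    peak_at, peak = max(series, key=lambda s: s[1])
    return {
        "mean": statistics.fmean(values),
        # quantiles(n=100) returns 99 cut points; index 94 is the 95th.
        "p95": statistics.quantiles(values, n=100)[94],
        "peak": peak,
        "peak_at": peak_at,
    }
```

Comparing the mean, p95 and peak side by side helps separate a sustained load (all three close together) from a brief spike (peak far above p95).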


Cluster view

Three observations stand out immediately! (They may be OK… or they may not be.)


Investigate high CPU %Busy: which node?

Find out which node it is (Top: 1): gvicp8gpfsRH05. Let’s look at its processes next.
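Finding the "Top: 1" node amounts to ranking nodes by average CPU %busy over the review window; the same ranking applied to per-process samples on that node gives the "Top: 2" processes of the next step. A minimal sketch, assuming (name, %busy) samples have already been collected; `top_by_average` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def top_by_average(samples: Iterable[Tuple[str, float]],
                   n: int = 1) -> List[Tuple[str, float]]:
    """Rank names (nodes, or processes on one node) by average CPU %busy."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for name, busy in samples:
        totals[name] += busy
        counts[name] += 1
    averages = {name: totals[name] / counts[name] for name in totals}
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

Ranking by the average over the window, rather than by a single snapshot, avoids chasing a node that was only briefly busy.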


Investigate high CPU %Busy: found the node… which process?

Find which process(es) (Top: 2): runaway and every2hrs, with 3 and 1 threads

* Checked with the user: runaway is bad; every2hrs is scheduled (good).


Investigate high I/O wait: which node?

Find out which node it is: gvicp8gpfsaix04. Next, look at the node’s details.


Investigate high I/O: found the node… is the problem the HBAs or the disks?

Found four HBAs: fcs0 and fcs1 at 500MB/s each, fcs2 at 100MB/s, fcs3 at 0

* The problem was that fcs3 was not zoned. After correcting it, let’s see what improved.
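An adapter that carries far less than its peers, like fcs3 here, is worth checking for zoning, pathing, or cabling problems. A minimal sketch that flags such adapters, assuming per-adapter MB/s figures are already available; the `underused_adapters` name and the 50%-of-mean cut-off are hypothetical choices, and a flagged adapter still needs manual investigation to find the cause.

```python
from typing import Dict, List


def underused_adapters(mb_per_sec: Dict[str, float],
                       fraction: float = 0.5) -> List[str]:
    """Adapters carrying less than `fraction` of the mean per-adapter rate."""
    mean = sum(mb_per_sec.values()) / len(mb_per_sec)
    return sorted(name for name, rate in mb_per_sec.items()
                  if rate < fraction * mean)


# The numbers from this slide: fcs3 idle, fcs2 lightly used.
print(underused_adapters({"fcs0": 500, "fcs1": 500, "fcs2": 100, "fcs3": 0}))
# → ['fcs2', 'fcs3']
```

With this slide's numbers the mean is 275MB/s, so the cut-off is 137.5MB/s and both fcs2 (100MB/s) and fcs3 (0MB/s) are flagged.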


Investigate high I/O: found the node… is the problem the HBAs or the disks?

After correcting the fcs3 zoning, fcs2 and fcs3 are now each pushing 250MB/s.


Investigate high I/O: found the node… is the problem the HBAs or the disks?

Fixing the zoning increased I/O throughput… BUT it has now caused a memory paging problem.

* As the old saying goes: fixing one performance problem often exposes another!


E.g., NSD Servers Not Balanced (Clients Constrained)

It looks like one NSD server (gvicp8gpfsaix01) is doing all the work.


NSD Servers Not Balanced (Clients Constrained), continued

Identify which file system is heavily used, and by which client node(s).
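Once per-client, per-file-system byte counts are available (for example from mmpmon fs_io_s samples taken on each client), identifying the heavily used file system and its client nodes is a simple aggregation. A minimal sketch; `heaviest_fs` and the (client, fs, bytes) record shape are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def heaviest_fs(records: Iterable[Tuple[str, str, int]]
                ) -> Tuple[str, List[Tuple[str, int]]]:
    """Return the busiest file system and its clients, busiest first.

    Each record is (client_node, fs_name, bytes_moved) for one interval.
    """
    per_fs = defaultdict(int)
    per_fs_client = defaultdict(lambda: defaultdict(int))
    for client, fs, nbytes in records:
        per_fs[fs] += nbytes
        per_fs_client[fs][client] += nbytes
    busiest = max(per_fs, key=per_fs.get)
    clients = sorted(per_fs_client[busiest].items(),
                     key=lambda kv: kv[1], reverse=True)
    return busiest, clients
```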


Round-Robin the NSD Server List to Balance Load

Changed the NSD server order to balance the load between gvicp8gpfsaix01 and …aix02.
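GPFS clients generally send NSD I/O to the first available server in each NSD's server list, so if every NSD lists the same server first, that server does all the work. The sketch below counts how often each server is listed first and builds a rotated ("round-robin") order, rendered as `%nsd` stanzas that could be reviewed and then applied with `mmchnsd -F`. It is a sketch under assumptions: the server names (nsdsrv01, nsdsrv02) and the helper functions are hypothetical, and the stanza text should be checked against the mmchnsd documentation for your release before use.

```python
from collections import Counter
from typing import Dict, List, Sequence


def primary_counts(plan: Dict[str, List[str]]) -> Counter:
    """How many NSDs list each server first (the server tried first)."""
    return Counter(order[0] for order in plan.values() if order)


def round_robin(nsds: Sequence[str],
                servers: Sequence[str]) -> Dict[str, List[str]]:
    """Rotate the server list per NSD so each server is first for ~1/N."""
    plan = {}
    for i, nsd in enumerate(nsds):
        k = i % len(servers)
        plan[nsd] = list(servers[k:]) + list(servers[:k])
    return plan


def to_stanzas(plan: Dict[str, List[str]]) -> str:
    """Render %nsd stanzas (nsd= and servers= only) for review."""
    blocks = [f"%nsd:\n  nsd={nsd}\n  servers={','.join(order)}"
              for nsd, order in plan.items()]
    return "\n".join(blocks) + "\n"
```

Keeping every server in every list, only in a different order, preserves failover: if the first server is down, clients can still reach the NSD through the others.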


Switched the 2 Clients to Direct-Attached Nodes

Data-intensive nodes can now go directly to storage: a major throughput improvement.

…Yes, an all-InfiniBand network would also have been an option…


Galileo Analytics engine: minutes vs. hours for the analysis in the past 11 slides


Galileo Analytics engine (Booth #22)


E.g., Seq. 50/50 Read/Write, 256K, 8 Threads, V7K-SAS


E.g., Seq. 50/50 Read/Write, 256K, 8 Threads, Flash-840
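The workload label on these two charts can be read as 8 threads, each issuing sequential 256K requests, with reads and writes split 50/50. A minimal sketch of such an I/O plan, assuming each thread walks its own contiguous region and alternates reads and writes; `sequential_mixed_plan` is a hypothetical illustration of the access pattern, not the benchmark tool used to produce the charts.

```python
from typing import List, Tuple

BLOCK = 256 * 1024  # 256 KiB per request


def sequential_mixed_plan(file_size: int, threads: int = 8,
                          block: int = BLOCK) -> List[List[Tuple[str, int]]]:
    """One contiguous region per thread, walked sequentially in
    `block`-sized requests; even-numbered requests read, odd ones write."""
    region = file_size // threads
    plan = []
    for t in range(threads):
        start = t * region
        offsets = range(start, start + region - block + 1, block)
        plan.append([("read" if i % 2 == 0 else "write", off)
                     for i, off in enumerate(offsets)])
    return plan
```

Giving each thread its own region keeps every thread's stream sequential, so the storage sees 8 concurrent sequential streams rather than one interleaved, effectively random pattern.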


We are seeking use cases as input to the Galileo PE Analytics engine for ‘automation’

– Lessons Learned / Best Practices / Thresholds as well

We have an Innovation Center lab where we test, demo and showcase technology

– Ideas you would like to see us demo, POC, verify claims on, etc., and share!

[email protected] or [email protected] or [email protected]

Please contact us!

Booth #22


Questions and Answers


We can help analyze and implement. Contact us!

Check-out Galileo Performance Explorer™

– Visit Booth #22 for a hands-on demo

– Sign-up for a trial at www.GalileoSuite.com

– Complimentary*, no-strings-attached 3 months of use for conference attendees

[email protected] (484-320-4302)

www.GalileoSuite.com

* First time Galileo user

Referenced Material

Deploying a big data solution using IBM GPFS-FPO: http://public.dhe.ibm.com/common/ssi/ecm/en/dcw03051usen/DCW03051USEN.PDF

GPFS tuning guidelines for deploying SAS: http://www.sas.com/content/dam/SAS/en_us/doc/partners/ibm-gpfs-tuning-guidelines.pdf

GPFS Wiki – IBM DeveloperWorks: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20%28GPFS%29

GSS / ESS: https://www.ibm.com/developerworks/community/blogs/5things/entry/gpfs_storage_server?lang=en

Galileo Performance Explorer: http://www.GalileoSuite.com