NCAR storage accounting and analysis possibilities

David L. Hart, Pam Gillman, Erich Thanhardt
NCAR CISL
July 22, 2013
[email protected]




Why storage accounting?

• Big Data
  – Increasing cost of storage with respect to compute
• NSF data management plan mandate
  – Tools for users
• Some info is better than no info
  – Some process is better than ad hoc fire drills
• Supports allocation processes


Accounting for archive storage

• NCAR has “charged” users for archive use for many years.
  – Archive accounting has institutional inertia
• NCAR HPSS details, June-July 2013

Date     Files (M)  PB (unique)  PB (2nd copy)  Users  TB+
6/2/13   137.6      19.5         22.3           991    181
6/9/13   138.2      19.8         22.6           991    307
6/16/13  138.8      20.1         22.9           992    370
6/23/13  141.1      20.5         23.3           998    347
6/30/13  142.4      20.7         23.5           1002   266
7/7/13   142.5      20.9         23.6           1005   135


Archive storage record

• Activity date – date record was collected
• Activity type – Read, Write, Storage
• Unix uid
• Project code – project to charge
• Number of files
• Bytes – read, written, or stored
• Class of service – e.g., single-copy, dual-copy
• DNS – of client host
• Frequency – interval, in days, between accounting runs
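As a minimal sketch, the archive record fields listed here could be modeled as a small typed structure. The class name, field names, Python types, and example values are illustrative assumptions; the slides do not show NCAR's actual schema.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class ActivityType(Enum):
    READ = "Read"
    WRITE = "Write"
    STORAGE = "Storage"


@dataclass(frozen=True)
class ArchiveRecord:
    """One archive accounting record (all field names are hypothetical)."""
    activity_date: date          # date the record was collected
    activity_type: ActivityType  # Read, Write, or Storage
    uid: int                     # Unix uid
    project_code: str            # project to charge
    num_files: int
    num_bytes: int               # bytes read, written, or stored
    class_of_service: str        # e.g., "single-copy" or "dual-copy"
    client_host: str             # DNS name of the client host
    frequency_days: int          # days between accounting runs

    def terabytes(self) -> float:
        """Bytes expressed in decimal terabytes (10**12 bytes)."""
        return self.num_bytes / 1e12


# Made-up example values, for illustration only.
example = ArchiveRecord(
    activity_date=date(2013, 7, 7),
    activity_type=ActivityType.STORAGE,
    uid=12345,
    project_code="P00000000",
    num_files=1_000,
    num_bytes=2_500_000_000_000,
    class_of_service="dual-copy",
    client_host="client.example.org",
    frequency_days=7,
)
print(f"{example.terabytes():.1f} TB")
```

Keeping the activity type as an enumeration means one record shape can carry read, write, and storage activity, which matches the single field list on the slide.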


Collecting data from HPSS

• Read/write activity
  – Analyze logs from HSI and HTAR (since May 2013). Logs archived daily, processed weekly.
• Storage activity
  – Weekly DB2 table scan and separate post-processing steps.
• Accounting system impact
  – Approx. 6,000 records per week
• Major accounting requirements
  – Use of HPSS accounting hooks to associate NCAR project code with HPSS file “account”
  – Accounting system and HPSS enforce requirement for every user to have a “default project” to which files will be charged if no other project provided
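The “default project” rule can be sketched as a fallback lookup applied while a week of read/write entries is summed per user, project, and activity. The tuple layout, the `default_projects` mapping, the project codes, and the assumption that each entry is one file are all hypothetical; the real HSI/HTAR log format and the HPSS accounting hooks are not shown in the slides.

```python
from collections import defaultdict

# Hypothetical parsed log entries: (uid, project code or None, activity, bytes).
# Each entry is assumed to describe one file transfer.
entries = [
    (1001, "PROJ_A", "Write", 4_000),
    (1001, None, "Write", 1_000),      # no project given by the user
    (1002, "PROJ_B", "Read", 2_500),
    (1001, "PROJ_A", "Write", 6_000),
]

# Hypothetical mapping of uid to that user's default project.
default_projects = {1001: "PROJ_DEFAULT", 1002: "PROJ_B"}


def summarize(entries, default_projects):
    """Sum file counts and bytes per (uid, project, activity)."""
    totals = defaultdict(lambda: {"files": 0, "bytes": 0})
    for uid, project, activity, nbytes in entries:
        # Fall back to the user's default project when none was provided.
        # A KeyError here would mean the user has no default project,
        # the case the requirement above is meant to rule out.
        charged = project if project is not None else default_projects[uid]
        key = (uid, charged, activity)
        totals[key]["files"] += 1
        totals[key]["bytes"] += nbytes
    return dict(totals)


weekly = summarize(entries, default_projects)
for key, value in sorted(weekly.items()):
    print(key, value)
```

Summing before loading is one plausible way a week of transfers could reduce to a few thousand accounting records, though the slides do not describe the actual reduction step.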


Accounting for disk storage

• Focus on long-term project spaces, which are allocated
  – But mechanism captures scratch snapshots, too!
• GLADE total storage, June-July 2013

Date     Files (M)  PB    Users  TB+
6/8/13   183.05     2.87  2,506  55.3
6/15/13  192.96     2.97  2,525  99.3
6/22/13  210.32     3.02  2,490  53.1
6/29/13  212.80     3.11  2,500  89.5
7/6/13   224.76     3.11  2,509  8.8


Disk storage record

• Event time – date record was collected
• Project directory
• Group – Unix group
• Username
• Number of files
• kB used
• Period – reporting interval, in days
• QOS – a quality of service field (for future use)
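A sketch of how disk records with these fields might be rolled up into per-project totals. The `DiskRecord` type, its field names, the sample directories and usernames, and the choice of 1 TB = 10**9 kB (decimal units) are assumptions; the slides do not say whether kB means 1000 or 1024 bytes.

```python
from collections import defaultdict
from datetime import date
from typing import NamedTuple


class DiskRecord(NamedTuple):
    event_time: date      # date the record was collected
    project_dir: str      # project directory
    group: str            # Unix group
    username: str
    num_files: int
    kb_used: int
    period_days: int      # reporting interval, in days
    qos: str = ""         # reserved for future use


KB_PER_TB = 10**9  # assumes decimal units


def tb_by_project(records):
    """Return total TB used per project directory."""
    totals = defaultdict(int)
    for rec in records:
        totals[rec.project_dir] += rec.kb_used
    return {proj: kb / KB_PER_TB for proj, kb in totals.items()}


# Made-up sample data; the directory names are hypothetical.
snapshot = [
    DiskRecord(date(2013, 7, 6), "/glade/p/projA", "grpa", "alice", 120_000, 1_500_000_000, 7),
    DiskRecord(date(2013, 7, 6), "/glade/p/projA", "grpa", "bob", 30_000, 500_000_000, 7),
    DiskRecord(date(2013, 7, 6), "/glade/p/projB", "grpb", "carol", 9_000, 250_000_000, 7),
]
print(tb_by_project(snapshot))
```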


Collecting data from GPFS

• File systems don’t have concept of “project”, but GPFS has notion of “file sets”
  – Leverage file sets to map to project spaces
  – For scratch, work, home: report per-user data
• Process runs weekly, provides a storage snapshot
  – With GPFS tools, process requires only a few minutes to complete; full file system scan not required
• Accounting system impact
  – Approx. 4,000 records per week
• Major accounting requirements
  – Agreements and processes between GLADE administrators and User Services about how spaces are created
  – Deviation would break the system
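The file-set-to-project mapping, and the reason a deviation matters, can be sketched as follows. No GPFS command is invoked here: the input dictionaries, file set names, and project codes are hypothetical stand-ins for whatever the weekly process actually produces.

```python
def map_filesets(fileset_usage, fileset_to_project):
    """Split file set usage into per-project totals and unmapped file sets."""
    per_project = {}
    unmapped = []
    for fileset, (num_files, kb_used) in sorted(fileset_usage.items()):
        project = fileset_to_project.get(fileset)
        if project is None:
            # A space created outside the agreed process has no project
            # to charge, so it is reported instead of silently dropped.
            unmapped.append(fileset)
            continue
        files, kb = per_project.get(project, (0, 0))
        per_project[project] = (files + num_files, kb + kb_used)
    return per_project, unmapped


# Hypothetical weekly usage: file set name -> (number of files, kB used).
fileset_usage = {
    "fs_alpha": (1_200, 50_000),
    "fs_beta": (300, 7_500),
    "fs_gamma": (10, 100),  # not in the agreed mapping
}
# Hypothetical agreed mapping of file set name -> project code.
fileset_to_project = {"fs_alpha": "PROJ_A", "fs_beta": "PROJ_B"}

per_project, unmapped = map_filesets(fileset_usage, fileset_to_project)
print(per_project)
print("unmapped:", unmapped)
```

Reporting unmapped file sets separately is one way to surface the kind of deviation the last bullet warns about before it reaches the accounting system.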


ANALYSIS AND REPORTING


Storage growth over time (1)

[Figure, two panels. “HPSS growth in 2013”: series PB and PB (w/2nd copy). “GLADE growth in 2013”: TB in /glade/p/work, /glade/project, and /glade/scratch. x-axis: dates from 1/6/13 to 6/30/13.]


Storage growth over time (3)

User reports show project by week and per-user breakdown


Top consumers

[Figure, two panels, y-axis 0% to 100%.
“Project holdings in HPSS”: % Projects, % Files, and % TB for projects holding 0-1 TB, 1-10 TB, 10-100 TB, 100-1000 TB, and >1000 TB.
“User holdings in GLADE”: % Users, % Files, and % TB for users holding 0-0.1 TB, 0.1-1 TB, 1-10 TB, 10-100 TB, and >100 TB.]
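The kind of breakdown these charts show can be sketched by binning per-project holdings and computing each bin's share of projects and of total TB. The bin edges follow the HPSS chart labels; the function name, the treatment of values that fall exactly on an edge, and the sample holdings are assumptions.

```python
from bisect import bisect_right

# Upper edges (TB) of the first four bins; the fifth bin is >1000 TB.
EDGES_TB = [1, 10, 100, 1000]
LABELS = ["0-1 TB", "1-10 TB", "10-100 TB", "100-1000 TB", ">1000 TB"]


def share_by_bin(holdings_tb):
    """Return {label: (% of projects, % of TB)}.

    Assumes a non-empty list with a nonzero total.
    """
    counts = [0] * len(LABELS)
    volume = [0.0] * len(LABELS)
    for tb in holdings_tb:
        # A value exactly equal to an edge is placed in the higher bin.
        i = bisect_right(EDGES_TB, tb)
        counts[i] += 1
        volume[i] += tb
    n, total = len(holdings_tb), sum(holdings_tb)
    return {
        label: (100 * c / n, 100 * v / total)
        for label, c, v in zip(LABELS, counts, volume)
    }


# Made-up holdings (TB) for five hypothetical projects.
shares = share_by_bin([0.5, 0.5, 4, 45, 950])
for label, (pct_projects, pct_tb) in shares.items():
    print(f"{label:>12}: {pct_projects:5.1f}% of projects, {pct_tb:5.1f}% of TB")
```

In this made-up sample, one project in the 100-1000 TB bin is 20% of the projects but 95% of the volume, the sort of skew a "top consumers" chart is meant to expose.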


Aggregate behavior (1)

[Figure: “Weekly HPSS growth”, y-axis in TB, ticks from -50 to 450.]

Net growth, 3/3-4/7: ~261 TB


Aggregate behavior (2)

[Figure: TB written daily. The extracted x-axis labels run from 14-Oct-08 to 1-Jul-09.]

Data written, 3/3-4/7: 594 TB
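The two annotations for 3/3-4/7 (about 261 TB net growth, 594 TB written) can be compared with simple arithmetic. This sketch assumes both figures cover the same holdings and counting method, which the slides do not state; the variable names are illustrative.

```python
net_growth_tb = 261   # approximate net growth, 3/3-4/7 (from the slides)
written_tb = 594      # data written over the same dates (from the slides)

# Share of written data that shows up as net growth.
retained_fraction = net_growth_tb / written_tb
# Volume written that is not reflected in net growth.
offset_tb = written_tb - net_growth_tb

print(f"net growth / data written = {retained_fraction:.0%}")
print(f"written but not reflected in net growth = {offset_tb} TB")
```

Under that assumption, a bit under half of what was written appears as net growth. The remainder would be offset by something else, for example deletions or overwrites, but the slides do not say which.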


Compute v. storage (1)

[Figure: HPC use, Disk GB, and Tape GB by Year-Week, 2012-47 to 2013-26. y-axis: core-hours or GB (millions).]


Compute v. storage use (2)

[Figure: HPC use, disk GB, and tape GB by Year-Week, 2012-43 to 2013-09. y-axis: core-hours used or gigabytes stored (millions).]


Big compute != Big data

[Figure: HPC charges and GB growth per user, with users sorted by HPC charges. y-axis ticks run from 1 to 100,000,000 in powers of ten.]


What is “Big Data”?

[Figure, four panels, “Average file size vs. Total data holdings”:
– Number of users by average file size (<0.1 GB to <1000 GB)
– Number of users by data stored per user (<1 GB to >1000000 GB)
– GB stored (millions) by average file size
– GB stored (millions) by data stored per user]


Managing “orphaned” files

• Verifying accounting records lets site operators identify files owned by inactive users or inactive projects
• On July 7, HPSS accounting showed 177 users with 885 TB of “orphaned” files
• Early outreach to users and project leads does translate to deletions and fewer files for which an owner cannot be found
  – Users required to be “actively engaged” in the disposition of their archive holdings.
  www2.cisl.ucar.edu/docs/hpss/policies
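Identifying “orphaned” holdings from accounting data can be sketched as a comparison between per-user storage totals and a list of active users. The function name, the input shapes, and the sample users are hypothetical, and the sketch covers inactive users only, not inactive projects.

```python
def orphaned_holdings(holdings_tb, active_users):
    """Return (sorted usernames, total TB) for owners who are not active."""
    orphans = {
        user: tb for user, tb in holdings_tb.items() if user not in active_users
    }
    return sorted(orphans), sum(orphans.values())


# Made-up per-user archive holdings, in TB.
holdings_tb = {"alice": 12.0, "bob": 3.5, "carol": 40.0, "dave": 0.5}
# Made-up set of users with active accounts.
active_users = {"alice", "dave"}

users, total_tb = orphaned_holdings(holdings_tb, active_users)
print(f"{len(users)} users with {total_tb} TB of orphaned files: {users}")
```

A summary of this shape (a user count and a TB total) is what the July 7 figure above reports, and the user list is what outreach to owners and project leads would start from.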


QUESTIONS?