Upload: christina-blake

Post on 03-Jan-2016


Write-through Cache System

Policies Discussion and an Introduction to the System

Write-through Cache Disk

• File system mount point:
  – /cache/halld (read-only cache system: /cache/mss/halld)
• Permission:
  – Owned (writable) by the Unix group halld. As with a normal file system, all members of the group can add, delete, and modify files.
• Disk to tape library (/mss) mapping:
  – /cache/halld maps to /mss/halld.
  – The volume set mapping is the same as for /mss/halld.
  – Raw data volume set: still send a CCPR if a new raw data directory is needed.
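The mapping between the cache disk and the tape library stub directory can be sketched as a simple prefix swap. This is only an illustration of the rule stated above; the function name `cache_to_mss` is hypothetical and is not part of the wcache tools.

```python
from pathlib import PurePosixPath

# Roots taken from the slide: /cache/halld maps to /mss/halld.
CACHE_ROOT = PurePosixPath("/cache/halld")
MSS_ROOT = PurePosixPath("/mss/halld")


def cache_to_mss(cache_path: str) -> str:
    """Return the tape-library stub path that mirrors a cache-disk path."""
    path = PurePosixPath(cache_path)
    # relative_to raises ValueError when the path is not under /cache/halld.
    relative = path.relative_to(CACHE_ROOT)
    return str(MSS_ROOT / relative)


if __name__ == "__main__":
    print(cache_to_mss("/cache/halld/halld-scratch/hdops/bigfile2"))
```

A path outside /cache/halld (for example under /cache/junk) raises `ValueError`, which reflects that this sketch only knows about the one mapping given on the slide.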

Write-through Disk Manager (WDM)

• Management:
  – Each group has a quota and a reservation. The quota is the maximum space a group can use; the reservation is the minimum space guaranteed by the cache manager.
  – The quota is a soft limit: when it is exceeded, users can still write to the file system, and older files will be deleted in the next clean cycle.
• Note:
  – The current jcache command will fail if the requested files are in the write-through disk pool.

WDM (Backup Policy)

• Backup policy:
  – When a file is 12 days old, it is backed up to the tape library and treated as a read-only file. (To modify the file, the user must first remove the copy in the tape library.)
  – Files smaller than 3 MB are not backed up. Please tar important small files so they can be backed up as one large file.
• Users can back up a file at any time with the utility ‘wcache put’.
• The 12-day age and the 3 MB size are configurable parameters.
  – 12 days, or another value?
  – We would like to set the minimum backup file size as large as possible. What is the right size?
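The two backup thresholds can be expressed as a small predicate. This is a minimal sketch, assuming that "3 MB" means 3 × 1024² bytes and that age is measured in days; the function name and parameters are hypothetical and do not describe the cacheManager's actual code.

```python
# Defaults from the slide; both are described there as configurable.
MIN_AGE_DAYS = 12
MIN_SIZE_BYTES = 3 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


def is_due_for_backup(age_days: float, size_bytes: int,
                      min_age_days: float = MIN_AGE_DAYS,
                      min_size_bytes: int = MIN_SIZE_BYTES) -> bool:
    """True when a file is old enough and large enough for automatic backup."""
    return age_days >= min_age_days and size_bytes >= min_size_bytes


if __name__ == "__main__":
    # A 10 MB file that is 12 days old qualifies; a 1 MB file never does.
    print(is_due_for_backup(12, 10 * 1024 * 1024))
    print(is_due_for_backup(400, 1 * 1024 * 1024))
```

Keeping the thresholds as parameters mirrors the open question on the slide about what the right values should be.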

WDM (Backup Policy)

• Because the new cache disk maps to the old /mss/halld stub directory, ‘wcache put’ will fail if a user creates a file whose name conflicts with an existing one. The cacheManager will mark this file as being on tape (it does not know that it is a different file). Only when the cacheManager is about to delete the file will it discover that it is a duplicate. (See next slide.)

WDM (Deletion Policy)

• Deletion policy:
  – When disk space is needed, the oldest files that satisfy the criterion "pin count = 0 AND backed up" are deleted.
  – Files smaller than 3 MB that have not been accessed in 2 years are deleted.
• After a file is deleted from disk, it can be staged back to disk with the utility ‘wcache get’.
• Users can run ‘wcache delete’ to tell the manager to delete a file from disk if it is in the tape library.
• Before each file is deleted, the manager verifies that the copy on disk is the same as the copy in the tape library.

(continues on next page)

Duplicated File - email

If the copy on disk differs from the copy in the tape library, an email is sent to the owner so that they can delete either the copy on tape or the copy on disk. The copy on disk will be deleted if no action is taken within 1 week.

The cache version of this file is different from the copy in the tape library. A proper action (remove the copy from tape or delete the cache copy) is needed. If no action is taken within 2 weeks, the copy on cache disk will be deleted. Please check this page https://scicomp.jlab.org/docs/wcache on how to use 'wcache tapeRemove' tool to delete a file from Jlab tape library.

/cache/junk/szscl21_xxxx_2316.lime

We can change this behavior. Feedback is welcome!
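The deletion criterion "pin count = 0 AND backed up", applied oldest first, can be sketched as follows. The record type, the `timestamp` field used to decide which file is oldest, and the function name are all assumptions made for illustration; the slides do not say how the manager stores or orders this information.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class CachedFile:
    path: str
    timestamp: float   # hypothetical age key; a smaller value means an older file
    size_bytes: int
    pin_count: int
    backed_up: bool


def pick_deletions(files: Iterable[CachedFile], bytes_needed: int) -> List[CachedFile]:
    """Choose the oldest unpinned, backed-up files until enough space is freed."""
    eligible = sorted(
        (f for f in files if f.pin_count == 0 and f.backed_up),
        key=lambda f: f.timestamp,
    )
    chosen: List[CachedFile] = []
    freed = 0
    for f in eligible:
        if freed >= bytes_needed:
            break
        chosen.append(f)
        freed += f.size_bytes
    return chosen
```

Pinned files and files without a tape copy are never selected, so the function may return less space than requested when too few files are eligible.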

WDM (Pin Policy)

• Any user in the group can pin files, but a pin will fail if the group's total pinned space would exceed 30% of its total quota.
• The project manager can send a request to increase the project quota if needed.
• Files are not pinned when ‘get file’ is called. The user pins a file after it is on disk.
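The 30% pin limit can be worked through numerically. The sketch below assumes that quota and pinned space are tracked in bytes and that "GB" means 1024³ bytes; under that assumption a 10,000 GB quota gives a limit of 3,221,225,472,000 bytes, which agrees with the availablePin value in the sample ‘wcache projectInfo’ output for halld. The function names are hypothetical.

```python
GIB = 1024 ** 3          # assumption: 1 GB = 1024^3 bytes
PIN_PERCENT = 30         # from the pin policy


def pin_limit(quota_bytes: int, percent: int = PIN_PERCENT) -> int:
    """Maximum pinned bytes allowed for a group (integer arithmetic, no rounding error)."""
    return quota_bytes * percent // 100


def can_pin(quota_bytes: int, pinned_bytes: int, request_bytes: int) -> bool:
    """True when pinning request_bytes more keeps the group within its pin limit."""
    return pinned_bytes + request_bytes <= pin_limit(quota_bytes)


if __name__ == "__main__":
    quota = 10_000 * GIB
    print(pin_limit(quota))
    print(can_pin(quota, pinned_bytes=0, request_bytes=500 * GIB))
```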

Best Practice

• When creating new directories:
  – Set the correct permissions.
  – Use a ‘correct name’, meaning a project-related name. Do not rename any top-level directory.
• When creating new files:
  – Check the files in /mss/halld (which maps to /cache/halld).
• When caching files:
  – Check the total data size being cached. Some files may be flushed from disk when a large amount of data (more than the quota) is cached.
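The advice to check /mss/halld before creating a new file can be automated with a short helper. This is a sketch only: it assumes that a name conflict shows up as an existing entry at the mirrored path under the stub directory, and the function names and root parameters are hypothetical.

```python
import os


def tape_stub_path(cache_path: str,
                   cache_root: str = "/cache/halld",
                   mss_root: str = "/mss/halld") -> str:
    """Map a cache-disk path to the mirrored path under the tape stub directory."""
    prefix = cache_root.rstrip("/") + "/"
    if not cache_path.startswith(prefix):
        raise ValueError(f"{cache_path} is not under {cache_root}")
    return os.path.join(mss_root, cache_path[len(prefix):])


def name_conflicts(cache_path: str,
                   cache_root: str = "/cache/halld",
                   mss_root: str = "/mss/halld") -> bool:
    """True when a stub with the same relative name already exists on the tape side."""
    return os.path.exists(tape_stub_path(cache_path, cache_root, mss_root))
```

Calling `name_conflicts("/cache/halld/outdir/file.out")` before writing a new file would flag a name that is already taken in the tape library, which is the situation the backup policy warns about.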

WDM Utility - wcache

wcache projectInfo projectName --- get information for the project specified by projectName (such as halld)

wcache pin [-D days] life file1 file2 ... fileN --- pin the specified file(s) for a given number of days (default 7 days)

wcache unpin file1 file2 ... fileN --- unpin the specified file(s)

wcache pinInfo [-u user] [-n number] --- print the user's newest pins, up to the given number (default 100)

wcache get file1 file2 ... fileN --- cache the specified file(s) from the tape library

wcache put [-d] file1 file2 ... fileN --- back up file(s) to the tape library, and delete them from disk if -d is given

wcache requestStatus requestIndex --- get the status of the request specified by the request index

wcache pendingRequest [-u user] --- get the status of unfinished requests

wcache cancelRequest requestIndex --- cancel the unfinished request specified by the request index

wcache checksum file --- print the crc32 checksum of the specified file

wcache tapeRemove file1 file2 ... fileN --- remove the specified file(s) from the Jlab tape library

wcache list [options] file1 file2 ... fileN --- list file properties (cacheManager-related metadata)

Note: The last two commands are coming soon.

WDM Utility - wcache

• The wcache client is installed at /site/bin/wcache.
• File paths in arguments must start with /cache/.
• Wildcards in file paths are not supported in this version.
• If an error is detected before the server performs the work, a text error message is printed:
  Error: Invalid file /cache/halld/bad_file (not in /mss)
• The arguments, options, and return values may change during the next few weeks. Please run ‘wcache -h’ for updated information.
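The two path rules above (paths must start with /cache/, and wildcards are not supported) can be checked locally before calling the client. The following is a minimal sketch; the function name and the messages it returns are illustrative only and are not the messages printed by wcache.

```python
from typing import Optional

# Characters treated as shell-style wildcards in this sketch.
WILDCARD_CHARS = set("*?[")


def check_wcache_path(path: str) -> Optional[str]:
    """Return a description of the problem, or None when the path looks acceptable."""
    if not path.startswith("/cache/"):
        return f"path must start with /cache/: {path}"
    if any(ch in WILDCARD_CHARS for ch in path):
        return f"wildcards are not supported: {path}"
    return None


if __name__ == "__main__":
    for candidate in ["/cache/halld/file.dat", "/mss/halld/file.dat", "/cache/halld/*.evio"]:
        print(candidate, "->", check_wcache_path(candidate) or "ok")
```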

wcache projectInfo

/site/bin/wcache projectInfo halld
projectName: halld
rootPath: /cache/halld/
reserved: 1,000 GB
quota: 10,000 GB
availablePin: 3,221,225,472,000
cached: 176,426,721,312
pinned: 0
smallFile: 0

/site/bin/wcache projectInfo hall
Error: Invalid project name 'hall'

wcache get

wcache get /cache/junk/grid13.tar /cache/halld/good_file
Error: no permission to create /cache/halld/good_file

wcache get /cache/halld/bad_file /cache/junk/grid3.tar
Error: Invalid file /cache/halld/bad_file (not in /mss)

wcache get /cache/junk/123 /cache/junk/2316.lime
get request: 23
status: pending
/cache/junk/123 -> fail (not in tape library)
/cache/junk/2316.lime -> pending

wcache get

Since /cache/halld maps to /mss/halld, users can get any file under /mss/halld, but the file path must start with /cache/halld/… (not /mss/halld/…).

ls /mss/halld/halld-scratch/hdops
bigfile2 et2evio_000000.evio.001 et2evio_000000.evio.003 et2evio_000000.evio.005 et2evio_000000.evio.007 et2evio_000000.evio.009
et2evio_000000.evio.000 et2evio_000000.evio.002 et2evio_000000.evio.004 et2evio_000000.evio.006 et2evio_000000.evio.008 et2evio_000000.evio.011

wcache get /cache/halld/halld-scratch/hdops/bigfile2

Users must create all parent directories before running ‘wcache get’. If /cache/halld/halld-scratch/hdops does not exist, first create all the directories with the correct group permission.
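Creating the missing parent directories can be scripted. The sketch below assumes that "correct group permission" means group read, write, and execute on every directory it creates; that interpretation, and the function name, are assumptions rather than something the slides specify. The demonstration runs in a temporary directory instead of on /cache.

```python
import os
import stat
import tempfile
from typing import List


def make_group_writable_dirs(directory: str) -> List[str]:
    """Create directory and any missing parents, adding group rwx to each new one."""
    missing = []
    current = os.path.abspath(directory)
    # Walk upwards until an existing directory is found.
    while not os.path.isdir(current):
        missing.append(current)
        current = os.path.dirname(current)
    created = list(reversed(missing))
    for path in created:
        os.mkdir(path)
        # Add group rwx on top of whatever mode the umask left behind.
        mode = stat.S_IMODE(os.stat(path).st_mode)
        os.chmod(path, mode | stat.S_IRWXG)
    return created


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "halld-scratch", "hdops")
        print(make_group_writable_dirs(target))
```

On the real system the target would be the parent of the file being requested, for example /cache/halld/halld-scratch/hdops, and the directories would also need to belong to the project's Unix group.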

wcache put

wcache put /cache/junk/2316.lime /cache/junk/1234.lime
Error: /cache/junk/1234.lime doesn't exist in cache disk

wcache put /cache/junk/2324.lime
put request: -1
status: done
/cache/junk/2324.lime -> done (file already in mss)

wcache put /cache/junk/2324.lime /cache/junk/2316.lime.save -d
put request: 46
status: pending
/cache/junk/2324.lime -> done (file already in mss) (will be deleted from disk soon)
/cache/junk/2316.lime.save -> pending (will be deleted from disk after put finish)

wcache pendingRequest

/site/bin/wcache pendingRequest
get request: 37
user: ychen
status: pending
/cache/junk/57262.A16 -> pending
/cache/junk/57262.A16.f -> pending

get request: 38
user: ychen
status: pending
/cache/junk/2316.lime -> pending

put request: 41
user: ychen
status: pending
/cache/junk/2316.lime.save -> pending

wcache pendingRequest -u chen
chen has no pending request

wcache requestStatus

/site/bin/wcache requestStatus 37
get request: 37
user: ychen
status: pending
/cache/junk/57262.A16 -> pending
/cache/junk/57262.A16.f -> pending

/site/bin/wcache requestStatus 230
Error: Invalid request index '230'
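Scripts that poll a request may want the status text in a structured form. The sketch below parses text shaped like the sample above, assuming one "key: value" field or one "path -> state" entry per line; the real output layout may differ (the slides note that output may change), and the function name is hypothetical.

```python
from typing import Dict


def parse_request_status(text: str) -> Dict[str, object]:
    """Split status text into header fields and a per-file state mapping."""
    files: Dict[str, str] = {}
    info: Dict[str, object] = {"files": files}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if " -> " in line:
            # Per-file entry, e.g. "/cache/junk/123 -> fail (not in tape library)".
            path, state = line.split(" -> ", 1)
            files[path] = state
        elif ": " in line:
            # Header field, e.g. "status: pending".
            key, value = line.split(": ", 1)
            info[key] = value
    return info


SAMPLE = """get request: 37
user: ychen
status: pending
/cache/junk/57262.A16 -> pending
/cache/junk/57262.A16.f -> pending
"""

if __name__ == "__main__":
    print(parse_request_status(SAMPLE))
```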

wcache pin

wcache pin /cache/junk/2324.lime /cache/junk/2316.lime.save
/cache/junk/2324.lime -> success (pinned for 7 days)
/cache/junk/2316.lime.save -> success (pinned for 7 days)

wcache pin -D 12 /cache/junk/2316.lime.save
/cache/junk/2316.lime.save -> success (pinned for 12 days)

wcache pinInfo

/site/bin/wcache pinInfo
ychen pin 3 files and last 3 pins:
/cache/junk/szsc_cfg_2324.lime pinned at 2015-03-17 10:54:06.0 for 4 days
/cache/junk/2316.lime pinned at 2015-03-17 10:54:06.0 for 4 days
/cache/junk/2324.lime pinned at 2015-03-16 10:57:56.0 for 10 days

/site/bin/wcache pinInfo -u chen
chen has no pin.

wcache unpin

wcache unpin /cache/junk/2324.lime /cache/junk/2316.lime
/cache/junk/2324.lime -> success
/cache/junk/2316.lime -> failed (not pinned)

Development Tasks

• Fix bugs and add new features (please report problems and send suggestions).
• Better error handling.
• ‘wcache tapeRemove’ will be available soon.
• ‘wcache put -d’ will be available soon.
• ‘wcache list’ will be available soon.
• wcache web display pages will be available soon.
• Update Auger to handle ‘wcache get’/‘wcache pin’/‘wcache unpin’ for input files from the write-through cache disk, if users think it is necessary.
• Admin page to create new projects.

Changes in Batch Job – jsub script

• Specifying an input file: use the same syntax as for files on /volatile.
  – INPUT_FILES: /cache/halld/file.dat (not INPUT_FILES: /mss/cache/halld/file.dat)
  – <Input src="/cache/halld/file.dat" dest="file.dat"/> (not <Input src="mss:/mss/cache/halld/file.dat" dest="file.dat"/>)
• Auger will not interact with the wcache server to get or pin files under the write-through disk pool /cache/halld. It is the user's responsibility to make sure the file is on the cache disk.

Changes in Batch Job – jsub script

• Output file:
  – OUTPUT_DATA: file.out
  – OUTPUT_TEMPLATE: /cache/halld/outdir/file.out (not OUTPUT_TEMPLATE: /mss/cache/outdir/file.out)
  – <Output src="file.out" dest="/cache/halld/outdir/file.out"/> (not <Output src="file.out" dest="mss:/mss/cache/halld/outdir/file.out"/>)
• Auger will not jput the file to tape (it only copies it to the cache disk); the cacheManager will back it up after 12 days.
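When jsub scripts are generated programmatically, the Input and Output tags in the form shown on these slides can be produced by a small helper that also rejects the old mss-style paths. This is a sketch under the assumption that only the `src` and `dest` attributes shown above are needed; the helper names are hypothetical and are not part of Auger.

```python
from xml.sax.saxutils import quoteattr


def _require_cache_path(path: str) -> None:
    """Reject anything that is not a plain /cache/ path (for example mss:/mss/...)."""
    if not path.startswith("/cache/"):
        raise ValueError(f"expected a path under /cache/, got {path}")


def input_tag(cache_path: str, dest: str) -> str:
    """Build an Input tag that reads a file from the write-through cache disk."""
    _require_cache_path(cache_path)
    return f"<Input src={quoteattr(cache_path)} dest={quoteattr(dest)}/>"


def output_tag(src: str, cache_path: str) -> str:
    """Build an Output tag that writes a file to the write-through cache disk."""
    _require_cache_path(cache_path)
    return f"<Output src={quoteattr(src)} dest={quoteattr(cache_path)}/>"


if __name__ == "__main__":
    print(input_tag("/cache/halld/file.dat", "file.dat"))
    print(output_tag("file.out", "/cache/halld/outdir/file.out"))
```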

Changes in Batch Job – file stage

• Input files from /cache/halld/ will be copied to the farm node.
• Output files to /cache/halld/ will be copied to the cache disk (not the tape library).
• At first, Auger will not cache/pin/unpin these files (it assumes the file is on disk for the life of the farm job).