GUS Plugin System
Michael Saffitz
Genomics Unified Schema Workshop
July 6-8th, Philadelphia, Pennsylvania
Plugin Overview
Small Perl programs that load and manipulate data within GUS
Written using the GUS Plugin API and Perl Object Layer
Provide automatic support for: data provenance, object layer and database connectivity, standardized documentation, command-line argument processing, logging, error handling
“Supported” and “Community” Plugins provided with GUS
Supported Plugins
Have been tested in Oracle and Postgres and are confirmed to work
Portable
Useful beyond the site that developed them
Meet the GUS Plugin Standard
Community Plugins
Fail to meet one or more of the criteria above
Have not been tested
Provided as a general resource to the community
Plugin Life Cycle
Plugin Initialization: documentation, command line arguments
Data Loading: reading, parsing, querying
Data Manipulation: insert or update? Restart logic
Data Submission
GUS Supported Plugins
InsertArrayDesignControl.pm
InsertAssayControl.pm
InsertBlastSimilarities.pm
InsertExternalDatabase.pm
InsertExternalDatabaseRls.pm
InsertGOEvidenceCode.pm
InsertGeneOntology.pm
InsertGeneOntologyAssoc.pm
InsertRadAnalysis.pm
InsertReviewStatus.pm
InsertSecondaryStructure.pm
InsertSequenceOntology.pm
LoadArrayDesign.pm
LoadArrayResults.pm
LoadFastaSequences.pm
LoadGusXml.pm
LoadNRDB.pm
LoadRow.pm
LoadTaxon.pm
Plugin Shell
package GUS::Supported::Plugin::LoadRow;
@ISA = qw(GUS::PluginMgr::Plugin);
use strict;
use GUS::PluginMgr::Plugin;
sub new { … }
sub run { … }
Plugin Initialization
sub new {
  my ($class) = @_;
  my $self = {};
  bless($self, $class);

  $self->initialize({ requiredDbVersion => 3.5,
                      cvsRevision => '$Revision: 2934 $',
                      name => ref($self),
                      argsDeclaration => $argsDeclaration,
                      documentation => $documentation });

  return $self;
}
Declaring Arguments
stringArg({name => 'externalDatabaseVersion',
           descr => 'sres.externaldatabaserelease.version for this instance of NRDB',
           constraintFunc => undef,
           reqd => 1,
           isList => 0 }),
fileArg({name => 'gitax',
         descr => 'pathname for the gi_taxid_prot.dmp file',
         constraintFunc => undef,
         reqd => 1,
         isList => 0,
         mustExist => 1,
         format => 'Text' }),
Argument Types
String, Integer, Boolean, Table Name, Float, File, Enumeration, Controlled Vocab
Local, Database Term Pairs for “dinky” CVs
Declaring Documentation
my $tablesDependedOn = [['GUS::Model::DoTS::NRDBEntry',
                         'pulls aa_sequence_id from here when id and extDbId match requested']];
my $documentation = { purposeBrief => $purposeBrief,
                      purpose => $purpose,
                      tablesAffected => $tablesAffected,
                      tablesDependedOn => $tablesDependedOn,
                      howToRestart => $howToRestart,
                      failureCases => $failureCases,
                      notes => $notes };
Run Method
“Entry point” for plugin
Concise overview/“table of contents” for plugin:
sub run {
  my ($self) = @_;
  my $rows = 0;
  my $rawData = $self->readData();
  my @parsedData = $self->parseData($rawData);
  foreach my $data (@parsedData) {
    $data->submit();
    $rows++;
  }
  return "Inserted $rows ";
}
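The “table of contents” shape of run() can be tried outside GUS with a small, self-contained sketch. Everything here is a stand-in and an assumption: DemoPlugin and DemoRow are hypothetical classes (a real plugin would inherit from GUS::PluginMgr::Plugin), readData() returns hard-coded lines instead of reading a file, and submit() only sets a flag instead of writing to the database.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a GUS object; only submit() is modeled.
package DemoRow;

sub new {
    my ($class, %fields) = @_;
    my $self = { %fields, submitted => 0 };
    return bless $self, $class;
}

sub submit {
    my ($self) = @_;
    $self->{submitted} = 1;    # a real object would write to the database
    return 1;
}

# Hypothetical plugin; a real one would inherit from GUS::PluginMgr::Plugin.
package DemoPlugin;

sub new {
    my ($class) = @_;
    my $self = {};
    return bless $self, $class;
}

# Stand-in for reading an input file: returns raw tab-delimited lines.
sub readData {
    my ($self) = @_;
    return [ "1\tfoo", "2\tbar", "3\tbaz" ];
}

# Turn each raw line into an object that can be submitted.
sub parseData {
    my ($self, $rawData) = @_;
    my @parsed;
    foreach my $line (@$rawData) {
        my ($id, $name) = split /\t/, $line;
        push @parsed, DemoRow->new(id => $id, name => $name);
    }
    return @parsed;
}

# run() reads like a table of contents: read, parse, submit, report.
sub run {
    my ($self) = @_;
    my $rows = 0;
    my $rawData    = $self->readData();
    my @parsedData = $self->parseData($rawData);
    foreach my $data (@parsedData) {
        $data->submit();
        $rows++;
    }
    return "Inserted $rows rows";
}

package main;

my $result = DemoPlugin->new->run();
print "$result\n";
```

Keeping the reading and parsing in their own methods leaves run() short enough to scan at a glance, which is the point the slide makes.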
Accessing Data
Command line arguments: $self->getArg('nrdbFile');
Through Objects:
my $preExtAASeq = GUS::Model::DoTS::ExternalAASequence->new({'aa_sequence_id' => $aa_seq_id});
$preExtAASeq->retrieveFromDB();
Direct Database Access:
my $dbh = $self->getQueryHandle();
my $sth = $dbh->prepare(…);
Persisting Data
Saving & Updating: $obj->submit();
Will cascade and submit children
Delete: $obj->markDeleted(1); $obj->submit();
Logging and Error Handling
For general logging, use the logging functions: $self->log("message")
Log messages are printed to STDERR
For error handling: either die() immediately, or write errors to a file (for recoverable errors)
Restart functionality:
Check for object existence
Check for existence, but ensure the object was loaded by a valid, proper invocation
Store data from the previous run and use it as a filter
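The last restart option, storing data from a previous run and using it as a filter, can be sketched in plain Perl. This is an illustration under stated assumptions, not GUS code: loadCompleted() and loadWithRestart() are hypothetical helper names, $doneFile is a hypothetical file holding one already-loaded ID per line, and $loader is a code reference standing in for building and submitting an object.

```perl
use strict;
use warnings;
use IO::Handle;

# Read the IDs recorded by a previous run into a lookup hash.
# $doneFile is a hypothetical file of already-loaded IDs, one per line.
sub loadCompleted {
    my ($doneFile) = @_;
    my %done;
    return \%done unless -e $doneFile;    # first run: nothing to skip
    open(my $in, '<', $doneFile) or die "Cannot read $doneFile: $!";
    while (my $id = <$in>) {
        chomp $id;
        $done{$id} = 1 if length $id;
    }
    close($in);
    return \%done;
}

# Process only the IDs not yet recorded, appending each one on success.
# $loader stands in for building and submitting an object.
sub loadWithRestart {
    my ($ids, $doneFile, $loader) = @_;
    my $done = loadCompleted($doneFile);
    open(my $out, '>>', $doneFile) or die "Cannot append to $doneFile: $!";
    $out->autoflush(1);           # keep the record current if the run dies
    my $loaded = 0;
    foreach my $id (@$ids) {
        next if $done->{$id};     # filter: handled by an earlier run
        $loader->($id);           # may die; the ID is then not recorded
        print {$out} "$id\n";
        $loaded++;
    }
    close($out);
    return $loaded;
}
```

A call such as loadWithRestart(\@ids, 'loaded_ids.txt', sub { ... }) can be repeated after a failure; IDs written during the earlier attempt are skipped. The file name is only an example.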
Clearing the Cache
Historical: Perl previously had poor garbage collection support
Default capacity of 10000 objects
At the bottom of the outermost loop: $self->undefPointerCache();
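Where the call sits in the loop can be shown with a stub. CacheDemo below is a hypothetical class that only counts the objects it holds; it does not reproduce the real GUS cache or its behavior at capacity, and makeObject() is an invented helper. Only the method name undefPointerCache() comes from the slide.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a plugin; it only counts cached objects.
package CacheDemo;

sub new {
    my ($class) = @_;
    my $self = { cache => [], peak => 0 };
    return bless $self, $class;
}

# Invented helper: imitates object creation adding an entry to the cache.
sub makeObject {
    my ($self, $data) = @_;
    push @{ $self->{cache} }, $data;
    my $size = scalar @{ $self->{cache} };
    $self->{peak} = $size if $size > $self->{peak};
    return $data;
}

# Same name as the call on the slide; here it just empties the stub cache.
sub undefPointerCache {
    my ($self) = @_;
    $self->{cache} = [];
    return;
}

sub run {
    my ($self, $records) = @_;
    foreach my $record (@$records) {
        # Each record yields a parent and two children in this sketch.
        $self->makeObject("$record-parent");
        $self->makeObject("$record-child$_") for 1 .. 2;
        # ... the parent would be submitted here ...
        $self->undefPointerCache();    # bottom of the outermost loop
    }
    return $self->{peak};              # largest cache size observed
}

package main;

my $peak = CacheDemo->new->run([ 1 .. 50000 ]);
print "Peak cached objects: $peak\n";
```

Because the cache is emptied once per record, the number of objects held at any moment depends on the work done for one record rather than on the size of the input.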
Data Provenance
Tracks plugin revisions: name, checksum, revision
Tracks parameters that a specific plugin is executed with
Algorithm
AlgorithmImplementation
AlgorithmInvocation
AlgorithmParamKey
AlgorithmParamKeyType
AlgorithmParam
Plugin Evolution
Changes abound: data file formats, schema
Be flexible in writing plugins: use command line configuration
Be clear about what schema objects you use