our friends the utils: a highway traveled by wheels we didn't re-invent
Post on 14-Jul-2015
Our Friends the Utils: A highway traveled by wheels we didn't re-invent.
Steven Lembark
Workhorse Computing
lembark@wrkhors.com
Meet the Utils
● Scalar::Util & List::Util were first written by the ancient Prophet of Barr (c. 1997).
● The modules provide often-requested features that were not worth modifying Perl itself to offer.
● Later, List::MoreUtils added features that List::Util does not include.
● If the Sound of Perl is an un-bloodied wall, the Utils are a superhighway traveled by truly lazy wheels.
Mixing old and new
● Several features in v5.10+ overlap Util features.
– Smart matches are the most obvious, and are usually compared with List::Util::first.
– New features are not replacements, but work well with the modules.
– Examples here show how to use the modules with smart matching and switches.
● What's important to notice is that these modules remain relevant.
Scalar::Util
● Provides introspection for scalars:
– Is a filehandle [still] open?
– The address, type, and class of a variable.
– Is a value “numeric” according to Perl?
– Does the variable contain readonly or tainted data?
– Tools for managing weak references or modifying prototypes.
● Handling these in Pure Perl is messy, slow, or error-prone.
Dealing with ref's & objects
● Collectively these replace “ref” or stringified references with a simpler, cleaner interface.
● The problem with ref and stringified objects is that they return different data for objects and “plain” refs.
– Stringified refs look like “Foobar=ARRAY(0x29eba90)”, unless overloading gets in the way.
– ref returns the base type, unless the reference is blessed, when it returns the class instead.
● blessed, refaddr, & reftype are consistent.
Blessed is the Object
● blessed returns a class or undef.
● This simplifies sanity checks:
    blessed $_[0] or die 'Non-object...';
● Construction with objects for types:
    bless $x, blessed $proto || $proto;
avoids classes like “ARRAY(0xab1234)”.
● Check for blessed before “can” to avoid errors:
    blessed $x && $x->can( $method ) or die ...
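The checks above can be sketched as a small runnable script. The Point class and the can_do helper are hypothetical names invented for this illustration; only blessed comes from Scalar::Util.

```perl
use strict;
use warnings;
use Scalar::Util qw( blessed );

package Point {
    # Works as Point->new( ... ) or $existing_point->new( ... ).
    sub new {
        my ( $proto, %args ) = @_;
        return bless { %args }, blessed( $proto ) || $proto;
    }
    sub x { $_[0]{ x } }
}

# Hypothetical helper: true only for objects that implement $method.
sub can_do {
    my ( $thing, $method ) = @_;
    return blessed( $thing ) && $thing->can( $method ) ? 1 : 0;
}

my $first  = Point->new( x => 1 );
my $second = $first->new( x => 2 );      # class is taken from the object

print ref( $second ), "\n";              # Point
print can_do( $second,  'x' ), "\n";     # 1
print can_do( [ 1, 2 ], 'x' ), "\n";     # 0, without an "unblessed reference" error
```

Without the blessed check, calling can on the plain array ref would die instead of returning false.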
Blessed Structures
● ref does not return the base type of a blessed ref.
● reftype returns the data type, regardless of blessing.
● Works nicely with switches:
    given( reftype $thing )    # blessed or not, same reftype
    {
        when( undef    ) { die "Not a reference: '$thing'" }
        when( 'ARRAY'  ) { ... }
        when( 'HASH'   ) { ... }
        when( 'SCALAR' ) { ... }
        die "Un-usable data type: '$_'";
    }
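For code that has to run without the switch feature, the same dispatch can be sketched with a hash of handlers keyed by reftype. The %handler table and the describe sub are assumptions made for this example, not part of the original slides.

```perl
use strict;
use warnings;
use Scalar::Util qw( reftype );

# Hypothetical handlers, one per supported base type.
my %handler = (
    ARRAY  => sub { 'array of ' . scalar( @{ $_[0] } ) },
    HASH   => sub { 'hash of '  . scalar( keys %{ $_[0] } ) },
    SCALAR => sub { 'scalar '   . ${ $_[0] } },
);

sub describe {
    my ( $thing ) = @_;
    my $type = reftype( $thing );        # undef for non-references
    defined $type
        or die "Not a reference: '$thing'\n";
    my $code = $handler{ $type }
        or die "Un-usable data type: '$type'\n";
    return $code->( $thing );
}

my $object = bless { a => 1, b => 2 }, 'Foo';

print ref( $object ), "\n";              # Foo: ref hides the base type
print describe( $object ), "\n";         # hash of 2
print describe( [ 1, 2, 3 ] ), "\n";     # array of 3
```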
Blessed Matches
● Smart-matching an object requires overloading.
● Developers would like to QA their modules to validate the overload is available.
● A generic test is simple: blessed scalars that can( '~~' ) are usable.
● Writing this test with only ref is a pain.
● With Scalar::Util it is blessedly simple:
    blessed $var && $var->can( '~~' )
    or die ...
The guts of “inside out” classes
● Virtual addresses are unique during execution.
● Make useful keys for associating external data.
● Problem is that stringified refs include too much data:
– Plain : ARRAY(0XEAA750)
– Blessed: Foo=ARRAY(0XEAA750)
– Re-blessed: Bletch=ARRAY(0XEAA750)
● The extra data makes them unusable as keys.
● Parsing the ref's to extract the address is too slow.
The key to your guts: refaddr
● refaddr returns only the address portion of a ref:
– Previous values all yield the same address: 0XEAA750 (returned as a plain integer).
● Note the lack of package or type.
● This is not affected by [re]blessing the variable.
● This leaves $data{ refaddr $ref } stable over the life cycle of a ref or object.
    use Scalar::Util qw( refaddr );
    use NEXT;             # provides $obj->NEXT::DESTROY

    my %obj2data = ();    # private cache for object data.

    sub set
    {
        my ( $obj, $data ) = @_;
        $obj2data{ refaddr $obj } = $data;
        return
    }

    sub get
    {
        $obj2data{ refaddr $_[0] }
    }

    # have to manually clear out the cache.
    sub DESTROY
    {
        my $obj = shift;
        delete $obj2data{ refaddr $obj };
        $obj->NEXT::DESTROY;
    }
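A self-contained sketch of the same idea follows. The Counter and Counter::Verbose classes are hypothetical; the point is that the refaddr key survives re-blessing while the stringified reference does not.

```perl
use strict;
use warnings;
use Scalar::Util qw( refaddr );

package Counter {
    my %count_for;    # private data, keyed by object address

    sub new {
        my ( $class ) = @_;
        my $obj = bless \( my $anon ), $class;
        $count_for{ refaddr( $obj ) } = 0;
        return $obj;
    }
    sub bump    { ++$count_for{ refaddr( $_[0] ) } }
    sub count   {   $count_for{ refaddr( $_[0] ) } }
    sub live    { scalar( keys %count_for ) }
    sub DESTROY { delete $count_for{ refaddr( $_[0] ) } }
}

package Counter::Verbose { our @ISA = ( 'Counter' ) }

my $counter = Counter->new;
$counter->bump for 1 .. 3;

my $before = "$counter";               # Counter=SCALAR(0x...)
bless $counter, 'Counter::Verbose';    # re-bless into a subclass
my $after  = "$counter";               # Counter::Verbose=SCALAR(0x...)

my $count_after = $counter->count;     # still 3: the address did not change

undef $counter;                        # DESTROY removes the cache entry
my $still_live = Counter->live;        # 0
```

Keying on the stringified reference would have lost the count at the re-bless step, since $before and $after differ.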
Circular references are not garbage
● In fact, with Perl's reference counting they are normally memory leaks.
● These are any case where a variable keeps alive some extra reference to itself:
– Self reference: $a = \$a
– Linked list: $a->[0] = [ [], \$a, @data ]
● The first is probably a mistake, the second is a properly formed doubly-linked list.
● Both of them prevent $a from ever being released.
Fix: Weak References
● Weak ref's do not increment the var's reference count.
● In this case the backlink does not prevent cleaning $a:
    @$a = ( [], $a, @data );
    weaken $a->[1];    # weaken the stored ref: a copy of a weak ref is strong again
● $a->[1] becomes undef once the structure it points to is released.
● isweak returns true for weak ref's.
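The effect can be observed by counting destructor calls. In this sketch the Node class and the build_pair helper are hypothetical; the cycle is only collected when the back-link is weakened.

```perl
use strict;
use warnings;
use Scalar::Util qw( weaken isweak );

my $destroyed = 0;

package Node {
    sub new     { my ( $class, $name ) = @_; bless { name => $name }, $class }
    sub DESTROY { $destroyed++ }
}

# Build a parent and child that point at each other, then let them
# fall out of scope. Returns whether the back-link was weak.
sub build_pair {
    my ( $use_weak ) = @_;
    my $parent = Node->new( 'parent' );
    my $child  = Node->new( 'child' );
    $parent->{ child } = $child;
    $child->{ parent } = $parent;                  # completes the cycle
    weaken( $child->{ parent } ) if $use_weak;
    return isweak( $child->{ parent } ) ? 1 : 0;
}

build_pair( 0 );
my $after_strong = $destroyed;    # 0: the strong cycle leaked both nodes

build_pair( 1 );
my $after_weak   = $destroyed;    # 2: both nodes from the weak pair were freed

print "strong cycle freed $after_strong, weak cycle freed $after_weak\n";
```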
Aside: Accidentally getting strong
● Copies are strong references unless they are explicitly weakened.
● This can leave you accidentally keeping items alive with things like:
my @a = grep { defined } @a;
this leaves @a with strong references that have to be explicitly weakened again.
● See Scalar::Util's POD for dealing with this.
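A minimal sketch of the trap, using made-up data: the grep copy silently becomes a strong reference and keeps the structure alive until it is weakened again.

```perl
use strict;
use warnings;
use Scalar::Util qw( weaken isweak );

my $config = { name => 'shared' };       # hypothetical shared structure

my @cache = ( $config );
weaken( $cache[0] );                     # the cache must not keep $config alive

my @copy = grep { defined } @cache;      # copying produces strong references

my $cache_is_weak = isweak( $cache[0] ) ? 1 : 0;    # 1
my $copy_is_weak  = isweak( $copy[0]  ) ? 1 : 0;    # 0

undef $config;                           # drop the "real" owner
my $kept_alive = defined $cache[0] ? 1 : 0;         # 1: @copy still holds it

weaken( $_ ) for @copy;                  # explicitly weaken the copies again
my $released = defined $cache[0] ? 0 : 1;           # 1: nothing strong is left
```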
Knowing Your Numbers
● We've all seen code that checks for numeric values with a regex like /^\d+$/.
● Aside from being slow, this simply does not work.
Exercise: Come up with a working regex that gracefully handles all of Perl's numeric types including int, float, exponents, hex, and octal along with optional whitespace.
● Better yet, let Perl figure it out for you:
if( looks_like_number $x ) { … }
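A quick comparison shows where the naive regex and looks_like_number disagree. The sample strings are arbitrary. One caveat is worth knowing: hex and octal literals in source code are already numbers when Perl sees them, but a string such as '0x1F' is not treated as numeric by looks_like_number.

```perl
use strict;
use warnings;
use Scalar::Util qw( looks_like_number );

my @samples = ( '42', '-3.14', '1e5', ' 7', '0x1F', '12abc', '' );

my ( %by_regex, %by_util );

for my $value ( @samples ) {
    $by_regex{ $value } = $value =~ /^\d+$/           ? 1 : 0;
    $by_util{  $value } = looks_like_number( $value ) ? 1 : 0;

    printf "%-8s regex=%d looks_like_number=%d\n",
        "'$value'", $by_regex{ $value }, $by_util{ $value };
}
```

The regex accepts only '42' from this list, while looks_like_number also accepts the negative float, the exponent form, and the value with leading whitespace.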
Switching on numerics
● Switches with looks_like_number help parsing and make the logic more readable:
    if( looks_like_number $_ )
    {
        …
    }
    elsif( /$regex/ )
    {
        # deal with text...
    }
Sorting and Sanity Checks
    sub generic_minimum
    {
        looks_like_number( $_[0] ) ? min @_ : minstr @_
    }

    sub numeric_input
    {
        my $numstr = get_user_input();
        looks_like_number $numstr
        or die "Not a number: '$numstr'";
        $numstr
    }
Anonymous Prototyping
● set_prototype adjusts the prototype on a subref.
– Including anonymous subroutines.
– Allows installation of subs that handle block inputs or multiple arrays – think of import subs.
● Another use is removing or modifying mis-guided prototypes in wrappers that call them.
– Example is a prototype of “$$” that prevents calling a wrapped sub with “@_”.
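Both uses can be sketched in a few lines. The add_pair sub and its arguments are hypothetical. Because prototypes take effect at compile time, the calls that depend on them are compiled late with a string eval so the before and after behaviour is visible in one script.

```perl
use strict;
use warnings;
use Scalar::Util qw( set_prototype );

# A sub whose '$$' prototype gets in the way of list-style calls.
sub add_pair($$) { return $_[0] + $_[1] }

my @args   = ( 2, 3 );
my $before = prototype( \&add_pair );                              # '$$'

# With '$$' in place, add_pair( @args ) does not even compile.
my $ok_before = defined( eval 'add_pair( @args )' ) ? 1 : 0;      # 0

set_prototype( \&add_pair, undef );                                # remove it
my $after = prototype( \&add_pair );                               # undef

my $sum = eval 'add_pair( @args )';                                # 5

# Anonymous subs can be given a prototype before they are installed.
my $apply_all = set_prototype(
    sub { my $code = shift; return map { $code->( $_ ) } @_ },
    '&@'
);
my $anon_proto = prototype( $apply_all );                          # '&@'
```

A prototype on an anonymous sub only matters once the sub is installed under a name, which is why this fits import-style code.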
Bi-polar Variables
● dualvar is a fast handler for dealing with multimode string+numeric data.
● Returns stringy or numeric portion depending on context:
    $a = dualvar ( 90, '/var/tmp' );
    print $a if $a > 80;    # prints "/var/tmp"
or
    sort { $a <=> $b or $a cmp $b } @list;
● dualvar's are faster than blessed ref's with overloads and offer better encapsulation.
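As an illustration, assume a list of mount points tagged with their percent-used figure. The data is invented; the numeric side drives the comparisons and the string side is what gets printed.

```perl
use strict;
use warnings;
use Scalar::Util qw( dualvar );

# Hypothetical data: number is percent used, string is the mount point.
my @mounts = (
    dualvar( 90, '/var/tmp' ),
    dualvar( 45, '/home'    ),
    dualvar( 90, '/opt'     ),
);

# Numeric context for the test, string context for the result.
my @nearly_full = map { "$_" } grep { $_ > 80 } @mounts;

# Fullest first, ties broken alphabetically.
my @ordered = map { "$_" } sort { $b <=> $a or $a cmp $b } @mounts;

print "Nearly full: @nearly_full\n";    # /var/tmp /opt
print "Ordered:     @ordered\n";        # /opt /var/tmp /home
```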
But wait, there's more!!!
● Obvious sanity checks:
● openhandle returns true for an open filehandle.
– validate stdin for interactive sessions.
– check for [still] live sockets.
● isvstring returns true for vstrings (e.g., “v5.16.0”).
● tainted returns true for tainted values.
● readonly checks for readonly values or variables.
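Two of these checks are easy to try in isolation. This sketch uses an in-memory filehandle so it does not depend on the state of STDIN; the variable names are arbitrary.

```perl
use strict;
use warnings;
use Scalar::Util qw( openhandle isvstring );

my $data = "in-memory data\n";
open my $fh, '<', \$data
    or die "open: $!";

my $open_before = defined openhandle( $fh ) ? 1 : 0;    # 1
close $fh;
my $open_after  = defined openhandle( $fh ) ? 1 : 0;    # 0

my $version        = v5.16.0;
my $v_is_vstring   = isvstring( $version ) ? 1 : 0;     # 1
my $str_is_vstring = isvstring( '5.16.0' ) ? 1 : 0;     # 0

# A vstring can be formatted with %vd, a plain string with %s.
printf "version: %vd\n", $version;
```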
Managing lists
● List::Util provides mostly-obvious functions: sum, max, min, maxstr, minstr, shuffle, first, and reduce.
● max and min compare numbers, maxstr and minstr handle strings.
● shuffle randomizes the order of a list – useful for security or simulations.
● first & reduce take a bit more explanation...
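The "mostly-obvious" functions in one place, with arbitrary sample data. The interesting part is how the numeric and string variants differ on the same list.

```perl
use strict;
use warnings;
use List::Util qw( sum max min maxstr minstr shuffle );

my @numbers = ( 10, 9, 100, 2 );

my $sum = sum @numbers;       # 121
my $max = max @numbers;       # 100
my $min = min @numbers;       # 2

# The string versions compare character by character.
my $maxstr = maxstr @numbers;    # '9'
my $minstr = minstr @numbers;    # '10'

my @shuffled = shuffle @numbers; # same values, random order

print "sum=$sum max=$max min=$min maxstr=$maxstr minstr=$minstr\n";
print "shuffled: @shuffled\n";
```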
First Thing: Why Bother?
● These can all be written in Pure Perl.
● Why bother with Yet Another Module and XS?
– Most people think of speed, which is true.
– These all have simple, clean interfaces that Just Work.
– XS encapsulates the in-work data.
– Module provides them in one place, once, with POD.
● So, speed is not the only issue – but it doesn't hurt that these are fast.
Second Thing's first()
● first looks a lot like grep, with a block and list.
● Unlike grep, first stops after finding the first match.
● It returns the first scalar that leaves the block true – not the block's output!
● Lists don't have to be data: they can be anything.
    my $odd   = first { $_ % 2 } @itemz;
    my $valid = first { /$rx/  } @regexen;
    my $found = first { foo $_ } @inputz;
    my $obj   = first { $_->valid($data) } @objz
    or die "Invalid data...";
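A runnable sketch of those points, with made-up data: the call counter shows that first stops early, and the regex example shows that the list element is returned rather than the block's result.

```perl
use strict;
use warnings;
use List::Util qw( first );

my @items = ( 4, 8, 15, 16, 23, 42 );

my $calls = 0;
my $odd   = first { $calls++; $_ % 2 } @items;    # 15, after 3 block calls

my $missing = first { $_ > 100 } @items;          # undef: nothing matched

# The list can hold anything, e.g. compiled regexes.
my @patterns = ( qr/^\d+$/, qr/^[a-z]+$/i, qr/^\s*$/ );
my $matched  = first { 'hello' =~ $_ } @patterns; # the second regex itself

print "first odd: $odd (block ran $calls times)\n";
print "pattern:   $matched\n";
```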
first with ~~ for validation
● Ever get sick of running through if-blocks for mutually exclusive switches?
● first with smart matching offers a declarative alternative:
● Hash-slicing the arguments array allows comparing invalid values with the same structure.
    my @bogus = ( [ qw( fork debug ) ], … );
    ...
    if( my $botched = first { $_ ~~ %argz } @bogus )
    {
        local $" = ' ';
        die "Mutually exclusive: @$botched";
    }
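Smart matching has been marked experimental since Perl 5.18, so here is a sketch of the same check written with first and grep only. The contents of %argz and @bogus are invented, and the rule used here (a set is botched when every option in it was supplied) is an assumption about the intent.

```perl
use strict;
use warnings;
use List::Util qw( first );

# Hypothetical parsed command line options.
my %argz = ( fork => 1, debug => 1, verbose => 1 );

# Hypothetical sets of options that may not be combined.
my @bogus = (
    [ qw( quiet verbose ) ],
    [ qw( fork  debug   ) ],
);

my $botched = first {
    my $set  = $_;
    my $hits = grep { exists $argz{ $_ } } @$set;
    $hits == @$set;        # every option in this set was supplied
} @bogus;

my $message = $botched ? "Mutually exclusive: @$botched" : '';

print "$message\n";        # Mutually exclusive: fork debug
```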
Working smarter
● First saves overhead by stopping early.
● Returning a scalar simplifies the syntax for assigning a result.
● Depending on your data, first on an array may be faster than exists on a hash key.
● Useful for more than iterating data:
– Use a list of regexes to determine what type of data is being processed.
– Lists of objects can be iterated to find the correct parser for general input.
Smart Match ~~ first
● Unlike most Perly boolean operators, smart match returns true or false, not the argument value that left it true.
● first returns the value that matched:
my $found = first { $record ~~ $_ } @filterz;
● $found is the first entry from @filterz that matches the record.
● Filters can be regexen, arrays, hashes, or objects with overloaded ~~ matching valid or unusable data.
– Use to check edge-cases in testing data handlers.
Inside-out data for a regex
● Use an inside-out structure to associate arbitrary data or state with the regex.
● Smart matching handles blessed regexen properly: works equally well with std regex or object.
    my $regex1 = qr{ ... };
    my $regex2 = qr{ ... };
    $inside{ refaddr $regex1 } = [];
    my @filtrz = ( $regex1, $regex2 );
    my $found  = first { $input ~~ $_ } @filtrz;
    push @{ $inside{ refaddr $found } }, $input;
Use first to pick handlers
● Say you have records with a variety of fields.
● A set of arrays with the required fields for handlers makes it easy to pick the right one:
● Add a bit of inside-out data and you can dispatch the record and its handler in a few lines of code.
    my @keyz = ( [ qw( ... ) ], [ qw( ... ) ] );
    my $found = first { $record ~~ $_ } @keyz
    or die 'Record fails minimum key test';
Reducing your workload
● All of the min, max, and sum functions are canned versions of reduce.
● reduce looks like sort, with $a and $b.
● Empty returns undef, singletons return themselves.
● Otherwise:
– $a, $b are aliased to the first two list values.
– The block's result is assigned to $a.
– $b is cycled through the remaining list values.
Example: min, max, sum, prod
    my @list = ( 1 .. 100 );
    my $min = reduce { $a < $b ? $a : $b } @list;
    my $max = reduce { $a > $b ? $a : $b } @list;
    # sum, product roll the value forward:
    my $sum = reduce { $a += $b } @list;
    my $prd = reduce { $a *= $b } @list;
    # sum of x-squared uses a placeholder:
    my $sumx2 = reduce { $a += $b**2 } ( 0, @list );
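The same patterns on a list small enough to check by hand. The values are arbitrary and the blocks return a new value rather than assigning to $a, which is a stylistic choice made for this sketch.

```perl
use strict;
use warnings;
use List::Util qw( reduce );

my @list = ( 3, 1, 4, 1, 5 );

my $sum   = reduce { $a + $b } @list;             # 14
my $prd   = reduce { $a * $b } @list;             # 60
my $sumx2 = reduce { $a + $b ** 2 } 0, @list;     # 52: the 0 seeds $a

# reduce is not limited to arithmetic.
my $longest = reduce { length( $a ) >= length( $b ) ? $a : $b }
              qw( fig banana kiwi );              # banana

# Edge cases: the block is never called for these.
my $empty  = reduce { $a + $b } ();               # undef
my $single = reduce { $a + $b } 7;                # 7
```

Without the leading 0 in the sum of squares, the first element would be carried forward unsquared.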
But wait, there's more more!!!
● List::Util lacks a number of operations that are easy to implement in Pure Perl:
– unique
– interleave, every nth record, groups of N records.
● Using XS does have advantages, not the least having none of us re-write the same Pure Perl.
● So... we have List::MoreUtils, written by Tassilo von Parseval and later maintained by Adam Kennedy.
Taking laziness to XS
● This module is a kitchen sink of things you've done at least once:
any all none notall true false firstidx first_index lastidx last_index insert_after insert_after_string apply indexes after after_incl before before_incl firstval first_value lastval last_value each_array each_arrayref pairwise natatime mesh zip uniq distinct minmax part
Indexes and last items
● first is nice, but to find the last item you need to reverse a list, which is expensive.
● Looking up using indexes with first requires $ary[$_], which also gets expensive.
● lastval, last_index, first_index do what you'd expect [novel idea, what?].
● before and after are more compact versions of slices using the results of first_index.
If first is false, use any
● first returns a list value, which might be false.
● any() returns true the first time its block is true.
● Solves tests using first failing on a false list value:
# $x is 0, $y is 1
@list = ( 0, 1, 2 );
$x = first { defined $_ } @list;
$y = any { defined $_ } @list;
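The difference matters as soon as the result is used in a boolean test. This sketch imports any from List::Util (which also exports it from version 1.33 on) so it runs with core modules only; List::MoreUtils' any is used the same way here.

```perl
use strict;
use warnings;
use List::Util qw( first any );

my @list = ( 0, 1, 2 );

my $x = first { defined $_ } @list;    # 0: a defined value that is false
my $y = any   { defined $_ } @list;    # true

my $via_first = $x ? 'found' : 'missing';    # missing (wrong answer)
my $via_any   = $y ? 'found' : 'missing';    # found

print "first says: $via_first, any says: $via_any\n";
```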
Unique lists
● MoreUtils' uniq returns a list in its original order (list) or the number of unique values (scalar):
● Using hash keys gives a random order.
● Any Pure Perl approach requires sort or lots of index operations.
    # 1 2 3 5 4
    my @x = uniq 1, 1, 2, 2, 3, 5, 3, 4;
    # 5
    my $x = uniq 1, 1, 2, 2, 3, 5, 3, 4;
Relative locations
● insert_after places an item after the first item for which its block passes.
● insert_after_string uses a string compare, avoiding the need for a block.
● Example: post-insert sentinel values into processed lists.
apply: map Without Side-effects
● One downside to map, sort, & grep is that they alias their block variables.
– Updating $_ or $a/$b will alter the inputs.
● apply works like map: extracting the result of a block applied to each element in a list.
– The difference is that $_ is copied, not aliased.
– The inputs are safe from modification.
Merging Lists
● Pairwise processing of lists uses prototypes to keep the syntax saner:
@sum_xy = pairwise { $a + $b } @x, @y;
@x = pairwise { $a->($b) } @subz, @valz;
● Nice for merging key/value pairs, which is what mesh does without a block:
%y = pairwise { ($a,$b) } @keyz, @valz;
%y = mesh @keyz, @valz;
● Prototypes require arrays; arrayrefs have to use “@$arrayref” syntax.
Iterating Separate Lists
● each_array generates an iterator that cycles through successive values in multiple lists:
my $each = each_array @a, @b, @c;
while( my( $a, $b, $c ) = $each->() ) { … }
● This avoids having to destroy the lists with shift or the overhead of many index accesses.
● each_arrayref takes arrayref (vs. array) args.
● Limitation of prototypes: can't mix arrays & refs.
Breaking up is easy to do
● Partitioning a list is quite doable in Pure Perl but gets messy when handling arbitrary lists.
● part uses a block to select index entries, returning an array[ref] segregated by the block output:
    # [ 1, 3, 5, 7 ], [ 2, 4, 6, 8 ]
    my $i = 0;
    my @partz = part { $i ++ % 2 } ( 1 .. 8 );
● Using %3 generates three lists.
● Block can use regexen (including parsing results), looks_like_number, error levels, whatever.
POD is your friend
● Actually, the module authors are: All of these modules are well documented, with good examples.
● Especially for MoreUtils: Take the time to run the POD code in a debugger to see what it does.
CPAN & the Power of Perl
● Code on CPAN isn't mouldy just because it's old.
– The modules are kept up to date.
– The guts of Perl have remained stable enough to keep the XS working.
● This is due to a lot of effort from module owners and Perl hackers.
Summary
● Smart matches did not obviate “first”, they work together.
● Utils work with newer features like smart matching and switches.
● Any time you find yourself hacking indexes, it's probably time to think about these modules.
● POD is your friend – check the modules for examples (and good examples of writing XS).
● Truly lazy wheels are not re-invented.