SPL: The Undiscovered Library - DataStructures
Slides from a presentation given to the Brighton PHP group on 15th December 2014
SPL: The Undiscovered Library
Exploring DataStructures
Who am I?
Mark Baker
Design and Development Manager
InnovEd (Innovative Solutions for Education) Ltd
• Coordinator and Developer of: Open Source PHPOffice library
  • PHPExcel, PHPWord, PHPPowerPoint, PHPProject, PHPVisio
• Minor contributor to PHP core
@Mark_Baker
https://github.com/MarkBaker
http://uk.linkedin.com/pub/mark-baker/b/572/171
SPL – Standard PHP Library
• SPL provides a standard set of interfaces for PHP 5
• The aim of SPL is to implement some efficient data access interfaces and classes for PHP
• Introduced with PHP 5.0.0
• Included as standard with PHP since version 5.3.0
• SPL DataStructures were added in version 5.3.0
SPL DataStructures
Dictionary DataStructures (Maps)
• Fixed Arrays
Linear DataStructures
• Doubly-Linked Lists
• Stacks
• Queues
Tree DataStructures
• Heaps
SPL DataStructures – Why use them?
• Can improve performance
  • When the right structures are used in the right place
• Can reduce memory usage
  • When the right structures are used in the right place
• Already implemented and tested in PHP core
  • Saves work!
• Can be type-hinted in function/method definitions
  • Adds semantics to your code
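As a small illustration of the type-hinting point, here is a sketch of a hypothetical `drainQueue()` function (the name and the upper-casing are invented for illustration; assumes PHP 7 or later). The `\SplQueue` type hint tells both readers and the engine that a FIFO queue is expected; passing a plain array raises a `TypeError`.

```php
<?php
// Hypothetical function: the SplQueue type hint documents (and enforces)
// that callers must pass a FIFO queue, not an arbitrary array.
function drainQueue(\SplQueue $jobs): array
{
    $processed = [];
    while (!$jobs->isEmpty()) {
        // dequeue() removes and returns the oldest job first
        $processed[] = strtoupper($jobs->dequeue());
    }
    return $processed;
}

$jobs = new \SplQueue();
$jobs->enqueue('print');
$jobs->enqueue('email');

$result = drainQueue($jobs);
```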
SPL DataStructures
Dictionary DataStructures (Maps)
• Fixed Arrays
Linear DataStructures
Tree DataStructures
Fixed Arrays
• Predefined size
• Enumerated indexes only, not associative
• Indexed from 0
• Is an object
• No hashing required for keys
• Implements
  • Iterator (IteratorAggregate since PHP 8.0)
  • ArrayAccess
  • Countable
Fixed Arrays – Uses
• Returned database resultsets, record collections
• Hours of day
• Days of month/year
• Hotel rooms, airline seats (as a 2-d fixed array)
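A minimal sketch of the 2-d idea for airline seats, assuming a hypothetical layout of 30 rows by 6 seats: an SPLFixedArray of rows, where each row is itself an SPLFixedArray of booleans (true = booked).

```php
<?php
// Hypothetical layout: 30 rows of 6 seats, modelled as an SplFixedArray
// of rows where each row is itself an SplFixedArray of seats.
$rows = 30;
$seatsPerRow = 6;

$seatMap = new \SplFixedArray($rows);
for ($r = 0; $r < $rows; ++$r) {
    $row = new \SplFixedArray($seatsPerRow);
    for ($s = 0; $s < $seatsPerRow; ++$s) {
        $row[$s] = false;            // false = seat still free
    }
    $seatMap[$r] = $row;
}

// Book row 12, seat 3 (both indexed from 0). Rows are objects, so
// modifying $row updates the row stored inside $seatMap.
$row = $seatMap[12];
$row[3] = true;

// Count booked seats by walking every row and seat.
$booked = 0;
foreach ($seatMap as $seatRow) {
    foreach ($seatRow as $taken) {
        if ($taken) {
            ++$booked;
        }
    }
}
```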
Fixed Arrays – Big-O Complexity
• Insert an element O(1)
• Delete an element O(1)
• Lookup an element O(1)
• Resize a Fixed Array O(n)
Fixed Arrays
[Diagram: Standard Arrays (keys 12345, 23456, 34567, 45678 mapped to slots [0]…[n-1] through a hash function) vs SPLFixedArray (keys 0, 1, 2, 3 used directly as slot indexes [0]…[n-1])]
Fixed Arrays
$a = array();
for ($i = 0; $i < $size; ++$i) { $a[$i] = $i; }
// Random/Indexed access
for ($i = 0; $i < $size; ++$i) { $r = $a[$i]; }
// Sequential access
foreach ($a as $v) { }
// Sequential access with keys
foreach ($a as $k => $v) { }
Initialise: 0.0000 s
Set 1,000,000 Entries: 0.4671 s
Read 1,000,000 Entries: 0.3326 s
Iterate values for 1,000,000 Entries: 0.0436 s
Iterate keys and values for 1,000,000 Entries: 0.0839 s
Total Time: 0.9272 s
Memory: 82,352.55 k
Fixed Arrays
$a = new \SPLFixedArray($size);
for ($i = 0; $i < $size; ++$i) { $a[$i] = $i; }
// Random/Indexed access
for ($i = 0; $i < $size; ++$i) { $r = $a[$i]; }
// Sequential access
foreach ($a as $v) { }
// Sequential access with keys
foreach ($a as $k => $v) { }
Initialise: 0.0013 s
Set 1,000,000 Entries: 0.3919 s
Read 1,000,000 Entries: 0.3277 s
Iterate values for 1,000,000 Entries: 0.1129 s
Iterate keys and values for 1,000,000 Entries: 0.1531 s
Total Time: 0.9869 s
Memory: 35,288.41 k
[Chart: Speed (s), SPL Fixed Array vs Standard PHP Array, for Initialise, Set Values, Sequential Read, Random Read and Pop]
Fixed Arrays
[Chart: Memory Usage, Current Memory (k) and Peak Memory (k), SPL Fixed Array vs Standard PHP Array]
Fixed Arrays
• Faster direct access
• Lower memory usage
• Faster for random/indexed access than for sequential access
Fixed Arrays – Gotchas
• Can be extended, but at a cost in speed
• Standard array functions won't work with SPLFixedArray
  • e.g. array_walk(), sort(), array_pop(), implode()
• Avoid unsetting elements if possible
  • Unlike standard PHP enumerated arrays, this leaves empty nodes that trigger an Exception if accessed
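One way to work around the missing array functions, sketched under the assumption that copying the data is acceptable: `toArray()` copies the contents into a standard array, and `SplFixedArray::fromArray()` builds a new fixed array from one. `setSize()` is the explicit resize operation.

```php
<?php
// Standard array functions such as sort() and implode() only accept real
// arrays, so convert: toArray() copies the contents out, and
// SplFixedArray::fromArray() builds a new fixed array from a plain array.
$fixed = new \SplFixedArray(4);
$fixed[0] = 30;
$fixed[1] = 10;
$fixed[2] = 40;
$fixed[3] = 20;

$plain = $fixed->toArray();      // a plain array copy: [30, 10, 40, 20]
sort($plain);                    // sorts the copy, $fixed is untouched
$sorted = \SplFixedArray::fromArray($plain);

$csv = implode(',', $sorted->toArray());

// Resizing is explicit (and O(n)): setSize() keeps existing values and
// fills any new slots with null.
$sorted->setSize(6);
```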
SPL DataStructures
Dictionary DataStructures (Maps)
Linear DataStructures
• Doubly-Linked Lists
• Stacks
• Queues
Tree DataStructures
Doubly Linked Lists
• Iterable lists
  • Top to bottom
  • Bottom to top
• Unindexed
  • Good for sequential access
  • Not good for random/indexed access
• Implements
  • Iterator
  • ArrayAccess
  • Countable
Doubly Linked Lists – Uses
• Stacks
• Queues
• Most-recently used lists
• Undo functionality
• Trees
• Memory allocators
• Fast dynamic, iterable arrays (not PHP's hashed arrays)
• iNode maps
• Video frame queues
Doubly Linked Lists – Big-O Complexity
• Insert an element at a known position O(1)
• Delete an element at a known position O(1)
  • Inserting or deleting by index first has to walk the list to reach that position, which is O(n)
• Lookup by index O(n)
  • I have seen people saying that SPLDoublyLinkedList behaves like a hash table for lookups, which would make it O(1); but timing tests show otherwise
• Access a node at the beginning of the list O(1)
• Access a node at the end of the list O(1)
Doubly Linked Lists
[Diagram: nodes A, B, C, D, E linked in both directions, with Head at A and Tail at E]
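To show why the ends are cheap and indexed lookup is not, here is a minimal hand-rolled doubly linked list in PHP. The `ListNode` and `SimpleDoublyLinkedList` classes are hypothetical teaching code, not how SplDoublyLinkedList is implemented (that lives in C inside PHP core); assumes PHP 7.1 or later.

```php
<?php
// Hypothetical node: a value plus links to its neighbours.
final class ListNode
{
    public $value;
    public $prev = null;
    public $next = null;

    public function __construct($value)
    {
        $this->value = $value;
    }
}

// Hypothetical list that keeps references to both ends (Head and Tail).
final class SimpleDoublyLinkedList
{
    private $head = null;
    private $tail = null;
    private $size = 0;

    // O(1): only the old tail and the new node are touched.
    public function push($value): void
    {
        $node = new ListNode($value);
        if ($this->tail === null) {
            $this->head = $node;
            $this->tail = $node;
        } else {
            $node->prev = $this->tail;
            $this->tail->next = $node;
            $this->tail = $node;
        }
        ++$this->size;
    }

    // O(1): unlink the tail node.
    public function pop()
    {
        if ($this->tail === null) {
            throw new \RuntimeException('Cannot pop from an empty list');
        }
        $node = $this->tail;
        $this->tail = $node->prev;
        if ($this->tail === null) {
            $this->head = null;
        } else {
            $this->tail->next = null;
        }
        --$this->size;
        return $node->value;
    }

    // O(n): follow links one at a time from whichever end is nearer.
    public function get(int $index)
    {
        if ($index < 0 || $index >= $this->size) {
            throw new \OutOfRangeException("Index $index is out of range");
        }
        if ($index < intdiv($this->size, 2)) {
            $node = $this->head;
            for ($i = 0; $i < $index; ++$i) {
                $node = $node->next;
            }
        } else {
            $node = $this->tail;
            for ($i = $this->size - 1; $i > $index; --$i) {
                $node = $node->prev;
            }
        }
        return $node->value;
    }

    public function count(): int
    {
        return $this->size;
    }
}

$list = new SimpleDoublyLinkedList();
foreach (['A', 'B', 'C', 'D', 'E'] as $letter) {
    $list->push($letter);
}
$second = $list->get(1);   // walks from Head: A -> B
$last = $list->pop();      // removes E straight from the Tail
```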
Doubly Linked Lists
$a = array();
for ($i = 0; $i < $size; ++$i) { $a[$i] = $i; }
// Random/Indexed access
for ($i = 0; $i < $size; ++$i) { $r = $a[$i]; }
// Sequential access
for ($i = 0; $i < $size; ++$i) { $r = array_pop($a); }
Initialise: 0.0000 s
Set 100,000 Entries: 0.0585 s
Read 100,000 Entries: 0.0378 s
Pop 100,000 Entries: 0.1383 s
Total Time: 0.2346 s
Memory: 644.55 k
Peak Memory: 8457.91 k
Doubly Linked Lists
$a = new \SplDoublyLinkedList();
for ($i = 0; $i < $size; ++$i) { $a->push($i); }
// Random/Indexed access
for ($i = 0; $i < $size; ++$i) { $r = $a->offsetGet($i); }
// Sequential access
for ($i = $size-1; $i >= 0; --$i) { $a->pop(); }
Initialise: 0.0000 s
Set 100,000 Entries: 0.1514 s
Read 100,000 Entries: 22.7068 s
Pop 100,000 Entries: 0.1465 s
Total Time: 23.0047 s
Memory: 133.67 k
Peak Memory: 5603.09 k
Doubly Linked Lists
• Fast for sequential access
• Lower memory usage
• Traversable in both directions
• Size limited only by memory
• Slow for random/indexed access
• Insert into middle of list (add()) only available from PHP 5.5.0
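A short sketch of the SplDoublyLinkedList features listed above: `add()` (PHP 5.5.0+) inserts into the middle, `setIteratorMode()` with `IT_MODE_LIFO` flips the traversal direction, and `bottom()`/`top()` read the two ends directly. The letters are just sample data.

```php
<?php
// Build A, B, D, E, then insert C at index 2 with add() (PHP 5.5.0+).
$list = new \SplDoublyLinkedList();
foreach (['A', 'B', 'D', 'E'] as $letter) {
    $list->push($letter);
}
$list->add(2, 'C');

// Default mode is FIFO: iterate from the bottom (head) to the top (tail).
$forward = [];
foreach ($list as $value) {
    $forward[] = $value;
}

// Switch to LIFO mode to traverse the same list from top to bottom.
$list->setIteratorMode(\SplDoublyLinkedList::IT_MODE_LIFO);
$backward = [];
foreach ($list as $value) {
    $backward[] = $value;
}

// Both ends are available directly, without walking the list.
$first = $list->bottom();
$lastValue = $list->top();
```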
SPL DataStructures
Dictionary DataStructures (Maps)
Linear DataStructures
• Doubly-Linked Lists
• Stacks
• Queues
Tree DataStructures
Stacks
• Implemented as a Doubly-Linked List
• LIFO
  • Last-In
  • First-Out
• Essential operations
  • push()
  • pop()
• Optional operations
  • count()
  • isEmpty()
  • peek()
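SplStack covers these operations directly. In this sketch `top()` plays the role of `peek()`; note that `top()` and `pop()` throw a RuntimeException on an empty stack, so checking `isEmpty()` first is the usual pattern.

```php
<?php
$stack = new \SplStack();
$stack->push('first');
$stack->push('second');
$stack->push('third');

$size = count($stack);         // Countable: 3
$peeked = $stack->top();       // peek at the top without removing it
$popped = $stack->pop();       // removes 'third' (last in, first out)
$nowTop = $stack->top();       // 'second'
$empty = $stack->isEmpty();    // false: 'first' and 'second' remain
```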
Stack – Uses
• Undo mechanism (e.g. in text editors)
• Backtracking (e.g. finding a route through a maze)
• Call handler (e.g. defining return location for nested calls)
• Shunting Yard Algorithm (e.g. converting infix to postfix notation)
• Evaluating a postfix expression
• Depth-First Search
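As an example of the "evaluating a postfix expression" use, here is a sketch of a hypothetical `evaluatePostfix()` helper, assuming space-separated tokens and only the four arithmetic operators (assumes PHP 7 or later). Operands are pushed; each operator pops two values and pushes the result.

```php
<?php
// Evaluate a space-separated postfix (RPN) expression with an SplStack.
// Only + - * / on numbers are supported in this sketch.
function evaluatePostfix(string $expression): float
{
    $stack = new \SplStack();
    foreach (preg_split('/\s+/', trim($expression)) as $token) {
        if (is_numeric($token)) {
            $stack->push((float) $token);
            continue;
        }
        if ($stack->count() < 2) {
            throw new \InvalidArgumentException("Not enough operands for '$token'");
        }
        // The right-hand operand is on top, so it is popped first.
        $right = $stack->pop();
        $left = $stack->pop();
        switch ($token) {
            case '+': $stack->push($left + $right); break;
            case '-': $stack->push($left - $right); break;
            case '*': $stack->push($left * $right); break;
            case '/': $stack->push($left / $right); break;
            default:
                throw new \InvalidArgumentException("Unknown operator '$token'");
        }
    }
    if ($stack->count() !== 1) {
        throw new \InvalidArgumentException('Malformed expression');
    }
    return $stack->pop();
}

// Infix (3 + 4) * 2 - 5 written in postfix form:
$result = evaluatePostfix('3 4 + 2 * 5 -');   // 9.0
```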
Stacks – Big-O Complexity
• Push an element O(1)
• Pop an element O(1)
Stacks
class StandardArrayStack {
    private $_stack = array();
    public function count() { return count($this->_stack); }
    public function push($data) { $this->_stack[] = $data; }
    public function pop() {
        if (count($this->_stack) > 0) {
            return array_pop($this->_stack);
        }
        return NULL;
    }
    public function isEmpty() { return count($this->_stack) == 0; }
}
Stacks
$a = new \StandardArrayStack();
for ($i = 1; $i <= $size; ++$i) { $a->push($i); }
while (!$a->isEmpty()) { $i = $a->pop(); }
PUSH 100,000 ENTRIES
Push Time: 0.5818 s
Current Memory: 8.75 MB
POP 100,000 ENTRIES
Pop Time: 1.6657 s
Current Memory: 2.25 MB
Total Time: 2.2488 s
Current Memory: 2.25 MB
Peak Memory: 8.75 MB
Stacks
class StandardArrayStack2 {
    private $_stack = array();
    private $_count = 0;
    public function count() { return $this->_count; }
    public function push($data) { ++$this->_count; $this->_stack[] = $data; }
    public function pop() {
        if ($this->_count > 0) {
            --$this->_count;
            return array_pop($this->_stack);
        }
        return NULL;
    }
    public function isEmpty() { return $this->_count == 0; }
}
Stacks
$a = new \StandardArrayStack2();
for ($i = 1; $i <= $size; ++$i) { $a->push($i); }
while (!$a->isEmpty()) { $i = $a->pop(); }
PUSH 100,000 ENTRIES
Push Time: 0.5699 s
Current Memory: 8.75 MB
POP 100,000 ENTRIES
Pop Time: 1.1005 s
Current Memory: 1.75 MB
Total Time: 1.6713 s
Current Memory: 1.75 MB
Peak Memory: 8.75 MB
Stacks
$a = new \SPLStack();
for ($i = 1; $i <= $size; ++$i) { $a->push($i); }
while (!$a->isEmpty()) { $i = $a->pop(); }
PUSH 100,000 ENTRIES
Push Time: 0.4301 s
Current Memory: 5.50 MB
POP 100,000 ENTRIES
Pop Time: 0.6413 s
Current Memory: 0.75 MB
Total Time: 1.0723 s
Current Memory: 0.75 MB
Peak Memory: 5.50 MB
Stacks
Stack Timings
                     Push Time (s)   Pop Time (s)   Memory after Push (MB)
StandardArrayStack   0.0796          0.1244         8.75
StandardArrayStack2  0.0782          0.0998         8.75
SPLStack             0.0644          0.0693         5.50
Stacks – Gotchas
• Peek (view an entry from the middle of the stack)
• StandardArrayStack
public function peek($n = 0) {
    if ($n < 0 || $n >= count($this->_stack)) { return NULL; }
    return $this->_stack[count($this->_stack) - $n - 1];
}
• StandardArrayStack2
public function peek($n = 0) {
    if ($n < 0 || $n >= $this->_count) { return NULL; }
    return $this->_stack[$this->_count - $n - 1];
}
• SPLStack
$r = $a->offsetGet($n);   // throws an OutOfRangeException if $n is out of range
Stacks – Gotchas
Stack Timings
                     Push Time (s)   Peek Time (s)   Pop Time (s)   Memory after Push (MB)
StandardArrayStack   0.0075          0.0111          0.0124         1.00
StandardArrayStack2  0.0077          0.0078          0.0098         1.00
SPLStack             0.0064          0.1627          0.0066         0.75
Stacks – Gotchas
• Peek
  • When looking through the stack, SPLStack has to follow each link in the "chain" until it finds the nth entry
SPL DataStructures
Dictionary DataStructures (Maps)
Linear DataStructures
• Doubly-Linked Lists
• Stacks
• Queues
Tree DataStructures
Queues
• Implemented as a Doubly-Linked List
• FIFO
  • First-In
  • First-Out
• Essential operations
  • enqueue()
  • dequeue()
• Optional operations
  • count()
  • isEmpty()
  • peek()
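SplQueue maps onto the queue operations the same way. In this sketch `bottom()` acts as `peek()`, returning the element that `dequeue()` would remove next; both throw a RuntimeException on an empty queue.

```php
<?php
$queue = new \SplQueue();
$queue->enqueue('job-1');
$queue->enqueue('job-2');
$queue->enqueue('job-3');

$size = count($queue);          // 3
$next = $queue->bottom();       // peek at the front without removing it
$served = $queue->dequeue();    // 'job-1': first in, first out
$nowNext = $queue->bottom();    // 'job-2'
$empty = $queue->isEmpty();     // false
```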
Queues – Uses
• Job/print/message submissions
• Breadth-First Search
• Request handling (e.g. a Web server)
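A sketch of the Breadth-First Search use, assuming a small hypothetical graph stored as an adjacency list (the `bfsOrder()` function and node names are illustrative; assumes PHP 7 or later). The FIFO order of SplQueue is what makes the search visit nodes level by level.

```php
<?php
// Breadth-first search over an adjacency list. SplQueue holds the
// frontier in the order nodes are discovered.
function bfsOrder(array $graph, string $start): array
{
    $visited = [$start => true];
    $order = [];
    $queue = new \SplQueue();
    $queue->enqueue($start);

    while (!$queue->isEmpty()) {
        $node = $queue->dequeue();       // oldest discovered node first
        $order[] = $node;
        foreach ($graph[$node] ?? [] as $neighbour) {
            if (!isset($visited[$neighbour])) {
                $visited[$neighbour] = true;
                $queue->enqueue($neighbour);
            }
        }
    }
    return $order;
}

$graph = [
    'A' => ['B', 'C'],
    'B' => ['D'],
    'C' => ['D', 'E'],
    'D' => ['F'],
    'E' => ['F'],
    'F' => [],
];
$order = bfsOrder($graph, 'A');
```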
Queues – Big-O Complexity
• Enqueue an element O(1)
• Dequeue an element O(1)
Queues
class StandardArrayQueue {
    private $_queue = array();
    private $_count = 0;
    public function count() { return $this->_count; }
    public function enqueue($data) { ++$this->_count; $this->_queue[] = $data; }
    public function dequeue() {
        if ($this->_count > 0) {
            --$this->_count;
            return array_shift($this->_queue);
        }
        return NULL;
    }
    public function isEmpty() { return $this->_count == 0; }
}
Queues
$a = new \StandardArrayQueue();
for ($i = 1; $i <= $size; ++$i) { $a->enqueue($i); }
while (!$a->isEmpty()) { $i = $a->dequeue(); }
ENQUEUE 100,000 ENTRIES
Enqueue Time: 0.6884 s
Current Memory: 8.75 MB
DEQUEUE 100,000 ENTRIES
Dequeue Time: 335.8434 s
Current Memory: 1.75 MB
Total Time: 336.5330 s
Current Memory: 1.75 MB
Peak Memory: 8.75 MB
Queues
$a = new \SPLQueue();
for ($i = 1; $i <= $size; ++$i) { $a->enqueue($i); }
while (!$a->isEmpty()) { $i = $a->dequeue(); }
ENQUEUE 100,000 ENTRIES
Enqueue Time: 0.4087 s
Current Memory: 5.50 MB
DEQUEUE 100,000 ENTRIES
Dequeue Time: 0.6148 s
Current Memory: 0.75 MB
Total Time: 1.0249 s
Current Memory: 0.75 MB
Peak Memory: 5.50 MB
Queues
Queue Timings
                     Enqueue Time (s)   Peek Time (s)   Dequeue Time (s)   Memory after Enqueue (MB)
StandardArrayQueue   0.0075             0.0087          0.6284             1.00
StandardArrayQueue2  0.0080             0.0070          0.6277             1.00
SPLQueue             0.0064             0.1582          0.0066             0.75
Queues – Gotchas
• Dequeue
  • In standard PHP enumerated arrays, array_shift() and array_unshift() are expensive operations because they re-index the entire array
  • This problem does not apply to SPLQueue
• Peek
  • When looking through the queue, SPLQueue has to follow each link in the "chain" until it finds the nth entry
SPL DataStructures
Dictionary DataStructures (Maps)
Linear DataStructures
Tree DataStructures
• Heaps
Heaps
• Ordered lists
  • Random input
  • Ordered output
• Implemented as a binary tree structure
• Essential operations
  • Insert
  • Extract
  • Ordering rule
• Abstract class that requires extending with an implementation of a compare() algorithm
  • compare() is reversed in comparison with usort() compare callbacks
• Partially sorted on data entry
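To make the "reversed" point concrete, this sketch (hypothetical `MaxIntHeap` class; assumes PHP 7 or later for `<=>`) uses the same expression, `$value1 <=> $value2`, in an SplHeap `compare()` and in a `usort()` callback. The heap extracts largest first, while `usort()` sorts smallest first.

```php
<?php
// compare() returns a positive number when $value1 belongs nearer the
// top of the heap, so $value1 <=> $value2 produces a max-heap.
class MaxIntHeap extends \SplHeap
{
    protected function compare($value1, $value2): int
    {
        return $value1 <=> $value2;
    }
}

$heap = new MaxIntHeap();
foreach ([5, 1, 9, 3, 7] as $n) {
    $heap->insert($n);                // random input
}

$fromHeap = [];
while (!$heap->isEmpty()) {
    $fromHeap[] = $heap->extract();   // ordered output, largest first
}

$fromUsort = [5, 1, 9, 3, 7];
usort($fromUsort, function ($a, $b) {
    return $a <=> $b;                 // same expression, smallest first
});
```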
Heaps – Uses
• Heap sort
• Selection algorithms (e.g. max, min, median)
• Graph algorithms
  • Prim's Minimum Spanning Tree (connected weighted undirected graph)
  • Dijkstra's Shortest Path (network or traffic routing)
• Priority queues
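A sketch of Dijkstra's shortest path using SplPriorityQueue, which is built on a heap. SplPriorityQueue extracts the highest priority first, so this sketch stores negated distances. The `shortestDistances()` function and the road network are hypothetical, and stale queue entries are skipped rather than updated in place (assumes PHP 7 or later).

```php
<?php
// Dijkstra's algorithm: repeatedly extract the nearest unsettled node.
// Distances are inserted as negative priorities so that the smallest
// distance has the highest priority.
function shortestDistances(array $graph, string $source): array
{
    $dist = [$source => 0];
    $queue = new \SplPriorityQueue();
    $queue->setExtractFlags(\SplPriorityQueue::EXTR_BOTH);
    $queue->insert($source, 0);

    while (!$queue->isEmpty()) {
        $entry = $queue->extract();      // ['data' => node, 'priority' => -distance]
        $node = $entry['data'];
        $d = -$entry['priority'];
        if ($d > $dist[$node]) {
            continue;                    // stale entry: a shorter path was already found
        }
        foreach ($graph[$node] ?? [] as $neighbour => $weight) {
            $candidate = $d + $weight;
            if (!isset($dist[$neighbour]) || $candidate < $dist[$neighbour]) {
                $dist[$neighbour] = $candidate;
                $queue->insert($neighbour, -$candidate);
            }
        }
    }
    ksort($dist);
    return $dist;
}

$roads = [
    'A' => ['B' => 4, 'C' => 1],
    'B' => ['D' => 1],
    'C' => ['B' => 2, 'D' => 5],
    'D' => [],
];
$distances = shortestDistances($roads, 'A');
```

Here the direct road A to B (4) loses to A to C to B (1 + 2 = 3), which the heap discovers because C is extracted before the first B entry.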
Heaps – Big-O Complexity
• Insert an element O(log n)
• Delete an element O(log n)
• Access root element O(1)
Heaps
Heaps
class ExtendedSPLHeap extends \SPLHeap {
    // Positive when $a is further north, so the most northerly city is extracted first
    protected function compare($a, $b) {
        if ($a->latitude == $b->latitude) { return 0; }
        return ($a->latitude < $b->latitude) ? -1 : 1;
    }
}
$citiesHeap = new \ExtendedSPLHeap();
$file = new \SplFileObject("cities.csv");
$file->setFlags(\SplFileObject::DROP_NEW_LINE | \SplFileObject::SKIP_EMPTY);
while (!$file->eof()) {
    $cityData = $file->fgetcsv();
    // Only use complete rows; a blank line or the final read at end of file may not return one
    if (is_array($cityData) && count($cityData) >= 3) {
        $city = new \StdClass;
        $city->name = $cityData[0];
        $city->latitude = $cityData[1];
        $city->longitude = $cityData[2];
        $citiesHeap->insert($city);
    }
}
Heaps
echo 'There are ', $citiesHeap->count(), ' cities in the heap', PHP_EOL;
echo 'FROM NORTH TO SOUTH', PHP_EOL;
foreach ($citiesHeap as $city) {
    echo sprintf("%-20s %+3.4f %+3.4f" . PHP_EOL, $city->name, $city->latitude, $city->longitude);
}
echo 'There are ', $citiesHeap->count(), ' cities in the heap', PHP_EOL;
There are 69 cities in the heap
FROM NORTH TO SOUTH
Inverness +57.4717 -4.2254
Aberdeen +57.1500 -2.1000
Dundee +56.4500 -2.9833
Perth +56.3954 -3.4353
Stirling +56.1172 -3.9397
Edinburgh +55.9500 -3.2200
Glasgow +55.8700 -4.2700
Derry +54.9966 -7.3086
Newcastle upon Tyne +54.9833 -1.5833
Carlisle +54.8962 -2.9316
Sunderland +54.8717 -1.4581
Durham +54.7771 -1.5607
Belfast +54.6000 -5.9167
Lisburn +54.5097 -6.0374
Armagh +54.2940 -6.6659
Newry +54.1781 -6.3357
Ripon +54.1381 -1.5223
Heaps
class ExtendedSPLHeap extends \SPLHeap {
    const NORTH_TO_SOUTH = 'north_to_south';
    const SOUTH_TO_NORTH = 'south_to_north';
    const EAST_TO_WEST = 'east_to_west';
    const WEST_TO_EAST = 'west_to_east';
    protected $_sortSequence = self::NORTH_TO_SOUTH;
    protected function compare($a, $b) {
        switch ($this->_sortSequence) {
            case self::NORTH_TO_SOUTH :
                if ($a->latitude == $b->latitude) return 0;
                return ($a->latitude < $b->latitude) ? -1 : 1;
            case self::SOUTH_TO_NORTH :
                if ($a->latitude == $b->latitude) return 0;
                return ($b->latitude < $a->latitude) ? -1 : 1;
            case self::EAST_TO_WEST :
                if ($a->longitude == $b->longitude) return 0;
                return ($a->longitude < $b->longitude) ? -1 : 1;
            case self::WEST_TO_EAST :
                if ($a->longitude == $b->longitude) return 0;
                return ($b->longitude < $a->longitude) ? -1 : 1;
        }
        return 0;
    }
    // Set the sequence before inserting: changing it later does not re-order entries already in the heap
    public function setSortSequence($sequence = self::NORTH_TO_SOUTH) {
        $this->_sortSequence = $sequence;
    }
}
$sortSequence = \ExtendedSPLHeap::WEST_TO_EAST;
$citiesHeap = new \ExtendedSPLHeap();
$citiesHeap->setSortSequence($sortSequence);
$file = new \SplFileObject("cities.csv");
$file->setFlags(\SplFileObject::DROP_NEW_LINE | \SplFileObject::SKIP_EMPTY);
while (!$file->eof()) {
    $cityData = $file->fgetcsv();
    // Only use complete rows; a blank line or the final read at end of file may not return one
    if (is_array($cityData) && count($cityData) >= 3) {
        $city = new \StdClass;
        $city->name = $cityData[0];
        $city->latitude = $cityData[1];
        $city->longitude = $cityData[2];
        $citiesHeap->insert($city);
    }
}
Heaps
class ExtendedSPLHeap extends \SPLHeap {
    protected $_longitude = 0;
    protected $_latitude = 0;
    // Positive when $a is nearer, so the closest city is extracted first
    protected function compare($a, $b) {
        if ($a->distance == $b->distance) return 0;
        return ($a->distance > $b->distance) ? -1 : 1;
    }
    public function setLongitude($longitude) { $this->_longitude = $longitude; }
    public function setLatitude($latitude) { $this->_latitude = $latitude; }
    …..
    public function insert($value) {
        $value->distance = $this->_calculateDistance($value);
        parent::insert($value);
    }
}
$citiesHeap = new \ExtendedSPLHeap();
// Latitude and Longitude for Brighton
$citiesHeap->setLatitude(50.8300);
$citiesHeap->setLongitude(-0.1556);
$file = new \SplFileObject("cities.csv");
$file->setFlags(\SplFileObject::DROP_NEW_LINE | \SplFileObject::SKIP_EMPTY);
while (!$file->eof()) {
    $cityData = $file->fgetcsv();
    // Only use complete rows; a blank line or the final read at end of file may not return one
    if (is_array($cityData) && count($cityData) >= 3) {
        $city = new \StdClass;
        $city->name = $cityData[0];
        $city->latitude = $cityData[1];
        $city->longitude = $cityData[2];
        $citiesHeap->insert($city);
    }
}
Heaps – Gotchas
• Compare method is reversed logic from a usort() callback
• Traversing the heap removes elements from the heap
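Because iteration extracts elements, a common workaround is to iterate over a clone when the heap is still needed afterwards. A minimal sketch using SplMinHeap with sample integers:

```php
<?php
// Iterating an SplHeap extracts as it goes, so a heap is empty after one
// foreach. Iterating over a clone leaves the original heap intact.
$heap = new \SplMinHeap();
foreach ([42, 7, 19] as $n) {
    $heap->insert($n);
}

$seen = [];
foreach (clone $heap as $value) {
    $seen[] = $value;                   // 7, 19, 42 (smallest first)
}
$countAfterCloneLoop = count($heap);    // original still holds 3

foreach ($heap as $value) {
    // consuming the original this time
}
$countAfterLoop = count($heap);         // 0
```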
SPL – Standard PHP Library
E-Book
Mastering the SPL Library
Joshua Thijssen
Available in PDF, ePub, Mobi
http://www.phparch.com/books/mastering-the-spl-library/
SPL DataStructures
Questions?