php static code review
DESCRIPTION
Even nowadays, PHP code is mostly manually audited. Expert pore over actual code, in search for bugs or code smells. Actually, it is possible to have PHP do this work itself ! Strengthened with the internal Tokenizer, bolstered by the manual, it is able to scan thousands of lines of code, without getting bored, and bringing pragmatic pieces of wisdom: official manual recommendations, version migration, code pruning and security. In the end, it deliver a global overview of the code, without reading it.TRANSCRIPT
PHP Static Code Review
München, Deutschland, October 27th
Definition
• A kind of code analysis where the code is reviewed without running it.
• Just like we would do ourselves!
• Where can it help
Who is speaking?
• Damien Seguy
• CTO at exakat
• Phather of the plush toy elePHPant
• Working on automated code audit
PHP tokenizer
<?php function x($a) { return $a;}x(1, 2);?>
[0] => Array ( [0] => 372 [1] => <?php
[2] => 1 )
[1] => Array ( [0] => 334 [1] => function [2] => 2 )
[2] => Array ( [0] => 375 [1] => [2] => 2 )
[3] => Array ( [0] => 307 [1] => x [2] => 2 )
function token
whitespace token
T_STRING
Total : 30 tokens
Internals
Code
Analyze Report
AST
<?php function x($a) { return $a;}x(1, 2);?>
Found• Dead code
• Undefined structures
• Unused structures
• Illogical exp.
• Slow code
• Bad practices
• Unsafe code
• Maintainability
• Bug issue
• Ancient style
• Uninitialized vars
• Taint propagation
<?phpswitch ($this->consume()){ case "\x09": case "\x0A": case "\x0B": case "\x0B": case "\x0C": case "\x20": case "\x3C": case "\x26": case false: break; case "\x23": switch ($this->consume()) { case "\x78": case "\x58": $range = '0123456789ABCDEFabcdef'; return $a++; break; } }?>
<?php
class x extends y { function array_single_quote($array) { return parent::array_map("single_quote", $array); } }
/* Calling each other */ function debug_dump_backtrace($msg='Calling BackTrace',$die=false) { debug_sysmsg($msg); error($msg,'note',null,$die,true); }
/* Defined in another file */ function debug_sysmsg($msg) { system_message(array(‘title'=>_('Debug'),'body'=>$msg,'type'=>'debug')); debug_dump_backtrace($msg, true); }
?>
protected function openString($end, &$out=null, $nestingOpen, $rejectStrs = null) { $nestingLevel = $count = 0; $content = array(); while ($this->match($patt, $m, false)) { $tok = $m[2]; if ($tok == "@{" && $this->interpolation($inter)) { $content[] = $inter; continue; } if (!empty($rejectStrs) && in_array($tok, $rejectStrs)) { $ount = null; break; } $content[] = $tok; $count += strlen($tok); } $this->eatWhiteDefault = $oldWhite; if (count($content) == 0) return false; $out = array("string", "", $content); return true; }
Spot bugs early
Code Test PreProd Production
Run it at commit Run it as audit
When does it help
• Audit external libraries
• Help port to a new system
• Search for weak code fragments
• Hint at refactoring
Report
• Bugs
• Useless code
• Suggestions
Bugs
<?php
if($content = file_get_contents($file)) { $content = trim($content); $content = substr($content, -2) == ‘>’ ? substr($content, 0, -2) : $content;}?>
Useless code
<?php
// inside a legit class $this->module->xmlRequest;$_G['setting']['debug'];
if (!empty($a) && in_array($tokens, $a)) { false;}
?>
Suggestions<?php // Nested ternary should be turned into if then structures $operation == 'ENCODE' ? sprintf('%010d', $expiry ? $expiry + time( ) : 0) . substr(md5($string . $egiskeys), 0, 16) . $string : base64_decode(substr($string, $key_length)) // Multiply by one is useless SetCache($prefix, $key, $row, 60*60*1);
$xtime *= 1;
// Backward compatible syntax $bits = split('.', $string); $y = $bits[0];
// Available syntax with recent PHP versions $y = split('.', $string)[0];?>
Where it doesn’t help
• Unit tests
• Architecture
• Old traditions that won’t change
• Semantic errors
Architecture
• No framework context
• Conception is done before coding
• Of course!
• Static audit will report standards, not norms
Old traditions<?php $pna = explode(')(', $pn); while (list($k, $v) = each($pna)) { $lst = explode('"', $v); if (isset($lst[3])) { $pn[$lst[1]] = $lst[3]; } else { $pn[$lst[1]] = ''; } }?>
10 % of nowadays applications uses this instead of foreach()
Old traditions
<?php defined('WEB_ROOT') || define('WEB_ROOT', dirname(__FILE_));
// also classic usage fopen($pFilename, 'w') or die("can't open file");
?>
Semantic errors<?php $babycarriage = new carriage(); $wheel1 = new Racingwheel(); $wheel2 = new Racingwheel(); $wheel3 = new Racingwheel(); $wheel4 = new Rhinoceros(); $babycarriage->installWheels($wheel1,
$wheel2, $wheel3, $wheel4);
?>
Undefined classes : Vehicle, Racingwheel, Rhinoceros
Available analyzers• PHP code sniffer
• PHP MD
• Scrutinizer-ci
• Fortify
• insight from Sensio
• Exakat
http://www.slideshare.net/dseguy