HipHop Virtual Machine by Radu Murzea
Agenda
Introduction
- What is HipHop VM?
- History and why it exists
Architecture and Features
- General Architecture
- Code cache
- JIT
- Garbage Collector
- AdminServer
- FastCGI
- Extensions
- HHVM-friendly PHP code
- Parity
What is HipHop VM?
A high-level, stack-based virtual machine that executes PHP code
Created by Facebook in a (successful) attempt to reduce the load on their servers
New versions are released every 8 weeks, on a Thursday. 10 days before a release, the branch is cut and heavily tested.
History of HHVM (I)
Summer 2007: Facebook started developing HPHPc, a PHP-to-C++ translator.
It worked by:
- Building an AST from the PHP code
- Generating equivalent C++ code from that AST
- Compiling the C++ code to a binary using g++
- Uploading the binary to the web servers, where it was executed
This resulted in significant performance improvements, up to 500% in some cases compared to PHP 5.2
History of HHVM (II)
HPHPc was so successful that the engineers decided to give it a developer-friendly sibling: HPHPi
HPHPi was just like HPHPc, but it ran in interpreted mode only (i.e., much slower)
However, it provided a lot of utilities for developers:
- A debugger (known as HPHPd)
- Setting watches and breakpoints
- Static code analysis
- Performance profiling
It also didn't require a compilation step to run the code
HPHPc ran over 90% of FB production code by the end of 2009
HPHPc was open-sourced in February 2010
History of HHVM (III)
But good performance came at a cost:
- Static compilation was very cumbersome
- The binary was 1 GB in size, which was a problem since production code had to be pushed to the servers DAILY
- Maintaining compatibility between HPHPc and HPHPi was getting more and more difficult (they used different formats for their ASTs)
So, at the beginning of 2010, FB started developing HHVM, which was a better, longer-term solution
At first, HHVM replaced only HPHPi, while HPHPc remained in production
But now, all of FB's production servers run HHVM
FB claims a 3x to 10x speed boost and a 0.5x to 5x memory reduction compared to PHP + APC. This, of course, is measured on their own code; most applications will see a more modest improvement
General Architecture (I)
The general architecture is made up of:
- 2 web servers
- A translator
- A JIT compiler
- A Garbage Collector
HHVM doesn't run on every OS:
- It supports most flavours of Linux
- It has some support for Mac OS X (it only runs with the JIT turned off)
- There is no Windows support
- The OS must be 64-bit for HHVM to work
General Architecture (II)
HHVM follows these steps to execute a PHP script:
1. Build an AST from the PHP code (the implementation for this was reused from HPHPc)
2. From the AST, build HipHop Bytecode (HHBC), similar to Java's or the CLR's bytecode
3. Cache the HHBC
4. At runtime, pass the HHBC through the JIT compiler (if enabled), which transforms it into machine code
5. Execute the machine code or, if the JIT is disabled, execute the HHBC in interpreted mode (not as fast, but still faster than Zend PHP)
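Steps 1, 2 and 5 can be sketched with a toy stack-based VM. This is a minimal illustration, not HHVM's implementation: it assumes Python's own `ast` module as a stand-in for the PHP parser, handles only arithmetic expressions, and the opcode names (`PUSH`, `ADD`, `SUB`, `MUL`) are hypothetical rather than real HHBC opcodes. Caching (step 3) and the JIT (step 4) are left out.

```python
import ast

# Hypothetical opcode names; real HHBC opcodes differ.
BINARY_OPS = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}

def compile_to_bytecode(source):
    """Steps 1-2: source -> AST -> flat list of stack-machine instructions."""
    tree = ast.parse(source, mode="eval")      # step 1: build the AST
    code = []

    def emit(node):                            # step 2: walk the AST, emit bytecode
        if isinstance(node, ast.Constant):
            code.append(("PUSH", node.value))
        elif isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPS:
            emit(node.left)                    # operands are pushed first...
            emit(node.right)
            code.append((BINARY_OPS[type(node.op)], None))  # ...then the operator pops them
        else:
            raise NotImplementedError(ast.dump(node))

    emit(tree.body)
    return code

def interpret(code):
    """Step 5 without a JIT: execute the bytecode on an operand stack."""
    stack = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
            continue
        right, left = stack.pop(), stack.pop()
        if op == "ADD":
            stack.append(left + right)
        elif op == "SUB":
            stack.append(left - right)
        elif op == "MUL":
            stack.append(left * right)
    return stack.pop()
```

For example, `compile_to_bytecode("2 + 3 * 4")` yields `[("PUSH", 2), ("PUSH", 3), ("PUSH", 4), ("MUL", None), ("ADD", None)]`, and `interpret` on that list returns 14. The bytecode no longer needs the AST, which is what makes it a convenient thing to cache.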
Code Cache (I)
When a request comes in, HHVM determines which file to serve, then checks whether the file's HHBC is in the SQLite-based cache:
- If yes, it's executed
- If no, HHVM compiles it, optimizes it and stores it in the cache
This is very similar to APC
There's a warm-up period when a new server is created, because the cache is empty
However, HHVM's cache lives on disk, so it survives server restarts, and there will be no more warm-up periods for that file
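The lookup-or-compile flow above can be modelled with Python's `sqlite3` module. This is a sketch under stated assumptions: the table name `units`, its columns, the `BytecodeCache` class and the `compile_fn` callback are all hypothetical, and detecting changes by hashing the file contents is this sketch's choice, not necessarily how HHVM checks freshness.

```python
import hashlib
import sqlite3

class BytecodeCache:
    """Toy disk-backed cache: source path -> compiled bytecode, stored in SQLite."""

    def __init__(self, db_path, compile_fn):
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS units "
            "(path TEXT PRIMARY KEY, sha1 TEXT NOT NULL, bytecode BLOB NOT NULL)"
        )
        self.compile_fn = compile_fn   # hypothetical compiler: bytes -> bytes
        self.compilations = 0          # cache misses paid by this process

    def get(self, path):
        with open(path, "rb") as f:
            source = f.read()
        digest = hashlib.sha1(source).hexdigest()
        rows = self.db.execute(
            "SELECT sha1, bytecode FROM units WHERE path = ?", (path,)
        ).fetchall()
        if rows and rows[0][0] == digest:
            return rows[0][1]                      # hit: reuse the stored bytecode
        bytecode = self.compile_fn(source)         # miss or stale entry: compile...
        self.compilations += 1
        self.db.execute(                           # ...and persist it for next time
            "INSERT OR REPLACE INTO units VALUES (?, ?, ?)", (path, digest, bytecode)
        )
        self.db.commit()
        return bytecode

    def close(self):
        self.db.close()
```

Because the cache is a file on disk, a new `BytecodeCache` opened on the same database after a "restart" starts warm: its first `get` for an unchanged file is a hit and costs no compilation.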
Code Cache (II)
But the warm-up period can be bypassed by doing pre-analysis
Pre-analysis means the cache can be generated before HHVM starts up
The pre-analyser actually works a little harder and does a better job at optimizing code
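A rough sketch of why pre-analysis helps: an offline pass sees every file at once, so besides filling the cache it can gather whole-program facts, such as which file defines each function. Everything here is hypothetical and only illustrative: `build_repo` stores the source itself as its "bytecode", and a regex stands in for a real PHP parser.

```python
import os
import re

# Hypothetical, regex-based stand-in for a real parser: finds "function name(" definitions.
FUNC_DEF = re.compile(r"\bfunction\s+([A-Za-z_]\w*)\s*\(")

def build_repo(root, suffix=".php"):
    """Compile every matching file under `root` ahead of time (offline pre-analysis)."""
    repo, defined_in = {}, {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            with open(path, encoding="utf-8") as f:
                source = f.read()
            repo[rel] = source                 # toy "bytecode": the source itself
            for func in FUNC_DEF.findall(source):
                defined_in[func] = rel         # whole-program knowledge: where each function lives
    return repo, defined_in
```

A per-request compiler only ever sees one file; a map like `defined_in`, built across the whole code base, is the kind of information that lets an ahead-of-time pass bind calls and optimize more aggressively.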
Code Cache (III)
There is a mode called RepoAuthoritative mode
Normally, HHVM checks on each request whether the PHP file has changed, in order to know if the cache must be updated
RepoAuthoritative mode means this check is no longer performed.
But be careful: if the file is not in the cache, you'll get an HTTP 404 error, even though the PHP file is right there
RepoAuthoritative mode is recommended for production because it avoids a lot of disk I/O, and files rarely change there anyway
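The behavioural difference between the two modes fits in one function. This is a hypothetical model, not HHVM code: `serve`, its `(status, body)` return value and the identity "compiler" (compiled code equals the source) are assumptions made to keep the sketch short.

```python
import os

def serve(repo, docroot, rel_path, authoritative):
    """Return (status, body). `repo` maps relative paths to precompiled code."""
    if authoritative:
        # RepoAuthoritative: trust the repo completely and never touch the file system.
        if rel_path in repo:
            return 200, repo[rel_path]
        return 404, "Not Found"        # even if the .php file exists on disk
    # Default mode: look at the file on every request and refresh the cache if it changed.
    path = os.path.join(docroot, rel_path)
    if not os.path.isfile(path):
        return 404, "Not Found"
    with open(path, encoding="utf-8") as f:
        source = f.read()
    if repo.get(rel_path) != source:   # identity "compiler": compiled code == source
        repo[rel_path] = source
    return 200, repo[rel_path]
```

A file deployed after the repo was built answers 404 in authoritative mode until the repo is rebuilt, while the default mode pays for a file-system check on every request but picks the new file up immediately.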
JIT Compiler (I)
Just-in-Time compilation is done during execution, not before
It translates an intermediate form of code (in this case HHBC) to machine code
A JIT compiler constantly monitors which code paths are executed most frequently and tries to optimize those as much as possible
Since a JIT compiler compiles to machine code at runtime, the resulting machine code can be optimized for that specific platform or CPU, which can sometimes make it faster than even statically compiled code
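A minimal sketch of the "find the hot code, then compile it" idea, assuming a made-up instruction set (`LOAD_X`, `PUSH`, `ADD`, `MUL`) and an arbitrary hotness threshold of 3 calls. Python stands in for machine code: once a function is hot, `translate` generates Python source for it and compiles it with the built-in `compile()` at runtime. HHVM's real heuristics and code generation are far more involved.

```python
HOT_THRESHOLD = 3  # hypothetical: interpreted calls before a function counts as "hot"
SYMBOLS = {"ADD": "+", "MUL": "*"}

def interpret(code, x):
    """Slow path: walk the instructions one by one on an operand stack."""
    stack = []
    for op, arg in code:
        if op == "LOAD_X":
            stack.append(x)
        elif op == "PUSH":
            stack.append(arg)
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if op == "ADD" else left * right)
    return stack.pop()

def translate(code):
    """Fast path: turn the instructions into host (Python) code once, then compile it."""
    stack = []
    for op, arg in code:
        if op == "LOAD_X":
            stack.append("x")
        elif op == "PUSH":
            stack.append(repr(arg))
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(f"({left} {SYMBOLS[op]} {right})")
    namespace = {}
    exec(compile(f"def jitted(x):\n    return {stack.pop()}\n", "<jit>", "exec"), namespace)
    return namespace["jitted"]

class JitFunction:
    """Interpret until the function is hot, then switch to the compiled version."""

    def __init__(self, code):
        self.code = code
        self.calls = 0
        self.compiled = None

    def __call__(self, x):
        if self.compiled is not None:
            return self.compiled(x)
        self.calls += 1
        if self.calls >= HOT_THRESHOLD:
            self.compiled = translate(self.code)  # compile at runtime, exactly once
        return interpret(self.code, x)
```

With `code = [("LOAD_X", None), ("PUSH", 2), ("MUL", None), ("PUSH", 1), ("ADD", None)]` (that is, `2 * x + 1`), the first three calls of `JitFunction(code)` are interpreted; from the fourth call on, the generated function `((x * 2) + 1)` runs directly.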
JIT Compiler (II)
HHVM uses so-called tracelets as the basic unit of JIT compilation. A tracelet is a short, type-specialized piece of straight-line code. This pays off because most programs spend most of their time in a few "hot loops", and subsequent iterations of those loops take similar paths, so the same tracelets are executed over and over
A tracelet has 3 parts:
- Type guard(s): prevent execution for incompatible types
- Body
- Link to subsequent tracelet(s)
Each tracelet has great freedom, but it is required to restore the VM to a consistent state any time execution escapes it
Tracelets have only ONE execution path, which means no internal control flow, so they are easy to optimize
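The three parts of a tracelet can be modelled as a small data structure. This is a simplified sketch: the `Tracelet` class, `GuardFailure` and `run_chain` are hypothetical names, each tracelet takes a single input value, and raising an exception stands in for "escaping" back to the interpreter with a consistent state.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Tracelet:
    guard_type: type                       # type guard: the only input type allowed in
    body: Callable[[object], object]       # a single straight-line path, no branches
    next: Optional["Tracelet"] = None      # link to the subsequent tracelet

class GuardFailure(Exception):
    """Execution escaped a tracelet; `value` is the last consistent VM state."""

    def __init__(self, value, tracelet):
        super().__init__(f"expected {tracelet.guard_type.__name__}, got {type(value).__name__}")
        self.value = value
        self.tracelet = tracelet

def run_chain(tracelet, value):
    """Follow tracelet links while the guards hold; bail out to the VM otherwise."""
    while tracelet is not None:
        if type(value) is not tracelet.guard_type:
            raise GuardFailure(value, tracelet)
        value = tracelet.body(value)
        tracelet = tracelet.next
    return value
```

For example, an `int` tracelet computing `v + 1` linked to an `int` tracelet computing `v // 2` turns 5 into 3, while calling the same chain with `5.0` fails the first guard and hands `5.0` back unchanged. Because each body has no branches, the only place a "wrong" type can be detected is the guard at the entry of a tracelet.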
Garbage Collector
Most modern languages have automatic memory management
In the case of VMs, this is handled by a Garbage Collector. There are 2 major types of GCs:
Refcounting: for each object, there is a count that constantly keeps track of how many references point to it
Tracing: periodically, during execution, the GC scans each object and determines if it’s reachable. If not, it deletes it
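Tracing is often more efficient and, unlike plain refcounting, it also reclaims reference cycles. But PHP's semantics rely on refcounting (for example, copy-on-write arrays and destructors that run as soon as the last reference disappears), so HHVM uses refcounting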
FB engineers want to move to a tracing approach and they might get it done someday
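The two GC strategies can be contrasted on a toy heap. This is a hypothetical model (`Obj`, `RefcountHeap` and `trace` are illustrative names, and only the mark phase of tracing is shown): refcounting frees an object the moment its count drops to zero, but two objects that point at each other keep each other's counts above zero, whereas tracing from the roots finds that neither is reachable.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.refs = []       # outgoing references to other objects
        self.refcount = 0    # number of incoming references

class RefcountHeap:
    """Refcounting: an object is freed the moment its count drops to zero."""

    def __init__(self):
        self.live = set()

    def new(self, name):
        obj = Obj(name)
        self.live.add(obj)
        return obj

    def incref(self, obj):
        obj.refcount += 1

    def decref(self, obj):
        obj.refcount -= 1
        if obj.refcount == 0:
            self.live.discard(obj)      # freed immediately and deterministically
            for child in obj.refs:      # releasing it drops its own references too
                self.decref(child)

    def point(self, src, dst):
        """Store a reference to `dst` inside `src`."""
        src.refs.append(dst)
        self.incref(dst)

def trace(roots, objects):
    """Tracing (mark phase): return the objects NOT reachable from the roots."""
    marked, stack = set(), list(roots)
    while stack:
        obj = stack.pop()
        if obj not in marked:
            marked.add(obj)
            stack.extend(obj.refs)
    return set(objects) - marked
```

A quick scenario: create `a` and `b`, let a local variable reference `a` (`incref(a)`), link `a -> b` and `b -> a`, then drop the variable (`decref(a)`). Refcounting leaves both objects in `heap.live` with a count of 1, while `trace([], heap.live)` reports both as garbage. An object without a cycle, by contrast, is freed by `decref` on the spot, which is the deterministic behaviour PHP code relies on.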
AdminServer
HHVM actually starts 2 web servers: the regular one on port 80 and the AdminServer on the port you specify
It can be accessed at a URI like http://localhost:9191/check-health?auth=mypasshaha
The AdminServer can turn JIT on/off, show statistics about traffic, queries, memcache, CPU load, number of active threads and many more
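Talking to the AdminServer is plain HTTP, so a client can be a few lines of standard-library Python. This sketch only reuses what the URI above shows (the `check-health` command, the `auth` parameter, port 9191 and the example password); the helper names `admin_url` and `admin_query` are hypothetical, and no other AdminServer commands are assumed.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

def admin_url(command, password, host="localhost", port=9191):
    """Build an AdminServer URL such as /check-health?auth=<password>."""
    return f"http://{host}:{port}/{command}?{urlencode({'auth': password})}"

def admin_query(command, password, host="localhost", port=9191, timeout=2.0):
    """Send one command to a running AdminServer and return the response body as text."""
    with urlopen(admin_url(command, password, host, port), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

`admin_url("check-health", "mypasshaha")` rebuilds exactly the example URI; `admin_query` needs an HHVM instance actually listening on that port.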
FastCGI
HHVM supports FastCGI starting with version 2.3.0 (released in December 2013)
FastCGI is a communication protocol used by webservers to communicate with other applications
FastCGI support means we don't have to use HHVM's rather basic built-in web server. Instead, we can use something like Apache or nginx and let HHVM do what it does best: execute PHP code at lightning speed
Supporting FastCGI should help HHVM get into even more production systems and increase its popularity
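To make "communication protocol" concrete, here is a sketch of the bytes a web server sends to a FastCGI backend such as HHVM for one request, following the record layout of the FastCGI specification (8-byte header, `FCGI_BEGIN_REQUEST`, `FCGI_PARAMS`, `FCGI_STDIN`). The function names and example parameters are illustrative; padding is omitted, each record's content must stay under 65536 bytes, and the socket handling and response parsing are left out.

```python
import struct

FCGI_VERSION_1 = 1
FCGI_BEGIN_REQUEST = 1
FCGI_PARAMS = 4
FCGI_STDIN = 5
FCGI_RESPONDER = 1

def record(rec_type, request_id, content=b""):
    """Wrap content in an 8-byte FastCGI record header (no padding)."""
    header = struct.pack(">BBHHBB", FCGI_VERSION_1, rec_type, request_id, len(content), 0, 0)
    return header + content

def encode_length(n):
    """Name/value lengths: 1 byte if < 128, else 4 bytes with the high bit set."""
    return bytes([n]) if n < 128 else struct.pack(">I", n | 0x80000000)

def encode_params(params):
    out = b""
    for name, value in params.items():
        n, v = name.encode(), value.encode()
        out += encode_length(len(n)) + encode_length(len(v)) + n + v
    return out

def build_request(request_id, params, stdin=b""):
    begin_body = struct.pack(">HB5x", FCGI_RESPONDER, 0)       # role, flags, 5 reserved bytes
    parts = [record(FCGI_BEGIN_REQUEST, request_id, begin_body)]
    parts.append(record(FCGI_PARAMS, request_id, encode_params(params)))
    parts.append(record(FCGI_PARAMS, request_id))              # empty PARAMS ends the stream
    if stdin:
        parts.append(record(FCGI_STDIN, request_id, stdin))
    parts.append(record(FCGI_STDIN, request_id))               # empty STDIN ends the body
    return b"".join(parts)
```

In a setup like the one described above, nginx or Apache produces these records and HHVM answers with `FCGI_STDOUT` and `FCGI_END_REQUEST` records; the PHP execution itself never sees the HTTP connection.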
Extensions
HHVM supports extensions just like PHP does
They can be written in PHP, C++ or a combination of the two
Extensions are available on every request, so you don't have to keep loading an extension all over your application
To use a custom extension, you add it to HHVM's extensions and then recompile HHVM. The resulting binary will contain your extension, and you can then use it
By default, HHVM already contains the most popular extensions, like MySQL, PDO, DOM, cURL, PHAR, SimpleXML, JSON, memcache and many others
However, it doesn't include MySQLi at this time
HHVM-friendly Code (I)
Write code that HHVM can understand without running it: code that contains as much static detail as possible
Avoid things like:
- Dynamic function calls: $function_name()
- Dynamic variable names: $a = $$x + 1;
- Functions like compact(), get_defined_vars(), extract() etc.
- Dynamic properties of an object. If you want to access a property, declare it. Accessing dynamic properties requires hashtable lookups, which are much slower.
Where possible, provide:
- Type hints for function parameters
- Return types that are as obvious as possible, like: return ($x == 4); or: return ($boolVar ? 1 : -1);
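Why declared properties are cheaper can be shown with a simplified object-layout model. This is an assumption-laden sketch, not HHVM's actual object representation: declared properties get fixed slot numbers that can be resolved once, before the code runs, while undeclared properties live in a per-object hashtable that has to be searched on every access. `ClassLayout`, `Instance` and `compile_property_read` are hypothetical names.

```python
class ClassLayout:
    """Declared properties get fixed slot numbers before any code runs."""

    def __init__(self, declared):
        self.slot_of = {name: i for i, name in enumerate(declared)}

class Instance:
    def __init__(self, layout):
        self.slots = [None] * len(layout.slot_of)   # declared properties: fixed array
        self.dynamic = {}                            # undeclared ("dynamic") properties

def compile_property_read(layout, name):
    """Resolve a property read once, the way an ahead-of-time analysis could."""
    if name in layout.slot_of:
        index = layout.slot_of[name]                 # looked up once, at compile time
        return lambda obj: obj.slots[index]          # per access: one array index
    return lambda obj: obj.dynamic.get(name)         # per access: a hashtable lookup
```

The point of the model is where the name lookup happens: for a declared property it happens once when the read is compiled, for a dynamic one it happens again on every single access.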
HHVM-friendly Code (II)
Code that runs in the global scope is never JIT-ed.
Any code anywhere can mutate the variables in the global scope, and since PHP is weakly typed, this makes it impossible for the JIT compiler to predict a variable's type
Example:
    class B {
        public function __toString() {
            $GLOBALS['a'] = 'Hello, world !';  // side effect: $a silently becomes a string
            return 'B';                        // __toString() must return a string
        }
    }
    $a = 5;
    $b = new B;
    echo $b;  // after this line, $a is no longer an int
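The same effect can be modelled in Python to show what goes wrong for a JIT. In this hypothetical sketch, a dictionary stands in for PHP's global scope and `specialize_increment` plays a pretend JIT that compiles `$a + 1` assuming `$a` keeps the type it had at compile time. Converting a `B` to a string changes that type behind the compiler's back, so the fast path has to re-check its guard on every run, which is exactly the overhead that makes global code a poor JIT target.

```python
# Hypothetical model of PHP's global scope as a plain dictionary.
GLOBALS = {}

class B:
    def __str__(self):
        GLOBALS["a"] = "Hello, world !"   # same side effect as B::__toString() in the PHP example
        return "B"

def specialize_increment():
    """Pretend-JIT: emit code for `$a + 1` that assumes $a keeps its current type."""
    observed = type(GLOBALS["a"])

    def fast_path():
        if type(GLOBALS["a"]) is not observed:   # the guard must be re-checked every time
            raise TypeError("$a changed type; fall back to the interpreter")
        return GLOBALS["a"] + 1

    return fast_path
```

With `GLOBALS["a"] = 5`, the specialized code returns 6; after `str(B())`, `$a` holds a string and the same fast path refuses to run.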
Parity (I)
All this is great, but can HHVM actually run real-world code? Well, in December 2013, it looked like this (taken from the HHVM blog):
Parity (II)
The HHVM engineers' main goal is to be able to run all PHP frameworks by Q4 2014 or Q1 2015.
Q & A