Honours Thesis for the degree of Bachelor of Advanced Computing at Australian National University
By Timothy Sergeant, 2014
PHP is a popular dynamic scripting language for webpages, known for ease of development and wide availability. However, little work has been done to optimise PHP's memory management. Two major PHP implementations, PHP5 and HHVM, both use naive reference counting for memory management, which is known to be slow and expensive. Moving away from it is difficult, because the semantics of PHP loosely tie the language to naive reference counting.
This thesis argues that high-performance memory management is within reach for PHP.
We analyse the memory characteristics of the HipHop Virtual Machine (HHVM) to determine how it compares to similar virtual machines. We describe and experimentally evaluate the changes required to remove reference counting from HHVM. Finally, we provide a design for a proof-of-concept mark-region tracing collector for HHVM, and discuss the issues faced when implementing such a collector.
We find that HHVM has similar memory demographics to PHP5 and Java, and would be well suited to high-performance garbage collection algorithms such as Immix. However, we encounter a performance tradeoff associated with removing reference counting, due to the need to weaken the copy-on-write optimisation for arrays. Our proposed alternative, blind copy-on-write, was found to be ineffective for production use, and we propose avenues for future work to reduce this tradeoff.
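To make the tradeoff concrete, the following is a minimal sketch in Python of why copy-on-write for arrays leans on a reference count. All names here (RefCountedArray, assign, write_with_refcount, write_without_refcount) are hypothetical and are not taken from HHVM or PHP5. The fallback shown, copying on every write, is only an assumed conservative strategy for a runtime that cannot tell whether an array is shared; it is not a description of the blind copy-on-write scheme evaluated in this thesis.

```python
class RefCountedArray:
    """Toy array value with an explicit reference count (hypothetical type)."""

    def __init__(self, items):
        self.items = list(items)
        self.refcount = 1


def assign(arr):
    """Share one array between two variables: no copy, only a count bump."""
    arr.refcount += 1
    return arr


def write_with_refcount(arr, index, value):
    """Copy-on-write: copy only when the count shows the array is shared."""
    if arr.refcount > 1:
        arr.refcount -= 1                  # this variable leaves the shared array
        arr = RefCountedArray(arr.items)   # private copy, refcount == 1
    arr.items[index] = value               # safe to mutate in place
    return arr


def write_without_refcount(items, index, value):
    """Assumed fallback with no count: sharing is unknown, so always copy."""
    copy = list(items)
    copy[index] = value
    return copy


a = RefCountedArray([1, 2, 3])
b = assign(a)                        # a and b share storage
b = write_with_refcount(b, 0, 99)    # shared, so this write copies
a = write_with_refcount(a, 1, 7)     # no longer shared, so this write is in place
```

With the count available, only the first write to a shared array pays for a copy. Without it, the sketch pays for a copy on every write, which illustrates the kind of cost that appears when reference counting is removed.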
We are confident that these challenges can be overcome to create a high-performance garbage collector for HHVM. While PHP is widely used on small webpages where performance is not critical, it is also used at scale by companies such as Facebook and Wikimedia. In this context, even a small performance gain can have a significant cost impact. This result has impact beyond HHVM, as the techniques described in this thesis could be adapted to PHP5 and other PHP runtimes, making a large part of the web faster by improving memory management.