Fast Serialization with Liip Serializer

  • Michelle Sanver

We built a fast serializer in PHP with an overall performance gain of 55% over JMS for our use-case, and it’s awesome. We open sourced it and here it is: Liip Serializer. Let's look more at how it works, and how we made it so much faster!

For serialization (from PHP objects to JSON) and deserialization (the other way around), we have been using JMS Serializer for a long time in one of our big Symfony PHP projects, and we still use it for parts of that project. We were and still are very happy with the features of JMS Serializer, and would highly recommend it for a majority of use cases.

Some of the functionality we would find difficult to cope without:

  • Different JSON output based on version. So that we can have “this field is here until version 3” etc.
  • Different Serializer groups so we can output different JSON based on whether this is a “detail view” or a “list view”.
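As a rough illustration, a model carrying both kinds of rules might be annotated like this. The Since, Until and Groups annotations are JMS Serializer's; the Product class and its fields are made up for this sketch. Because the annotations live in docblocks, the snippet loads even without JMS installed.

```php
<?php
declare(strict_types=1);

use JMS\Serializer\Annotation as Serializer;

// Hypothetical model, invented for this example.
final class Product
{
    /**
     * Shown in both the list view and the detail view.
     *
     * @Serializer\Type("string")
     * @Serializer\Groups({"list", "detail"})
     */
    public $name = '';

    /**
     * Only part of the detail view, and only from version 2 onwards.
     *
     * @Serializer\Type("string")
     * @Serializer\Groups({"detail"})
     * @Serializer\Since("2")
     */
    public $description = '';

    /**
     * "This field is here until version 3".
     *
     * @Serializer\Type("int")
     * @Serializer\Groups({"detail"})
     * @Serializer\Until("3")
     */
    public $legacyPrice = 0;
}
```

With JMS, the caller then picks the groups and the version per call on the serialization context, so the same model can produce several different JSON shapes.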

JMS Serializer works with “visitors” and a lot of method calls. In PHP this is fine in most cases, but with big and complicated JSON documents it has a huge performance impact. This was a bottleneck in our application for years before we built our own solution.

Blackfire helped us a lot in finding the bottleneck. In the Blackfire profile from when we were using JMS Serializer, you can see that we called visitProperty over 60 000 times!!
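To see how quickly such call counts add up, here is a deliberately simplified stand-in for a visitor-based serializer. It is not JMS Serializer's real code: CountingVisitor and the catalog data are invented for this sketch, and only the one-call-per-property pattern is the point.

```php
<?php
declare(strict_types=1);

// Simplified stand-in for a visitor-based serializer; NOT JMS Serializer's real code.
final class CountingVisitor
{
    /** @var int number of visitProperty() calls so far */
    public $calls = 0;

    /** Turns an object into an array, one method call per property. */
    public function visitObject(object $object): array
    {
        $data = [];
        foreach (get_object_vars($object) as $name => $value) {
            $data[$name] = $this->visitProperty($value);
        }
        return $data;
    }

    /** Called once for every property of every object in the tree. */
    public function visitProperty($value)
    {
        $this->calls++;
        if (is_object($value)) {
            return $this->visitObject($value);
        }
        if (is_array($value)) {
            return array_map(function ($item) {
                return is_object($item) ? $this->visitObject($item) : $item;
            }, $value);
        }
        return $value;
    }
}

// Hypothetical document: one catalog holding 1000 products with 6 fields each.
$catalog = new stdClass();
$catalog->name = 'Spring collection';
$catalog->products = [];
for ($i = 0; $i < 1000; $i++) {
    $product = new stdClass();
    $product->id = $i;
    $product->name = "Product $i";
    $product->price = 100 + $i;
    $product->currency = 'CHF';
    $product->inStock = true;
    $product->rating = 4.5;
    $catalog->products[] = $product;
}

$visitor = new CountingVisitor();
$data = $visitor->visitObject($catalog);

echo $visitor->calls, " visitProperty() calls\n"; // 2 + 1000 * 6 = 6002
```

A real serializer does far more work per call (metadata lookups, exclusion checks, type handling), so the cost per property is higher than in this toy version.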

Our solution removed this and made our application a LOT faster, with an overall performance gain of 55% (390 ms => 175 ms) and both CPU and I/O wait down by ~50%.

Memory gain: 21%, 6.5 MB => 5.15 MB

Let’s look at how we did this!

GOing fast outside of PHP

Having tried a lot of PHP serializer libraries, we were close to giving up and started to think that this was simply a bottleneck we had to live with. Then Michael Weibel (a Liiper working in the same team at the time) came up with the brilliant idea of using Go to solve the problem. And we did. And it was fast!

We were using php-to-go and Liip/sheriff.

How this worked:

  • Use php-to-go to parse the JMS annotations and generate Go structs (basically models, but in Go) for all of our PHP models.
  • Use sheriff for serialization.
  • Use goridge to interface with our existing PHP application.

This was A LOT faster than PHP with JMS Serializer, and we were very happy with the speed. The integration between PHP and the Go binary was a bit cumbersome, however. Looking back at it, we also thought that comparing generated Go code with the highly dynamic JMS code was a bit unfair. So we decided to try the approach we had taken with Go in plain PHP as well. Enter our serializer in PHP.

Generating PHP code to serialize - Liip Serializer

Liip Serializer generates code based on the PHP models that you specify, parsing the JMS annotations with a parser we built for this purpose.

The generated code uses no objects, and minimal function calls. For our largest model tree, it’s close to 250k lines of code. It is some of the ugliest PHP code I’ve been near in years! Luckily we don’t need to look at it, we just use it.
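To give a feel for the shape of such code, here is a hand-written sketch of what one generated function could look like. The Product and Variant models, the function name and the field selection are assumptions for illustration, not the library's actual output.

```php
<?php
declare(strict_types=1);

// Hypothetical models, invented for this example.
final class Variant
{
    public $sku = '';
    public $stock = 0;
}

final class Product
{
    public $id = 0;
    public $name = '';
    /** @var Variant[] */
    public $variants = [];
}

/**
 * Sketch of a generated function for Product, group "list", version 2.
 * Every property is written out explicitly: no visitors, no metadata lookups,
 * and no function calls while walking the model tree.
 */
function serialize_Product_list_2(Product $model): array
{
    $json = [];
    $json['id'] = $model->id;
    $json['name'] = $model->name;

    // The nested collection becomes a plain inlined loop.
    $json['variants'] = [];
    foreach ($model->variants as $index => $variant) {
        $json['variants'][$index] = [];
        $json['variants'][$index]['sku'] = $variant->sku;
        // "stock" is omitted: in this sketch it is not part of the "list" group.
    }

    return $json;
}

$variant = new Variant();
$variant->sku = 'TS-RED-M';
$variant->stock = 12;

$product = new Product();
$product->id = 42;
$product->name = 'T-Shirt';
$product->variants[] = $variant;

echo json_encode(serialize_Product_list_2($product)), "\n";
// {"id":42,"name":"T-Shirt","variants":[{"sku":"TS-RED-M"}]}
```

Multiply this by every model, every nesting level and every group and version combination, and it becomes clear how the line count grows so large.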

For every version and every group, it generates one file for serialization and one for deserialization. Each file contains one single generated function, Serialize or Deserialize.

Then, when serializing or deserializing, it calls those generated functions, piecing together the filename to use from the groups and version we have specified. This way we got rid of all the visitors and method calls that JMS Serializer needed to handle each of these complex use cases. Enter advanced serialization in PHP, the fast way.
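The lookup can be pictured roughly as follows. This is a minimal sketch: the naming scheme and the generatedFunctionName helper are assumptions made for illustration, not necessarily what the library does internally.

```php
<?php
declare(strict_types=1);

/**
 * Builds the name of a generated function (and of the file that holds it).
 * The naming scheme is an assumption for this sketch.
 */
function generatedFunctionName(string $direction, string $class, array $groups, ?string $version): string
{
    sort($groups); // ["list", "api"] and ["api", "list"] must resolve to the same file
    $parts = array_merge([$direction, $class], $groups);
    if ($version !== null) {
        $parts[] = $version;
    }
    // Namespace separators, dots and the like are not valid in function names.
    return preg_replace('/[^A-Za-z0-9_]/', '_', implode('_', $parts));
}

$name = generatedFunctionName('serialize', 'App\\Model\\Product', ['list', 'api'], '2');
echo $name, "\n";          // serialize_App_Model_Product_api_list_2
echo $name . '.php', "\n"; // serialize_App_Model_Product_api_list_2.php

// At runtime the file would be loaded once and the function called by name,
// for example ($cacheDir is hypothetical):
//   require_once $cacheDir . '/' . $name . '.php';
//   $data = $name($product);
```

The design trade-off is that all decisions about groups and versions are made once, at generation time, so the hot path is a single function call.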

If you use the JMS event system or handlers, they won't be supported by the generated code. We managed to handle all our use cases with accessor methods or virtual properties.
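As an example of that kind of replacement, a value that a custom handler might otherwise have formatted can be exposed through a getter or a computed method. Accessor, VirtualProperty and SerializedName are JMS annotations; the Price class itself is hypothetical and only meant as a sketch.

```php
<?php
declare(strict_types=1);

use JMS\Serializer\Annotation as Serializer;

// Hypothetical model, invented for this example.
final class Price
{
    /**
     * Stored in cents; read through a getter instead of a custom handler.
     *
     * @Serializer\Type("string")
     * @Serializer\Accessor(getter="getFormattedAmount")
     */
    private $amount;

    /** @Serializer\Type("string") */
    private $currency;

    public function __construct(int $amountInCents, string $currency)
    {
        $this->amount = $amountInCents;
        $this->currency = $currency;
    }

    public function getFormattedAmount(): string
    {
        return number_format($this->amount / 100, 2, '.', '');
    }

    /**
     * Computed value exposed as if it were a property.
     *
     * @Serializer\VirtualProperty
     * @Serializer\SerializedName("label")
     */
    public function getLabel(): string
    {
        return $this->getFormattedAmount() . ' ' . $this->currency;
    }
}

$price = new Price(1990, 'CHF');
echo $price->getLabel(), "\n"; // 19.90 CHF
```

Because both are ordinary methods on the model, generated code can call them directly without any event dispatching.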

One challenge was to make the generated code exhibit exactly the same behaviour as JMS Serializer. Some of the edge cases are neither documented nor explicitly handled in code, like when your models have version annotations and you serialize without a version. We covered all our cases, except that we had to add a custom annotation to pick the right property when there are several candidates. (It would be better design for JMS Serializer too if it allowed an explicit selection in that case.)

In a majority of cases you will not need to do this, but sometimes when your JSON data starts looking as complicated as ours, you will be very happy there’s an option to go faster.

Feel free to play around! We open sourced our solutions under Liip/Serializer on GitHub.

These are the developers, besides me, in the Lego team who contributed to this project, with code, architecture decisions and code reviews: David Buchmann, Martin Janser, Emanuele Panzeri, Rae Knowler, Tobias Schultze, and Christian Riesen. Thanks everyone! This was a lot of fun to do with you all, as working in this team always is.

You can read more about the Serializer on the repository on GitHub: Liip/Serializer

And the parser we built to be able to serialize here: Liip/Metadata-Parser

Note: The Serializer and the Parser are open sourced as-is. We are definitely missing documentation; if you have trouble using them, or would like something specific documented, please open an issue on the GitHub issue tracker and we will happily document it better. We are in the process of adding Symfony bundles for the serializer and the parser, putting the Serializer on Packagist, and making it easier to use. Further ideas and contributions are of course always very welcome.

Flash image from: https://www.flickr.com/photos/questlog/16347909278
