How I handle asset serialization in C++

What do I mean by asset serialization?

By an asset, I mean a C++ object (an instance of a class or struct) that needs to be written to and read from disk while the program is running.

In many kinds of software and games, it’s important to have a data-driven layer, both for basic things like user settings and for more complex data like actors and their components, animations and story choices.

There are many ways to serialize an object. The most common are:

  1. Writing your object straight into the file in a binary format. (Yes, this works across both 32-bit and 64-bit systems with .NET, contrary to a common misconception. In C++, a raw memory dump also depends on the compiler's type sizes, padding and byte order.)
  2. Writing each member by hand to XML, JSON or another text/markup format.
  3. Using reflection to inspect the object's members and their types, then writing them to the file in a binary or text/markup format.

My solution is closest to #2, but I’ll talk a bit about why I’m not huge on the other methods.

Aims of my serialization methods

  1. Simple syntax at the call site for reading an object from, or writing it to, a file.
  2. Simple syntax inside the serializer for reading and writing the object's members.
  3. Optional decoupling of the object's type and the object serializer.
  4. Ability to version our object models. This is the biggest reason I don’t like automatic serialization.

Choosing a file format / markup language

This wasn’t hard for me. It might be harder for you, as you might be using a custom markup language or a scripting language like Lua. I chose JSON: it is supported on practically every platform, and the syntax is easy to read, even for non-programmers.

Object serialization and versioning

The biggest issue I have with methods #1 and #3 is that they have no versioning capability for our data structure. During development, or after the product has been released, somebody might change the internals of an important data structure that needs to be serialized. For example, consider our simple data structure below:
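
A minimal sketch of such a structure might look like this. The member names and default values are assumptions chosen for illustration.

```cpp
#include <cassert>

// Hypothetical configuration asset: two volume options plus a graphics level.
// Member names and defaults are assumptions for illustration.
struct Config {
    float musicVolume   = 1.0f;  // 0.0 (muted) to 1.0 (full volume)
    float sfxVolume     = 1.0f;  // 0.0 (muted) to 1.0 (full volume)
    int   graphicsLevel = 2;     // e.g. 0 = low, 1 = medium, 2 = high
};
```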

In this basic configuration data structure, we keep track of some volume options and an integer describing our current graphical level. If we serialized this data structure using JSON, it looks like this:
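
One way to see the resulting text is to write it out by hand. The sketch below assumes a `Config` with hypothetical member names and a hypothetical `ToJson` helper; the comment above the function shows the JSON it produces for the default values.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical configuration asset; member names are assumptions.
struct Config {
    float musicVolume   = 0.8f;
    float sfxVolume     = 0.5f;
    int   graphicsLevel = 2;
};

// Hand-writes the config as a single-line JSON object.
// For the default values above this yields:
//   {"musicVolume": 0.8, "sfxVolume": 0.5, "graphicsLevel": 2}
std::string ToJson(const Config& config) {
    std::ostringstream out;
    out << "{\"musicVolume\": " << config.musicVolume
        << ", \"sfxVolume\": " << config.sfxVolume
        << ", \"graphicsLevel\": " << config.graphicsLevel << "}";
    return out.str();
}
```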

Okay, cool. But the data structure above is very simple, and now we need to introduce some more complex options. This is where versioning is so important. Here’s an example of a more complicated data structure:
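
As a sketch, the reworked structure could group the options into their own types. The type names, member names and the extra `fullscreen` option are assumptions for illustration.

```cpp
#include <cassert>

// Hypothetical grouping of the audio-related options.
struct AudioOptions {
    float musicVolume = 1.0f;
    float sfxVolume   = 1.0f;
};

// Hypothetical grouping of the graphics-related options.
struct GraphicsOptions {
    int  level      = 2;     // e.g. 0 = low, 1 = medium, 2 = high
    bool fullscreen = true;  // new option that the flat layout did not have
};

// The config now owns one object per option group.
struct Config {
    AudioOptions    audio;
    GraphicsOptions graphics;
};
```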

Here we’ve split our options into separate objects. This makes for a cleaner implementation. But if we were using a binary formatter (or a reflection library) to read the properties from the last JSON example into our object, it would likely now throw an error, as the properties wouldn’t line up.

Losing options might not be the biggest deal (as users can always set them back easily enough), but it’s a frustration that doesn’t need to be had. There might also be more serious things that require versioning than just options, such as game saves, which you really don’t want the player to lose.

On a side note, here is how our new object would look in JSON:
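
The sketch below writes a grouped config out by hand, including a version number. The type names, member names and the `ToJson` helper are assumptions; the comment above the function shows the JSON it produces for the default values.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical option groups; names are assumptions for illustration.
struct AudioOptions    { float musicVolume = 0.8f; float sfxVolume = 0.5f; };
struct GraphicsOptions { int level = 2; bool fullscreen = true; };
struct Config          { AudioOptions audio; GraphicsOptions graphics; };

// Hand-writes the grouped layout, stamped as version 2.
// For the default values above this yields:
//   {"version": 2, "audio": {"musicVolume": 0.8, "sfxVolume": 0.5},
//    "graphics": {"level": 2, "fullscreen": true}}
// (emitted on a single line)
std::string ToJson(const Config& config) {
    std::ostringstream out;
    out << "{\"version\": 2"
        << ", \"audio\": {\"musicVolume\": " << config.audio.musicVolume
        << ", \"sfxVolume\": " << config.audio.sfxVolume << "}"
        << ", \"graphics\": {\"level\": " << config.graphics.level
        << ", \"fullscreen\": " << (config.graphics.fullscreen ? "true" : "false")
        << "}}";
    return out.str();
}
```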

Notice the “version” property that has turned up. This is the identifier our parser checks, so we know which version of the data structure the file represents.

Decoupling the object's type and the serializer

Warning: Involves templates.

First of all, let’s talk about the structure of the object serializer. We have two static methods, the standard read/write methods, which both take a buffer parameter (JSON in this case) and a reference to the object that we’re reading or writing. For me, that looks something like this:
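
A sketch of such a header, assuming a simplified `JsonBuffer` stand-in (a flat key/value store) in place of the real buffer class. The method names `Write` and `Read` are also assumptions.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the JSON buffer: a flat key/value store.
struct JsonBuffer {
    std::map<std::string, double> values;
};

struct Config;  // the asset type; a forward declaration is enough here

// The serializer holds no state, only the two static entry points.
class ConfigSerializer {
public:
    // Writes `config` into `buffer`.
    static void Write(JsonBuffer& buffer, const Config& config);

    // Fills `config` from the contents of `buffer`.
    static void Read(const JsonBuffer& buffer, Config& config);
};
```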

This isn’t as important, but in case you were wondering, here is how that .cpp file would look (warning: this was written in notepad on a train and may not compile):
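
The sketch below shows one way the definitions could handle versioning. It assumes a simplified `JsonBuffer` stand-in with dotted keys for nested properties, hypothetical member names, and the convention that a file without a "version" key uses the old flat layout.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a JSON buffer: a flat key/value store in which
// nested properties use dotted keys such as "audio.musicVolume".
struct JsonBuffer {
    std::map<std::string, double> values;

    void Set(const std::string& key, double value) { values[key] = value; }

    // Returns the stored value, or `fallback` when the key is missing.
    double Get(const std::string& key, double fallback) const {
        auto it = values.find(key);
        return it != values.end() ? it->second : fallback;
    }
};

struct AudioOptions    { float musicVolume = 1.0f; float sfxVolume = 1.0f; };
struct GraphicsOptions { int level = 2; bool fullscreen = true; };
struct Config          { AudioOptions audio; GraphicsOptions graphics; };

class ConfigSerializer {
public:
    static void Write(JsonBuffer& buffer, const Config& config);
    static void Read(const JsonBuffer& buffer, Config& config);
};

// Always writes the newest layout and stamps it with its version number.
void ConfigSerializer::Write(JsonBuffer& buffer, const Config& config) {
    buffer.Set("version", 2);
    buffer.Set("audio.musicVolume", config.audio.musicVolume);
    buffer.Set("audio.sfxVolume", config.audio.sfxVolume);
    buffer.Set("graphics.level", config.graphics.level);
    buffer.Set("graphics.fullscreen", config.graphics.fullscreen ? 1 : 0);
}

void ConfigSerializer::Read(const JsonBuffer& buffer, Config& config) {
    // Files written before versioning existed have no "version" key,
    // so a missing key is treated as version 1.
    const int version = static_cast<int>(buffer.Get("version", 1));

    if (version == 1) {
        // Version 1: flat layout; fullscreen did not exist and keeps its value.
        config.audio.musicVolume =
            static_cast<float>(buffer.Get("musicVolume", config.audio.musicVolume));
        config.audio.sfxVolume =
            static_cast<float>(buffer.Get("sfxVolume", config.audio.sfxVolume));
        config.graphics.level =
            static_cast<int>(buffer.Get("graphicsLevel", config.graphics.level));
    } else if (version == 2) {
        // Version 2: options grouped under "audio" and "graphics".
        config.audio.musicVolume =
            static_cast<float>(buffer.Get("audio.musicVolume", config.audio.musicVolume));
        config.audio.sfxVolume =
            static_cast<float>(buffer.Get("audio.sfxVolume", config.audio.sfxVolume));
        config.graphics.level =
            static_cast<int>(buffer.Get("graphics.level", config.graphics.level));
        config.graphics.fullscreen = buffer.Get("graphics.fullscreen", 1) != 0;
    }
    // Unknown versions leave `config` untouched.
}
```

The point of the version check is that an old file still loads into the new structure, and only the serializer needs to know about the old layout.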

Now what we’re going to do is specify inside of our config class that it has a custom serializer (our ‘ConfigSerializer’). We do this with a typedef that makes __SERIALIZER an alias for our serializer's type. Like so:
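
A minimal sketch of the alias, with hypothetical members. One caveat worth knowing: in C++, identifiers that contain a double underscore are reserved for the implementation, so a name such as `Serializer` would be the safer choice in a real project. The sketch keeps `__SERIALIZER` to match the text.

```cpp
#include <cassert>

class ConfigSerializer;  // a forward declaration is enough for a typedef

struct Config {
    // Alias that tells generic code which serializer handles this asset.
    // Note: names containing "__" are reserved in C++; kept here only to
    // match the article's naming.
    typedef ConfigSerializer __SERIALIZER;

    // Hypothetical members.
    float musicVolume   = 1.0f;
    float sfxVolume     = 1.0f;
    int   graphicsLevel = 2;
};
```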

Essentially what we’re doing here is creating a shortcut for our easy read/write syntax (goal #1).

So now we have a typedef that can act as a template argument! Now I’ll show you what the serialization interface looks like:

Little note: JsonBuffer is an interface that derives from ‘Buffer’, a set of methods that let me read and write multiple markup languages without specifying which language is in use.
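
A sketch of such an interface, assuming the class is called `AssetSerializer` and works on an in-memory `JsonBuffer` stand-in. Opening and saving the actual file is left out.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the JSON buffer: a flat key/value store.
struct JsonBuffer {
    std::map<std::string, double> values;
};

// Central entry point for every asset. The second template argument
// defaults to the alias declared inside the asset type, so most call
// sites only have to name the asset.
template <typename Asset, typename Serializer = typename Asset::__SERIALIZER>
class AssetSerializer {
public:
    // Forwards to the chosen serializer's static Write method.
    static void Write(JsonBuffer& buffer, const Asset& asset) {
        Serializer::Write(buffer, asset);
    }

    // Forwards to the chosen serializer's static Read method.
    static void Read(const JsonBuffer& buffer, Asset& asset) {
        Serializer::Read(buffer, asset);
    }
};
```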

This class takes a template argument for the kind of asset we want to read or write – and then a second (optional) template argument that specifies the serializer that we want to use to read/write the asset. If the optional Serializer template argument isn’t specified, it will assume you want to use the __SERIALIZER typedef that exists inside of the Asset type, as we defined before. This fulfills goal #3, the optional decoupling of the serializer and object.

So now we can serialize and deserialize our Config asset, with simple syntax like this:
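
The sketch below puts the pieces together in one self-contained example. The `JsonBuffer` stand-in, the member names and the `RoundTrip` helper are assumptions; the two `AssetSerializer<Config>` calls are the part that matters.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the JSON buffer: a flat key/value store.
struct JsonBuffer {
    std::map<std::string, double> values;
};

class ConfigSerializer;

struct Config {
    typedef ConfigSerializer __SERIALIZER;  // picked up by AssetSerializer
    float musicVolume   = 1.0f;
    int   graphicsLevel = 2;
};

class ConfigSerializer {
public:
    static void Write(JsonBuffer& buffer, const Config& config) {
        buffer.values["musicVolume"]   = config.musicVolume;
        buffer.values["graphicsLevel"] = config.graphicsLevel;
    }
    // Throws std::out_of_range if a key is missing.
    static void Read(const JsonBuffer& buffer, Config& config) {
        config.musicVolume   = static_cast<float>(buffer.values.at("musicVolume"));
        config.graphicsLevel = static_cast<int>(buffer.values.at("graphicsLevel"));
    }
};

template <typename Asset, typename Serializer = typename Asset::__SERIALIZER>
class AssetSerializer {
public:
    static void Write(JsonBuffer& buffer, const Asset& asset) {
        Serializer::Write(buffer, asset);
    }
    static void Read(const JsonBuffer& buffer, Asset& asset) {
        Serializer::Read(buffer, asset);
    }
};

// Writes a config into a buffer and reads it back with the one-line syntax.
Config RoundTrip(const Config& original) {
    JsonBuffer buffer;
    AssetSerializer<Config>::Write(buffer, original);

    Config loaded;
    AssetSerializer<Config>::Read(buffer, loaded);
    return loaded;
}
```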

If we haven’t specified a custom serializer using our typedef __SERIALIZER trick, then we can still specify the serializer using our optional template parameter, like so:
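
A sketch of that case, using a hypothetical `Highscore` asset that declares no alias and a hypothetical `HighscoreSerializer` passed explicitly. Because the second template argument is supplied, the default that looks up `__SERIALIZER` is never used, so the asset type does not need to know about its serializer at all.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the JSON buffer: a flat key/value store.
struct JsonBuffer {
    std::map<std::string, double> values;
};

template <typename Asset, typename Serializer = typename Asset::__SERIALIZER>
class AssetSerializer {
public:
    static void Write(JsonBuffer& buffer, const Asset& asset) {
        Serializer::Write(buffer, asset);
    }
    static void Read(const JsonBuffer& buffer, Asset& asset) {
        Serializer::Read(buffer, asset);
    }
};

// Hypothetical asset with no __SERIALIZER alias.
struct Highscore {
    int points = 0;
};

// Hypothetical serializer that lives completely apart from the asset.
struct HighscoreSerializer {
    static void Write(JsonBuffer& buffer, const Highscore& score) {
        buffer.values["points"] = score.points;
    }
    // Throws std::out_of_range if the key is missing.
    static void Read(const JsonBuffer& buffer, Highscore& score) {
        score.points = static_cast<int>(buffer.values.at("points"));
    }
};

// Writes a highscore and reads it back, naming the serializer explicitly.
Highscore RoundTrip(const Highscore& original) {
    JsonBuffer buffer;
    AssetSerializer<Highscore, HighscoreSerializer>::Write(buffer, original);

    Highscore loaded;
    AssetSerializer<Highscore, HighscoreSerializer>::Read(buffer, loaded);
    return loaded;
}
```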

I like the simplicity in this syntax, and I like that I can serialize and deserialize everything from a single point in code.

It’s also worth noting that if you don’t like the idea of separating your object serialization code from your object, you can point the typedef at the asset's own type and provide the static read/write methods inside the asset's own data structure. Personally, I find that writing this code by hand can become quite a mess, especially over different asset versions, so I like to split the serializers into their own objects.

Pros, Cons & Conclusion

Pros:

  1. Full control over how the asset is written and read.
  2. Ability to easily implement a versioning system for different asset structures.
  3. Clean syntax to serialize and deserialize an asset.
  4. Follows the rules for single responsibility objects (something I believe in greatly)

Cons:

  1. Manually writing an object is time consuming, especially if the object's structure is changing a lot.
  2. Adds a lot of extra classes and files, which will bother you if you’re the kind of programmer who hates having lots of them.

Other considerations:

  1. Asset tampering. Anything client-side can be tampered with. Data validation is important, and it’s something else that manual control gives you, but you need preventative checks and some way to handle a misread if it happens. (Something like a try-catch is fine, but this can get very messy all over the place and every situation may have unique repercussions.)
  2. Nested objects. For example, if you’re writing a game engine, you need to write a scene to a file, and that includes all of your actors, scripts, processes and more. Some of these objects will also need to keep their own version number and be parsed as if they were individual objects. We can easily do that with our method by referencing that asset's serializer directly or by going through the central asset serializer.

Conclusion:

It’s a solid method for serializing and versioning our assets. I like the syntax, but it’s damn time consuming compared to the automated ways of writing our objects. Some assets may not need to be versioned either. The beauty in this method is that you could easily implement an automatic serializer as a layer on top of the manual method. Something like this would spring to mind:
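
One possible shape for such a layer is sketched below. Everything in it is an assumption: `AutoSerializer`, `Field` and `Fields()` are hypothetical names, the buffer is a simplified stand-in, and only `float` members are supported. It is not true reflection, since each asset still lists its own members, but the read/write code is written once and plugs in through the same `__SERIALIZER` alias.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the JSON buffer: a flat key/value store.
struct JsonBuffer {
    std::map<std::string, double> values;
};

// Describes one float member of an asset: its JSON key and where it lives.
template <typename Asset>
struct Field {
    const char*   name;
    float Asset::*member;
};

// Generic serializer that walks Asset::Fields() instead of using
// hand-written read/write code for each asset.
template <typename Asset>
struct AutoSerializer {
    static void Write(JsonBuffer& buffer, const Asset& asset) {
        for (const Field<Asset>& field : Asset::Fields()) {
            buffer.values[field.name] = asset.*(field.member);
        }
    }

    // Members whose key is missing from the buffer keep their current value.
    static void Read(const JsonBuffer& buffer, Asset& asset) {
        for (const Field<Asset>& field : Asset::Fields()) {
            auto it = buffer.values.find(field.name);
            if (it != buffer.values.end()) {
                asset.*(field.member) = static_cast<float>(it->second);
            }
        }
    }
};

// Hypothetical asset that opts in to the automatic layer.
struct AudioOptions {
    typedef AutoSerializer<AudioOptions> __SERIALIZER;

    float musicVolume = 1.0f;
    float sfxVolume   = 1.0f;

    // The only per-asset work: list the members to serialize.
    static std::vector<Field<AudioOptions>> Fields() {
        return {{"musicVolume", &AudioOptions::musicVolume},
                {"sfxVolume", &AudioOptions::sfxVolume}};
    }
};
```

Assets that need versioning can keep a hand-written serializer, while simple ones opt in to the automatic layer by changing a single typedef.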

 

That’s all folks! I hope you enjoyed this post. If you have any comments, queries or feedback then please leave a comment down below. I’ll do my best to answer any questions you might have. If you have a preferred way of serializing, then let me know – pointing me to some resources could help a great deal.

1 Comment

  1. You may also want to have a look at the JSON library I built: it lets you serialize/deserialize arbitrary types to/from JSON, see https://github.com/nlohmann/json#arbitrary-types-conversions
