A Simple Data Serialization System for C++

It’s been a while since I last blogged, as I’ve been quite busy working on Drifter: both the game and the physical rewards for Kickstarter, many of which, I might add, have already been produced and sent off to my backers. A close family member was also diagnosed with cancer over the summer, which threw me for a bit of a loop, so things have been a bit out of sorts. They’re not out of the woods yet, but things are looking very promising. So, in an attempt to return to some level of normalcy, I’m going to try to get back to my “hectic” one-blog-a-month-or-so schedule.

Note: I am releasing the code in this blog post into the public domain. You may use it as you see fit, and you don’t even have to credit me, but it’d be awfully nice if you did 🙂

A word of caution: the code does work, but it is not quite complete. It will only work on little-endian platforms whose data types have the sizes hard-coded in getTypeSize (1-byte bool and char, 4-byte int and float, 8-byte double). It should be relatively straightforward to add byte-order and size conversion should that be necessary.
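As a rough illustration of what the missing endian handling could look like, here is a minimal sketch that writes a 32-bit integer one byte at a time, least significant byte first, so the stored bytes are the same on any host. The helper names writeInt32LE and readInt32LE are hypothetical and are not part of the code in this post.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of ObjectBlob): stores value as four bytes,
// least significant byte first, regardless of the host's byte order.
inline void writeInt32LE(unsigned char *out, std::int32_t value)
{
    std::uint32_t v = static_cast<std::uint32_t>(value);
    for(std::size_t i=0;i<4;i++)
    {
        out[i] = static_cast<unsigned char>((v >> (8*i)) & 0xFFu);
    }
}

// Hypothetical helper: rebuilds the integer from four little-endian bytes.
inline std::int32_t readInt32LE(const unsigned char *in)
{
    std::uint32_t v = 0;
    for(std::size_t i=0;i<4;i++)
    {
        v |= static_cast<std::uint32_t>(in[i]) << (8*i);
    }
    // Copy the bit pattern back into a signed value
    std::int32_t result;
    std::memcpy(&result, &v, sizeof(result));
    return result;
}
```

Something along these lines could sit behind massageData for D_INT values. Floats would probably need similar treatment after copying their bits into a 32-bit integer.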

Today I’m going to talk about the method by which I solved the problem of “serializing data” which is a fancy and much shorter way of saying “saving the state of a program in such a way as to allow it to be returned to that state later”. Basically, a game like Drifter needs to be able to save the game’s state to disk so a player can re-load their game and continue playing at a later date.

Now, some programming languages have built-in ways of taking an object and serializing it. Unfortunately for me, C++, the language predominantly used for Drifter, does not. There are some third-party solutions such as Boost, but Boost is a relatively heavy-weight framework that is a bit overkill for what I’m trying to do here. So I decided to create a lightweight but flexible pair of classes that work a bit like a “smart” version of fwrite to write data to disk. What I’m trying to achieve is a way of quickly writing and reading blocks of data without having to worry about how that data is laid out on disk, and one flexible enough to allow changes to the underlying data structures without completely breaking old save files.

Basically this works with two different C++ objects, the first of which is called ObjectBlob. ObjectBlobs are containers of “blobs” of basic data types (char, bool, int, float, double, etc.). This is handled in a relatively smart way, where each blob has a named handle and “knows” how much data and what type of data it holds. As well, while not implemented currently, the object has a pass-through function that is intended to “massage” data into a consistent format for disk storage to allow for cross-platform considerations like data type size differences and endianness.
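Until that pass-through does real conversion, one cheap safeguard is to make the build fail on a platform whose native types don't match the sizes the save format assumes. A minimal sketch, assuming a C++11 compiler for static_assert and using the sizes hard-coded in getTypeSize in the listing further down:

```cpp
#include <climits>

// Compile-time checks that the native types match the stored sizes
// (1 byte for bool and char, 4 for int and float, 8 for double).
// If one of these fails, a plain memcpy is not enough on that platform.
static_assert(CHAR_BIT == 8, "save format assumes 8-bit bytes");
static_assert(sizeof(bool) == 1, "save format stores bool as 1 byte");
static_assert(sizeof(char) == 1, "save format stores char as 1 byte");
static_assert(sizeof(int) == 4, "save format stores int as 4 bytes");
static_assert(sizeof(float) == 4, "save format stores float as 4 bytes");
static_assert(sizeof(double) == 8, "save format stores double as 8 bytes");
```

These checks say nothing about byte order, so they only cover the size half of the problem.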

ObjectBlobs then have two functions: one exports the contents of the blob as a character array, and the other takes a previously exported character array and loads it back into the blob.

The other object is ObjectBlobList, which is basically a front-end for what amounts to a save file. It contains a list of ObjectBlobs which have their own named handles and can save them out to disk, or once saved to disk read them back.

The best way to show how this works is with an example, I feel. So first, let’s create a simple object with some data in it so I can show how ObjectBlobs and ObjectBlobLists work together to save and load data.

class BunchOfData {
protected:
	int numberOfIntegers;
	int *integerArray;
	int numberOfFloats;
	float *floatArray;
	char *textString;
	bool truth;
public:
	BunchOfData();
	virtual ~BunchOfData();
	void save(ObjectBlob *state);
	void load(ObjectBlob *state);
};

The object is just a number of different data types and arrays of data to show the difference in storing single data types vs. arrays of data in an ObjectBlob.

BunchOfData::BunchOfData()
{
	numberOfIntegers = 100;
	integerArray = (int *)malloc(sizeof(int)*numberOfIntegers);
	for(int i=0;i<numberOfIntegers;i++)
	{
		integerArray[i] = i;
	}
	numberOfFloats = 100;
	floatArray = (float *)malloc(sizeof(float)*numberOfFloats);
	for(int f=0;f<numberOfFloats;f++)
	{
		floatArray[f] = (float)f/3.0f;
	}
	textString = strdup("Test string.");
	truth = false;
}

The destructor, because I’m being a completist.

BunchOfData::~BunchOfData()
{
	free(integerArray);
	free(floatArray);
	free(textString);
}

The function that does the heavy lifting for saving data is addData. You can kind of think of it as a smarter version of fwrite. It has the following format:

void addData(char *name, void *data_in, int type, int length);

name is a string that identifies the data.
data_in is a pointer to the data in question.
type is one of the pre-defined data types that ObjectBlob understands. You can add more, including custom data types.
length is the number of elements of the given type stored at data_in. For single values, use 1.

The way I organized objects that need to be serialized in Drifter was to create both save and load functions that are passed an ObjectBlob that will alternately be populated with or contain the state of that particular object.

void BunchOfData::save(ObjectBlob *state)
{
	state->addData("numberOfIntegers",&numberOfIntegers,D_INT,1);//this is technically un-necessary because "integerArray" knows how many integers there are
	state->addData("integerArray",integerArray,D_INT,numberOfIntegers);
	state->addData("numberOfFloats",&numberOfFloats,D_INT,1);
	state->addData("floatArray",floatArray,D_FLOAT,numberOfFloats);
	state->addData("textString",textString,D_CHAR,strlen(textString)+1);
	state->addData("truth",&truth,D_BOOL,1);
}

To get data back out of the ObjectBlob, you call getData which has the following format:

 void getData(char *name, void *data_out);

Where name is the name of the data you want to retrieve, and data_out is a pointer to the location where you want to put the stored data in question. You must make sure that data_out is properly allocated before calling getData.

void BunchOfData::load(ObjectBlob *state)
{
	//If your arrays are already the correct size, you can just call getData to populate them
	state->getData("numberOfIntegers",&numberOfIntegers);
	integerArray = (int *)realloc(integerArray,sizeof(int)*numberOfIntegers);
	state->getData("integerArray",integerArray);
	state->getData("numberOfFloats",&numberOfFloats);
	floatArray = (float *)realloc(floatArray,sizeof(float)*numberOfFloats);
	state->getData("floatArray",floatArray);
	textString = (char *)realloc(textString,state->getDataLength("textString"));
	state->getData("textString",textString);
	state->getData("truth",&truth);
}

And to tie it all together, here is how you’d create and save this object to disk:

BunchOfData *object1 = new BunchOfData;
BunchOfData *object2 = new BunchOfData;
 
ObjectBlobList *saveData = new ObjectBlobList;
 
object1->save(saveData->addBlob("object1"));
object2->save(saveData->addBlob("object2"));
 
saveData->save("savefile.sav");
 
delete saveData;
delete object1;
delete object2;

And here is how you’d then go about loading that data from disk:

BunchOfData *object1 = new BunchOfData;
BunchOfData *object2 = new BunchOfData;
 
ObjectBlobList *loadData = new ObjectBlobList;
loadData->load("savefile.sav");
 
object1->load(loadData->getBlob("object1"));
object2->load(loadData->getBlob("object2"));
 
delete loadData;
delete object1;
delete object2;

Pretty simple! Or I think so anyway 😉

I apologize for any sins I may have committed while creating this code, but it works, is clean and cross-platform and I hope reasonably easy to understand. Also if I weren’t a horrible human being I’d probably have used const a bunch more in these function definitions, but there you go.

ObjectBlob.h:

#ifndef OBJECTBLOB_H
#define OBJECTBLOB_H
 
/*
bool,char,int,float,double
*/
 
const static int D_BOOL = 0;
const static int D_CHAR = 1;
const static int D_INT = 2;
const static int D_FLOAT = 3;
const static int D_DOUBLE = 4;
 
class ObjectBlob {
protected:
    int numData;
    char **dataName;
    int *dataType;
    int *dataSize;
    int *dataLength;
    char **data;
public:
    ObjectBlob();
    virtual ~ObjectBlob();
    void addData(char *name, void *data_in, int type, int length);
    int getDataSize(char *name);
    int getDataType(char *name);
    int getDataLength(char *name);
    int getDataIdx(char *name);
    void *getData(char *name);
    void getData(char *name, void *data_out);
    int getTypeSize(int type);
    void massageData(int type, void *data_in, int length, int size);
    char *saveBlob(int *size);
    void loadBlob(char *blob);
};
 
#endif

ObjectBlob.cpp:

#include <stdlib.h>
#include <string.h>
#include <stdio.h>
 
#include "ObjectBlob.h"
 
ObjectBlob::ObjectBlob()
{
    numData = 0;
    dataName = NULL;
    dataType = NULL;
    dataSize = NULL;
    dataLength = NULL;
    data = NULL;
}
ObjectBlob::~ObjectBlob()
{
    for(int d=0;d<numData;d++)
    {
        free(dataName[d]);
        free(data[d]);
    }
    free(dataName);
    free(data);
    free(dataType);
    free(dataSize);
    free(dataLength);
}
/*
addData takes a named handle, a pointer to data, a known type (as defined in
ObjectBlob.h) and the length or amount of data located at that pointer and
copies it into the blob's storage in a consistent format.
*/
void ObjectBlob::addData(char *name, void *data_in, int type, int length)
{
    numData++;
    dataName = (char **)realloc(dataName, sizeof(char *)*numData);
    dataName[numData-1] = strdup(name);
    dataType = (int *)realloc(dataType, sizeof(int)*numData);
    dataType[numData-1] = type;
    dataSize = (int *)realloc(dataSize, sizeof(int)*numData);
    dataSize[numData-1] = getTypeSize(type);
    dataLength = (int *)realloc(dataLength, sizeof(int)*numData);
    dataLength[numData-1] = length;
    data = (char **)realloc(data, sizeof(char *)*numData);
    data[numData-1] = (char *)malloc(sizeof(char)*dataSize[numData-1]*length);
    massageData(type, data_in, length, dataSize[numData-1]);
}
/*
getDataSize returns the size of the datatype stored at the named handle
*/
int ObjectBlob::getDataSize(char *name)
{
    int idx = getDataIdx(name);
    if(idx >= 0 && idx < numData)
    {
        return dataSize[idx];
    }
    return -1;
}
/*
getDataType returns the type of data stored at the named handle
*/
int ObjectBlob::getDataType(char *name)
{
    int idx = getDataIdx(name);
    if(idx >= 0 && idx < numData)
    {
        return dataType[idx];
    }
    return -1;    
}
/*
getDataLength returns the number of data stored at the named handle
*/
int ObjectBlob::getDataLength(char *name)
{
    int idx = getDataIdx(name);
    if(idx >= 0 && idx < numData)
    {
        return dataLength[idx];
    }
    return -1;
}
/*
getDataIdx returns the internal index of the stored data at the named
handle
*/
int ObjectBlob::getDataIdx(char *name)
{
    for(int d=0;d<numData;d++)
    {
        if(strcmp(dataName[d], name) == 0)
        {
            return d;
        }
    }
    printf("Object Blob Error: Couldn't find data handle '%s'.\n",name);
    return -1;
}
/*
getData(char *name) returns a pointer to the raw data identified at
the named handle
*/
void *ObjectBlob::getData(char *name)
{
    int idx = getDataIdx(name);
    if(idx >= 0 && idx < numData)
    {
        return data[idx];
    }
    return NULL;
}
/*
getData(char *name, void *data_out) loads the stored data identified by
the named handle into the area pointed to by data_out.
 
Ideally this should massage the data into the expected platform format
before copying the stored data into data_out.
*/
void ObjectBlob::getData(char *name, void *data_out)
{
    int idx = getDataIdx(name);
    if(idx >= 0 && idx < numData)
    {
        memcpy(data_out, data[idx], dataSize[idx]*dataLength[idx]);
    }
}
/*
getTypeSize returns the internal storage size of the indicated data type
*/
int ObjectBlob::getTypeSize(int type)
{
    //These are hard-coded in the event I decide to use platforms with different
    //fundamental data type sizes, the save files will be consistent with this
    //to allow interchange of save data
    switch (type) {
        case D_BOOL:
            return 1;
            break;
        case D_CHAR:
            return 1;
            break;
        case D_INT:
            return 4;
            break;
        case D_FLOAT:
            return 4;
            break;
        case D_DOUBLE:
            return 8;
            break;
    }
    return 0;
}
/*
massageData is a wrapper that allows ObjectBlob to be truly cross
platform, currently it is just a simple pass-through but the idea
is this can deal with data size mismatches and endian differences
across platform types.
*/
void ObjectBlob::massageData(int type, void *data_in, int length, int size)
{
    //passthrough for LE (ie x86, iOS)
    //stored data is Little Endian
    memcpy(data[numData-1], data_in, length*size);
}
/*
saveBlob creates a character array that contains the data currently
stored in the ObjectBlob, the integer pointed to by size will be
filled with the size of the array
*/
char *ObjectBlob::saveBlob(int *size)
{
    int blobSize = 0;
    int blobOffset = 0;
    int intTemp;
    char *blobTemp = (char *)malloc(sizeof(int));
    //save number of data blocks in the blob
    memcpy(blobTemp, &numData, sizeof(int));
    blobOffset += sizeof(int);
    blobSize += sizeof(int);
    //save blocks
    //int - name length
    //char * - name
    //int - type
    //int - size
    //int - length
    //char (size*length) - data
    for(int d=0;d<numData;d++)
    {
        blobSize += sizeof(int)*4+strlen(dataName[d])+1+dataLength[d]*dataSize[d];
        blobTemp = (char *)realloc(blobTemp, blobSize);
        intTemp = strlen(dataName[d])+1;
        memcpy(&blobTemp[blobOffset], &intTemp, sizeof(int));
        blobOffset += sizeof(int);
        memcpy(&blobTemp[blobOffset], dataName[d], intTemp);
        blobOffset += intTemp;
        memcpy(&blobTemp[blobOffset], &dataType[d], sizeof(int));
        blobOffset += sizeof(int);
        memcpy(&blobTemp[blobOffset], &dataSize[d], sizeof(int));
        blobOffset += sizeof(int);
        memcpy(&blobTemp[blobOffset], &dataLength[d], sizeof(int));
        blobOffset += sizeof(int);
        memcpy(&blobTemp[blobOffset], data[d], dataLength[d]*dataSize[d]);
        blobOffset += dataLength[d]*dataSize[d];
    }
    *size = blobSize;
    return blobTemp;
}
/*
loadBlob will take a character array that was previously created by
saveBlob and fill the ObjectBlob with its contents.
*/
void ObjectBlob::loadBlob(char *blob)
{
    int numBlocks = 0;
    int blobOffset = 0;
    memcpy(&numBlocks, &blob[blobOffset], sizeof(int));
    blobOffset += sizeof(int);
    int nameLength;
    char *nameTemp;
    int typeTemp;
    int sizeTemp;
    int lengthTemp;
    char *dataTemp;
    for(int d=0;d<numBlocks;d++)
    {
        memcpy(&nameLength, &blob[blobOffset], sizeof(int));
        blobOffset+=sizeof(int);
        nameTemp = &blob[blobOffset];
        blobOffset += nameLength;
        memcpy(&typeTemp, &blob[blobOffset], sizeof(int));
        blobOffset += sizeof(int);
        memcpy(&sizeTemp, &blob[blobOffset], sizeof(int));
        blobOffset += sizeof(int);
        memcpy(&lengthTemp, &blob[blobOffset], sizeof(int));
        blobOffset += sizeof(int);
        dataTemp = &blob[blobOffset];
        addData(nameTemp, dataTemp, typeTemp, lengthTemp);
        blobOffset += lengthTemp*sizeTemp;
    }
}
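
One thing to keep in mind: loadBlob trusts the buffer it is given, so a truncated or corrupted save file could send it reading past the end. Here is a minimal sketch of a bounds-checked cursor that a sturdier loader might use instead of raw offsets. BlobReader and its members are hypothetical names, not part of the classes above.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical cursor over a byte buffer; offset must start at or below size.
struct BlobReader
{
    const char *bytes;   // start of the buffer
    std::size_t size;    // total number of bytes available
    std::size_t offset;  // current read position

    // Copies count bytes into out and advances the cursor. Returns false,
    // leaving the cursor untouched, if fewer than count bytes remain.
    bool read(void *out, std::size_t count)
    {
        if(count > size - offset)
        {
            return false;
        }
        std::memcpy(out, bytes + offset, count);
        offset += count;
        return true;
    }

    // Reads one native int, the way saveBlob writes its header fields.
    bool readInt(int *out)
    {
        return read(out, sizeof(int));
    }
};
```

A loader built on this could bail out as soon as any read returns false rather than walking off the end of the buffer.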

ObjectBlobList.h:

#ifndef OBJECTBLOBLIST_H
#define OBJECTBLOBLIST_H
 
#include "ObjectBlob.h"
 
class ObjectBlobList {
protected:
    int numBlobs;
    char **blobName;
    ObjectBlob **blobs;
public:
    ObjectBlobList();
    virtual ~ObjectBlobList();
    int getNumBlobs();
    ObjectBlob *addBlob(char *name);
    ObjectBlob *getBlob(int idx);
    ObjectBlob *getBlob(char *name);
    int getBlobIdx(char *name);
    bool save(char *filename);
    bool load(char *filename);
};
 
#endif

ObjectBlobList.cpp:

#include "ObjectBlobList.h"
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
 
ObjectBlobList::ObjectBlobList()
{
    numBlobs = 0;
    blobName = NULL;
    blobs = NULL;
}
ObjectBlobList::~ObjectBlobList()
{
    for(int b=0;b<numBlobs;b++)
    {
        free(blobName[b]);
        delete blobs[b];
    }
    free(blobName);
    free(blobs);
}
int ObjectBlobList::getNumBlobs()
{
    return numBlobs;
}
/*
addBlob will create a new ObjectBlob identified by name and return it
*/
ObjectBlob *ObjectBlobList::addBlob(char *name)
{
    if(getBlobIdx(name) >= 0)return NULL;
    printf("Adding Blob: %s\n",name);
    numBlobs++;
    blobName = (char **)realloc(blobName, sizeof(char *)*numBlobs);
    blobName[numBlobs-1] = strdup(name);
    blobs = (ObjectBlob **)realloc(blobs, sizeof(ObjectBlob *)*numBlobs);
    blobs[numBlobs-1] = new ObjectBlob;
    return blobs[numBlobs-1];
}
/*
getBlob(int idx) returns the ObjectBlob located at idx
*/
ObjectBlob *ObjectBlobList::getBlob(int idx)
{
    if(idx >= 0 && idx < numBlobs)
    {
        return blobs[idx];
    }
    return NULL;
}
/*
getBlob(char *name) returns the named ObjectBlob
*/
ObjectBlob *ObjectBlobList::getBlob(char *name)
{
    for(int b=0;b<numBlobs;b++)
    {
        if(strcmp(blobName[b],name) == 0)
        {
            return blobs[b];
        }
    }
    printf("Object Blob List Error: Couldn't find data handle '%s'.\n",name);
    return NULL;
}
/*
getBlobIdx get the internal index of the named ObjectBlob
*/
int ObjectBlobList::getBlobIdx(char *name)
{
    for(int b=0;b<numBlobs;b++)
    {
        if(strcmp(blobName[b],name) == 0)
        {
            return b;
        }
    }
    //no error message here: addBlob calls this to check for duplicate names,
    //so "not found" is the normal case when adding a new blob
    return -1;
}
/*
save writes all of the contained ObjectBlobs to the file identified by filename
*/
bool ObjectBlobList::save(char *filename)
{
    FILE *outfile = fopen(filename, "wb");
    if(outfile == NULL)return false;
    printf("Writing out %i blobs.\n",numBlobs);
    fwrite(&numBlobs, sizeof(int), 1, outfile);
    int blobNameLength = 0;
    int blobSize = 0;
    char *blobData;
    for(int b=0;b<numBlobs;b++)
    {
        blobNameLength = strlen(blobName[b])+1;
        fwrite(&blobNameLength, sizeof(int), 1, outfile);
        fwrite(blobName[b], 1, blobNameLength, outfile);
        blobData = blobs[b]->saveBlob(&blobSize);
        fwrite(&blobSize, sizeof(int), 1, outfile);
        fwrite(blobData, 1, blobSize, outfile);
        free(blobData);
    }
    fclose(outfile);
    return true;
}
/*
load reads the ObjectBlobList file identified by filename
*/
bool ObjectBlobList::load(char *filename)
{
    FILE *infile = fopen(filename, "rb");
    if(infile == NULL)return false;
    int numBlobsTemp = 0;
    fread(&numBlobsTemp, sizeof(int), 1, infile);
    printf("Reading in %i blobs.\n",numBlobsTemp);
    int blobNameLength = 0;
    int blobSize = 0;
    char *blobNameTemp = (char *)malloc(1);
    blobNameTemp[0] = 0;
    ObjectBlob *newBlob;
    for(int b=0;b<numBlobsTemp;b++)
    {
        fread(&blobNameLength, sizeof(int), 1, infile);
        printf("Reading in blob name of length %i.\n",blobNameLength);
        blobNameTemp = (char *)realloc(blobNameTemp, blobNameLength);
        fread(blobNameTemp, 1, blobNameLength, infile);
        printf("Blob name temp = %s\n",blobNameTemp);
        newBlob = addBlob(blobNameTemp);
        if(newBlob == NULL)
        {
            fclose(infile);
            free(blobNameTemp);
            return false;
        }
        fread(&blobSize, sizeof(int), 1, infile);
        char *blobData = (char *)malloc(blobSize);
        fread(blobData, 1, blobSize, infile);
        newBlob->loadBlob(blobData);
        free(blobData);
    }
    free(blobNameTemp);
    fclose(infile);
    return true;
}

2 thoughts on “A Simple Data Serialization System for C++”

  1. Robert Basler

    I built a system like this, here are a couple code-reviewey comments from what I learned using mine. I hope you’ll take these as the constructive comments they are intended.

    For addData, I would get rid of the type parameter and instead have several type-specific implementations which set the type internally. So you would have for example:

    void addData(char *name, float *data_in, int length = 1);
    void addData(char *name, bool *data_in, int length = 1);
    void addData(char *name, int *data_in, int length = 1);

    This way, if you change the data type of a member in a class, you don’t have to worry about making a mistake with updating your calls to addData, the compiler will take care of making sure the type is correct for you. I’d also put a default length parameter since most of the time it is going to be one anyway.

    The other thing I’d do is change your D_* constants to be an enum so that the compiler will manage them for you automagically.

    Also, since your field name strings look to be the variable names, you might consider making a macro with the “stringize” operator to save typing and get the parameter count down to just one.

    ADDDATA(truth);

    One more comment, getTypeSize should use sizeof. I see what you’re trying to do, however if you have different type sizes on different platforms, you’re going to end up transferring garbage (for example if your library thinks a float is 4 when it is actually 8, all the float data will end up corrupted.)

  2. Colin Post author

    Hi Robert,

    Thanks for the comment! I appreciate the pointers. I honestly had no idea there was a stringize preprocessor macro and at the very least that will make this a whole lot cleaner and easier to use. I should have thought to look for it though, hah.

    The idea with the size filter is that it returns the size of ObjectBlob’s internal representation of the data type, so ObjectBlobs will always store a float as 4 bytes, for example. So if the platform has an 8 byte float, it will load the internal 4 byte float and massage it into the native representation.
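
To make the suggestions in this thread concrete, here is a rough sketch of type-specific addData overloads, an enum for the type tags and a stringize macro. TypedBlob is a hypothetical stand-in that only records names, types and lengths so the example compiles on its own, and ADD_DATA is a two-argument variation on the ADDDATA macro suggested above.

```cpp
#include <string>
#include <vector>

// An enum in place of the D_* integer constants
enum DataType { D_BOOL, D_CHAR, D_INT, D_FLOAT, D_DOUBLE };

// Hypothetical stand-in for ObjectBlob: it records what was added but does
// not copy the data itself.
class TypedBlob
{
public:
    struct Entry
    {
        std::string name;
        DataType type;
        int length;
    };
    std::vector<Entry> entries;

    // One overload per supported type; the compiler picks the type tag
    void addData(const char *name, const bool * /*data_in*/, int length = 1)
    {
        record(name, D_BOOL, length);
    }
    void addData(const char *name, const char * /*data_in*/, int length = 1)
    {
        record(name, D_CHAR, length);
    }
    void addData(const char *name, const int * /*data_in*/, int length = 1)
    {
        record(name, D_INT, length);
    }
    void addData(const char *name, const float * /*data_in*/, int length = 1)
    {
        record(name, D_FLOAT, length);
    }
    void addData(const char *name, const double * /*data_in*/, int length = 1)
    {
        record(name, D_DOUBLE, length);
    }

private:
    void record(const char *name, DataType type, int length)
    {
        Entry e;
        e.name = name;
        e.type = type;
        e.length = length;
        entries.push_back(e);
    }
};

// Stringize the variable name so it doubles as the field name
#define ADD_DATA(blob, member) (blob).addData(#member, &(member))
```

With this in place, ADD_DATA(blob, truth) would record a single D_BOOL named "truth", and changing truth to an int would switch the recorded type to D_INT without touching the call.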
