Using SQLite for game data storage

Welcome back for another technical adventure!

Game data storage is hard. AAA games typically use proprietary binary formats, often compressed, whereas smaller productions often settle for the likes of text-based formats like XML or JSON, as far as levels are concerned.

But levels in Jaakan are a bit peculiar. They’re not really sequential, or even independent entities: they all exist in the same world, and are connected by bidirectional links, with a certain set of constraints that allow seamless scrolling.

So far, we’ve been storing rooms as json file in a directory, one for each room, with a unique increasing numeric identifier. To keep track of links between rooms, and the last identifier (to be able to generate the next one), there is a universe.json file. Even with hundreds of rooms, this allows the world editor to display a map without having to load all the individual room files, which can be expensive because of edge computation, render baking, etc.

But that way of working sounds awfully familiar: many different instances of a piece of data with the same schema, addressed by unique identifiers, which we want to retrieve by precise groups of up to 5… it sounds a real lot like a database! And that’s why I think it’s a good idea to store that kind of data in a real database.

SQLite

SQLite is quite an amazing piece of software. I’m going to copy their description word from word from their homepage, because it’s quite good:

SQLite is an in-process library that implements a self-contained, serverless, zero-configuration, transactional SQL database engine. The code for SQLite is in the public domain and is thus free for use for any purpose, commercial or private. SQLite is currently found in more applications than we can count, including several high-profile projects.

So far so good! As a developer, I’m usually looking for BSD-licensed libraries, LGPL at the worst, Public Domain is a bit of a grey area in some countries but since I’m based in France, no worries on this front.

SQLite is an embedded SQL database engine. Unlike most other SQL databases, SQLite does not have a separate server process. SQLite reads and writes directly to ordinary disk files. A complete SQL database with multiple tables, indices, triggers, and views, is contained in a single disk file. The database file format is cross-platform - you can freely copy a database between 32-bit and 64-bit systems or between big-endian and little-endian architectures. These features make SQLite a popular choice as an Application File Format. Think of SQLite not as a replacement for Oracle but as a replacement for fopen()

Endianness and bit width independence is a very nice feature: being able to copy around data and save files seamlessly across systems is definitely a must-have for a game that must run on major OSes and recent-to-slightly-old hardware.

SQLite is a compact library. With all features enabled, the library size can be less than 500KiB, depending on the target platform and compiler optimization settings. (64-bit code is larger. And some compiler optimizations such as aggressive function inlining and loop unrolling can cause the object code to be much larger.) If optional features are omitted, the size of the SQLite library can be reduced below 300KiB. SQLite can also be made to run in minimal stack space (4KiB) and very little heap (100KiB), making SQLite a popular database engine choice on memory constrained gadgets such as cellphones, PDAs, and MP3 players. There is a tradeoff between memory usage and speed. SQLite generally runs faster the more memory you give it. Nevertheless, performance is usually quite good even in low-memory environments.

This is good to know. We can get first-class relational database functionality in our game without making it too heavy.

SQLite is very carefully tested prior to every release and has a reputation for being very reliable. Most of the SQLite source code is devoted purely to testing and verification. An automated test suite runs millions and millions of test cases involving hundreds of millions of individual SQL statements and achieves 100% branch test coverage. SQLite responds gracefully to memory allocation failures and disk I/O errors. Transactions are ACID even if interrupted by system crashes or power failures. All of this is verified by the automated tests using special test harnesses which simulate system failures. Of course, even with all this testing, there are still bugs. But unlike some similar projects (especially commercial competitors) SQLite is open and honest about all bugs and provides bugs lists including lists of critical bugs and minute-by-minute chronologies of bug reports and code changes.

Stability is also an important point: how frustrating would it be to create hundreds of rooms just to have the asset database be corrupted? We use git for everything, so the worst case scenario is, we just roll back to the last commit anyway.

The SQLite code base is supported by an international team of developers who work on SQLite full-time. The developers continue to expand the capabilities of SQLite and enhance its reliability and performance while maintaining backwards compatibility with the published interface spec, SQL syntax, and database file format. The source code is absolutely free to anybody who wants it, but professional support is also available.

Let’s hope it won’t come to that :)

We the developers hope that you find SQLite useful and we charge you to use it well: to make good and beautiful products that are fast, reliable, and simple to use. Seek forgiveness for yourself as you forgive others. And just as you have received SQLite for free, so also freely give, paying the debt forward.

Words of wisdom.

duklite

Usually, when I have to use a C library from Jaakan’s JavaScript game code, I just have to write an ooc binding on top of it, and let my ruby glue generator take care of the rest, as I’ve explained in weekly update 12

For our graphics library, dye, it’s easy enough: a sprite, for example, has a texture, a position, an angle, a color, an opacity, etc. — it’s the same structure all the time.

However, in this case, it won’t cut it. The results of an SQL statement can vary wildly: from a table creation, or a row insertion, where we just want to make sure that everything went smoothly, to a select, where we want to retrieve data which format is determined by our query string.

In the case of SQLite integration, we want to take advantage of the fact that JavaScript is a highly dynamic language that allows you to store basically anything in an object.

To do so, we need to have a layer that communicates with sqlite, and directly uses Duktape’s API to translate it into a form that we can use from JavaScript.

That’s where duklite comes in: it uses a new functionality of collar that allows an ooc library to access the Duktape context directly.

It looks a little bit like this:

It allows us to use all the nice things about collar: the fact that classes are automatically mapped to JavaScript prototypes, inheritance is taken care of, they’re exposed in neatly isolated modules, etc. — and we can get down to the bare metal when we need to.

Separating view and model

Now, changing the way we store rooms in the game is no small undertaking. When I began refactoring, the game had fifteen thousand lines of code. By comparison, Photoshop 1.0 had about 128K lines of code - in terms of code size, we’re about 0.1 photoshops deep in the hole.

I intended to finish the refactoring before finishing this article, but it took much longer than expected and no amount of coffee could allow me to complete the task in a reasonable timeframe without seriously missing out on work and other life duties.

However, I’ll share with you how far I got, and how I got there.

First off, relational databases store structured data, so we need a schema that defines what each table can store. It’s relatively straightforward for rooms — an identifier, an (optional) name, references to each neighbor room, and so on.

full

But it’s not so easy for layers. Let’s take polygon layers, as they’re the most complex. Each polygon layer contains a grid of cells, 40 columns by 22 rows, which is a total of 880 cells.

My first instinct was to go for a naive approach and store those cells in their own table, with one row per cell. But then it gets worse: each cell has 8 control points that can be enabled or disabled. Again, I went with the naive approach of having 8 columns, one per control point.

But then, as of very recently, each control point can have its own sprinkle set. So that’s eight additional columns, and since we don’t want to store a string for every control point (that seems excessive), I went with integer corresponding to the primary key of another table, tilesets.

However, the naive approach proved to be quite slow. I could’ve tried to improve on it by:

  • Re-using preparing SQLite statements instead of having it parse them every time
  • Inserting several rows at once with a single statement
  • and other cool SQLite performance tricks

But the fact is: it’s quite silly. There’s really very few cases when you want to retrieve only part of the cells of a layer. In most cases, you either need everything or none at all (e.g. you’re just looking for a layer property like its tint or its parallax plane).

No amount of complications would reduce that silliness, so instead, let’s opt for a more practical approach: storing cell data as a binary blob. After all, its size never changes, it’s easy to pack 8 booleans in a byte, and it greatly minimizes the amount of back-and-forth with sqlite needed to create a layer or read it.

The only negative point is that the previous approach would have allowed to update only some cells, when painting for example. But considering that we might want to wait for a paint operation to be completed before saving to the database (so that the paint motion is smooth and lag-free), it’s no big deal.

In the middle of all that questioning and performance measuring, I got curious and opened the database with SQLite studio, a very nice GUI:

full

And I have to admit: even if I spending tens of hours on this major engine change has made excitement leave place for doubt, just the fact of seeing game data in an external tool, ready to be queried, filtered, sorted, and generally messed with programmatically, reassured me that it was all worthwile.

The future

As I mentioned, this article is not the end of this story, but perhaps it’s for the best. As I worked on this change, I started having more radical ideas. After all, while we’re changing things that touch maybe a third of a medium-size codebase, why not go a bit further?

For example, why separate polygon layers and tile layers? As it is, often we need a polygon layer of a certain tint at a certain depth, and just a few tiles here and there, of the same tint, at the same parallax depth. So far we’ve had to create a separate layer, which means using more disk space, more RAM, more VRAM, slower level loading, etc.

Instead, how about a single layer type: when there is no tile at a certain position, it’s filled by default. And when there is a tile, it’s masked by the polygons. This way, we could mask parts of a tile, to allow for greater variety in the level design.

Heck, there could even be a tool in the editor that infers a polygon shape from a given tile, by starting with an 8-point square and disabling control points to find a good fit. For example:

However, displaying such a layer, with a combination of polygon clipping and tile textures, is more complicated than before, and could be quite slow if done from JavaScript as before. It would probably be a good occasion to rewrite the layer drawing logic in ooc: it’s unlikely that the layer format will change from now on, which makes me more accepting of the idea to reimplement it natively.

That way, packing and unpacking layer cell data into a blob for storage in SQLite could be done directly in ooc instead of having to deal with binary data from JavaScript, which is not impossible, but awkward to say the least.

And while we’re cleaning things up, why not store objects directly at the room level? What purpose does it serve to have them in distinct layers? Most of them are either sensors (which have no visual existence), or characters, which live in the same plane as the player, there’s really no need for objects to exist in-between two other layers. And that simplifies our model some more.

Conclusion

In short, there’s a lot more work awaiting, but I’m rather optimistic as to the outcome:

  • Game data will be stored in a compact, robust, open binary format
  • Game data will be easy to query and process
  • The previous universe system which was fragile and prone to corruption will disappear
  • Level editing will be even more streamlined
  • Load and save times will be lower

Only good. See you next week!