Building a fast tile engine — pt. 1

A few weeks ago, I explained how Jaakan’s technical stack is built. We saw that it was a bunch of native libraries, with some ooc in between, and then a whole lot of JavaScript (15K lines) for the various editors and the game itself.

I also floated the idea of merging two types of layers, polygon layers and tile layers, and porting their rendering logic to ooc for better performance. As a reminder, ooc compiles down to C, which is then compiled to native code. That makes it much faster than interpreting JavaScript inside Duktape, but it also means we have to recompile every time we change it.

In this article series, we’re going to take a look at how we can build ourselves a fast tiling engine. Here are a few basic facts that show why it matters:

  • Each room is composed of between 4 and 10 layers
  • Each layer is 40 tiles wide and 22 tiles high (with 32x32 tiles, that gives us a 1280x704 resolution, which is not by chance!)
  • Each tile can have up to 8 borders that are painted with sprinkles

For this first article of the series, I’m going to explain how we were doing it up until now.

Sprites and atlases

When making a 2D game, one tends to think in terms of sprites: an image displayed somewhere on the screen. In a 2D engine built on top of a hardware-accelerated 3D graphics library such as OpenGL, a sprite is usually drawn as a textured quad, which is, in fact, two triangles:

In this diagram, “xy” represents vertex coordinates, and “uv” represents texture coordinates. By default, texture coordinates range from 0 to 1 along both axes, no matter what size the texture (image) actually is.

To draw the quad above, we can go the naive way, with 6 vertices:

  • triangle #1: bottom-left, top-left, bottom-right
  • triangle #2: top-left, top-right, bottom-right

Or we can get a bit smarter about it and use a triangle strip, which makes a triangle out of the first 3 vertices we send, and then, for each additional vertex, makes another triangle out of that vertex and the two before it:

  • triangle #1: bottom-left, top-left, bottom-right
  • triangle #2: top-right (reusing top-left and bottom-right)

And you get away with only 4 vertices to draw a quad, which isn’t too bad.
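To make the two vertex orderings concrete, here is a minimal sketch in plain JavaScript. It does not talk to OpenGL at all, the names (asTriangles, asStrip, stripToTriangles) are made up for illustration, and it ignores winding order, which a real strip alternates from one triangle to the next.

```javascript
// Corners of a unit quad as [x, y] pairs (hypothetical values).
const bottomLeft = [0, 0];
const topLeft = [0, 1];
const bottomRight = [1, 0];
const topRight = [1, 1];

// Naive approach: two independent triangles, 6 vertices.
const asTriangles = [
  bottomLeft, topLeft, bottomRight, // triangle #1
  topLeft, topRight, bottomRight,   // triangle #2
];

// Triangle strip: the same quad with only 4 vertices.
const asStrip = [bottomLeft, topLeft, bottomRight, topRight];

// Expand a strip into the triangles it describes: every vertex from the
// third one onward forms a triangle with the two vertices before it.
function stripToTriangles(strip) {
  const triangles = [];
  for (let i = 2; i < strip.length; i++) {
    triangles.push([strip[i - 2], strip[i - 1], strip[i]]);
  }
  return triangles;
}

const expanded = stripToTriangles(asStrip);
console.log(asTriangles.length, asStrip.length, expanded.length);
```

The second triangle that comes out of the strip is made of top-left, bottom-right and top-right, so it covers the same area as triangle #2 in the naive version.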

Now, in the example above, texture coordinates were either 0 or 1, which means we were using the entire texture. In the case of a map made up of tiles, it’s much more efficient to store multiple tiles in the same texture.

To illustrate what I mean, here’s a tileset for a Super Mario Brothers 3 clone:

As you can see, many tiles are stored in the same 1024x1024 texture. This kind of texture is called a texture atlas. With an atlas, when displaying a tile, we have to make sure that the texture coordinates are just right.

Let’s take a 4x4 atlas for instance:

In that case, our texture coordinates would be:

  • bottom-left: (.25, .5)
  • bottom-right: (.5, .5)
  • top-right: (.5, .75)
  • top-left: (.25, .75)
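The coordinates above can be computed rather than hard-coded. Here is a small sketch, assuming a square atlas, zero-based column and row indices counted from the bottom-left corner, and v increasing upward as in the list above. The function name tileUVs and its parameters are invented for this example and are not part of dye.

```javascript
// Texture coordinates of one tile in a square atlas that holds
// tilesPerSide x tilesPerSide tiles. col and row are zero-based and
// counted from the bottom-left corner (an assumption of this sketch).
function tileUVs(col, row, tilesPerSide) {
  const u0 = col / tilesPerSide;
  const v0 = row / tilesPerSide;
  const u1 = (col + 1) / tilesPerSide;
  const v1 = (row + 1) / tilesPerSide;
  return {
    bottomLeft: [u0, v0],
    bottomRight: [u1, v0],
    topRight: [u1, v1],
    topLeft: [u0, v1],
  };
}

// Second column, third row from the bottom of a 4x4 atlas.
const uvs = tileUVs(1, 2, 4);
console.log(uvs);
```

With a 4x4 atlas, column 1 and row 2 give exactly the four pairs listed above.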

Why is it important to pack multiple tiles in a single texture?

  • Loading: it’s cheaper to load one big image rather than to load 1024 small ones.
  • Drawing: OpenGL is a state machine, so it’s cheaper to draw 1024 shapes with the same texture than to bind a different texture for every draw.
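To illustrate the state machine point, here is a toy model that counts how many texture binds a list of draws would need if we only rebind when the texture actually changes. It makes no OpenGL calls, and the drawList shape and its texture field are assumptions made up for this sketch.

```javascript
// Count the binds needed for a list of draws, rebinding only when the
// texture differs from the one currently bound (toy model, no OpenGL).
function countTextureBinds(drawList) {
  let bound = null;
  let binds = 0;
  for (const draw of drawList) {
    if (draw.texture !== bound) {
      bound = draw.texture;
      binds++;
    }
  }
  return binds;
}

// 1024 tiles that all live in one atlas, versus 1024 separate images.
const atlasDraws = Array.from({ length: 1024 }, () => ({ texture: "atlas" }));
const separateDraws = Array.from({ length: 1024 }, (_, i) => ({ texture: "tile" + i }));

console.log(countTextureBinds(atlasDraws), countTextureBinds(separateDraws));
```

In this model the atlas needs a single bind, while separate images need one bind per tile.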

Matrix transformations and draw calls

I’m not going to let this article get too technical, I promise, but basically, for every object you can draw in dye (which is a glorified 2D scenegraph on top of OpenGL 2.x+), there’s a 4x4 transformation matrix, which allows us to translate, rotate, and scale objects any way we want.

The matrix code in dye isn’t really optimized, and it’s not that cheap to pass matrices around, especially to GLSL shaders (which we’ll cover in more detail later). So if every layer has 40 * 22 = 880 tiles to draw, and we have 10 layers, that means 8800 draw calls, 8800 texture binds (we might be using texture atlases, but dye doesn’t know that), 8800 matrix computations in ooc, and 8800 rounds of passing uniforms such as the modelview matrix and the color.
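To put the numbers in perspective, here is the arithmetic as a quick sketch. The tile and layer counts come from the figures in this article; the 60FPS frame budget split evenly across draw calls is a back-of-the-envelope assumption, not a measurement.

```javascript
// Figures from the article: 40x22 tiles per layer, up to 10 layers.
const tilesPerLayer = 40 * 22;
const layerCount = 10;
const drawCallsPerFrame = tilesPerLayer * layerCount;

// Rough budget: one frame at 60FPS, shared evenly by every draw call.
const frameBudgetMs = 1000 / 60;
const budgetPerDrawUs = (frameBudgetMs * 1000) / drawCallsPerFrame;

console.log(drawCallsPerFrame + " draw calls per frame");
console.log(budgetPerDrawUs.toFixed(2) + " microseconds per draw call");
```

That leaves a little under 2 microseconds for each bind, matrix computation, uniform upload and draw, with nothing left over for the rest of the game.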

That’s not optimized at all. In fact, it won’t run at 60FPS even on relatively modern hardware. But clearly you’ve seen Jaakan run at 60FPS in previous weekly updates, so how did we pull that off? Well, in the current codebase, each layer is rendered to a 1280x704 texture. That way, each layer of a room is drawn as only 1 quad with 1 texture.

That approach has drawbacks though:

  • It could get pretty expensive VRAM-wise, although compared to the kind of 3D games out there, I’d say we’re pretty safe on that front.
  • Rendering every layer to a texture is still pretty expensive, which means longer load times, and for no good reason.
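For a rough idea of the VRAM cost, here is a quick estimate. It assumes each cached layer is stored as an uncompressed RGBA texture with 8 bits per channel, with no mipmaps and no padding to power-of-two sizes; the actual format dye uses may differ.

```javascript
// Assumption: 4 bytes per pixel (RGBA, 8 bits per channel), no mipmaps.
const layerWidth = 1280;
const layerHeight = 704;
const bytesPerPixel = 4;
const maxLayers = 10;

const bytesPerLayer = layerWidth * layerHeight * bytesPerPixel;
const mibPerLayer = bytesPerLayer / (1024 * 1024);
const mibPerRoom = mibPerLayer * maxLayers;

console.log(mibPerLayer + " MiB per layer, " + mibPerRoom + " MiB for 10 layers");
```

Under those assumptions a layer takes about 3.4 MiB, so a room with 10 layers stays under 35 MiB.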

In part 2, we’ll solve at least half of our problem. Our goal:

  • Avoid caching layers to a texture (so that layers can be updated dynamically)
  • Have non-existent load times
  • Use only 1 draw call per layer

In the meantime, don’t hesitate to share this article, or follow us so you don’t miss an update. See you next week!