In this post, we will learn about basic GPU programming & Julia by coding Conway's Game of Life. You may want to try to search “Conway's Game of Life” on Google ;).

What is Life? Tl;dr version

For each iteration

  • Under-population: any cell with fewer than 2 neighbours dies
  • Over-population: any cell with more than 3 neighbours dies
  • Reproduction: any cell with exact 3 neighbours becomes alive

Implementation

Basic

Writing the code for those rules is quite trivial. For each cell, check its neighbours, then do stuff. However, in this tutorial, we want to bring all the logic to GPU. A lot of operations on matrices can be done efficiently in parrallel in GPU. So instead of write a naive simulation in low level CUDA or OpenCL, we can just leverage some higher level code by thinking in linear algebra perspective.

Convolution

You can think convolution as a running window, when the kernel (filter) runs over some region it will apply some kind of function and create a new value. This is used heavily in image processing.

Animation about convolution
Convolution with a 3x3 kernel
Convolution is sum of product
It works by multiplying element wise and sum them all

How does this make our simulation faster?

This operation is available to use in most of linear algebra libraries. We just need to find a GPU library supporting 2D convolution and a kernel calculating number of neighbours around a cell.

If you don't know

High school polynominal multiplication requires around n^2 operations (for each term, multiply with all the terms from other polynomial). Can we do it better? Actually, you just need around n log n operations with fft. Recently, a research shows that we can do integer multiplication in O(n log n) too. Tl;dr: convolution is likely to be faster than your typical for loops.

The Code

using ArrayFire
using LinearAlgebra

let
    kernel = convert(Array{Float32}, [1 1 1; 1 0 1; 1 1 1]) |> AFArray
    state = rand(Bool, 30, 30) + Float32(0) |> AFArray
    step = 30
    for i = 1:step
        nb = convolve2(state, kernel, UInt32(0), UInt32(0))
        a = (nb == 2)
        b = (nb == 3)
        state = ((state .* a .+ b) > 0) + Float32(0)
        save_image(string(i, ".tiff"), state)
    end
end

To gif: convert -scale 2000% -delay 20 -loop 0 *.tiff image.gif

Animated result (loop)
Game of Life