Repository: https://github.com/tcsullivan/happy-fractal

I was first exposed to fractals in elementary school. The tech teacher at the time was familiar with open-source software, installing programs like TuxType and TuxPaint on the school's computers. This collection also included XaoS, a fun, easy-to-use program that lets you spend hours zooming around many different fractals.

Fractal generation and exploration makes for a great programming challenge, requiring smooth graphical performance and fast, precise calculations. I first got a taste of the code through a simple Mandelbrot viewer that I contributed to on GitHub. Following this, I was inspired to give the challenge my own go.

This also gave me a chance to study OpenCL, which allows for fast parallel computing on GPUs. The speed difference between this and CPU-bound calculations is undeniable, and can be tested by changing a compile-time flag.

`double`

is a very precise type, but its limits can be found after zooming for a while. The image will begin to stretch and distort, before falling apart into a mess of colored bars:

XaoS does not suffer like this, and can actually zoom in much further. In an effort to match these capabilities, I began implementing a fixed-point alternative to `double`

.

The largest type I could send to the GPU through OpenCL is a `uint64_t`

; however, this is the same bit-size as `double`

, so a pair of `uint64_t`

s are needed to achieve superior precision. This lead me to fixed-point arithmetic and a library called r128, which uses two `uint64_t`

s to provide Q64.64 fixed-point numbers.

However, `double`

already uses 52 bits for its mantissa; a jump to 64 bits is not that significant. By modifying r128, I was able to give more bits to the fractional part of the number. Since the Mandelbrot set is also contained to just a small region around the origin (0, 0), the fixed-point format could be changed to Q4.124 to provide twice as many fractional bits while still rendering the entire fractal.

## floating-point vs. fixed-point

Below is the Mandelbrot set generated with `double`

(left) and Q4.124 fixed-point (right). There are some major differences in the larger outlines of the fractal, but the fixed-point version shows additional details further inside.

Returning to the point where the floating-point implementation fell apart, the fixed-point version is able to zoom much closer to the center of the image. It even reaches a change from pink to blue that is barely visible in the original shot:

## fixed-point vs. XaoS

The fixed-point implementation puts up a decent fight against XaoS. Unlike the original `double`

implementation that stretches and collapses into horizontal bars, the fixed-point version sees a pixelated distortion instead:

Pushing in further, the fixed-point image becomes very poor.

We actually find ourselves near the limits of XaoS though: zooming to the center of this design results in a similar distortion.

### The secrets of XaoS

There are two secrets to the performance of XaoS, one of precision and another of speed. For precision, XaoS uses the 80-bit extended-precision format built into x86 processors. I chose to avoid this since it is not OpenCL compatible.

For my program, using OpenCL was a necessity for smooth framerates. `double`

calculations could be done at 60+ FPS all the way to its limits; fixed-point calculations started around 50 FPS and slowly decreased with zooming. Keeping to the CPU resulted in a performance drop of at least 50%.

XaoS overcame its CPU-bound limitation by realizing a unique algorithm that predicts future pixels based on previous calculations. Reimplementing this strategy would be a whole new challenge on its own.

## a benchmark

happy-fractal has compile-time switches for OpenCl and the fixed-point implementation. A third switch will do a "benchmark" that times a specific zoom sequence. This is how the four combinations run on my computer (Ryzen 1700X, 750 Ti):

**floating-point, OpenCL:** 4.95962s

**floating-point, CPU:** 10.30790s

**fixed-point, OpenCL:** 10.12300s

**fixed-point, CPU:** 34.11120s