I was first exposed to fractals in elementary school. The tech teacher at the time was familiar with open-source software, installing programs like TuxType and TuxPaint on the school's computers. This collection also included XaoS, a fun, easy-to-use program that lets you spend hours zooming around many different fractals.
Fractal generation and exploration make for a great programming challenge, requiring smooth graphical performance and fast, precise calculations. I first got a taste of the code by contributing to a simple Mandelbrot viewer on GitHub, and from there I was inspired to give the challenge my own go.
This also gave me a chance to study OpenCL, which allows for fast parallel computing on GPUs. The speed difference between GPU and CPU-bound calculations is undeniable, and can be tested by toggling a compile-time flag.
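As a point of reference, the heart of any Mandelbrot renderer is the escape-time iteration, which maps each pixel to a complex number c and counts how long z = z² + c takes to escape. The kernel below is only a minimal sketch of a `double`-based OpenCL version; the name and parameters are mine, not happy-fractal's actual code.

```c
// Minimal escape-time kernel sketch (illustrative, not happy-fractal's code).
// Each work-item computes the iteration count for one pixel.
#pragma OPENCL EXTENSION cl_khr_fp64 : enable

__kernel void mandelbrot(__global uint *out,
                         const uint width, const uint height,
                         const double center_x, const double center_y,
                         const double scale, const uint max_iter)
{
    const uint px = get_global_id(0);
    const uint py = get_global_id(1);
    if (px >= width || py >= height)
        return;

    // Map the pixel to a point c in the complex plane.
    const double cx = center_x + ((double)px - width / 2.0) * scale;
    const double cy = center_y + ((double)py - height / 2.0) * scale;

    // Iterate z = z^2 + c until |z| > 2 or the iteration cap is reached.
    double zx = 0.0, zy = 0.0;
    uint i = 0;
    while (zx * zx + zy * zy <= 4.0 && i < max_iter) {
        const double tmp = zx * zx - zy * zy + cx;
        zy = 2.0 * zx * zy + cy;
        zx = tmp;
        ++i;
    }
    out[py * width + px] = i;  // mapped to a color on the host side
}
```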
`double` is a very precise type, but its limits can be found after zooming for a while. The image will begin to stretch and distort, before falling apart into a mess of colored bars:
XaoS does not suffer from this problem, and can actually zoom in much further. In an effort to match these capabilities, I began implementing a fixed-point alternative to `double`.
The largest type I could send to the GPU through OpenCL is a `uint64_t`; however, this is the same bit-size as `double`, so a pair of `uint64_t`s is needed to achieve superior precision. This led me to fixed-point arithmetic and a library called r128, which uses two `uint64_t`s to provide Q64.64 fixed-point numbers.
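The core idea is simpler than it sounds: treat the pair of `uint64_t`s as one 128-bit integer and agree on where the binary point sits. As a rough illustration (a hypothetical struct, not r128's actual interface), Q64.64 addition is just 128-bit integer addition with a manual carry:

```c
#include <stdint.h>

/* Illustrative 128-bit fixed-point value built from two 64-bit words.
   In Q64.64, `hi` holds the integer part and `lo` the fractional part. */
typedef struct {
    uint64_t lo;   /* low 64 bits:  fraction */
    uint64_t hi;   /* high 64 bits: integer  */
} fix128;

/* Add the low words first, then propagate any carry into the high words. */
static fix128 fix128_add(fix128 a, fix128 b)
{
    fix128 r;
    r.lo = a.lo + b.lo;
    r.hi = a.hi + b.hi + (r.lo < a.lo);  /* carry out of the low word */
    return r;
}
```

Multiplication is the expensive part: a full 128×128-bit product has to be assembled from 64-bit partial products and then shifted back down by the number of fractional bits.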
However, `double` already uses 52 bits for its mantissa, so a jump to 64 bits is not that significant. By modifying r128, I was able to give more bits to the fractional part of the number. Since the Mandelbrot set is confined to a small region around the origin (0, 0), the fixed-point format could be changed to Q4.124, providing twice as many fractional bits while still rendering the entire fractal.
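To make the layout concrete, here is a sketch of how a `double` might be packed into Q4.124; the struct and function names are placeholders, not the modified r128 code. The conversion can only fill in roughly as many bits as the `double` mantissa provides, so the extra precision only shows up once the arithmetic itself runs in fixed point.

```c
#include <stdint.h>
#include <math.h>

/* Hypothetical Q4.124 value: 4 integer bits and 124 fractional bits,
   stored as a signed 128-bit quantity split across two 64-bit words. */
typedef struct {
    uint64_t lo;   /* low 64 fractional bits */
    int64_t  hi;   /* 4 integer bits + top 60 fractional bits */
} q4_124;

/* Pack a double in [-8, 8) into Q4.124. */
static q4_124 q4_124_from_double(double d)
{
    q4_124 r;
    double scaled = ldexp(d, 60);                /* d * 2^60 */
    double ipart  = floor(scaled);
    r.hi = (int64_t)ipart;                       /* top 64 bits */
    r.lo = (uint64_t)ldexp(scaled - ipart, 64);  /* remaining fraction * 2^64 */
    return r;
}
```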
Below is the Mandelbrot set generated with `double` (left) and Q4.124 fixed-point (right). There are some major differences in the larger outlines of the fractal, but the fixed-point version shows additional details further inside.
Returning to the point where the floating-point implementation fell apart, the fixed-point version is able to zoom much closer to the center of the image. It even reaches a change from pink to blue that is barely visible in the original shot:
The fixed-point implementation puts up a decent fight against XaoS. Unlike the original `double` implementation, which stretches and collapses into horizontal bars, the fixed-point version shows a pixelated distortion instead:
Pushing in further, however, the fixed-point image degrades badly.
We actually find ourselves near the limits of XaoS here as well: zooming into the center of this design produces a similar distortion.
There are two secrets to the performance of XaoS: one of precision and one of speed. For precision, XaoS uses the 80-bit extended-precision format built into x86 processors. I chose to avoid this format since it is not compatible with OpenCL.
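On most x86 toolchains this format is what `long double` maps to: a 64-bit mantissa versus `double`'s 53 (52 stored bits plus the implicit leading bit). A quick check, assuming GCC or Clang on x86:

```c
#include <float.h>
#include <stdio.h>

int main(void)
{
    printf("double mantissa bits:      %d\n", DBL_MANT_DIG);   /* 53 */
    printf("long double mantissa bits: %d\n", LDBL_MANT_DIG);  /* 64 with x87 extended precision */
    return 0;
}
```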
For my program, using OpenCL was a necessity for smooth framerates. `double` calculations could be done at 60+ FPS all the way to their precision limit; fixed-point calculations started around 50 FPS and slowly decreased while zooming. Keeping to the CPU resulted in a performance drop of at least 50%.
XaoS overcame its CPU-bound limitation by implementing a unique algorithm that predicts future pixels from previous calculations. Reimplementing this strategy would be a whole new challenge of its own.
happy-fractal has compile-time switches for OpenCL and the fixed-point implementation. A third switch runs a "benchmark" that times a specific zoom sequence. This is how the four combinations run on my computer (Ryzen 1700X, 750 Ti); a sketch of how such switches can look follows the timings:
floating-point, OpenCL: 4.95962s
floating-point, CPU: 10.30790s
fixed-point, OpenCL: 10.12300s
fixed-point, CPU: 34.11120s
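The macro and type names below are placeholders rather than happy-fractal's actual build flags; the point is only that selecting the number type at compile time keeps any branching out of the hot loop:

```c
#include <stdint.h>

/* Hypothetical build flags, e.g.:  cc -DUSE_OPENCL -DUSE_FIXED_POINT ... */
#ifdef USE_FIXED_POINT
typedef struct { uint64_t lo; int64_t hi; } real_t;  /* Q4.124 across two 64-bit words */
#else
typedef double real_t;                               /* plain IEEE double */
#endif
```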