Building Up Perlin Noise

For my inaugural post, I’d like to discuss Perlin noise. There are a number of explanations of Ken Perlin’s classic noise function, and most tend to lean heavily on viewing it as a weighted average of gradients, but I don’t think this is really the best way to get a constructive intuition for how it works. Perlin once mentioned that, “In fact, the noise function can be viewed as the surflet decomposition of a random density function sampled at a rectangular grid.” Though cryptic, I believe that this is the key to a far more useful understanding of Perlin noise. This post will be partly an explorable explanation of that idea and will show how to build up the noise function step-by-step.

Theory

Let’s start with the interpolant. The original Perlin noise algorithm used a cubic Hermite spline of the form s(t) = 3t² − 2t³. This particular function is also sometimes known as smoothstep. It describes an s-shape, ramping smoothly up from 0 to 1 over the range of 0 to 1. It’s also symmetrical around the center of this square; that is, s(t) = 1 − s(1 − t).

So let’s flip it around, 1 − s(t), so that it’s 1 at 0 and falls off to 0 at 1. Then use the absolute value, 1 − s(|t|), so that it’s symmetric and falls off smoothly to 0 at -1 too. Call this new variation f(t):

f(t) = 1 − (3 − 2|t|)t²

Next we’ll extend that to two dimensions by applying it separately to x and y and then taking the product. (I focus exclusively on 2D noise in this post, but it’s trivial to extend all this to higher dimensions.) The figure below shows how this looks. In this and all the next figures, I’ve used white for values of 1 and above, black for -1 and below, and grey for 0:


	×		=
f(x)		f(y)		f(x) f(y)

There are a couple of things to note here. First, it obviously evaluates to 1 at the origin. But this falls off fairly smoothly to exactly 0 at the boundaries of the -1 to 1 square. The central lobe isn’t circular by any means, but it’s a reasonable approximation here.

Next, let’s multiply that falloff kernel by a gradient. This is the gradient that will make this proper gradient noise and not just some other kind of noise function. The gradient here is computed by taking the dot product of an arbitrary vector, G, with each (x,y) coordinate as a vector offset from the origin. (If you’re not familiar with the vector dot product, you multiply the x-values from the two vectors, then the y-values, etc. and then sum.) The figure below shows the falloff, the gradient and their product. Try clicking or dragging to change vector G and see how it affects the result.


	×		=
f(x) f(y)		G ⋅ (x,y)		f(x) f(y) (G ⋅ (x,y))

If you played around with that, you may have noticed some things. First, when looking at the gradient, there’s always a line for which it remains 0 (i.e., grey in these figures) and it always passes through the origin perpendicular to the vector G, regardless of the magnitude of vector. Rotating G around changes the direction of the gradient, while changing the length controls the sharpness of the gradient (longer vectors make sharper gradients, shorter vectors make softer gradients). Nonetheless, the 2×2 square here is always divided into equal halves, one positive and one negative.

When you factor in the falloff, there’s still a line of zeroes through the origin, perpendicular to G and dividing the square into positive and negative halves. But now it meets the borders of the square and those are also zeroes. The equal-but-opposite minima and maxima lie within the interior of each half of the square. Effectively, it’s a dipole. Note that the direction of G now exclusively determines the shape of the two regions (and the positions of the minima and maxima). The length of G makes no difference to the shape; it only determines the intensity.

So given this, we can just focus on the direction of G and always use unit length vectors. If we clamp the product of the falloff kernel and the gradient to 0 at all points beyond the 2×2 square, this gives us the surflet mentioned in that cryptic sentence.

Now that we’ve got that basic element, we can center one at each point on the rectangular integer grid (such that they’ll overlap like shingles) and then simply sum them all up. If we do that, we have our Perlin noise! The figure below shows a random patch of noise with the integer grid and the gradient vectors for the surflets overlayed. You can click or drag on it to change the vectors and see how it affects the noise.

Incidentally, this is why I’m not so fond of the reduced table of vectors in Perlin’s Improving Noise paper. If one naively takes a 2D slice through the 3D noise, along z = 0 for example, each gradient is only in one of the 8 basic directions and it’s biased towards the cardinal directions. This can produce ugly runs in the noise that go at 45° angles:

Instead of dealing with the complexity of gradients and surflets like this, one alternative is to just place random values between -1 and 1 on a grid and then upsample it using bilinear or bicubic interpolation. Some tutorials mistakenly call that Perlin noise, though it really isn’t; instead it’s considered “value noise”, as opposed to the “gradient noise” category that Perlin’s belongs to.

So why bother with gradient noise? Both value noise and gradient noise vary smoothly. The upper limit for how quickly they can change really depends on the density of the grid. As a result, they both have similar upper limits on the main frequencies they contain.

But value noise may give you runs where the values at the grid points are similar and the noise hardly changes (giving a “splotchy” appearance). In fact, a really unlucky case where all the values at the grid points are the same would contain only the zero frequency. Also, if you integrate a patch of value noise over some area (e.g., for downsampling it for mipmaps), it will approach 0 statistically but it’s highly unlikely to be exactly 0.

By contrast, for any given surflet, the line perpendicular to the gradient vector slices the surflet into equal but opposite halves that that are next to each other. No matter how the gradients lie, there will always be at least some wiggle to a gradient noise function, and again it will be dependent on the grid density. As a result, gradient noise has a lower limit on the main frequencies it contains; it is approximately bandlimited whereas value noise is not. Also, since the halves of each surflet are equal and opposite integrating over the whole surflet yields exactly 0. A periodic patch of gradient noise therefore also integrates to exactly 0.

Code

That’s the theory. It’s time to turn that into working code. For this implementation, I’ll stick with very basic C++ in order to make this as easy as possible to port to other languages.

First, we’ll define three tables and a function to initialize them:

static int const size = 256;
static int const mask = size - 1;
int perm[ size ];
float grads_x[ size ], grads_y[ size ];
void init() {
    for ( int index = 0; index < size; ++index ) {
        int other = rand() % ( index + 1 );
        if ( index > other )
            perm[ index ] = perm[ other ];
        perm[ other ] = index;
        grads_x[ index ] = cosf( 2.0f * M_PI * index / size );
        grads_y[ index ] = sinf( 2.0f * M_PI * index / size );
    }
}

To give consistent results across each invocation of the noise function, the direction of the gradient vector at each point needs to remain constant. The traditional way to do this is to use a random permutation vector to help hash the point and then use the hash to look up a vector in a table. In this implementation, the table sizes needs to be a power of two so that we can use bitmasking to wrap indexes into the table. We could use the modulo operator instead, but then we’d have to add extra handling for negative coordinates.

The perm table here is just the numbers 0, 1, 2, 3… in randomly shuffled order, and grad_x together with grad_y give unit vectors evenly distributed around the circle. Perlin’s original implementation instead used random vectors from the 2×2 square around the origin and normalized them, but this is both clumpy and biased towards diagonals.

Now for the noise evaluation itself:

float f( float t ) {
    t = fabsf( t );
    return t >= 1.0f ? 0.0f : 1.0f -
        ( 3.0f - 2.0f * t ) * t * t;
}
float surflet( float x, float y, float grad_x, float grad_y ) {
    return f( x ) * f( y ) * ( grad_x * x + grad_y * y );
}
float noise( float x, float y ) {
    float result = 0.0f;
    int cell_x = floorf( x );
    int cell_y = floorf( y );
    for ( int grid_y = cell_y; grid_y <= cell_y + 1; ++grid_y )
        for ( int grid_x = cell_x; grid_x <= cell_x + 1; ++grid_x ) {
            int hash = perm[ ( perm[ grid_x & mask ] + grid_y ) & mask ];
            result += surflet( x - grid_x, y - grid_y,
                               grads_x[ hash ], grads_y[ hash ] );
        }
    return result;
}

This is pretty straightforward, given the approach outlined above. First we define a falloff function, f(), that evaluates to 1 at 0, and falls off to 0 at -1 and 1. Note that it’s also clamped to return 0 beyond that.

Next, a surflet() function that takes a point (relative to the center of the surflet) and a gradient vector and returns the value of the surflet there.

Finally, there’s the noise() function itself which takes the point to evaluate, loops over the nearest four grid points, hashes them to look up their gradient vectors, and then sums the values of the surflets that are centered on those grid points. Why the nearest four? Because those are the only surflets that can influence the results; the others are too far away. If you’ve played around with the noise patch in the previous figure, this should be fairly easy to see.

So there it is. So long as the permutation and gradient tables match (and floating point arithmetic notwithstanding) this should give exactly the same results as the original Perlin noise function, even though it computes it slightly differently. The code here is certainly not as optimized; Perlin’s original code eliminates some redundant table lookups for hashing and exploits the symmetry of s(t) = 1 − s(1 − t) to compute the falloffs more efficiently. It’s possible to turn the above code step-by-step into something very like the original noise code. However, the less optimized form here makes many modifications quite easy.

Modifications

Since this approach to implementing Perlin noise tends to factor things out more, it’s quite easy to make small changes to the different building blocks. Here are some possible modifications.

Stronger hash — If you look carefully at how the points are hashed via the permutation table to get a gradient, you’ll see that each column on the grid repeats the same sequence of gradient vectors in order. They are simply offset up and down from each other. It tends to be less noticeable when the table is large enough, but it’s still possible to do better.

One easy solution is to have separate permutation tables, one for each dimension, and exclusive-or the lookups from each. This still wraps at the ends of the table, but at least it’s a 2D- rather than 1D-wrapping:

int hash = perm_x[ grid_x & mask ] ^ perm_y[ grid_y & mask ];

Another possibility is to dispense with the permutation table entirely. Zafar et al. suggest using the Tiny Encryption Algorithm to compute the hash algorithmically. This approach is nice because it doesn’t wrap at all in any reasonable period.

Quintic interpolant — Remember that we based f() off of a cubic polynomial, s(t) = 3t² − 2t³, that we modified to be symmetrical around 0 and flipped upside down. This has non-zero second derivatives at the edges that can cause visual creases along grid lines. Perlin suggested an alternative quintic polynomial, s(t) = 6t⁵ − 15t⁴ + 10t³, to fix this. Substituting this in is easy:

( ( 6.0f * t - 15.0f ) * t + 10.0f ) * t * t * t;

Wide interpolant — In an old tech report, I once suggested an alternative quintic interpolant, s(t) = 4(1 - t²/4)⁵ - 3(1 - t²/4)⁴, designed to be used over twice the range (that is, for 4×4 sized surflets). Here’s what the new body of f() should like to implement this:

t = 1.0f - 0.25f * t * t;
return t < 0.0f ? 0.0f :
    ( 4.0f * t - 3.0f ) * t * t * t * t;

Note that because the surflets are now twice as wide, you’ll need to extend the loop for grid_y by 1 in each direction (i.e., from cell_y - 1 to cell_y + 2) and similarly for grid_x. Otherwise, the surflets will be clipped.

Here is a plot showing how the three versions of f() compare:

Cubic Quintic Wide

Radial surflets — We computed the falloff kernel by applying f() to each dimension and multiplying the result. This makes it a a separable filter. However, we can also apply it as a radial filter instead:

return f( hypotf( x, y ) ) * ( grad_x * x + grad_y * y );

When we do this, the surflets now fall off to 0 at the unit circle instead of the 2×2 square. The nice thing about this is that the surflets now rotate perfectly to align with the direction of their gradient, G.

Improved vectors — As I mentioned above, I’m not a fan of the reduced set of 12 gradient vectors in Improving Noise and prefer the vectors from the classic version. However, if you really want to use them, you can you can set the tables explicitly rather than in init():

float grads_x[] = { -1.0f, -1.0f, -1.0f, -1.0f,  0.0f,  0.0f,
                     0.0f,  0.0f,  1.0f,  1.0f,  1.0f,  1.0f };
float grads_y[] = { -1.0f,  0.0f,  0.0f,  1.0f, -1.0f, -1.0f,
                     1.0f,  1.0f, -1.0f,  0.0f,  0.0f,  1.0f };

and then do the lookups from those tables in noise() modulo 12, as long as you don’t mind a slight bias.

Value noise — If you modify the surflet() function to simply use a single value like grad_x rather than the dot product that produces the gradient, you’ll get value noise:

return f( x ) * f( y ) * grad_x;

Note, however, that grad_x is biased away from 0; if you try this, you may also want to change init() to set it to uniform random numbers between -1 and 1.

Jittered grid — There’s no reason that the surflet centers have to be confined to the points on the grid. We can add an offset to randomize their positions within the grid cells:

result += surflet( x - ( grid_x + jit_x[ hash ] ),
                   y - ( grid_y + jit_y[ hash ] ),
                   grads_x[ hash ], grads_y[ hash ] );

Here, jit_x and jit_y are new tables initialized with random numbers between 0 and 1. Note that the start of the loop over grid_y will have to be extended to include cell_y - 1 and similarly for grid_x. One nice thing about this is that it breaks up the regular pattern of the grid points always being 0.

Tilable noise — At this point it should be fairly clear how to make tilable noise: tile over an integer number of cells and just make sure the choice of gradient directions wraps across the tile boundaries. If we want to repeat every repeat_x and repeat_y cells in x and y, respectively, (and where both are size or smaller) then it will look something like this, where the extra modulos help to handle negative grid coordinates correctly:

int wrap_x = grid_x % repeat_x + repeat_x;
int wrap_y = grid_y % repeat_y + repeat_y;
int hash = perm[ ( perm[ wrap_x % repeat_x ] + wrap_y ) % repeat_y ];

Simplicial grid — Instead of a grid of unit squares, simplex noise uses a grid of equilateral triangles (or tetrahedrons in 3D) with unit-length sides. It’s a bit more involved to implement since you need to map coordinates onto the simplex grid to find the neighborhood of nearest surflets and then map their positions back to evaluate the them. Still, if you don’t mind evaluating more surflets than strictly necessary it’s not too hard:

int cell_x = roundf( x * 2.0f / sqrtf( 3.0f ) );
int cell_y = roundf( y + x / sqrtf( 3.0f ) );
for ( int grid_y = cell_y - 1; grid_y <= cell_y + 1; ++grid_y )
    for ( int grid_x = cell_x - 1; grid_x <= cell_x + 1; ++grid_x ) {
        int hash = perm[ ( perm[ grid_x & mask ] + grid_y ) & mask ];
        result += surflet( x - ( grid_x * sqrtf( 3.0f ) * 0.5f ),
                           y - ( grid_y - grid_x * 0.5f ),
                           grads_x[ hash ], grads_y[ hash ] );
    }

Design Your Own

With a little care many of these possible modifications can be combined to derive new variations on noise with different characteristics. Here’s a small sampler to explore some possibilities. You can click on the noise to generate new sets of random gradient vectors.

If you’re interested in reading more, one particularly interesting choice of elements is the Gabor Noise of Lagae et al. Their approach uses a Gabor kernel for the surflet function, multiple jittered surflets within each grid cell, and a selectable distribution for the gradients vectors that can shape them towards certain directions. Iñigo Quilez also uses jitter in a neat way with his voronoise.

Edited Dec 19, 2015 - Added the plot of the three versions of f().