28 January, 2017

Reflect-Trick

Baseline

2D rendering APIs must define how gradient or texture should be rendered when coordinates accessing it are out of bounds. In 2D this is usually called repeat or spread mode, and here are the most used ones:

  • Pad - Coordinate is saturated [result = pad(x, width)]
  • Repeat - Coordinate is repeated [result = repeat(x, width)]
  • Reflect - Coordinate is reflected [result = reflect(x, width)]

Where result, x, and width are integers and result is always within [0, width) range (note that width is exclusive). A baseline implementation of pad, repeat, and reflect functions is below:

inline int pad(int x, int width) {
  if (x < 0)
    return 0;

  if (x >= width)
    return width - 1;

  return x;
}

inline int repeat(int x, int width) {
  // Repeat from 0 to width.
  x = x % width;
  if (x < 0)
    x += width;

  return x;
}

inline int reflect(int x, int width) {
  // Repeat from 0 to width * 2.
  x = x % (width * 2);
  if (x < 0)
    x += width * 2;

  // Reflect from 0 to width.
  if (x >= width)
    x = width * 2 - 1 - x;

  return x;
}

When Width is a Power of 2

In many cases width would be a power of 2, especially when rendering gradients. In that case the expensive MOD operator (%) could be replaced by AND operator (&), which would make the implementation shorter and generally faster:

inline int repeat(int x, int width) {
  return x & (width - 1);
}

inline int reflect(int x, int width) {
  // Repeat from 0 to width * 2.
  x = x & ((width * 2) - 1);

  // Reflect from 0 to width.
  if (x >= width)
    x = width * 2 - 1 - x;

  return x;
}

AND solves two problems - it implicitly handles the x < 0 condition, because the width is never negative (for example -2 & 255 == 254, which is the same as -2 + 256 in 2's complement arithmetic) and it replaces a very expensive MOD operator by a totally cheap AND operator, that does the same job in our case, because of power of 2 constraint. However, the reflect function is still pretty long and could be simplified even more, especially if we want to vectorize it.

Reflect-Trick Suitable for SIMD

We know that width can be used to precalculate other values that can be used instead of it. For example if we always use width * 2 - 1 as a mask it would be better to just calculate it once and reuse the precalculated value. For the following reflect-trick we precalculate it so it becomes a mask instead of width:

inline int precalc(int width) {
  return width * 2 - 1;
}

We also know that SIMD instructions like more min/max approach than if/mask approach, so instead of using the condition the reflection itself could be rewritten to use a single subtraction and minimum:

// y is the precomputed value, equal to `width * 2 - 1`.
inline int reflect(int x, int y) {
  // Repeat from 0 to width * 2.
  x = x & y;

  // Reflect by using the reflect-trick.
  return min(x, y - x);
}

The following reasons summarize why this implementation is generally better for SIMD implementation (and probably for scalar too):

  • It uses only one constant (y), which could be kept in register.
  • It doesn't use branching, which would need to be translated to comparison and masking.

Handling Both Repeat & Reflect at the Same Time

If we introduce another variable, it would be possible to handle both repeat and reflect modes inside a single function. This would be extremely beneficial for 2D pipelines that allow different repeat modes for X and Y. We name our constants as repeatValue and reflectValue, and precalculate them the following way:

inline int precalcRepeatValue(int width, bool isReflecting) {
  return isReflecting ? width * 2 - 1 : width - 1;
}

inline int precalcReflectValue(int width, bool isReflecting) {
  return width * 2 - 1;
}

And use such values in a single function:

inline int repeatOrReflect(int x, int repeatValue, int reflectValue) {
  x = x & repeatValue;
  return min(x, reflectValue - x);
}

Based on the initialization, the repeatOrReflect() function will either repeat or reflect.

Reflect-Trick that uses SAR and XOR

On X86 architecture the reflection can also be performed by using a SAR (arithmetic shift right) and XOR instructions. The idea is to shift the interval of the value to be reflected from [0...width*2) to [-width..width). If we do so we can use the following:

inline int reflect(int x) {
  const int kShift = (sizeof(int) * 8 - 1);
  return (x >> kShift) ^ x;
}

The function in this case expects value in a correct range and will always output a value within [0..width). On X86 the function would typically translate to 3 instructions:

mov idx, x   ; Copy x to idx
sar idx, 31  ; Copy sign bits across idx
xor idx, x   ; Reflect

Or 2 instructions if we can make sure that the input x is in eax register and output in edx register:

cdq          ; Sign-extend eax to edx:eax
xor edx, eax ; Reflect

Conclusion

These tricks can be used to improve the performance of gradient and texture rendering. In this post I focused mostly on repeating and reflection constrained by a power of 2, which is very useful for gradients, but it's not that much for textures. If the repeatValue is removed from repeatOrReflect function and the implementation handles the repeat during advancing x and y coordiantes the same code could be used to repeat/reflect a value of any range, which is exactly how b2dpipe works.

No comments:

Post a Comment