24 August, 2018

C++ Compilers and FP Rounding on X86

Historically, floating point rounding has always been painful on X86 targets. The main reason is the lack of hardware support: X86 initially offered only instructions that round by using the current rounding mode, which is in most cases assumed to be round-to-even. Round-to-even is the best option for intermediate computations (all floating point computations you do are rounded this way implicitly); however, it's probably the least used "explicit" rounding mode - think about how many times you have used it. I guess zero. Please note that by explicit I mean code that you write to round something, like std::floor(x).
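
A quick illustration of the difference (this assumes the default round-to-even mode is active): std::nearbyint() rounds by using the current rounding mode, so ties go to the nearest even integer, unlike the "round half away from zero" rule taught in school, while std::floor() always rounds down.

#include <cmath>
#include <cstdio>

int main() {
  // Under round-to-even both 0.5 and 1.5 round to an even integer.
  std::printf("%g %g\n", std::nearbyint(0.5), std::nearbyint(1.5)); // 0 2
  std::printf("%g %g\n", std::floor(0.5), std::floor(1.5));         // 0 1
}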

As I work with floating point numbers a lot, I always tend to check what compilers do with my code, to see whether I can help them a bit to generate what I want. I decided to take a look at how different compilers handle rounding operations and to compare them. The following functions are covered in this article (a quick demonstration of their semantics follows the list):

  • nearbyint
  • trunc
  • floor
  • ceil
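
These functions differ only in the rounding direction. Here is a tiny demonstration, with values chosen to show ties and negatives (the nearbyint results assume the default round-to-even mode):

#include <cmath>
#include <cstdio>

int main() {
  for (double x : {-1.5, -0.5, 0.5, 1.5, 2.5}) {
    // e.g. x=2.5: nearbyint=2 (tie goes to even), trunc=2, floor=2, ceil=3
    std::printf("x=%4.1f nearbyint=%4.1f trunc=%4.1f floor=%4.1f ceil=%4.1f\n",
                x, std::nearbyint(x), std::trunc(x), std::floor(x), std::ceil(x));
  }
}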

Baseline

The following code is the baseline used for testing:

#include <cmath>

double my_nearby(double x) noexcept { return std::nearbyint(x); }
double my_trunc(double x) noexcept { return std::trunc(x); }
double my_floor(double x) noexcept { return std::floor(x); }
double my_ceil(double x) noexcept { return std::ceil(x); }

SSE2 Code Generation

This is the baseline code generation, targeting pre-SSE4.1 hardware.

GCC [-O2]

my_nearby(double):
  jmp       nearbyint

my_trunc(double):
  movsd     xmm2, QWORD PTR .LC1[rip]
  movapd    xmm3, xmm0
  movsd     xmm4, QWORD PTR .LC0[rip]
  movapd    xmm1, xmm0
  andpd     xmm3, xmm2
  ucomisd   xmm4, xmm3
  jbe       .L4
  cvttsd2si rax, xmm0
  pxor      xmm1, xmm1
  andnpd    xmm2, xmm0
  cvtsi2sdq xmm1, rax
  orpd      xmm1, xmm2
  .L4:
  movapd    xmm0, xmm1
  ret

my_floor(double):
  movsd     xmm3, QWORD PTR .LC1[rip]
  movapd    xmm2, xmm0
  movsd     xmm4, QWORD PTR .LC0[rip]
  movapd    xmm1, xmm0
  andpd     xmm2, xmm3
  ucomisd   xmm4, xmm2
  jbe       .L6
  cvttsd2si rax, xmm0
  pxor      xmm2, xmm2
  movsd     xmm4, QWORD PTR .LC2[rip]
  andnpd    xmm3, xmm0
  cvtsi2sdq xmm2, rax
  movapd    xmm5, xmm2
  cmpnlesd  xmm5, xmm0
  movapd    xmm1, xmm5
  andpd     xmm1, xmm4
  subsd     xmm2, xmm1
  movapd    xmm1, xmm2
  orpd      xmm1, xmm3
  .L6:
  movapd    xmm0, xmm1
  ret

my_ceil(double):
  movsd     xmm3, QWORD PTR .LC1[rip]
  movapd    xmm2, xmm0
  movsd     xmm4, QWORD PTR .LC0[rip]
  movapd    xmm1, xmm0
  andpd     xmm2, xmm3
  ucomisd   xmm4, xmm2
  jbe       .L8
  cvttsd2si rax, xmm0
  pxor      xmm2, xmm2
  movsd     xmm4, QWORD PTR .LC2[rip]
  andnpd    xmm3, xmm0
  cvtsi2sdq xmm2, rax
  cmpnlesd  xmm1, xmm2
  andpd     xmm1, xmm4
  addsd     xmm1, xmm2
  orpd      xmm1, xmm3
  .L8:
  movapd    xmm0, xmm1
  ret

.LC0:
  .long     0
  .long     1127219200

.LC1:
  .long     4294967295
  .long     2147483647
  .long     0
  .long     0

.LC2:
  .long     0
  .long     1072693248
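
For reference, the constants in GCC's pool decode as follows (each pair of .long values is the low and high dword of a 64-bit pattern):

  • .LC0 - 0x4330000000000000 = 4503599627370496.0 (2^52), the magnitude guard: any double of at least this magnitude is already an integer
  • .LC1 - 0x7FFFFFFFFFFFFFFF, the mask that clears the sign bit (absolute value)
  • .LC2 - 0x3FF0000000000000 = 1.0, the correction used by the floor/ceil paths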

Clang [-O2]

my_nearby(double):
  jmp nearbyint

my_trunc(double):
  jmp trunc

my_floor(double):
  jmp floor

my_ceil(double):
  jmp ceil

MSVC [/O2]

double my_nearby(double) PROC
  jmp nearbyint
double my_nearby(double) ENDP

double my_trunc(double) PROC
  jmp trunc
double my_trunc(double) ENDP

double my_floor(double) PROC
  jmp floor
double my_floor(double) ENDP

double my_ceil(double) PROC
  jmp ceil
double my_ceil(double) ENDP

Interpreting the Results

As we can see, GCC inlined SSE2 code while the other compilers just tail-called the C functions that handle the problem at a library level. The GCC inlined code shows how complicated rounding really is on pre-SSE4.1 hardware - it's a lot of code to cover all the cases and to perform the rounding by using instructions not really dedicated to that task.
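
To make GCC's output easier to follow, here is a hypothetical C++ sketch of the strategy behind its inlined my_floor (the function and its names are mine, not GCC's): guard on the magnitude, truncate through an integer cast, then correct downward and reattach the sign.

#include <cmath>
#include <cstdint>

// Sketch only - mirrors GCC's guard (ucomisd against 2^52), truncation
// (cvttsd2si + cvtsi2sdq) and downward correction (cmpnlesd + subsd).
double floor_sketch(double x) noexcept {
  if (!(std::fabs(x) < 4503599627370496.0))  // 2^52: already integral, or NaN.
    return x;
  double r = static_cast<double>(static_cast<int64_t>(x)); // Truncate toward zero.
  if (r > x)                                 // Negative non-integer input:
    r -= 1.0;                                // truncation rounded up, so fix it.
  return std::copysign(r, x);                // Preserve the sign of -0.0.
}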

The next experiment covers only the GCC and Clang compilers, as I haven't found a way to tell MSVC to use the SSE4.1 instruction set; I will, however, add some tips for MSVC and AVX code generation afterwards.

SSE4.1 Code Generation

This is the SSE4.1 output of GCC and Clang, obtained by adding the -msse4.1 option.

GCC [-O2 -msse4.1]

my_nearby(double):
  jmp nearbyint

my_trunc(double):
  roundsd xmm0, xmm0, 11
  ret

my_floor(double):
  roundsd xmm0, xmm0, 9
  ret

my_ceil(double):
  roundsd xmm0, xmm0, 10
  ret

Clang [-O2 -msse4.1]

my_nearby(double):
  roundsd xmm0, xmm0, 12
  ret

my_trunc(double):
  roundsd xmm0, xmm0, 11
  ret

my_floor(double):
  roundsd xmm0, xmm0, 9
  ret

my_ceil(double):
  roundsd xmm0, xmm0, 10
  ret

Interpreting the Results

As we can see, most of the time we've got exactly what we wanted. Both compilers wisely chose the instructions dedicated to rounding, so you get virtually no overhead here, except in GCC's nearbyint() case, which I don't quite understand. It appears to be just a missed optimization, which was already reported as #71278. As I mentioned earlier, I don't know of a way to tell MSVC to generate such code, so I cannot cover this compiler here (MSVC's /arch parameter can only be used to switch to AVX code generation, which is far more advanced than SSE4.1).
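
For reference, the roundsd immediates seen above are just combinations of the _MM_FROUND flags: bits 1:0 select the mode (00 nearest-even, 01 down, 10 up, 11 truncate), bit 2 means "use the current MXCSR mode", and bit 3 suppresses the inexact exception. A compile-time check of that decoding (compile with -msse4.1):

#include <smmintrin.h>

// The immediates emitted by the compilers map to these flag combinations.
static_assert((_MM_FROUND_TO_NEG_INF    | _MM_FROUND_NO_EXC) ==  9, "floor");
static_assert((_MM_FROUND_TO_POS_INF    | _MM_FROUND_NO_EXC) == 10, "ceil");
static_assert((_MM_FROUND_TO_ZERO       | _MM_FROUND_NO_EXC) == 11, "trunc");
static_assert((_MM_FROUND_CUR_DIRECTION | _MM_FROUND_NO_EXC) == 12, "nearbyint");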

AVX Code Generation to Cover MSVC Inability to Generate SSE4.1

The only way to tell MSVC to use the instructions dedicated to rounding is to turn on AVX code generation. I initially didn't want to go this way, but I have found a very interesting thing that people should know: even when AVX/AVX2 is turned on, MSVC would not use it for rounding by default!

MSVC [/O2 /arch:AVX]

double my_nearby(double) PROC
  jmp nearbyint
double my_nearby(double) ENDP

double my_trunc(double) PROC
  jmp trunc
double my_trunc(double) ENDP

double my_floor(double) PROC
  jmp floor
double my_floor(double) ENDP

double my_ceil(double) PROC
  jmp ceil
double my_ceil(double) ENDP

Surprise! To be honest I don't know why MSVC cannot do that, but I have found a solution that works in some cases - it's compiler-specific, though. We can use pragmas to temporarily tell MSVC that we want "fast" floating point operations, and MSVC would then output the instructions we expect [in some cases]. There are also compiler options to do this globally (such as /fp:fast), but I personally don't recommend them as they give the compiler so much power to reorder floating point operations that sometimes shouldn't be reordered.

MSVC Specific Code

#include <cmath>

#ifdef _MSC_VER
#pragma float_control(precise, off, push)
#endif

double my_nearby(double x) noexcept { return std::nearbyint(x); }
double my_trunc(double x) noexcept { return std::trunc(x); }
double my_floor(double x) noexcept { return std::floor(x); }
double my_ceil(double x) noexcept { return std::ceil(x); }

#ifdef _MSC_VER
#pragma float_control(pop)
#endif

MSVC [/O2 /arch:AVX]

double my_nearby(double) PROC
  jmp nearbyint
double my_nearby(double) ENDP

double my_trunc(double) PROC
  jmp trunc
double my_trunc(double) ENDP

double my_floor(double) PROC
  vxorpd   xmm2, xmm2, xmm2
  vroundsd xmm0, xmm2, xmm0, 1
  ret      0
double my_floor(double) ENDP

double my_ceil(double) PROC
  vxorpd   xmm2, xmm2, xmm2
  vroundsd xmm0, xmm2, xmm0, 2
  ret      0
double my_ceil(double) ENDP

Well, the code is much better in the floor/ceil case, but the use of the vxorpd instruction is just nonsensical, as it zeroes a part of the register that is irrelevant for both the rounding and the return value. This is, in general, something MSVC does often without any good reason. Why MSVC didn't also inline the vroundsd instruction in the nearby and trunc cases is not clear to me, but compilers often behave irrationally.

SSE2 Code Generation Through Intrinsics

As a bonus I have decided to also provide a pure SSE2 intrinsics solution that implements all four rounding operations in a branchless manner. I use this code in Blend2D for pre-SSE4.1 hardware, and it's not the same as the code GCC generates. In addition, this code can easily be adapted to round two double precision floating point numbers at a time, in case you need to round vectors. The code can be further optimized, as _mm_set_sd() is not really an efficient way to load constants on X86.

#include <stdint.h>
#include <emmintrin.h>

static inline __m128d my_blend(const __m128d& x, const __m128d& y, const __m128d& mask) noexcept {
  // Bit-select: keeps `x` where `mask` bits are zero and `y` where they are set.
  return _mm_or_pd(_mm_andnot_pd(mask, x),
                   _mm_and_pd(y, mask));
}

double my_nearby(double x) noexcept {
  __m128d src = _mm_set_sd(x);
  __m128d maxn = _mm_set_sd(4503599627370496.0);
  __m128d magic = _mm_set_sd(6755399441055744.0);

  __m128d msk = _mm_cmpnlt_sd(src, maxn);
  __m128d rnd = _mm_sub_sd(_mm_add_sd(src, magic), magic);

  return _mm_cvtsd_f64(my_blend(rnd, src, msk));
}

double my_trunc(double x) noexcept {
  static const uint64_t kSepMask[1] = { 0x7FFFFFFFFFFFFFFFu };

  __m128d src = _mm_set_sd(x);
  __m128d sep = _mm_load_sd(reinterpret_cast<const double*>(kSepMask));

  __m128d src_abs = _mm_and_pd(src, sep);
  __m128d sign = _mm_andnot_pd(sep, src);

  __m128d maxn = _mm_set_sd(4503599627370496.0);
  __m128d magic = _mm_set_sd(6755399441055744.0);

  __m128d msk = _mm_or_pd(_mm_cmpnlt_sd(src_abs, maxn), sign);
  __m128d rnd = _mm_sub_sd(_mm_add_sd(src_abs, magic), magic);
  __m128d maybeone = _mm_and_pd(_mm_cmplt_sd(src_abs, rnd), _mm_set_sd(1.0));

  return _mm_cvtsd_f64(my_blend(_mm_sub_sd(rnd, maybeone), src, msk));
}

double my_floor(double x) noexcept {
  __m128d src = _mm_set_sd(x);
  __m128d maxn = _mm_set_sd(4503599627370496.0);
  __m128d magic = _mm_set_sd(6755399441055744.0);

  __m128d msk = _mm_cmpnlt_sd(src, maxn);
  __m128d rnd = _mm_sub_sd(_mm_add_sd(src, magic), magic);
  __m128d maybeone = _mm_and_pd(_mm_cmplt_sd(src, rnd), _mm_set_sd(1.0));

  return _mm_cvtsd_f64(my_blend(_mm_sub_sd(rnd, maybeone), src, msk));
}

double my_ceil(double x) noexcept {
  __m128d src = _mm_set_sd(x);
  __m128d maxn = _mm_set_sd(4503599627370496.0);
  __m128d magic = _mm_set_sd(6755399441055744.0);

  __m128d msk = _mm_cmpnlt_sd(src, maxn);
  __m128d rnd = _mm_sub_sd(_mm_add_sd(src, magic), magic);
  __m128d maybeone = _mm_and_pd(_mm_cmpnle_sd(src, rnd), _mm_set_sd(1.0));

  return _mm_cvtsd_f64(my_blend(_mm_add_sd(rnd, maybeone), src, msk));
}
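
The two constants deserve a note. 4503599627370496.0 is 2^52 - every double of at least this magnitude is already an integer, so such inputs (and NaNs) are passed through untouched via msk. 6755399441055744.0 is 2^52 + 2^51, the classic "magic number" - adding it pushes the value into a range where consecutive doubles are exactly 1.0 apart, so the fractional bits are rounded away by the current rounding mode, and subtracting it back yields the rounded value. A standalone demonstration (this assumes the default rounding mode and strict IEEE semantics, i.e. no -ffast-math):

#include <cstdio>

int main() {
  double magic = 6755399441055744.0; // 2^52 + 2^51
  double x = 3.7;
  double r = (x + magic) - magic;    // x + magic is rounded to an integral value.
  std::printf("%g -> %g\n", x, r);   // Prints: 3.7 -> 4
}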

GCC [-O2]

my_nearby(double):
  movq     xmm0, xmm0
  movq     xmm3, QWORD PTR .LC1[rip]
  movapd   xmm2, xmm0
  movapd   xmm1, xmm0
  cmpnltsd xmm1, QWORD PTR .LC0[rip]
  addsd    xmm2, xmm3
  subsd    xmm2, xmm3
  andpd    xmm0, xmm1
  andpd    xmm1, xmm2
  orpd     xmm0, xmm1
  ret

my_trunc(double):
  movq     xmm1, QWORD PTR .LC2[rip]
  movq     xmm0, xmm0
  movq     xmm4, QWORD PTR .LC1[rip]
  movapd   xmm2, xmm0
  andpd    xmm2, xmm1
  movapd   xmm3, xmm1
  movapd   xmm1, xmm2
  andnpd   xmm3, xmm0
  cmpnltsd xmm1, QWORD PTR .LC0[rip]
  orpd     xmm1, xmm3
  movapd   xmm3, xmm2
  addsd    xmm3, xmm4
  andpd    xmm0, xmm1
  subsd    xmm3, xmm4
  cmpltsd  xmm2, xmm3
  andpd    xmm2, XMMWORD PTR .LC3[rip]
  subsd    xmm3, xmm2
  andpd    xmm1, xmm3
  orpd     xmm1, xmm0
  movapd   xmm0, xmm1
  ret

my_floor(double):
  movq     xmm0, xmm0
  movq     xmm3, QWORD PTR .LC1[rip]
  movapd   xmm2, xmm0
  movapd   xmm1, xmm0
  cmpnltsd xmm1, QWORD PTR .LC0[rip]
  addsd    xmm2, xmm3
  subsd    xmm2, xmm3
  movapd   xmm3, xmm0
  andpd    xmm0, xmm1
  cmpltsd  xmm3, xmm2
  andpd    xmm3, XMMWORD PTR .LC3[rip]
  subsd    xmm2, xmm3
  andpd    xmm1, xmm2
  orpd     xmm0, xmm1
  ret

my_ceil(double):
  movq     xmm0, xmm0
  movq     xmm3, QWORD PTR .LC1[rip]
  movapd   xmm2, xmm0
  movapd   xmm1, xmm0
  cmpnltsd xmm1, QWORD PTR .LC0[rip]
  addsd    xmm2, xmm3
  subsd    xmm2, xmm3
  movapd   xmm3, xmm0
  andpd    xmm0, xmm1
  cmpnlesd xmm3, xmm2
  andpd    xmm3, XMMWORD PTR .LC3[rip]
  addsd    xmm2, xmm3
  andpd    xmm1, xmm2
  orpd     xmm0, xmm1
  ret

.LC0:
  .long    0
  .long    1127219200
  .long    0
  .long    0

.LC1:
  .long    0
  .long    1127743488
  .long    0
  .long    0

.LC2:
  .long    4294967295
  .long    2147483647
  .long    0
  .long    0

.LC3:
  .long    0
  .long    1072693248
  .long    0
  .long    0

Clang [-O2]

.LCPI0_0:
  .quad    4841369599423283200     # double 4503599627370496
.LCPI0_1:
  .quad    4843621399236968448     # double 6755399441055744
.LCPI0_2:
  .quad    -4379750637617807360    # double -6755399441055744

my_nearby(double):
  movq     xmm1, xmm0
  addsd    xmm0, qword ptr [rip + .LCPI0_1]
  addsd    xmm0, qword ptr [rip + .LCPI0_2]
  orpd     xmm0, xmm1
  cmpnltsd xmm1, qword ptr [rip + .LCPI0_0]
  andpd    xmm0, xmm1
  ret

.LCPI1_0:
  .quad    4841369599423283200     # double 4503599627370496
.LCPI1_1:
  .quad    4843621399236968448     # double 6755399441055744
.LCPI1_2:
  .quad    -4379750637617807360    # double -6755399441055744

my_trunc(double):
  movq     xmm1, xmm0
  movabs   rax, 9223372036854775807
  movq     xmm2, rax
  pand     xmm2, xmm1
  movabs   rax, -9223372036854775808
  movq     xmm3, rax
  pand     xmm3, xmm1
  movdqa   xmm4, xmm2
  cmpnltsd xmm4, qword ptr [rip + .LCPI1_0]
  movsd    xmm0, qword ptr [rip + .LCPI1_1]
  addsd    xmm0, xmm2
  addsd    xmm0, qword ptr [rip + .LCPI1_2]
  orpd     xmm4, xmm3
  cmpltsd  xmm2, xmm0
  movabs   rax, 4607182418800017408
  movq     xmm3, rax
  pand     xmm3, xmm2
  subsd    xmm0, xmm3
  orpd     xmm0, xmm1
  andpd    xmm0, xmm4
  ret

.LCPI2_0:
  .quad    4841369599423283200     # double 4503599627370496
.LCPI2_1:
  .quad    4843621399236968448     # double 6755399441055744
.LCPI2_2:
  .quad    -4379750637617807360    # double -6755399441055744

my_floor(double):
  movq     xmm1, xmm0
  addsd    xmm0, qword ptr [rip + .LCPI2_1]
  addsd    xmm0, qword ptr [rip + .LCPI2_2]
  movdqa   xmm2, xmm1
  cmpltsd  xmm2, xmm0
  movabs   rax, 4607182418800017408
  movq     xmm3, rax
  pand     xmm3, xmm2
  subsd    xmm0, xmm3
  orpd     xmm0, xmm1
  cmpnltsd xmm1, qword ptr [rip + .LCPI2_0]
  andpd    xmm0, xmm1
  ret

.LCPI3_0:
  .quad    4841369599423283200     # double 4503599627370496
.LCPI3_1:
  .quad    4843621399236968448     # double 6755399441055744
.LCPI3_2:
  .quad    -4379750637617807360    # double -6755399441055744

my_ceil(double):
  movq     xmm1, xmm0
  addsd    xmm0, qword ptr [rip + .LCPI3_1]
  addsd    xmm0, qword ptr [rip + .LCPI3_2]
  movdqa   xmm2, xmm1
  cmpnlesd xmm2, xmm0
  movabs   rax, 4607182418800017408
  movq     xmm3, rax
  pand     xmm3, xmm2
  addsd    xmm0, xmm3
  orpd     xmm0, xmm1
  cmpnltsd xmm1, qword ptr [rip + .LCPI3_0]
  andpd    xmm0, xmm1
  ret

MSVC [/O2]

double my_nearby(double) PROC
  movaps   xmm1, xmm0
  xorps    xmm4, xmm4
  movsd    xmm4, xmm1
  movaps   xmm2, xmm4
  movaps   xmm0, xmm4
  addsd    xmm0, QWORD PTR __real@4338000000000000
  cmpnltsd xmm2, QWORD PTR __real@4330000000000000
  subsd    xmm0, QWORD PTR __real@4338000000000000
  andps    xmm0, xmm2
  andps    xmm2, xmm4
  orps     xmm0, xmm2
  ret      0
double my_nearby(double) ENDP

double my_trunc(double) PROC
$LN6:
  sub      rsp, 24
  movsd    xmm3, QWORD PTR unsigned __int64 const * const `double my_trunc(double)'::`2'::kSepMask
  movaps   xmm1, xmm0
  movsd    xmm0, QWORD PTR __real@3ff0000000000000
  movaps   xmm5, xmm3
  movaps   XMMWORD PTR [rsp], xmm6
  xorps    xmm6, xmm6
  movsd    xmm6, xmm1
  andnpd   xmm3, xmm6
  andps    xmm5, xmm6
  movaps   xmm4, xmm5
  cmpnltsd xmm4, QWORD PTR __real@4330000000000000
  orps     xmm4, xmm3
  movaps   xmm3, xmm5
  addsd    xmm3, QWORD PTR __real@4338000000000000
  subsd    xmm3, QWORD PTR __real@4338000000000000
  cmpltsd  xmm5, xmm3
  andps    xmm5, xmm0
  subsd    xmm3, xmm5
  andps    xmm3, xmm4
  andps    xmm4, xmm6
  movaps   xmm6, XMMWORD PTR [rsp]
  orps     xmm3, xmm4
  movaps   xmm0, xmm3
  add      rsp, 24
  ret      0
double my_trunc(double) ENDP

double my_floor(double) PROC
  movaps   xmm1, xmm0
  xorps    xmm5, xmm5
  movsd    xmm0, QWORD PTR __real@3ff0000000000000
  movsd    xmm5, xmm1
  movaps   xmm3, xmm5
  movaps   xmm4, xmm5
  addsd    xmm4, QWORD PTR __real@4338000000000000
  cmpnltsd xmm3, QWORD PTR __real@4330000000000000
  movaps   xmm2, xmm5
  subsd    xmm4, QWORD PTR __real@4338000000000000
  cmpltsd  xmm2, xmm4
  andps    xmm2, xmm0
  subsd    xmm4, xmm2
  andps    xmm4, xmm3
  andps    xmm3, xmm5
  orps     xmm4, xmm3
  movaps   xmm0, xmm4
  ret      0
double my_floor(double) ENDP

double my_ceil(double) PROC
  movaps   xmm1, xmm0
  xorps    xmm5, xmm5
  movsd    xmm0, QWORD PTR __real@3ff0000000000000
  movsd    xmm5, xmm1
  movaps   xmm3, xmm5
  movaps   xmm4, xmm5
  addsd    xmm4, QWORD PTR __real@4338000000000000
  cmpnltsd xmm3, QWORD PTR __real@4330000000000000
  movaps   xmm2, xmm5
  subsd    xmm4, QWORD PTR __real@4338000000000000
  cmpnlesd xmm2, xmm4
  andps    xmm2, xmm0
  addsd    xmm4, xmm2
  andps    xmm4, xmm3
  andps    xmm3, xmm5
  orps     xmm4, xmm3
  movaps   xmm0, xmm4
  ret      0
double my_ceil(double) ENDP

Interpreting the Results

There is not much to add, as the generated code is pretty close to the C++ code. Some compilers add unnecessary moves, but that's about it.

SSE4.1 Code Generation Through Intrinsics

SSE4.1 offers instructions dedicated to rounding, so can C++ compilers still screw it up?

#include <nmmintrin.h>

template<int ControlFlags>
static inline __m128d my_round_sse4_1(double x) noexcept {
  __m128d y = _mm_set_sd(x);
  return _mm_round_sd(y, y, ControlFlags | _MM_FROUND_NO_EXC);
}

double my_nearby(double x) noexcept { return _mm_cvtsd_f64(my_round_sse4_1<_MM_FROUND_CUR_DIRECTION>(x)); }
double my_trunc (double x) noexcept { return _mm_cvtsd_f64(my_round_sse4_1<_MM_FROUND_TO_ZERO      >(x)); }
double my_floor (double x) noexcept { return _mm_cvtsd_f64(my_round_sse4_1<_MM_FROUND_TO_NEG_INF   >(x)); }
double my_ceil  (double x) noexcept { return _mm_cvtsd_f64(my_round_sse4_1<_MM_FROUND_TO_POS_INF   >(x)); }

GCC [-O2 -msse4.1]

my_nearby(double):
  movq    xmm0, xmm0
  roundsd xmm0, xmm0, 12
  ret

my_trunc(double):
  movq    xmm0, xmm0
  roundsd xmm0, xmm0, 11
  ret

my_floor(double):
  movq    xmm0, xmm0
  roundsd xmm0, xmm0, 9
  ret

my_ceil(double):
  movq    xmm0, xmm0
  roundsd xmm0, xmm0, 10
  ret

Clang [-O2 -msse4.1]

my_nearby(double):
  roundsd xmm0, xmm0, 12
  ret

my_trunc(double):
  roundsd xmm0, xmm0, 11
  ret

my_floor(double):
  roundsd xmm0, xmm0, 9
  ret

my_ceil(double):
  roundsd xmm0, xmm0, 10
  ret

MSVC [/O2]

double my_nearby(double) PROC
  movaps  xmm1, xmm0
  xorps   xmm2, xmm2
  movsd   xmm2, xmm1
  roundsd xmm2, xmm2, 12
  movaps  xmm0, xmm2
  ret     0
double my_nearby(double) ENDP

double my_trunc(double) PROC
  movaps  xmm1, xmm0
  xorps   xmm2, xmm2
  movsd   xmm2, xmm1
  roundsd xmm2, xmm2, 11
  movaps  xmm0, xmm2
  ret     0
double my_trunc(double) ENDP

double my_floor(double) PROC
  movaps  xmm1, xmm0
  xorps   xmm2, xmm2
  movsd   xmm2, xmm1
  roundsd xmm2, xmm2, 9
  movaps  xmm0, xmm2
  ret     0
double my_floor(double) ENDP

double my_ceil(double) PROC
  movaps  xmm1, xmm0
  xorps   xmm2, xmm2
  movsd   xmm2, xmm1
  roundsd xmm2, xmm2, 10
  movaps  xmm0, xmm2
  ret     0
double my_ceil(double) ENDP

Interpreting the Results

Do you also have mixed feelings about the results? We have avoided function calls in all cases, and that's cool, but I would say that the only compiler that does exactly what I would expect is Clang. GCC is not that bad, as it just emits an extra movq instruction that acts like the vxorpd in MSVC's case, and I could live with that. However, what MSVC does here is pure insanity, and it shows how badly designed its SIMD pipeline really is.

Conclusion

Watch your compiler, especially if you use MSVC for any reason :)

Compiler Explorer

Here are some links related to Compiler Explorer so you can play with it yourself:
