04 June, 2017

Idio[ma]tic CMake

Welcome to Trial and Error Scripting

If I counted the time I have spent figuring out why CMake doesn't work as expected, it would probably add up to weeks. I don't think CMake itself is a bad tool; however, I think the CMake scripting language is the most idiotic language ever invented, seriously. The other problem is the documentation: I have never solved anything by reading the CMake docs (I probably wasted more time reading them, actually).

Consider a very small example that illustrates an issue I was dealing with. We have some files that are always compiled as part of a target, and some files that require specific compiler flags, for example `-mavx2`, plus some compile-time constants:

project(simple_app C CXX)          # CMake project.

set(SOURCE_FILES main.cpp)         # Source files that are always compiled.
set(CUSTOM_FLAGS -DCUSTOM_IMPL=1)  # Custom compiler flags we want to append to a specific file.

if (BUILD_CUSTOM_FILES)
  set(CUSTOM_FILES impl_avx2.cpp)
  set(AVX2_FLAGS ${CUSTOM_FLAGS} -DAVX2_AVAILABLE=1 -mavx2)
  set_property(SOURCE ${CUSTOM_FILES} APPEND PROPERTY COMPILE_FLAGS ${AVX2_FLAGS})

  # Add all arch-specific files to SOURCE_FILES...
  list(APPEND SOURCE_FILES ${CUSTOM_FILES})
endif()

add_executable(test_app ${SOURCE_FILES})

The problem is that it will not work, and you will have a hard time figuring out why. The compiler command that CMake generates for impl_avx2.cpp looks like this:

/usr/bin/c++ -DCUSTOM_IMPL=1;-DAVX2_AVAILABLE=1;-mavx2 -o impl_avx2.o -c impl_avx2.cpp

This is of course completely broken: it contains semicolons instead of spaces. The reason is that CMake script doesn't really support arrays; every list is a string whose items are separated by semicolons. In fact, these two lines are equivalent:

set(SOMETHING A B)
set(SOMETHING "A;B")

And there is no way to distinguish between the two. To make it clearer what is happening, I wrote a simple test script:

function(my_func PREFIX FIRST)
  message("${PREFIX} FIRST=${FIRST}")
  set(ARG_INDEX 0)
  foreach(ARG_VA ${ARGN})
    message("${PREFIX} #${ARG_INDEX} ${ARG_VA}")
    math(EXPR ARG_INDEX "${ARG_INDEX}+1")
  endforeach()
endfunction()

my_func("1:" arg)
my_func("2:" arg second)
my_func("3:" arg second third)
my_func("4:" arg "second;third")
my_func("5:" arg "second third")

Which outputs:

1: FIRST=arg
2: FIRST=arg
2: #0 second
3: FIRST=arg
3: #0 second
3: #1 third
4: FIRST=arg
4: #0 second
4: #1 third
5: FIRST=arg
5: #0 second third
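The same behavior can be checked more directly. Below is a minimal sketch, not part of the original script, that should be runnable in script mode with `cmake -P`. The names FLAGS_A, FLAGS_B, LEN_A and count_args are made up for illustration.

```cmake
# Hypothetical demo; run with: cmake -P <file>
set(FLAGS_A A B)      # two unquoted arguments
set(FLAGS_B "A;B")    # one quoted argument containing a semicolon

# Both variables hold the identical string "A;B".
if(FLAGS_A STREQUAL FLAGS_B)
  message("FLAGS_A and FLAGS_B are the same string")
endif()

# list() commands simply split that string on semicolons.
list(LENGTH FLAGS_A LEN_A)
message("length: ${LEN_A}")            # prints: length: 2

# Helper that reports how many arguments it received.
function(count_args)
  message("received ${ARGC} argument(s)")
endfunction()

count_args(${FLAGS_A})     # unquoted: split on ';' -> 2 arguments
count_args("${FLAGS_A}")   # quoted: passed as a single argument -> 1 argument
```

The last two calls are the important part: whether a semicolon acts as a separator depends on whether the variable reference is quoted at the call site, not on how the variable was created.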

Okay, so we know that CMake treats semicolons as separators. What we can do is simply foreach() over the flags and append each one, so let's modify the first example:

project(simple_app C CXX)

set(SOURCE_FILES main.cpp)
set(CUSTOM_FLAGS -DCUSTOM_IMPL=1)

if (BUILD_CUSTOM_FILES)
  set(CUSTOM_FILES impl_avx2.cpp)
  set(AVX2_FLAGS ${CUSTOM_FLAGS} -DAVX2_AVAILABLE=1 -mavx2)
  foreach(flag ${AVX2_FLAGS})
    set_property(SOURCE ${CUSTOM_FILES} APPEND PROPERTY COMPILE_FLAGS ${flag})
  endforeach()

  # Add all arch-specific files to SOURCE_FILES...
  list(APPEND SOURCE_FILES ${CUSTOM_FILES})
endif()

add_executable(test_app ${SOURCE_FILES})

Well, the output is the same as before, just try it:

/usr/bin/c++ -DCUSTOM_IMPL=1;-DAVX2_AVAILABLE=1;-mavx2 -o impl_avx2.o -c impl_avx2.cpp

Would you expect this? CMake developers are actually aware of it, and to make things even more confusing we have both APPEND and APPEND_STRING options. APPEND appends the value as a list item, turning the property into a list, which is then stringified with semicolons, and we are back at the beginning. APPEND_STRING appends to the property as a raw string instead:

project(simple_app C CXX)

set(SOURCE_FILES main.cpp)
set(CUSTOM_FLAGS -DCUSTOM_IMPL=1)

if (BUILD_CUSTOM_FILES)
  set(CUSTOM_FILES impl_avx2.cpp)
  set(AVX2_FLAGS ${CUSTOM_FLAGS} -DAVX2_AVAILABLE=1 -mavx2)
  foreach(flag ${AVX2_FLAGS})
    set_property(SOURCE ${CUSTOM_FILES} APPEND_STRING PROPERTY COMPILE_FLAGS ${flag})
  endforeach()

  list(APPEND SOURCE_FILES ${CUSTOM_FILES})
endif()

add_executable(test_app ${SOURCE_FILES})

Which yields:

/usr/bin/c++ -DCUSTOM_IMPL=1-DAVX2_AVAILABLE=1-mavx2 -o impl_avx2.o -c impl_avx2.cpp
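The difference between the two options can be isolated without a compiler in the loop. Here is a small sketch, assuming script mode (`cmake -P`) and two made-up global properties, DEMO_LIST and DEMO_STRING, that exist only for this demonstration.

```cmake
# Hypothetical demo; DEMO_LIST and DEMO_STRING are invented property names.

# APPEND treats the property as a list and adds a new ';'-separated item.
set_property(GLOBAL PROPERTY DEMO_LIST "-DA=1")
set_property(GLOBAL APPEND PROPERTY DEMO_LIST "-DB=2")
get_property(AS_LIST GLOBAL PROPERTY DEMO_LIST)
message("APPEND:        ${AS_LIST}")     # prints: APPEND:        -DA=1;-DB=2

# APPEND_STRING concatenates the raw text, with no separator at all.
set_property(GLOBAL PROPERTY DEMO_STRING "-DA=1")
set_property(GLOBAL APPEND_STRING PROPERTY DEMO_STRING "-DB=2")
get_property(AS_STRING GLOBAL PROPERTY DEMO_STRING)
message("APPEND_STRING: ${AS_STRING}")   # prints: APPEND_STRING: -DA=1-DB=2
```

Neither form inserts a space, which is what a space-separated string property such as COMPILE_FLAGS actually needs.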

Cool, we got rid of the semicolons, but as a side effect there are no spaces between our flags. The problem is that CMake's COMPILE_FLAGS is in fact a string, not a list, so to append a flag properly we must put a space before it, which will of course insert a leading space if the property was empty:

project(simple_app C CXX)

set(SOURCE_FILES main.cpp)
set(CUSTOM_FLAGS -DCUSTOM_IMPL=1)

if (BUILD_CUSTOM_FILES)
  set(CUSTOM_FILES impl_avx2.cpp)
  set(AVX2_FLAGS ${CUSTOM_FLAGS} -DAVX2_AVAILABLE=1 -mavx2)
  foreach(flag ${AVX2_FLAGS})
    set_property(SOURCE ${CUSTOM_FILES} APPEND_STRING PROPERTY COMPILE_FLAGS " ${flag}")
  endforeach()

  list(APPEND SOURCE_FILES ${CUSTOM_FILES})
endif()

add_executable(test_app ${SOURCE_FILES})

Which more or less works (note the extra leading space):

/usr/bin/c++  -DCUSTOM_IMPL=1 -DAVX2_AVAILABLE=1 -mavx2 -o impl_avx2.o -c impl_avx2.cpp
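An alternative to the loop is to join the list into one space-separated string first and append it once. This is only a sketch of that idea, not what my build originally did. AVX2_FLAGS_STR is a name invented here, and the fragment assumes it lives in the same CMakeLists.txt as the target that compiles impl_avx2.cpp.

```cmake
# The flag list is still a ';'-separated string internally.
set(AVX2_FLAGS -DCUSTOM_IMPL=1 -DAVX2_AVAILABLE=1 -mavx2)

# Turn "a;b;c" into "a b c". Quoting ${AVX2_FLAGS} keeps it a single argument.
string(REPLACE ";" " " AVX2_FLAGS_STR "${AVX2_FLAGS}")

# One append with a leading space, so existing flags on the file are preserved.
set_property(SOURCE impl_avx2.cpp APPEND_STRING PROPERTY
             COMPILE_FLAGS " ${AVX2_FLAGS_STR}")
```

This still leaves a leading space when the property was empty, but it keeps the semicolon-to-space conversion in one visible place. Newer CMake releases (3.11 and later, to my knowledge) also offer a COMPILE_OPTIONS source file property that accepts a real list, which may sidestep the string handling entirely if you can require that version.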

Now I would like to ask you: would you have written the working version on the first try? For me this was simply trial and error until I found a solution that worked, and I personally don't like this approach to solving problems.

Time to Migrate Away?

CMake should really switch to a sane language, otherwise I can't see myself using it in the future. I have already checked out Meson, as it was mentioned on several sites that I visit. Is it better? It probably is, but it's another tool that employs a home-grown language that you probably cannot debug and that forces you to write weird shell scripts as part of your project definition. I mean, why invent a language that cannot do the task and requires running a shell script to list files in a directory?

I'm looking for a project generator that uses embedded JavaScript, or something really close to it, and can be debugged like a normal programming language. It would be syntactically similar to C/C++ and could be linted by existing tools. I don't see a reason to invent a new language for something like a project generator. It's kind of a paradox that all C/C++ project generators use languages that are not even close to C and require you to write 5 lines to implement a simple if/else construct.