12 July, 2016

AsmJit and Instruction Validation


AsmDB is an X86/X64 instruction database in a JSON-like format, that I started after I saw the complexity of AVX-512 instruction set. I thought, initially, that I would just add it manually to the AsmJit database, but after few hours I realized that it is extremely complex and not that straightforward as I thought.

AsmDB -> AsmJit

The solution was to create a database that contains all instructions in a similar format that is used by instruction-set manuals and to write a tool that can index the database and create all tables AsmJit is using programatically. At the moment I would say that 50% of the work is done - AsmJit tool that generates parts of x86inst.cpp file now uses AsmDB to generate a space-efficient operand tables that can be used to validate operands of any x86 and x64 instruction supported by AsmJit. This replaces the old operand tables that were basically useless as they combined all possibilities of all possible instruction encodings.

The new validation API is still a work-in-progress, but in general you can do something like this:

Error err = X86Inst::validate(
  kArchX64,                        // Architecture - kArchX86, kArchX64.
  kX86InstIdVpunpckhbw,            // Instruction id, see X86InstId enum.
  0,                               // Instruction options, see X86InstOptions enum.
  x86::xmm0, x86::xmm1, x86::eax); // Individual operands, or operands[] and count.

The call to validate() will return an error in this case, because vpunpckhbw instruction is defined for either [xmm, xmm, xmm/mem] or [ymm, ymm, ymm/mem] operands. The validator is very strict and has access to a very detailed information about every instruction - it knows about implicit operands, operands that require a specific register, and possible immediate and memory sizes. It can be used to implement an asm parser as well, and probably much more in the future.

X86Assembler enhancements

At the moment I'm still integrating the validation code into the assembler and compiler classes. What I can say is that I can simplify and remove most of the validation code from the assembler in favor of the new validation code. The reason is that the assembler was always a kind of lenient in terms of validation - it cares about performance and omits everything that is not necessary. This means, for example, that it allows something like mov eax, al and encodes it as mov eax, eax. Basically it checks the size of the destination register and doesn't care much about the source register except for its index for that particular instruction.

This gets much more problematic when using an unsafe API (API that allows to use untyped operands). It's possible to emit a ridiculous combination by doing for example a.emit(kX86InstIdAdd, x86::eax, x86::xmm5). Such combination doesn't exist at all, but AsmJit will encode it as add eax, ebp as the operands match the REG/REG signature (ebp register has the same index as xmm5), and since the 'add' instruction is only defined for GP registers the assembler can omit the register type check, because there is no typed-API that provides the 'add' instruction with such operands.

The new validation API changes the game since operands can now be validated and the assembler can validate each instruction before it actually tries to encode it. This means that such cases can be checked by the assembler without making it more complex. The only problem is that this kind of validation is more expensive, thus the assembler needs a new option to enable and disable it.

X86Compiler enhancements

Anybody who ever used AsmJit's compiler knows that it's not really trivial to debug it when something goes wrong. Compiler stores each instruction, processes it, and then serializes it to the assembler. If the instruction is invalid from the beginning it will be serialized as-is to the assembler as well, which would fail. The problem is that sometimes it's too late and you have to use debug output to figure out the exact place where the instruction was generated. The new validation API should solve most of the issues mentioned as the compiler can now validate each instruction before it stores it. This allows to find a code that misuses the compiler much faster.


There are still many things to do in AsmJit, but the library is slowly getting better and I hope that AsmJit users will find these new features useful. The good news is that since I reorganized the instruction tables there is still some space I can fill without increasing the library size. There are many things that can be put into these tables, but the first candidate is an SSE to AVX translation, which is very likely to be implemented first. Next goal is to have finalized AVX-512 of course :)

No comments:

Post a Comment