14 July, 2016

Introducing AsmTK

AsmTK - A Toolkit Based on AsmJit

The AsmJit library provides low-level and high-level JIT functionality that allows applications to generate code at run-time. The library was designed from scratch to be efficient and highly dynamic. Efficiency is achieved by having a single (dispatch) function that can encode all supported instructions without jumping to other helper functions. This function is pretty big, but I have always tried to keep it organized and consistent. Dynamism is achieved by using a structure called Operand, which is the base class of every operand the assembler accepts and guarantees that each operand has the same size (16 bytes) regardless of its type and content.
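The fixed-size operand idea can be sketched as follows. This is a hypothetical layout, not AsmJit's real Operand class: the names `OpKind`, `MemData`, `makeReg`, `makeImm` and all field names are assumptions chosen only to show how registers, memory operands, and immediates can share one 16-byte representation, so that operands can be stored in plain arrays and handed to a single dispatch function.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical operand kinds; AsmJit's real names and layout differ.
enum class OpKind : uint8_t { kNone, kReg, kMem, kImm };

// Extra data for memory operands (hypothetical).
struct MemData {
  uint32_t index;  // Index register.
  int32_t disp;    // Displacement.
};

// Every operand occupies exactly 16 bytes regardless of its kind.
struct Operand {
  OpKind kind;      // What this operand represents.
  uint8_t size;     // Operand size in bytes (e.g. 4 for eax, 16 for xmm0).
  uint8_t regType;  // Register class for registers (hypothetical encoding).
  uint8_t reserved;
  uint32_t id;      // Register index, or base register index for memory.
  union {
    int64_t imm;    // Immediate value.
    MemData mem;    // Memory operand details.
  };
};

static_assert(sizeof(Operand) == 16, "all operands must be 16 bytes");

// Convenience constructors for the sketch.
inline Operand makeReg(uint8_t regType, uint32_t id, uint8_t size) {
  assert(size != 0);
  Operand op{};
  op.kind = OpKind::kReg;
  op.regType = regType;
  op.id = id;
  op.size = size;
  return op;
}

inline Operand makeImm(int64_t value) {
  Operand op{};
  op.kind = OpKind::kImm;
  op.imm = value;
  return op;
}
```

Because every operand has the same size, an instruction can be described as an id plus an array of operands, which is what makes a single generic encoding entry point possible.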

The dynamic nature of AsmJit is what makes it much more powerful than other JIT assemblers out there. It is also what makes it possible to have X86Compiler as a part of AsmJit without a significant increase in library size, and to create tools that use AsmJit as a base library to generate and process assembly at run-time. One missing feature that I have frequently been asked about is the ability to assemble code from a string. This is now provided by the AsmTK library!

AsmParser

AsmTK's AsmParser exploits what AsmJit offers: it parses the input string, constructs instruction operands on-the-fly, passes the whole instruction to the instruction validator, and finally passes it to the assembler itself. AsmTK supports all instructions provided by AsmJit, because it uses the AsmJit API for instruction-name-to-id conversion and for strict validation.
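The first stage of that pipeline, splitting a line into a mnemonic and operand strings, can be sketched like this. It is a simplified illustration, not AsmTK's actual parser: `splitInstruction` and `ParsedLine` are hypothetical names, and the later stages (turning each operand string into an AsmJit operand, validating, and encoding) are left out.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical result type: the mnemonic plus raw operand strings.
struct ParsedLine {
  std::string mnemonic;
  std::vector<std::string> operands;
  bool ok = true;  // False if brackets are unbalanced.
};

// Trim leading and trailing whitespace.
static std::string trim(const std::string& s) {
  size_t b = 0, e = s.size();
  while (b < e && std::isspace(static_cast<unsigned char>(s[b]))) b++;
  while (e > b && std::isspace(static_cast<unsigned char>(s[e - 1]))) e--;
  return s.substr(b, e - b);
}

// Split "mnemonic op1, op2, ..." into its parts.
ParsedLine splitInstruction(const std::string& line) {
  ParsedLine out;
  std::string s = trim(line);
  size_t i = 0;

  // The mnemonic ends at the first whitespace character.
  while (i < s.size() && !std::isspace(static_cast<unsigned char>(s[i]))) i++;
  out.mnemonic = s.substr(0, i);
  // Lowercase it so "MOV" and "mov" map to the same instruction name.
  for (char& c : out.mnemonic)
    c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));

  std::string current;
  int depth = 0;  // Bracket depth; commas inside [...] never split operands.
  for (; i < s.size(); i++) {
    char c = s[i];
    if (c == '[') depth++;
    else if (c == ']' && --depth < 0) out.ok = false;  // Stray ']'.

    if (c == ',' && depth == 0) {
      out.operands.push_back(trim(current));
      current.clear();
    } else {
      current += c;
    }
  }
  std::string last = trim(current);
  if (!last.empty()) out.operands.push_back(last);
  if (depth != 0) out.ok = false;
  return out;
}
```

Each operand string would then be mapped to a register, memory, or immediate operand before the instruction id and operands are handed to validation and encoding.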

Here is the output of a sample application that I wrote in less than 15 minutes. It is basically an on-the-fly X86/X64 instruction encoder based on AsmTK and AsmJit: you enter an instruction, it tries to encode it, and it outputs its binary representation:

=========================================================
AsmTK-Test-Cmd - Architecture = x64 (use --x86 and --x64)
---------------------------------------------------------
Usage:
  1. Enter instruction and its operands to be encoded.
  2. Enter empty string to exit.
=========================================================
mov eax, ebx
8BC3
mov rax, rbx
488BC3
mov r15, rax
4C8BF8
cmp ah, al
3AE0
vandpd ymm0, ymm10, ymm13
C4C12D54C5
movdqa xmm0, [rax + rcx * 8 + 16]
660F6F44C810
movdqa rax, xmm0
ERROR: 0x0000000B (Illegal instruction)

The tool can be used to quickly verify whether an instruction encodes correctly and to check whether the encoding is optimal (for example, whether AsmJit uses the shortest encoding possible). I have already made two fixes in AsmJit to use shorter encodings of the [mov gpq, u32 imm] and [and gpq, u32 imm] instructions.
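To illustrate what "shortest encoding" means for [mov gpq, u32 imm]: on x86-64, writing a 32-bit register zero-extends the result into the full 64-bit register, so a 64-bit move of an unsigned 32-bit immediate can use the shorter 32-bit form. The sketch below picks among three standard encodings of `mov r64, imm`. It is written for this post and is not AsmJit's encoder; `encodeMovR64Imm` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encode `mov r64, imm` (reg 0..15: 0 = rax ... 15 = r15) using the shortest
// of three valid x86-64 forms.
std::vector<uint8_t> encodeMovR64Imm(unsigned reg, uint64_t imm) {
  assert(reg < 16);
  std::vector<uint8_t> out;
  const uint8_t rexB = (reg >= 8) ? 0x01 : 0x00;  // REX.B extends the register index.
  const uint8_t low3 = static_cast<uint8_t>(reg & 7);

  auto emit32 = [&out](uint32_t v) {
    for (int i = 0; i < 4; i++) out.push_back(static_cast<uint8_t>(v >> (8 * i)));
  };

  const int64_t simm = static_cast<int64_t>(imm);
  if (imm <= 0xFFFFFFFFu) {
    // mov r32, imm32 (B8+r id): the CPU zero-extends into the upper 32 bits.
    if (rexB) out.push_back(static_cast<uint8_t>(0x40 | rexB));
    out.push_back(static_cast<uint8_t>(0xB8 + low3));
    emit32(static_cast<uint32_t>(imm));
  } else if (simm >= INT32_MIN && simm <= INT32_MAX) {
    // mov r/m64, imm32 (REX.W C7 /0 id): the immediate is sign-extended.
    out.push_back(static_cast<uint8_t>(0x48 | rexB));
    out.push_back(0xC7);
    out.push_back(static_cast<uint8_t>(0xC0 | low3));  // ModRM: mod=11, reg=/0, rm=reg.
    emit32(static_cast<uint32_t>(imm));
  } else {
    // mov r64, imm64 (REX.W B8+r io): full 64-bit immediate.
    out.push_back(static_cast<uint8_t>(0x48 | rexB));
    out.push_back(static_cast<uint8_t>(0xB8 + low3));
    for (int i = 0; i < 8; i++) out.push_back(static_cast<uint8_t>(imm >> (8 * i)));
  }
  return out;
}
```

For rax the three forms take 5, 7, and 10 bytes respectively, so choosing the first form whenever the immediate fits in an unsigned 32-bit value saves bytes without changing the result.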

Conclusion

The AsmTK library is a fresh piece of software that currently contains fewer than 1000 lines of code. It relies heavily on AsmJit and uses its new instruction validation API. It also serves as a demonstration of AsmJit capabilities that are not obvious from the AsmJit documentation.

12 July, 2016

AsmJit and Instruction Validation

AsmDB

AsmDB is an X86/X64 instruction database in a JSON-like format that I started after I saw the complexity of the AVX-512 instruction set. Initially, I thought I would just add AVX-512 manually to the AsmJit database, but after a few hours I realized that it is extremely complex and not as straightforward as I had thought.

AsmDB -> AsmJit

The solution was to create a database that contains all instructions in a format similar to the one used by instruction-set manuals, and to write a tool that indexes the database and programmatically generates all tables AsmJit uses. At the moment I would say that 50% of the work is done: the AsmJit tool that generates parts of the x86inst.cpp file now uses AsmDB to generate space-efficient operand tables that can be used to validate the operands of any x86 and x64 instruction supported by AsmJit. These replace the old operand tables, which were basically useless for validation because they merged the operand possibilities of all encodings of an instruction into one.
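One common way to make generated tables space-efficient is to store each distinct operand signature once and let every instruction refer to its signatures by index. The generator sketch below does that. The types (`OpFlags`, `Signature`, `TableBuilder`) and the bitmask encoding are hypothetical and only illustrate the idea; they are not the format AsmDB or x86inst.cpp actually use.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical operand-class bits; one bit per acceptable operand class.
enum OpFlags : uint32_t {
  kOpGpd = 1u << 0, kOpGpq = 1u << 1, kOpXmm = 1u << 2,
  kOpYmm = 1u << 3, kOpMem = 1u << 4, kOpImm = 1u << 5,
};

// One valid operand combination (unused slots are 0).
using Signature = std::array<uint32_t, 4>;

struct TableBuilder {
  std::vector<Signature> signatures;                      // Shared, deduplicated table.
  std::map<Signature, uint16_t> index;                    // Signature -> table position.
  std::map<std::string, std::vector<uint16_t>> instSigs;  // Instruction -> indexes.

  // Register one valid form of an instruction. Identical forms of different
  // instructions end up pointing at the same table entry.
  void add(const std::string& inst, const Signature& sig) {
    auto it = index.find(sig);
    uint16_t id;
    if (it == index.end()) {
      id = static_cast<uint16_t>(signatures.size());
      signatures.push_back(sig);
      index.emplace(sig, id);
    } else {
      id = it->second;
    }
    instSigs[inst].push_back(id);
  }
};
```

With many instructions sharing the same forms (for example the whole unpack family), the generated output only needs small index lists per instruction plus one shared signature table.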

The new validation API is still a work-in-progress, but in general you can do something like this:

Error err = X86Inst::validate(
  kArchX64,                        // Architecture - kArchX86, kArchX64.
  kX86InstIdVpunpckhbw,            // Instruction id, see X86InstId enum.
  0,                               // Instruction options, see X86InstOptions enum.
  x86::xmm0, x86::xmm1, x86::eax); // Individual operands, or operands[] and count.

The call to validate() returns an error in this case, because the vpunpckhbw instruction is defined only for [xmm, xmm, xmm/mem] and [ymm, ymm, ymm/mem] operands, so a GP register such as eax is not a valid third operand. The validator is very strict and has access to very detailed information about every instruction: it knows about implicit operands, operands that require a specific register, and possible immediate and memory sizes. It can be used to implement an asm parser as well, and probably much more in the future.
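The difference between checking operands against one merged table per operand position (roughly what the old tables allowed) and checking them against individual signatures can be shown with the vpunpckhbw example. This is a simplified sketch with hypothetical bitmask types, not AsmJit's validator; a real validator, as described above, also has to handle implicit operands, fixed registers, and immediate and memory sizes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical operand-class bits.
enum : uint32_t { kGpd = 1u << 0, kXmm = 1u << 1, kYmm = 1u << 2, kMem = 1u << 3 };

// One valid operand combination; each entry lists the allowed classes.
using Signature = std::vector<uint32_t>;

// vpunpckhbw: [xmm, xmm, xmm/mem] or [ymm, ymm, ymm/mem].
const std::vector<Signature> kVpunpckhbw = {
  {kXmm, kXmm, kXmm | kMem},
  {kYmm, kYmm, kYmm | kMem},
};

// Strict check: all operands must match the SAME signature.
bool validateStrict(const std::vector<Signature>& sigs, const std::vector<uint32_t>& ops) {
  for (const Signature& sig : sigs) {
    if (sig.size() != ops.size()) continue;
    bool match = true;
    for (size_t i = 0; i < ops.size(); i++) {
      if ((sig[i] & ops[i]) == 0) { match = false; break; }
    }
    if (match) return true;
  }
  return false;
}

// Merged check: each position accepts the union of all classes seen in that
// position across every signature, which loses the per-form pairing.
bool validateMerged(const std::vector<Signature>& sigs, const std::vector<uint32_t>& ops) {
  for (size_t i = 0; i < ops.size(); i++) {
    uint32_t allowed = 0;
    for (const Signature& sig : sigs)
      if (i < sig.size()) allowed |= sig[i];
    if ((allowed & ops[i]) == 0) return false;
  }
  return true;
}
```

Both checks reject [xmm, xmm, gpd] (the xmm0, xmm1, eax case), but only the strict one also rejects a mixed [xmm, ymm, xmm] combination, which the merged view cannot tell apart from a valid form.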

X86Assembler enhancements

At the moment I'm still integrating the validation code into the assembler and compiler classes. What I can already say is that I can simplify the assembler and remove most of its validation code in favor of the new validation API. The reason is that the assembler has always been rather lenient in terms of validation: it cares about performance and omits every check that is not strictly necessary. This means, for example, that it allows something like mov eax, al and encodes it as mov eax, eax. It checks the size of the destination register and ignores everything about the source register except its index.
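Why mov eax, al turns into mov eax, eax: in a register-to-register form only the register indexes end up in the ModRM byte, and al and eax share index 0. The deliberately lenient encoder below for `mov r32, r/m32` (opcode 8B /r) mimics the behavior described above. It is a hypothetical sketch (`RegClass`, `Reg`, `encodeMovLenient` are invented names), not AsmJit code, and it skips REX prefixes for simplicity.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical register representation: a class plus an index.
enum class RegClass { kGpb, kGpd, kXmm };
struct Reg { RegClass cls; uint8_t index; };

// Lenient `mov r32, r/m32` encoder: the destination decides the form, but
// only the source's index is used; its class is never checked.
std::vector<uint8_t> encodeMovLenient(Reg dst, Reg src) {
  if (dst.cls != RegClass::kGpd) return {};  // Not the 32-bit form.
  assert(dst.index < 8 && src.index < 8);    // No REX prefix in this sketch.
  // ModRM: mod=11 (register), reg=destination, rm=source.
  uint8_t modrm = static_cast<uint8_t>(0xC0 | (dst.index << 3) | src.index);
  return {0x8B, modrm};
}
```

With this encoder, mov eax, ebx produces 8B C3 (the same bytes as in the AsmTK output above), while mov eax, al silently produces 8B C0, which is mov eax, eax.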

This gets much more problematic when using the unsafe API (an API that accepts untyped operands). It's possible to emit a ridiculous combination, for example a.emit(kX86InstIdAdd, x86::eax, x86::xmm5). Such a combination doesn't exist at all, but AsmJit will encode it as add eax, ebp, because the operands match the REG/REG signature and xmm5 has the same register index as ebp. Since the 'add' instruction is only defined for GP registers, the assembler omits the register type check: no typed API provides 'add' with such operands.
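The reason a typed API never produces add eax, xmm5 is that, with a distinct C++ type per register class, an invalid combination simply has no overload to call. The sketch below uses hypothetical `Gpd`, `Xmm`, and `TypedAssembler` types (not AsmJit's) and checks overload availability at compile time; it needs C++17 for `std::void_t`.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical register types: one distinct type per register class.
struct Gpd { uint8_t index; };
struct Xmm { uint8_t index; };

struct TypedAssembler {
  std::vector<uint8_t> buf;

  // add r32, r/m32 (opcode 03 /r) exists only for GP registers.
  void add(Gpd dst, Gpd src) {
    assert(dst.index < 8 && src.index < 8);  // No REX handling in this sketch.
    buf.push_back(0x03);
    buf.push_back(static_cast<uint8_t>(0xC0 | (dst.index << 3) | src.index));
  }
  // There is no add(Gpd, Xmm) overload, so `a.add(eax, xmm5)` cannot compile.
};

// Compile-time probe: is `add(A, B)` callable on TypedAssembler?
template <typename A, typename B, typename = void>
struct CanAdd : std::false_type {};

template <typename A, typename B>
struct CanAdd<A, B, std::void_t<decltype(std::declval<TypedAssembler&>().add(
                        std::declval<A>(), std::declval<B>()))>> : std::true_type {};

static_assert(CanAdd<Gpd, Gpd>::value, "GP + GP is a valid form");
static_assert(!CanAdd<Gpd, Xmm>::value, "GP + XMM has no typed overload");
```

Here add(Gpd{0}, Gpd{5}) emits 03 C5, which is add eax, ebp. An untyped entry point that only reads register indexes would emit exactly the same bytes for eax and xmm5, since xmm5 also has index 5.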

The new validation API changes the game, since the assembler can now validate each instruction's operands before it actually tries to encode it. This means that such cases can be caught by the assembler without making it more complex. The only problem is that this kind of validation is more expensive, so the assembler needs a new option to enable and disable it.
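One way to keep strict validation optional is a flag checked once per emitted instruction, so its cost is paid only when the option is enabled. The `Emitter` type, the `kOptionStrictValidation` bit, and the `Error` values below are hypothetical names for this sketch, not AsmJit's actual option names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class RegClass : uint8_t { kGpd, kXmm };
struct Reg { RegClass cls; uint8_t index; };

enum class Error { kOk, kInvalidOperands };

// Hypothetical option bit that turns on strict validation.
constexpr uint32_t kOptionStrictValidation = 1u << 0;

struct Emitter {
  uint32_t options = 0;
  std::vector<uint8_t> buf;

  // Strict check for `add r32, r32`: both operands must be GP registers.
  static bool validateAdd(Reg dst, Reg src) {
    return dst.cls == RegClass::kGpd && src.cls == RegClass::kGpd;
  }

  Error add(Reg dst, Reg src) {
    // The extra check runs only when the option is enabled.
    if ((options & kOptionStrictValidation) && !validateAdd(dst, src))
      return Error::kInvalidOperands;
    // Fast path: encode from register indexes only (03 /r, no REX).
    assert(dst.index < 8 && src.index < 8);
    buf.push_back(0x03);
    buf.push_back(static_cast<uint8_t>(0xC0 | (dst.index << 3) | src.index));
    return Error::kOk;
  }
};
```

Without the option, add(eax, xmm5) is encoded as add eax, ebp; with it, the call fails before any bytes are written.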

X86Compiler enhancements

Anybody who has ever used AsmJit's compiler knows that it's not trivial to debug when something goes wrong. The compiler stores each instruction, processes it, and then serializes it to the assembler. If an instruction is invalid from the beginning, it is serialized as-is to the assembler, which then fails. The problem is that by then it's often too late, and you have to use debug output to figure out the exact place where the instruction was generated. The new validation API should solve most of these issues, as the compiler can now validate each instruction before it stores it. This makes it much faster to find code that misuses the compiler.

Conclusion

There are still many things to do in AsmJit, but the library is slowly getting better and I hope that AsmJit users will find these new features useful. The good news is that since I reorganized the instruction tables, there is still some space I can fill without increasing the library size. Many things could be put into these tables, but the first candidate is SSE to AVX translation, which is very likely to be implemented first. The next goal is, of course, to finalize AVX-512 support :)