llvm-project
dafffe26 - [LLVM][MC][DecoderEmitter] Add support to specialize decoder per bitwidth (#154865)

Commit
8 days ago
[LLVM][MC][DecoderEmitter] Add support to specialize decoder per bitwidth (#154865) This change adds an option to specialize decoders per bitwidth, which can help reduce the (compiled) code size of the decoder code. **Current state**: Currently, the code generated by the decoder emitter consists of two key functions: `decodeInstruction` which is the entry point into the generated code and `decodeToMCInst` which is invoked when a decode op is reached while traversing through the decoder table. Both functions are templated on `InsnType` which is the raw instruction bits that are supplied to `decodeInstruction`. Several backends call `decodeInstruction` with different `InsnType` types, leading to several template instantiations of these functions in the final code. As an example, AMDGPU instantiates this function with type `DecoderUInt128` type for decoding 96/128-bit instructions, `uint64_t` for decoding 64-bit instructions, and `uint32_t` for decoding 32-bit instructions. Since there is just one `decodeToMCInst` in the generated code, it has code that handles decoding for *all* instruction sizes. However, the decoders emitted for different instructions sizes rarely have any intersection with each other. That means, in the AMDGPU case, the instantiation with InsnType == DecoderUInt128 has decoder code for 32/64-bit instructions that is *never exercised*. Conversely, the instantiation with InsnType == uint64_t has decoder code for 128/96/32-bit instructions that is never exercised. This leads to unnecessary dead code in the generated disassembler binary (that the compiler cannot eliminate by itself). **New state**: With this change, we introduce an option `specialize-decoders-per-bitwidth`. Under this mode, the DecoderEmitter will generate several versions of `decodeToMCInst` function, one for each bitwidth. The code is still templated, but will require backends to specify, for each `InsnType` used, the bitwidth of the instruction that the type is used to represent using a type-trait `InsnBitWidth`. This will enable the templated code to choose the right variant of `decodeToMCInst`. Under this mode, a particular instantiation will only end up instantiating a single variant of `decodeToMCInst` generated and that will include only those decoders that are applicable to a single bitwidth, resulting in elimination of the code duplication through instantiation and a reduction in code size. Additionally, under this mode, decoders are uniqued only within a given bitwidth (as opposed to across all bitwidths without this option), so the decoder index values assigned are smaller, and consume less bytes in their ULEB128 encoding. As a result, the generated decoder tables can also reduce in size. Adopt this feature for the AMDGPU and RISCV backend. In a release build, this results in a net 55% reduction in the .text size of libLLVMAMDGPUDisassembler.so and a 5% reduction in the .rodata size. For RISCV, which today uses a single `uint64_t` type, this results in a 3.7% increase in code size (expected as we instantiate the code 3 times now). Actual measured sizes are as follows: ``` Baseline commit: 72c04bb882ad70230bce309c3013d9cc2c99e9a7 Configuration: Ubuntu clang version 18.1.3, release build with asserts disabled. AMDGPU Before After Change ====================================================== .text 612327 275607 55% reduction .rodata 369728 351336 5% reduction RISCV: ====================================================== .text 47407 49187 3.7% increase .rodata 35768 35839 0.1% increase ```
Author
Parents
Loading