llvm-project
45c54985 - [IR2Vec] Refactor vocabulary to use canonical type IDs (#155323)

Commit
67 days ago
[IR2Vec] Refactor vocabulary to use canonical type IDs (#155323)

Refactor IR2Vec vocabulary to use canonical type IDs, improving the embedding representation for LLVM IR types.

The previous implementation used raw Type::TypeID values directly in the vocabulary, which led to redundant entries (e.g., all float variants mapped to "FloatTy" but had separate slots).

This change improves the vocabulary by:
1. Making the type representation more consistent by properly canonicalizing types
2. Reducing vocabulary size by eliminating redundant entries
3. Improving the embedding quality by ensuring similar types share the same representation

(Tracking issue - #141817)
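The canonicalization idea described in this message can be illustrated with a minimal, self-contained sketch. It is not the code from this commit: `RawTypeKind`, `CanonicalTypeKind`, `canonicalize`, and `typeSlot` are hypothetical names, and the enum is only a small stand-in for a few of the kinds in `Type::TypeID`. It assumes C++14 or later.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a few raw type kinds. This is NOT LLVM's
// Type::TypeID; it only mimics the idea of several float variants.
enum class RawTypeKind {
  Half, BFloat, Float, Double, Void, Label, Integer, Pointer,
  NumRawKinds // count of raw kinds, not a real kind
};

// Hypothetical canonical kinds: one entry per distinct vocabulary slot.
enum class CanonicalTypeKind {
  FloatTy, VoidTy, LabelTy, IntegerTy, PointerTy,
  NumCanonicalKinds // count of canonical kinds, also used as a sentinel
};

// Collapse every raw kind onto its canonical kind, so that all float
// variants share a single representation.
constexpr CanonicalTypeKind canonicalize(RawTypeKind Kind) {
  switch (Kind) {
  case RawTypeKind::Half:
  case RawTypeKind::BFloat:
  case RawTypeKind::Float:
  case RawTypeKind::Double:
    return CanonicalTypeKind::FloatTy;
  case RawTypeKind::Void:
    return CanonicalTypeKind::VoidTy;
  case RawTypeKind::Label:
    return CanonicalTypeKind::LabelTy;
  case RawTypeKind::Integer:
    return CanonicalTypeKind::IntegerTy;
  case RawTypeKind::Pointer:
    return CanonicalTypeKind::PointerTy;
  case RawTypeKind::NumRawKinds:
    break;
  }
  return CanonicalTypeKind::NumCanonicalKinds; // sentinel for invalid input
}

// Index of the vocabulary slot used for a raw kind in this sketch.
constexpr std::size_t typeSlot(RawTypeKind Kind) {
  return static_cast<std::size_t>(canonicalize(Kind));
}

// Float variants share one slot, and the canonical table is smaller.
static_assert(typeSlot(RawTypeKind::Half) == typeSlot(RawTypeKind::Double),
              "float variants should share a slot");
static_assert(static_cast<std::size_t>(CanonicalTypeKind::NumCanonicalKinds) <
                  static_cast<std::size_t>(RawTypeKind::NumRawKinds),
              "canonical vocabulary should be smaller");
```

Indexing the vocabulary by the canonical kind rather than the raw kind means the table needs one slot per distinct representation, which is the size reduction the message refers to.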