Generate Charsmaps During Build (#597)
* Generate Charsmaps During Build
Use the ICU package at build time to generate charsmaps with the SentencePiece builder. Do not ship the ICU package in the Tokenizers distribution.
* Update Tests
Delete tests for the word-level tokenizer; the model is no longer on the hub.
* Keep ICU DLL during build
* Update Linux Build
* Add C++17 Requirement
* Add generated charsmaps to repo
* Remove ICU Debug Build
* Link sentencepiece-train only for charsmaps regeneration
* Remove redundant charsmaps
* Reuse one charsmap function for all node types
* Update Charsmaps
* Remove reinterpret_cast
* Update Benchmark to Report Warmup Metrics
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>