Fix slow GemmaTokenizer and improve SPM slow -> fast conversion process (#32191)
* Remove user-defined tokens which can be obtained through merges
* Remove debug line
* formatting
* Refactor spm slow -> fast converter
* revert unnecessary refactor
* set comprehension
* remove test files
* Use `vocab_scores`
* Always replace spiece underline with space in decode
* we no longer need token filtering
* Add save fast load slow unit test
* Remove tokenizers version check
* Remove duplicate code
* Make `<start_of_turn>` and `<end_of_turn>` special tokens
* Bias merge priority with length if score is the same
* Add unit test for merge priority
* CI