Fix BPE tokenization for weird whitespace characters (Closes #199) (#208)
* Add new tokenizer unit test (#199)
* Perform `NFKC` normalization for sentencepiece models w/ precompiled charmap
* Fix JSDoc indentation
* Add problematic string to unit tests
* Use consistent BPE split token
* Add second problematic string