refactor: overhaul text processor

- use Unicode PAD symbol instead of _ for padding
- move text_processor out of init
- refactor all TextProcessor methods
- refactor pf calculator to use NumPy and handle punctuation consistently
- update tests
- add support for blank space and underscore in ipatok.tokenise