tokenizers
Add spaces_between_special_tokens and cleanup tokenization spaces
#1095
Closed

ArthurZucker
ArthurZucker Add spaces_between_special_tokens argument to decode
9a1cc310
HuggingFaceDocBuilderDev
ArthurZucker update non optionnal args
de95525d
ArthurZucker more fixes
bdae9ff7
ArthurZucker update doc
d0c33312
ArthurZucker style
45651297
ArthurZucker missing coma
5be06cca
ArthurZucker cargo fmt
1ed8a966
ArthurZucker update
db34331b
ArthurZucker add missing arg again in node native
be0b3742
ArthurZucker marked this pull request as ready for review 3 years ago
ArthurZucker small nit
3b4ee082
ArthurZucker Merge branch 'main' of https://github.com/huggingface/tokenizers into…
ffb4afe8
ArthurZucker requested a review from Narsil 3 years ago
Narsil
Narsil commented on 2022-12-15
ArthurZucker Update bindings/python/py_src/tokenizers/implementations/base_tokeniz…
6a1f005f
ArthurZucker update code, argument was missinging in init
38e28266
ArthurZucker some styling
14291821
ArthurZucker styling on bindings python
d0b4ed9c
ArthurZucker update tokenizer
e2aa9534
ArthurZucker add arg to bindings
214be003
ArthurZucker update code
36da76a8
ArthurZucker Merge branch 'main' of https://github.com/huggingface/tokenizers into…
a30a0db3
ArthurZucker typo
aa5da6f7
ArthurZucker lint binding nodes
b337a792
ArthurZucker style
1f881dda
ArthurZucker fix typo in argument
ac01425d
ArthurZucker update pipeline tests
e4438a00
ArthurZucker default to True everywhere
4e5235a2
ArthurZucker update test
18509810
ArthurZucker fix test
990d5242
ArthurZucker fmnt tokenizer native
895b20e6
ArthurZucker update stub
f46e4713
ArthurZucker add tests in rust
4d008c33
ArthurZucker add cleanup_tokenization_spaces argument
914f5dd6
ArthurZucker fix
de78d9cb
ArthurZucker style python code
9e0325e9
ArthurZucker update init
be14c101
ArthurZucker clippy
35d5807b
ArthurZucker changed the title from "Add spaces_between_special_tokens argument to decode" to "Add spaces_between_special_tokens and cleanup tokenization spaces" 2 years ago
ArthurZucker simpler function
718a747f
ArthurZucker small nit
432ae1c0
ArthurZucker full update
14b145fe
ArthurZucker style
724a0d49
ArthurZucker clippy
72b9fe62
ArthurZucker clippy last files
f1b2a433
ArthurZucker update init
09530ba3
ArthurZucker requested a review from Narsil 2 years ago
RomanCast
ArthurZucker
lucasjinreal
ArthurZucker
ArthurZucker update
067d1b80
ArthurZucker update mod.rs
f356b4af
ArthurZucker add getter
4abe82d3
ArthurZucker dont add special tokens getter in this PR
8a6a03a8
ArthurZucker add `get_added_tokens` to init
3800288f
ArthurZucker style
913896cb
ArthurZucker correct the names of the attribute: spaces_between_added_tokens -> sp…
bb565db2
ArthurZucker decalre added tokens
330c4b21
ArthurZucker
ArthurZucker commented on 2023-06-08
ArthurZucker Update tokenizers/tests/documentation.rs
57c6376f
ArthurZucker update etst
95ee6602
ArthurZucker
ArthurZucker commented on 2023-06-09
ArthurZucker update mod code
c3ba21b7
ArthurZucker spaces between added tokens
58a7c7ee
ArthurZucker Merge branch 'add_spaces_between_special_tokens_arg' of https://githu…
7bc08414
ArthurZucker more updates
bff4f56b
ArthurZucker remove cleanup ptokenizations paces for lestt breaking changes
10c31225
ArthurZucker update tests and cli
ad1f0853
ArthurZucker style
436ea14b
ArthurZucker if custome decoder, dont push spaces
b883cb10
ArthurZucker Merge branch 'main' into add_spaces_between_special_tokens_arg
cab37bad
ArthurZucker update name
0953b927
ArthurZucker Merge branch 'add_spaces_between_special_tokens_arg' of https://githu…
c73e8a85
ArthurZucker lint
1d41a457
ArthurZucker revert breaking changes
51837f1b
Narsil
Narsil commented on 2023-06-12
ArthurZucker refactor based on reviez
4154d90a
ArthurZucker remove slow functoin
5475d3d6
ArthurZucker big refactor
6d8318f0
ArthurZucker lint
3f6bd32b
ArthurZucker more linting
bb82e619
ArthurZucker nits
e1a90d87
ArthurZucker update to build
35d3c7f0
ArthurZucker Merge branch 'main' of https://github.com/huggingface/tokenizers into…
3fe30407
ArthurZucker fix binfings
8f72420a
ArthurZucker updates
3cadd54d
ArthurZucker style
29fb13f0
ArthurZucker update init bindings
aaea3055
Narsil
Narsil commented on 2023-06-12
ArthurZucker update tests
90c09695
ArthurZucker more in depth testing
ce19d51c
ArthurZucker update
c23fea1c
ArthurZucker refactor
4515cd53
ArthurZucker fmrt
43b00511
ArthurZucker nits
fdc3313e
ArthurZucker default to True
a97269e7
ArthurZucker simplify code
1f2ba2e1
ArthurZucker cleanup
82b91b81
ArthurZucker todo
461120f1
ArthurZucker use state machine
49832e7b
ArthurZucker python stub.py
e39dbf23
ArthurZucker fixing
b72ed33c
ArthurZucker test with decoder
fa002b28
ArthurZucker update test
4ae308b0
ArthurZucker style
386ec4d2
ArthurZucker correct fmt
108a4852
ArthurZucker fix fmt
5c001a01
ArthurZucker fix failihng test
849291a2
ArthurZucker revert removing , spaces_between_added_tokens = True from init
137f9200
ArthurZucker update styles and pystub
9bf39350
ArthurZucker cargo +stable fmt --manifest-path ./bindings/python/Cargo.toml
e65d3a9c
ArthurZucker fix clippy
514444e6
ArthurZucker make clippy happy
d38cc575
ArthurZucker fix test
f9ba4d6d
ArthurZucker finalk
c4ffb773
ArthurZucker closed this 2 years ago