Add spaces_between_special_tokens and cleanup tokenization spaces #1095
Add spaces_between_special_tokens argument to decode
9a1cc310
update non-optional args
de95525d
more fixes
bdae9ff7
update doc
d0c33312
style
45651297
missing comma
5be06cca
cargo fmt
1ed8a966
update
db34331b
add missing arg again in node native
be0b3742
ArthurZucker
marked this pull request as ready for review 3 years ago
small nit
3b4ee082
Merge branch 'main' of https://github.com/huggingface/tokenizers into…
ffb4afe8
Narsil
commented
on 2022-12-15
Update bindings/python/py_src/tokenizers/implementations/base_tokeniz…
6a1f005f
update code, argument was missing in init
38e28266
some styling
14291821
styling on bindings python
d0b4ed9c
update tokenizer
e2aa9534
add arg to bindings
214be003
update code
36da76a8
Merge branch 'main' of https://github.com/huggingface/tokenizers into…
a30a0db3
typo
aa5da6f7
lint binding nodes
b337a792
style
1f881dda
fix typo in argument
ac01425d
update pipeline tests
e4438a00
default to True everywhere
4e5235a2
update test
18509810
fix test
990d5242
fmt tokenizer native
895b20e6
update stub
f46e4713
add tests in rust
4d008c33
add cleanup_tokenization_spaces argument
914f5dd6
fix
de78d9cb
style python code
9e0325e9
update init
be14c101
clippy
35d5807b
ArthurZucker
changed the title from "Add spaces_between_special_tokens argument to decode" to "Add spaces_between_special_tokens and cleanup tokenization spaces" 2 years ago
simpler function
718a747f
small nit
432ae1c0
full update
14b145fe
style
724a0d49
clippy
72b9fe62
clippy last files
f1b2a433
update init
09530ba3
update
067d1b80
update mod.rs
f356b4af
add getter
4abe82d3
don't add special tokens getter in this PR
8a6a03a8
add `get_added_tokens` to init
3800288f
style
913896cb
correct the names of the attribute: spaces_between_added_tokens -> sp…
bb565db2
declare added tokens
330c4b21
Update tokenizers/tests/documentation.rs
57c6376f
update test
95ee6602
update mod code
c3ba21b7
spaces between added tokens
58a7c7ee
Merge branch 'add_spaces_between_special_tokens_arg' of https://githu…
7bc08414
more updates
bff4f56b
remove cleanup tokenization spaces for less breaking changes
10c31225
update tests and cli
ad1f0853
style
436ea14b
if custom decoder, don't push spaces
b883cb10
Merge branch 'main' into add_spaces_between_special_tokens_arg
cab37bad
update name
0953b927
Merge branch 'add_spaces_between_special_tokens_arg' of https://githu…
c73e8a85
lint
1d41a457
revert breaking changes
51837f1b
Narsil
commented
on 2023-06-12
refactor based on review
4154d90a
remove slow function
5475d3d6
big refactor
6d8318f0
lint
3f6bd32b
more linting
bb82e619
nits
e1a90d87
update to build
35d3c7f0
Merge branch 'main' of https://github.com/huggingface/tokenizers into…
3fe30407
fix bindings
8f72420a
updates
3cadd54d
style
29fb13f0
update init bindings
aaea3055
Narsil
commented
on 2023-06-12
update tests
90c09695
more in depth testing
ce19d51c
update
c23fea1c
refactor
4515cd53
fmt
43b00511
nits
fdc3313e
default to True
a97269e7
simplify code
1f2ba2e1
cleanup
82b91b81
todo
461120f1
use state machine
49832e7b
python stub.py
e39dbf23
fixing
b72ed33c
test with decoder
fa002b28
update test
4ae308b0
style
386ec4d2
correct fmt
108a4852
fix fmt
5c001a01
fix failing test
849291a2
revert removing `, spaces_between_added_tokens = True` from init
137f9200
update styles and pystub
9bf39350
cargo +stable fmt --manifest-path ./bindings/python/Cargo.toml
e65d3a9c
fix clippy
514444e6
make clippy happy
d38cc575
fix test
f9ba4d6d
final
c4ffb773