transformers
parallelism goes brrr
#37877
Merged

ArthurZucker merged 59 commits into main from nouamane/nanotron
NouamaneTazi
NouamaneTazi accept custom device_mesh
3d90a99d
HuggingFaceDocBuilderDev
NouamaneTazi fix device_map
df1eaee8
NouamaneTazi assert that num_heads % tp_size == 0
b9298864
Rocketknight1
NouamaneTazi todo.
1df751bc
NouamaneTazi ReplicateParallel
5887ffc1
NouamaneTazi handle tied weights
924cceec
NouamaneTazi handle dtensor in save_pretrained with safe_serialization
cfacec55
NouamaneTazi tp test works
98333058
NouamaneTazi doesnt work
7d7b3636
ArthurZucker added Tensor Parallel
ArthurZucker added Core: Modeling
ArthurZucker
ArthurZucker commented on 2025-05-01
S1ro1
S1ro1 commented on 2025-05-01
NouamaneTazi fix shard_and_distribute_module's rank should be local_rank
11f02a59
NouamaneTazi tp=4 is correct
317c0276
NouamaneTazi dp+tp is broken
f3b4ae81
NouamaneTazi todo allreduce with dtensors on another dim is annoying
f6a49ee8
NouamaneTazi workaround to sync dp grads when using dtensors
eaa65921
NouamaneTazi loading a checkpoint works
7c6219bc
NouamaneTazi wandb and compare losses with different tp/dp
6ceabe01
NouamaneTazi cleaning
a9a15925
NouamaneTazi requested a review from ArthurZucker 255 days ago
NouamaneTazi requested a review from S1ro1 255 days ago
NouamaneTazi marked this pull request as ready for review 255 days ago
NouamaneTazi cleaning
4e323a51
qubvel
qubvel commented on 2025-05-02
NouamaneTazi .
7f327b13
NouamaneTazi .
c3e5c5ed
NouamaneTazi logs
810bd51a
NouamaneTazi CP2 DP2 no mask works after commenting attn_mask and is_causal from s…
82348732
NouamaneTazi DP=2 TP=2 now works even with tied embeddings
29c2a9ca
NouamaneTazi model.parameters() and model.module.parameters() are empty..
8fa760be
NouamaneTazi reformat sanity_check_tensor_sync
610e6bb0
NouamaneTazi set atol=1e-4 for CP to pass
75cad51d
NouamaneTazi try populate _parameters from named_modules
b816a3cc
NouamaneTazi refactors
688107c0
NouamaneTazi is_causal=True and pack sequences, no attn mask, and preshuffle dataset
cfe688b4
NouamaneTazi fix packing
83095210
NouamaneTazi CP=4 doesn't work
c0f616ee
NouamaneTazi fix labels and position_ids for CP
011d981e
NouamaneTazi DP CP works with transformers 🥳🥳🥳
265f90dc
kmehant
ArthurZucker refactor
afa72e24
ArthurZucker add example cp
75176794
ArthurZucker fixup
835726da
ArthurZucker revert sdpa changes
0ad2a156
ArthurZucker example cleared
5b119645
ArthurZucker add CP, DP to the mesh init
7855d102
ArthurZucker nit
0b2bd157
ArthurZucker
ArthurZucker commented on 2025-05-15
NouamaneTazi clean
c82d39ce
ArthurZucker use `ALL_PARALLEL_STYLES`
957c351e
ArthurZucker Merge branch 'nouamane/nanotron' of github.com:huggingface/transforme…
6d462e9f
ArthurZucker style
43c175d0
NouamaneTazi FSDP works
378b2e7b
NouamaneTazi log on 1 rank
30752c63
NouamaneTazi .
9c1e1fc2
ArthurZucker fix?
3f683b6e
ArthurZucker Merge branch 'nouamane/nanotron' of github.com:huggingface/transforme…
d36acced
NouamaneTazi FSDP1 also has .parameters() bug
780d74d3
NouamaneTazi reported gradnorm when using FSDP1 is wrong, but loss is correct so i…
9e549694
NouamaneTazi .
ba01287a
NouamaneTazi
ArthurZucker style and fixup
677ce533
ArthurZucker move stuff around
81c21de9
ArthurZucker Merge branch 'main' of github.com:huggingface/transformers into nouam…
656277c5
ArthurZucker fix tests
e27ddb85
ArthurZucker style
d702d94d
ArthurZucker let's make it a check
5083c0b0
ArthurZucker
ArthurZucker warning should be an info
67a81826
ArthurZucker enabled auto-merge (squash) 237 days ago
disabled auto-merge 237 days ago
Manually disabled by user
ArthurZucker merged 1c2f36b4 into main 237 days ago
ArthurZucker deleted the nouamane/nanotron branch 237 days ago
manueldeprada
manueldeprada
LysandreJik restored the head branch 237 days ago
ArthurZucker
ydshieh
