transformers
parallelism goes brrr #37877 (Merged)
ArthurZucker merged 59 commits into main from nouamane/nanotron
3d90a99d accept custom device_mesh
df1eaee8 fix device_map
b9298864 assert that num_heads % tp_size == 0
1df751bc todo.
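The first commits wire in a user-supplied device mesh and guard head-count divisibility. A minimal sketch of that usage, assuming `from_pretrained` accepts the `tp_plan` and `device_mesh` arguments this PR introduces; the checkpoint name and mesh size are placeholders, and the script is meant to run under torchrun:

```python
# Minimal sketch: build a 1D tensor-parallel mesh and hand it to from_pretrained.
# Assumptions: `tp_plan`/`device_mesh` kwargs as introduced by this PR; launch
# with `torchrun --nproc-per-node 4 script.py`.
from torch.distributed.device_mesh import init_device_mesh
from transformers import AutoModelForCausalLM

tp_size = 4
mesh = init_device_mesh("cuda", (tp_size,), mesh_dim_names=("tp",))

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B",   # placeholder checkpoint
    tp_plan="auto",              # shard weights following the model's TP plan
    device_mesh=mesh,            # custom mesh instead of an auto-created one
)

# Sharding is only valid when heads split evenly across TP ranks.
assert model.config.num_attention_heads % tp_size == 0
```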
5887ffc1 ReplicateParallel
924cceec handle tied weights
cfacec55 handle dtensor in save_pretrained with safe_serialization
98333058 tp test works
7d7b3636 doesnt work
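The safe_serialization commit has to deal with weights that are DTensors rather than plain tensors. A sketch of the underlying idea only, not the PR's actual save_pretrained hook, assuming a recent torch where DTensor lives in torch.distributed.tensor:

```python
# Sketch: gather DTensor shards before safetensors serialization.
# `.full_tensor()` all-gathers the shards so every rank holds the complete weight.
from torch.distributed.tensor import DTensor

def gather_state_dict_for_safetensors(state_dict):
    out = {}
    for name, tensor in state_dict.items():
        if isinstance(tensor, DTensor):
            tensor = tensor.full_tensor()
        out[name] = tensor.detach().cpu()
    return out
```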
ArthurZucker added the Tensor Parallel label
ArthurZucker added the Core: Modeling label
ArthurZucker commented on 2025-05-01
S1ro1 commented on 2025-05-01
11f02a59 fix shard_and_distribute_module's rank should be local_rank
317c0276 tp=4 is correct
f3b4ae81 dp+tp is broken
f6a49ee8 todo allreduce with dtensors on another dim is annoying
eaa65921 workaround to sync dp grads when using dtensors
7c6219bc loading a checkpoint works
6ceabe01 wandb and compare losses with different tp/dp
a9a15925 cleaning
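The "workaround to sync dp grads when using dtensors" commit points at the pattern sketched below: all-reduce gradients over only the data-parallel dimension of the mesh, leaving TP sharding untouched. Assumptions: a mesh with named ("dp", "tp") dims and a recent torch; this is not the PR's exact code.

```python
import torch.distributed as dist
from torch.distributed.tensor import DTensor

def sync_dp_grads(model, mesh):
    """Average gradients across the "dp" mesh dimension (sketch, not the PR code)."""
    dp_group = mesh["dp"].get_group()
    dp_size = dist.get_world_size(dp_group)
    for param in model.parameters():
        if param.grad is None:
            continue
        grad = param.grad
        # For DTensor grads, reduce the local shard only; TP placement stays untouched.
        local = grad.to_local() if isinstance(grad, DTensor) else grad
        dist.all_reduce(local, op=dist.ReduceOp.SUM, group=dp_group)
        local.div_(dp_size)
```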
NouamaneTazi requested a review from ArthurZucker 255 days ago
NouamaneTazi requested a review from S1ro1 255 days ago
NouamaneTazi marked this pull request as ready for review 255 days ago
4e323a51 cleaning
qubvel commented on 2025-05-02
7f327b13 .
c3e5c5ed .
810bd51a logs
82348732 CP2 DP2 no mask works after commenting attn_mask and is_causal from s…
29c2a9ca DP=2 TP=2 now works even with tied embeddings
8fa760be model.parameters() and model.module.parameters() are empty..
610e6bb0 reformat sanity_check_tensor_sync
75cad51d set atol=1e-4 for CP to pass
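The sanity_check_tensor_sync and atol commits amount to comparing a tensor across ranks within a tolerance (CP needed atol=1e-4 to pass). A sketch of that pattern, assuming the real helper's signature differs:

```python
import torch
import torch.distributed as dist

def sanity_check_tensor_sync(tensor, group, atol=1e-4):
    # Gather the tensor from every rank in `group` and check they agree within atol.
    world = dist.get_world_size(group)
    gathered = [torch.empty_like(tensor) for _ in range(world)]
    dist.all_gather(gathered, tensor.contiguous(), group=group)
    for other in gathered[1:]:
        torch.testing.assert_close(gathered[0], other, atol=atol, rtol=0)
```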
b816a3cc try populate _parameters from named_modules
688107c0 refactors
cfe688b4 is_causal=True and pack sequences, no attn mask, and preshuffle dataset
83095210 fix packing
c0f616ee CP=4 doesn't work
011d981e fix labels and position_ids for CP
265f90dc DP CP works with transformers 🥳🥳🥳
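The "fix labels and position_ids for CP" commit is about giving each context-parallel rank its slice of a packed sequence while keeping global positions and next-token labels consistent. A rough sketch under those assumptions (packed sequences, causal attention, no mask, contiguous per-rank chunks); the PR's actual slicing is likely more involved:

```python
import torch

def shard_for_cp(input_ids, cp_rank, cp_size):
    # input_ids: (batch, seq_len), packed sequences, causal attention, no mask.
    seq_len = input_ids.size(-1)
    assert seq_len % cp_size == 0
    chunk = seq_len // cp_size
    start, end = cp_rank * chunk, (cp_rank + 1) * chunk

    # Labels are the inputs shifted by one over the *full* sequence, then sliced.
    labels = torch.roll(input_ids, shifts=-1, dims=-1)
    labels[..., -1] = -100

    # position_ids must be global positions, not per-chunk offsets.
    position_ids = torch.arange(seq_len, device=input_ids.device).expand_as(input_ids)

    return input_ids[..., start:end], labels[..., start:end], position_ids[..., start:end]
```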
afa72e24 refactor
75176794 add example cp
835726da fixup
0ad2a156 revert sdpa changes
5b119645 example cleared
7855d102 add CP, DP to the mesh init
0b2bd157 nit
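With CP and DP added to the mesh init, the full layout is a 3D mesh. A sketch with illustrative dim names and sizes, meant to run under torchrun with dp*cp*tp processes:

```python
from torch.distributed.device_mesh import init_device_mesh

dp, cp, tp = 2, 2, 2
mesh = init_device_mesh("cuda", (dp, cp, tp), mesh_dim_names=("dp", "cp", "tp"))

dp_mesh = mesh["dp"]   # gradient averaging / FSDP
cp_mesh = mesh["cp"]   # splits the sequence dimension
tp_mesh = mesh["tp"]   # shards attention/MLP weights
```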
ArthurZucker commented on 2025-05-15
c82d39ce clean
957c351e use `ALL_PARALLEL_STYLES`
6d462e9f Merge branch 'nouamane/nanotron' of github.com:huggingface/transforme…
43c175d0 style
378b2e7b FSDP works
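The "FSDP works" commit layers data parallelism on top of the TP-sharded model. A rough sketch using torch's FSDP over a "dp" sub-mesh; the PR's Trainer integration wires this up differently and the module below is only a stand-in:

```python
import torch.nn as nn
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

mesh = init_device_mesh("cuda", (2, 2), mesh_dim_names=("dp", "tp"))
model = nn.Linear(1024, 1024).cuda()      # stand-in for the real TP-sharded model
model = FSDP(model, device_mesh=mesh["dp"], use_orig_params=True)
```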
30752c63 log on 1 rank
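Logging on a single rank keeps multi-process runs readable. A small sketch of the pattern, assuming a plain torch.distributed rank check rather than whatever gate the PR actually uses:

```python
import logging
import torch.distributed as dist

logger = logging.getLogger(__name__)

def log_rank0(msg):
    # Only rank 0 emits the message; single-process runs log normally.
    if not dist.is_initialized() or dist.get_rank() == 0:
        logger.info(msg)
```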
9c1e1fc2 .
3f683b6e fix?
d36acced Merge branch 'nouamane/nanotron' of github.com:huggingface/transforme…
780d74d3 FSDP1 also has .parameters() bug
9e549694 reported gradnorm when using FSDP1 is wrong, but loss is correct so i…
ba01287a .
677ce533 style and fixup
81c21de9 move stuff around
656277c5 Merge branch 'main' of github.com:huggingface/transformers into nouam…
e27ddb85 fix tests
d702d94d style
5083c0b0 let's make it a check
67a81826 warning should be an info
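The `ALL_PARALLEL_STYLES` and "let's make it a check" commits suggest validating each tp_plan entry against the supported parallel styles. A hypothetical sketch of that check; the function name and style set below are illustrative, not the library's actual API:

```python
# Illustrative only: SUPPORTED_STYLES and validate_tp_plan are not transformers APIs.
SUPPORTED_STYLES = {"colwise", "rowwise", "colwise_rep", "rowwise_rep"}

def validate_tp_plan(tp_plan):
    for pattern, style in tp_plan.items():
        if style not in SUPPORTED_STYLES:
            raise ValueError(
                f"Unsupported parallel style {style!r} for {pattern!r}; "
                f"expected one of {sorted(SUPPORTED_STYLES)}"
            )

validate_tp_plan({"model.layers.*.self_attn.q_proj": "colwise"})
```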
ArthurZucker enabled auto-merge (squash) 237 days ago
ArthurZucker disabled auto-merge 237 days ago (manually disabled by user)
ArthurZucker merged 1c2f36b4 into main 237 days ago
ArthurZucker deleted the nouamane/nanotron branch 237 days ago
LysandreJik restored the head branch 237 days ago
Reviewers: ArthurZucker, S1ro1, qubvel
Assignees: No one assigned
Labels: Core: Modeling, Tensor Parallel
Milestone: No milestone