first pass
9f107d37
trigger ci on dummy channel and dummy repo id
4cf64ab1
add a model jobs workflow and reduce model splits
f10b1960
add runner scale set
7815fc4a
add docker image
7896ef36
test with model jobs only
8dcbe6b8
remove mounting
584e697e
from model jobs as well
25e19127
test again
67c566f6
check
3e79fdc0
use runner groups
2d3cd447
checkout transformers code
1d2e747f
remove clean up
ab7bd005
use curr dir
920e1bf0
test again
50b280b5
test single workflow
1a1e70aa
test
faa3d032
test without the ids
d6dfa081
fix
be06f2ec
again
7b5ba3a6
test
f9de3144
fix
7174fe01
fix
4c39c9c4
matrix folders
544222c5
two splits
75347394
fix
98d53b69
test
ee94b556
fix
1b45fde5
run other tests
7bbde0ee
fix dep
e55636d3
use canonical job names
79caf899
fix and non lazy mode
cd30bc39
add fsdp tests and disable model ci entirely for now
003cdce6
add librosa and soundfile
27102d59
use model jobs for fsdp tests
d881b66f
test model jobs as well
039a9b8c
fix
d0d0fb12
quant matrix
0308c71c
fix
249b0780
remove omp num threads
0ff48353
default to hpu_backend if not passed to torch.compile
cff1bb66
fix
7bc0e12e
enable int64
401ed188
test mounting the cache
f1f843d2
remove parallelism flags
8847e35a
fix device dispatch
a97732b5
fix sdpa atol/rtol for hpu
edd35f14
force hpu_backend all the time
8b6d0a02
add run_first decorator and disable fsdp2
7bc8e378
add deepspeed run_first decorators
d9c2cb70
fix multiprocessing on habana
38b35958
fix more distributed tests that require running first
23b34de0
Merge branch 'main' into gaudi-ci
95b9ac6e
fix machine types
bf706f6b
Merge branch 'main' into gaudi-ci
b3ec8e8c
skip parallelism tests
db46c855
use new slack channel and report repo
ab951b1e
add cap_sys
a8ec639b
fix bug in test_trainer_distributed
e385d118
push
4189a566
remove forced test splits
4e97c343
Merge branch 'main' into gaudi-ci
4cdb35b9
ydshieh approved these changes on 2025-06-19
added comment for hpu_backend_compile patch
9301161e
added comment for squad_convert_examples_to_features patch
60bbe09f
test
30a69ede
remove require_torch_gpu from fsdpv2 tests
eab20c99
update synapse ai version
e9c4cee0
fix fp8
a0c10767
run all models
fb986b27
add parallelism
6e3f5994
style
0eb7a80b
regisss approved these changes on 2025-06-20
Merge branch 'main' into gaudi-ci
d7478dcd
Apply suggestions from code review
9d867f16