Distill megatron - test Draft WIP #352
use relu
b986a833
first commit
d8a0d94f
modify example file
27b014ad
modify
e76572b6
modify
dcbc5ca5
modify
2b278356
replace function name
20e7b3b4
hack to support teacher student loading
3145fb49
modify script
03c80b51
fix arg name
5e5ea3dc
fix arg name
06aeeb01
update args
8b55d485
modify
d082ee69
fix
90fcc4f5
change order
0555a1c0
Merge branch 'main' of https://github.com/bigscience-workshop/Megatro…
ab659e39
Merge branch 'bigscience-workshop-main' into distill_megatron
7e15c4c9
Revert "use relu"
170e55ad
add attn mask type
868fa5ce
remove unused files
4472deb4
add distill step
5c98b4a5
update print
85dc0af3
small update
1786b9e3
fix kwarg
1f3a523e
oops
d0eeca49
add kwarg
cf395683
add student
ca0f3581
add kwarg
6a933e08
fix num heads
7f920c36
uncomment assert
6f4da7be
add new args
de4bd828
modify
65039461
update to student model
26d9ab06
attempt to add teacher model
854ac094
add print statements
76c21a22
add teacher
8c301579
revert
fbd49145
attempt
adbe9d7a
add import
f3bcd588
new try
3f97a09e
fix import
9b8ae00b
try
846f187e
try
f031522c
fix
c54b751f
fix
6c457202
try
07fdbd38
try with module
58d9c34a
remove unused args
f3c838b4
add print
94c63a1d
add tuple
a9c529b6
add print
210a66d3
print
cfab3ce5
print rank
e21bffd1
modify
7b73794d
hack test
e0fe73b7
add ce loss
4ed1dc72
add mean
faa0bac8
add eval
067286e3
update specs
dc3402dc
add print
2ce7c804
update loss
66ab26b3
use torch.sum
29607f49
add prints
9d5ef883
add correct shapes
ad292489
remove print
da2e0161
detach logits
16234431
replace by mean
6a6aa179
change batch size
ccd71a33
change micro batch size
500d4bd1
change bs
31412a96
add teacher path
b3349fb3
add teacher load - v1
9da68d13
add student load
f30512be
discard student load
36cfbd5a
correct place checkpoint
c414a3d5
add exit
e9b96600
print class name
d2cabf52
add not load lr scheduler
0054aa8c
put 176 config
e39156ec
try new pp
37b7ccf8
fix gpu per node
a74ca6f1
fix bf16
f34c111e
add bf16
57e82c79
put correct student size
f4406cc0
del intermediate arg
41d5d3c0
add new pp size
f9476ec1
try new tp/pp
50771b38
remove twice
0baf0878
use correct tp/pp
6ff1a194
remove exit
4ed8e82f
try
2e965d26
fix
5c179e2c
student
51003589
fix
9bd591d7
modify
c0ea4795
add print
572dd4eb
fix
e506c799
remove load
27e683d1
add exit
98c94d61
remove exit
aaf6b18b
mbs=1
9105b446
remove assert
8bd3e6c6
add prints for debug
3e97852c
remove useless prints
db6c30ff
debug
cdd6753e
debug
94243ccc
try without pos ids
38633a24
fix
749647b0
fix
741d4256
try
b507ac52
try with eval batch
d6ce38e9
debug
4134f8b5
change order
b7adeccd
quick fix
295b4fe8
fix tp
0bcd87c1
add more prints
f52162a9
exit
cab4828d
debug
4ecbede0
print
a900ab38
print
4c9dc25a
remove prints
ca2a5181
try
0e4d6649
clean up
d68844f9
rename func
b824ba89
try pp
10639bda
refactor loss
428e7143
small model
dc67de49
remove unpack
1adcec4f
add print
3f58d4c3
change tuple
80265f7d
try with TP=2
b1bbdc90
pp = 2
9771c37a
pp=1
37cd73a6
use eval_batch
ea9cd571
try
c3c46f34
try
00f9d653
use tuple
fceb22d5
try
e161eca1
use none
bea1e304
use array
ccff1f33
try
bc943d1a
don't use iter
bff85eea
use bs of 2
30df7262
print inputs
cd3d53ac
try
df5434d5
use data instead
b54ef196
add comments
60cfaa70
add print
ecad25d6
tp=2
10678047
add more prints
cf6445e3
add exit
6938c6d8
dp tp
be466baf
add print shape
e3e7a51e
try for debug
3db071c9
use eval batch
5abb6427
add print
1ea31927
add more comments
555d48c8
use eval batch
35e62e47
try
7f7f1ff2
try outside
28c5ab55
fix
e25f04ba
try
f6372ac7
debug
f17dce24
try
7e0e3af9
import deepcopy
053081a3
remove print
490a22d8
remove unused import
0976cf07
correct arg name
bcff4421
fix pp
1c4dbdeb
tp=1
8bc5f962
prints
c05a01dc
more prints
c1b70194
test without PP
38af172b
add print
e6b6d047
remove comment
81e5df82
Assignees
No one assigned