Megatron-DeepSpeed
Distill megatron - test Draft WIP
#352
Closed

younesbelkada
younesbelkada use relu
b986a833
younesbelkada first commit
d8a0d94f
younesbelkada modify example file
27b014ad
younesbelkada modify
e76572b6
younesbelkada modify
dcbc5ca5
younesbelkada modify
2b278356
younesbelkada replace function name
20e7b3b4
younesbelkada hack to support teacher student loading
3145fb49
younesbelkada modify script
03c80b51
younesbelkada fix arg name
5e5ea3dc
younesbelkada fix arg name
06aeeb01
younesbelkada update args
8b55d485
younesbelkada modify
d082ee69
younesbelkada fix
90fcc4f5
younesbelkada change order
0555a1c0
younesbelkada Merge branch 'main' of https://github.com/bigscience-workshop/Megatro…
ab659e39
younesbelkada Merge branch 'bigscience-workshop-main' into distill_megatron
7e15c4c9
younesbelkada Revert "use relu"
170e55ad
younesbelkada add attn mask type
868fa5ce
younesbelkada remove unused files
4472deb4
younesbelkada add distill step
5c98b4a5
younesbelkada update print
85dc0af3
younesbelkada small update
1786b9e3
younesbelkada fix kwarg
1f3a523e
younesbelkada oops
d0eeca49
younesbelkada add kwarg
cf395683
younesbelkada add student
ca0f3581
younesbelkada add kwarg
6a933e08
younesbelkada fix num heads
7f920c36
younesbelkada uncomment assert
6f4da7be
younesbelkada add new args
de4bd828
younesbelkada modify
65039461
younesbelkada update to student model
26d9ab06
younesbelkada attempt to add teacher model
854ac094
younesbelkada add print statements
76c21a22
younesbelkada add teacher
8c301579
younesbelkada revert
fbd49145
younesbelkada attempt
adbe9d7a
younesbelkada add import
f3bcd588
younesbelkada new try
3f97a09e
younesbelkada fix import
9b8ae00b
younesbelkada try
846f187e
younesbelkada try
f031522c
younesbelkada fix
c54b751f
younesbelkada fix
6c457202
younesbelkada try
07fdbd38
younesbelkada try with module
58d9c34a
younesbelkada remove unused args
f3c838b4
younesbelkada add print
94c63a1d
younesbelkada add tuple
a9c529b6
younesbelkada add print
210a66d3
younesbelkada print
cfab3ce5
younesbelkada print rank
e21bffd1
younesbelkada modify
7b73794d
younesbelkada hack test
e0fe73b7
younesbelkada add ce loss
4ed1dc72
younesbelkada add mean
faa0bac8
younesbelkada add eval
067286e3
younesbelkada update specs
dc3402dc
younesbelkada add print
2ce7c804
younesbelkada update loss
66ab26b3
younesbelkada use torch.sum
29607f49
younesbelkada add prints
9d5ef883
younesbelkada add correct shapes
ad292489
younesbelkada remove print
da2e0161
younesbelkada detach logits
16234431
younesbelkada replace by mean
6a6aa179
younesbelkada change batch size
ccd71a33
younesbelkada change micro batch size
500d4bd1
younesbelkada change bs
31412a96
younesbelkada add teacher path
b3349fb3
younesbelkada add teacher load - v1
9da68d13
younesbelkada add student load
f30512be
younesbelkada discard student load
36cfbd5a
younesbelkada correct place checkpoint
c414a3d5
younesbelkada add exit
e9b96600
younesbelkada print( class name
d2cabf52
younesbelkada add not load lr scheduler
0054aa8c
younesbelkada put 176 config
e39156ec
younesbelkada try new pp
37b7ccf8
younesbelkada fix gpu per node
a74ca6f1
younesbelkada fix bf16
f34c111e
younesbelkada add bf16
57e82c79
younesbelkada put correct student size
f4406cc0
younesbelkada del intermediate arg
41d5d3c0
younesbelkada add new pp size
f9476ec1
younesbelkada try new tp/pp
50771b38
younesbelkada remove twice
0baf0878
younesbelkada use correct tp/pp
6ff1a194
younesbelkada remove exit
4ed8e82f
younesbelkada try
2e965d26
younesbelkada fix
5c179e2c
younesbelkada student
51003589
younesbelkada fix
9bd591d7
younesbelkada modify
c0ea4795
younesbelkada add print
572dd4eb
younesbelkada fix
e506c799
younesbelkada remove load
27e683d1
younesbelkada add exit
98c94d61
younesbelkada remove exit
aaf6b18b
younesbelkada mbs=1
9105b446
younesbelkada remove assert
8bd3e6c6
younesbelkada add prints for debug
3e97852c
younesbelkada remove useless prints
db6c30ff
younesbelkada debug
cdd6753e
younesbelkada debug
94243ccc
younesbelkada try without pos ids
38633a24
younesbelkada fix
749647b0
younesbelkada fix
741d4256
younesbelkada try
b507ac52
younesbelkada try with eval batch
d6ce38e9
younesbelkada debug
4134f8b5
younesbelkada change order
b7adeccd
younesbelkada quick fix
295b4fe8
younesbelkada fix tp
0bcd87c1
younesbelkada add more prints
f52162a9
younesbelkada exit
cab4828d
younesbelkada debug
4ecbede0
younesbelkada print
a900ab38
younesbelkada print
4c9dc25a
younesbelkada remove prints
ca2a5181
younesbelkada try
0e4d6649
younesbelkada clean up
d68844f9
younesbelkada rename func
b824ba89
younesbelkada commented on 2022-10-03
younesbelkada try pp
10639bda
younesbelkada refactor loss
428e7143
younesbelkada small model
dc67de49
younesbelkada remove unpack
1adcec4f
younesbelkada add print
3f58d4c3
younesbelkada change tuple
80265f7d
younesbelkada try with TP=2
b1bbdc90
younesbelkada pp = 2
9771c37a
younesbelkada pp=1
37cd73a6
younesbelkada use eval_batch
ea9cd571
younesbelkada try
c3c46f34
younesbelkada try
00f9d653
younesbelkada use tuple
fceb22d5
younesbelkada try
e161eca1
younesbelkada use none
bea1e304
younesbelkada use array
ccff1f33
younesbelkada try
bc943d1a
younesbelkada don't use iter
bff85eea
younesbelkada use bs of 2
30df7262
younesbelkada print inputs
cd3d53ac
younesbelkada try
df5434d5
younesbelkada use data instead
b54ef196
younesbelkada add comments
60cfaa70
younesbelkada add print
ecad25d6
younesbelkada tp=2
10678047
younesbelkada add more prints
cf6445e3
younesbelkada add exit
6938c6d8
younesbelkada dp tp
be466baf
younesbelkada add print shape
e3e7a51e
younesbelkada try for debug
3db071c9
younesbelkada use eval batch
5abb6427
younesbelkada add print
1ea31927
younesbelkada add more comments
555d48c8
younesbelkada use eval batch
35e62e47
younesbelkada try
7f7f1ff2
younesbelkada try outside
28c5ab55
younesbelkada fix
e25f04ba
younesbelkada try
f6372ac7
younesbelkada debug
f17dce24
younesbelkada try
7e0e3af9
younesbelkada import deepcopy
053081a3
younesbelkada remove print
490a22d8
younesbelkada remove unused import
0976cf07
younesbelkada correct arg name
bcff4421
younesbelkada fix pp
1c4dbdeb
younesbelkada tp=1
8bc5f962
younesbelkada prints
c05a01dc
younesbelkada more prints
c1b70194
younesbelkada test without PP
38af172b
younesbelkada add print
e6b6d047
younesbelkada remove comment
81e5df82
younesbelkada closed this 3 years ago

Reviewers: No reviews
Assignees: No one assigned