PR #352 Distill megatron - test Draft WIP

younesbelkada wants to merge 175 commits into bigscience-workshop:main from younesbelkada:distill_megatron

use relu

b986a833

first commit

d8a0d94f

modify example file

27b014ad

modify

e76572b6

modify

dcbc5ca5

modify

2b278356

replace function name

20e7b3b4

hack to support teacher student loading

3145fb49

modify script

03c80b51

fix arg name

5e5ea3dc

fix arg name

06aeeb01

update args

8b55d485

modify

d082ee69

fix

90fcc4f5

change order

0555a1c0

Merge branch 'main' of https://github.com/bigscience-workshop/Megatro…

ab659e39

Merge branch 'bigscience-workshop-main' into distill_megatron

7e15c4c9

Revert "use relu"

170e55ad

add attn mask type

868fa5ce

remove unused files

4472deb4

add distill step

5c98b4a5

update print

85dc0af3

small update

1786b9e3

fix kwarg

1f3a523e

oops

d0eeca49

add kwarg

cf395683

add student

ca0f3581

add kwarg

6a933e08

fix num heads

7f920c36

uncomment assert

6f4da7be

add new args

de4bd828

modify

65039461

update to student model

26d9ab06

attempt to add teacher model

854ac094

add print statements

76c21a22

add teacher

8c301579

revert

fbd49145

attempt

adbe9d7a

add import

f3bcd588

new try

3f97a09e

fix import

9b8ae00b

try

846f187e

try

f031522c

fix

c54b751f

fix

6c457202

try

07fdbd38

try with module

58d9c34a

remove unused args

f3c838b4

add print

94c63a1d

add tuple

a9c529b6

add print

210a66d3

cfab3ce5

print rank

e21bffd1

modify

7b73794d

hack test

e0fe73b7

add ce loss

4ed1dc72

add mean

faa0bac8

add eval

067286e3

update specs

dc3402dc

add print

2ce7c804

update loss

66ab26b3

use torch.sum

29607f49

add prints

9d5ef883

add correct shapes

ad292489

remove print

da2e0161

detach logits

16234431

replace by mean

6a6aa179

change batch size

ccd71a33

change micro batch size

500d4bd1

change bs

31412a96

add teacher path

b3349fb3

add teacher load - v1

9da68d13

add student load

f30512be

discard student load

36cfbd5a

correct place checkpoint

c414a3d5

add exit

e9b96600

print( class name

d2cabf52

add not load lr scheduler

0054aa8c

put 176 config

e39156ec

try new pp

37b7ccf8

fix gpu per node

a74ca6f1

fix bf16

f34c111e

add bf16

57e82c79

put correct student size

f4406cc0

del intermediate arg

41d5d3c0

add new pp size

f9476ec1

try new tp/pp

50771b38

remove twice

0baf0878

use correct tp/pp

6ff1a194

remove exit

4ed8e82f

try

2e965d26

fix

5c179e2c

student

51003589

fix

9bd591d7

modify

c0ea4795

add print

572dd4eb

fix

e506c799

remove load

27e683d1

add exit

98c94d61

remove exit

aaf6b18b

mbs=1

9105b446

remove assert

8bd3e6c6

add prints for debug

3e97852c

remove useless prints

db6c30ff

debug

cdd6753e

debug

94243ccc

try without pos ids

38633a24

fix

749647b0

fix

741d4256

try

b507ac52

try with eval batch

d6ce38e9

debug

4134f8b5

change order

b7adeccd

quick fix

295b4fe8

fix tp

0bcd87c1

add more prints

f52162a9

exit

cab4828d

debug

4ecbede0

a900ab38

4c9dc25a

remove prints

ca2a5181

try

0e4d6649

clean up

d68844f9

rename func

b824ba89

younesbelkada commented on 2022-10-03

try pp

10639bda

refactor loss

428e7143

small model

dc67de49

remove unpack

1adcec4f

add print

3f58d4c3

change tuple

80265f7d

try with TP=2

b1bbdc90

pp = 2

9771c37a

pp=1

37cd73a6

use eval_batch

ea9cd571

try

c3c46f34

try

00f9d653

use tuple

fceb22d5

try

e161eca1

use none

bea1e304

use array

ccff1f33

try

bc943d1a

don't use iter

bff85eea

use bs of 2

30df7262

print inputs

cd3d53ac

try

df5434d5

use data instead

b54ef196

add comments

60cfaa70

add print

ecad25d6

tp=2

10678047

add more prints

cf6445e3

add exit

6938c6d8

dp tp

be466baf

add print shape

e3e7a51e

try for debug

3db071c9

use eval batch

5abb6427

add print

1ea31927

add more comments

555d48c8

use eval batch

35e62e47

try

7f7f1ff2

try outside

28c5ab55

fix

e25f04ba

try

f6372ac7

debug

f17dce24

try

7e0e3af9

import deepcopy

053081a3

remove print

490a22d8

remove unused import

0976cf07

correct arg name

bcff4421

fix pp

1c4dbdeb

tp=1

8bc5f962

prints

c05a01dc

more prints

c1b70194

test without PP

38af172b

add print

e6b6d047

remove comment

81e5df82

younesbelkada closed this 3 years ago

Megatron-DeepSpeed Distill megatron - test Draft WIP #352 Closed
