Train mem usage and other improvements #2439
fix track_max_mem in forward_batch_wo_cache_flash_attn_train
5d124d0c
remove unnecessary Adam(W) optimizer tensors.
d39c8e68
add gradient clipping to AdamW
d395b19c
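Global-norm gradient clipping caps the combined L2 norm of all gradients before the optimizer step. A minimal sketch of the idea (not the exact ggml code; names here are illustrative):

```c
#include <math.h>
#include <stdint.h>

// If the L2 norm of the gradients exceeds `clip`, scale them all down so the
// combined norm equals `clip`; otherwise leave them untouched.
static void clip_gradients(float * grads, int64_t n, float clip) {
    double sum = 0.0;
    for (int64_t i = 0; i < n; ++i) {
        sum += (double) grads[i] * grads[i];
    }
    const double norm = sqrt(sum);
    if (norm > clip) {
        const float scale = (float) (clip / norm);
        for (int64_t i = 0; i < n; ++i) {
            grads[i] *= scale;
        }
    }
}
```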
Fix reset of unused g->nodes and g->grads to NULL
d7003a98
implement gradient checkpointing for training
6e3f95bf
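Gradient checkpointing trades compute for memory: instead of keeping every layer's activations for the backward pass, only a subset is stored and the layers in between are recomputed. Keeping roughly sqrt(n_layers) evenly spaced checkpoints reduces activation memory from O(n) to O(sqrt(n)) at the cost of about one extra forward pass. An illustrative sketch of the checkpoint spacing (not taken from this PR's code):

```c
#include <math.h>
#include <stdio.h>

int main(void) {
    const int n_layers = 32;
    // ~sqrt(n) evenly spaced checkpoints; intermediate layers are recomputed
    const int step = (int) ceil(sqrt((double) n_layers));
    for (int il = 0; il < n_layers; il += step) {
        printf("store activation of layer %d as checkpoint\n", il);
    }
    return 0;
}
```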
remove unused compute buffer 3
e05e4414
add and use function ggml_build_backward_expand to avoid stack overfl…
ed4319e1
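The older pattern of returning a whole `struct ggml_cgraph` by value is what overflowed the stack, since the graph struct holds the full node arrays. The `_expand` variant instead fills a caller-allocated graph. Roughly the shape of the API introduced here (see the `ggml.h` of this revision for the authoritative signature):

```c
#include <stdbool.h>

struct ggml_context;
struct ggml_cgraph;

// expands the backward graph into `gb` in place instead of returning a
// large ggml_cgraph by value on the stack
void ggml_build_backward_expand(
    struct ggml_context * ctx,
    struct ggml_cgraph  * gf,   // forward graph
    struct ggml_cgraph  * gb,   // backward graph, filled in place
    bool                  keep);
```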
change AdamW decay parameter to work like the torch AdamW decay param…
a80f184e
change default AdamW weight decay parameter used in training to 0.1 a…
f175ead6
change default AdamW weight decay parameter defined in ggml to 0.0, m…
97964a4c
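The three commits above switch ggml's AdamW to torch-style decoupled weight decay: the weight is shrunk directly by `lr * wd` each step rather than having `wd * x` folded into the gradient, the training default becomes 0.1, and ggml's own default becomes 0.0. A single-parameter sketch of the decoupled update (variable names are illustrative, not ggml's):

```c
#include <math.h>

typedef struct {
    float lr, beta1, beta2, eps, wd;
} adamw_params;

// one AdamW update for a single parameter x with gradient g;
// m and v are the running first/second moment estimates, t the step count
static float adamw_step(float x, float g, float * m, float * v,
                        int t, const adamw_params * p) {
    *m = p->beta1 * (*m) + (1.0f - p->beta1) * g;
    *v = p->beta2 * (*v) + (1.0f - p->beta2) * g * g;
    const float mhat = *m / (1.0f - powf(p->beta1, (float) t));
    const float vhat = *v / (1.0f - powf(p->beta2, (float) t));
    x *= 1.0f - p->lr * p->wd; // decoupled decay, applied to the weight itself
    return x - p->lr * mhat / (sqrtf(vhat) + p->eps);
}
```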
bug fixes for cross entropy loss
2c6985f7
fix test-grad0 for cross_entropy_loss
2d1e6e06
fix test-grad0 for soft_max
864e7e3a
improve finite differences of test-grad0 by using double instead of f…
87febeec
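test-grad0 compares analytic gradients against central finite differences; at small step sizes the subtraction `f(x+h) - f(x-h)` loses most of its significant bits in float, so evaluating it in double is what allows the tighter error bounds. The scheme, in brief:

```c
// central finite difference, computed in double to limit cancellation error:
// f'(x) ≈ (f(x+h) - f(x-h)) / (2h)
static double finite_diff(double (*f)(double), double x, double h) {
    return (f(x + h) - f(x - h)) / (2.0 * h);
}
```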
change cross_entropy_loss to output average over all rows
51dc7709
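Averaging over rows instead of summing makes the loss value and gradient magnitude independent of the batch size, so learning rates need not be retuned when the batch changes. A minimal sketch of the averaged form (illustrative; the clamp stands in for whatever epsilon the real code uses):

```c
#include <math.h>
#include <stdint.h>

// probs and labels are n_rows x n_cols row-wise probability distributions
static float cross_entropy_mean(const float * probs, const float * labels,
                                int64_t n_rows, int64_t n_cols) {
    double loss = 0.0;
    for (int64_t r = 0; r < n_rows; ++r) {
        for (int64_t c = 0; c < n_cols; ++c) {
            const double p = probs[r*n_cols + c];
            loss -= labels[r*n_cols + c] * log(p > 1e-10 ? p : 1e-10);
        }
    }
    return (float) (loss / (double) n_rows); // average over rows, not the sum
}
```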
improve gradient checkpointing
3744a9be
disable gradient checkpointing debug output
fc379a2d
llama : fix rope usage in train-text-from-scratch after ChatGLM change
d0fbb7d3
add more training parameters:
c6a18e15
replace memcpy with reshape operation so that the graph is not cut at…
ce937bc4
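A `memcpy` happens outside the compute graph, so automatic differentiation cannot propagate gradients through it; expressing the same shape change as a reshape op keeps the backward graph connected. A sketch of the substitution, assuming the usual ggml reshape API:

```c
#include "ggml.h"

static struct ggml_tensor * as_2d(struct ggml_context * ctx,
                                  struct ggml_tensor * t,
                                  int64_t ne0, int64_t ne1) {
    // the replaced pattern was roughly:
    //   struct ggml_tensor * out = ggml_new_tensor_2d(ctx, t->type, ne0, ne1);
    //   memcpy(out->data, t->data, ggml_nbytes(t)); // invisible to autodiff
    // a graph-visible reshape lets gradients flow through instead:
    return ggml_reshape_2d(ctx, t, ne0, ne1);
}
```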
remove unused function argument from get_example_targets_batch
ff759d95
measure and print total training time
e843d6e7
add optimization callback to ggml_opt_resume_g
bfc31191
use optimization callback in training
d7aa4d95
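The callback gives the trainer a hook inside the optimizer loop, e.g. to apply the learning-rate schedule and load the next batch between iterations. A sketch of the idea; the callback type actually added to ggml may carry different parameters:

```c
// hypothetical callback shape, not necessarily ggml's exact typedef
typedef void (*opt_callback)(void * data, float * sched);

static void train_callback(void * data, float * sched) {
    (void) data;   // trainer state: datasets, counters, ...
    *sched = 0.5f; // e.g. write back the cosine-decayed learning-rate factor
}
```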
add minimum number of tensor dimensions to apply weight decay (defaul…
e6ff0728
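Weight decay is usually wanted only on weight matrices; 1-D tensors such as biases and norm weights are better left undecayed, which is what a minimum-dimension threshold expresses. Illustrative sketch:

```c
// apply decay only to tensors with at least `min_dims` dimensions,
// so 1-D bias and norm tensors are skipped
static float decay_for(int n_dims, float wd, int min_dims) {
    return n_dims >= min_dims ? wd : 0.0f;
}
```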
rename training parameter cos-decay-alpha to cos-decay-min and clarif…
58024d3e
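The renamed parameter is the floor of the cosine learning-rate schedule: the factor decays from 1.0 down to `cos-decay-min`. A sketch of the schedule, assuming the standard cosine form:

```c
#include <math.h>

// decays from 1.0 at step 0 to `min` at step `steps`
static float cos_decay(int step, int steps, float min) {
    if (step >= steps) return min;
    const float t = (float) step / (float) steps;
    return min + 0.5f * (1.0f - min) * (1.0f + cosf(3.14159265f * t));
}
```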
fix increase of model.train_samples and model.train_tokens
17a0898d
change sampling parameters for prediction after training to defaults …
24a4b099
tighten abs error bounds for cross_entropy_loss in test-grad0
1065c3b7
add conditional compilation of using F16 exp in flash attention
dbbc2633
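The conditional compilation selects between a cheap half-precision exp and full-precision `expf` inside the flash-attention softmax, trading a little accuracy for speed. The pattern looks roughly like this (both the macro and the F16 helper are illustrative names, not the ones in ggml):

```c
#include <math.h>

static float attn_exp(float x) {
#ifdef TRAIN_FLASH_ATTN_F16_EXP
    return approx_exp_f16(x); // hypothetical F16 table lookup
#else
    return expf(x);           // full-precision fallback
#endif
}
```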
tighten abs error bounds for flash_attn in test-grad0
47055c92
tighten abs error bounds for sqrt in test-grad0
0f6a8ab5
remove out-commented vectorized code of opt_adam
87035b96
ggml : update ggml_rms_norm_back with configurable eps
ecdc1616
llama training : fix ggml_rms_norm_back calls to pass configurable eps
c1a5e116
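RMSNorm scales by `1/sqrt(mean(x^2) + eps)`; the backward pass has to use exactly the same eps as the forward pass, which is why `ggml_rms_norm_back` gained a configurable parameter. The forward computation for reference (an illustrative sketch, not the ggml kernel):

```c
#include <math.h>
#include <stdint.h>

static void rms_norm(const float * x, float * y, int64_t n, float eps) {
    double sum = 0.0;
    for (int64_t i = 0; i < n; ++i) {
        sum += (double) x[i] * x[i];
    }
    const float scale = 1.0f / sqrtf((float) (sum / (double) n) + eps);
    for (int64_t i = 0; i < n; ++i) {
        y[i] = x[i] * scale;
    }
}
```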
remove trailing whitespace
22cb368d
Merge branch 'master' into pr-train-mem-usage-improvements
d43af4b5
add train function using automatic gradient checkpointing backward pa…
2bf422ea
in train function replace add_inplace by regular add
fc826c8e
don't use allocate hash_map on context
d4374154
correctly clone reshape and permute operations by also cloning tensor…
cfddc36b
fix variable name and add missing type cast
0dd496c5
terminate recursive tensor cloning when reaching tensor without src t…
52c92c0a
correctly clone view tensors by setting data pointers
345f516f
fix variable names
5a11b758
swap arguments to commutative ops to be the same as in `forward_batch…
b2f13101
add input tensors as checkpoints
5884b43a
fix variable name and add missing boolean negation
9716eb8e
make sure some tensors are not reallocated by inserting new temporary…
38f4438c
fix ASSERT to work with zero layers
d6c5b038
add training options whether to use allocator and/or unified training…
4ed096c6
integrate unified training function which may use memory allocator
865c4cd3
format name of cloned tensors with " (clone)" suffix
3e99a8d6
set names for tensors in unified train function for easier debugging
75baed23
allocate graph on context using ggml_new_graph
fe788a1c
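Allocating the graph inside the ggml context keeps the large `ggml_cgraph` struct out of stack and static storage. Assuming the `ggml_new_graph` signature of this era:

```c
#include "ggml.h"

static struct ggml_cgraph * make_graph(struct ggml_context * ctx) {
    // the graph lives in the context's memory pool instead of on the stack
    return ggml_new_graph(ctx);
}
```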
remove handwritten training functions
c954f41c
remove unused training parameters "use_scratch" and "use_unified"
271e4d64
remove trailing whitespace
6f161c78
remove unused train params: mem_compute1_gb & mem_compute2_gb
3794dceb
remove unused forward_batch function
6e280b24
add debug asserts in ggml_allocr_alloc to some common pitfalls when u…
faf3e21e
only use ggml_allocr_alloc when tensor has NULL data and is no view
098654c2
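With the graph allocator, a tensor must only be allocated once, and views must never be allocated at all since they alias another tensor's data; the new asserts and this guard enforce that. A sketch of the guard (the view-check helper is an assumed name):

```c
#include <stdbool.h>
#include "ggml.h"
#include "ggml-alloc.h"

static bool tensor_is_view(const struct ggml_tensor * t); // assumed helper

static void maybe_alloc(struct ggml_allocr * alloc, struct ggml_tensor * t) {
    // skip tensors that already own data or alias another tensor's data
    if (t->data == NULL && !tensor_is_view(t)) {
        ggml_allocr_alloc(alloc, t);
    }
}
```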
fix test when to create temporary backward graph
3e6468b0
fix memory "leak" in optimizers
56228461
reverse order of for loop in ggml_build_backward_expand to save memor…
3b5515bb
Merge branch 'master' into pr-train-mem-usage-improvements
0c52c65d
add missing lctx argument to get_example_targets_batch
4072f20b
implement llama model file saving using gguf
f51c5d76
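Saving goes through the GGUF API: build an empty gguf context, attach key/value metadata, register the tensors, and write everything out. A broad sketch (key names other than `general.architecture` would be chosen by the caller):

```c
#include "ggml.h"

static void save_model(const char * fname, struct ggml_tensor ** tensors, int n) {
    struct gguf_context * gctx = gguf_init_empty();
    gguf_set_val_str(gctx, "general.architecture", "llama");
    for (int i = 0; i < n; ++i) {
        gguf_add_tensor(gctx, tensors[i]); // registers metadata and data
    }
    gguf_write_to_file(gctx, fname, false); // false = also write tensor data
    gguf_free(gctx);
}
```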
implement loading/saving of checkpointing files using GGUF
54079813
bug fixes
6a20f7a2
add checkpoint file version for future compatibility
167dd2dc
update readme with gguf filenames
2978e030
save & load opt->just_initialized value
0c494cc6
add first draft for checkpoint conversion script
3a91c975
Merge branch 'master' into pr-train-mem-usage-improvements
a6f3a47c
add gguf arch and ftype
cb42324d
save opt parameter counter as uint64
495a62a1
add gguf key and tensor names for optimizer and training
ef899fbe
add layer_norm_rms_eps to checkpoint convert script
d71069c4
use same GGUF_GET_KEY macro as in llama.cpp
91a4ccaf
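The shared macro wraps the repetitive find-key/check-type/fetch-value dance and can fail hard on required keys. In spirit (simplified to a plain function; see llama.cpp for the real `GGUF_GET_KEY` macro):

```c
#include <stdbool.h>
#include <stdint.h>
#include "ggml.h"

static bool get_u32(const struct gguf_context * gctx,
                    const char * key, uint32_t * dst) {
    const int id = gguf_find_key(gctx, key);
    if (id < 0) {
        return false; // missing key; the macro can treat this as fatal
    }
    *dst = gguf_get_val_u32(gctx, id);
    return true;
}
```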
use norm_rms_eps, and rope parameters and command line options to set…
0b2c85b0
fix memory corruption bug in gguf
ca5b344f
add gguf example cmake file
5d94997a
bug fixes in tokenize_file
76d2794e
bug fixes in load_llama_model_gguf
4882ff0c
bug fix: init model when no checkpoint was loaded
152cfaac
bug fix in read_tensor_by_name
1f833434
bug fix in load_opt_context_gguf
3d8d8840
avoid printing lots of spaces in the unusual case that loss gets NaN
e86b3e32
set name of tensors with empty name from what was read from gguf
daa0b6c6
remove trailing whitespace
f97f92bc
print data checksums before saving and after loading to verify correc…
c690c203
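Printing a checksum of each tensor before saving and after loading catches silent corruption in the round trip. Any order-sensitive hash works; an illustrative one (the PR's exact checksum may differ):

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t checksum(const void * data, size_t nbytes) {
    const uint8_t * p = (const uint8_t *) data;
    uint32_t sum = 0;
    for (size_t i = 0; i < nbytes; ++i) {
        sum = sum * 31 + p[i]; // order-sensitive rolling hash
    }
    return sum;
}
```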
bug fixes for convert-train-checkpoint-to-gguf
5f27ade4
temporarily add code to write old checkpoint files
e8df9e68
bug fixes for convert-train-checkpoint-to-gguf.py loading checkpoints…
31c093c2
remove code used to verify correctness of checkpoint file conversion
63bf200b
remove trailing whitespace
3155019b
remove prediction related code
3e7dfd08
update train-text-from-scratch README.md
17ab46df
Merge branch 'master' into pr-train-mem-usage-improvements
12c4e5b5
fix non-windows GGML_ALIGNED_REALLOC
a925e930
add missing blank line at end of file
440d221c
remove GGML_ALIGNED_REALLOC and use normal malloc/realloc/free for gg…
f6828cba
train : fix compile warnings
93535a46
ggerganov approved these changes on 2023-08-28
ggerganov merged 44c117f4 into master
Labels: high priority, training