Train mem usage and other improvements #2439
fix track_max_mem in forward_batch_wo_cache_flash_attn_train
5d124d0c
remove unnecessary Adam(W) optimizer tensors.
d39c8e68
add gradient clipping to AdamW
d395b19c
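Global-norm gradient clipping caps the combined L2 norm of all gradients before the optimizer step. A minimal sketch of the idea (not the exact ggml code; names here are illustrative):

```c
#include <math.h>
#include <stdint.h>

// If the L2 norm of the gradients exceeds `clip`, scale them all down so the
// combined norm equals `clip`; otherwise leave them untouched.
static void clip_gradients(float * grads, int64_t n, float clip) {
    double sum = 0.0;
    for (int64_t i = 0; i < n; ++i) {
        sum += (double) grads[i] * grads[i];
    }
    const double norm = sqrt(sum);
    if (norm > clip) {
        const float scale = (float) (clip / norm);
        for (int64_t i = 0; i < n; ++i) {
            grads[i] *= scale;
        }
    }
}
```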
Fix reset of unused g->nodes and g->grads to NULL
d7003a98
implement gradient checkpointing for training
6e3f95bf
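Gradient checkpointing trades compute for memory: instead of keeping every layer's activations for the backward pass, only a subset is stored and the layers in between are recomputed. Keeping roughly sqrt(n_layers) evenly spaced checkpoints reduces activation memory from O(n) to O(sqrt(n)) at the cost of about one extra forward pass. An illustrative sketch of the checkpoint spacing (not taken from this PR's code):

```c
#include <math.h>
#include <stdio.h>

int main(void) {
    const int n_layers = 32;
    // ~sqrt(n) evenly spaced checkpoints; intermediate layers are recomputed
    const int step = (int) ceil(sqrt((double) n_layers));
    for (int il = 0; il < n_layers; il += step) {
        printf("store activation of layer %d as checkpoint\n", il);
    }
    return 0;
}
```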
remove unused compute buffer 3
e05e4414
add and use function ggml_build_backward_expand to avoid stack overfl…
ed4319e1
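The older pattern of returning a whole `struct ggml_cgraph` by value is what overflowed the stack, since the graph struct holds the full node arrays. The `_expand` variant instead fills a caller-allocated graph. Roughly the shape of the API introduced here (see the `ggml.h` of this revision for the authoritative signature):

```c
#include <stdbool.h>

struct ggml_context;
struct ggml_cgraph;

// expands the backward graph into `gb` in place instead of returning a
// large ggml_cgraph by value on the stack
void ggml_build_backward_expand(
    struct ggml_context * ctx,
    struct ggml_cgraph  * gf,   // forward graph
    struct ggml_cgraph  * gb,   // backward graph, filled in place
    bool                  keep);
```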
change AdamW decay parameter to work like the torch AdamW decay param…
a80f184e
change default AdamW weight decay parameter used in training to 0.1 a…
f175ead6
change default AdamW weight decay parameter defined in ggml to 0.0, m…
97964a4c
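The three commits above switch ggml's AdamW to torch-style decoupled weight decay: the weight is shrunk directly by `lr * wd` each step rather than having `wd * x` folded into the gradient, the training default becomes 0.1, and ggml's own default becomes 0.0. A single-parameter sketch of the decoupled update (variable names are illustrative, not ggml's):

```c
#include <math.h>

typedef struct {
    float lr, beta1, beta2, eps, wd;
} adamw_params;

// one AdamW update for a single parameter x with gradient g;
// m and v are the running first/second moment estimates, t the step count
static float adamw_step(float x, float g, float * m, float * v,
                        int t, const adamw_params * p) {
    *m = p->beta1 * (*m) + (1.0f - p->beta1) * g;
    *v = p->beta2 * (*v) + (1.0f - p->beta2) * g * g;
    const float mhat = *m / (1.0f - powf(p->beta1, (float) t));
    const float vhat = *v / (1.0f - powf(p->beta2, (float) t));
    x *= 1.0f - p->lr * p->wd; // decoupled decay, applied to the weight itself
    return x - p->lr * mhat / (sqrtf(vhat) + p->eps);
}
```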
bug fixes for cross entropy loss
2c6985f7
fix test-grad0 for cross_entropy_loss
2d1e6e06
fix test-grad0 for soft_max
864e7e3a
improve finite differences of test-grad0 by using double instead of f…
87febeec
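test-grad0 compares analytic gradients against central finite differences; at small step sizes the subtraction `f(x+h) - f(x-h)` loses most of its significant bits in float, so evaluating it in double is what allows the tighter error bounds. The scheme, in brief:

```c
// central finite difference, computed in double to limit cancellation error:
// f'(x) ≈ (f(x+h) - f(x-h)) / (2h)
static double finite_diff(double (*f)(double), double x, double h) {
    return (f(x + h) - f(x - h)) / (2.0 * h);
}
```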
change cross_entropy_loss to output average over all rows
51dc7709
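Averaging over rows instead of summing makes the loss value and gradient magnitude independent of the batch size, so learning rates need not be retuned when the batch changes. A minimal sketch of the averaged form (illustrative; the clamp stands in for whatever epsilon the real code uses):

```c
#include <math.h>
#include <stdint.h>

// probs and labels are n_rows x n_cols row-wise probability distributions
static float cross_entropy_mean(const float * probs, const float * labels,
                                int64_t n_rows, int64_t n_cols) {
    double loss = 0.0;
    for (int64_t r = 0; r < n_rows; ++r) {
        for (int64_t c = 0; c < n_cols; ++c) {
            const double p = probs[r*n_cols + c];
            loss -= labels[r*n_cols + c] * log(p > 1e-10 ? p : 1e-10);
        }
    }
    return (float) (loss / (double) n_rows); // average over rows, not the sum
}
```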
improve gradient checkpointing
3744a9be
disable gradient checkpointing debug output
fc379a2d
llama : fix rope usage in train-text-from-scratch after ChatGLM change
d0fbb7d3
add more training parameters:
c6a18e15
replace memcpy with reshape operation so that the graph is not cut at…
ce937bc4
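A `memcpy` happens outside the compute graph, so automatic differentiation cannot propagate gradients through it; expressing the same shape change as a reshape op keeps the backward graph connected. A sketch of the substitution, assuming the usual ggml reshape API:

```c
#include "ggml.h"

static struct ggml_tensor * as_2d(struct ggml_context * ctx,
                                  struct ggml_tensor * t,
                                  int64_t ne0, int64_t ne1) {
    // the replaced pattern was roughly:
    //   struct ggml_tensor * out = ggml_new_tensor_2d(ctx, t->type, ne0, ne1);
    //   memcpy(out->data, t->data, ggml_nbytes(t)); // invisible to autodiff
    // a graph-visible reshape lets gradients flow through instead:
    return ggml_reshape_2d(ctx, t, ne0, ne1);
}
```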
remove unused function argument from get_example_targets_batch
ff759d95
measure and print total training time
e843d6e7
add optimization callback to ggml_opt_resume_g
bfc31191
use optimization callback in training
d7aa4d95
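The callback gives the trainer a hook inside the optimizer loop, e.g. to apply the learning-rate schedule and load the next batch between iterations. A sketch of the idea; the callback type actually added to ggml may carry different parameters:

```c
// hypothetical callback shape, not necessarily ggml's exact typedef
typedef void (*opt_callback)(void * data, float * sched);

static void train_callback(void * data, float * sched) {
    (void) data;   // trainer state: datasets, counters, ...
    *sched = 0.5f; // e.g. write back the cosine-decayed learning-rate factor
}
```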
add minimum number of tensor dimensions to apply weight decay (defaul…
e6ff0728
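Weight decay is usually wanted only on weight matrices; 1-D tensors such as biases and norm weights are better left undecayed, which is what a minimum-dimension threshold expresses. Illustrative sketch:

```c
// apply decay only to tensors with at least `min_dims` dimensions,
// so 1-D bias and norm tensors are skipped
static float decay_for(int n_dims, float wd, int min_dims) {
    return n_dims >= min_dims ? wd : 0.0f;
}
```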
rename training parameter cos-decay-alpha to cos-decay-min and clarif…
58024d3e
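The renamed parameter is the floor of the cosine learning-rate schedule: the factor decays from 1.0 down to `cos-decay-min`. A sketch of the schedule, assuming the standard cosine form:

```c
#include <math.h>

// decays from 1.0 at step 0 to `min` at step `steps`
static float cos_decay(int step, int steps, float min) {
    if (step >= steps) return min;
    const float t = (float) step / (float) steps;
    return min + 0.5f * (1.0f - min) * (1.0f + cosf(3.14159265f * t));
}
```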
fix increase of model.train_samples and model.train_tokens
17a0898d
change sampling parameters for prediction after training to defaults …
24a4b099
tighten abs error bounds for cross_entropy_loss in test-grad0
1065c3b7
add conditional compilation of using F16 exp in flash attention
dbbc2633
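The conditional compilation selects between a cheap half-precision exp and full-precision `expf` inside the flash-attention softmax, trading a little accuracy for speed. The pattern looks roughly like this (both the macro and the F16 helper are illustrative names, not the ones in ggml):

```c
#include <math.h>

static float attn_exp(float x) {
#ifdef TRAIN_FLASH_ATTN_F16_EXP
    return approx_exp_f16(x); // hypothetical F16 table lookup
#else
    return expf(x);           // full-precision fallback
#endif
}
```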
tighten abs error bounds for flash_attn in test-grad0
47055c92
tighten abs error bounds for sqrt in test-grad0
0f6a8ab5
remove out-commented vectorized code of opt_adam
87035b96
ggml : update ggml_rms_norm_back with configurable eps
ecdc1616
llama training : fix ggml_rms_norm_back calls to pass configurable eps
c1a5e116
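RMSNorm scales by `1/sqrt(mean(x^2) + eps)`; the backward pass has to use exactly the same eps as the forward pass, which is why `ggml_rms_norm_back` gained a configurable parameter. The forward computation for reference (an illustrative sketch, not the ggml kernel):

```c
#include <math.h>
#include <stdint.h>

static void rms_norm(const float * x, float * y, int64_t n, float eps) {
    double sum = 0.0;
    for (int64_t i = 0; i < n; ++i) {
        sum += (double) x[i] * x[i];
    }
    const float scale = 1.0f / sqrtf((float) (sum / (double) n) + eps);
    for (int64_t i = 0; i < n; ++i) {
        y[i] = x[i] * scale;
    }
}
```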
remove trailing whitespace
22cb368d
Merge branch 'master' into pr-train-mem-usage-improvements
d43af4b5
add train function using automatic gradient checkpointing backward pa…
2bf422ea
in train function replace add_inplace by regular add
fc826c8e
don't use allocate hash_map on context
d4374154
correctly clone reshape and permute operations by also cloning tensor…
cfddc36b
fix variable name and add missing type cast
0dd496c5
terminate recursive tensor cloning when reaching tensor without src t…
52c92c0a
correctly clone view tensors by setting data pointers
345f516f
fix variable names
5a11b758
swap arguments to commutative ops to be the same as in `forward_batch…
b2f13101
add input tensors as checkpoints
5884b43a
fix variable name and add missing boolean negation
9716eb8e
make sure some tensors are not reallocated by inserting new temporary…
38f4438c
fix ASSERT to work with zero layers
d6c5b038
add training options whether to use allocator and/or unified training…
4ed096c6
integrate unified training function which may use memory allocator
865c4cd3
format name of cloned tensors with " (clone)" suffix
3e99a8d6
set names for tensors in unified train function for easier debugging
75baed23
allocate graph on context using ggml_new_graph
fe788a1c
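Allocating the graph inside the ggml context keeps the large `ggml_cgraph` struct out of stack and static storage. Assuming the `ggml_new_graph` signature of this era:

```c
#include "ggml.h"

static struct ggml_cgraph * make_graph(struct ggml_context * ctx) {
    // the graph lives in the context's memory pool instead of on the stack
    return ggml_new_graph(ctx);
}
```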
remove handwritten training functions
c954f41c
remove unused training parameters "use_scratch" and "use_unified"
271e4d64
remove trailing whitespace
6f161c78
remove unused train params: mem_compute1_gb & mem_compute2_gb
3794dceb
remove unused forward_batch function
6e280b24
add debug asserts in ggml_allocr_alloc to some common pitfalls when u…
faf3e21e
only use ggml_allocr_alloc when tensor has NULL data and is no view
098654c2
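With the graph allocator, a tensor must only be allocated once, and views must never be allocated at all since they alias another tensor's data; the new asserts and this guard enforce that. A sketch of the guard (the view-check helper is an assumed name):

```c
#include <stdbool.h>
#include "ggml.h"
#include "ggml-alloc.h"

static bool tensor_is_view(const struct ggml_tensor * t); // assumed helper

static void maybe_alloc(struct ggml_allocr * alloc, struct ggml_tensor * t) {
    // skip tensors that already own data or alias another tensor's data
    if (t->data == NULL && !tensor_is_view(t)) {
        ggml_allocr_alloc(alloc, t);
    }
}
```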
fix test when to create temporary backward graph
3e6468b0
fix memory "leak" in optimizers
56228461
reverse order of for loop in ggml_build_backward_expand to save memor…
3b5515bb
Merge branch 'master' into pr-train-mem-usage-improvements
0c52c65d
add missing lctx argument to get_example_targets_batch
4072f20b
implement llama model file saving using gguf
f51c5d76
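Saving goes through the GGUF API: build an empty gguf context, attach key/value metadata, register the tensors, and write everything out. A broad sketch (key names other than `general.architecture` would be chosen by the caller):

```c
#include "ggml.h"

static void save_model(const char * fname, struct ggml_tensor ** tensors, int n) {
    struct gguf_context * gctx = gguf_init_empty();
    gguf_set_val_str(gctx, "general.architecture", "llama");
    for (int i = 0; i < n; ++i) {
        gguf_add_tensor(gctx, tensors[i]); // registers metadata and data
    }
    gguf_write_to_file(gctx, fname, false); // false = also write tensor data
    gguf_free(gctx);
}
```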
implement loading/saving of checkpointing files using GGUF
54079813
bug fixes
6a20f7a2
add checkpoint file version for future compatibility
167dd2dc
update readme with gguf filenames
2978e030
save & load opt->just_initialized value
0c494cc6
add first draft for checkpoint conversion script
3a91c975
Merge branch 'master' into pr-train-mem-usage-improvements
a6f3a47c
add gguf arch and ftype
cb42324d
save opt parameter counter as uint64
495a62a1
add gguf key and tensor names for optimizer and training
ef899fbe
add layer_norm_rms_eps to checkpoint convert script
d71069c4
use same GGUF_GET_KEY macro as in llama.cpp
91a4ccaf
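The shared macro wraps the repetitive find-key/check-type/fetch-value dance and can fail hard on required keys. In spirit (simplified to a plain function; see llama.cpp for the real `GGUF_GET_KEY` macro):

```c
#include <stdbool.h>
#include <stdint.h>
#include "ggml.h"

static bool get_u32(const struct gguf_context * gctx,
                    const char * key, uint32_t * dst) {
    const int id = gguf_find_key(gctx, key);
    if (id < 0) {
        return false; // missing key; the macro can treat this as fatal
    }
    *dst = gguf_get_val_u32(gctx, id);
    return true;
}
```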
use norm_rms_eps, and rope parameters and command line options to set…
0b2c85b0
fix memory corruption bug in gguf
ca5b344f
add gguf example cmake file
5d94997a
bug fixes in tokenize_file
76d2794e
bug fixes in load_llama_model_gguf
4882ff0c
bug fix: init model when no checkpoint was loaded
152cfaac
bug fix in read_tensor_by_name
1f833434
bug fix in load_opt_context_gguf
3d8d8840
avoid printing lots of spaces in the unusual case that loss gets NaN
e86b3e32
set name of tensors with empty name from what was read from gguf
daa0b6c6
remove trailing whitespace
f97f92bc
print data checksums before saving and after loading to verify correc…
c690c203
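Printing a checksum of each tensor before saving and after loading catches silent corruption in the round trip. Any order-sensitive hash works; an illustrative one (the PR's exact checksum may differ):

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t checksum(const void * data, size_t nbytes) {
    const uint8_t * p = (const uint8_t *) data;
    uint32_t sum = 0;
    for (size_t i = 0; i < nbytes; ++i) {
        sum = sum * 31 + p[i]; // order-sensitive rolling hash
    }
    return sum;
}
```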
bug fixes for convert-train-checkpoint-to-gguf
5f27ade4
temporarily add code to write old checkpoint files
e8df9e68
bug fixes for convert-train-checkpoint-to-gguf.py loading checkpoints…
31c093c2
remove code used to verify correctness of checkpoint file conversion
63bf200b
remove trailing whitespace
3155019b
remove prediction related code
3e7dfd08
update train-text-from-scratch README.md
17ab46df
Merge branch 'master' into pr-train-mem-usage-improvements
12c4e5b5
fix non-windows GGML_ALIGNED_REALLOC
a925e930
add missing blank line at end of file
440d221c
remove GGML_ALIGNED_REALLOC and use normal malloc/realloc/free for gg…
f6828cba
train : fix compile warnings
93535a46
ggerganov approved these changes on 2023-08-28
ggerganov merged 44c117f4 into master
Labels: high priority, training