llama.cpp PR #2439: Train mem usage and other improvements (merged)

Author: xaedes

Commits:
xaedes fix track_max_mem in forward_batch_wo_cache_flash_attn_train
5d124d0c
xaedes remove unnecessary Adam(W) optimizer tensors.
d39c8e68
xaedes add gradient clipping to AdamW
d395b19c
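
Gradient clipping rescales the whole gradient so its global L2 norm never exceeds a threshold, which keeps a single AdamW step from blowing up on an outlier batch. A minimal standalone sketch (not the ggml implementation, which operates on the optimizer's flattened gradient buffer):

```cpp
#include <cmath>
#include <cstddef>

// Scale the gradient vector so its L2 norm does not exceed max_norm.
// Hypothetical standalone version for illustration only.
void clip_gradients(float * g, size_t n, float max_norm, float eps = 1e-6f) {
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        sum += (double) g[i] * g[i];   // accumulate in double for accuracy
    }
    const double norm = std::sqrt(sum);
    if (norm > max_norm) {
        const float scale = (float) (max_norm / (norm + eps));
        for (size_t i = 0; i < n; ++i) {
            g[i] *= scale;             // uniform rescale preserves direction
        }
    }
}
```
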
xaedes Fix reset of unused g->nodes and g->grads to NULL
d7003a98
xaedes implement gradient checkpointing for training
6e3f95bf
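
Gradient checkpointing trades compute for memory: the forward pass keeps only every k-th activation, and the backward pass recomputes each segment from its checkpoint before backpropagating through it. With k near sqrt(n) the activation memory drops from O(n) to roughly O(sqrt(n)) at the cost of a second forward pass. A toy sketch on a scalar chain of layers (illustrative only; ggml checkpoints whole transformer layers):

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// Toy chain of n "layers", each y = f(x). With checkpointing we store only
// every k-th activation and recompute the rest inside the backward pass.
static double f (double x) { return std::tanh(x); }
static double df(double y) { return 1.0 - y * y; } // derivative from the output

int main() {
    const int n = 16, k = 4;       // 16 layers, checkpoint every 4th input
    std::vector<double> ckpt;      // saved segment inputs (4 instead of 16)
    double x = 0.5;
    for (int i = 0; i < n; ++i) {
        if (i % k == 0) ckpt.push_back(x);
        x = f(x);
    }
    // backward: recompute each segment forward from its checkpoint,
    // then backpropagate through it
    double grad = 1.0;
    for (int s = (int) ckpt.size() - 1; s >= 0; --s) {
        std::vector<double> act(k + 1);
        act[0] = ckpt[s];
        for (int i = 0; i < k; ++i) act[i + 1] = f(act[i]); // recompute
        for (int i = k - 1; i >= 0; --i) grad *= df(act[i + 1]);
    }
    std::printf("d(out)/d(in) = %g\n", grad);
}
```
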
xaedes remove unused compute buffer 3
e05e4414
xaedes add and use function ggml_build_backward_expand to avoid stack overfl…
ed4319e1
xaedes change AdamW decay parameter to work like the torch AdamW decay param…
a80f184e
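
In the torch convention, AdamW weight decay is decoupled: it is subtracted directly from the parameter, scaled by the learning rate, rather than folded into the gradient as classic L2 regularization. A single-parameter sketch of one update step (all names are illustrative, not the ggml ones):

```cpp
#include <cmath>

struct AdamWState { float m = 0.f, v = 0.f; long t = 0; };

// One AdamW step following the torch convention: decay is decoupled from
// the gradient and scaled by the learning rate.
float adamw_step(float p, float g, AdamWState & s,
                 float lr = 1e-3f, float beta1 = 0.9f, float beta2 = 0.999f,
                 float eps = 1e-8f, float wd = 0.1f) {
    s.t += 1;
    s.m = beta1 * s.m + (1.f - beta1) * g;       // first moment
    s.v = beta2 * s.v + (1.f - beta2) * g * g;   // second moment
    const float mh = s.m / (1.f - std::pow(beta1, (float) s.t)); // bias correction
    const float vh = s.v / (1.f - std::pow(beta2, (float) s.t));
    // decoupled decay: applied to the parameter, not added to the gradient
    return p - lr * (mh / (std::sqrt(vh) + eps) + wd * p);
}
```
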
xaedes change default AdamW weight decay parameter used in training to 0.1 a…
f175ead6
xaedes change default AdamW weight decay parameter defined in ggml to 0.0, m…
97964a4c
xaedes bug fixes for cross entropy loss
2c6985f7
xaedes fix test-grad0 for cross_entropy_loss
2d1e6e06
xaedes fix test-grad0 for soft_max
864e7e3a
xaedes improve finite differences of test-grad0 by using double instead of f…
87febeec
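
test-grad0 checks analytic gradients against central finite differences, (f(x+h) - f(x-h)) / 2h. The subtraction cancels most leading digits, so evaluating it in float leaves very little precision; doing the arithmetic in double is what makes the tighter error bounds in the commits below feasible. A sketch:

```cpp
#include <cmath>
#include <cstdio>
#include <functional>

// Central finite difference; the subtraction suffers catastrophic
// cancellation, so intermediates are kept in double precision.
double fd_gradient(const std::function<double(double)> & f, double x,
                   double h = 1e-5) {
    return (f(x + h) - f(x - h)) / (2.0 * h);
}

int main() {
    auto f = [](double x) { return std::log(1.0 + std::exp(x)); }; // softplus
    const double x = 0.3;
    const double analytic = 1.0 / (1.0 + std::exp(-x));            // sigmoid
    std::printf("fd=%.10f analytic=%.10f\n", fd_gradient(f, x), analytic);
}
```
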
xaedes change cross_entropy_loss to output average over all rows
51dc7709
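
Averaging the loss over rows instead of summing makes its magnitude independent of batch size and sequence length, so learning rates and test error bounds transfer across batch shapes. A sketch of row-averaged cross entropy from raw logits (illustrative; the ggml op fuses the softmax in):

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Cross entropy from raw logits, averaged over all rows.
// logits[r] is a row of vocabulary logits, target[r] the correct class.
float cross_entropy_mean(const std::vector<std::vector<float>> & logits,
                         const std::vector<int> & target) {
    double total = 0.0;
    for (size_t r = 0; r < logits.size(); ++r) {
        // log-sum-exp with max subtraction for numerical stability
        float mx = logits[r][0];
        for (float v : logits[r]) mx = std::max(mx, v);
        double sum = 0.0;
        for (float v : logits[r]) sum += std::exp((double) v - mx);
        total += std::log(sum) + mx - logits[r][target[r]];
    }
    return (float) (total / logits.size()); // average, not sum
}
```
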
xaedes improve gradient checkpointing
3744a9be
xaedes disable gradient checkpointing debug output
fc379a2d
xaedes llama : fix rope usage in train-text-from-scratch after ChatGLM change
d0fbb7d3
xaedes add more training parameters:
c6a18e15
xaedes replace memcpy with reshape operation so that the graph is not cut at…
ce937bc4
xaedes remove unused function argument from get_example_targets_batch
ff759d95
xaedes measure and print total training time
e843d6e7
xaedes add optimization callback to ggml_opt_resume_g
bfc31191
xaedes use optimization callback in training
d7aa4d95
xaedes add minimum number of tensor dimensions to apply weight decay (defaul…
e6ff0728
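
The usual convention is to decay only tensors of two or more dimensions: weight matrices benefit from decay, while 1-D biases and normalization weights are harmed by it. A trivial sketch of the gate (the default threshold here is illustrative):

```cpp
// Apply weight decay only to tensors with at least min_dims dimensions:
// 2-D+ weight matrices decay, 1-D biases and norm weights do not.
bool apply_weight_decay(int n_dims, int min_dims = 2) {
    return n_dims >= min_dims;
}
```
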
xaedes rename training parameter cos-decay-alpha to cos-decay-min and clarif…
58024d3e
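
The renamed cos-decay-min is the floor of the cosine learning-rate schedule: the multiplier follows half a cosine from 1.0 down to that minimum over the decay window and stays there afterwards. A sketch, assuming a schedule of this common shape:

```cpp
#include <cmath>

// Cosine decay from 1.0 down to `mn` over `decay_steps` steps; steps past
// the end stay at `mn`, the schedule's floor (the cos-decay-min role).
float cos_decay(int step, int decay_steps, float mn) {
    if (step >= decay_steps) return mn;
    const float t = (float) step / (float) decay_steps;          // 0..1
    const float c = 0.5f * (1.0f + std::cos(3.14159265358979f * t)); // 1..0
    return mn + (1.0f - mn) * c;
}
```
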
xaedes fix increase of model.train_samples and model.train_tokens
17a0898d
xaedes change sampling parameters for prediction after training to defaults …
24a4b099
xaedes tighten abs error bounds for cross_entropy_loss in test-grad0
1065c3b7
xaedes add conditional compilation of using F16 exp in flash attention
dbbc2633
xaedes tighten abs error bounds for flash_attn in test-grad0
47055c92
xaedes tighten abs error bounds for sqrt in test-grad0
0f6a8ab5
xaedes remove out-commented vectorized code of opt_adam
87035b96
xaedes ggml : update ggml_rms_norm_back with configurable eps
ecdc1616
xaedes llama training : fix ggml_rms_norm_back calls to pass configurable eps
c1a5e116
xaedes remove trailing whitespace
22cb368d
xaedes Merge branch 'master' into pr-train-mem-usage-improvements
d43af4b5
xaedes add train function using automatic gradient checkpointing backward pa…
2bf422ea
xaedes in train function replace add_inplace by regular add
fc826c8e
xaedes don't use allocate hash_map on context
d4374154
xaedes correctly clone reshape and permute operations by also cloning tensor…
cfddc36b
xaedes fix variable name and add missing type cast
0dd496c5
xaedes terminate recursive tensor cloning when reaching tensor without src t…
52c92c0a
xaedes correctly clone view tensors by setting data pointers
345f516f
xaedes fix variable names
5a11b758
xaedes swap arguments to commutative ops to be the same as in `forward_batch…
b2f13101
xaedes add input tensors as checkpoints
5884b43a
xaedes fix variable name and add missing boolean negation
9716eb8e
xaedes make sure some tensors are not reallocated by inserting new temporary…
38f4438c
xaedes fix ASSERT to work with zero layers
d6c5b038
xaedes add training options whether to use allocator and/or unified training…
4ed096c6
xaedes integrate unified training function which may use memory allocator
865c4cd3
xaedes format name of cloned tensors with " (clone)" suffix
3e99a8d6
xaedes set names for tensors in unified train function for easier debugging
75baed23
xaedes allocate graph on context using ggml_new_graph
fe788a1c
xaedes remove handwritten training functions
c954f41c
xaedes remove unused training parameters "use_scratch" and "use_unified"
271e4d64
xaedes remove trailing whitespace
6f161c78
xaedes remove unused train params: mem_compute1_gb & mem_compute2_gb
3794dceb
xaedes remove unused forward_batch function
6e280b24
xaedes add debug asserts in ggml_allocr_alloc to some common pitfalls when u…
faf3e21e
xaedes only use ggml_allocr_alloc when tensor has NULL data and is no view
098654c2
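
A sketch of that guard, under the assumption that a helper like is_view mirrors ggml's internal check: tensors that already own data (preallocated weights and inputs) and views, which alias a parent's buffer, must not be handed to the allocator.

```cpp
#include "ggml.h"
#include "ggml-alloc.h"

// Stand-in for ggml's internal view check (assumption for illustration):
// view-like ops alias their parent's buffer instead of owning memory.
static bool is_view(const struct ggml_tensor * t) {
    return t->op == GGML_OP_VIEW    || t->op == GGML_OP_RESHAPE ||
           t->op == GGML_OP_PERMUTE || t->op == GGML_OP_TRANSPOSE;
}

// Only request memory when the tensor has no data yet and is not a view.
static void alloc_if_needed(struct ggml_allocr * alloc, struct ggml_tensor * t) {
    if (t->data == NULL && !is_view(t)) {
        ggml_allocr_alloc(alloc, t);
    }
}
```
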
xaedes fix test when to create temporary backward graph
3e6468b0
xaedes fix memory "leak" in optimizers
56228461
xaedes reverse order of for loop in ggml_build_backward_expand to save memor…
3b5515bb
ggerganov added labels: high priority, training
xaedes Merge branch 'master' into pr-train-mem-usage-improvements
0c52c65d
xaedes add missing lctx argument to get_example_targets_batch
4072f20b
xaedes implement llama model file saving using gguf
f51c5d76
xaedes implement loading/saving of checkpointing files using GGUF
54079813
xaedes bug fixes
6a20f7a2
xaedes add checkpoint file version for future compatibility
167dd2dc
xaedes update readme with gguf filenames
2978e030
xaedes save & load opt->just_initialized value
0c494cc6
xaedes add first draft for checkpoint conversion script
3a91c975
xaedes Merge branch 'master' into pr-train-mem-usage-improvements
a6f3a47c
xaedes add gguf arch and ftype
cb42324d
xaedes save opt parameter counter as uint64
495a62a1
xaedes add gguf key and tensor names for optimizer and training
ef899fbe
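
A sketch of what persisting training state as GGUF key/value pairs can look like; the key strings here are made up for illustration, and the actual names are the ones this commit defines:

```cpp
#include <cstdint>
#include "ggml.h"

// Write optimizer/training metadata as GGUF KV pairs (key names invented
// for this sketch; the commit defines the real ones).
void save_train_state(const char * fname, uint32_t iter, uint64_t n_params) {
    struct gguf_context * ctx = gguf_init_empty();
    gguf_set_val_u32(ctx, "training.opt.iter",        iter);
    gguf_set_val_u64(ctx, "training.opt.param_count", n_params);
    gguf_write_to_file(ctx, fname, /*only_meta =*/ true);
    gguf_free(ctx);
}
```
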
xaedes add layer_norm_rms_eps to checkpoint convert script
d71069c4
xaedes use same GGUF_GET_KEY macro as in llama.cpp
91a4ccaf
xaedes use norm_rms_eps, and rope parameters and command line options to set…
0b2c85b0
xaedes fix memory corruption bug in gguf
ca5b344f
xaedes add gguf example cmake file
5d94997a
xaedes bug fixes in tokenize_file
76d2794e
xaedes bug fixes in load_llama_model_gguf
4882ff0c
xaedes bug fix: init model when no checkpoint was loaded
152cfaac
xaedes bug fix in read_tensor_by_name
1f833434
xaedes bug fix in load_opt_context_gguf
3d8d8840
xaedes avoid printing lots of spaces in the unusual case that loss gets NaN
e86b3e32
xaedes set name of tensors with empty name from what was read from gguf
daa0b6c6
xaedes remove trailing whitespace
f97f92bc
xaedes print data checksums before saving and after loading to verify correc…
c690c203
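
Any stable hash over each tensor's raw bytes works for this round-trip check: hash before writing, hash again after reading, and the two values must match if the save/load is lossless. A sketch using FNV-1a (illustrative; the PR's checksum routine may differ):

```cpp
#include <cstddef>
#include <cstdint>

// FNV-1a over a tensor's raw bytes; print this before saving and after
// loading to verify the round-trip preserved the data exactly.
uint64_t tensor_checksum(const void * data, size_t nbytes) {
    const uint8_t * p = (const uint8_t *) data;
    uint64_t h = 1469598103934665603ULL;          // FNV offset basis
    for (size_t i = 0; i < nbytes; ++i) {
        h = (h ^ p[i]) * 1099511628211ULL;        // FNV prime
    }
    return h;
}
```
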
xaedes bug fixes for convert-train-checkpoint-to-gguf
5f27ade4
xaedes temporarily add code to write old checkpoint files
e8df9e68
xaedes bug fixes for convert-train-checkpoint-to-gguf.py loading checkpoints…
31c093c2
xaedes remove code used to verify correctness of checkpoint file conversion
63bf200b
xaedes remove trailing whitespace
3155019b
xaedes remove prediction related code
3e7dfd08
xaedes update train-text-from-scratch README.md
17ab46df
xaedes Merge branch 'master' into pr-train-mem-usage-improvements
12c4e5b5
xaedes fix non-windows GGML_ALIGNED_REALLOC
a925e930
xaedes add missing blank line at end of file
440d221c
ggerganov commented on 2023-08-28
xaedes remove GGML_ALIGNED_REALLOC and use normal malloc/realloc/free for gg…
f6828cba
ggerganov train : fix compile warnings
93535a46
ggerganov approved these changes on 2023-08-28
ggerganov merged 44c117f4 into master
