llama.cpp
Train mem usage and other improvements
#2439
Merged

Commits
  • fix track_max_mem in forward_batch_wo_cache_flash_attn_train
    xaedes committed 2 years ago
  • remove unnecessary Adam(W) optimizer tensors.
    xaedes committed 2 years ago
  • add gradient clipping to AdamW
    xaedes committed 2 years ago
  • Fix reset of unused g->nodes and g->grads to NULL
    xaedes committed 2 years ago
  • implement gradient checkpointing for training
    xaedes committed 2 years ago
  • remove unused compute buffer 3
    xaedes committed 2 years ago
  • add and use function ggml_build_backward_expand to avoid stack overflows with large maximum number of nodes
    xaedes committed 2 years ago
  • change AdamW decay parameter to work like the torch AdamW decay parameter
    xaedes committed 2 years ago
  • change default AdamW weight decay parameter used in training to 0.1 as used in nanoGPT
    xaedes committed 2 years ago
  • change default AdamW weight decay parameter defined in ggml to 0.0, making Adam default instead of AdamW
    xaedes committed 2 years ago
  • bug fixes for cross entropy loss
    xaedes committed 2 years ago
  • fix test-grad0 for cross_entropy_loss
    xaedes committed 2 years ago
  • fix test-grad0 for soft_max
    xaedes committed 2 years ago
  • improve finite differences of test-grad0 by using double instead of float
    xaedes committed 2 years ago
  • change cross_entropy_loss to output average over all rows
    xaedes committed 2 years ago
  • improve gradient checkpointing
    xaedes committed 2 years ago
  • disable gradient checkpointing debug output
    xaedes committed 2 years ago
  • llama : fix rope usage in train-text-from-scratch after ChatGLM change
    xaedes committed 2 years ago
  • add more training parameters:
    xaedes committed 2 years ago
  • replace memcpy with reshape operation so that the graph is not cut at the input
    xaedes committed 2 years ago
  • remove unused function argument from get_example_targets_batch
    xaedes committed 2 years ago
  • measure and print total training time
    xaedes committed 2 years ago
  • add optimization callback to ggml_opt_resume_g
    xaedes committed 2 years ago
  • use optimization callback in training
    xaedes committed 2 years ago
  • add minimum number of tensor dimensions to apply weight decay (default 2)
    xaedes committed 2 years ago
  • rename training parameter cos-decay-alpha to cos-decay-min and clarify that adam-min-alpha also applies to warmup
    xaedes committed 2 years ago
  • fix increase of model.train_samples and model.train_tokens
    xaedes committed 2 years ago
  • change sampling parameters for prediction after training to defaults of common.h
    xaedes committed 2 years ago
  • tighten abs error bounds for cross_entropy_loss in test-grad0
    xaedes committed 2 years ago
  • add conditional compilation of using F16 exp in flash attention
    xaedes committed 2 years ago
  • tighten abs error bounds for flash_attn in test-grad0
    xaedes committed 2 years ago
  • tighten abs error bounds for sqrt in test-grad0
    xaedes committed 2 years ago
  • remove out-commented vectorized code of opt_adam
    xaedes committed 2 years ago
  • ggml : update ggml_rms_norm_back with configurable eps
    xaedes committed 2 years ago
  • llama training : fix ggml_rms_norm_back calls to pass configurable eps
    xaedes committed 2 years ago
  • remove trailing whitespace
    xaedes committed 2 years ago
  • Merge branch 'master' into pr-train-mem-usage-improvements
    xaedes committed 2 years ago
  • add train function using automatic gradient checkpointing backward pass and allocator
    xaedes committed 2 years ago
  • in train function replace add_inplace by regular add
    xaedes committed 2 years ago
  • don't allocate hash_map on context
    xaedes committed 2 years ago
  • correctly clone reshape and permute operations by also cloning tensor->nb values
    xaedes committed 2 years ago
  • fix variable name and add missing type cast
    xaedes committed 2 years ago
  • terminate recursive tensor cloning when reaching tensor without src tensors
    xaedes committed 2 years ago
  • correctly clone view tensors by setting data pointers
    xaedes committed 2 years ago
  • fix variable names
    xaedes committed 2 years ago
  • swap arguments to commutative ops to be the same as in `forward_batch_wo_cache_flash_attn`
    xaedes committed 2 years ago
  • add input tensors as checkpoints
    xaedes committed 2 years ago
  • fix variable name and add missing boolean negation
    xaedes committed 2 years ago
  • make sure some tensors are not reallocated by inserting new temporary nodes depending on them:
    xaedes committed 2 years ago
  • fix ASSERT to work with zero layers
    xaedes committed 2 years ago
  • add training options whether to use allocator and/or unified training function
    xaedes committed 2 years ago
  • integrate unified training function which may use memory allocator
    xaedes committed 2 years ago
  • format name of cloned tensors with " (clone)" suffix
    xaedes committed 2 years ago
  • set names for tensors in unified train function for easier debugging
    xaedes committed 2 years ago
  • allocate graph on context using ggml_new_graph
    xaedes committed 2 years ago
  • remove handwritten training functions
    xaedes committed 2 years ago
  • remove unused training parameters "use_scratch" and "use_unified"
    xaedes committed 2 years ago
  • remove trailing whitespace
    xaedes committed 2 years ago
  • remove unused train params: mem_compute1_gb & mem_compute2_gb
    xaedes committed 2 years ago
  • remove unused forward_batch function
    xaedes committed 2 years ago
  • add debug asserts in ggml_allocr_alloc for some common pitfalls when using this function directly
    xaedes committed 2 years ago
  • only use ggml_allocr_alloc when the tensor has NULL data and is not a view
    xaedes committed 2 years ago
  • fix test for when to create temporary backward graph
    xaedes committed 2 years ago
  • fix memory "leak" in optimizers
    xaedes committed 2 years ago
  • reverse order of for loop in ggml_build_backward_expand to save memory when using gradient checkpointing and allocator
    xaedes committed 2 years ago
  • Merge branch 'master' into pr-train-mem-usage-improvements
    xaedes committed 2 years ago
  • add missing lctx argument to get_example_targets_batch
    xaedes committed 2 years ago
  • implement llama model file saving using gguf
    xaedes committed 2 years ago
  • implement loading/saving of checkpointing files using GGUF
    xaedes committed 2 years ago
  • bug fixes
    xaedes committed 2 years ago
  • add checkpoint file version for future compatibility
    xaedes committed 2 years ago
  • update readme with gguf filenames
    xaedes committed 2 years ago
  • save & load opt->just_initialized value
    xaedes committed 2 years ago
  • add first draft for checkpoint conversion script
    xaedes committed 2 years ago
  • Merge branch 'master' into pr-train-mem-usage-improvements
    xaedes committed 2 years ago
  • add gguf arch and ftype
    xaedes committed 2 years ago
  • save opt parameter counter as uint64
    xaedes committed 2 years ago
  • add gguf key and tensor names for optimizer and training
    xaedes committed 2 years ago
  • add layer_norm_rms_eps to checkpoint convert script
    xaedes committed 2 years ago
  • use same GGUF_GET_KEY macro as in llama.cpp
    xaedes committed 2 years ago
  • use norm_rms_eps and rope parameters, and command line options to set them
    xaedes committed 2 years ago
  • fix memory corruption bug in gguf
    xaedes committed 2 years ago
  • add gguf example cmake file
    xaedes committed 2 years ago
  • bug fixes in tokenize_file
    xaedes committed 2 years ago
  • bug fixes in load_llama_model_gguf
    xaedes committed 2 years ago
  • bug fix: init model when no checkpoint was loaded
    xaedes committed 2 years ago
  • bug fix in read_tensor_by_name
    xaedes committed 2 years ago
  • bug fix in load_opt_context_gguf
    xaedes committed 2 years ago
  • avoid printing lots of spaces in the unusual case that the loss becomes NaN
    xaedes committed 2 years ago
  • set name of tensors with empty name from what was read from gguf
    xaedes committed 2 years ago
  • remove trailing whitespace
    xaedes committed 2 years ago
  • print data checksums before saving and after loading to verify correctness
    xaedes committed 2 years ago
  • bug fixes for convert-train-checkpoint-to-gguf
    xaedes committed 2 years ago
  • temporarily add code to write old checkpoint files
    xaedes committed 2 years ago
  • bug fixes for convert-train-checkpoint-to-gguf.py loading checkpoints with opt_version=0
    xaedes committed 2 years ago
  • remove code used to verify correctness of checkpoint file conversion
    xaedes committed 2 years ago
  • remove trailing whitespace
    xaedes committed 2 years ago
  • remove prediction related code
    xaedes committed 2 years ago
  • update train-text-from-scratch README.md
    xaedes committed 2 years ago
  • Merge branch 'master' into pr-train-mem-usage-improvements
    xaedes committed 2 years ago
  • + more commits ...
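
Notes on selected commits

For "add gradient clipping to AdamW": clipping by the global L2 norm of all gradients is the standard form of this technique. The sketch below is a minimal, generic illustration, not the ggml implementation; the function and parameter names are made up for the example.

```cpp
#include <cmath>
#include <vector>

// Clip gradients by their global L2 norm before the optimizer step.
// If |g| exceeds max_norm, every gradient element is scaled by max_norm / |g|.
static void clip_gradients_global_norm(std::vector<float> & grads, float max_norm) {
    double sum_sq = 0.0;
    for (float g : grads) {
        sum_sq += (double) g * (double) g;
    }
    const double norm = std::sqrt(sum_sq);
    if (norm > max_norm && norm > 0.0) {
        const float scale = (float) (max_norm / norm);
        for (float & g : grads) {
            g *= scale;
        }
    }
}
```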
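For "implement gradient checkpointing for training": the idea is to keep only a subset of activations during the forward pass and to recompute the rest segment by segment during the backward pass, trading extra compute for memory (roughly sqrt(n) stored activations when the checkpoint stride is about sqrt(n)). A toy scalar-chain sketch, unrelated to the ggml graph machinery and with purely illustrative names:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Toy scalar chain y_i = tanh(w_i * y_{i-1}). With checkpointing we keep the
// activation only every `stride` layers during the forward pass and recompute
// the intermediate activations segment by segment in the backward pass.
struct Chain {
    std::vector<float> w;   // per-layer weight
};

static float layer_fwd(float w, float x) { return std::tanh(w * x); }

// d tanh(w*x)/dx = w * (1 - tanh(w*x)^2)
static float layer_bwd_dx(float w, float x) {
    const float y = std::tanh(w * x);
    return w * (1.0f - y*y);
}

// returns d(output)/d(input) using checkpointed recomputation (stride >= 1)
static float grad_input_checkpointed(const Chain & c, float x0, int stride) {
    const int n = (int) c.w.size();
    // forward: store activations only at segment boundaries
    std::vector<float> ckpt;                  // ckpt[s] = input to layer s*stride
    float x = x0;
    for (int i = 0; i < n; ++i) {
        if (i % stride == 0) ckpt.push_back(x);
        x = layer_fwd(c.w[i], x);
    }
    // backward: walk segments in reverse, recomputing activations inside each
    float grad = 1.0f;                        // d(output)/d(output)
    for (int s = (int) ckpt.size() - 1; s >= 0; --s) {
        const int first = s * stride;
        const int last  = std::min(first + stride, n);
        std::vector<float> act(last - first); // recomputed inputs of this segment
        float xi = ckpt[s];
        for (int i = first; i < last; ++i) {
            act[i - first] = xi;
            xi = layer_fwd(c.w[i], xi);
        }
        for (int i = last - 1; i >= first; --i) {
            grad *= layer_bwd_dx(c.w[i], act[i - first]);
        }
    }
    return grad;
}
```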
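For the AdamW commits ("change AdamW decay parameter to work like the torch AdamW decay parameter", the 0.1 training default, and "add minimum number of tensor dimensions to apply weight decay (default 2)"): a minimal sketch of one torch-style decoupled AdamW step that skips decay for tensors with fewer than two dimensions (biases, norm weights). The 0.1 decay and the 2-dimension threshold come from the commits; everything else (names, beta/eps defaults, the flat-array layout) is an assumption for the example.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

struct AdamWState {
    std::vector<float> m, v;   // first and second moment estimates, same size as the parameters
    int64_t t = 0;             // step counter
};

// One AdamW step in the torch style: weight decay is decoupled from the
// gradient-based update and scaled by the learning rate (p *= 1 - lr*wd).
// Decay is skipped for tensors with fewer than min_decay_dims dimensions.
static void adamw_step(std::vector<float> & p, const std::vector<float> & g,
                       AdamWState & s, int n_dims,
                       float lr = 1e-3f, float beta1 = 0.9f, float beta2 = 0.999f,
                       float eps = 1e-8f, float wd = 0.1f, int min_decay_dims = 2) {
    s.t += 1;
    const float bc1 = 1.0f - std::pow(beta1, (float) s.t);  // bias corrections
    const float bc2 = 1.0f - std::pow(beta2, (float) s.t);
    for (size_t i = 0; i < p.size(); ++i) {
        if (wd > 0.0f && n_dims >= min_decay_dims) {
            p[i] *= 1.0f - lr*wd;              // decoupled decay, torch semantics
        }
        s.m[i] = beta1*s.m[i] + (1.0f - beta1)*g[i];
        s.v[i] = beta2*s.v[i] + (1.0f - beta2)*g[i]*g[i];
        const float mhat = s.m[i]/bc1;
        const float vhat = s.v[i]/bc2;
        p[i] -= lr * mhat / (std::sqrt(vhat) + eps);
    }
}
```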
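For "change cross_entropy_loss to output average over all rows": a sketch of softmax cross entropy computed row by row against target probabilities and then averaged over the rows, so the loss scale does not grow with the batch or sequence size. This is a generic reference implementation, not the ggml kernel.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Softmax cross entropy between logit rows and target probability rows,
// averaged over n_rows. Both arrays are row-major with n_rows * n_cols floats.
static float cross_entropy_mean(const std::vector<float> & logits,
                                const std::vector<float> & targets,
                                int n_rows, int n_cols) {
    double total = 0.0;
    for (int r = 0; r < n_rows; ++r) {
        const float * z = logits.data()  + (size_t) r * n_cols;
        const float * t = targets.data() + (size_t) r * n_cols;
        // log-sum-exp with max subtraction for numerical stability
        float zmax = z[0];
        for (int c = 1; c < n_cols; ++c) zmax = std::max(zmax, z[c]);
        double sum_exp = 0.0;
        for (int c = 0; c < n_cols; ++c) sum_exp += std::exp((double) (z[c] - zmax));
        const double log_z = std::log(sum_exp) + zmax;
        double row_loss = 0.0;
        for (int c = 0; c < n_cols; ++c) {
            row_loss -= (double) t[c] * ((double) z[c] - log_z);  // -t * log softmax(z)
        }
        total += row_loss;
    }
    return (float) (total / n_rows);   // average over rows instead of the sum
}
```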
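For "improve finite differences of test-grad0 by using double instead of float": a sketch of a central-difference gradient check accumulated in double precision; the cancellation in f(x+h) - f(x-h) is what makes float too coarse once the error bounds are tightened. This is a generic checker, not the test-grad0 code.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Central-difference numerical gradient of f at x, computed in double.
static std::vector<double> numeric_grad(const std::function<double(const std::vector<double>&)> & f,
                                        std::vector<double> x, double h = 1e-4) {
    std::vector<double> g(x.size());
    for (size_t i = 0; i < x.size(); ++i) {
        const double x0 = x[i];
        x[i] = x0 + h; const double fp = f(x);
        x[i] = x0 - h; const double fm = f(x);
        x[i] = x0;
        g[i] = (fp - fm) / (2.0*h);   // compare against the analytic gradient
    }
    return g;
}
```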
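For "rename training parameter cos-decay-alpha to cos-decay-min and clarify that adam-min-alpha also applies to warmup": a sketch of a warmup-plus-cosine-decay learning-rate schedule with a floor factor applied both during warmup and decay, which is the role those parameters play. The exact formula and parameter semantics in the training example may differ; this only illustrates the shape of the schedule.

```cpp
#include <cmath>

// Linear warmup followed by cosine decay, bottoming out at min_factor * base_lr.
static float lr_schedule(int step, int warmup_steps, int decay_steps,
                         float base_lr, float min_factor) {
    if (step < warmup_steps) {
        // warmup also respects the minimum factor
        const float t = (float) step / (float) warmup_steps;
        return base_lr * (min_factor + (1.0f - min_factor) * t);
    }
    float t = (float) (step - warmup_steps) / (float) decay_steps;
    if (t > 1.0f) t = 1.0f;
    const float c = 0.5f * (1.0f + std::cos(3.14159265f * t));
    return base_lr * (min_factor + (1.0f - min_factor) * c);
}
```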