llama.cpp
Train mem usage and other improvements #2439 (Merged)
Commits (104)
All commits listed below were committed by xaedes 2 years ago.

- fix track_max_mem in forward_batch_wo_cache_flash_attn_train
- remove unnecessary Adam(W) optimizer tensors.
- add gradient clipping to AdamW
- Fix reset of unused g->nodes and g->grads to NULL
- implement gradient checkpointing for training (illustrated by a sketch after this commit list)
- remove unused compute buffer 3
- add and use function ggml_build_backward_expand to avoid stack overflows with large maximum number of nodes
- change AdamW decay parameter to work like the torch AdamW decay parameter (illustrated by a sketch after this commit list)
- change default AdamW weight decay parameter used in training to 0.1 as used in nanoGPT
- change default AdamW weight decay parameter defined in ggml to 0.0, making Adam default instead of AdamW
- bug fixes for cross entropy loss
- fix test-grad0 for cross_entropy_loss
- fix test-grad0 for soft_max
- improve finite differences of test-grad0 by using double instead of float
- change cross_entropy_loss to output average over all rows
- improve gradient checkpointing
- disable gradient checkpointing debug output
- llama : fix rope usage in train-text-from-scratch after ChatGLM change
- add more training parameters:
- replace memcpy with reshape operation so that the graph is not cut at the input
- remove unused function argument from get_example_targets_batch
- measure and print total training time
- add optimization callback to ggml_opt_resume_g
- use optimization callback in training
- add minimum number of tensor dimensions to apply weight decay (default 2)
- rename training parameter cos-decay-alpha to cos-decay-min and clarify that adam-min-alpha also applies to warmup
- fix increase of model.train_samples and model.train_tokens
- change sampling parameters for prediction after training to defaults of common.h
- tighten abs error bounds for cross_entropy_loss in test-grad0
- add conditional compilation of using F16 exp in flash attention
- tighten abs error bounds for flash_attn in test-grad0
- tighten abs error bounds for sqrt in test-grad0
- remove out-commented vectorized code of opt_adam
- ggml : update ggml_rms_norm_back with configurable eps
- llama training : fix ggml_rms_norm_back calls to pass configurable eps
- remove trailing whitespace
- Merge branch 'master' into pr-train-mem-usage-improvements
- add train function using automatic gradient checkpointing backward pass and allocator
- in train function replace add_inplace by regular add
- don't use allocate hash_map on context
- correctly clone reshape and permute operations by also cloning tensor->nb values
- fix variable name and add missing type cast
- terminate recursive tensor cloning when reaching tensor without src tensors
- correctly clone view tensors by setting data pointers
- fix variable names
- swap arguments to commutative ops to be the same as in `forward_batch_wo_cache_flash_attn`
- add input tensors as checkpoints
- fix variable name and add missing boolean negation
- make sure some tensors are not reallocated by inserting new temporary nodes depending on them:
- fix ASSERT to work with zero layers
- add training options whether to use allocator and/or unified training function
- integrate unified training function which may use memory allocator
- format name of cloned tensors with " (clone)" suffix
- set names for tensors in unified train function for easier debugging
- allocate graph on context using ggml_new_graph
- remove handwritten training functions
- remove unused training parameters "use_scratch" and "use_unified"
- remove trailing whitespace
- remove unused train params: mem_compute1_gb & mem_compute2_gb
- remove unused forward_batch function
- add debug asserts in ggml_allocr_alloc to some common pitfalls when using this function directly
- only use ggml_allocr_alloc when tensor has NULL data and is no view
- fix test when to create temporary backward graph
- fix memory "leak" in optimizers
- reverse order of for loop in ggml_build_backward_expand to save memory when using gradient checkpointing and allocator
- Merge branch 'master' into pr-train-mem-usage-improvements
- add missing lctx argument to get_example_targets_batch
- implement llama model file saving using gguf
- implement loading/saving of checkpointing files using GGUF
- bug fixes
- add checkpoint file version for future compatibility
- update readme with gguf filenames
- save & load opt->just_initialized value
- add first draft for checkpoint conversion script
- Merge branch 'master' into pr-train-mem-usage-improvements
- add gguf arch and ftype
- save opt parameter counter as uint64
- add gguf key and tensor names for optimizer and training
- add layer_norm_rms_eps to checkpoint convert script
- use same GGUF_GET_KEY macro as in llama.cpp
- use norm_rms_eps, and rope parameters and command line options to set them
- fix memory corruption bug in gguf
- add gguf example cmake file
- bug fixes in tokenize_file
- bug fixes in load_llama_model_gguf
- bug fix: init model when no checkpoint was loaded
- bug fix in read_tensor_by_name
- bug fix in load_opt_context_gguf
- avoid printing lots of spaced on the unusual case that loss gets nan
- set name of tensors with empty name from what was read from gguf
- remove trailing whitespace
- print data checksums before saving and after loading to verify correctness
- bug fixes for convert-train-checkpoint-to-gguf
- temporarily add code to write old checkpoint files
- bug fixes for convert-train-checkpoint-to-gguf.py loading checkpoints with opt_version=0
- remove code used to verify correctness of checkpoint file conversion
- remove trailing whitespace
- remove prediction related code
- update train-text-from-scratch README.md
- Merge branch 'master' into pr-train-mem-usage-improvements
+ more commits ...
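The main memory saving in this PR comes from gradient checkpointing ("implement gradient checkpointing for training", "improve gradient checkpointing", "add train function using automatic gradient checkpointing backward pass and allocator"). As a rough, self-contained illustration of the idea only, not of the ggml implementation, the following sketch runs a toy chain of scalar layers, stores only every k-th activation during the forward pass, and recomputes the rest segment by segment during the backward pass; the layer count, checkpoint interval, and all names are hypothetical.

```cpp
// Minimal sketch of the gradient checkpointing idea on a toy chain of scalar
// "layers" f_i(a) = a * w_i with loss = y. Illustration only; not the ggml code.
#include <algorithm>
#include <cstdio>
#include <vector>

int main() {
    const int n = 12;  // number of layers (hypothetical)
    const int k = 4;   // checkpoint interval, roughly sqrt(n)
    std::vector<double> w(n, 1.01), dw(n, 0.0);
    const double x = 2.0;

    // forward pass: keep only the activations entering layers 0, k, 2k, ...
    std::vector<double> ckpt;
    double a = x;
    for (int i = 0; i < n; ++i) {
        if (i % k == 0) ckpt.push_back(a);
        a *= w[i];
    }
    const double y = a;
    double dy = 1.0; // dLoss/dy for loss = y

    // backward pass: recompute each segment's activations from its checkpoint,
    // then backpropagate through the segment
    for (int seg = (n - 1) / k; seg >= 0; --seg) {
        const int lo = seg * k;
        const int hi = std::min(n, lo + k);
        std::vector<double> act(hi - lo);
        double b = ckpt[seg];
        for (int i = lo; i < hi; ++i) { act[i - lo] = b; b *= w[i]; } // recompute
        for (int i = hi - 1; i >= lo; --i) {
            dw[i] += dy * act[i - lo]; // d(a*w_i)/dw_i = a (input to layer i)
            dy    *= w[i];             // d(a*w_i)/da   = w_i
        }
    }

    std::printf("y = %f  dw[0] = %f\n", y, dw[0]);
    return 0;
}
```

With a checkpoint interval of roughly sqrt(n), activation memory drops from O(n) to O(sqrt(n)) at the cost of about one extra forward pass worth of recomputation.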
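Several commits also rework the AdamW optimizer: gradient clipping was added, the decay parameter was changed to behave like torch's decoupled AdamW, and the training default for weight decay became 0.1 (with the ggml default set to 0.0, i.e. plain Adam). The sketch below shows that style of update in isolation; it is an assumption-laden illustration rather than the ggml code. In particular, the clipping here uses the global L2 norm, and every default other than the 0.1 weight decay is a placeholder.

```cpp
// Minimal sketch of a torch-style (decoupled) AdamW step with global-norm
// gradient clipping. Illustration only; not the ggml implementation.
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <vector>

struct AdamWState {
    std::vector<float> m, v; // first/second moment estimates
    int64_t t = 0;           // step counter
};

// x: parameters, g: gradients (modified in place by clipping), s: optimizer state
void adamw_step(std::vector<float> & x, std::vector<float> & g, AdamWState & s,
                float lr = 1e-3f, float beta1 = 0.9f, float beta2 = 0.999f,
                float eps = 1e-8f, float wd = 0.1f, float clip = 1.0f) {
    if (s.m.empty()) { s.m.assign(x.size(), 0.0f); s.v.assign(x.size(), 0.0f); }
    s.t++;

    // gradient clipping by global L2 norm (clipping scheme is an assumption here)
    double sum2 = 0.0;
    for (float gi : g) sum2 += (double) gi * gi;
    const float norm = (float) std::sqrt(sum2);
    if (clip > 0.0f && norm > clip) {
        const float scale = clip / norm;
        for (float & gi : g) gi *= scale;
    }

    const float bc1 = 1.0f - std::pow(beta1, (float) s.t); // bias corrections
    const float bc2 = 1.0f - std::pow(beta2, (float) s.t);

    for (size_t i = 0; i < x.size(); ++i) {
        s.m[i] = beta1 * s.m[i] + (1.0f - beta1) * g[i];
        s.v[i] = beta2 * s.v[i] + (1.0f - beta2) * g[i] * g[i];
        const float mhat = s.m[i] / bc1;
        const float vhat = s.v[i] / bc2;
        // decoupled weight decay: applied to the parameter itself,
        // not mixed into the adaptive gradient term (torch AdamW behaviour)
        x[i] -= lr * (mhat / (std::sqrt(vhat) + eps) + wd * x[i]);
    }
}

int main() {
    std::vector<float> x = { 1.0f, -2.0f,  3.0f };
    std::vector<float> g = { 0.5f,  4.0f, -8.0f };
    AdamWState s;
    adamw_step(x, g, s);
    for (float xi : x) std::printf("%f\n", xi);
    return 0;
}
```

Because the decay term acts on the parameter directly instead of being folded into the gradient, setting it to 0.0 recovers plain Adam, which matches the commit that makes Adam the ggml default. Note also that the training code applies decay only to tensors with at least two dimensions (see the "minimum number of tensor dimensions" commit); the flat vector above ignores that distinction.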