llama.cpp
Train mem usage and other improvements
#2439
Merged

Commits
  • fix track_max_mem in forward_batch_wo_cache_flash_attn_train
    xaedes committed 2 years ago
  • remove unnecessary Adam(W) optimizer tensors.
    xaedes committed 2 years ago
  • add gradient clipping to AdamW
    xaedes committed 2 years ago
  • Fix reset of unused g->nodes and g->grads to NULL
    xaedes committed 2 years ago
  • implement gradient checkpointing for training
    xaedes committed 2 years ago
  • remove unused compute buffer 3
    xaedes committed 2 years ago
  • add and use function ggml_build_backward_expand to avoid stack overflows with large maximum number of nodes
    xaedes committed 2 years ago
  • change AdamW decay parameter to work like the torch AdamW decay parameter
    xaedes committed 2 years ago
  • change default AdamW weight decay parameter used in training to 0.1 as used in nanoGPT
    xaedes committed 2 years ago
  • change default AdamW weight decay parameter defined in ggml to 0.0, making Adam default instead of AdamW
    xaedes committed 2 years ago
  • bug fixes for cross entropy loss
    xaedes committed 2 years ago
  • fix test-grad0 for cross_entropy_loss
    xaedes committed 2 years ago
  • fix test-grad0 for soft_max
    xaedes committed 2 years ago
  • improve finite differences of test-grad0 by using double instead of float
    xaedes committed 2 years ago
  • change cross_entropy_loss to output average over all rows
    xaedes committed 2 years ago
  • improve gradient checkpointing
    xaedes committed 2 years ago
  • disable gradient checkpointing debug output
    xaedes committed 2 years ago
  • llama : fix rope usage in train-text-from-scratch after ChatGLM change
    xaedes committed 2 years ago
  • add more training parameters:
    xaedes committed 2 years ago
  • replace memcpy with reshape operation so that the graph is not cut at the input
    xaedes committed 2 years ago
  • remove unused function argument from get_example_targets_batch
    xaedes committed 2 years ago
  • measure and print total training time
    xaedes committed 2 years ago
  • add optimization callback to ggml_opt_resume_g
    xaedes committed 2 years ago
  • use optimization callback in training
    xaedes committed 2 years ago
  • add minimum number of tensor dimensions to apply weight decay (default 2)
    xaedes committed 2 years ago
  • rename training parameter cos-decay-alpha to cos-decay-min and clarify that adam-min-alpha also applies to warmup
    xaedes committed 2 years ago
  • fix increase of model.train_samples and model.train_tokens
    xaedes committed 2 years ago
  • change sampling parameters for prediction after training to defaults of common.h
    xaedes committed 2 years ago
  • tighten abs error bounds for cross_entropy_loss in test-grad0
    xaedes committed 2 years ago
  • add conditional compilation of using F16 exp in flash attention
    xaedes committed 2 years ago
  • tighten abs error bounds for flash_attn in test-grad0
    xaedes committed 2 years ago
  • tighten abs error bounds for sqrt in test-grad0
    xaedes committed 2 years ago
  • remove out-commented vectorized code of opt_adam
    xaedes committed 2 years ago
  • ggml : update ggml_rms_norm_back with configurable eps
    xaedes committed 2 years ago
  • llama training : fix ggml_rms_norm_back calls to pass configurable eps
    xaedes committed 2 years ago
  • remove trailing whitespace
    xaedes committed 2 years ago
  • Merge branch 'master' into pr-train-mem-usage-improvements
    xaedes committed 2 years ago
  • add train function using automatic gradient checkpointing backward pass and allocator
    xaedes committed 2 years ago
  • in train function replace add_inplace by regular add
    xaedes committed 2 years ago
  • don't allocate hash_map on context
    xaedes committed 2 years ago
  • correctly clone reshape and permute operations by also cloning tensor->nb values
    xaedes committed 2 years ago
  • fix variable name and add missing type cast
    xaedes committed 2 years ago
  • terminate recursive tensor cloning when reaching tensor without src tensors
    xaedes committed 2 years ago
  • correctly clone view tensors by setting data pointers
    xaedes committed 2 years ago
  • fix variable names
    xaedes committed 2 years ago
  • swap arguments to commutative ops to be the same as in `forward_batch_wo_cache_flash_attn`
    xaedes committed 2 years ago
  • add input tensors as checkpoints
    xaedes committed 2 years ago
  • fix variable name and add missing boolean negation
    xaedes committed 2 years ago
  • make sure some tensors are not reallocated by inserting new temporary nodes depending on them:
    xaedes committed 2 years ago
  • fix ASSERT to work with zero layers
    xaedes committed 2 years ago
  • add training options whether to use allocator and/or unified training function
    xaedes committed 2 years ago
  • integrate unified training function which may use memory allocator
    xaedes committed 2 years ago
  • format name of cloned tensors with " (clone)" suffix
    xaedes committed 2 years ago
  • set names for tensors in unified train function for easier debugging
    xaedes committed 2 years ago
  • allocate graph on context using ggml_new_graph
    xaedes committed 2 years ago
  • remove handwritten training functions
    xaedes committed 2 years ago
  • remove unused training parameters "use_scratch" and "use_unified"
    xaedes committed 2 years ago
  • remove trailing whitespace
    xaedes committed 2 years ago
  • remove unused train params: mem_compute1_gb & mem_compute2_gb
    xaedes committed 2 years ago
  • remove unused forward_batch function
    xaedes committed 2 years ago
  • add debug asserts in ggml_allocr_alloc for some common pitfalls when using this function directly
    xaedes committed 2 years ago
  • only use ggml_allocr_alloc when the tensor has NULL data and is not a view
    xaedes committed 2 years ago
  • fix test for when to create temporary backward graph
    xaedes committed 2 years ago
  • fix memory "leak" in optimizers
    xaedes committed 2 years ago
  • reverse order of for loop in ggml_build_backward_expand to save memory when using gradient checkpointing and allocator
    xaedes committed 2 years ago
  • Merge branch 'master' into pr-train-mem-usage-improvements
    xaedes committed 2 years ago
  • add missing lctx argument to get_example_targets_batch
    xaedes committed 2 years ago
  • implement llama model file saving using gguf
    xaedes committed 2 years ago
  • implement loading/saving of checkpointing files using GGUF
    xaedes committed 2 years ago
  • bug fixes
    xaedes committed 2 years ago
  • add checkpoint file version for future compatibility
    xaedes committed 2 years ago
  • update readme with gguf filenames
    xaedes committed 2 years ago
  • save & load opt->just_initialized value
    xaedes committed 2 years ago
  • add first draft for checkpoint conversion script
    xaedes committed 2 years ago
  • Merge branch 'master' into pr-train-mem-usage-improvements
    xaedes committed 2 years ago
  • add gguf arch and ftype
    xaedes committed 2 years ago
  • save opt parameter counter as uint64
    xaedes committed 2 years ago
  • add gguf key and tensor names for optimizer and training
    xaedes committed 2 years ago
  • add layer_norm_rms_eps to checkpoint convert script
    xaedes committed 2 years ago
  • use same GGUF_GET_KEY macro as in llama.cpp
    xaedes committed 2 years ago
  • use norm_rms_eps and rope parameters, and command line options to set them
    xaedes committed 2 years ago
  • fix memory corruption bug in gguf
    xaedes committed 2 years ago
  • add gguf example cmake file
    xaedes committed 2 years ago
  • bug fixes in tokenize_file
    xaedes committed 2 years ago
  • bug fixes in load_llama_model_gguf
    xaedes committed 2 years ago
  • bug fix: init model when no checkpoint was loaded
    xaedes committed 2 years ago
  • bug fix in read_tensor_by_name
    xaedes committed 2 years ago
  • bug fix in load_opt_context_gguf
    xaedes committed 2 years ago
  • avoid printing lots of spaces in the unusual case that the loss becomes NaN
    xaedes committed 2 years ago
  • set name of tensors with empty name from what was read from gguf
    xaedes committed 2 years ago
  • remove trailing whitespace
    xaedes committed 2 years ago
  • print data checksums before saving and after loading to verify correctness
    xaedes committed 2 years ago
  • bug fixes for convert-train-checkpoint-to-gguf
    xaedes committed 2 years ago
  • temporarily add code to write old checkpoint files
    xaedes committed 2 years ago
  • bug fixes for convert-train-checkpoint-to-gguf.py loading checkpoints with opt_version=0
    xaedes committed 2 years ago
  • remove code used to verify correctness of checkpoint file conversion
    xaedes committed 2 years ago
  • remove trailing whitespace
    xaedes committed 2 years ago
  • remove prediction related code
    xaedes committed 2 years ago
  • update train-text-from-scratch README.md
    xaedes committed 2 years ago
  • Merge branch 'master' into pr-train-mem-usage-improvements
    xaedes committed 2 years ago
  • + more commits ...
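
Notes on selected commits

For "add gradient clipping to AdamW": clipping by the global L2 norm of all gradients is the standard form of this technique. The sketch below is a minimal, generic illustration, not the ggml implementation; the function and parameter names are made up for the example.

```cpp
#include <cmath>
#include <vector>

// Clip gradients by their global L2 norm before the optimizer step.
// If |g| exceeds max_norm, every gradient element is scaled by max_norm / |g|.
static void clip_gradients_global_norm(std::vector<float> & grads, float max_norm) {
    double sum_sq = 0.0;
    for (float g : grads) {
        sum_sq += (double) g * (double) g;
    }
    const double norm = std::sqrt(sum_sq);
    if (norm > max_norm && norm > 0.0) {
        const float scale = (float) (max_norm / norm);
        for (float & g : grads) {
            g *= scale;
        }
    }
}
```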
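For "implement gradient checkpointing for training": the idea is to keep only a subset of activations during the forward pass and to recompute the rest segment by segment during the backward pass, trading extra compute for memory (roughly sqrt(n) stored activations when the checkpoint stride is about sqrt(n)). A toy scalar-chain sketch, unrelated to the ggml graph machinery and with purely illustrative names:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Toy scalar chain y_i = tanh(w_i * y_{i-1}). With checkpointing we keep the
// activation only every `stride` layers during the forward pass and recompute
// the intermediate activations segment by segment in the backward pass.
struct Chain {
    std::vector<float> w;   // per-layer weight
};

static float layer_fwd(float w, float x) { return std::tanh(w * x); }

// d tanh(w*x)/dx = w * (1 - tanh(w*x)^2)
static float layer_bwd_dx(float w, float x) {
    const float y = std::tanh(w * x);
    return w * (1.0f - y*y);
}

// returns d(output)/d(input) using checkpointed recomputation (stride >= 1)
static float grad_input_checkpointed(const Chain & c, float x0, int stride) {
    const int n = (int) c.w.size();
    // forward: store activations only at segment boundaries
    std::vector<float> ckpt;                  // ckpt[s] = input to layer s*stride
    float x = x0;
    for (int i = 0; i < n; ++i) {
        if (i % stride == 0) ckpt.push_back(x);
        x = layer_fwd(c.w[i], x);
    }
    // backward: walk segments in reverse, recomputing activations inside each
    float grad = 1.0f;                        // d(output)/d(output)
    for (int s = (int) ckpt.size() - 1; s >= 0; --s) {
        const int first = s * stride;
        const int last  = std::min(first + stride, n);
        std::vector<float> act(last - first); // recomputed inputs of this segment
        float xi = ckpt[s];
        for (int i = first; i < last; ++i) {
            act[i - first] = xi;
            xi = layer_fwd(c.w[i], xi);
        }
        for (int i = last - 1; i >= first; --i) {
            grad *= layer_bwd_dx(c.w[i], act[i - first]);
        }
    }
    return grad;
}
```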
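For the AdamW commits ("change AdamW decay parameter to work like the torch AdamW decay parameter", the 0.1 training default, and "add minimum number of tensor dimensions to apply weight decay (default 2)"): a minimal sketch of one torch-style decoupled AdamW step that skips decay for tensors with fewer than two dimensions (biases, norm weights). The 0.1 decay and the 2-dimension threshold come from the commits; everything else (names, beta/eps defaults, the flat-array layout) is an assumption for the example.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

struct AdamWState {
    std::vector<float> m, v;   // first and second moment estimates, same size as the parameters
    int64_t t = 0;             // step counter
};

// One AdamW step in the torch style: weight decay is decoupled from the
// gradient-based update and scaled by the learning rate (p *= 1 - lr*wd).
// Decay is skipped for tensors with fewer than min_decay_dims dimensions.
static void adamw_step(std::vector<float> & p, const std::vector<float> & g,
                       AdamWState & s, int n_dims,
                       float lr = 1e-3f, float beta1 = 0.9f, float beta2 = 0.999f,
                       float eps = 1e-8f, float wd = 0.1f, int min_decay_dims = 2) {
    s.t += 1;
    const float bc1 = 1.0f - std::pow(beta1, (float) s.t);  // bias corrections
    const float bc2 = 1.0f - std::pow(beta2, (float) s.t);
    for (size_t i = 0; i < p.size(); ++i) {
        if (wd > 0.0f && n_dims >= min_decay_dims) {
            p[i] *= 1.0f - lr*wd;              // decoupled decay, torch semantics
        }
        s.m[i] = beta1*s.m[i] + (1.0f - beta1)*g[i];
        s.v[i] = beta2*s.v[i] + (1.0f - beta2)*g[i]*g[i];
        const float mhat = s.m[i]/bc1;
        const float vhat = s.v[i]/bc2;
        p[i] -= lr * mhat / (std::sqrt(vhat) + eps);
    }
}
```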
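For "change cross_entropy_loss to output average over all rows": a sketch of softmax cross entropy computed row by row against target probabilities and then averaged over the rows, so the loss scale does not grow with the batch or sequence size. This is a generic reference implementation, not the ggml kernel.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Softmax cross entropy between logit rows and target probability rows,
// averaged over n_rows. Both arrays are row-major with n_rows * n_cols floats.
static float cross_entropy_mean(const std::vector<float> & logits,
                                const std::vector<float> & targets,
                                int n_rows, int n_cols) {
    double total = 0.0;
    for (int r = 0; r < n_rows; ++r) {
        const float * z = logits.data()  + (size_t) r * n_cols;
        const float * t = targets.data() + (size_t) r * n_cols;
        // log-sum-exp with max subtraction for numerical stability
        float zmax = z[0];
        for (int c = 1; c < n_cols; ++c) zmax = std::max(zmax, z[c]);
        double sum_exp = 0.0;
        for (int c = 0; c < n_cols; ++c) sum_exp += std::exp((double) (z[c] - zmax));
        const double log_z = std::log(sum_exp) + zmax;
        double row_loss = 0.0;
        for (int c = 0; c < n_cols; ++c) {
            row_loss -= (double) t[c] * ((double) z[c] - log_z);  // -t * log softmax(z)
        }
        total += row_loss;
    }
    return (float) (total / n_rows);   // average over rows instead of the sum
}
```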
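For "improve finite differences of test-grad0 by using double instead of float": a sketch of a central-difference gradient check accumulated in double precision; the cancellation in f(x+h) - f(x-h) is what makes float too coarse once the error bounds are tightened. This is a generic checker, not the test-grad0 code.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Central-difference numerical gradient of f at x, computed in double.
static std::vector<double> numeric_grad(const std::function<double(const std::vector<double>&)> & f,
                                        std::vector<double> x, double h = 1e-4) {
    std::vector<double> g(x.size());
    for (size_t i = 0; i < x.size(); ++i) {
        const double x0 = x[i];
        x[i] = x0 + h; const double fp = f(x);
        x[i] = x0 - h; const double fm = f(x);
        x[i] = x0;
        g[i] = (fp - fm) / (2.0*h);   // compare against the analytic gradient
    }
    return g;
}
```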
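For "rename training parameter cos-decay-alpha to cos-decay-min and clarify that adam-min-alpha also applies to warmup": a sketch of a warmup-plus-cosine-decay learning-rate schedule with a floor factor applied both during warmup and decay, which is the role those parameters play. The exact formula and parameter semantics in the training example may differ; this only illustrates the shape of the schedule.

```cpp
#include <cmath>

// Linear warmup followed by cosine decay, bottoming out at min_factor * base_lr.
static float lr_schedule(int step, int warmup_steps, int decay_steps,
                         float base_lr, float min_factor) {
    if (step < warmup_steps) {
        // warmup also respects the minimum factor
        const float t = (float) step / (float) warmup_steps;
        return base_lr * (min_factor + (1.0f - min_factor) * t);
    }
    float t = (float) (step - warmup_steps) / (float) decay_steps;
    if (t > 1.0f) t = 1.0f;
    const float c = 0.5f * (1.0f + std::cos(3.14159265f * t));
    return base_lr * (min_factor + (1.0f - min_factor) * c);
}
```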