compilade
changed the title ggml-quants : 1.625 bpw ternary packing for BitNet 1.58b ggml-quants : 1.625 bpw ternary packing for BitNet b1.58 1 year ago
bitnet : replace 1.58b with b1.58, as in the paper
ggml : add some informative comments in q1_3 vec_dot
dd3e62a7
Merge branch 'master' into compilade/bitnet-ternary
79a278e9
ggml : add TQ1_0 and TQ2_0 ternary quantization types
77b8f84a
ggml : even faster TQ2_0
560873f3
ggml : also faster TQ1_0
e9719576
ggml : fix build issues in certain environments
a6dd6994
ggml : add NEON vec_dot implementation for TQ1_0 and TQ2_0
5417089a
ggml : avoid directly using vmlal_high_s8, for 32-bit ARM compat
45719a24
compilade marked this pull request as draft 1 year ago
ggml : remove q1_3 and q2_2
04eec581
compilade
changed the title ggml-quants : 1.625 bpw ternary packing for BitNet b1.58 ggml-quants : ternary packing for TriLMs and BitNet b1.58 1 year ago
ggml-quants : rename fields of TQ1_0 and TQ2_0 structs for consistency
f034aa1b
ggml-quants : allow using vdotq_s32 in TQ2_0 vec_dot
96b3d411
Merge branch 'master' into compilade/bitnet-ternary
d911cd1f
gguf-py : Numpy (de)quantization for TQ1_0 and TQ2_0
3a0bf17d
convert : allow direct conversion to TQ1_0 and TQ2_0
895004f3
ggml-quants : allow using ARM dot product instructions for TQ1_0
69f77268
Merge branch 'master' into compilade/bitnet-ternary
82b24040
ggml-quants : deduplicate TQ1_0 and TQ2_0 __ARM_FEATURE_DOTPROD support
35cc5567
compilade marked this pull request as ready for review 1 year ago