first adding diffllama
3bd9e34c
add Diff Attention and other but still with errors
269055e1
complete making attention Diff-Attention
dbbf0730
fix some bugs which may be caused by transformers-cli while adding model
c4ea9dfc
fix a bug caused by forgetting KV cache...
e072544a
Update src/transformers/models/diffllama/modeling_diffllama.py
674d7a23
Update src/transformers/models/diffllama/modeling_diffllama.py
9eac636a
Update src/transformers/models/diffllama/modeling_diffllama.py
0e99dbd4
Update src/transformers/models/diffllama/modeling_diffllama.py
1e445c78
Update src/transformers/models/diffllama/modeling_diffllama.py
cca6a5c2
Update src/transformers/models/diffllama/modeling_diffllama.py
dd167af8
Update src/transformers/models/diffllama/modeling_diffllama.py
23099cb9
Update src/transformers/models/diffllama/modeling_diffllama.py
faac378e
I found Attention was still mis-implemented relative to the paper as of e072544a3bfc…
53e13aa2
re-implemented
63b018a2
adding groupnorm
204bec87
align with transformers code style
bce12e5f
fix typo
44d8423c
adding groupnorm
6dc6f81c
change SdpaAttention to DiffSdpaAttention
48b38e87
fix bug
997f561d
Update src/transformers/models/diffllama/modeling_diffllama.py
107bd3c3
fix bugs of places of "GroupNorm with scale" and etc
26307d92
Revert "fix bugs of places of "GroupNorm with scale" and etc"
22aa1451
simplify multiple of attention (matmul) operations into one by repeat…
cc472bef
simplify multiple of attention (matmul) operations into one by repeat…
e834129d
simplify multiple of attention (matmul) operations into one by repeat…
e9d94e5a
remove missed type
03529996
bzantium
approved these changes
on 2024-10-23
add diffllama model_doc
843178ad
apply make style/quality
71c8d124
apply review comment about model
fea95faf
apply review comment about test
b3f8dd5c
place diffllama alphabetically on the src/transformers/__init__.py
50ce3532
fix forgotten code
6f253335
Supports parameters that are not initialized with standard deviation …
dd2282e6
add DiffLlamaConfig to CONFIG_CLASSES_TO_IGNORE_FOR_DOCSTRING_CHECKPO…
9e7a9c3e
remove unused property of config
8c98d191
add to supported model list
cbf217d8
add to sdpa supported model list
c8739822
fix copyright, remove pretraining_tensor_parallel, and modify for ini…
b003a535
remove unused import and etc.
37c7a88e
empty commit
ba92d5c1
empty commit
8cc823ee
empty commit
d47631d6
weak-kajuma
changed the title from "[WIP] Add diffllama" to "[Request Reviews]Add diffllama" 1 year ago
weak-kajuma
changed the title from "[Request Reviews]Add diffllama" to "Add diffllama" 1 year ago
apply modular transformers but with bugs
c6932de8
revert prev commit
48e16cf3
create src/transformers/models/diffllama/modular_diffllama.py
a44f95d3
run utils/modular_model_converter.py
c45aa59b
empty commit
c5741eb0
leaner modular diffllama
ea622ce1
Merge branch 'huggingface:main' into add_diffllama
e30c2984
remove more and more in modular_diffllama.py
3f85c228
remove more and more in modular_diffllama.py
87d034da
resolve missing docstring entries
4660c6e3
force reset
b4ff5f3f
Merge branch 'huggingface:main' into add_diffllama
484a493f
convert modular
0ce20233
Assignees
No one assigned