transformers
[MODEL] Nanochat implementation #41634
Merged
Cyrilvallez merged 111 commits into main from nanochat-implementation
burtenshaw
burtenshaw first draft on modelling
99b3577d
burtenshaw add modelling to auto
27672521
burtenshaw add to init
2223aa0a
burtenshaw [WIP] add scripts for conversion
bbfd853c
burtenshaw hack at attention and rotary using logit comparison until it works
4ac6a758
burtenshaw update conversion script
5f8bfeda
burtenshaw fix test
216c5d44
burtenshaw tidy up decoding inputs
fa2874c4
burtenshaw fix all naming nanogpt >> nanochat
f234f75c
burtenshaw add return as tensors
992b26b8
burtenshaw use batch encoding
9acd3f25
xenova commented on 2025-10-15
burtenshaw include tokenizer conversion in the conversion script
1af42532
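The conversion work above mostly amounts to renaming original checkpoint keys into transformers-style names. A minimal sketch of that renaming, using hypothetical nanochat key names (the PR's actual script and key layout may differ):

```python
# Hypothetical mapping from nanochat checkpoint keys to transformers-style
# keys; these names are illustrative, not the PR's actual scheme.
ORIGINAL_TO_HF = {
    "transformer.wte.weight": "model.embed_tokens.weight",
    "lm_head.weight": "lm_head.weight",
}

PER_LAYER = {
    "attn.c_q.weight": "self_attn.q_proj.weight",
    "attn.c_k.weight": "self_attn.k_proj.weight",
    "attn.c_v.weight": "self_attn.v_proj.weight",
    "attn.c_proj.weight": "self_attn.o_proj.weight",
}

def convert_key(name: str) -> str:
    """Rename one checkpoint key to its transformers-style equivalent."""
    if name in ORIGINAL_TO_HF:
        return ORIGINAL_TO_HF[name]
    parts = name.split(".")
    # Per-layer keys like "transformer.h.3.attn.c_q.weight" become
    # "model.layers.3.self_attn.q_proj.weight".
    if len(parts) > 3 and parts[0] == "transformer" and parts[1] == "h":
        layer = parts[2]
        rest = ".".join(parts[3:])
        return f"model.layers.{layer}.{PER_LAYER.get(rest, rest)}"
    return name
```

The same loop-over-keys approach also covers the tokenizer and config files the later commits fold into the script.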
burtenshaw add repo revision to vocab pull
5af8672e
burtenshaw move attention into func and add kwargs to all signatures (for vllm)
9583d9b1
burtenshaw rename everything NanoGPT >> NanoChat
d83458d3
burtenshaw improve docstrings in modelling code
efa48d4e
burtenshaw run ruff
a3bed2cf
burtenshaw delete working test file
9e1f4eae
burtenshaw remove custom tokenizer
557cfe20
burtenshaw protect imports with lazy
5977a785
burtenshaw removed unused variables
bd83c278
burtenshaw format config
6e81f591
burtenshaw sort auto model configs
939f544c
burtenshaw make sure nanochat config is in __all__ (fixes vllm)
7fd570cb
burtenshaw integrate tests
0fda1c0f
burtenshaw remove custom tokenizer from autos
7e545679
burtenshaw format tests
0f20bbfb
burtenshaw add documentation for nanochat
6acd62b6
burtenshaw update toc with nanochat
8c59e3e5
burtenshaw Merge branch 'main' into nanochat-implementation
a2ff0e46
burtenshaw marked this pull request as ready for review 66 days ago
burtenshaw changed the title from [DRAFT] Nanochat implementation to [MODEL] Nanochat implementation 66 days ago
xenova commented on 2025-10-16
xenova commented on 2025-10-16
burtenshaw lose some attention notes
2fdf98c4
vasqu commented on 2025-10-16
vasqu commented on 2025-10-16
burtenshaw remove excess configuration
7bd50312
burtenshaw move init
8409b2d2
burtenshaw move script into modeling dir
da67e4be
burtenshaw switch to modular and inherit llama
a523ce44
burtenshaw add causal flag
da8aa74f
burtenshaw remove excess and just set params in model config
5fec3456
burtenshaw license
afba61fd
burtenshaw update model flags
16a0207d
burtenshaw reuse init weights
f57a2951
burtenshaw lose functionals from the MLP
1bce97c5
burtenshaw use rmsnorm module instead of functional
743ba1b1
burtenshaw inherit rms from NanoChatRMSNorm
06a60bac
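The switch from a functional norm to an RMSNorm module refers to the standard RMSNorm computation. A minimal dependency-free sketch of the math (shapes simplified to 1-D, eps value assumed; the PR's NanoChatRMSNorm is a torch module):

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm over a 1-D vector: scale x by 1/sqrt(mean(x^2) + eps),
    then apply the learned per-feature weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv * w for v, w in zip(x, weight)]
```

With unit weights the output has unit root-mean-square, which is the property the module-based and functional forms must agree on.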
burtenshaw run modular convert and update modeling
658d532f
burtenshaw update docs
0e38fe96
burtenshaw remove excess changes in auto
78dbd223
burtenshaw Update src/transformers/models/nanochat/configuration_nanochat.py
f9d1511a
burtenshaw add a basic tp/pp plan
d71a75d7
burtenshaw Merge branch 'nanochat-implementation' of https://github.com/huggingf…
d837d658
burtenshaw use cleanup util for tests
77900002
burtenshaw update conversion script
59404f8c
burtenshaw add docstring to nanochat config
5169f333
burtenshaw update auto config with NanoChatForCausalLM
b242b80c
burtenshaw remove forward snippet
37369312
burtenshaw fix auto modeling
41788c25
burtenshaw delete excess modelling file
79a1e7c9
burtenshaw delete excess modelling file and regenerate modeling
6126f3f6
burtenshaw Merge branch 'nanochat-implementation' of https://github.com/huggingf…
a572394f
burtenshaw revert changes and use less llama modules
99801d04
vasqu commented on 2025-10-18
burtenshaw use llama rope not custom
57e0e357
vasqu commented on 2025-10-20
burtenshaw cut down on notes and docstrings
a021cd58
burtenshaw revert again to custom rope but this time using Qwen3 attention
2598bf4a
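The rope back-and-forth above is about where the rotary position math lives, not what it computes. A minimal sketch of the standard RoPE rotation applied to one (even, odd) feature pair, with simplified scalar arguments (the actual implementation vectorizes this over heads and caches cos/sin):

```python
import math

def rotate_pair(x0, x1, pos, inv_freq):
    """Rotate one feature pair by angle pos * inv_freq, where inv_freq
    is 1 / theta**(2i/d) for feature pair i."""
    angle = pos * inv_freq
    c, s = math.cos(angle), math.sin(angle)
    return x0 * c - x1 * s, x0 * s + x1 * c
```

Because this is a pure rotation it preserves the pair's norm, which is a cheap sanity check when comparing a custom rope against an inherited one.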
burtenshaw tidy notes in attention module
1ce7e423
burtenshaw update cache kwargs with sin and cos
32f3baa8
burtenshaw inherit from CLIPMLP and overwrite projections
76305b7b
burtenshaw inherit llama decoder layer
0ed59dfb
burtenshaw reuse more from LlamaPreTrainedModel
ff7bd71f
burtenshaw inherit LlamaModel and copy forward except for initial_norm
321026ed
burtenshaw inherit more from LlamaForCausalLM
f2608eee
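The inheritance commits above follow the library's modular pattern: a modular_nanochat.py subclasses existing modules and a converter expands it into a standalone modeling file. A self-contained sketch of the shape (stand-in base classes here replace the real imports from transformers.models.llama.modeling_llama; class bodies are illustrative):

```python
# Stand-ins for the real Llama modules, so this sketch runs standalone.
class LlamaDecoderLayer: ...
class LlamaModel: ...
class LlamaForCausalLM: ...

# modular_nanochat.py pattern: subclass what you reuse, override only
# what differs; the modular converter inlines the inherited code into
# the generated modeling_nanochat.py.
class NanoChatDecoderLayer(LlamaDecoderLayer):
    pass  # decoder layer reused wholesale

class NanoChatModel(LlamaModel):
    pass  # the PR overrides forward to apply an initial_norm

class NanoChatForCausalLM(LlamaForCausalLM):
    pass
```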
vasqu more modular friendly
39748618
vasqu Merge remote-tracking branch 'upstream/main' into nanochat-implementa…
3c1d5135
vasqu adjust to changes in main
cd411f1a
vasqu commented on 2025-10-21
vasqu last nits
e3a8d539
vasqu apparently we do this now lol
70e8e6b4
vasqu must've been a blank, something something
208e9bb1
burtenshaw update conversion script to use original config
57f128e8
burtenshaw fix attention_bias everywhere
fa1371f1
burtenshaw resolve todos in config
61f9d987
burtenshaw todos: layer names and repo id in modular
cd6e5eae
burtenshaw last todos in config and modelling
d38e3a75
burtenshaw using karpathy repo
5f3607fc
vasqu inherit from gemma instead
50a9db6e
vasqu commented on 2025-10-22
vasqu fix vaultgemma
98446c91
vasqu Merge branch 'main' into nanochat-implementation
88e12fba
vasqu ci common
38051c66
burtenshaw do rope config after super init
5940a501
burtenshaw update logit values in tests
40e1a36d
burtenshaw update logit tests one last time
b0a3dc4b
burtenshaw update and format conversion script
cce09b78
burtenshaw Merge branch 'nanochat-implementation' of https://github.com/huggingf…
9352a0a5
vasqu commented on 2025-10-22
vasqu commented on 2025-10-22
burtenshaw update all logits again
234cae71
burtenshaw update test tolerance
b548505f
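The repeated logit updates above are driven by integration tests that pin expected logit slices and compare within a tolerance. A minimal sketch of that check, in the spirit of torch.testing.assert_close (tolerance values here are illustrative, not the PR's):

```python
def logits_close(expected, actual, atol=1e-4, rtol=1e-3):
    """Return True if every actual logit is within atol + rtol*|expected|
    of its expected value, the usual mixed absolute/relative criterion."""
    return all(
        abs(e - a) <= atol + rtol * abs(e)
        for e, a in zip(expected, actual)
    )
```

Loosening atol/rtol is what "update test tolerance" amounts to when hardware or kernel changes shift logits slightly.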
vasqu fix tests
afdddccb
vasqu auto tokenizer
85536f49
vasqu commented on 2025-10-22
vasqu commented on 2025-10-22
vasqu fix test, to be updated upstream on the hub
a543a7c4
vasqu requested a review from ArthurZucker 59 days ago
burtenshaw Merge branch 'main' into nanochat-implementation
ecd18445
vasqu commented on 2025-10-27
vasqu Merge branch 'main' into nanochat-implementation
ae367e63
vasqu Merge branch 'main' into nanochat-implementation
72db36a6
vasqu fix
8f85d673
burtenshaw update conversion script to deal with new d34 model
fd89210f
vasqu Merge branch 'main' into nanochat-implementation
0adff1e5
Cyrilvallez Merge branch 'main' into nanochat-implementation
d6a4a218
Cyrilvallez commented on 2025-11-24
burtenshaw fix eval in docs
8be09b65
burtenshaw Apply suggestions from code review
01f3009e
burtenshaw no separate initial_norm and do not iter over module
185bec7d
burtenshaw update _tp_plan
35570e30
burtenshaw update modeling with changes
78128639
burtenshaw Merge branch 'main' into nanochat-implementation
9f5b2d1b
Cyrilvallez style
0988889f
Cyrilvallez fix tp_plan
da382f1f
Cyrilvallez approved these changes on 2025-11-27
Cyrilvallez oops, this one almost slipped through; good thing I checked
72956075
Cyrilvallez merged 7c8d72bd into main 25 days ago
Cyrilvallez deleted the nanochat-implementation branch 25 days ago
