[MODEL] Nanochat implementation #41634
first draft of modelling
99b3577d
add modelling to auto
27672521
add to init
2223aa0a
[WIP] add scripts for conversion
bbfd853c
hack at attention and rotary using logit comparison until it works
4ac6a758
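For context, parity here was reached by comparing logits against the original implementation. A minimal sketch of that kind of check; `reference_model` and `ported_model` are placeholder names, not helpers from this PR:

```python
import torch

def compare_logits(reference_model, ported_model, input_ids, atol=1e-4):
    """Assert the ported model reproduces the reference logits."""
    with torch.no_grad():
        ref_logits = reference_model(input_ids).logits
        new_logits = ported_model(input_ids).logits
    max_diff = (ref_logits - new_logits).abs().max().item()
    print(f"max abs logit diff: {max_diff:.2e}")
    assert torch.allclose(ref_logits, new_logits, atol=atol)
```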
update conversion script
5f8bfeda
fix test
216c5d44
tidy up decoding inputs
fa2874c4
fix all naming nanogpt >> nanochat
f234f75c
add return as tensors
992b26b8
use batch encoding
9acd3f25
xenova
commented
on 2025-10-15
include tokenizer conversion in the conversion script
1af42532
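Folding tokenizer conversion into a weight-conversion script often amounts to wrapping an exported `tokenizer.json` in a fast tokenizer and saving it alongside the weights. A hedged sketch; paths and special tokens below are illustrative, not this PR's exact values:

```python
from transformers import PreTrainedTokenizerFast

# tokenizer.json exported from the original nanochat repo (path is an example)
tokenizer = PreTrainedTokenizerFast(
    tokenizer_file="tokenizer.json",
    bos_token="<|bos|>",  # assumed special token, check the source repo
)
tokenizer.save_pretrained("converted/nanochat")
```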
add repo revision to vocab pull
5af8672e
move attention into a func and add kwargs to all signatures (for vllm)
9583d9b1
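This refactor follows the pattern transformers uses for runtime-pluggable attention: the math lives in a standalone function and forward signatures accept `**kwargs` so integrations like vLLM can thread extra arguments through. A simplified sketch (no masking or dropout, shapes illustrative):

```python
import torch

def eager_attention_forward(query, key, value, scaling, **kwargs):
    # query/key/value: (batch, num_heads, seq_len, head_dim)
    attn_weights = torch.matmul(query, key.transpose(-2, -1)) * scaling
    attn_weights = torch.softmax(attn_weights, dim=-1)
    attn_output = torch.matmul(attn_weights, value)
    return attn_output, attn_weights
```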
rename everything NanoGPT >> NanoChat
d83458d3
improve docstrings in modelling code
efa48d4e
run ruff
a3bed2cf
delete working test file
9e1f4eae
remove custom tokenizer
557cfe20
protect imports with lazy
5977a785
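The lazy-import guard mirrors the `_LazyModule` pattern used throughout transformers `__init__.py` files; the structure below is a generic rendering of that pattern, not this PR's exact diff:

```python
from typing import TYPE_CHECKING

from transformers.utils import _LazyModule

_import_structure = {"modeling_nanochat": ["NanoChatForCausalLM", "NanoChatModel"]}

if TYPE_CHECKING:
    from .modeling_nanochat import NanoChatForCausalLM, NanoChatModel
else:
    import sys

    sys.modules[__name__] = _LazyModule(__name__, globals()["__file__"], _import_structure)
```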
remove unused variables
bd83c278
format config
6e81f591
sort auto model configs
939f544c
make sure nanochat config is in __all__ (fixes vllm)
7fd570cb
integrate tests
0fda1c0f
remove custom tokenizer from autos
7e545679
format tests
0f20bbfb
add documentation for nanochat
6acd62b6
update toc with nanochat
8c59e3e5
Merge branch 'main' into nanochat-implementation
a2ff0e46
burtenshaw
marked this pull request as ready for review 66 days ago
burtenshaw
changed the title from [DRAFT] Nanochat implementation to [MODEL] Nanochat implementation 66 days ago
xenova
commented
on 2025-10-16
xenova
commented
on 2025-10-16
lose some attention notes
2fdf98c4
vasqu
commented
on 2025-10-16
vasqu
commented
on 2025-10-16
remove excess configuration
7bd50312
move init
8409b2d2
move script into modeling dir
da67e4be
switch to modular and inherit llama
a523ce44
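In the modular-transformers workflow, a `modular_nanochat.py` file subclasses existing building blocks and the converter regenerates the flat `modeling_nanochat.py` from it. The gist, with illustrative class bodies:

```python
from transformers.models.llama.modeling_llama import (
    LlamaDecoderLayer,
    LlamaForCausalLM,
)

class NanoChatDecoderLayer(LlamaDecoderLayer):
    pass  # inherit the wiring; only real deviations get overridden

class NanoChatForCausalLM(LlamaForCausalLM):
    pass
```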
add causal flag
da8aa74f
remove excess and just set params in model config
5fec3456
license
afba61fd
update model flags
16a0207d
reuse init weights
f57a2951
lose functionals from the MLP
1bce97c5
use rmsnorm module instead of functional
743ba1b1
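Swapping a functional norm call for a module usually means a small `nn.Module` in the usual transformers RMSNorm shape. A generic sketch; the PR's final version may differ, e.g. in whether the norm carries a learnable weight:

```python
import torch
from torch import nn

class NanoChatRMSNorm(nn.Module):
    def __init__(self, hidden_size, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.variance_epsilon = eps

    def forward(self, hidden_states):
        # normalize by the root mean square of the last dimension
        variance = hidden_states.pow(2).mean(-1, keepdim=True)
        hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
        return self.weight * hidden_states
```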
inherit rms from NanoChatRMSNorm
06a60bac
run modular convert and update modeling
658d532f
update docs
0e38fe96
remove excess changes in auto
78dbd223
Update src/transformers/models/nanochat/configuration_nanochat.py
f9d1511a
add a basic tp/pp plan
d71a75d7
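A tensor-parallel plan in transformers is a dict on the config mapping module-name patterns to sharding styles (the pipeline plan similarly lists stage boundaries). Keys and values below are typical examples, not the PR's exact plan:

```python
# Illustrative entries in the style of transformers' base_model_tp_plan
base_model_tp_plan = {
    "layers.*.self_attn.q_proj": "colwise",
    "layers.*.self_attn.k_proj": "colwise",
    "layers.*.self_attn.v_proj": "colwise",
    "layers.*.self_attn.o_proj": "rowwise",
}
```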
Merge branch 'nanochat-implementation' of https://github.com/huggingf…
d837d658
use cleanup util for tests
77900002
update conversion script
59404f8c
add docstring to nanochat config
5169f333
update auto config with NanoChatForCausalLM
b242b80c
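In-tree, this means entries in the auto mapping files; the same wiring can be sketched with the public registration API (imports of the NanoChat classes are assumed to resolve once the model is in the library):

```python
from transformers import AutoConfig, AutoModelForCausalLM
from transformers.models.nanochat import NanoChatConfig, NanoChatForCausalLM

AutoConfig.register("nanochat", NanoChatConfig)  # only needed out-of-tree
AutoModelForCausalLM.register(NanoChatConfig, NanoChatForCausalLM)
```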
remove forward snippet
37369312
fix auto modeling
41788c25
delete excess modelling file
79a1e7c9
delete excess modelling file and regenerate modeling
6126f3f6
Merge branch 'nanochat-implementation' of https://github.com/huggingf…
a572394f
revert changes and use less llama modules
99801d04
vasqu
commented
on 2025-10-18
use llama rope not custom
57e0e357
vasqu
commented
on 2025-10-20
cut down on notes and docstrings
a021cd58
revert again to custom rope but this time using Qwen3 attention
2598bf4a
tidy notes in attention module
1ce7e423
update cache kwargs with sin and cos
32f3baa8
inherit from CLIPMLP and overwrite projections
76305b7b
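The CLIPMLP inheritance reuses its forward while swapping in the model's own projections; a sketch with illustrative layer names and biases:

```python
from torch import nn
from transformers.models.clip.modeling_clip import CLIPMLP

class NanoChatMLP(CLIPMLP):
    def __init__(self, config):
        super().__init__(config)
        # overwrite the projections, e.g. to drop biases (settings assumed)
        self.fc1 = nn.Linear(config.hidden_size, config.intermediate_size, bias=False)
        self.fc2 = nn.Linear(config.intermediate_size, config.hidden_size, bias=False)
```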
inherit llama decoder layer
0ed59dfb
reuse more from LlamaPreTrainedModel
ff7bd71f
inherit LlamaModel and copy forward except for initial_norm
321026ed
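The one deviation from `LlamaModel` called out here is `initial_norm`, a norm applied to the embeddings before the decoder stack (reusing the `NanoChatRMSNorm` sketched above). A hedged sketch of the wiring, with the copied forward omitted:

```python
from transformers.models.llama.modeling_llama import LlamaModel

class NanoChatModel(LlamaModel):
    def __init__(self, config):
        super().__init__(config)
        # extra norm on the embedded inputs, the reason forward is copied
        self.initial_norm = NanoChatRMSNorm(config.hidden_size, eps=config.rms_norm_eps)
```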
inherit more from LlamaForCausalLM
f2608eee
more modular friendly
39748618
Merge remote-tracking branch 'upstream/main' into nanochat-implementa…
3c1d5135
adjust to changes in main
cd411f1a
vasqu
commented
on 2025-10-21
last nits
e3a8d539
apparently we do this now lol
70e8e6b4
must've been a blank something
208e9bb1
update conversion script to use original config
57f128e8
fix attention_bias everywhere
fa1371f1
resolve todos in config
61f9d987
todos: layer names and repo id in modular
cd6e5eae
last todos in config and modelling
d38e3a75
using the karpathy repo
5f3607fc
inherit from gemma instead
50a9db6e
vasqu
commented
on 2025-10-22
fix vaultgemma
98446c91
Merge branch 'main' into nanochat-implementation
88e12fba
ci common
38051c66
do rope config after super init
5940a501
update logit values in tests
40e1a36d
update logit tests one last time
b0a3dc4b
update and format conversion script
cce09b78
Merge branch 'nanochat-implementation' of https://github.com/huggingf…
9352a0a5
vasqu
commented
on 2025-10-22
vasqu
commented
on 2025-10-22
update all logits again
234cae71
update test tolerance
b548505f
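The integration tests being tuned here are of the expected-logit-slice kind: a hard-coded slice from a known-good run, compared within an explicit tolerance. A sketch with placeholder values:

```python
import torch

# placeholder values; in the real test these come from a verified run
EXPECTED_SLICE = torch.tensor([0.0, 0.0, 0.0])

def check_logits(model, input_ids, atol=1e-3):
    logits = model(input_ids).logits
    torch.testing.assert_close(logits[0, -1, :3], EXPECTED_SLICE, rtol=atol, atol=atol)
```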
fix tests
afdddccb
auto tokenizer
85536f49
vasqu
commented
on 2025-10-22
vasqu
commented
on 2025-10-22
fix test, to be updated upstream on the hub
a543a7c4
Merge branch 'main' into nanochat-implementation
ecd18445
vasqu
commented
on 2025-10-27
Merge branch 'main' into nanochat-implementation
ae367e63
Merge branch 'main' into nanochat-implementation
72db36a6
fix
8f85d673
update conversion script to deal with new d34 model
fd89210f
Merge branch 'main' into nanochat-implementation
0adff1e5
Merge branch 'main' into nanochat-implementation
d6a4a218
fix eval in docs
8be09b65
Apply suggestions from code review
01f3009e
no separate initial_norm and do not iterate over module
185bec7d
update _tp_plan
35570e30
update modeling with changes
78128639
Merge branch 'main' into nanochat-implementation
9f5b2d1b
style
0988889f
fix tp_plan
da382f1f
oops, this one almost slipped through, nice that I checked
72956075
Cyrilvallez
deleted the nanochat-implementation branch 25 days ago