[MODEL] Nanochat implementation #41634
first draft of modelling
99b3577d
add modelling to auto
27672521
add to init
2223aa0a
[WIP] add scripts for conversion
bbfd853c
hack at attention and rotary using logit comparison until it works
4ac6a758
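For context, parity here was reached by comparing logits against the original implementation. A minimal sketch of that kind of check; `reference_model` and `ported_model` are placeholder names, not helpers from this PR:

```python
import torch

def compare_logits(reference_model, ported_model, input_ids, atol=1e-4):
    """Assert the ported model reproduces the reference logits."""
    with torch.no_grad():
        ref_logits = reference_model(input_ids).logits
        new_logits = ported_model(input_ids).logits
    max_diff = (ref_logits - new_logits).abs().max().item()
    print(f"max abs logit diff: {max_diff:.2e}")
    assert torch.allclose(ref_logits, new_logits, atol=atol)
```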
update conversion script
5f8bfeda
fix test
216c5d44
tidy up decoding inputs
fa2874c4
fix all naming nanogpt >> nanochat
f234f75c
add return as tensors
992b26b8
use batch encoding
9acd3f25
xenova
commented
on 2025-10-15
include tokenizer conversion in the conversion script
1af42532
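Folding tokenizer conversion into a weight-conversion script often amounts to wrapping an exported `tokenizer.json` in a fast tokenizer and saving it alongside the weights. A hedged sketch; paths and special tokens below are illustrative, not this PR's exact values:

```python
from transformers import PreTrainedTokenizerFast

# tokenizer.json exported from the original nanochat repo (path is an example)
tokenizer = PreTrainedTokenizerFast(
    tokenizer_file="tokenizer.json",
    bos_token="<|bos|>",  # assumed special token, check the source repo
)
tokenizer.save_pretrained("converted/nanochat")
```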
add repo revision to vocab pull
5af8672e
move attention into a func and add kwargs to all signatures (for vllm)
9583d9b1
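This refactor follows the pattern transformers uses for runtime-pluggable attention: the math lives in a standalone function and forward signatures accept `**kwargs` so integrations like vLLM can thread extra arguments through. A simplified sketch (no masking or dropout, shapes illustrative):

```python
import torch

def eager_attention_forward(query, key, value, scaling, **kwargs):
    # query/key/value: (batch, num_heads, seq_len, head_dim)
    attn_weights = torch.matmul(query, key.transpose(-2, -1)) * scaling
    attn_weights = torch.softmax(attn_weights, dim=-1)
    attn_output = torch.matmul(attn_weights, value)
    return attn_output, attn_weights
```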
rename everything NanoGPT >> NanoChat
d83458d3
improve docstrings in modelling code
efa48d4e
run ruff
a3bed2cf
delete working test file
9e1f4eae
remove custom tokenizer
557cfe20
protect imports with lazy
5977a785
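The lazy-import guard mirrors the `_LazyModule` pattern used throughout transformers `__init__.py` files; the structure below is a generic rendering of that pattern, not this PR's exact diff:

```python
from typing import TYPE_CHECKING

from transformers.utils import _LazyModule

_import_structure = {"modeling_nanochat": ["NanoChatForCausalLM", "NanoChatModel"]}

if TYPE_CHECKING:
    from .modeling_nanochat import NanoChatForCausalLM, NanoChatModel
else:
    import sys

    sys.modules[__name__] = _LazyModule(__name__, globals()["__file__"], _import_structure)
```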
remove unused variables
bd83c278
format config
6e81f591
sort auto model configs
939f544c
make sure nanochat config is in __all__ (fixes vllm)
7fd570cb
integrate tests
0fda1c0f
remove custom tokenizer from autos
7e545679
format tests
0f20bbfb
add documentation for nanochat
6acd62b6
update toc with nanochat
8c59e3e5
Merge branch 'main' into nanochat-implementation
a2ff0e46
burtenshaw
marked this pull request as ready for review 66 days ago
burtenshaw
changed the title from [DRAFT] Nanochat implementation to [MODEL] Nanochat implementation 66 days ago
xenova
commented
on 2025-10-16
xenova
commented
on 2025-10-16
lose some attention notes
2fdf98c4
vasqu
commented
on 2025-10-16
vasqu
commented
on 2025-10-16
remove excess configuration
7bd50312
move init
8409b2d2
move script into modeling dir
da67e4be
switch to modular and inherit llama
a523ce44
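In the modular-transformers workflow, a `modular_nanochat.py` file subclasses existing building blocks and the converter regenerates the flat `modeling_nanochat.py` from it. The gist, with illustrative class bodies:

```python
from transformers.models.llama.modeling_llama import (
    LlamaDecoderLayer,
    LlamaForCausalLM,
)

class NanoChatDecoderLayer(LlamaDecoderLayer):
    pass  # inherit the wiring; only real deviations get overridden

class NanoChatForCausalLM(LlamaForCausalLM):
    pass
```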
add causal flag
da8aa74f
remove excess and just set params in model config
5fec3456
license
afba61fd
update model flags
16a0207d
reuse init weights
f57a2951
lose functionals from the MLP
1bce97c5
use rmsnorm module instead of functional
743ba1b1
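Swapping a functional norm call for a module usually means a small `nn.Module` in the usual transformers RMSNorm shape. A generic sketch; the PR's final version may differ, e.g. in whether the norm carries a learnable weight:

```python
import torch
from torch import nn

class NanoChatRMSNorm(nn.Module):
    def __init__(self, hidden_size, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.variance_epsilon = eps

    def forward(self, hidden_states):
        # normalize by the root mean square of the last dimension
        variance = hidden_states.pow(2).mean(-1, keepdim=True)
        hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
        return self.weight * hidden_states
```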
inherit rms from NanoChatRMSNorm
06a60bac
run modular convert and update modeling
658d532f
update docs
0e38fe96
remove excess changes in auto
78dbd223
Update src/transformers/models/nanochat/configuration_nanochat.py
f9d1511a
add a basic tp/pp plan
d71a75d7
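A tensor-parallel plan in transformers is a dict on the config mapping module-name patterns to sharding styles (the pipeline plan similarly lists stage boundaries). Keys and values below are typical examples, not the PR's exact plan:

```python
# Illustrative entries in the style of transformers' base_model_tp_plan
base_model_tp_plan = {
    "layers.*.self_attn.q_proj": "colwise",
    "layers.*.self_attn.k_proj": "colwise",
    "layers.*.self_attn.v_proj": "colwise",
    "layers.*.self_attn.o_proj": "rowwise",
}
```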
Merge branch 'nanochat-implementation' of https://github.com/huggingf…
d837d658
use cleanup util for tests
77900002
update conversion script
59404f8c
add docstring to nanochat config
5169f333
update auto config with NanoChatForCausalLM
b242b80c
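In-tree, this means entries in the auto mapping files; the same wiring can be sketched with the public registration API (imports of the NanoChat classes are assumed to resolve once the model is in the library):

```python
from transformers import AutoConfig, AutoModelForCausalLM
from transformers.models.nanochat import NanoChatConfig, NanoChatForCausalLM

AutoConfig.register("nanochat", NanoChatConfig)  # only needed out-of-tree
AutoModelForCausalLM.register(NanoChatConfig, NanoChatForCausalLM)
```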
remove forward snippet
37369312
fix auto modeling
41788c25
delete excess modelling file
79a1e7c9
delete excess modelling file and regenerate modeling
6126f3f6
Merge branch 'nanochat-implementation' of https://github.com/huggingf…
a572394f
revert changes and use less llama modules
99801d04
vasqu
commented
on 2025-10-18
use llama rope not custom
57e0e357
vasqu
commented
on 2025-10-20
cut down on notes and docstrings
a021cd58
revert again to custom rope but this time using Qwen3 attention
2598bf4a
tidy notes in attention module
1ce7e423
update cache kwargs with sin and cos
32f3baa8
inherit from CLIPMLP and overwrite projections
76305b7b
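The CLIPMLP inheritance reuses its forward while swapping in the model's own projections; a sketch with illustrative layer names and biases:

```python
from torch import nn
from transformers.models.clip.modeling_clip import CLIPMLP

class NanoChatMLP(CLIPMLP):
    def __init__(self, config):
        super().__init__(config)
        # overwrite the projections, e.g. to drop biases (settings assumed)
        self.fc1 = nn.Linear(config.hidden_size, config.intermediate_size, bias=False)
        self.fc2 = nn.Linear(config.intermediate_size, config.hidden_size, bias=False)
```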
inherit llama decoder layer
0ed59dfb
reuse more from LlamaPreTrainedModel
ff7bd71f
inherit LlamaModel and copy forward except for initial_norm
321026ed
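The one deviation from `LlamaModel` called out here is `initial_norm`, a norm applied to the embeddings before the decoder stack (reusing the `NanoChatRMSNorm` sketched above). A hedged sketch of the wiring, with the copied forward omitted:

```python
from transformers.models.llama.modeling_llama import LlamaModel

class NanoChatModel(LlamaModel):
    def __init__(self, config):
        super().__init__(config)
        # extra norm on the embedded inputs, the reason forward is copied
        self.initial_norm = NanoChatRMSNorm(config.hidden_size, eps=config.rms_norm_eps)
```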
inherit more from LlamaForCausalLM
f2608eee
more modular friendly
39748618
Merge remote-tracking branch 'upstream/main' into nanochat-implementa…
3c1d5135
adjust to changes in main
cd411f1a
vasqu
commented
on 2025-10-21
last nits
e3a8d539
apparently we do this now lol
70e8e6b4
must've been a blank something
208e9bb1
update conversion script to use original config
57f128e8
fix attention_bias everywhere
fa1371f1
resolve todos in config
61f9d987
todos: layer names and repo id in modular
cd6e5eae
last todos in config and modelling
d38e3a75
using the karpathy repo
5f3607fc
inherit from gemma instead
50a9db6e
vasqu
commented
on 2025-10-22
fix vaultgemma
98446c91
Merge branch 'main' into nanochat-implementation
88e12fba
ci common
38051c66
do rope config after super init
5940a501
update logit values in tests
40e1a36d
update logit tests one last time
b0a3dc4b
update and format conversion script
cce09b78
Merge branch 'nanochat-implementation' of https://github.com/huggingf…
9352a0a5
vasqu
commented
on 2025-10-22
vasqu
commented
on 2025-10-22
update all logits again
234cae71
update test tolerance
b548505f
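The integration tests being tuned here are of the expected-logit-slice kind: a hard-coded slice from a known-good run, compared within an explicit tolerance. A sketch with placeholder values:

```python
import torch

# placeholder values; in the real test these come from a verified run
EXPECTED_SLICE = torch.tensor([0.0, 0.0, 0.0])

def check_logits(model, input_ids, atol=1e-3):
    logits = model(input_ids).logits
    torch.testing.assert_close(logits[0, -1, :3], EXPECTED_SLICE, rtol=atol, atol=atol)
```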
fix tests
afdddccb
auto tokenizer
85536f49
vasqu
commented
on 2025-10-22
vasqu
commented
on 2025-10-22
fix test, to be updated upstream on the hub
a543a7c4
Merge branch 'main' into nanochat-implementation
ecd18445
vasqu
commented
on 2025-10-27
Merge branch 'main' into nanochat-implementation
ae367e63
Merge branch 'main' into nanochat-implementation
72db36a6
fix
8f85d673
update conversion script to deal with new d34 model
fd89210f
Merge branch 'main' into nanochat-implementation
0adff1e5
Merge branch 'main' into nanochat-implementation
d6a4a218
fix eval in docs
8be09b65
Apply suggestions from code review
01f3009e
no separate initial_norm and do not iterate over module
185bec7d
update _tp_plan
35570e30
update modeling with changes
78128639
Merge branch 'main' into nanochat-implementation
9f5b2d1b
style
0988889f
fix tp_plan
da382f1f
oops, this one almost slipped through, nice that I checked
72956075
Cyrilvallez
deleted the nanochat-implementation branch 25 days ago