The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Hey! This is great. Is this already in alpha?
Team, is there a tentative timeline for releasing the v3 alpha?
I can't wait :) Please update me when it's released!
@xenova Can I test v3-alpha by using NPM? When I try to run, I get this issue.
Use this commit to resolve the issue: https://github.com/kishorekaruppusamy/transformers.js/commit/7af8ef1e5c37f3052ed3a8e38938595702836f09
Thanks for your reply @kishorekaruppusamy. I tried with your branch and ran into other issues.
Please advise!
https://github.com/kishorekaruppusamy/transformers.js/blob/V3_BRANCH_WEBGPU_BUG_FIX/src/backends/onnx.js#L144
Change this URL to point to your local dist directory inside the build.
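In released versions of Transformers.js, the usual way to point onnxruntime-web at locally hosted binaries is the env.backends.onnx.wasm.wasmPaths setting; here is a minimal sketch, assuming that setting is still honored on the v3 branch (the exact runtime files shipped by onnxruntime-web may differ between versions):

import { env, pipeline } from '@xenova/transformers';

// Serve the onnxruntime-web runtime files yourself (e.g. copy them from
// node_modules/onnxruntime-web/dist into /ort/) instead of fetching them from a CDN.
env.backends.onnx.wasm.wasmPaths = '/ort/';

// Models loaded afterwards should resolve their runtime files from that path.
const classifier = await pipeline('sentiment-analysis');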
Thanks @kishorekaruppusamy
I downloaded the latest wasm from onnxruntime and added it to a local directory, but I got the same issue.
I then realized transformers.js v3 uses onnxruntime 1.16.3, so I built the wasm with onnxruntime 1.16.3 and tested again, but I still got the same issue.
Please advise. Thanks
Hi everyone! Today we released our first WebGPU x Transformers.js demo: The WebGPU Embedding Benchmark (online demo). If you'd like to help with testing, please run the benchmark and share your results! Thanks!
@xenova can this benchmark pick GPU 1 instead of GPU 0? For laptops with a dGPU.
Not currently, but this is being worked on here: microsoft/onnxruntime#19857. We will add support here once ready.
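For context, plain WebGPU only lets a page hint at which adapter it wants; whether and how onnxruntime-web will expose that choice is what the linked issue tracks. A sketch of the hint itself (standard WebGPU, outside of Transformers.js, and assuming the browser honors it on a dual-GPU laptop):

// Ask the browser for the discrete / high-performance adapter when one is available.
const adapter = await navigator.gpu.requestAdapter({ powerPreference: 'high-performance' });
const device = await adapter.requestDevice();
// There is currently no supported way to hand this adapter/device to onnxruntime-web's WebGPU backend.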
@beaufortfrancois - I've added the source code for the video background removal demo. On my device, I get ~20fps w/ WebGPU support (w/ fp32 since fp16 is broken). Here's a screen recording (which drops my fps to ~14):
You rock. Thanks! It's a cool demo!
I've been wondering how we could improve it: output[0].mul(255).to('uint8') takes some non-negligible time to run. Is there a faster path? Passing a GPUExternalTexture to the model as an input could also come in handy.
/**
 * @typedef {'cpu'|'gpu'|'wasm'|'webgpu'|null} DeviceType
 */
Out of curiosity, what is 'gpu'?
It's meant to be a "catch-all" for the different ways that the library can be used with GPU support (not just in the browser with WebGPU). The idea is that it will simplify documentation, as transformers.js will select the best execution provider depending on the environment. For example, DML/CUDA support in onnxruntime-node (see microsoft/onnxruntime#16050 (comment))
Of course, this is still a work in progress, so it can definitely change!
device: 'webgpu',
For some environments it would be better for this to accept a list, because not all execution providers support all operators.
For my use case, I pass a list of EPs ordered by priority and let onnxruntime fall back automatically.
For example: ['nnapi', 'xnnpack', 'cpu'] for Android, or ['qnn', 'dml', 'xnnpack', 'cpu'] for Windows ARM64 (custom build).
UPDATE: Looks like some kernels are not supported for quantized operations :/
I tested the WebGPU version on https://huggingface.co/Xenova/wav2vec2-bert-CV16-en with the changes from v3. The (quantized) model loads without errors, but running transcription throws an error with this message:
An error occurred during model execution: "Error: [WebGPU] Kernel "[Split] /wav2vec2_bert/encoder/layers.0/conv_module/glu/Split" failed. Error: no GPU data for output: 0".
[E:onnxruntime:, sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned while running Split node. Name:'/wav2vec2_bert/encoder/layers.0/conv_module/glu/Split' Status Message: Failed to run JSEP kernel
Is this a quantization error or an onnxruntime error?
Logs localhost-1710758687772.log
Env: Windows, Chrome 122, Nvidia Geforce 3090
@young-developer Thanks for the report. I will cc @guschmue for this unsupported operator. It may already be fixed in the dev branch of onnxruntime-web.
@hans00 For more advanced use-cases, you can update the session options directly with session_options: {...} in the model options.
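For example, an execution-provider priority list could be passed along these lines (a sketch, assuming session_options is forwarded as-is to onnxruntime's InferenceSession.create, so the valid EP names depend on the runtime and build you are using):

import { AutoModel } from '@xenova/transformers';

// Let onnxruntime fall back through the listed execution providers in order.
const model = await AutoModel.from_pretrained('Xenova/all-MiniLM-L6-v2', {
  session_options: {
    executionProviders: ['webgpu', 'wasm'], // e.g. ['qnn', 'dml', 'xnnpack', 'cpu'] on a custom Windows ARM64 build
  },
});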
FYI @xenova I was able to load the model in fp32 and got the same error. I also tried loading it in fp16, but it throws an error saying the input is float instead of float16, so I assume the inputs need to be converted to fp16 too.
Exciting news 🥳 We've got Musicgen working! Example usage:
import { AutoTokenizer, MusicgenForConditionalGeneration } from '@xenova/transformers';
// Load tokenizer and model
const tokenizer = await AutoTokenizer.from_pretrained('Xenova/musicgen-small');
const model = await MusicgenForConditionalGeneration.from_pretrained(
'Xenova/musicgen-small', { dtype: 'fp32' }
);
// Prepare text input
const prompt = '80s pop track with bassy drums and synth';
const inputs = tokenizer(prompt);
// Generate audio
const audio_values = await model.generate({
...inputs,
max_new_tokens: 512,
do_sample: true,
guidance_scale: 3,
});
// (Optional) Write the output to a WAV file
import wavefile from 'wavefile';
import fs from 'fs';
const wav = new wavefile.WaveFile();
wav.fromScratch(1, model.config.audio_encoder.sampling_rate, '32f', audio_values.data);
fs.writeFileSync('musicgen_out.wav', wav.toBuffer());
Samples:
Would it be helpful if I created an example for MusicGen? (based on your example code, but as a small stand-alone HTML page)
@xenova There is a new version of onnxruntime-web, 1.17.3. I tested with wav2vec and there is a new error, so it looks like progress!
Segment Anything Encoder now works with WebGPU: up to 8x faster! (online demo)
Phi-3 WebGPU support is now working! Demo: https://huggingface.co/spaces/Xenova/experimental-phi3-webgpu
Does anyone have a guide for how to get this bundled into a script, akin to a JSDelivr URL? Here's what I tried:
// index.js
export * from 'transformers.js'; // Adjust if the import path differs
npm install xenova/transformers.js#v3
npm install rollup @rollup/plugin-node-resolve rollup-plugin-terser --save-dev
// rollup.config.js
import resolve from '@rollup/plugin-node-resolve';
import { terser } from 'rollup-plugin-terser';
export default {
input: 'index.js',
output: {
file: 'bundle.js',
format: 'esm',
sourcemap: true
},
plugins: [
resolve({
browser: true,
}),
terser()
]
};
And in package.json:
"scripts": {
"build": "rollup -c"
}
And then:
npm run build
And that produced a bundle.js, but it was looking for webgpu.proxy.min.js on jsDelivr, which doesn't exist where it was looking. I tried manually adjusting the URL in the bundle to point to the ort.webgpu.min.js file, but no luck (I also tried esm/ort.webgpu.min.js). I'm guessing there are some tricky things due to the dynamic nature of backend loading that bundlers struggle to pick up automatically.
@xenova Alternatively, I wonder if you'd be able to do some v3 alpha/pre-alpha releases via GitHub tags so that jsDelivr picks them up? Since there's no way (IIUC) to simply reference a branch via jsDelivr (due to the immutability requirement, I assume).
The latest commits add support for Moondream2, a small vision language model by @vikhyat designed to run efficiently on edge devices.
Try it out yourself with the live demo: https://huggingface.co/spaces/Xenova/experimental-moondream-webgpu
Usage:
import { AutoProcessor, AutoTokenizer, Moondream1ForConditionalGeneration, RawImage } from '@xenova/transformers';
// Load processor, tokenizer and model
const model_id = 'Xenova/moondream2';
const processor = await AutoProcessor.from_pretrained(model_id);
const tokenizer = await AutoTokenizer.from_pretrained(model_id);
const model = await Moondream1ForConditionalGeneration.from_pretrained(model_id, {
dtype: {
embed_tokens: 'fp16', // or 'fp32'
vision_encoder: 'fp16', // or 'q8'
decoder_model_merged: 'q4', // or 'q4f16' or 'q8'
},
device: 'webgpu',
});
// Prepare text inputs
const prompt = 'Describe this image.';
const text = `<image>\n\nQuestion: ${prompt}\n\nAnswer:`;
const text_inputs = tokenizer(text);
// Prepare vision inputs
const url = 'https://huggingface.co/vikhyatk/moondream1/resolve/main/assets/demo-1.jpg';
const image = await RawImage.fromURL(url);
const vision_inputs = await processor(image);
// Generate response
const output = await model.generate({
...text_inputs,
...vision_inputs,
do_sample: false,
max_new_tokens: 64,
});
const decoded = tokenizer.batch_decode(output, { skip_special_tokens: false });
console.log(decoded);
// [
// '<|endoftext|><image>\n\n' +
// 'Question: Describe this image.\n\n' +
// 'Answer: A hand is holding a white book titled "The Little Book of Deep Learning" against a backdrop of a balcony with a railing and a view of a building and trees.<|endoftext|>'
// ]
VLMs now support PKV caching. Demo: https://huggingface.co/spaces/Xenova/experimental-nanollava-webgpu
import { AutoProcessor, AutoTokenizer, LlavaForConditionalGeneration, RawImage } from '@xenova/transformers';
// Load tokenizer, processor and model
const model_id = 'Xenova/nanoLLaVA';
const tokenizer = await AutoTokenizer.from_pretrained(model_id);
const processor = await AutoProcessor.from_pretrained(model_id);
const model = await LlavaForConditionalGeneration.from_pretrained(model_id, {
dtype: {
embed_tokens: 'fp16', // or 'fp32' or 'q8'
vision_encoder: 'fp16', // or 'fp32' or 'q8'
decoder_model_merged: 'q4', // or 'q8'
},
// device: 'webgpu',
});
// Prepare text inputs
const prompt = 'What does the text say?';
const messages = [
{ role: 'system', content: 'Answer the question.' },
{ role: 'user', content: `<image>\n${prompt}` }
]
const text = tokenizer.apply_chat_template(messages, { tokenize: false, add_generation_prompt: true });
const text_inputs = tokenizer(text);
// Prepare vision inputs
const url = 'https://huggingface.co/qnguyen3/nanoLLaVA/resolve/main/example_1.png';
const image = await RawImage.fromURL(url);
const vision_inputs = await processor(image);
// Generate response
const { past_key_values, sequences } = await model.generate({
...text_inputs,
...vision_inputs,
do_sample: false,
max_new_tokens: 64,
return_dict_in_generate: true,
});
// Decode output
const answer = tokenizer.decode(
sequences.slice(0, [text_inputs.input_ids.dims[1], null]),
{ skip_special_tokens: true },
);
console.log(answer);
// The text reads "Small but mighty".
const new_messages = [
...messages,
{ role: 'assistant', content: answer },
{ role: 'user', content: 'How does the text correlate to the context of the image?' }
]
const new_text = tokenizer.apply_chat_template(new_messages, { tokenize: false, add_generation_prompt: true });
const new_text_inputs = tokenizer(new_text);
// Generate another response
const output = await model.generate({
...new_text_inputs,
past_key_values,
do_sample: false,
max_new_tokens: 256,
});
const new_answer = tokenizer.decode(
output.slice(0, [new_text_inputs.input_ids.dims[1], null]),
{ skip_special_tokens: true },
);
console.log(new_answer);
// The context of the image is that of a playful and humorous illustration of a mouse holding a weightlifting bar. The text "Small but mighty" is a playful reference to the mouse's size and strength.
@xenova For some models, the performance may be a blocker. Since model downloads can be quite large, I wonder if there should be a way for web developers to know their machine performance class for running a model without downloading it completely first.
I believe this would involve running the model code with zeroed-out weights, which would still require buffer allocations but would allow the web app to catch out-of-memory errors and the like. The model architecture would still be needed to generate shaders, but it would be much smaller than the model weights.
Essentially, knowing the model architecture and testing with empty weights would allow for assessing performance capability without downloading the full model.
I thought I could use from_config for that, but I now wonder whether this should be a built-in v3 feature. What are your thoughts?
@beaufortfrancois That would be amazing to have! Although, it's probably best suited as a feature request for onnxruntime-web. The way one could do it is to use the external data format to save models into two parts: graph-only (<1MB usually) and weights, and then initialize an empty session from the graph without loading the weights. @guschmue might have additional insights.
Thank you @xenova for your support ❤️
@guschmue What are your thoughts on #545 (comment)?
I'm happy to file a feature request in https://github.com/microsoft/onnxruntime
@beaufortfrancois, yes, a utility class that helps applications decide what hardware capabilities are available before a model is loaded has been on my wish list for some time.
We have not gotten to it yet, but I hope we'll find time soon.
It would need to tell you how mighty your GPU is, whether there is an NPU (and, in the future, whether there is WebNN), and whether it is feasible to run the model on wasm.
It's not trivial to get this right on the first try, so I'd expect a few iterations on it.
It would also need a lot of feedback and help from application developers.
Filing a feature request would be good; then we have a place to track it.
@guschmue I've filed microsoft/onnxruntime#20998 to track this feature request. How would we be able to help out there?
We'd need to come up with a nice API. The info one can get from WebGPU is very sparse and, IMO, not good enough on its own to make this work.
The way I see this working:
- We define a couple of model classes, e.g. llm, vision, speech.
- Based on the selected class, we'd briefly run some shaders to measure the relevant FLOPS.
- The result would be a raw FLOPS number, or, based on some heuristics, a class like 'good enough for 500M parameters', plus some hints from the WebGPU info...
- Applications could cache this so the detection only needs to run the first time.
- Maybe there would be an offline tool you can run your model through to capture data about what the model needs.
We would need help defining this (i.e. which classes) and then a lot of feedback to tune it to practical values.
But this is just how I think it would work; I'm very open to other suggestions.
@guschmue Sounds great, +1 on the need to integrate feedback from application developers. One early comment/question on the potential API, w.r.t. your "good enough for 500M parameters" example: are you referring to "fast enough"? If so, it may be convenient for application developers to not only get a bucketized speed estimate as output (e.g. for a given 500M-params model: x-slow / slow / medium / fast), but also to be able to access the raw timings. How long end-users are willing to wait for an output may be use-case specific.
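To make the discussion concrete, here is a purely hypothetical sketch of what such an API could look like from an application's point of view. estimateDeviceCapability, the model-class name, and the result shape are all made up for illustration; nothing like this currently exists in onnxruntime-web or Transformers.js:

// Hypothetical API: probe the device once for a given model class and cache the result.
const cached = localStorage.getItem('device-capability-llm');
const capability = cached
  ? JSON.parse(cached)
  : await estimateDeviceCapability({ modelClass: 'llm' }); // made-up function
if (!cached) localStorage.setItem('device-capability-llm', JSON.stringify(capability));

// Hypothetical result: a coarse tier plus the raw measurements behind it, e.g.
// { tier: 'fast', estimatedFlops: 8.2e12, timings: { matmulMs: 3.1, attentionMs: 5.4 } }
if (capability.tier === 'x-slow') {
  console.warn('This model will likely be too slow on this device; consider a smaller one.');
}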
Experimental Florence2 support has been added! 🥳 (closes #815)
Example code:
import {
Florence2ForConditionalGeneration,
AutoProcessor,
AutoTokenizer,
RawImage,
} from '@xenova/transformers';
// Load model, processor, and tokenizer
const model_id = 'onnx-community/Florence-2-base-ft';
const model = await Florence2ForConditionalGeneration.from_pretrained(model_id, {
dtype: 'fp32',
});
const processor = await AutoProcessor.from_pretrained(model_id);
const tokenizer = await AutoTokenizer.from_pretrained(model_id);
// Load image
const url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg?download=true";
const image = await RawImage.fromURL(url);
// Process inputs
const prompts = "Describe with a paragraph what is shown in the image.";
const text_inputs = tokenizer(prompts);
const vision_inputs = await processor(image);
// Generate text
const generated_ids = await model.generate({
...text_inputs,
...vision_inputs,
max_new_tokens: 100,
});
// Decode generated text
const generated_text = tokenizer.batch_decode(generated_ids, { skip_special_tokens: true });
console.log(generated_text);
generates
'A green car is parked in front of a tan building. There is a brown door on the building behind the car. There are two windows on the front of the building. '
I'm still working on adding support for other tasks and improving processing methods, but this is a good start. Another issue is that the vision encoder doesn't work on WebGPU (but other submodules do). cc @guschmue for this.
@xenova does the option to use a quantized model not exist anymore?
I'm trying to use https://huggingface.co/Xenova/trocr-base-handwritten/blob/main/onnx/encoder_model_quantized.onnx
Let's gooo! Awesome work!!!
I nearly thought it would never happen! An amazing achievement, and thank you for your persistence!
WOOHOO!!! Congrats!! WebGPU all the things!
This is a huge milestone! Thank you for all the fantastic work in this great project!
In preparation for Transformers.js v3, I'm compiling a list of issues/features which will be fixed/included in the release.
- Update onnxruntime-web (→ 1.17.0). Closes: …
- topk -> top_k parameter
- transpose -> permute
Useful commands:
npm version prerelease --preid=alpha -m "[version] Update to %s"
How to use WebGPU
First, install the development branch: npm install xenova/transformers.js#v3
Then specify the device parameter when loading the model. Here's example code to get started. Please note that this is still a WORK IN PROGRESS, so the following usage may change before release.
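A minimal sketch, assuming the development branch is installed from GitHub (npm install xenova/transformers.js#v3, as above) and that the pipeline API accepts the device option in the same way as from_pretrained in the examples earlier in this thread:

import { pipeline } from '@xenova/transformers';

// Create a feature-extraction pipeline that runs on WebGPU.
const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2', {
  device: 'webgpu',
});

// Compute a sentence embedding.
const embeddings = await extractor('WebGPU is cool!', { pooling: 'mean', normalize: true });
console.log(embeddings);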