transformers.js
🚀🚀🚀 Transformers.js V3 🚀🚀🚀
#545
Merged

🚀🚀🚀 Transformers.js V3 🚀🚀🚀 #545

xenova merged 498 commits into main from v3
xenova
xenova 1 year ago (edited 211 days ago) 👍 31 🎉 2 ❤️ 21 🚀 32

In preparation for Transformers.js v3, I'm compiling a list of issues/features which will be fixed/included in the release.

  • WebGPU support (upgrade onnxruntime-web to 1.17.0).
  • Fix WASM backend for large models (onnxruntime-web → 1.17.0). Closes:
  • Deno support (upgrade sharp.js to 0.33.x). Closes:
  • CommonJS compatibility. Closes #152
  • Skip the local model check when running in-browser, unless explicitly set by the user. This is an issue experienced by many beginners, where requests made to localhost redirect to an error page, but the dev server incorrectly returns status code 200. Closes
  • Improve unit test suite and allow local testing. Closes #491
  • Upgrade conversion script dependency versions (+fixes sentence-transformers conversions). Closes
  • Versioned documentation, so that users still on v2 will be able to access the correct documentation.
  • Consistency issues (see the sketch after this list):
    • topk -> top_k parameter.
    • Tensor transpose -> permute
  • Improve pipeline fallback errors
  • WebNN support
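
To illustrate the two renames under "Consistency issues", here is a minimal before/after sketch (the model and tensor below are placeholders chosen for illustration):

import { pipeline, Tensor } from '@huggingface/transformers';

const classifier = await pipeline('text-classification', 'Xenova/distilbert-base-uncased-finetuned-sst-2-english');

// v2: { topk: 5 }  ->  v3: { top_k: 5 }
const results = await classifier('I love transformers!', { top_k: 5 });

// v2: tensor.transpose(2, 0, 1)  ->  v3: tensor.permute(2, 0, 1)
const permuted = new Tensor('float32', new Float32Array(24), [2, 3, 4]).permute(2, 0, 1);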

Useful commands:

  1. Pack
    npm pack
    
  2. Publish dry-run
    npm publish --dry-run
    
  3. Publish dry-run w/ tag
    npm publish --dry-run --tag dev
    
  4. Bump alpha version
    npm version prerelease --preid=alpha -m "[version] Update to %s"

How to use WebGPU

First, install the development branch

npm install @huggingface/transformers

Then specify the device parameter when loading the model. Here's example code to get started. Please note that this is still a WORK IN PROGRESS, so the following usage may change before release.

import { pipeline } from '@huggingface/transformers';

// Create feature extraction pipeline
const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2', {
    device: 'webgpu',
    dtype: 'fp32', // or 'fp16'
});

// Generate embeddings
const sentences = ['That is a happy person', 'That is a very happy person'];
const output = await extractor(sentences, { pooling: 'mean', normalize: true });
console.log(output.tolist());
HuggingFaceDocBuilderDev
HuggingFaceDocBuilderDev 1 year ago 👍 3

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

xenova xenova marked this pull request as draft 1 year ago
Huguet57
Huguet57 1 year ago

Hey! This is great. Is this already in alpha?

kishorekaruppusamy
kishorekaruppusamy 1 year ago 👍 4

Team, is there any tentative time to release this v3 alpha ???

jhpassion0621
jhpassion0621 1 year ago 👍 2 ❤️ 1

I can't wait anymore :) Please update me when it will be released!

jhpassion0621
jhpassion0621 1 year ago (edited 1 year ago)

@xenova Can I test v3-alpha by using NPM? When I try to run, I get this issue.
Screenshot 2024-02-14 at 6 25 31 PM

kishorekaruppusamy
kishorekaruppusamy 1 year ago

@xenova Can I test v3-alpha by using NPM? When I try to run, I get this issue. Screenshot 2024-02-14 at 6 25 31 PM

use this https://github.com/kishorekaruppusamy/transformers.js/commit/7af8ef1e5c37f3052ed3a8e38938595702836f09 commit to resolve this issue ...

jhpassion0621
jhpassion0621 1 year ago

Thanks for your reply @kishorekaruppusamy. I tried with your branch and I got other issues.
Screenshot 2024-02-15 at 3 59 28 PM
Please give me your advice!

kishorekaruppusamy
kishorekaruppusamy 1 year ago (edited 1 year ago)

Thanks for your reply @kishorekaruppusamy. I tried with your branch and I got other issues. Screenshot 2024-02-15 at 3 59 28 PM Please give me your advice!

https://github.com/kishorekaruppusamy/transformers.js/blob/V3_BRANCH_WEBGPU_BUG_FIX/src/backends/onnx.js#L144
change this URL to the local dist dir inside your build ...

jhpassion0621
jhpassion0621 1 year ago (edited 1 year ago)

Thanks @kishorekaruppusamy
I downloaded the latest wasm from onnxruntime and added it to a local directory, but I got the same issue.

Screenshot 2024-02-15 at 9 39 10 PM

I realized transformers.js v3 uses onnxruntime 1.16.3, so I built the wasm with onnxruntime 1.16.3 and tested it, but got the same issue.

Please give your advice. Thanks

NawarA
NawarA 1 year ago

@xenova it looks like #596 is part of this release?! I think that means onnx_data files will be supported?

If true, I'm stoked!

Beyond upgrading ort to 1.17, are there other changes needed to support models with onnx_data files? Happy to try to lend a hand if possible

xenova
xenova1 year ago๐Ÿ‘ 2๐Ÿš€ 5

Hi everyone! Today we released our first WebGPU x Transformers.js demo: The WebGPU Embedding Benchmark (online demo). If you'd like to help with testing, please run the benchmark and share your results! Thanks!

webgpu-benchmark

khmyznikov
khmyznikov 1 year ago

@xenova can this bench pick GPU 1 instead of 0? For laptops with a dGPU

xenova
xenova 1 year ago (edited 1 year ago)

@xenova can this bench pick GPU 1 instead of 0? For laptops with a dGPU

Not currently, but this is being worked on here: microsoft/onnxruntime#19857. We will add support here once ready.

felladrin
felladrin commented on 2024-03-12
examples/webgpu-embedding-benchmark/main.js
// Proxy the WASM backend to prevent the UI from freezing
felladrin 1 year ago
Suggested change
// Proxy the WASM backend to prevent the UI from freezing

As env.backends.onnx.wasm.proxy = true; has been moved away from here (moved to src/backends/onnx.js), this comment-line can also be removed.

xenova1 year ago๐Ÿ‘ 1

Good catch - thanks!

xenova
xenova 1 year ago 🎉 3

@beaufortfrancois - I've added the source code for the video background removal demo. On my device, I get ~20fps w/ WebGPU support (w/ fp32 since fp16 is broken). Here's a screen recording (which drops my fps to ~14):

webgpu-modnet.mp4
beaufortfrancois
beaufortfrancois 1 year ago (edited 1 year ago)

@beaufortfrancois - I've added the source code for the video background removal demo. On my device, I get ~20fps w/ WebGPU support (w/ fp32 since fp16 is broken). Here's a screen recording (which drops my fps to ~14):

You rock. Thanks! It's a cool demo! 👍

I've been wondering how we could improve it:

  • I've noticed you read the current frame of the video on the main thread. Would it help to move the entire demo to a web worker? (see the sketch after this list)
  • output[0].mul(255).to('uint8') takes some non-negligible time to run. Is there a faster path?
  • How much do you expect fp16 to improve perf? In https://developer.chrome.com/blog/new-in-webgpu-120#support_for_16-bit_floating-point_values_in_wgsl, we've noticed on an Apple M1 Pro device that the f16 implementation of the Llama2 7B models used in the WebLLM chat demo is significantly faster than the f32 implementation, with a 28% improvement in prefill speed and a 41% improvement in decoding speed.
  • A way to feed a GPUExternalTexture to the model as an input could also come in handy.
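
On the first point, here is a rough sketch of what the worker split could look like; it reuses the feature-extraction example from earlier in this thread purely to illustrate the pattern (the actual demo would post video frames instead of sentences):

// worker.js — all Transformers.js work happens off the main thread
import { pipeline } from '@huggingface/transformers';

const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2', { device: 'webgpu' });

self.onmessage = async (event) => {
  const output = await extractor(event.data, { pooling: 'mean', normalize: true });
  self.postMessage(output.tolist());
};

// main.js — the UI thread only posts inputs and renders results
const worker = new Worker(new URL('./worker.js', import.meta.url), { type: 'module' });
worker.onmessage = (event) => console.log(event.data);
worker.postMessage(['That is a happy person']);
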
beaufortfrancois
beaufortfrancois commented on 2024-03-14
src/utils/devices.js
/**
 * @typedef {'cpu'|'gpu'|'wasm'|'webgpu'|null} DeviceType
beaufortfrancois 1 year ago

Out of curiosity, what is 'gpu'?

xenova1 year ago๐Ÿ‘ 3

It's meant to be a "catch-all" for the different ways that the library can be used with GPU support (not just in the browser with WebGPU). The idea is that it will simplify documentation, as transformers.js will select the best execution provider depending on the environment. For example, DML/CUDA support in onnxruntime-node (see microsoft/onnxruntime#16050 (comment))

Of course, this is still a work in progress, so it can definitely change!
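
Concretely, the intended behavior (a sketch of the plan, still subject to change) is that the following would resolve to WebGPU in the browser and to DML/CUDA via onnxruntime-node on the server:

import { pipeline } from '@huggingface/transformers';

// 'gpu' is the catch-all device; the library picks the best available GPU execution provider
const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2', {
  device: 'gpu',
});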

hans00
hans00 1 year ago (edited 1 year ago)
device: 'webgpu',

For some environments it would be better as a list, because not all execution providers support all operators.
For my use case I give a list of EPs ordered by priority and let onnxruntime fall back automatically.
For example: ['nnapi', 'xnnpack', 'cpu'] for Android / ['qnn', 'dml', 'xnnpack', 'cpu'] for Windows ARM64 (custom build)

young-developer
young-developer 1 year ago (edited 1 year ago)

UPDATE: Looks like some kernels are not supported for quant operations :/

I tested the WebGPU version on https://huggingface.co/Xenova/wav2vec2-bert-CV16-en with the changes from v3. The model (quantized) loads without errors, but after running transcription it throws an error with the message:
An error occurred during model execution: "Error: [WebGPU] Kernel "[Split] /wav2vec2_bert/encoder/layers.0/conv_module/glu/Split" failed. Error: no GPU data for output: 0".

[E:onnxruntime:, sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned while running Split node. Name:'/wav2vec2_bert/encoder/layers.0/conv_module/glu/Split' Status Message: Failed to run JSEP kernel

Is it some quantization error or onnxruntime error?

Logs localhost-1710758687772.log
Env: Windows, Chrome 122, Nvidia Geforce 3090

xenova
xenova1 year ago๐Ÿ‘ 3

@young-developer Thanks for the report. I will cc @guschmue for this unsupported operator. It may already be fixed in the dev branch of onnxruntime-web.

@hans00 For more advanced use-cases, you can update the session options directly with session_options: {...} in the model options.
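
For example, something along these lines (a rough sketch; the session_options object is forwarded to onnxruntime, so the execution-provider list below is an assumption about your runtime/build rather than something Transformers.js validates):

import { AutoModel } from '@huggingface/transformers';

const model = await AutoModel.from_pretrained('Xenova/all-MiniLM-L6-v2', {
  session_options: {
    // Ordered by priority; onnxruntime falls back to the next provider if one is unavailable
    executionProviders: ['webgpu', 'wasm'],
  },
});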

young-developer
young-developer 1 year ago

FYI @xenova I was able to load the model in fp32 and got the same error. I also tried loading it in fp16, but it throws an error that the input is (float) instead of (float16), so I assume the inputs need to be converted to fp16 too.

xenova
xenova 1 year ago 🎉 4 ❤️ 1 🚀 1

Exciting news 🥳 We've got Musicgen working! Example usage:

import { AutoTokenizer, MusicgenForConditionalGeneration } from '@xenova/transformers';

// Load tokenizer and model
const tokenizer = await AutoTokenizer.from_pretrained('Xenova/musicgen-small');
const model = await MusicgenForConditionalGeneration.from_pretrained(
  'Xenova/musicgen-small', { dtype: 'fp32' }
);

// Prepare text input
const prompt = '80s pop track with bassy drums and synth';
const inputs = tokenizer(prompt);

// Generate audio
const audio_values = await model.generate({
  ...inputs,
  max_new_tokens: 512,
  do_sample: true,
  guidance_scale: 3,
});

// (Optional) Write the output to a WAV file
import wavefile from 'wavefile';
import fs from 'fs';

const wav = new wavefile.WaveFile();
wav.fromScratch(1, model.config.audio_encoder.sampling_rate, '32f', audio_values.data);
fs.writeFileSync('musicgen_out.wav', wav.toBuffer());

Samples:

sample_1.mp4
sample_2.mp4
sample_3.mp4
flatsiedatsie
flatsiedatsie 1 year ago (edited 1 year ago)

Would it be helpful if I created an example for MusicGen? (based on your example code, but as a small stand-alone HTML page)

young-developer
young-developer 1 year ago (edited 1 year ago)

@xenova There is a new version 1.17.3 of onnxruntime-web. I tested with wav2vec and there is a new error, so it looks like progress 😄

xenova
xenova1 year ago (edited 1 year ago)โค 1

Segment Anything Encoder now works with WebGPU: up to 8x faster! (online demo)

sam-webgpu.mp4
xenova
xenova 1 year ago 😄 2 🎉 7 🚀 8

Phi-3 WebGPU support is now working! Demo: https://huggingface.co/spaces/Xenova/experimental-phi3-webgpu

phi3-webgpu-demo.mp4
josephrocca
josephrocca 1 year ago (edited 1 year ago) 👍 5

Does anyone have a guide for how to get this bundled into a script, akin to a JSDelivr URL? Here's what I tried:

// index.js
export * from 'transformers.js'; // Adjust if the import path differs
npm install xenova/transformers.js#v3
npm install rollup @rollup/plugin-node-resolve rollup-plugin-terser --save-dev
// rollup.config.js
import resolve from '@rollup/plugin-node-resolve';
import { terser } from 'rollup-plugin-terser';

export default {
  input: 'index.js',
  output: {
    file: 'bundle.js',
    format: 'esm',
    sourcemap: true
  },
  plugins: [
    resolve({
      browser: true,
    }),
    terser()
  ]
};

And in package.json:

"scripts": {
  "build": "rollup -c"
}

And then:

npm run build

And that produced a bundle.js, but it was looking for webgpu.proxy.min.js on jsDelivr, which doesn't exist where it was looking. I tried manually adjusting the URL in the bundle to point to the ort.webgpu.min.js file, but no luck (I also tried esm/ort.webgpu.min.js). I'm guessing there are some tricky things due to the dynamic nature of backend loading that bundlers struggle to automatically pick up.

@xenova Alternatively, I wonder if you'd be able to do some v3 alpha/prealpha releases via github tags so that jsdelivr picks them up? Since there's no way (IIUC) to simply reference a branch via jsdelivr (due to immutability requirement I assume).
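
In the meantime, one workaround that may help with the missing .wasm lookups is to point the ONNX backend at a known location for its WebAssembly binaries before creating a pipeline (a sketch only; the CDN path below is an assumption and should match the onnxruntime-web version your build resolves):

import { env, pipeline } from '@huggingface/transformers';

// Override where onnxruntime-web fetches its .wasm files from, instead of the bundler-rewritten path
env.backends.onnx.wasm.wasmPaths = 'https://cdn.jsdelivr.net/npm/onnxruntime-web/dist/';

const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');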

xenova
xenova 1 year ago 🎉 2 ❤️ 2

The latest commits add support for Moondream2, a small vision language model by @vikhyat designed to run efficiently on edge devices.

Try it out yourself with the live demo: https://huggingface.co/spaces/Xenova/experimental-moondream-webgpu

moondream-webgpu-2.mp4

Usage:

import { AutoProcessor, AutoTokenizer, Moondream1ForConditionalGeneration, RawImage } from '@xenova/transformers';

// Load processor, tokenizer and model
const model_id = 'Xenova/moondream2';
const processor = await AutoProcessor.from_pretrained(model_id);
const tokenizer = await AutoTokenizer.from_pretrained(model_id);
const model = await Moondream1ForConditionalGeneration.from_pretrained(model_id, {
    dtype: {
        embed_tokens: 'fp16', // or 'fp32'
        vision_encoder: 'fp16', // or 'q8'
        decoder_model_merged: 'q4', // or 'q4f16' or 'q8'
    },
    device: 'webgpu',
});

// Prepare text inputs
const prompt = 'Describe this image.';
const text = `<image>\n\nQuestion: ${prompt}\n\nAnswer:`;
const text_inputs = tokenizer(text);

// Prepare vision inputs
const url = 'https://huggingface.co/vikhyatk/moondream1/resolve/main/assets/demo-1.jpg';
const image = await RawImage.fromURL(url);
const vision_inputs = await processor(image);

// Generate response
const output = await model.generate({
    ...text_inputs,
    ...vision_inputs,
    do_sample: false,
    max_new_tokens: 64,
});
const decoded = tokenizer.batch_decode(output, { skip_special_tokens: false });
console.log(decoded);
// [
//     '<|endoftext|><image>\n\n' +
//     'Question: Describe this image.\n\n' +
//     'Answer: A hand is holding a white book titled "The Little Book of Deep Learning" against a backdrop of a balcony with a railing and a view of a building and trees.<|endoftext|>'
// ]
xenova
xenova 364 days ago (edited 364 days ago) 👍 2 🎉 1 🚀 1

VLMs now support PKV caching. Demo: https://huggingface.co/spaces/Xenova/experimental-nanollava-webgpu

nanollava-webgpu.mp4
Example code
import { AutoProcessor, AutoTokenizer, LlavaForConditionalGeneration, RawImage } from '@xenova/transformers';

// Load tokenizer, processor and model
const model_id = 'Xenova/nanoLLaVA';
const tokenizer = await AutoTokenizer.from_pretrained(model_id);
const processor = await AutoProcessor.from_pretrained(model_id);
const model = await LlavaForConditionalGeneration.from_pretrained(model_id, {
    dtype: {
        embed_tokens: 'fp16', // or 'fp32' or 'q8'
        vision_encoder: 'fp16', // or 'fp32' or 'q8'
        decoder_model_merged: 'q4', // or 'q8'
    },
    // device: 'webgpu',
});

// Prepare text inputs
const prompt = 'What does the text say?';
const messages = [
    { role: 'system', content: 'Answer the question.' },
    { role: 'user', content: `<image>\n${prompt}` }
]
const text = tokenizer.apply_chat_template(messages, { tokenize: false, add_generation_prompt: true });
const text_inputs = tokenizer(text);

// Prepare vision inputs
const url = 'https://huggingface.co/qnguyen3/nanoLLaVA/resolve/main/example_1.png';
const image = await RawImage.fromURL(url);
const vision_inputs = await processor(image);

// Generate response
const { past_key_values, sequences } = await model.generate({
    ...text_inputs,
    ...vision_inputs,
    do_sample: false,
    max_new_tokens: 64,
    return_dict_in_generate: true,
});

// Decode output
const answer = tokenizer.decode(
    sequences.slice(0, [text_inputs.input_ids.dims[1], null]),
    { skip_special_tokens: true },
);
console.log(answer);
// The text reads "Small but mighty".

const new_messages = [
    ...messages,
    { role: 'assistant', content: answer },
    { role: 'user', content: 'How does the text correlate to the context of the image?' }
]
const new_text = tokenizer.apply_chat_template(new_messages, { tokenize: false, add_generation_prompt: true });
const new_text_inputs = tokenizer(new_text);

// Generate another response
const output = await model.generate({
    ...new_text_inputs,
    past_key_values,
    do_sample: false,
    max_new_tokens: 256,
});
const new_answer = tokenizer.decode(
    output.slice(0, [new_text_inputs.input_ids.dims[1], null]),
    { skip_special_tokens: true },
);
console.log(new_answer);
// The context of the image is that of a playful and humorous illustration of a mouse holding a weightlifting bar. The text "Small but mighty" is a playful reference to the mouse's size and strength.
beaufortfrancois
beaufortfrancois 347 days ago (edited 347 days ago) 👍 2

@xenova For some models, the performance may be a blocker. Since model downloads can be quite large, I wonder if there should be a way for web developers to know their machine performance class for running a model without downloading it completely first.

I believe this would involve running the model code with zeroed-out weights, which would still require buffer allocations but would allow the web app to catch out-of-memory errors and the like. The model architecture would still be needed to generate shaders, but it would be much smaller than the model weights.

Essentially, knowing the model architecture and testing with empty weights would allow for assessing performance capability without downloading the full model.

I thought I could use from_config for that but I wonder now if this should be a built-in V3 feature. What are your thoughts?

xenova
xenova 346 days ago ❤️ 1

@beaufortfrancois That would be amazing to have! Although, it's probably best suited as a feature request for onnxruntime-web. The way one could do it is to use the external data format to save models into two parts: graph-only (<1MB usually) and weights, and then initialize an empty session from the graph without loading the weights. @guschmue might have additional insights.

beaufortfrancois
beaufortfrancois 345 days ago (edited 345 days ago) 👍 1

Thank you @xenova for your support ❤️

@guschmue What are your thoughts on #545 (comment)?
I'm happy to file a feature request in https://github.com/microsoft/onnxruntime

guschmue
guschmue 341 days ago

@beaufortfrancois, yes, some utility class that helps applications decide what hardware capabilities are available before a model is loaded has been on my wish list for some time.
We have not gotten to it, but I hope we'll find time soon.
It would need to tell how mighty your GPU is, whether there is an NPU (in the future, if there is WebNN), and whether it is feasible to run the model on wasm.
It's not trivial to get this right on the first try, so I'd expect a few iterations on it.
It would also need a lot of feedback and help from application developers.

Filing a feature request would be good, then we have a place to track it.

beaufortfrancois
beaufortfrancois 341 days ago 👍 1 🎉 2

@guschmue I've filed microsoft/onnxruntime#20998 to track this feature request. How would we be able to help out there?

guschmue
guschmue 340 days ago 👍 1

We'd need to come up with a nice API. The info one can get from WebGPU is very sparse and, imo, not good enough to make this work.
The way I see this working goes like this:
we define a couple of model classes, i.e.:
llm, vision, speech

Based on this we'd briefly run some shaders to determine the relevant flops used by the selected class.

The result would be a raw flops number, or, based on some heuristics, a class like 'good enough for 500M parameters', plus some hints from the WebGPU info ...

Applications could cache this so the detection only needs to run the first time.

Maybe there would be an offline tool that you can run your model through to capture some data about what the model needs.

We would need help defining this (i.e. what classes) and then get a lot of feedback to tune it to practical values.

But this is just how I think this would work; very open to other suggestions.
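
For context, this is roughly everything WebGPU exposes today without running any shaders (a sketch; any flops-style "performance class" would have to be measured on top of this and cached):

// Probe the sparse device info available from WebGPU (browser only)
const adapter = await navigator.gpu?.requestAdapter({ powerPreference: 'high-performance' });
if (!adapter) {
  console.log('WebGPU unavailable; fall back to wasm');
} else {
  console.log('shader-f16 support:', adapter.features.has('shader-f16'));
  console.log('maxBufferSize:', adapter.limits.maxBufferSize);
  console.log('maxComputeWorkgroupStorageSize:', adapter.limits.maxComputeWorkgroupStorageSize);
  // A real capability check would additionally time a few representative shaders
  // per model class (llm / vision / speech) and cache the result, e.g. in localStorage.
}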

maudnals
maudnals 330 days ago (edited 330 days ago)

@guschmue Sounds great, +1 on the need to integrate feedback from application developers. One early comment/question on the potential API, wrt your "good enough for 500M parameters" example: are you referring to "fast enough"? If so, it may be convenient for application developers to not only get a bucketized speed estimate as output (e.g. for a given 500M-parameter model: x-slow / slow / medium / fast), but also to be able to access the raw timings. How long end users are willing to wait for an output may be use-case specific.

xenova
xenova 330 days ago 👍 1

Experimental Florence2 support has been added! 🥳 (closes #815)

Example code:

import {
    Florence2ForConditionalGeneration,
    AutoProcessor,
    AutoTokenizer,
    RawImage,
} from '@xenova/transformers';

// Load model, processor, and tokenizer
const model_id = 'onnx-community/Florence-2-base-ft';
const model = await Florence2ForConditionalGeneration.from_pretrained(model_id, {
    dtype: 'fp32',
});
const processor = await AutoProcessor.from_pretrained(model_id);
const tokenizer = await AutoTokenizer.from_pretrained(model_id);

// Load image
const url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg?download=true";
const image = await RawImage.fromURL(url);

// Process inputs
const prompts = "Describe with a paragraph what is shown in the image.";
const text_inputs = tokenizer(prompts);
const vision_inputs = await processor(image);

// Generate text
const generated_ids = await model.generate({
    ...text_inputs,
    ...vision_inputs,
    max_new_tokens: 100,
});

// Decode generated text
const generated_text = tokenizer.batch_decode(generated_ids, { skip_special_tokens: true });
console.log(generated_text);

image

generates

'A green car is parked in front of a tan building. There is a brown door on the building behind the car. There are two windows on the front of the building. '

I'm still working on adding support for other tasks and improving processing methods, but this is a good start. Another issue is that the vision encoder doesn't work on WebGPU (but other submodules do). cc @guschmue for this.

xenova Early dereferencing for performance boosts
0dba2661
xenova cleanup
5e4e20fb
xenova Move quantization logic to `quantize.py`
dd6af93f
xenova update deps
04af3d57
xenova Fix q4 quantization
91286517
xenova save q4 quantization
83cbb218
xenova Add decode ASR test
eb613441
xenova Do not process last chunk unnecessarily
cec24005
xenova fp16 disable_shape_infer if model is too large
c835b543
xenova Use `check_and_save_model` for saving fp16 model
45cd8d4d
xenova Reorder functions
88f3e441
xenova formatting
23440f00
xenova Remove debug log
b411e9fd
xenova Fix q8 quantization for models > 2GB
04a334a5
xenova correct attribute
cd1ea697
xenova Fix `TextGenerationPipeline`
a167f6e2
xenova Fix pauses in whisper word-level timestamps
ea732896
xenova Formatting
344af32a
xenova Sort added tokens by length to avoid early partial matches
c305c382
xenova Add new tokenizer test
d6f6fd47
xenova Only finish with newline if running in Node.js
1557b8d0
xenova Consider token timestamps when selecting longest common sequence
9ac7ceb4
xenova Create whisper word-level timestamps demo
79ed46ed
xenova cleanup
8da68866
xenova Fallback to WASM if WebGPU not supported
d709bd07
xenova Reload model for each quantization mode
9ef3a6d0
xenova Update converstion script requirements
9787b75a
xenova Separate IO and Quantization args
974f0862
xenova Use `const` where possible
d0428688
xenova Add `InterruptableStoppingCriteria`
1b4d2428
xenova `@xenova/transformers` -> `@huggingface/transformers`
31101c82
xenova Override semver version
e84322b5
xenova Add support for pyannote models
bd943340
xenova Update README.md
3dbc633b
xenova Add listed support for pyannote
858e55d1
xenova Add pyannote example code
8bf03494
xenova Support specifying `min_num_frames`
c52618cf
xenova Support simultaneous instantiation of multiple inference sessions
96f19b06
xenova Support broadcasting encoder outputs over decoder inputs
4ad43e21
xenova Fix test
c6aeb4be
fs-eire fix bundler config for latest ORT
6d3ea4bc
xenova Only check fp16 support for webgpu device
38a3bf6d
xenova Remove default chat templates
9df84c43
xenova Add support for gemma2
fc3d860f
xenova Add gemma2 generation test
939920d2
xenova Update gemma2 config mapping
5bb93a06
xenova Prioritize high-performance adapter when possible
72ec168f
xenova Set defaults for `tools` and `documents` in `apply_chat_template`
9068a531
xenova bump `@huggingface/jinja` -> 0.3.0
824538bc
xenova Add `apply_chat_template` default parameters unit test
836c0afe
xenova Merge branch 'v3' into @huggingface/transformers
487d8b20
xenova Add prettier
1f6e0e16
xenova prettier format config files
55494d18
xenova remove incorrect comment
5a68461b
xenova Merge branch 'pr/864' into @huggingface/transformers
437cb34e
xenova Update onnxruntime-web version
5a6c9267
xenova Update webpack.config.js
b19251b8
xenova Fix copy path
820c1e26
xenova Run `npm ci`
b0dab917
xenova Fix bundling
86b9b621
xenova Do not set `preferredOutputLocation` if we are proxying
222b94ed
xenova Merge branch 'v3' into @huggingface/transformers
b326cc94
xenova Update `@webgpu/types`
ca67092f
xenova Update SAM example
42076fda
xenova Use `??=` operator where possible
48d31424
xenova Fix commonjs usage
3b1a4fd9
xenova Mark `onnxruntime-node` and `sharp` as externals
9a73b5ed
xenova Move `externals` into config
9951aa5d
xenova Downgrade to onnxruntime 1.18.0
c04d37e6
xenova Finalize module/commonjs build
d32fe2bc
xenova Separate web and node builds
1530d509
xenova [version] Update to 3.0.0-alpha.1
b4df0e25
xenova Default to CDN-hosted .wasm files
ab59c516
xenova [version] Update to 3.0.0-alpha.2
866b2198
xenova bump versions
4a3398d1
xenova [version] Update to 3.0.0-alpha.3
8891a142
xenova Merge branch 'improve-conversion-script' into v3
a315933b
xenova Consolidate conversion and quantization script
12569b8f
xenova Downgrade `onnxconverter-common`
83f57181
xenova Link to types in exports
6fa5fa6c
xenova Update list of supported tasks
2f1b2105
xenova Fixed unit tests
27bc55d7
xenova Update imports
23d11500
xenova Bump versions to `3.0.0-alpha.4`
f9070dca
xenova [version] Update to 3.0.0-alpha.4
c3494e1b
xenova Fix "Default condition should be last one"
973fb0dc
xenova Bump versions
7376ecf9
xenova [version] Update to 3.0.0-alpha.5
0a04bc07
xenova Update next.js client-side demo
e4603cd9
ibelem Initial WebNN Support
ff1853ce
xenova Mark fs, path and url as external packages for node build
15574bcf
xenova Move content type map outside of `FileResponse` object
72828625
xenova Add GPU support for Node.js
22f7cede
xenova Bump versions
1e319a4c
xenova [version] Update to 3.0.0-alpha.6
d278891f
ibelem Fix conflicts
3fefa17a
xenova bump dependency versions
fa6cc70f
xenova Add support for device auto-detection
7fa53265
xenova Fix default device selection
4ec77c1a
xenova Merge branch 'pr/ibelem/890-1' into v3
5799e304
xenova Improve WebNN selection
5b2cac21
xenova Skip token callback if `skip_prompt` is set
ad23c50c
xenova Bump versions
5b84b62a
xenova [version] Update to 3.0.0-alpha.7
bcf6a86f
xenova bump versions
b97ed0d8
xenova [version] Update to 3.0.0-alpha.8
c5b70838
xenova bump versions
cbeefded
xenova [version] Update to 3.0.0-alpha.9
59600f24
xenova Add support for Sapiens
b2e025a0
xenova Update default ONNX env
8661d951
xenova Fix types
57db34db
xenova Topologically sort fp16 nodes
1b7f9789
xenova Add marian unit test
45d1526e
xenova Re-order imports
b903757c
xenova Fix `NoBadWordsLogitsProcessor`
633976f7
xenova Update package.json
24d8787e
xenova [jest] Disable coverage
9412ec46
xenova Bump versions
08e73881
xenova [version] Update to 3.0.0-alpha.10
d5a8f87a
xenova Improve node/web interoperability
7843ad07
xenova Fix scripts/requirements.txt
bf093aec
xenova Bump versions
9a5ee429
xenova [version] Update to 3.0.0-alpha.11
535cdfe5
xenova Add support for JAIS models (#906)
4e1acf04
xenova Add JAIS to README
488548d0
xenova Fix node/web interop (again)
13aed411
xenova Bump versions
7655f81c
xenova [version] Update to 3.0.0-alpha.12
1c7e2267
xenova Set `SapiensForNormalEstimation` to encoder-only
ab6b28b6
xenova Implement `sub` tensor operation
66c05d56
xenova Bump versions
31e8b2ae
xenova [version] Update to 3.0.0-alpha.13
bf3f7d5f
xenova Improve typing for `wrap` helper function
c0253561
xenova Update `preferredOutputLocation` type
7ebdaf21
xenova Make `wrap` type more generic
3b8ddcbc
xenova Re-use `segmentation_data`
a385c6e4
xenova Fix `min` type
537e9586
xenova Add support for Hiera models
bcb28b34
xenova Fix reused loop variable (closes #910)
d21c87cd
xenova Add logits processor test file
1d281f63
xenova Fix test imports
ba0427f4
xenova Bump versions
3bc3e86c
xenova [version] Update to 3.0.0-alpha.14
0518960d
xenova Add another `bad_words` logits processor test (closes #913)
552cdea6
xenova Add support for GroupViT
3422a8bc
xenova Add zero-shot-image-classification unit test
3599902a
xenova Add maskformer model definitions
5892ee81
xenova Support universal image segmentation in `image-segmentation` pipeline
c4dac775
xenova Add support for PVT models
f0c47bed
xenova Add `post_process_instance_segmentation` function template
d80d3a4c
xenova Add `library_name` option to convert.py
844099df
xenova Wrap onnxslim with try block
ba5d7252
xenova Use const where possible
b3691c81
xenova Use const where possible (again)
dcf117f2
xenova Create `MaskFormerFeatureExtractor`
9af026c5
xenova Add support for MaskFormer
0f8200c5
xenova Improve tool-use chat template detection
e278c5e9
xenova Add object detection pipeline unit test
83fa58f0
xenova Add support for ViTMSN and VitMAE
86d6da46
jlucaso1
jlucaso1 253 days ago (edited 253 days ago)

@xenova does the option to use a quantized model not exist anymore?

I'm trying to use https://huggingface.co/Xenova/trocr-base-handwritten/blob/main/onnx/encoder_model_quantized.onnx

xenova Bump ORT versions
93b25fb2
xenova Create `get_chat_template` helper function
2f680ee7
xenova Fix CI
2f9b2ed9
xenova Run prettier on `tests/**`
deec3504
xenova move certain tests to utils subfolder
48fa226e
xenova xenova marked this pull request as ready for review 250 days ago
xenova Bump onnxruntime-web version
a10828f4
xenova Bump `onnxruntime==1.19.2` in scripts/requirements.txt
ba58ea24
xenova Merge branch 'main' into v3
4f17e954
xenova Merge branch 'main' into v3
c40a1512
xenova Sort `this.added_tokens` before creating regex (`.toSorted` is not av…
30315b21
xenova Rather make a copy of `this.added_tokens`
d7df5758
xenova Fix `.tokenize` with `fuse_unk=true`
a519379b
xenova Add blenderbot tokenizer tests
89ddccf5
xenova Add t5 tokenizer tests
36ad144b
xenova Add falcon tokenizer tests
4765dd63
xenova Run prettier
fd8b9a25
xenova Add ESM tokenizer tests
710816ef
xenova Run unit tests in parallel
0d3cd309
xenova Fix `fuse_unk` for tokenizers with `byte_fallback=true` but no byte fโ€ฆ
cc258c23
xenova Add llama tokenizer unit tests
4798755c
xenova Update emoji test string names
c6c3ae18
xenova Move whisper-specific unit tests to subfolder
79a74095
xenova Code formatting
1a388048
xenova Bump versions
dabe6ae3
xenova [version] Update to 3.0.0-alpha.15
54f1f214
xenova Add emoji tokenizer test cases for LlamaTokenizer
a912d796
xenova Attempt to fix encoder-decoder memory leak
969d10e1
xenova Remove unused code
072cbbce
xenova Fix BertNormalizer (strip `Mn` unicode characters)
14b4bd4a
xenova Handle ZERO WIDTH JOINER (U+200D) characters
67977718
xenova Add more spm normalization characters
f148afd6
xenova Add emoji unit tests for bert/t5
ca4b5b98
xenova [WebNN] Add support for specifying `free_dimension_overrides` in config
113c81ea
xenova Log warning if webnn is selected but `free_dimension_overrides` is not…
9005accf
xenova Fix unigram for multi-byte tokens
682c7d05
xenova Add gemma tokenizer tests
4a31e549
xenova Allow user to specify device and dtype in config.json
7a160655
xenova Update dependency versions
4c1d21ba
xenova Bump versions
3c6a95a0
xenova [version] Update to 3.0.0-alpha.16
ac391d24
xenova Add CLIP tokenizer unit tests
d30d3b7a
xenova Add more tokenizer tests
e089ef4c
xenova Bump onnxruntime-web version
2c9e271f
xenova Bump versions
ee1e32a2
xenova [version] Update to 3.0.0-alpha.17
f41e995b
xenova Add support for new `tokenizers>=0.2.0` BPE serialization format
9a42cf32
xenova Bump onnxruntime-web version
f534b352
xenova Bump versions
0c8b1af1
xenova [version] Update to 3.0.0-alpha.18
2ca41780
xenova Keep encoder outputs on GPU
a82e7ef0
xenova Update whisper-webgpu demo dependencies
c37a38cd
xenova Bump versions
e1c4fc69
xenova [version] Update to 3.0.0-alpha.19
fe51609a
kallebysantos Support to load ONNX APIs based on JS runtime (#947)
b5188664
xenova Allow specification of `use_external_data_format` in custom config
95c8cc55
xenova Update deberta unit tests
03eb77bf
xenova Update roberta tokenizer tests
c61a76ba
xenova Support inferring unigram tokenizer type
32d8df40
xenova Reuse tokenizer tests for original t5-small
6505abb1
xenova Remove redundant null coalesce
96192182
xenova Enable unit test coverage reports
52c4ce70
xenova Use `PROBLEMATIC_REGEX_MAP` for bloom tokenizer
12edaf08
xenova Improve tokenizer unit tests
5e7e82b9
xenova Update tokenizer unit tests
795a61a3
xenova Remove unused code
77ebe0de
xenova Add m2m_100 tokenizer unit tests
56eda3bd
xenova Add m2m translation pipeline unit test
2040ad5d
xenova Add support for Depth Pro models
8718c176
xenova Add whisper turbo alignment heads
a32efa3d
xenova Remove in-library list of supported models
8b0d330a
xenova Bump versions
cf3f5c34
xenova [version] Update to 3.0.0-alpha.20
86fe1753
BritishWerewolf Add function to map tensor data array.
1c78278b
xenova Merge branch 'main' into v3
a5e02100
BritishWerewolf Optimise loop to reduce calls to `this`
9f8fac09
xenova Merge branch 'pr/966' into v3
1c43e3f8
xenova Add back tensor map test
7a0f77c1
xenova Add support for granite models
da03a0a4
xenova Allow multiple optional configs to be passed (+ reduce code duplication)
37effa36
xenova Bump dependencies
f21b36e2
xenova Bump versions
d26a6633
xenova [version] Update to 3.0.0-alpha.21
c337c3bb
xenova Add support for per-dtype `kv_cache_dtype`
92d0dc69
xenova Add text streamer unit test
ea03bf54
xenova Bump ORT web version
27a033f6
xenova Bump versions
19277eaf
xenova [version] Update to 3.0.0-alpha.22
90a74905
xenova Update repo name to `@huggingface/transformers.js`
38773eab
xenova xenova changed the title [WIP] 🚀🚀🚀 Transformers.js V3 🚀🚀🚀 to 🚀🚀🚀 Transformers.js V3 🚀🚀🚀 211 days ago
xenova Update tested node versions
832b5b74
xenova Bump versions
b871c087
xenova [version] Update to 3.0.0
7a58d6e1
xenova xenova merged 7ebd50ce into main 211 days ago
do-me
do-me 211 days ago 🎉 4 ❤️ 3 🚀 4

Let's gooo 🚀 🚀 🚀 Awesome work!!!

kungfooman
kungfooman 211 days ago ❤️ 1

I nearly thought it would never happen 🙈 An amazing achievement and thank you for your persistence!

young-developer
young-developer 211 days ago (edited 211 days ago) ❤️ 1

🔥 🚀

flatsiedatsie
flatsiedatsie 211 days ago ❤️ 1

WOOHOO!!! Congrats!! WebGPU all the things!

gyagp
gyagp 211 days ago ❤️ 1

This is a huge milestone 🎉 Thank you for all the fantastic work in this great project!

okasi
okasi 211 days ago

🚀 🚀 🚀

justin0mcateer
justin0mcateer 211 days ago (edited 210 days ago)
