OpenVINO support #1037

RyanMetcalfeInt8
RyanMetcalfeInt81 year ago (edited 1 year ago)👍 3❤ 4

Running Whisper inference using OpenVINO

This PR extends whisper.cpp to run the Whisper encoder on OpenVINO-supported devices such as CPUs and Intel GPUs (integrated & discrete).

I've tested this on a number of platforms, including:

  • NUC Intel(R) Core(TM) i7-6770HQ ('Skylake' Skull Canyon NUC) running Ubuntu 22.04
  • Core(TM) i7-1185G7 ('Tiger Lake' laptop) running Windows 11 Pro
  • Core(TM) i7-12700 ('Alder Lake' Beast Canyon NUC) with installed Intel(R) ARC(TM) A770 discrete graphics card, running Windows 11 Pro

On each platform, the OpenVINO-based encoder gives a significant performance boost over the default encoder -- even on CPU -- and the ability to offload to another OpenVINO-supported device simply by specifying a different string at runtime (e.g. "CPU" --> "GPU") is very convenient.

High-level description of changes

This introduction of OpenVINO encoder support is modeled very closely on how whisper.cpp uses CoreML (this should be pretty obvious from the change-set). If the project is built with OpenVINO support, an OpenVINO-specific encoder is pulled into the build and instantiated at application startup time.

Also similar to CoreML, the models that need to be present to take advantage of the OpenVINO encoder can be generated using a new Python script in the 'models' directory.

Just to point out -- something that does differ between CoreML and the new OpenVINO integration is how/when support is enabled at runtime. CoreML is enabled within the call to whisper_init_*. For OpenVINO, because we want the ability to specify a device string ("CPU", "GPU", etc.), I exposed a new API that is dedicated to initializing OpenVINO, given a ctx:

(in whisper.h):

    // Given a context, enable use of OpenVINO for encode inference. 
    // openvino_model_path: Optional path to OpenVINO encoder IR model. If set to nullptr,
    //                      the path will be generated from the ggml model path that was passed
    //                      in to whisper_init_from_file. For example, if 'path_model' was
    //                      "/path/to/ggml-base.en.bin", then OpenVINO IR model path will be 
    //                      assumed to be "/path/to/ggml-base.en-encoder-openvino.xml".
    // openvino_device: OpenVINO device to run inference on ("CPU", "GPU", etc.)
    // openvino_cache_dir: Optional cache directory that can speed up init time, especially for 
    //                     GPU, by caching compiled 'blobs' there.
    //                     Set to nullptr if not used.
    // Returns 1 on success. If OpenVINO is not enabled in build, this
    // simply returns 0.
    WHISPER_API int whisper_ctx_init_openvino_encoder(struct whisper_context* ctx,
        const char* openvino_model_path,
        const char* openvino_device,
        const char* openvino_cache_dir);

I'm happy to rework this if anyone has a better idea of how to enable OpenVINO support at init time.
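
For reference, the default model-path derivation described in the comment above is just a suffix replacement. Below is a minimal sketch of that logic (the PR implements it in a helper called whisper_get_openvino_path_encoder; the function name and exact implementation here are illustrative, not copied from the PR):

    #include <string>

    // Sketch: "/path/to/ggml-base.en.bin" -> "/path/to/ggml-base.en-encoder-openvino.xml".
    // The cache directory is derived the same way, with a "-encoder-openvino-cache" suffix.
    static std::string openvino_encoder_path_from_ggml(std::string path_bin) {
        const auto pos = path_bin.rfind(".bin");
        if (pos != std::string::npos) {
            path_bin.replace(pos, 4, "-encoder-openvino.xml");
        } else {
            path_bin += "-encoder-openvino.xml";
        }
        return path_bin;
    }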

main.cpp exposes a new parameter for the user to set the OpenVINO encoder inference device (default is "CPU"):

...
else if (arg == "-oved" || arg == "--ov-e-device")    { params.openvino_encode_device = argv[++i]; }
...
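
For context, the parsed value lands in the example's parameter struct; here is a minimal sketch of the relevant member (hypothetical layout -- the actual whisper_params struct in main.cpp has many more fields):

    #include <string>

    struct whisper_params {
        // ... other options ...
        std::string openvino_encode_device = "CPU"; // set via -oved / --ov-e-device
    };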

And the new whisper_ctx_init_openvino_encoder API is called right after ctx creation:

   // whisper init

    struct whisper_context * ctx = whisper_init_from_file(params.model.c_str());

    if (ctx == nullptr) {
        fprintf(stderr, "error: failed to initialize whisper context\n");
        return 3;
    }

    // initialize openvino encoder. This has no effect on whisper.cpp builds that don't have OpenVINO configured.
    whisper_ctx_init_openvino_encoder(ctx, nullptr, params.openvino_encode_device.c_str(), nullptr);
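
If the caller wants to know whether the OpenVINO encoder actually came up (for example, to log a fallback message), the return value can be checked; a small sketch (main.cpp in this PR deliberately ignores the return value, since 0 simply means the default ggml encoder will be used):

    if (!whisper_ctx_init_openvino_encoder(ctx, nullptr, params.openvino_encode_device.c_str(), nullptr)) {
        fprintf(stderr, "info: OpenVINO encoder not in use; falling back to the default ggml encoder\n");
    }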

How to generate models and enable OpenVINO for whisper.cpp builds

Here are the instructions for generating the OpenVINO models for use with OpenVINO-enabled builds of whisper.cpp:

  • First, set up a Python virtual environment and install the Python dependencies. Python 3.10 is recommended.

    Windows:

    cd models
    python -m venv openvino_conv_env
    openvino_conv_env\Scripts\activate
    python -m pip install --upgrade pip
    pip install -r openvino-conversion-requirements.txt
    

    Linux and macOS:

    cd models
    python3 -m venv openvino_conv_env
    source openvino_conv_env/bin/activate
    python -m pip install --upgrade pip
    pip install -r openvino-conversion-requirements.txt
    
  • Generate an OpenVINO encoder model. For example, to generate a base.en model, use:

    python convert-whisper-to-openvino.py --model base.en
    

    This will produce ggml-base.en-encoder-openvino.xml/.bin IR model files. It's recommended to relocate these to the same folder as the ggml models, as that is the default location that the OpenVINO extension will search at runtime.

  • Build whisper.cpp with OpenVINO support:

    Download the OpenVINO package from the release page. The recommended version to use is 2023.0.0.

    After downloading & extracting the package onto your development system, set up the required environment by sourcing the setupvars script. For example:

    Linux:

    source /path/to/l_openvino_toolkit_ubuntu22_2023.0.0.10926.b4452d56304_x86_64/setupvars.sh

    Windows (cmd):

    C:\Path\To\w_openvino_toolkit_windows_2023.0.0.10926.b4452d56304_x86_64\setupvars.bat
    

    And then build the project using cmake:

    cd build
    cmake -DWHISPER_OPENVINO=1 ..
  • Run the examples as usual. For example:

    ./main -m models/ggml-base.en.bin -f samples/jfk.wav
    
    ...
    
    whisper_ctx_init_openvino_encoder: loading OpenVINO model from 'models/ggml-base.en-encoder-openvino.xml'
    whisper_ctx_init_openvino_encoder: first run on a device may take a while ...
    whisper_openvino_init: path_model = models/ggml-base.en-encoder-openvino.xml, device = CPU, cache_dir = models/ggml-base.en-encoder-openvino-cache
    whisper_ctx_init_openvino_encoder: OpenVINO model loaded
    
    system_info: n_threads = 4 / 8 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | COREML = 0 | OPENVINO = 1 |
    
    ...

    The first run on an OpenVINO device is slow, since the OpenVINO framework will compile the IR (Intermediate Representation) model into a device-specific 'blob'. This device-specific blob gets cached for subsequent runs.

You can use the -oved [DEVICE] argument to main to specify the OpenVINO device to offload encoder inference to. For example:

main -m ggml-base.bin -f gb1.wav -oved GPU
RyanMetcalfeInt8 openvino: use OpenVINO encoder inference
c3528936
RyanMetcalfeInt8 openvino: add python script for OpenVINO model generation
93b8be46
RyanMetcalfeInt8 whisper: Fix 'unused' warnings when OpenVINO isn't enabled in build
58eae32d
ggerganov
ggerganov1 year ago👀 1

Wow - this is quite interesting. It's the first time I'm hearing about OpenVINO - I'll try to get familiar with it.
I'll need to look into the details more, but overall the implementation looks very nice. Very good PR description.

Just to point out -- something that does differ between CoreML and the new OpenVINO integration is how/when support is enabled at runtime. CoreML is enabled within the call to whisper_init_*. For OpenVINO, because we want the ability to specify a device string ("CPU", "GPU", etc.), I exposed a new API that is dedicated to initializing OpenVINO, given a ctx:

Do you think it makes sense to do the same for Core ML so that the implementations follow similar pattern?

RyanMetcalfeInt8
RyanMetcalfeInt81 year ago

@ggerganov, thanks for taking a look!

Do you think it makes sense to do the same for Core ML so that the implementations follow similar pattern?

I think that makes sense, especially if CoreML exposes parameters to control how inference is performed -- but to be honest I know very little about CoreML.

ggerganov
ggerganov requested changes on 2023-06-28
ggerganov1 year ago🚀 1

Minor changes - should be good to merge after that

whisper.h

    WHISPER_API struct whisper_state * whisper_init_state(struct whisper_context * ctx);

    // Given a context, enable use of OpenVINO for encode inference.
    // openvino_model_path: Optional path to OpenVINO encoder IR model. If set to nullptr,
    //                      the path will be generated from the ggml model path that was passed
    //                      in to whisper_init_from_file. For example, if 'path_model' was
    //                      "/path/to/ggml-base.en.bin", then OpenVINO IR model path will be
    //                      assumed to be "/path/to/ggml-base.en-encoder-openvino.xml".
    // openvino_device: OpenVINO device to run inference on ("CPU", "GPU", etc.)
    // openvino_cache_dir: Optional cache directory that can speed up init time, especially for
    //                     GPU, by caching compiled 'blobs' there.
    //                     Set to nullptr if not used.
    // Returns 1 on success. If OpenVINO is not enabled in build, this
    // simply returns 0.
    WHISPER_API int whisper_ctx_init_openvino_encoder(struct whisper_context* ctx,
        const char* openvino_model_path,
        const char* openvino_device,
        const char* openvino_cache_dir);
ggerganov1 year ago
Suggested change
    // Given a context, enable use of OpenVINO for encode inference.
    // openvino_model_path: Optional path to OpenVINO encoder IR model. If set to nullptr,
    //                      the path will be generated from the ggml model path that was passed
    //                      in to whisper_init_from_file. For example, if 'path_model' was
    //                      "/path/to/ggml-base.en.bin", then OpenVINO IR model path will be
    //                      assumed to be "/path/to/ggml-base.en-encoder-openvino.xml".
    // openvino_device: OpenVINO device to run inference on ("CPU", "GPU", etc.)
    // openvino_cache_dir: Optional cache directory that can speed up init time, especially for
    //                     GPU, by caching compiled 'blobs' there.
    //                     Set to nullptr if not used.
    // Returns 1 on success. If OpenVINO is not enabled in build, this
    // simply returns 0.
    WHISPER_API int whisper_ctx_init_openvino_encoder(struct whisper_context* ctx,
        const char* openvino_model_path,
        const char* openvino_device,
        const char* openvino_cache_dir);

    // Given a context, enable use of OpenVINO for encode inference.
    // model_path: Optional path to OpenVINO encoder IR model. If set to nullptr,
    //             the path will be generated from the ggml model path that was passed
    //             in to whisper_init_from_file. For example, if 'path_model' was
    //             "/path/to/ggml-base.en.bin", then OpenVINO IR model path will be
    //             assumed to be "/path/to/ggml-base.en-encoder-openvino.xml".
    // device: OpenVINO device to run inference on ("CPU", "GPU", etc.)
    // cache_dir: Optional cache directory that can speed up init time, especially for
    //            GPU, by caching compiled 'blobs' there.
    //            Set to nullptr if not used.
    // Returns 1 on success. If OpenVINO is not enabled in build, this
    // simply returns 0.
    WHISPER_API int whisper_ctx_init_openvino_encoder(
        struct whisper_context * ctx,
        const char * model_path,
        const char * device,
        const char * cache_dir);
whisper.cpp

    #ifdef WHISPER_USE_OPENVINO
    // replace .bin with -encoder-openvino.xml
    static std::string whisper_get_openvino_path_encoder(std::string path_bin) {
ggerganov1 year ago
Suggested change
    static std::string whisper_get_openvino_path_encoder(std::string path_bin) {
    static std::string whisper_get_openvino_path_encoder(const std::string & path_bin) {
whisper.cpp

        return path_bin;
    }

    static std::string whisper_get_openvino_path_cache(std::string path_bin) {
ggerganov1 year ago
Suggested change
    static std::string whisper_get_openvino_path_cache(std::string path_bin) {
    static std::string whisper_get_openvino_path_cache(const std::string & path_bin) {
whisper.cpp

    }
    #endif
    #ifdef WHISPER_USE_OPENVINO
    else if(use_openvino) {
ggerganov1 year ago
Suggested change
    else if(use_openvino) {
    else if (use_openvino) {
CMakeLists.txt

        ${GGML_OPENCL_SOURCES}
        whisper.h
        whisper.cpp
        ${OpenVINO_SOURCES}
ggerganov1 year ago

Use OPENVINO_SOURCES

However, why not make a separate target whisper.openvino similar to how whisper.coreml works?

RyanMetcalfeInt81 year ago (edited 1 year ago)

Let me try it again. I had originally tried to add it as a separate target and ran into some weird issues (something like the corresponding .lib not being generated in the Windows build) -- I intended to circle back though, so thanks for the reminder.

RyanMetcalfeInt81 year ago

okay, see latest commit (76c4186)

I added the openvino encoder to a dedicated OBJECT target:

add_library(${TARGET} OBJECT
        openvino/whisper-openvino-encoder.h
        openvino/whisper-openvino-encoder.cpp
        )

And this target is linked to whisper just like coreml:

if (WHISPER_OPENVINO)
    target_link_libraries(${TARGET} PRIVATE whisper.openvino)
endif()

I was thinking of making it SHARED, but I think it'd be more of a hassle to have to carry around a separate .dll / .so.

This builds fine, and I did some minimal testing on Windows 11 & Ubuntu.

RyanMetcalfeInt8 Apply suggestions from code review
4bc1ebcd
RyanMetcalfeInt8 whisper: Fix compilation error
6bfa3711
ggerganov
ggerganov commented on 2023-06-28
whisper.cpp

        openvino_path_cache += "-encoder-openvino-cache";

    -    return path_bin;
    +    return openvino_path_cache;
ggerganov1 year ago (edited 1 year ago)

Ah sorry, I didn't realize the argument is mutated in the functions.
The original std::string path_bin is better

RyanMetcalfeInt81 year ago

no worries, I can revert it.

RyanMetcalfeInt81 year ago

ok, reverted.

RyanMetcalfeInt8 whisper: revert whisper_get_openvino_path_encoder & whisper_get_openv…
df77368f
RyanMetcalfeInt8 cmake: Add openvino-encoder as separate object target
76c41863
RyanMetcalfeInt8 requested a review from ggerganov 1 year ago
ggerganov whisper : minor style fixes
bc5746e8
ggerganov Merge branch 'master' into openvino_integration
0ed471c3
ggerganov minor : indentation fixes
df982879
ggerganov
ggerganov approved these changes on 2023-07-04
ggerganov1 year ago🎉 2

Great stuff 👍

ggerganov merged 62b81276 into master 1 year ago
Nabaralathep
Nabaralathep1 year ago

Hi!
In the OpenVINO instructions there is the following step:

    cd build
    cmake -DWHISPER_OPENVINO=1 ..

Where is that "build" dir?

And when I run:

    ./main -m models/ggml-base.en.bin -f samples/jfk.wav

I don't see "OPENVINO = 1" or any other info about OpenVINO being loaded.

All the other instructions were executed successfully.
What is missing?

Distro info: I am running Parrot OS 5.3.

Amazing work, and thanks for sharing.

RyanMetcalfeInt8
RyanMetcalfeInt81 year ago (edited 1 year ago)❤ 2

Hi @Nabaralathep,

Looks like I forgot the mkdir build, so it should be:

mkdir build
cd build
cmake -DWHISPER_OPENVINO=1 ..
make

Let me know how it goes.

Nabaralathep
Nabaralathep1 year ago

Hi @RyanMetcalfeInt8,
Thank you very much for your help! You pulled me out of a hole, but... I had some issues that I would like to share to help someone else get out of the hole as well.

1. When I ran cmake -DWHISPER_OPENVINO=1 .., the build files were created in the parent folder and not in build, maybe because of my cmake version (3.18.4). I solved this with cmake -DWHISPER_OPENVINO=1 .. -B build, all of this from within the "openvino_conv_env" folder.

2. When I ran make I received an error; it turned out that I had the Debian ARM version and my computer is x86_64, but when I went to the repository to download the appropriate one, I discovered that all the packages for Debian are ARM -- so now what?
So this is a dead end, and I'm going to install the pink Windows called Ubuntu.

In any case, I really appreciate (you don't know how much) your answer, since at least it made me understand the problem. Thank you very much, and your work is incredible.

tazz4843
tazz48431 year ago (edited 1 year ago)

Does this implementation of OpenVINO support the GNA in 10th to 14th generation Intel CPUs? Intel advertises it as follows:

Intel® Gaussian & Neural Accelerator is a low-power neural coprocessor for continuous inference at the edge.

When power and performance are critical, the Intel® Gaussian & Neural Accelerator (Intel® GNA) provides power-efficient, always-on support. Intel® GNA is designed to deliver AI speech and audio applications such as neural noise cancellation, while simultaneously freeing up CPU resources for overall system performance and responsiveness.

They also later mention it could be used for tasks such as speech-to-text, and I'm curious if/how well whisper would perform on it.

Setting the OpenVINO device to "GNA" just throws an error with an assertion failure:

whisper_ctx_init_openvino_encoder: loading OpenVINO model from '../../models/ggml-base-encoder-openvino.xml'
whisper_ctx_init_openvino_encoder: first run on a device may take a while ...
whisper_openvino_init: path_model = ../../models/ggml-base-encoder-openvino.xml, device = GNA, cache_dir = ../../models/ggml-base-encoder-openvino-cache
in openvino encoder compile routine: exception: Check 'false' failed at src/inference/src/core.cpp:114:
[ GENERAL_ERROR ]  AssertionFailed: split_sizes.size() > 1

whisper_ctx_init_openvino_encoder: failed to init OpenVINO encoder from '../../models/ggml-base-encoder-openvino.xml'
ilya-lavrenov
ilya-lavrenov1 year ago (edited 1 year ago)

2. When I ran make I received an error; it turned out that I had the Debian ARM version and my computer is x86_64, but when I went to the repository to download the appropriate one, I discovered that all the packages for Debian are ARM -- so now what?

OpenVINO Ubuntu packages are compatible with Debian OS. You can use OpenVINO archives as well as install via apt and Debian packages.

ilya-lavrenov
ilya-lavrenov commented on 2023-10-26
models/convert-whisper-to-openvino.py

        onnx_path,
        input_names=["mel"],
        output_names=["output_features"]
    )
ilya-lavrenov1 year ago

It's not required to export to ONNX before using the model with OpenVINO. You can use convert_model with a PyTorch in-memory object: https://docs.openvino.ai/2023.1/openvino_docs_OV_Converter_UG_prepare_model_convert_model_Convert_Model_From_PyTorch.html

models/openvino-conversion-requirements.txt

    openvino-dev[pytorch,onnx]
ilya-lavrenov1 year ago

We can use openvino>=2023.1.0, which contains an updated version of convert_model directly in the main openvino pip package, while openvino-dev is actually deprecated.

openvino/whisper-openvino-encoder.cpp

    std::shared_ptr<ov::Model> model = core.read_model(path_model);

    // Produce a compiled-model object, given the device ("CPU", "GPU", etc.)
    auto compiledModel = core.compile_model(model, device);
ilya-lavrenov1 year ago

You can pass path_model directly to compile_model, which can speed up loading when ov::cache_dir is enabled. See https://docs.openvino.ai/2023.1/openvino_docs_OV_UG_Model_caching_overview.html#make-it-even-faster-use-compile-model-modelpath
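
A minimal sketch of that suggestion (assuming the OpenVINO 2023.x C++ API; the function name here is illustrative, not the PR's actual code):

    #include <openvino/openvino.hpp>

    // Compiling directly from the IR path lets OpenVINO load a cached blob
    // (when ov::cache_dir is set) without reading and deserializing the ov::Model first.
    static ov::CompiledModel compile_encoder(const std::string & path_model,
                                             const std::string & device,
                                             const std::string & cache_dir) {
        ov::Core core;
        if (!cache_dir.empty()) {
            core.set_property(ov::cache_dir(cache_dir));
        }
        return core.compile_model(path_model, device);
    }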

pukerpanda1 year ago

Any practical speedup from this change?

I'm on OpenVINO 2022.3.1 for a device that has been EOL'ed. I can compile master and run it with the cache:

whisper_openvino_init: path_model = models/ggml-base.en-encoder-openvino.xml, device = MYRIAD, cache_dir = models/ggml-base.en-encoder-openvino-cache

The speed is on par with CPU/GPU OpenVINO, and it helps the RPi run inference on the base model.

RyanMetcalfeInt81 year ago (edited 1 year ago)

Probably some yes, but the speedup will be during initialization (i.e. the time it takes to pull the model / cached blob from disk and prep the execution device).

RyanMetcalfeInt8
RyanMetcalfeInt81 year ago

@ilya-lavrenov -- good suggestions, looks like OpenVINO made some nice improvements for 2023.1+. Did you want to submit a PR with the updates / fixes?
