Hi, thank you, your code saved my day! I think line 535 needs to be modified a bit:

```python
prompt_tensor = (
    torch.tensor(generate_kwargs["prompt_ids"], dtype=out["tokens"].dtype).cuda()
    if is_torch_cuda_available()
    else torch.tensor(generate_kwargs["prompt_ids"], dtype=out["tokens"].dtype)
)
```

and the `is_torch_cuda_available` import needs to be added to line 22. Without CUDA it runs on the CPU, which is a lot slower.
@kaminwong, this is just to modify the output sequence to avoid showing the `initial_prompt` in the transcription.
The actual generation handles the device in the lines below:
```python
tokens = self.model.generate(
    attention_mask=attention_mask,
    **generate_kwargs,
)
```
Apart from this, the token decoding part is a serialized implementation, so moving it to the GPU has no effect and would just be a misuse of the GPU.
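For reference, a device-agnostic way to build that comparison tensor (a minimal sketch, not the exact code in the PR) is to create it directly on the device of the generated tokens:

```python
import torch

# Hypothetical sketch: build the prompt tensor on whatever device the generated
# tokens already live on, so the comparison works identically on CPU and GPU.
prompt_tensor = torch.tensor(
    generate_kwargs["prompt_ids"],
    dtype=out["tokens"].dtype,
    device=out["tokens"].device,
)
```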
Thanks for the reply! But if I don't make those changes I get the following error, so I assume `prompt_tensor` needs to be on CUDA if the model is also on CUDA? Or is there another way to correct the error? Thank you for your time.
File "/.../python3.10/site-packages/transformers/pipelines/automatic_speech_recognition.py", line 538, in _forward if (tmp_tokens[0:nprompt_token] == prompt_tensor).sum() == nprompt_token: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
I followed the code you posted:
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32
model_id = "openai/whisper-small"
model = AutoModelForSpeechSeq2Seq.from_pretrained(
model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)
processor = AutoProcessor.from_pretrained(model_id)
pipe = pipeline(
"automatic-speech-recognition",
model=model,
tokenizer=processor.tokenizer,
feature_extractor=processor.feature_extractor,
max_new_tokens=128,
chunk_length_s=15,
batch_size=16,
torch_dtype=torch_dtype,
device=device,
processor=processor
)
@kaminwong, thank you for flagging this. I understood the issue; let me verify and resolve it.
@kaminwong, you can pull the latest commit and install it; it should work now. It's fixed.
Thank you for the elegant solution. It works now!
Gentle ping @sanchit-gandhi for review
@amyeroberts is there any plan to close this in the near future, or will it take time?
@Biswajit2902 Once @sanchit-gandhi has reviewed and approved, the PR will need a final review from a maintainer. Once approved, then the PR can be merged in.
Hey @Biswajit2902 - thanks for working on this welcome feature! Super sorry for the late review here. Left some comments regarding the pipeline design and how we can simplify.
```python
        processor: Optional[AutoProcessor] = None,
        **kwargs,
    ):
        self.processor = processor
```
Unfortunately, we can't accept the `processor` as an attribute of the pipeline, for reasons mentioned here.
```python
        # Added initial prompt for whisper
        if "initial_prompt" in generate_kwargs:
            initial_prompt = generate_kwargs.pop("initial_prompt")
            generate_kwargs["prompt_ids"] = self.processor.get_prompt_ids(initial_prompt)
```
By design, we can get the same behaviour using the tokenizer method `get_prompt_ids`:

```python
generate_kwargs["prompt_ids"] = self.tokenizer.get_prompt_ids(initial_prompt)
```

Let's simplify this logic to only ever have the feature extractor + tokenizer, and always rely on `tokenizer.get_prompt_ids` to convert the prompt.
```python
        else:
            out = {"tokens": tokens}

        if "prompt_ids" in generate_kwargs:
```
This should be a post-processing step, rather than in `_forward`. Could you move it to the `postprocess` method please?
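For illustration, the stripping done as a post-processing step could look roughly like the helper below (a hypothetical sketch, not the PR's code; the name `strip_prompt_tokens` and the `model_outputs` structure are assumptions):

```python
from typing import Any, Dict, List

def strip_prompt_tokens(model_outputs: List[Dict[str, Any]], prompt_ids: List[int]) -> List[Dict[str, Any]]:
    """Drop leading prompt tokens from each chunk's token sequence (hypothetical helper)."""
    n = len(prompt_ids)
    for output in model_outputs:
        tokens = list(output["tokens"])
        # Only strip if the sequence actually starts with the prompt ids.
        if tokens[:n] == list(prompt_ids):
            output["tokens"] = tokens[n:]
    return model_outputs
```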
@sanchit-gandhi, thank you so much for the review. I will work on your comments.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
@Biswajit2902 any new updates? let me know if you need help
@thomasmol I will update on this soon. I have been busy for the last two weeks. Thank you for the reminder.
@thomasmol @sanchit-gandhi, I see the conflict below in `AutomaticSpeechRecognitionPipeline._sanitize_parameters`:
```
<<<<<<< main
            forward_params["generate_kwargs"]["max_new_tokens"] = max_new_tokens
            if initial_prompt is not None:
                forward_params["generate_kwargs"]["initial_prompt"] = initial_prompt
=======
            forward_params["max_new_tokens"] = max_new_tokens
>>>>>>> main
```
I want to understand why `generate_kwargs` was removed from `forward_params`, and also `initial_prompt`. My changes were working fine before, but after this there seems to be a bug. I am working on resolving it, so I need your input on this.
Hey @Biswajit2902 - you can read the motivation for this change here. Essentially, we're unifying the `forward_params` and `generate_kwargs` in `_sanitize_parameters`. However, for the purposes of your feature, you should strive to put the `initial_prompt` under `preprocess_params`:

```python
preprocess_params["initial_prompt"] = initial_prompt
```
And then convert the text prompt to token ids in the `preprocess` method, which will then be passed to `_forward`.
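A rough sketch of what that flow could look like (hypothetical; the `...` placeholders stand for the pipeline's existing logic, and the exact plumbing is up to the PR):

```python
# Hypothetical sketch of the suggested flow, not the PR's final code.

def _sanitize_parameters(self, initial_prompt=None, **kwargs):
    preprocess_params = {}
    if initial_prompt is not None:
        # Keep the raw text prompt with the preprocess params.
        preprocess_params["initial_prompt"] = initial_prompt
    ...  # existing handling of chunk_length_s, max_new_tokens, etc.
    return preprocess_params, {}, {}

def preprocess(self, inputs, initial_prompt=None, **kwargs):
    if initial_prompt is not None:
        # Convert the text prompt to token ids once, using the tokenizer.
        prompt_ids = self.tokenizer.get_prompt_ids(initial_prompt)
        ...  # attach prompt_ids to the processed inputs so _forward can pass them to generate
    ...  # existing feature extraction
```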
@sanchit-gandhi, thanks for the pointer. Sorry, I got super busy and couldn't get back to the review. Will do it soon and close it.
@sanchit-gandhi, just an update. I have made the changes for this issue as suggested, but I have identified that the output is not correct like before. It seems that generate has an issue: it's adding the initial prompt to all the chunks. I will check and update on this. Also, let me know if there is any existing issue on this that you know of.
cc @kamilakesbi as @sanchit-gandhi is off
Are there any updates on this? Or are there other ways you know of to push the model to more easily detect certain words using this pipeline?
Hey @basicblueberrry136, thanks for your comment!
@sanchit-gandhi's review still has to be addressed before the next steps. Once it's done, I'll make another review! Hopefully it'll move fast!
I believe this is very helpful when used with the serverless inference API.
It seems that the serverless inference API uses the Transformers library to run models, and we cannot pass any parameter that has a type of tensor, as shown below:
```js
const data = fs.readFileSync(filename);
const b64 = data.toString('base64');

const body = JSON.stringify({
  inputs: b64,
  parameters: {
    return_timestamps: true,
    generate_kwargs: {
      num_beams: 1,
      prompt_ids: [50362, 27338, 3763, 48022, 2257, 48022, 6784, 118, 25157, 1546, 15789, 23987, 5975, 17174, 28472, 25750, 6062, 1543],
    },
  },
});
```
It results in the following error:
```json
{
  "error": "unknown error",
  "warnings": [
    "There was an inference error: unknown error: list indices must be integers or slices, not NoneType"
  ]
}
```
If `initial_prompt` is added, we can pass the prompt as a string to the serverless inference API.
Hi, thanks for your work! Are there any updates on this?
What does this PR do?

Fixes # (feature)

- `initial_prompt` support for the whisper Pipeline (automatic-speech-recognition)

Before submitting

- `processor` considered as an optional parameter

Who can review?

Anyone in the community is free to review the PR once the tests have passed. @sanchit-gandhi, @Narsil, can anyone help take this PR forward please? Let me know if anything is needed.

fixes #27317