onnxruntime
1001ec93 - Ryanunderhill/beamscorer gpu (#16272)

Commit
2 years ago
### Description

Make BeamScorer run on the GPU instead of the CPU.

Brief overview:

* Adds a CUDA 'CudaBeamSearchScorer' implementation of IBeamScorer.
* Instead of a 'done' flag per beam, there is a single 'not done' variable that is copied to the CPU every iteration.
* Removes some of the extra CPU-side buffers and parameters that are no longer needed.

Remaining future optimizations:

* CPU-copied beam indices are still used in the non-DecoderMaskedSelfAttention case. An extra kernel can be written so that PickGptPasteState (called from UpdateGptFeeds) no longer needs CPU-copied beam indices.

### Motivation and Context

It's faster to keep the work on the GPU, since that avoids GPU->CPU->GPU copies of data.
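The single 'not done' variable can be illustrated with a small host-only C++ sketch. This is not the code from this change: `DeviceScorerState`, `MakeState`, `MarkBatchDone`, `CopyNotDoneToHost`, and `SearchFinished` are hypothetical names, a plain struct stands in for GPU memory, and the sketch assumes the variable is a counter of unfinished batch entries. It only shows why the host needs to read one integer per iteration rather than an array of flags.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for scorer state that would live in GPU memory.
struct DeviceScorerState {
    int not_done_count;                   // entries still searching
    std::vector<std::uint8_t> batch_done; // per-entry flags, kept "on device"
};

// Every batch entry starts out unfinished.
DeviceScorerState MakeState(std::size_t batch_size) {
    return DeviceScorerState{static_cast<int>(batch_size),
                             std::vector<std::uint8_t>(batch_size, 0)};
}

// Models what a scoring kernel would do when an entry finishes:
// set its flag once and decrement the shared counter.
void MarkBatchDone(DeviceScorerState& state, std::size_t batch) {
    assert(batch < state.batch_done.size());
    if (!state.batch_done[batch]) {
        state.batch_done[batch] = 1;
        --state.not_done_count;
    }
}

// Models the only device-to-host transfer per iteration: one integer.
int CopyNotDoneToHost(const DeviceScorerState& state) {
    return state.not_done_count;
}

// Host-side loop condition: stop once nothing is left unfinished.
bool SearchFinished(const DeviceScorerState& state) {
    return CopyNotDoneToHost(state) == 0;
}
```

In a real CUDA implementation the counter would sit in device or pinned memory and be read back with a small copy each iteration; the per-entry flags would never leave the GPU.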