Precompute flash attention padding info (#880)
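Precomputing the flash attention padding metadata once per forward pass avoids re-deriving it inside every attention layer. As a minimal sketch of what such precomputation can look like (assuming an HF-style `attention_mask` of shape `(batch, seq_len)`; the helper name and return layout here are illustrative, not necessarily the exact code in this PR):

```python
import torch
import torch.nn.functional as F

def gen_flash_attn_padding_info(attention_mask: torch.Tensor):
    """Illustrative helper: derive flash-attn varlen metadata from a
    (batch, seq_len) mask where 1 marks real tokens and 0 marks padding."""
    seqlens = attention_mask.sum(dim=-1, dtype=torch.int32)   # tokens per sequence
    max_seqlen = int(seqlens.max())                           # longest unpadded sequence
    # Cumulative sequence lengths, shape (batch + 1,), starting at 0,
    # in the form flash-attn's varlen kernels expect.
    cu_seqlens = F.pad(torch.cumsum(seqlens, dim=0, dtype=torch.int32), (1, 0))
    # Flat indices of the non-pad tokens, used to unpad/re-pad activations.
    indices = torch.nonzero(attention_mask.flatten(), as_tuple=False).flatten()
    return {
        'indices': indices,
        'cu_seqlens': cu_seqlens,
        'max_seqlen': max_seqlen,
    }
```

With this shape, the model's forward can compute the padding info once and pass it down to each attention block, rather than each block recomputing it from the mask.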
* Update llmfoundry/models/mpt/modeling_mpt.py
Co-authored-by: Vitaliy Chiley <6439018+vchiley@users.noreply.github.com>
* dummy data
* undoing last commit
* Update llmfoundry/models/mpt/modeling_mpt.py
Co-authored-by: Vitaliy Chiley <6439018+vchiley@users.noreply.github.com>
---------
Co-authored-by: Vitaliy Chiley <6439018+vchiley@users.noreply.github.com>