Remove repetitive devices in load_state_dict() (#1321)
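
Previously, `devices` in `load_state_dict()` was a list with one entry per device-map entry, so it contained many duplicates. This change builds it from a set instead, so each device appears only once.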

This significantly speeds up safetensors loading when the device map is long: the safetensors loop loads the weights for each entry in `devices`, so every duplicate entry caused the same weights to be loaded again.
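
To illustrate the effect, here is a minimal Python sketch. The loop is a simplified, hypothetical model of the per-device loading described above, not the actual `load_state_dict()` code; `count_loads`, the 32-layer device map and the weight names are made up for illustration.

```python
from collections import Counter


def count_loads(device_map, devices):
    """Hypothetical model: count how often each weight is read when looping over `devices`."""
    # Group weights by the device they are mapped to (one weight per module here).
    device_weights = {}
    for module_name, device in device_map.items():
        device_weights.setdefault(device, []).append(f"{module_name}.weight")

    loads = Counter()
    for device in devices:
        # A real loader would read these tensors onto `device`; we only count reads.
        for name in device_weights[device]:
            loads[name] += 1
    return loads


# Hypothetical device map: 32 layers split across two GPUs, one module offloaded.
device_map = {f"layers.{i}": (0 if i < 16 else 1) for i in range(32)}
device_map["lm_head"] = "disk"

# Before: one entry per device-map entry, so devices repeat.
devices_list = [d for d in device_map.values() if d != "disk"]
# After: deduplicated with a set.
devices_set = set(device_map.values()) - {"disk"}

before = count_loads(device_map, devices_list)
after = count_loads(device_map, devices_set)
print(len(devices_list), sum(before.values()))  # 32 512
print(len(devices_set), sum(after.values()))    # 2 32
```

In this toy model each weight is read 16 times with the list and exactly once with the set, since the number of redundant reads grows with how many device-map entries share a device.
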
Co-authored-by: John Doe <john.doe@example.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>