xla
Add `_sharded_cpu_state_dict` for distributed checkpointing
#5288
Merged

jonb377 merged 46 commits into pytorch:master from yashjs_sharded_cpu_state_dict
shahyash10 initiak commit
1185c3ea
will-cromar Add test workflow for `xrt` branch (#5241)
aeb90bcf
qihqi Add function to generate stablehlo based callable from pytorch model …
e7c49219
will-cromar Only run the main CI workflow on PRs targeting master and release bra…
141514b2
cowanmeg AMP for TPUs v3 (#5161)
4a142e0b
cowanmeg remove duplicate autocast_test (#5246)
c8052632
will-cromar Remove `test_experimental_pjrt_tpu.py` from TPU CI (#5247)
2b6f2849
will-cromar Install `expecttest` in xla_test_job.yaml (#5252)
c4611a1e
mateuszlewko Add IAM roles for cloudbuild_editors (#5251)
b0fbb485
alanwaketan [Functionalization] Remove view in view_symint (#5231)
2606a309
will-cromar Delete XRT from the main branch (#5240)
1901688c
wonjoo-wj Add nightly build for cuda 12 (#5253)
37ac0495
vanbasten23 Fix the linter command in the CI (#5254)
6754db49
JackCaoG Jack cao g/fix spmd buff is null (#5256)
ec471f5d
vanbasten23 Skip calling as_strided in empty_strided_symint if the input has dyna…
a2f8a93d
will-cromar Add XRT nightly builds (#5261)
db7f8ee5
ManfeiBai [OpenXLA] Migrate to pull XLA from OpenXLA (#5202)
bf759cfe
JackCaoG Add ToString method for both PjrtData and PjrtShardedData (#5265)
8aa92dd1
JackCaoG Update Sharded graph HLO dumping (#5266)
cf3bef8c
lsy323 Enable PjRt Client Compilation with StableHLO (#5233)
00191db6
lsy323 Disable Bazel remote cache for forked PR (#5259)
31fbc332
stgpetrovic Suppress debug symbols in OpenXLA code (#5269)
3c0450a6
khatwanimohit [SPMD] Sharding n-d tensor on (n+1)-d Mesh (#5268)
82a8041a
will-cromar Make TPU detection more robust (#5271)
0d37af46
stgpetrovic Clean bazel stuff on distutils clean. (#5274)
b0a70d3d
ManfeiBai Delete unused .so file, and .lds files (#5275)
03d4f70e
qihqi Fix the error when export_torch_model is given a non-tensor (#5277)
15e32b25
JackCaoG Dsiable test_simple_model_with_different_input_shape since it is curr…
42a41a1e
qihqi Always do build_ext in python setup.py develop (#5273)
60217dba
will-cromar Remove or improve several hardcoded TPU test conditions (#5272)
4af36bac
will-cromar Add `runtime.host_index` (#5283)
a6f72731
vanbasten23 Make it an error if calling sizes() on a dynamic tensor. (#4998)
fa6ff04f
JackCaoG Fix the error where mark_step does not materalize tensors on SPMD:0 (…
97284611
wonjoo-wj Disable torch._dynamo.config.automatic_dynamic_shapes (#5285)
8c13a267
shahyash10 Merge branch 'master' of https://github.com/pytorch/xla into yashjs_s…
9e745815
shahyash10 run linter
aed264fd
jonb377 commented on 2023-07-07
shahyash10 wrap only if sharding type is non-replicated
2797df3d
shahyash10 requested a review from jonb377 2 years ago
jonb377 commented on 2023-07-10
shahyash10 Merge branch 'master' of https://github.com/pytorch/xla into yashjs_s…
34d7f9e5
shahyash10 Handle non-tensors
0842686c
shahyash10 run linter
97f697f7
shahyash10 requested a review from jonb377 2 years ago
jonb377 commented on 2023-07-10
shahyash10 Call wrap_if_sharded first
1c78d8e3
shahyash10 requested a review from jonb377 2 years ago
shahyash10 Add exception in test for unsharded tensor
1faeac36
shahyash10 Merge branch 'master' of https://github.com/pytorch/xla into yashjs_s…
7c614a2a
shahyash10 fix test
4e4d04a5
shahyash10 Use torch.Tensor instead of torch.tensor
a82e8ea2
jonb377 commented on 2023-07-12
shahyash10 use .cpu() only for tensors
89ae5684
shahyash10 requested a review from jonb377 2 years ago
jonb377 approved these changes on 2023-07-13
jonb377 merged 46a0117b into master 2 years ago
shahyash10 deleted the yashjs_sharded_cpu_state_dict branch 2 years ago