PR #5574 Cherry-pick 2.1 release branch into XRT branch through 9/14

Sharding should be per output of IR Node, instead of per IR Node (#5330)

will-cromar committed 2 years ago

Update Python device API for SPMD (#5129)

will-cromar committed 2 years ago

Check out the release branch instead of origin/master in ansible (#5344)

will-cromar committed 2 years ago

Also dump output sharding on HLO file (#5339)

will-cromar committed 2 years ago

Make all-reduce a no-op when world size is 1 (#5342)

will-cromar committed 2 years ago

add fs linker flag (#5347)

will-cromar committed 2 years ago

Add py3.10 whl path to doc, refactor whl table (#5354)

will-cromar committed 2 years ago

fix amp dtype setting for GPU (#5337)

will-cromar committed 2 years ago

Add python test for SPMD+Runtime Python API (#5349)

will-cromar committed 2 years ago

Check the actual device instead of query env var for virtual device (#5352)

will-cromar committed 2 years ago

[BE] use self.assertEquals instead of str equality in test_zero1.py (#5364)

will-cromar committed 2 years ago

Revert "[BE] use self.assertEquals instead of str equality in test_zero1.py (#5364)" (#5366)

will-cromar committed 2 years ago

[Dynamo|TPU] Tweak `atol` and `rtol` for `test_dynamo.py` (#5363)

will-cromar committed 2 years ago

[Dynamo|TPU] Skip `DynamoTrainingBasicTest.test_resnet18` on TPU (#5362)

will-cromar committed 2 years ago

Add a script for running stablehlo tests. (#5360)

will-cromar committed 2 years ago

Don't rewrite index hints in global save planning (#5348)

will-cromar committed 2 years ago

[Dynamo|TPU] Skip `DynamoInferenceBasicTest.test_resnet18` on TPU (#5361)

will-cromar committed 2 years ago

[BE] use self.assertEquals instead of str equality in test_zero1.py (#5367)

will-cromar committed 2 years ago

Fix ReplicateShardedData for int type (#5374)

will-cromar committed 2 years ago

Update dynamo.md (#5378)

will-cromar committed 2 years ago

Revert "Fix ReplicateShardedData for int type (#5374)" (#5380)

will-cromar committed 2 years ago

Remove the mention of XRT_TPU_CONFIG in the CONTRIBUTING.md (#5379)

will-cromar committed 2 years ago

[Dynamo|TPU] Tweak `atol` and `rtol` for `test_simple_model_with_different_input_shape` on TPU (#5373)

will-cromar committed 2 years ago

Rectify test_zero1.py once optim.load_state_dict doesn't guarantee immutability (#5382)

will-cromar committed 2 years ago

Add gpu doc for how to build PyTorch/XLA from source with GPU support. (#5384)

will-cromar committed 2 years ago

clear pending ir should also clear the cc op tokens (#5385)

will-cromar committed 2 years ago

Port resnet data loading optimizations to SPMD test script (#5386)

will-cromar committed 2 years ago

Add support for in-place ops with self tensors in dynamo bridge (#5309)

will-cromar committed 2 years ago

Add dynamo test in TPU CI (#5381)

will-cromar committed 2 years ago

Add manual seed in multihost checkpoint (#5392)

will-cromar committed 2 years ago

Fix change_id type in coverage uploading (#5394)

will-cromar committed 2 years ago

Update dynamo cpu fallback op to aten::_foobar (#5393)

will-cromar committed 2 years ago

Run single host multi GPU tests in the CI. (#5387)

will-cromar committed 2 years ago

[PJRT] Separate collective ops test from TPU runtime test. (#5396)

will-cromar committed 2 years ago

Fix ReplicateShardedData for int type (#5404)

will-cromar committed 2 years ago

Update the dynamo backend name to `openxla` (#5402)

will-cromar committed 2 years ago

[SPMD] Multi-host batch sharded data loading (#5331)

will-cromar committed 2 years ago

Refactor to share code between export_torch_model and save_as_stablehlo (#5388)

will-cromar committed 2 years ago

Fix TPU collective ops test for multi-host TPUs (#5408)

will-cromar committed 2 years ago

Partially replicate lower-rank tensors (#5409)

will-cromar committed 2 years ago

Revert "Partially replicate lower-rank tensors (#5409)" (#5412)

will-cromar committed 2 years ago

SPMD cross slice-replication using partial_replication sharding (#5411)

will-cromar committed 2 years ago

Fix the incorrect clone arg condition in dynamo bridge (#5414)

will-cromar committed 2 years ago

[SPMD] named partition spec support (#5415)

will-cromar committed 2 years ago

[PJRT|TPU] Update `test_xla_devices_single_process_all_chips` for expected device number (#5421)

will-cromar committed 2 years ago

Add repo for libcudnn8=8.7.0.84 and CUDA 11.8 (#5425)

will-cromar committed 2 years ago

Update fix_includes.sh (#5441)

will-cromar committed 2 years ago

[PJRT] Support `torchrun` with `pjrt://` `init_method` (#5438)

will-cromar committed 2 years ago

Bugfix + add more test for llama (#5439)

will-cromar committed 2 years ago

Move the C++ test build to CI build job instead of test job (#5442)

will-cromar committed 2 years ago

Update gcc to 10. (#5445)

will-cromar committed 2 years ago

Update the random seed for every dynamo execution (#5444)

will-cromar committed 2 years ago

Revert "Update gcc to 10. (#5445)" (#5449)

will-cromar committed 2 years ago

Install gcc-10 (#5450)

will-cromar committed 2 years ago

Revert "Install gcc-10 (#5450)" (#5452)

will-cromar committed 2 years ago

parallelize SPMD inputhandler and GetDataShards (#5447)

will-cromar committed 2 years ago

Remove base image override from TPU CI build (#5453)

will-cromar committed 2 years ago

Update to GCC 10 (#5451)

will-cromar committed 2 years ago

Cache sharded placeholder for dynamo execution (#5446)

will-cromar committed 2 years ago

Remove Docker image override from dev image (#5456)

will-cromar committed 2 years ago

hack: implement (unimplement?) GetDataShard for XRT

will-cromar committed 2 years ago

skip flaky test (#5459)

will-cromar committed 2 years ago

Neuron import hook (#5429)

will-cromar committed 2 years ago

Add missing includes (#5434)

will-cromar committed 2 years ago

[GPU] Update README.md with wheel/docker for CUDA 12.0 and deprecate CUDA 11.7 (#5443)

will-cromar committed 2 years ago

update remote cache key in ansible (#5463)

will-cromar committed 2 years ago

Fix data type in Pow with Scalar base and Tensor exponent (#5467)

will-cromar committed 2 years ago

bump the timeout for CI (#5470)

will-cromar committed 2 years ago

Fix the input sharding for dynamo (#5469)

will-cromar committed 2 years ago

Enabling sharding device data IR (#5475)

will-cromar committed 2 years ago

Introduce `torch_xla.runtime.use_spmd()` (#5474)

will-cromar committed 2 years ago

Enable PJRT C API Client and other changes for Neuron (#5428)

will-cromar committed 2 years ago

Don't move full tensor to device in deferred_init (#4819)

will-cromar committed 2 years ago

[SPMD] Fix HybridMesh ordering (#5478)

will-cromar committed 2 years ago

[SPMD] Properly skip tests on TPU V2 (#5479)

will-cromar committed 2 years ago

Add @yeounoh to .github CODEOWNERS (#5482)

will-cromar committed 2 years ago

Add Python API to execute StableHLO bytecode (#5476)

will-cromar committed 2 years ago

[SPMD] Fix TPU CI after #5478 (#5487)

will-cromar committed 2 years ago

[SPMD] Fix XLA_DUMP_POST_OPTIMIZATIONS test (#5485)

will-cromar committed 2 years ago

[Dist] Refactor ZeRO-1 (#5145)

will-cromar committed 2 years ago

Update artifacts.auto.tfvars for 2.1 release (#5483)

will-cromar committed 2 years ago

Add ShardingSpec to XLATensor when it is created with a PJRTShardedData (#5489)

will-cromar committed 2 years ago

Add topological sorting to dynamo partitions (#5472)

will-cromar committed 2 years ago

[SPMD] Patch nn.Linear (#5491)

will-cromar committed 2 years ago

[original author: mrnikwaws] Neuron operator support (#5471)

will-cromar committed 2 years ago

[SPMD] Make IR sharding custom sharding op (#5433)

will-cromar committed 2 years ago

Support input sharding changed after first dynamo tracing (#5477)

will-cromar committed 2 years ago

Always use ExecuteReplicated with SPMD (#5494)

will-cromar committed 2 years ago

Skip a couple tests on TPU due to precision issue (#5496)

will-cromar committed 2 years ago

Refactor stablehlo API and put them in official location. (#5493)

will-cromar committed 2 years ago

Support tuples in partition spec (#5488)

will-cromar committed 2 years ago

Add an API to explicitly init runtime (#5500)

will-cromar committed 2 years ago

Add explicit error message when tensor is on CPU for dynamo backend (#5499)

will-cromar committed 2 years ago

remove torchvision in stablehlo.py (#5501)

will-cromar committed 2 years ago

Fix tupled partition spec test on v3 (#5503)

will-cromar committed 2 years ago

Update dynamo doc (#5506)

will-cromar committed 2 years ago

Update dynamo.md (#5509)

will-cromar committed 2 years ago

Get original_traced_args as example_inputs. (#5511)

will-cromar committed 2 years ago

mark_sharding over a replicated tensor is allowed. (#5513)

will-cromar committed 2 years ago

[SPMD] Propagate replicated output (#5508)

will-cromar committed 2 years ago

xla Cherry-pick 2.1 release branch into XRT branch through 9/14 #5574 Merged
