PR #7165 [rebase] rebase fori_loop_simple_case_test

Add doc explaining how it works (#6937)

f44e1dfc

Fix profiling in benchmark script (#6934)

310f08dc

[Pallas] Integrate FlashAttention with SPMD (#6935)

9f2b82dc

Use TPU build for CPU and GPU Python tests (#6921)

b2556d6c

[Fori_loop|While_loop] Create fori_loop.md (#6942)

0417d4d5

Update XLA pin, 04/19/2024 (#6944)

2ec77062

Update jinja and sphinx versions to address the vulnerability (#6946)

9ba844a4

Make `nms` fallback by default. (#6933)

b06c9c77

revert expand test with dynamo (#6950)

62a2b11c

Lower embedding bag forward only (#6951)

46919a47

Cleanup the code example in the torch_xla2 README. (#6939)

6fd448dc

[torch_xla2] Simplify developer setup steps (#6905)

89efd178

Create rc13 trigger (#6956)

fa090a24

update torch deps to 2.3 (#6959)

b5574d83

update rc14 (#6962)

69eeace3

[Doc] Update Pallas user guide (#6961)

a7749fa1

Add final 2.3 trigger (#6963)

5369e7d6

Temporarily ignore torch commit in CI test (#6964)

76f7dd06

[Doc] Improve docker instructions (#6969)

af74c349

Enable PagedAttention through Pallas (#6912)

6ed20260

Update readme for 2.3 release (#6967)

0a204a6b

Update GPU readme (#6968)

abe090ad

Write `torch_xla.version.__torch_gitrev__` to file directly (#6966)

0054ec08

fix pytorch CI after pin update, change test to use assertLessEqual (…

2a204e9b

Update Openxla-pin to 04/24 (#6975)

4b481349

Move test_grad_checkpoint.py to tpu test list (#6976)

2bf59e0c

Revert "Update Openxla-pin to 04/24" (#6980)

023e2c83

Update CODEOWNERS for build infrastructure (#6953)

3f5ff0f5

Move `.torch_pin` and handle in ansible (#6920)

b3be775a

Update dynamo test to be less constrained (#6981)

b9a9449f

Build CPP tests in new CI workflow (#6947)

b834e499

Run TPU CI when label is on PR (#6984)

174f4077

Add readme to call a model (lost due to merge conflicts) (#6986)

6443e593

Fix permission issues during CI checkout (#6985)

6d01bb6e

[Revert Revert] Update OpenXLA-pin update to Apr25 (#6982)

971ebe1f

Update test_export_fx_passes.py (#6972)

75278161

Change name of CI documentation (#6994)

7cc78a68

Rework docs push (#6954)

c91171d5

`sudo rm` leftover files in GHA (#6995)

0e032b17

Fixes to dynamic_shapes args in test_unbounded_dynamism.py (#6999)

73b915b5

Manually push to `gh-pages` branch (#6996)

d25f4752

Re-land: dynamo expand test with view-replay. (#6958)

42db7096

Move the nightly whl instruction out of the hide area (#7000)

87329ce0

Don't fail docs push if there's nothing to commit (#7001)

5f75290e

Complain when TensorFlow is installed (#7004)

4a5e238e

Clean up workspace before test (#7005)

b8f8fa9e

Tag CI build with git hash (#7003)

77bbf7f3

fix addbmm opinfo (#6993)

2399e10f

Fix more opinfo tests (#7008)

2907ab30

Fix q dtype in paged attention kernel (#7011)

865836ad

Build upstream CI image on push to master (#6952)

4883f6fe

[Pallas] Support segment ids in flash attention (#6943)

400bd0c9

Support pin pr number in new .torch_pin (#6998)

cbbefa2c

Expose python unsafe buffer pointer (#7006)

9d84df2a

Fix torch.full scalar type (#7010)

0a54b2b1

Update doc push workflow guide and requirement (#7002)

93ce0541

Update XLA/JAX/libtpu pins to 2024/05/02 (#7020)

213b72b9

Support export custom op to stablehlo custom call (#7017)

666eccb4

Add CI overview to `ci.md` (#7015)

aad6a12a

Update non_xla attention to properly support paged_attention dynamo c…

2bce3f83

pass TPU_ML_PLATFORM_VERSION env to libtpu (#7021)

d1235858

Handle multiple inplace update input output aliasing (#7023)

e3fc0331

Add a missing case for _unsafe_buffer_pointer tests. (#7026)

a0063724

Move op dispatching logic into an `Environment` class; and use Mode t…

825ba0da

[Pallas] Support FA sm_scale (#7035)

1c31cde0

Remove date tag for dev images (#7036)

b543dc0e

[Pallas] Improve FlashAttention segment_ids test case (#7034)

887d3446

[Pallas] Improve segment_ids API UX (#7037)

5bbe5c82

add a metric to track persistent cache loading time (#7039)

c1b745e5

Adding megablox gmm standalone (#6940)

40f7e1f5

[FSDPv2] Support MultiSlice (#7044)

6f0b61e5

[FSDPv2] Fix test_fsdp_v2_multi_slice (#7055)

4a1588c3

[Pallas] Add test_pallas_spmd.py to tpu ci (#7045)

5cb473a2

Add option to export FX Node metadata to StableHLO (#7046)

6f392ccc

Pin update 20240513 (#7052)

2b28ae29

Add example to support pytorch lightning; misc bug fixes (#7054)

b64d8a2a

Add simple example for how to use torch_xla (#7048)

ae63cd1e

Update runner and runner-container-hooks versions (#7058)

f26c35c2

Support megacore_mode in paged_attention (#7060)

cbb9e213

reenable disabled pt2e test (#7059)

df0d147e

add amp example (#7062)

1fa1f858

Update TPU CI debugging tips (#7066)

a8eae0d9

add DDP with SPMD example (#7063)

c6074abd

Support torch_xla2 benchmarking using torchbench (#7013)

9e189350

[benchmarks] Fix AMP setup for torchbench models. (#7067)

aeed89eb

format the output model input (#6869)

68daf61f

Map jnp.int4 to torch.int8 (#7071)

a6ee8a50

Add a unit test for MoE layer. (#7069)

8247aec1

Add VSCode devcontainer instructions to CONTRIBUTING.md (#7072)

56c2368c

Add `xla.step` context manager (#7068)

3c59087e

[benchmarks] Add default value to `move_to_device`. (#7080)

8990f1b9

Fix overflow for `div` arguments. (#7081)

a2540acb

Fix usage of extract_jax (#7075)

e0d5a49a

Add example for a decoder only model (#7082)

e0fb8782

add example for fsdp (#7061)

961c22ae

Implement `ComputationClient::GetMemoryInfo` (#7086)

5409cd5b

Dump HLO HBM usage info (#7085)

206f1b7f

Add data-type promotion to `gelu_backward`. (#7090)

8d35eb05

add missing aten op (#7078)

5e1d454e

Add dlpack support (#7025)

60238557

Add torch_xla2 `export_program_to_stablehlo` API with unbounded dynam…

baf08aea

Add FSDPv2 example for the decoder only model (#7088)

f336317e

Update spmd doc (#7096)

8a1ada88

Add examples for how to benchmark a PyTorch/XLA model (#7089)

0ce06eca

reorganize the example dir (#7097)

c294625d

[Pallas] Refactor the gmm kernel (#7099)

5327033b

add example for flash attention (#7098)

8ae4c769

chore(doc): fix typos in FSDPv2 doc (#7104)

7350b702

Add data-type promotion to `stack`. (#7091)

a299f337

fix jax dependency bug (#7105)

cb8533be

[Pallas] Introduce _make_group_metadata (#7107)

22e912ea

implement Repeat with fixed output shape (#7114)

3369bf7d

[Pallas] Support _histogram (#7115)

cb805837

Update XLA pin to 2024/05/24 and fix Hermetic Python integration (#7110)

7d31f7de

[Pallas] Make gmm functional (#7117)

a9b4fadf

Only use remote cache for main repository, not forks (#7112)

1a8c2fe2

[Pallas] Make gmm output a tensor (#7120)

65b5ace8

[Pallas] Set a better tiling for gmm (#7119)

fd4900ce

`index`: fix index of 0-element tensor by 0-element tensor. (#7113)

be3b08e6

Use remote cache for `push` events too (#7124)

7770a494

remove torch_xla2 jax dependency in native install (#7126)

ed90be1e

[MoE] Test sorting lhs for gmm (#7121)

8531d1c5

[Pallas] Make gmm support bf16 (#7133)

fb373129

[FSDPv2] Shard on the maximal dim of weights (#7134)

15fc0f1c

Update XLA pin as of 20240528 (#7131)

6f406b77

update example dir's README (#7136)

c7bbdfbb

`upsample_bilinear`: fix output data-type. (#7111)

468a5c93

Add function for retrieving fallback operations. (#7116)

6d271232

Facebook->Meta in README (#7141)

0eded8da

Update contribute to include bdist_wheel (#7135)

c367b66f

[Pallas] Support tgmm (#7137)

50d81b3e

Revert "`upsample_bilinear`: fix output data-type." (#7142)

f1141980

Add Python 3.11 version for release build (#7143)

bd70d3f0

add test for onehot to make sure no fallback (#7127)

ad73e070

Disable kokoro and Update without key (#7106)

0ca090c3

Remove old checkout files from GitHub workspace (#7146)

ffbbd438

[Pallas] Make repeat_with_fixed_output_size not OOM on VMEM (#7145)

ce1205e1

[Pallas] Introduce gmm_backward (#7151)

c96c95a4

Deprecate XLA_USE_BF16 (#7150)

af51f063

[Pallas] Introduce GMM(torch.autograd.Function) (#7152)

aeed61a9

Make from_dlpack handle cuda synchronization implicitly for input ten…

daada224

add PT_XLA_DEBUG_LEVEL (#7149)

8c2234ee

Add optimizer priming for dist chkpt (#6572)

8fd051f2

[Doc] Update spmd.md for doc (#7019)

cb482bca

Update configuration.yaml (#7158)

8471826a

Add a CI workflow for tests that require pytorch CUDA. (#7140)

6fadbf5d

ManfeiBai marked this pull request as ready for review 1 year ago