tensorflow-upstream
Develop upstream sync 240624 #2569 (Merged)
Author: mmakevic-amd
akuegel [XLA:GPU] Enable new mlir loop emitter by default.
38dd164b
heinsaar [xla:cpu] Add `FftThunk`
9128be09
Adam-Banas Add test case for 1D convolution
cf31ac04
Tixxx PR #13310: [NVIDIA GPU] Added a rewrite logic in gpu_windowned_einsum…
ba451bde
hsharsha PR #13866: [ROCm] Handle disabled backends for AMD case
ae6f3d49
chsigg Remove unnecessary paths from patch file.
d4ac3610
tensorflower-gardener compat: Update forward compatibility horizon to 2024-06-18
df0c5fc2
tensorflower-gardener Update GraphDef version to 1897.
cfeaaff9
bchetioui [XLA:GPU] Disable running `gpu_cub_sort_test` in debug mode to avoid …
dc765168
olegshyshkov [XLA:GPU] Add initial version of cost model for tiled hlo.
cd75b111
bchetioui [XLA:GPU] Add `bitcast` and `reshape` to the list of "passthrough" op…
c395b48c
bchetioui [XLA:GPU] Add constraints to `SymbolicTileAnalysis`.
aa29a22f
tensorflower-gardener Integrate LLVM at llvm/llvm-project@93ffe1792fd9
19b82d6a
olegshyshkov [XLA:GPU] Use priority fusion in TritonGemmAutotunerExtractor.
4b120441
klucke Stop using xla/statusor.h now that it just contains an alias for absl…
f1a240d5
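For readers unfamiliar with the repeated `xla/statusor.h` cleanups in this sync: that header had become a thin alias for Abseil's type, so callers can depend on `absl::StatusOr` directly. A minimal sketch of the pattern (the `ParsePositiveInt` helper is invented for illustration, not code from XLA):

```cpp
#include <string>

#include "absl/status/status.h"
#include "absl/status/statusor.h"

// Hypothetical parser used only to illustrate the alias removal;
// before the cleanup this return type might have been spelled
// xla::StatusOr<int> via xla/statusor.h.
absl::StatusOr<int> ParsePositiveInt(const std::string& s) {
  int value = std::stoi(s);  // assume well-formed input for this sketch
  if (value <= 0) {
    return absl::InvalidArgumentError("expected a positive integer");
  }
  return value;  // implicitly wrapped into absl::StatusOr<int>
}
```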
tdanyluk [XLA:GPU] Add initial SymbolicTileAnalysis::GetGoodTilings implementa…
44cb866c
sergachev PR #13781: [GPU] Let the on-disk kernel compilation cache grow.
5cfe3aca
tdanyluk [XLA:GPU] Support tiling Softmax example
85f91e81
golechwierowicz [XLA:GPU][NFC] Move GPU specific latency estimator to a separate file.
db5c5699
olegshyshkov [XLA:GPU] Use absl::Span instead of std::vector to pass tile sizes.
c4a89ad0
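As a hedged illustration of the `absl::Span` change described just above (the `NumTileElements` helper below is hypothetical, not the actual XLA function): a span parameter lets callers pass tile sizes from a vector, an array, or a brace-list without materializing a `std::vector` copy.

```cpp
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

#include "absl/types/span.h"

// Hypothetical helper: taking absl::Span<const int64_t> instead of
// const std::vector<int64_t>& lets callers pass any contiguous sequence.
int64_t NumTileElements(absl::Span<const int64_t> tile_sizes) {
  return std::accumulate(tile_sizes.begin(), tile_sizes.end(), int64_t{1},
                         std::multiplies<int64_t>());
}

// Usage: both calls bind to the same span-taking parameter.
//   std::vector<int64_t> v = {8, 16};
//   NumTileElements(v);
//   NumTileElements({8, 16});
```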
chsigg Move BlockedSparseToMMA pattern from Triton to XLA.
85b90521
bchetioui [XLA:GPU][NFC] Replace `bitcast`s with `reshape`s in `symbolic_tile_t…
1107c80e
mooskagh [XLA:GPU] Fall back to cuBLASlt call in autotuner when it makes sense
528cff79
tensorflower-gardener Fix bug in array type conversion util
e23a7194
pifon2a [XLA:GPU][MLIR-based emitters] Kill thread tiling for MlirColumnReduce.
65550eb4
beckerhe Temporarily disable cudnn algorithm 14 for all shapes
686e352b
tensorflower-gardener Internal change only.
5555ec62
ZixuanJiang Move `InferDotOperandSharding` from `sharding_propagation.cc` to `hlo…
40856334
tensorflower-gardener Integrate LLVM at llvm/llvm-project@52d87de7a42d
6e6641a6
tdanyluk [XLA:GPU] Experimental: Add --xla_gpu_per_fusion_autotune_cache_dir o…
221220f1
GleasonK Adopt ConvertMlirHloToHloModule instead of passing in proto in PJRT
8f9cee48
pifon2a [XLA:GPU][MLIR-based emitters] Add more tests for row reduction index…
8c8fb8fd
frgossen Re-enable tensorflow/compiler/tests:async_comp_test_gpu test
1ae6cf22
olegshyshkov [XLA:GPU] Replace TritonFusionAnalysis with SymbolicTileAnalysis in P…
bcbab524
ddunl Change default permissions for github actions workflows to satisfy se…
42860aac
tensorflower-gardener Clean up TF deps tf_to_xla_attribute_utils
042e9b85
qukhan Propagate SDPA as a StableHLO composite op instead of converting it t…
cc182302
sirakiin Support array attribute in vhlo serialization.
69214438
changhuilin Remove the deprecated PjRtClient::LookupAddressableDevice() that take…
6156204d
klucke Move StreamExecutor::MemZero processing completely into Stream and it…
84910848
pizzud [NFC]xla_compile: Report the Status in the CompilationResult even whe…
4fe124f5
ezhulenev [xla:cpu] Add support for single replica all-reduce
5fb87afc
tensorflower-gardener Reverts fee3bfc812780f9c01a4fd936f69562a6884582a
5f3eacb0
paulinesho Upstream flatbuffer utils to read big models
40ee6456
bchetioui [XLA:GPU] Redirect all Triton normalization fusions to the new generi…
487d60b0
bchetioui [XLA:GPU] Remove trailing references to deprecated `kTritonSoftmaxFus…
8f025afe
ZixuanJiang Reshard LHS and RHS to match output sharding by default to handle dot…
58bf3f5d
tensorflower-gardener Update tf_type shape to print 0 sized dimensions as 00 to avoid any p…
3ea9253b
vamsimanchala Explicitly reset tf::SavedModelBundle after done with using it. This …
a35ca23f
bchetioui [XLA:GPU] Fix OSS dependency on protobuf descriptor.
7a3fbe7b
tensorflower-gardener Remove an unused parameter
e4d0a29e
tensorflower-gardener [XLA] Add shardings for implicit operands and return values of CaseOp…
964fae6c
ddunl Use newer version of scorecards-analysis for XLA and TensorFlow
25099d18
GleasonK Remove use of deprecated op UnaryEinsum.
3eb2492b
jingpu Internal BUILD change
cb736a66
sdasgup3 Integrate StableHLO at openxla/stablehlo@f1f49945
17228941
impjdi Fix variable name.
df114d8d
tensorflower-gardener Add an option to enable shardings where a tensor dim is sharded acros…
ffede056
timpeut Set a valid minSdkVersion in dummy manifest.
c50b0dea
tensorflower-gardener Replace cpu and os based selects with platform based selects
c0e79dad
kuym [XLA:GPU] Clang-tidy cleanup for xla/service/ar_crs_combiner.{cc,h}
a1a5b8eb
kuym [XLA:GPU] Clang-tidy cleanup for xla/service/allocation_tracker.cc
ad29fd64
yashk2810 Use jax AOT APIs instead of deprecated jax.xla_computation.
ed006d09
tensorflower-gardener Integrate LLVM at llvm/llvm-project@b99d0b344001
e98b73df
pschuh Allow using ifrt_client() as a shared_ptr from PyClient.
e13d7ea9
bchetioui [XLA:GPU][ROCm] Fix missing import in `ir_emitter_triton_rocm.cc`.
c625df7f
bchetioui [XLA:GPU] Fetch `EnumDescriptor` utils from `tsl::protobuf` in `trito…
e7e72734
tensorflower-gardener Reverts c0e79dad82e082a2530e82cae7db22a5164effc8
8c71440e
tensorflower-gardener Automated Code Change
b9359c0c
akuegel Add test for broadcast of constant.
e68ffab6
klucke Stop using xla/statusor.h now that it just contains an alias for absl…
8f3c0c08
klucke Stop using xla/statusor.h now that it just contains an alias for absl…
28d7d5ab
chsigg Integrate Triton up to [6110b0b](https://github.com/openai/triton/com…
99ff62de
tensorflower-gardener compat: Update forward compatibility horizon to 2024-06-19
cfbd0f15
tensorflower-gardener Update GraphDef version to 1898.
654d9625
jreiffers We have received reports of compiler hangs which we are investigating.
51f415c7
tensorflower-gardener Disable `unary_ops_test` on Arm64 machines
ca60e393
beckerhe Make :redzone_allocator_kernel_cuda a cc_library
a2db2afa
dimitar-asenov [XLA:GPU] Two minor cleanups.
80059e48
tensorflower-gardener Automated Code Change
eaef53bb
cheshire [XLA:GPU] [NFC] Remove argument which is never passed
cdafce89
dimitar-asenov [XLA:GPU] Simplify TritonSupport tests by providing a standard ENTRY …
160e7608
cheshire [XLA] Remove proto-based communication for service/client
c72dbfc1
tensorflower-gardener Automated Code Change
b1f94db1
akuegel Refactor llvm_compiler_test.
89a47214
cheshire [XLA:GPU] [NFC] Remove redundant argument to GetKernelAnnotation
ab3720c8
jreiffers Fix ASAN error with double -> X float conversions.
7c487a38
cheshire [XLA:GPU] More consistent error handling for borrowed streams
1571f0a6
cheshire [JAX/XLA] Correct the logic for showing stack traces on JAX_TRACEBACK…
df2dae48
tensorflower-gardener Integrate LLVM at llvm/llvm-project@99c43e3ce314
32aa8bb0
bchetioui [XLA:GPU] Remove dependency from `triton_support_test.cc` on `TritonF…
542172b0
mooskagh [XLA:GPU] A test used a literal out of bounds.
353e39e0
bchetioui [XLA:GPU] Remove all SoftMax-related support from legacy Triton infra…
3685c0a6
klucke Stop using xla/statusor.h now that it just contains an alias for absl…
60e8d960
gflegar Avoid underflow in f2reduce
d87bc7ee
bchetioui [XLA:GPU][NFC] Add missing `TODO` in `SymbolicTileTest.CanPropagateTi…
8e5c47c6
olegshyshkov [XLA:GPU] Add a method to Cost Model estimate the best tiling for a f…
d1766d92
cheshire [XLA] [NFC] Unify multi-host handling for hlo_runner_main
62f235dc
tdanyluk [XLA:GPU] Support tiling "softmax and reduce" example
21cb1e0b
hawkinsp Reverts df2dae48118d675fb92cf51f42aa6abdb391cedc
4347a69f
olegshyshkov [XLA:GPU] Add num_warps to BlockLevelFusionConfig and a method to con…
1d4b49fb
ezhulenev [xla:cpu] Add support for multi-process all-reduce collective
c2caec22
lionelfeng Fix the aggregation of power metrics.
2ab9b0ab
ezhulenev [xla:cpu] Add ReplicaId thunk
b92ced39
ezhulenev [xla:cpu] Add support for ReduceScatter thunk
f84430a8
ezhulenev [xla:cpu] Add support for AllGather thunk
185483bb
tensorflower-gardener Copy definition of tflite::ControlEdges to flatbuffer_export.cc.
59d26954
klucke Stop using xla/statusor.h now that it just contains an alias for absl…
3ee9d16f
tensorflower-gardener Automated Code Change
aaebeb86
olupton PR #13603: NVTX: name threads, CUDA devices and CUDA streams
9b12cd1d
beckerhe Make GPU PJRT tests xla_tests
464f895b
alankelly XNNPack MEAN F32 supports all reduction types
36947e0d
tensorflower-gardener compat: Update forward compatibility horizon to 2024-06-20
6cf72fef
tensorflower-gardener Update GraphDef version to 1899.
8ee8ff8d
jreiffers Clean up some sentinel `-1`s.
75adcb5e
akuegel Reverts 9b12cd1d4053f8b93128eec529956ea7521fe63d
acc8b5b9
akuegel Delete flags xla_gpu_max_mlir_kernels and xla_gpu_skip_mlir_kernels
d7609726
sergachev PR #13831: [GPU] Improve dumping of GEMM fusions.
a69eff2e
Ruturaj4 PR #13340: [ROCm] Add Swizzle instruction support for mi100+ in reduc…
c72085d4
golechwierowicz [JAX] Fix FDO profile deserialization.
03850ed2
ezhulenev [xla:cpu] Don't forget to include buffer branch index buffer in condi…
5350f439
shraiysh PR #13555: Fix _xla_send_recv_validation in collective pipeliner
ed4deb88
qukhan Fix SDPA testing on different devices.
2952f336
akuegel Add unit tests for CanEmitFusedDynamicUpdateSliceInPlaceForGpu().
350ecac0
tensorflower-gardener Integrate LLVM at llvm/llvm-project@e5b0c210cc4c
4ddfa6ab
cheshire Reverts 4347a69f8985f9777fc9b92a02c86d6a5e23f737
c80f6733
cheshire [XLA] Fix up the behavior for grabbing extra streams.
2c0f46d7
klucke Stop using xla/statusor.h now that it just contains an alias for absl…
453db222
akuegel [XLA] Remove dead unused pass propagate_static_shapes
93fcc5ee
olegshyshkov [XLA:GPU] Use Cost Model to choose tile sizes in SoftmaxRewriterTriton.
f05e4339
tensorflower-gardener Only split those constants that are shared between manually and autom…
45f67b2c
klucke Move StreamExecutor::Memset32 into Stream and its derived classes.
9e0fce7d
tensorflower-gardener Prevents linspace from generating nans for F8 types.
9e43d079
gflegar Disable wgmma support in XLA, since it is causing huge compile time r…
a16c577a
cheshire [XLA] [NFC] Use a single SequentialThunk to communicate a sequence of…
9024f026
tensorflower-gardener Call ShapeUtil::ByteSizeOfElements instead of a copy of the function …
546829c2
ezhulenev [xla:cpu] Don't run collective test with thunks, not all thunks are r…
af52806c
tensorflower-gardener Migrate usage of schema_conversion_utils.
a0d4b376
klucke Stop using xla/statusor.h now that it just contains an alias for absl…
4fafa4eb
sdasgup3 Support quantized per-tensor type for MHLO Ceil/Floor Ops.
70d2f87b
hyeontaek [IFRT] Add AttributeMap
2e76b081
klucke Move StreamExecutor::Memcpy processing for to-device copies completel…
61845939
cheshire [XLA:GPU] Support mocking away all collectives
0939cebb
alankelly Don't reject F32 non 4D input tensors for float. XNNPack can handle t…
1fa88496
klucke Stop using xla/statusor.h now that it just contains an alias for absl…
d7925898
janpfeifer PR #13985: Removing spurious `option go_package` from autotuning.proto
1dfff9d6
pak-laura Return absl::Status not TFLiteStatus from ::tflite::optimize::Quantiz…
444d8bdf
klucke Move StreamExecutor::Memcpy to-host processing completely to Stream a…
b1587f62
ezhulenev [xla:cpu] Add support for AllToAll thunk
f4d5a7ab
hyeontaek [IFRT] Add PjRt<->IFRT attribute map conversion utility functions
77fbee0f
klucke Stop using xla/statusor.h now that it just contains an alias for absl…
c02a0c5c
bixia1 [xla:hlo][NFC] Fix dims in a comment to match the size of reshape_dims.
1789531f
GleasonK [mhlo] Remove UnaryEinsumOp from MHLO
79ead0e9
tensorflower-gardener Update the `curl` dependency: 8.4.0 -> 8.6.0.
5f2a16c9
penpornk [xla:gpu] Rename collective_ops_test_e2e to conform with Google's tes…
f34a29f6
klucke Move StreamExecutor::MemcpyDeviceToDevice processing into Stream and …
9b2e9eeb
rocketas Delete translate directory ConvertMlirToGraphDef.
61dc6fb8
tensorflower-gardener Replace string with std::string in quantize_model_test.cc
d7913e7d
mehrdadkhani Brings back one usage of GetCorrectedUseTime() to improve code reuse
e7218705
ezhulenev [xla:cpu] Add support for CollectivePermute thunk
5fe8d891
hyeontaek [IFRT Proxy] Run client and backend tests with all supported protocol…
d5b1a3db
fergushenderson Minor cleanups to #includes etc. in sample_stable_delegate.
8c81a27f
ddunl Fix formatting in `tensorflow/python`
5483b604
hawkinsp [JAX] Teach jit fast path how to handle negative static_argnums corre…
4f2091da
tensorflower-gardener Implements the `layout` method for the BasicStringArray class.
7be686d7
zoranjovanovic-ns PR #13971: [ROCm] Fixed build break caused by https://github.com/open…
cf23a3e8
klucke Stop using xla/statusor.h now that it just contains an alias for absl…
5bcf657f
ICGog [IFRT] Move MemoryKind attribute from IfrtShardingAttrInterface to If…
bc30cae2
penpornk [xla:cpu] Add CustomCall thunk.
43b64083
hawkinsp [XLA:Python:JAX] Add a method jax_jit.parse_arguments and a class jax…
f69fafeb
ICGog [IFRT] Add ifrt.CopyArrays op to IFRT IR.
0d801cd2
akuegel Replace EXPECT_OK with TF_EXPECT_OK
6f72e32c
jreiffers Simplify div simplification.
6fd81ed5
klucke Stop using xla/statusor.h now that it just contains an alias for absl…
003e21fa
Adam-Banas [xla:cpu] Add support for convolution thunks
7fbc7481
tensorflower-gardener compat: Update forward compatibility horizon to 2024-06-21
efbe0865
tensorflower-gardener Update GraphDef version to 1900.
9d037686
Moerafaat [XLA:GPU] Add HloFindAll to hlo_traversal to find all nodes matching …
87c79a19
jreiffers Unify constant folding for affine expressions.
c3890062
mooskagh [XLA:GPU] Fix broken build
b45bc2e7
Introduce nested tuple support in FFI
8d71548c
dimitar-asenov [XLA:GPU] Remove the requirement to run on a machine with a GPU when …
e15904cc
jreiffers Fix simplification of a//b//c.
57cfb204
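For context on the `a//b//c` simplification commit above, the identity an indexing-map simplifier relies on (stated here as a general arithmetic fact, not as XLA's code) is that for non-negative `a` and positive `b`, `c`, nested floor division collapses: `(a / b) / c == a / (b * c)`. A self-contained brute-force check:

```cpp
#include <cassert>
#include <cstdint>

// Illustrative check of the nested floor-division identity
// (a / b) / c == a / (b * c) for non-negative a and positive b, c.
// This is a generic arithmetic fact, not XLA's simplifier code.
int main() {
  for (int64_t a = 0; a < 1000; ++a) {
    for (int64_t b = 1; b <= 7; ++b) {
      for (int64_t c = 1; c <= 7; ++c) {
        assert((a / b) / c == a / (b * c));
      }
    }
  }
  return 0;
}
```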
akuegel Remove unnecessary dependencies.
9aea1b91
jreiffers NFC: Replace RemoveSummands with MapSummands.
e193e941
alankelly Check input channels before delegating
d566f8de
phu0ngng PR #13856: Extend FFI DataType with FP8 Types
e73001e2
olegshyshkov [XLA:GPU] Fix integer overflow issues in Cost Model and Symbolic Tile…
f11f5e37
Cjkkkk PR #13757: [XLA:GPU] Upgrade cuDNN frontend to 1.5
6dda17ce
blakehechtman [XLA] Do not validate the operand layout constraint of LayoutConstrai…
a8c6cbd5
kuym [XLA:GPU] Clang-tidy cleanup for xla/service/ar_crs_combiner_test.cc
32f6b488
kuym [XLA:GPU] Clang-tidy cleanup for xla/service/bfloat16_conversion_fold…
5ad34022
bchetioui [XLA:GPU] Parametrize Triton support tests by data type and device type.
9f0e241c
tf-marissaw Introduce utility function GetPrevAllowedBatchSize.
a9ab5056
bchetioui [XLA:GPU] Introduce `ConstraintExpression` to hold generalized constr…
a892e211
kuym [XLA:GPU] Clang-tidy cleanup for xla/service/bfloat16_propagation_tes…
c26e492f
tensorflower-gardener Move metadata_util to tensorflow/compiler/mlir/lite/experimental/remat
abcaba2e
klucke Stop using xla/statusor.h now that it just contains an alias for absl…
ff06f4d7
seherellis [XLA] Support broadcast as a formatting op in collective pipeliner.
ba946a9d
kuym [XLA:GPU] Clang-tidy cleanup for xla/service/broadcast_canonicalizer.cc
701f78b0
kuym [XLA:GPU] Clang-tidy cleanup for xla/service/buffer_assignment.h
4203409f
tensorflower-gardener Add wrapper for building reduce-window HLO using a binary operation's…
dbc7a4ac
wenchenvincent PR #12928: [ROCM] Updated fp8 matmul with adjustments for updated hip…
5e39d5ce
chihuahua [XLA:GPU] Use radix sort in place of classic sort for TopK if input s…
be6bfff2
kuym [XLA:GPU] Clang-tidy cleanup for xla/service/call_graph.h
34a8b4c3
klucke Fix TSAN issue in interpreter Stream.
0ef3cc94
klucke Stop using xla/statusor.h now that it just contains an alias for absl…
0d541c30
tensorflower-gardener Fix a typo and add corresponding tests. The typo essentially reversed…
6688a28e
tensorflower-gardener Integrate LLVM at llvm/llvm-project@c07be08df573
525d163f
pak-laura Move `::mlir::lite::QuantizeWeights` from `TfLiteStatus` to `absl::St…
a76a162a
tensorflower-gardener Allow strategies for slice ops where the sliced dimensions could be s…
5bfe6d08
sdasgup3 Integrate StableHLO at openxla/stablehlo@61826746
eea84120
tensorflower-gardener Automated Code Change
04938cc9
tensorflower-gardener Automated Code Change
a025656d
tensorflower-gardener compat: Update forward compatibility horizon to 2024-06-22
342af9b5
tensorflower-gardener Update GraphDef version to 1901.
a2c59354
tensorflower-gardener Automated Code Change
09a5049b
tensorflower-gardener Automated Code Change
0e07c958
tensorflower-gardener Adds an option to enable / disable post-processing.
0f0409a7
fhoushmand Add GetUniqueGteInstruction to hlo_query utility file.
069e0516
tensorflower-gardener Automated Code Change
0ea7f4ae
ezhulenev [xla:cpu] Add thunk_testlib for writing tests for thunks
d608037a
ezhulenev [xla:cpu] Add WhileThunk test
bd2b781b
ezhulenev [xla:cpu] Add ReplicaId thunk test
e371313c
Adam-Banas [xla:cpu] Add extern templates for Conv2D and Conv3D.
d7602ced
mrry [Gradients] Tag constant zero tensors for outputs with no gradient wi…
eb1f2b41
tensorflower-gardener Update GraphDef version to 1902.
a4fbe1b3
tensorflower-gardener compat: Update forward compatibility horizon to 2024-06-23
efd51a9e
chihuahua [XLA:GPU] Set reduce_window_rewrite_base_length to 16 by default
841fecad
akhilgoe PR #10301: [XLA:CPU][oneDNN] Convolution XLA HLO Pattern Matcher with…
bea53c9a
tensorflower-gardener Automated Code Change
88980c0d
qukhan Add missing `const` qualifier in `tflite::Subgraph`.
24d85f37
heinsaar [xla:cpu] Add benchmark for op `gather`
4e518efe
tensorflower-gardener Update GraphDef version to 1903.
e5fc0100
tensorflower-gardener compat: Update forward compatibility horizon to 2024-06-24
ab8ac1fd
chsigg Disable Zapfhahn for tests that time out.
84587690
dimitar-asenov [XLA:GPU] Remove unused function in `triton_support_test`
94728ca5
d0k Integrate LLVM at llvm/llvm-project@e5a41f0afc15
b6744034
d0k Integrate LLVM at llvm/llvm-project@5cd0ba30f53d
347de0e0
mmakevic-amd Merge remote-tracking branch 'upstream/master' into develop-upstream-…
dc439ed0
mmakevic-amd Fix merge conflicts
f8a8e75e
hsharsha PR #14017: [ROCm] Fix Build break due to f4212dc and 0f75900
285ccc38
mmakevic-amd Re-enable fixed HLO tests
096d7f11
mmakevic-amd Enable dot_algorithm_support_test and determinism_test
e9be6470
mmakevic-amd Enable dot tests
5d5d09cc
mmakevic-amd Disable determinism_test due to https://github.com/openxla/xla/pull/1…
6a78b6c2
mmakevic-amd Disable triangular_solve_test
075b5ba4
mmakevic-amd Fix reduce_large_row_to_scalar.hlo.test
60b8a3e7
mmakevic-amd Fix failing gpu_kernel_tiling_test subtests
7b52eab3
mmakevic-amd Disable dot tests due to https://github.com/ROCm/frameworks-internal/…
150bb37e
mmakevic-amd force pushed from 15233ac2 to 150bb37e
i-chaochen requested a review from i-chaochen
i-chaochen requested a review from hsharsha
hsharsha approved these changes on 2024-07-29
hsharsha merged f1d1afdb into develop-upstream
