tensorflow-upstream
Develop upstream sync 240624 #2569 (Merged)
hsharsha merged 829 commits into develop-upstream from develop-upstream-sync-240624
[XLA:GPU] Enable new mlir loop emitter by default.
38dd164b
[xla:cpu] Add `FftThunk`
9128be09
Add test case for 1D convolution
cf31ac04
PR #13310: [NVIDIA GPU] Added a rewrite logic in gpu_windowed_einsum…
ba451bde
PR #13866: [ROCm] Handle disabled backends for AMD case
ae6f3d49
Remove unnecessary paths from patch file.
d4ac3610
compat: Update forward compatibility horizon to 2024-06-18
df0c5fc2
Update GraphDef version to 1897.
cfeaaff9
[XLA:GPU] Disable running `gpu_cub_sort_test` in debug mode to avoid …
dc765168
[XLA:GPU] Add initial version of cost model for tiled hlo.
cd75b111
[XLA:GPU] Add `bitcast` and `reshape` to the list of "passthrough" op…
c395b48c
[XLA:GPU] Add constraints to `SymbolicTileAnalysis`.
aa29a22f
Integrate LLVM at llvm/llvm-project@93ffe1792fd9
19b82d6a
[XLA:GPU] Use priority fusion in TritonGemmAutotunerExtractor.
4b120441
Stop using xla/statusor.h now that it just contains an alias for absl…
f1a240d5
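This message recurs throughout the sync: `xla/statusor.h` had become a pure alias for the absl types, so call sites switch to absl directly. A minimal sketch of the migration, assuming a hypothetical `ParseCount` function:

```cpp
// Before the cleanup, call sites included "xla/statusor.h" and wrote
// xla::StatusOr<T>; that header only aliased absl, so the change is
// mechanical: include and name the absl types directly.
#include "absl/status/status.h"
#include "absl/status/statusor.h"

// Hypothetical function illustrating the replacement.
absl::StatusOr<int> ParseCount(const char* s) {
  if (s == nullptr) return absl::InvalidArgumentError("null input");
  return 42;  // placeholder parse result
}
```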
[XLA:GPU] Add initial SymbolicTileAnalysis::GetGoodTilings implementa…
44cb866c
PR #13781: [GPU] Let the on-disk kernel compilation cache grow.
5cfe3aca
[XLA:GPU] Support tiling Softmax example
85f91e81
[XLA:GPU][NFC] Move GPU specific latency estimator to a separate file.
db5c5699
[XLA:GPU] Use absl::Span instead of std::vector to pass tile sizes.
c4a89ad0
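For context, a minimal sketch of the `absl::Span` change with a hypothetical helper: a span parameter is a read-only view, so callers can pass a `std::vector`, an array, or a braced list without copying.

```cpp
#include <cstdint>
#include <vector>
#include "absl/types/span.h"

// Hypothetical helper mirroring the change: accept a view, not a vector.
int64_t NumTileElements(absl::Span<const int64_t> tile_sizes) {
  int64_t product = 1;
  for (int64_t size : tile_sizes) product *= size;
  return product;
}

int main() {
  std::vector<int64_t> sizes = {16, 64};
  int64_t a = NumTileElements(sizes);    // implicit conversion from vector
  int64_t b = NumTileElements({8, 32});  // braced list, no heap allocation
  return (a == 1024 && b == 256) ? 0 : 1;
}
```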
Move BlockedSparseToMMA pattern from Triton to XLA.
85b90521
[XLA:GPU][NFC] Replace `bitcast`s with `reshape`s in `symbolic_tile_t…
1107c80e
[XLA:GPU] Fall back to cuBLASlt call in autotuner when it makes sense
528cff79
Fix bug in array type conversion util
e23a7194
[XLA:GPU][MLIR-based emitters] Kill thread tiling for MlirColumnReduce.
65550eb4
Temporarily disable cudnn algorithm 14 for all shapes
686e352b
Internal change only.
5555ec62
Move `InferDotOperandSharding` from `sharding_propagation.cc` to `hlo…
40856334
Integrate LLVM at llvm/llvm-project@52d87de7a42d
6e6641a6
[XLA:GPU] Experimental: Add --xla_gpu_per_fusion_autotune_cache_dir o…
221220f1
Adopt ConvertMlirHloToHloModule instead of passing in proto in PJRT
8f9cee48
[XLA:GPU][MLIR-based emitters] Add more tests for row reduction index…
8c8fb8fd
Re-enable tensorflow/compiler/tests:async_comp_test_gpu test
1ae6cf22
[XLA:GPU] Replace TritonFusionAnalysis with SymbolicTileAnalysis in P…
bcbab524
Change default permissions for github actions workflows to satisfy se…
42860aac
Clean up TF deps tf_to_xla_attribute_utils
042e9b85
Propagate SDPA as a StableHLO composite op instead of converting it t…
cc182302
Support array attribute in vhlo serialization.
69214438
Remove the deprecated PjRtClient::LookupAddressableDevice() that take…
6156204d
Move StreamExecutor::MemZero processing completely into Stream and it…
84910848
[NFC] xla_compile: Report the Status in the CompilationResult even whe…
4fe124f5
[xla:cpu] Add support for single replica all-reduce
5fb87afc
Reverts fee3bfc812780f9c01a4fd936f69562a6884582a
5f3eacb0
Upstream flatbuffer utils to read big models
40ee6456
[XLA:GPU] Redirect all Triton normalization fusions to the new generi…
487d60b0
[XLA:GPU] Remove trailing references to deprecated `kTritonSoftmaxFus…
8f025afe
Reshard LHS and RHS to match output sharding by default to handle dot…
58bf3f5d
Update tf_type shape to print 0 sized dimensions as 00 to avoid any p…
3ea9253b
Explicitly reset tf::SavedModelBundle after done with using it. This …
a35ca23f
[XLA:GPU] Fix OSS dependency on protobuf descriptor.
7a3fbe7b
Remove an unused parameter
e4d0a29e
[XLA] Add shardings for implicit operands and return values of CaseOp…
964fae6c
Use newer version of scorecards-analysis for XLA and TensorFlow
25099d18
Remove use of deprecated op UnaryEinsum.
3eb2492b
Internal BUILD change
cb736a66
Integrate StableHLO at openxla/stablehlo@f1f49945
17228941
Fix variable name.
df114d8d
Add an option to enable shardings where a tensor dim is sharded acros…
ffede056
Set a valid minSdkVersion in dummy manifest.
c50b0dea
Replace cpu and os based selects with platform based selects
c0e79dad
[XLA:GPU] Clang-tidy cleanup for xla/service/ar_crs_combiner.{cc,h}
a1a5b8eb
[XLA:GPU] Clang-tidy cleanup for xla/service/allocation_tracker.cc
ad29fd64
Use jax AOT APIs instead of deprecated jax.xla_computation.
ed006d09
Integrate LLVM at llvm/llvm-project@b99d0b344001
e98b73df
Allow using ifrt_client() as a shared_ptr from PyClient.
e13d7ea9
[XLA:GPU][ROCm] Fix missing import in `ir_emitter_triton_rocm.cc`.
c625df7f
[XLA:GPU] Fetch `EnumDescriptor` utils from `tsl::protobuf` in `trito…
e7e72734
Reverts c0e79dad82e082a2530e82cae7db22a5164effc8
8c71440e
Automated Code Change
b9359c0c
Add test for broadcast of constant.
e68ffab6
Stop using xla/statusor.h now that it just contains an alias for absl…
8f3c0c08
Stop using xla/statusor.h now that it just contains an alias for absl…
28d7d5ab
Integrate Triton up to [6110b0b](https://github.com/openai/triton/com…
99ff62de
compat: Update forward compatibility horizon to 2024-06-19
cfbd0f15
Update GraphDef version to 1898.
654d9625
We have received reports of compiler hangs which we are investigating.
51f415c7
Disable `unary_ops_test` on Arm64 machines
ca60e393
Make :redzone_allocator_kernel_cuda a cc_library
a2db2afa
[XLA:GPU] Two minor cleanups.
80059e48
Automated Code Change
eaef53bb
[XLA:GPU] [NFC] Remove argument which is never passed
cdafce89
[XLA:GPU] Simplify TritonSupport tests by providing a standard ENTRY …
160e7608
[XLA] Remove proto-based communication for service/client
c72dbfc1
Automated Code Change
b1f94db1
Refactor llvm_compiler_test.
89a47214
[XLA:GPU] [NFC] Remove redundant argument to GetKernelAnnotation
ab3720c8
Fix ASAN error with double -> X float conversions.
7c487a38
[XLA:GPU] More consistent error handling for borrowed streams
1571f0a6
[JAX/XLA] Correct the logic for showing stack traces on JAX_TRACEBACK…
df2dae48
Integrate LLVM at llvm/llvm-project@99c43e3ce314
32aa8bb0
[XLA:GPU] Remove dependency from `triton_support_test.cc` on `TritonF…
542172b0
[XLA:GPU] A test used a literal out of bounds.
353e39e0
[XLA:GPU] Remove all SoftMax-related support from legacy Triton infra…
3685c0a6
Stop using xla/statusor.h now that it just contains an alias for absl…
60e8d960
Avoid underflow in f2reduce
d87bc7ee
[XLA:GPU][NFC] Add missing `TODO` in `SymbolicTileTest.CanPropagateTi…
8e5c47c6
[XLA:GPU] Add a method to Cost Model estimate the best tiling for a f…
d1766d92
[XLA] [NFC] Unify multi-host handling for hlo_runner_main
62f235dc
[XLA:GPU] Support tiling "softmax and reduce" example
21cb1e0b
Reverts df2dae48118d675fb92cf51f42aa6abdb391cedc
4347a69f
[XLA:GPU] Add num_warps to BlockLevelFusionConfig and a method to con…
1d4b49fb
[xla:cpu] Add support for multi-process all-reduce collective
c2caec22
Fix the aggregation of power metrics.
2ab9b0ab
[xla:cpu] Add ReplicaId thunk
b92ced39
[xla:cpu] Add support for ReduceScatter thunk
f84430a8
[xla:cpu] Add support for AllGather thunk
185483bb
Copy definition of tflite::ControlEdges to flatbuffer_export.cc.
59d26954
Stop using xla/statusor.h now that it just contains an alias for absl…
3ee9d16f
Automated Code Change
aaebeb86
PR #13603: NVTX: name threads, CUDA devices and CUDA streams
9b12cd1d
Make GPU PJRT tests xla_tests
464f895b
XNNPack MEAN F32 supports all reduction types
36947e0d
compat: Update forward compatibility horizon to 2024-06-20
6cf72fef
Update GraphDef version to 1899.
8ee8ff8d
Clean up some sentinel `-1`s.
75adcb5e
Reverts 9b12cd1d4053f8b93128eec529956ea7521fe63d
acc8b5b9
Delete flags xla_gpu_max_mlir_kernels and xla_gpu_skip_mlir_kernels
d7609726
PR #13831: [GPU] Improve dumping of GEMM fusions.
a69eff2e
PR #13340: [ROCm] Add Swizzle instruction support for mi100+ in reduc…
c72085d4
[JAX] Fix FDO profile deserialization.
03850ed2
[xla:cpu] Don't forget to include buffer branch index buffer in condi…
5350f439
PR #13555: Fix _xla_send_recv_validation in collective pipeliner
ed4deb88
Fix SDPA testing on different devices.
2952f336
Add unit tests for CanEmitFusedDynamicUpdateSliceInPlaceForGpu().
350ecac0
Integrate LLVM at llvm/llvm-project@e5b0c210cc4c
4ddfa6ab
Reverts 4347a69f8985f9777fc9b92a02c86d6a5e23f737
c80f6733
[XLA] Fix up the behavior for grabbing extra streams.
2c0f46d7
Stop using xla/statusor.h now that it just contains an alias for absl…
453db222
[XLA] Remove dead unused pass propagate_static_shapes
93fcc5ee
[XLA:GPU] Use Cost Model to choose tile sizes in SoftmaxRewriterTriton.
f05e4339
Only split those constants that are shared between manually and autom…
45f67b2c
Move StreamExecutor::Memset32 into Stream and its derived classes.
9e0fce7d
Prevents linspace from generating nans for F8 types.
9e43d079
Disable wgmma support in XLA, since it is causing huge compile time r…
a16c577a
[XLA] [NFC] Use a single SequentialThunk to communicate a sequence of…
9024f026
Call ShapeUtil::ByteSizeOfElements instead of a copy of the function …
546829c2
[xla:cpu] Don't run collective test with thunks, not all thunks are r…
af52806c
Migrate usage of schema_conversion_utils.
a0d4b376
Stop using xla/statusor.h now that it just contains an alias for absl…
4fafa4eb
Support quantized per-tensor type for MHLO Ceil/Floor Ops.
70d2f87b
[IFRT] Add AttributeMap
2e76b081
Move StreamExecutor::Memcpy processing for to-device copies completel…
61845939
[XLA:GPU] Support mocking away all collectives
0939cebb
Don't reject F32 non 4D input tensors for float. XNNPack can handle t…
1fa88496
Stop using xla/statusor.h now that it just contains an alias for absl…
d7925898
PR #13985: Removing spurious `option go_package` from autotuning.proto
1dfff9d6
Return absl::Status not TFLiteStatus from ::tflite::optimize::Quantiz…
444d8bdf
Move StreamExecutor::Memcpy to-host processing completely to Stream a…
b1587f62
[xla:cpu] Add support for AllToAll thunk
f4d5a7ab
[IFRT] Add PjRt<->IFRT attribute map conversion utility functions
77fbee0f
Stop using xla/statusor.h now that it just contains an alias for absl…
c02a0c5c
[xla:hlo][NFC] Fix dims in a comment to match the size of reshape_dims.
1789531f
[mhlo] Remove UnaryEinsumOp from MHLO
79ead0e9
Update the `curl` dependency: 8.4.0 -> 8.6.0.
5f2a16c9
[xla:gpu] Rename collective_ops_test_e2e to conform with Google's tes…
f34a29f6
Move StreamExecutor::MemcpyDeviceToDevice processing into Stream and …
9b2e9eeb
Delete translate directory ConvertMlirToGraphDef.
61dc6fb8
Replace string with std::string in quantize_model_test.cc
d7913e7d
Brings back one usage of GetCorrectedUseTime() to improve code reuse
e7218705
[xla:cpu] Add support for CollectivePermute thunk
5fe8d891
[IFRT Proxy] Run client and backend tests with all supported protocol…
d5b1a3db
Minor cleanups to #includes etc. in sample_stable_delegate.
8c81a27f
Fix formatting in `tensorflow/python`
5483b604
[JAX] Teach jit fast path how to handle negative static_argnums corre…
4f2091da
Implements the `layout` method for the BasicStringArray class.
7be686d7
PR #13971: [ROCm] Fixed build break caused by https://github.com/open…
cf23a3e8
Stop using xla/statusor.h now that it just contains an alias for absl…
5bcf657f
[IFRT] Move MemoryKind attribute from IfrtShardingAttrInterface to If…
bc30cae2
[xla:cpu] Add CustomCall thunk.
43b64083
[XLA:Python:JAX] Add a method jax_jit.parse_arguments and a class jax…
f69fafeb
[IFRT] Add ifrt.CopyArrays op to IFRT IR.
0d801cd2
Replace EXPECT_OK with TF_EXPECT_OK
6f72e32c
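A minimal before/after for the `EXPECT_OK` change, assuming a hypothetical `InitializeThing` under test (`EXPECT_OK` is not available in OSS builds, while `TF_EXPECT_OK` is TensorFlow's portable macro):

```cpp
#include "tensorflow/core/lib/core/status_test_util.h"  // TF_EXPECT_OK
#include "tensorflow/core/platform/status.h"
#include "tensorflow/core/platform/test.h"  // brings in googletest

// Hypothetical function under test.
tensorflow::Status InitializeThing() { return tensorflow::OkStatus(); }

TEST(ThingTest, Initializes) {
  // Before: EXPECT_OK(InitializeThing());
  TF_EXPECT_OK(InitializeThing());  // portable TensorFlow macro
}
```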
Simplify div simplification.
6fd81ed5
Stop using xla/statusor.h now that it just contains an alias for absl…
003e21fa
[xla:cpu] Add support for convolution thunks
7fbc7481
compat: Update forward compatibility horizon to 2024-06-21
efbe0865
Update GraphDef version to 1900.
9d037686
[XLA:GPU] Add HloFindAll to hlo_traversal to find all nodes matching …
87c79a19
Unify constant folding for affine expressions.
c3890062
[XLA:GPU] Fix broken build
b45bc2e7
Introduce nested tuple support in FFI
8d71548c
[XLA:GPU] Remove the requirement to run on a machine with a GPU when …
e15904cc
Fix simplification of a//b//c.
57cfb204
Remove unnecessary dependencies.
9aea1b91
NFC: Replace RemoveSummands with MapSummands.
e193e941
Check input channels before delegating
d566f8de
PR #13856: Extend FFI DataType with FP8 Types
e73001e2
[XLA:GPU] Fix integer overflow issues in Cost Model and Symbolic Tile…
f11f5e37
PR #13757: [XLA:GPU] Upgrade cuDNN frontend to 1.5
6dda17ce
[XLA] Do not validate the operand layout constraint of LayoutConstrai…
a8c6cbd5
[XLA:GPU] Clang-tidy cleanup for xla/service/ar_crs_combiner_test.cc
32f6b488
[XLA:GPU] Clang-tidy cleanup for xla/service/bfloat16_conversion_fold…
5ad34022
[XLA:GPU] Parametrize Triton support tests by data type and device type.
9f0e241c
Introduce utility function GetPrevAllowedBatchSize.
a9ab5056
[XLA:GPU] Introduce `ConstraintExpression` to hold generalized constr…
a892e211
[XLA:GPU] Clang-tidy cleanup for xla/service/bfloat16_propagation_tes…
c26e492f
Move metadata_util to tensorflow/compiler/mlir/lite/experimental/remat
abcaba2e
Stop using xla/statusor.h now that it just contains an alias for absl…
ff06f4d7
[XLA] Support broadcast as a formatting op in collective pipeliner.
ba946a9d
[XLA:GPU] Clang-tidy cleanup for xla/service/broadcast_canonicalizer.cc
701f78b0
[XLA:GPU] Clang-tidy cleanup for xla/service/buffer_assignment.h
4203409f
Add wrapper for building reduce-window HLO using a binary operation's…
dbc7a4ac
PR #12928: [ROCM] Updated fp8 matmul with adjustments for updated hip…
5e39d5ce
[XLA:GPU] Use radix sort in place of classic sort for TopK if input s…
be6bfff2
[XLA:GPU] Clang-tidy cleanup for xla/service/call_graph.h
34a8b4c3
Fix TSAN issue in interpreter Stream.
0ef3cc94
Stop using xla/statusor.h now that it just contains an alias for absl…
0d541c30
Fix a typo and add corresponding tests. The typo essentially reversed…
6688a28e
Integrate LLVM at llvm/llvm-project@c07be08df573
525d163f
Move `::mlir::lite::QuantizeWeights` from `TfLiteStatus` to `absl::St…
a76a162a
Allow strategies for slice ops where the sliced dimensions could be s…
5bfe6d08
Integrate StableHLO at openxla/stablehlo@61826746
eea84120
Automated Code Change
04938cc9
Automated Code Change
a025656d
compat: Update forward compatibility horizon to 2024-06-22
342af9b5
Update GraphDef version to 1901.
a2c59354
Automated Code Change
09a5049b
Automated Code Change
0e07c958
Adds an option to enable / disable post-processing.
0f0409a7
Add GetUniqueGteInstruction to hlo_query utility file.
069e0516
Automated Code Change
0ea7f4ae
[xla:cpu] Add thunk_testlib for writing tests for thunks
d608037a
[xla:cpu] Add WhileThunk test
bd2b781b
[xla:cpu] Add ReplicaId thunk test
e371313c
[xla:cpu] Add extern templates for Conv2D and Conv3D.
d7602ced
[Gradients] Tag constant zero tensors for outputs with no gradient wi…
eb1f2b41
Update GraphDef version to 1902.
a4fbe1b3
compat: Update forward compatibility horizon to 2024-06-23
efd51a9e
[XLA:GPU] Set reduce_window_rewrite_base_length to 16 by default
841fecad
PR #10301: [XLA:CPU][oneDNN] Convolution XLA HLO Pattern Matcher with…
bea53c9a
Automated Code Change
88980c0d
Add missing `const` qualifier in `tflite::Subgraph`.
24d85f37
[xla:cpu] Add benchmark for op `gather`
4e518efe
Update GraphDef version to 1903.
e5fc0100
compat: Update forward compatibility horizon to 2024-06-24
ab8ac1fd
Disable Zapfhahn for tests that time out.
84587690
[XLA:GPU] Remove unused function in `triton_support_test`
94728ca5
Integrate LLVM at llvm/llvm-project@e5a41f0afc15
b6744034
Integrate LLVM at llvm/llvm-project@5cd0ba30f53d
347de0e0
Merge remote-tracking branch 'upstream/master' into develop-upstream-…
dc439ed0
Fix merge conflicts
f8a8e75e
PR #14017: [ROCm] Fix Build break due to f4212dc and 0f75900
285ccc38
Re-enable fixed HLO tests
096d7f11
Enable dot_algorithm_support_test and determinism_test
e9be6470
Enable dot tests
5d5d09cc
Disable determinism_test due to https://github.com/openxla/xla/pull/1…
6a78b6c2
Disable triangular_solve_test
075b5ba4
Fix reduce_large_row_to_scalar.hlo.test
60b8a3e7
Fix failing gpu_kernel_tiling_test subtests
7b52eab3
Disable dot tests due to https://github.com/ROCm/frameworks-internal/…
150bb37e
mmakevic-amd force-pushed from 15233ac2 to 150bb37e (1 year ago)
i-chaochen requested a review from i-chaochen (1 year ago)
i-chaochen requested a review from hsharsha (1 year ago)
hsharsha approved these changes on 2024-07-29
hsharsha merged f1d1afdb into develop-upstream (1 year ago)