tensorflow-upstream
Develop upstream sync 240624 #2569 (Merged)
Author: mmakevic-amd
akuegel [XLA:GPU] Enable new mlir loop emitter by default.
38dd164b
heinsaar [xla:cpu] Add `FftThunk`
9128be09
Adam-Banas Add test case for 1D convolution
cf31ac04
Tixxx PR #13310: [NVIDIA GPU] Added a rewrite logic in gpu_windowned_einsum…
ba451bde
hsharsha PR #13866: [ROCm] Handle disabled backends for AMD case
ae6f3d49
chsigg Remove unnecessary paths from patch file.
d4ac3610
tensorflower-gardener compat: Update forward compatibility horizon to 2024-06-18
df0c5fc2
tensorflower-gardener Update GraphDef version to 1897.
cfeaaff9
bchetioui [XLA:GPU] Disable running `gpu_cub_sort_test` in debug mode to avoid …
dc765168
olegshyshkov [XLA:GPU] Add initial version of cost model for tiled hlo.
cd75b111
bchetioui [XLA:GPU] Add `bitcast` and `reshape` to the list of "passthrough" op…
c395b48c
bchetioui [XLA:GPU] Add constraints to `SymbolicTileAnalysis`.
aa29a22f
tensorflower-gardener Integrate LLVM at llvm/llvm-project@93ffe1792fd9
19b82d6a
olegshyshkov [XLA:GPU] Use priority fusion in TritonGemmAutotunerExtractor.
4b120441
klucke Stop using xla/statusor.h now that it just contains an alias for absl…
f1a240d5
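For readers unfamiliar with the repeated `xla/statusor.h` cleanups in this sync: that header had become a thin alias for Abseil's type, so callers can depend on `absl::StatusOr` directly. A minimal sketch of the pattern (the `ParsePositiveInt` helper is invented for illustration, not code from XLA):

```cpp
#include <string>

#include "absl/status/status.h"
#include "absl/status/statusor.h"

// Hypothetical parser used only to illustrate the alias removal;
// before the cleanup this return type might have been spelled
// xla::StatusOr<int> via xla/statusor.h.
absl::StatusOr<int> ParsePositiveInt(const std::string& s) {
  int value = std::stoi(s);  // assume well-formed input for this sketch
  if (value <= 0) {
    return absl::InvalidArgumentError("expected a positive integer");
  }
  return value;  // implicitly wrapped into absl::StatusOr<int>
}
```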
tdanyluk [XLA:GPU] Add initial SymbolicTileAnalysis::GetGoodTilings implementa…
44cb866c
sergachev PR #13781: [GPU] Let the on-disk kernel compilation cache grow.
5cfe3aca
tdanyluk [XLA:GPU] Support tiling Softmax example
85f91e81
golechwierowicz [XLA:GPU][NFC] Move GPU specific latency estimator to a separate file.
db5c5699
olegshyshkov [XLA:GPU] Use absl::Span instead of std::vector to pass tile sizes.
c4a89ad0
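As a hedged illustration of the `absl::Span` change described just above (the `NumTileElements` helper below is hypothetical, not the actual XLA function): a span parameter lets callers pass tile sizes from a vector, an array, or a brace-list without materializing a `std::vector` copy.

```cpp
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

#include "absl/types/span.h"

// Hypothetical helper: taking absl::Span<const int64_t> instead of
// const std::vector<int64_t>& lets callers pass any contiguous sequence.
int64_t NumTileElements(absl::Span<const int64_t> tile_sizes) {
  return std::accumulate(tile_sizes.begin(), tile_sizes.end(), int64_t{1},
                         std::multiplies<int64_t>());
}

// Usage: both calls bind to the same span-taking parameter.
//   std::vector<int64_t> v = {8, 16};
//   NumTileElements(v);
//   NumTileElements({8, 16});
```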
chsigg Move BlockedSparseToMMA pattern from Triton to XLA.
85b90521
bchetioui [XLA:GPU][NFC] Replace `bitcast`s with `reshape`s in `symbolic_tile_t…
1107c80e
mooskagh [XLA:GPU] Fall back to cuBLASlt call in autotuner when it makes sense
528cff79
tensorflower-gardener Fix bug in array type conversion util
e23a7194
pifon2a [XLA:GPU][MLIR-based emitters] Kill thread tiling for MlirColumnReduce.
65550eb4
beckerhe Temporarily disable cudnn algorithm 14 for all shapes
686e352b
tensorflower-gardener Internal change only.
5555ec62
ZixuanJiang Move `InferDotOperandSharding` from `sharding_propagation.cc` to `hlo…
40856334
tensorflower-gardener Integrate LLVM at llvm/llvm-project@52d87de7a42d
6e6641a6
tdanyluk [XLA:GPU] Experimental: Add --xla_gpu_per_fusion_autotune_cache_dir o…
221220f1
GleasonK Adopt ConvertMlirHloToHloModule instead of passing in proto in PJRT
8f9cee48
pifon2a [XLA:GPU][MLIR-based emitters] Add more tests for row reduction index…
8c8fb8fd
frgossen Re-enable tensorflow/compiler/tests:async_comp_test_gpu test
1ae6cf22
olegshyshkov [XLA:GPU] Replace TritonFusionAnalysis with SymbolicTileAnalysis in P…
bcbab524
ddunl Change default permissions for github actions workflows to satisfy se…
42860aac
tensorflower-gardener Clean up TF deps tf_to_xla_attribute_utils
042e9b85
qukhan Propagate SDPA as a StableHLO composite op instead of converting it t…
cc182302
sirakiin Support array attribute in vhlo serialization.
69214438
changhuilin Remove the deprecated PjRtClient::LookupAddressableDevice() that take…
6156204d
klucke Move StreamExecutor::MemZero processing completely into Stream and it…
84910848
pizzud [NFC]xla_compile: Report the Status in the CompilationResult even whe…
4fe124f5
ezhulenev [xla:cpu] Add support for single replica all-reduce
5fb87afc
tensorflower-gardener Reverts fee3bfc812780f9c01a4fd936f69562a6884582a
5f3eacb0
paulinesho Upstream flatbuffer utils to read big models
40ee6456
bchetioui [XLA:GPU] Redirect all Triton normalization fusions to the new generi…
487d60b0
bchetioui [XLA:GPU] Remove trailing references to deprecated `kTritonSoftmaxFus…
8f025afe
ZixuanJiang Reshard LHS and RHS to match output sharding by default to handle dot…
58bf3f5d
tensorflower-gardener Update tf_type shape to print 0 sized dimensions as 00 to avoid any p…
3ea9253b
vamsimanchala Explicitly reset tf::SavedModelBundle after done with using it. This …
a35ca23f
bchetioui [XLA:GPU] Fix OSS dependency on protobuf descriptor.
7a3fbe7b
tensorflower-gardener Remove an unused parameter
e4d0a29e
tensorflower-gardener [XLA] Add shardings for implicit operands and return values of CaseOp…
964fae6c
ddunl Use newer version of scorecards-analysis for XLA and TensorFlow
25099d18
GleasonK Remove use of deprecated op UnaryEinsum.
3eb2492b
jingpu Internal BUILD change
cb736a66
sdasgup3 Integrate StableHLO at openxla/stablehlo@f1f49945
17228941
impjdi Fix variable name.
df114d8d
tensorflower-gardener Add an option to enable shardings where a tensor dim is sharded acros…
ffede056
timpeut Set a valid minSdkVersion in dummy manifest.
c50b0dea
tensorflower-gardener Replace cpu and os based selects with platform based selects
c0e79dad
kuym [XLA:GPU] Clang-tidy cleanup for xla/service/ar_crs_combiner.{cc,h}
a1a5b8eb
kuym [XLA:GPU] Clang-tidy cleanup for xla/service/allocation_tracker.cc
ad29fd64
yashk2810 Use jax AOT APIs instead of deprecated jax.xla_computation.
ed006d09
tensorflower-gardener Integrate LLVM at llvm/llvm-project@b99d0b344001
e98b73df
pschuh Allow using ifrt_client() as a shared_ptr from PyClient.
e13d7ea9
bchetioui [XLA:GPU][ROCm] Fix missing import in `ir_emitter_triton_rocm.cc`.
c625df7f
bchetioui [XLA:GPU] Fetch `EnumDescriptor` utils from `tsl::protobuf` in `trito…
e7e72734
tensorflower-gardener Reverts c0e79dad82e082a2530e82cae7db22a5164effc8
8c71440e
tensorflower-gardener Automated Code Change
b9359c0c
akuegel Add test for broadcast of constant.
e68ffab6
klucke Stop using xla/statusor.h now that it just contains an alias for absl…
8f3c0c08
klucke Stop using xla/statusor.h now that it just contains an alias for absl…
28d7d5ab
chsigg Integrate Triton up to [6110b0b](https://github.com/openai/triton/com…
99ff62de
tensorflower-gardener compat: Update forward compatibility horizon to 2024-06-19
cfbd0f15
tensorflower-gardener Update GraphDef version to 1898.
654d9625
jreiffers We have received reports of compiler hangs which we are investigating.
51f415c7
tensorflower-gardener Disable `unary_ops_test` on Arm64 machines
ca60e393
beckerhe Make :redzone_allocator_kernel_cuda a cc_library
a2db2afa
dimitar-asenov [XLA:GPU] Two minor cleanups.
80059e48
tensorflower-gardener Automated Code Change
eaef53bb
cheshire [XLA:GPU] [NFC] Remove argument which is never passed
cdafce89
dimitar-asenov [XLA:GPU] Simplify TritonSupport tests by providing a standard ENTRY …
160e7608
cheshire [XLA] Remove proto-based communication for service/client
c72dbfc1
tensorflower-gardener Automated Code Change
b1f94db1
akuegel Refactor llvm_compiler_test.
89a47214
cheshire [XLA:GPU] [NFC] Remove redundant argument to GetKernelAnnotation
ab3720c8
jreiffers Fix ASAN error with double -> X float conversions.
7c487a38
cheshire [XLA:GPU] More consistent error handling for borrowed streams
1571f0a6
cheshire [JAX/XLA] Correct the logic for showing stack traces on JAX_TRACEBACK…
df2dae48
tensorflower-gardener Integrate LLVM at llvm/llvm-project@99c43e3ce314
32aa8bb0
bchetioui [XLA:GPU] Remove dependency from `triton_support_test.cc` on `TritonF…
542172b0
mooskagh [XLA:GPU] A test used a literal out of bounds.
353e39e0
bchetioui [XLA:GPU] Remove all SoftMax-related support from legacy Triton infra…
3685c0a6
klucke Stop using xla/statusor.h now that it just contains an alias for absl…
60e8d960
gflegar Avoid underflow in f2reduce
d87bc7ee
bchetioui [XLA:GPU][NFC] Add missing `TODO` in `SymbolicTileTest.CanPropagateTi…
8e5c47c6
olegshyshkov [XLA:GPU] Add a method to Cost Model estimate the best tiling for a f…
d1766d92
cheshire [XLA] [NFC] Unify multi-host handling for hlo_runner_main
62f235dc
tdanyluk [XLA:GPU] Support tiling "softmax and reduce" example
21cb1e0b
hawkinsp Reverts df2dae48118d675fb92cf51f42aa6abdb391cedc
4347a69f
olegshyshkov [XLA:GPU] Add num_warps to BlockLevelFusionConfig and a method to con…
1d4b49fb
ezhulenev [xla:cpu] Add support for multi-process all-reduce collective
c2caec22
lionelfeng Fix the aggregation of power metrics.
2ab9b0ab
ezhulenev [xla:cpu] Add ReplicaId thunk
b92ced39
ezhulenev [xla:cpu] Add support for ReduceScatter thunk
f84430a8
ezhulenev [xla:cpu] Add support for AllGather thunk
185483bb
tensorflower-gardener Copy definition of tflite::ControlEdges to flatbuffer_export.cc.
59d26954
klucke Stop using xla/statusor.h now that it just contains an alias for absl…
3ee9d16f
tensorflower-gardener Automated Code Change
aaebeb86
olupton PR #13603: NVTX: name threads, CUDA devices and CUDA streams
9b12cd1d
beckerhe Make GPU PJRT tests xla_tests
464f895b
alankelly XNNPack MEAN F32 supports all reduction types
36947e0d
tensorflower-gardener compat: Update forward compatibility horizon to 2024-06-20
6cf72fef
tensorflower-gardener Update GraphDef version to 1899.
8ee8ff8d
jreiffers Clean up some sentinel `-1`s.
75adcb5e
akuegel Reverts 9b12cd1d4053f8b93128eec529956ea7521fe63d
acc8b5b9
akuegel Delete flags xla_gpu_max_mlir_kernels and xla_gpu_skip_mlir_kernels
d7609726
sergachev PR #13831: [GPU] Improve dumping of GEMM fusions.
a69eff2e
Ruturaj4 PR #13340: [ROCm] Add Swizzle instruction support for mi100+ in reduc…
c72085d4
golechwierowicz [JAX] Fix FDO profile deserialization.
03850ed2
ezhulenev [xla:cpu] Don't forget to include buffer branch index buffer in condi…
5350f439
shraiysh PR #13555: Fix _xla_send_recv_validation in collective pipeliner
ed4deb88
qukhan Fix SDPA testing on different devices.
2952f336
akuegel Add unit tests for CanEmitFusedDynamicUpdateSliceInPlaceForGpu().
350ecac0
tensorflower-gardener Integrate LLVM at llvm/llvm-project@e5b0c210cc4c
4ddfa6ab
cheshire Reverts 4347a69f8985f9777fc9b92a02c86d6a5e23f737
c80f6733
cheshire [XLA] Fix up the behavior for grabbing extra streams.
2c0f46d7
klucke Stop using xla/statusor.h now that it just contains an alias for absl…
453db222
akuegel [XLA] Remove dead unused pass propagate_static_shapes
93fcc5ee
olegshyshkov [XLA:GPU] Use Cost Model to choose tile sizes in SoftmaxRewriterTriton.
f05e4339
tensorflower-gardener Only split those constants that are shared between manually and autom…
45f67b2c
klucke Move StreamExecutor::Memset32 into Stream and its derived classes.
9e0fce7d
tensorflower-gardener Prevents linspace from generating nans for F8 types.
9e43d079
gflegar Disable wgmma support in XLA, since it is causing huge compile time r…
a16c577a
cheshire [XLA] [NFC] Use a single SequentialThunk to communicate a sequence of…
9024f026
tensorflower-gardener Call ShapeUtil::ByteSizeOfElements instead of a copy of the function …
546829c2
ezhulenev [xla:cpu] Don't run collective test with thunks, not all thunks are r…
af52806c
tensorflower-gardener Migrate usage of schema_conversion_utils.
a0d4b376
klucke Stop using xla/statusor.h now that it just contains an alias for absl…
4fafa4eb
sdasgup3 Support quantized per-tensor type for MHLO Ceil/Floor Ops.
70d2f87b
hyeontaek [IFRT] Add AttributeMap
2e76b081
klucke Move StreamExecutor::Memcpy processing for to-device copies completel…
61845939
cheshire [XLA:GPU] Support mocking away all collectives
0939cebb
alankelly Don't reject F32 non 4D input tensors for float. XNNPack can handle t…
1fa88496
klucke Stop using xla/statusor.h now that it just contains an alias for absl…
d7925898
janpfeifer PR #13985: Removing spurious `option go_package` from autotuning.proto
1dfff9d6
pak-laura Return absl::Status not TFLiteStatus from ::tflite::optimize::Quantiz…
444d8bdf
klucke Move StreamExecutor::Memcpy to-host processing completely to Stream a…
b1587f62
ezhulenev [xla:cpu] Add support for AllToAll thunk
f4d5a7ab
hyeontaek [IFRT] Add PjRt<->IFRT attribute map conversion utility functions
77fbee0f
klucke Stop using xla/statusor.h now that it just contains an alias for absl…
c02a0c5c
bixia1 [xla:hlo][NFC] Fix dims in a comment to match the size of reshape_dims.
1789531f
GleasonK [mhlo] Remove UnaryEinsumOp from MHLO
79ead0e9
tensorflower-gardener Update the `curl` dependency: 8.4.0 -> 8.6.0.
5f2a16c9
penpornk [xla:gpu] Rename collective_ops_test_e2e to conform with Google's tes…
f34a29f6
klucke Move StreamExecutor::MemcpyDeviceToDevice processing into Stream and …
9b2e9eeb
rocketas Delete translate directory ConvertMlirToGraphDef.
61dc6fb8
tensorflower-gardener Replace string with std::string in quantize_model_test.cc
d7913e7d
mehrdadkhani Brings back one usage of GetCorrectedUseTime() to improve code reuse
e7218705
ezhulenev [xla:cpu] Add support for CollectivePermute thunk
5fe8d891
hyeontaek [IFRT Proxy] Run client and backend tests with all supported protocol…
d5b1a3db
fergushenderson Minor cleanups to #includes etc. in sample_stable_delegate.
8c81a27f
ddunl Fix formatting in `tensorflow/python`
5483b604
hawkinsp [JAX] Teach jit fast path how to handle negative static_argnums corre…
4f2091da
tensorflower-gardener Implements the `layout` method for the BasicStringArray class.
7be686d7
zoranjovanovic-ns PR #13971: [ROCm] Fixed build break caused by https://github.com/open…
cf23a3e8
klucke Stop using xla/statusor.h now that it just contains an alias for absl…
5bcf657f
ICGog [IFRT] Move MemoryKind attribute from IfrtShardingAttrInterface to If…
bc30cae2
penpornk [xla:cpu] Add CustomCall thunk.
43b64083
hawkinsp [XLA:Python:JAX] Add a method jax_jit.parse_arguments and a class jax…
f69fafeb
ICGog [IFRT] Add ifrt.CopyArrays op to IFRT IR.
0d801cd2
akuegel Replace EXPECT_OK with TF_EXPECT_OK
6f72e32c
jreiffers Simplify div simplification.
6fd81ed5
klucke Stop using xla/statusor.h now that it just contains an alias for absl…
003e21fa
Adam-Banas [xla:cpu] Add support for convolution thunks
7fbc7481
tensorflower-gardener compat: Update forward compatibility horizon to 2024-06-21
efbe0865
tensorflower-gardener Update GraphDef version to 1900.
9d037686
Moerafaat [XLA:GPU] Add HloFindAll to hlo_traversal to find all nodes matching …
87c79a19
jreiffers Unify constant folding for affine expressions.
c3890062
mooskagh [XLA:GPU] Fix broken build
b45bc2e7
Introduce nested tuple support in FFI
8d71548c
dimitar-asenov [XLA:GPU] Remove the requirement to run on a machine with a GPU when …
e15904cc
jreiffers Fix simplification of a//b//c.
57cfb204
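For context on the `a//b//c` simplification commit above, the identity an indexing-map simplifier relies on (stated here as a general arithmetic fact, not as XLA's code) is that for non-negative `a` and positive `b`, `c`, nested floor division collapses: `(a / b) / c == a / (b * c)`. A self-contained brute-force check:

```cpp
#include <cassert>
#include <cstdint>

// Illustrative check of the nested floor-division identity
// (a / b) / c == a / (b * c) for non-negative a and positive b, c.
// This is a generic arithmetic fact, not XLA's simplifier code.
int main() {
  for (int64_t a = 0; a < 1000; ++a) {
    for (int64_t b = 1; b <= 7; ++b) {
      for (int64_t c = 1; c <= 7; ++c) {
        assert((a / b) / c == a / (b * c));
      }
    }
  }
  return 0;
}
```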
akuegel Remove unnecessary dependencies.
9aea1b91
jreiffers NFC: Replace RemoveSummands with MapSummands.
e193e941
alankelly Check input channels before delegating
d566f8de
phu0ngng PR #13856: Extend FFI DataType with FP8 Types
e73001e2
olegshyshkov [XLA:GPU] Fix integer overflow issues in Cost Model and Symbolic Tile…
f11f5e37
Cjkkkk PR #13757: [XLA:GPU] Upgrade cuDNN frontend to 1.5
6dda17ce
blakehechtman [XLA] Do not validate the operand layout constraint of LayoutConstrai…
a8c6cbd5
kuym [XLA:GPU] Clang-tidy cleanup for xla/service/ar_crs_combiner_test.cc
32f6b488
kuym [XLA:GPU] Clang-tidy cleanup for xla/service/bfloat16_conversion_fold…
5ad34022
bchetioui [XLA:GPU] Parametrize Triton support tests by data type and device type.
9f0e241c
tf-marissaw Introduce utility function GetPrevAllowedBatchSize.
a9ab5056
bchetioui [XLA:GPU] Introduce `ConstraintExpression` to hold generalized constr…
a892e211
kuym [XLA:GPU] Clang-tidy cleanup for xla/service/bfloat16_propagation_tes…
c26e492f
tensorflower-gardener Move metadata_util to tensorflow/compiler/mlir/lite/experimental/remat
abcaba2e
klucke Stop using xla/statusor.h now that it just contains an alias for absl…
ff06f4d7
seherellis [XLA] Support broadcast as a formatting op in collective pipeliner.
ba946a9d
kuym [XLA:GPU] Clang-tidy cleanup for xla/service/broadcast_canonicalizer.cc
701f78b0
kuym [XLA:GPU] Clang-tidy cleanup for xla/service/buffer_assignment.h
4203409f
tensorflower-gardener Add wrapper for building reduce-window HLO using a binary operation's…
dbc7a4ac
wenchenvincent PR #12928: [ROCM] Updated fp8 matmul with adjustments for updated hip…
5e39d5ce
chihuahua [XLA:GPU] Use radix sort in place of classic sort for TopK if input s…
be6bfff2
kuym [XLA:GPU] Clang-tidy cleanup for xla/service/call_graph.h
34a8b4c3
klucke Fix TSAN issue in interpreter Stream.
0ef3cc94
klucke Stop using xla/statusor.h now that it just contains an alias for absl…
0d541c30
tensorflower-gardener Fix a typo and add corresponding tests. The typo essentially reversed…
6688a28e
tensorflower-gardener Integrate LLVM at llvm/llvm-project@c07be08df573
525d163f
pak-laura Move `::mlir::lite::QuantizeWeights` from `TfLiteStatus` to `absl::St…
a76a162a
tensorflower-gardener Allow strategies for slice ops where the sliced dimensions could be s…
5bfe6d08
sdasgup3 Integrate StableHLO at openxla/stablehlo@61826746
eea84120
tensorflower-gardener Automated Code Change
04938cc9
tensorflower-gardener Automated Code Change
a025656d
tensorflower-gardener compat: Update forward compatibility horizon to 2024-06-22
342af9b5
tensorflower-gardener Update GraphDef version to 1901.
a2c59354
tensorflower-gardener Automated Code Change
09a5049b
tensorflower-gardener Automated Code Change
0e07c958
tensorflower-gardener Adds an option to enable / disable post-processing.
0f0409a7
fhoushmand Add GetUniqueGteInstruction to hlo_query utility file.
069e0516
tensorflower-gardener Automated Code Change
0ea7f4ae
ezhulenev [xla:cpu] Add thunk_testlib for writing tests for thunks
d608037a
ezhulenev [xla:cpu] Add WhileThunk test
bd2b781b
ezhulenev [xla:cpu] Add ReplicaId thunk test
e371313c
Adam-Banas [xla:cpu] Add extern templates for Conv2D and Conv3D.
d7602ced
mrry [Gradients] Tag constant zero tensors for outputs with no gradient wi…
eb1f2b41
tensorflower-gardener Update GraphDef version to 1902.
a4fbe1b3
tensorflower-gardener compat: Update forward compatibility horizon to 2024-06-23
efd51a9e
chihuahua [XLA:GPU] Set reduce_window_rewrite_base_length to 16 by default
841fecad
akhilgoe PR #10301: [XLA:CPU][oneDNN] Convolution XLA HLO Pattern Matcher with…
bea53c9a
tensorflower-gardener Automated Code Change
88980c0d
qukhan Add missing `const` qualifier in `tflite::Subgraph`.
24d85f37
heinsaar [xla:cpu] Add benchmark for op `gather`
4e518efe
tensorflower-gardener Update GraphDef version to 1903.
e5fc0100
tensorflower-gardener compat: Update forward compatibility horizon to 2024-06-24
ab8ac1fd
chsigg Disable Zapfhahn for tests that time out.
84587690
dimitar-asenov [XLA:GPU] Remove unused function in `triton_support_test`
94728ca5
d0k Integrate LLVM at llvm/llvm-project@e5a41f0afc15
b6744034
d0k Integrate LLVM at llvm/llvm-project@5cd0ba30f53d
347de0e0
mmakevic-amd Merge remote-tracking branch 'upstream/master' into develop-upstream-…
dc439ed0
mmakevic-amd Fix merge conflicts
f8a8e75e
hsharsha PR #14017: [ROCm] Fix Build break due to f4212dc and 0f75900
285ccc38
mmakevic-amd Re-enable fixed HLO tests
096d7f11
mmakevic-amd Enable dot_algorithm_support_test and determinism_test
e9be6470
mmakevic-amd Enable dot tests
5d5d09cc
mmakevic-amd Disable determinism_test due to https://github.com/openxla/xla/pull/1…
6a78b6c2
mmakevic-amd Disable triangular_solve_test
075b5ba4
mmakevic-amd Fix reduce_large_row_to_scalar.hlo.test
60b8a3e7
mmakevic-amd Fix failing gpu_kernel_tiling_test subtests
7b52eab3
mmakevic-amd Disable dot tests due to https://github.com/ROCm/frameworks-internal/…
150bb37e
mmakevic-amd force pushed from 15233ac2 to 150bb37e
i-chaochen requested a review from i-chaochen
i-chaochen requested a review from hsharsha
hsharsha approved these changes on 2024-07-29
hsharsha merged f1d1afdb into develop-upstream
