PR #24353 Merge 'main' into 'win-ort-main' @ 39e585ff2b

Make Nuget package pipeline 1ES compliant (#23803)

839d9dcd

Converting npm packaging pipeline to 1ES (#23767)

9a2e0090

[webgpu] support resize operator (#23780)

cc3f4120

Upgrade React Native to 0.73 (#23575)

40c329ef

Make Nuget CUDA package pipeline 1ES compliant (#23804)

d5742708

[ARM CPU] Fix flaky hgemmb ut (#23814)

7a3810d3

[TensorRT EP] update oss parser to latest (#23710)

000f2c9f

[webgpu] Fix alignment issues in shader code (#23776)

c6664e20

upgrade emsdk to 4.0.4 (#23819)

6df0973e

[OVEP] Update support for Contrib Ops (#23789)

17f39475

Update onnxruntime_external_deps.cmake: add missing EXCLUDE_FROM_ALL …

b1f2a3f5

Quant tool: Add `nodes_to_exclude` in `get_qnn_qdq_config` (#23779)

5ab953cb

[ORT/CI_Pipeline] Use --enable_generic_interface in ORT builds for EP…

05642657

Increase npm package pipeline ReactNative_CI_iOS timeout to 120 mins …

a189bfca

[Mlas] Unblock hardcoded matmul blocking size (#23815)

c61a4b11

Revert changes on mac-react-native-ci-pipeline.yml (#23845)

2a4cfab4

Fix flash attention for GQA (Phi4) (#23850)

1be64f88

Model Builder API (#23223)

1088a1ed

Fix typo: change `Upample` to `Upsample`. (#23838)

1ffe793a

[doc] Fix typos in csharp/src/Microsoft.ML.OnnxRuntime/ (#23848)

0a6b05fb

Quant tool: Consistent `get_qdq_config` and `get_qnn_qdq_config` beha…

daf9565d

Change the logic to generate the default ep context file name (#23788)

99c51a32

Make Nuget QNN package pipeline 1ES compliant (#23805)

7f0c2c64

[js/common] allows using Uint16Array as data for float16 tensor (#23827)

18725277

[js/webgpu] Reland the optimization of ConvTranspose (#23858)

325ee309

[OpenVINO] Fix a build warning (#23877)

30c68254

Change gsl::byte to std::byte (#23872)

bde4fbec

Allow using extended minimal build for several EPs (#23834)

17dcea7a

Add dawn to ThirdPartyNotices (#23876)

813bdaab

Enable QNN EP weight sharing generation using public API (#23702)

9d0dc9f0

[QNN-EP]: Fix inference failures while running with htp_shared_memory…

788ca51b

Fix enable_pix_capture build for WebGPU (#23857)

8aed9208

[WebGPU-EP Native] Add ReduceMean (#23860)

834adde8

[WebGPU EP] introduce BiasAdd contrib op (#23861)

cfb0a72f

Dynamo export and improve benchmark script for SAM2 encoder (#23887)

5e636a67

[js/web] improve workaround for bundlers (#23902)

aafa8d17

[webgpu] Restore MatMulNBits workgroup size for Phi-3.5 (#23349)

d35db9b8

[webgpu] support Pad operator (#23141)

95225dda

[WebNN] Accept Float16Array for float16 data type if it is available …

b5242293

Ensure that the 'cmake_minimum_required' is version 3.5 or greater (#…

996fffbe

WebGPU: Remove deprecated subgroups-f16 from WebGPU native and JS EP …

54b2d64c

[JSEP/WebGPU] Fixed error in softmax dispatch. (#23906)

ccf8fdd9

enable WebGPU EP in WebAssembly build (#23913)

101353cf

Adding OpenVINO Windows CI Pipeline (#23919)

8f077435

[WebGPU EP] SoftMax Implementation (#23538)

4bb79d13

Exclude MAUI projects from GPU C# packaging builds (#23923)

b2ab87e8

Support all block sizes that are multiples of 32 for DP4A (#23907)

eeaf73b3

Example custom op with output type inferencing (#23916)

c28bf788

Enabling L2+ Optimizations for EPs (#23517)

1199dc08

fix binplace file in web pipeline (#23930)

2ba076aa

Updated run_CIs_for_external_pr.py to support the Windows OpenVINO CI…

e47c6c16

Fix ConvInteger handling of optional inputs. (#23935)

8969ee78

Updated ov version in pipeline (#595) (#23882)

26f590b3

[AIX] External data handling (#23859)

f25deaea

Create a packaging pipeline for a custom nuget package (#23918)

593d5c0e

Fix license in example test code. (#23936)

7dbbfe08

replace usage of gsl::narrow and gsl::narrow_cast in WebGPU EP (#23926)

ab38607d

VCPKG improvement: set VCPKG_OSX_DEPLOYMENT_TARGET (#23933)

cffef2e0

Allow using a different version of flatbuffers when building with vcp…

49328fe6

Make python package pipeline 1ES compliant (#23800)

95dcd150

Delete ROCM Nuget Publishing Pipeline (#23948)

989d4177

Bump SixLabors.ImageSharp from 2.1.9 to 2.1.10 in /csharp/sample/Micr…

fe7634eb

Make python CUDA package pipeline 1ES compliant (#23802)

246c2191

Migrate yarn to npm (#22116)

773bb4ff

[WebGPU/JSEP] Support group query attention do_rotary attribute (#23524)

333fbdb4

Fix npm audit in js/react-native/e2e (#23975)

f18e9faa

Suppress some warnings in WebGPU EP generated by GCC 13 (#23984)

64436265

Fix NPM audit in js/react-native (#23974)

d010acb5

Bump axios from 1.7.9 to 1.8.2 in /js/node (#23963)

9118b1de

GCC 14: fix insert_or_assign() call (#23955)

5672cf7d

ADD emsdk env vars to VCPKG_KEEP_ENV_VARS (#23997)

d2bf9a79

Fix ONNX Runtime Python Test Pipeline (#23990)

fe435371

[webgpu] Fix the continuation issue (#23999)

16d6f397

[WebGPU EP] Implements Gelu, BiasSplitGelu, and QuickGelu (#23981)

9891eb3d

[Native WebGPU] Added ReduceMax and ReduceSum (#23934)

6dd6ef93

Convert Windows CPU CI Pipeline to Github Actions (#23996)

47bd0468

[Fix] Dependencies find_package Eigen error (#23939)

06482c26

Update onnxruntime_c_api.h to work with MinGW (#24006)

5e057292

Add DNNL github workflow (#24011)

57ddd026

Qnn weight sharing improvement (#23945)

7ae606f7

Correct generated cmake syntax (#24016)

11216a4e

[webgpu] allow to specify UseIndicesTypeAlias for Indices (#24019)

1362e7ca

[webgpu] allow overloads to Program::AddIndices (#24021)

401f24a8

fix test for RotaryEmbedding (#24022)

219c919c

Fix attention bias broadcast (#24017)

99b78a94

Remove unused parameter in csharp InferenceTest (#24031)

5bd31636

[TensorRT EP] Call cudaSetDevice at compute function for handling mul…

6bb6d791

Increase timeout for ARM64-Xcode16-targeting-iphonesimulator (#24030)

3f71d637

Support tvOS build (#24000)

1fc6d8ca

[TensorRT EP] Stop enforcing oss parser during Windows debug build (#…

cb3f631f

Set CMAKE_POLICY_DEFAULT_CMP0069 to NEW to ensure that IPO flags are …

9a296a0a

Make Cuda packaging pipeline 1ES compliant (#23806)

9f214561

[webgpu/wasm] allow runtime switch between WebGPUEP and JSEP (#24032)

7c05e7f5

Move call to MLAS_CPUIDINFO::GetCPUIDInfo() out of MlasSQNBitGemmDisp…

c9c8b48d

[webgpu] fix the wrong dispatch size in flash_attention (#24020)

cc5840be

avoid copy unnecessary files for nodejs pkg (#23992)

41c239df

Add support for custom position ids and attention bias to GQA CPU ope…

5a694bcb

[WebNN] Better int64 integration (#23831)

73d9826a

Convert Windows GPU pipelines and Windows OpenVino pipeline to Github…

b8966665

[ARM CPU] Fix fp16 const initialization on no-fp16 platform (#23978)

f22ee08f

[Native WebGPU EP] Add packedQKV and do_rotary attribute support to G…

ae501eeb

Whisper Redesigned Solution (#23549)

7942fa7a

Windows: Show more useful DLL load errors to say exactly what DLL is …

5ef0d211

Extend CMAKE_CUDA_FLAGS with all Blackwell compute capacity (#23928)

2bc73ca8

[WebGPU] Reduce staging buffers for uploading initializers (#23968)

f5812d0e

[WebGPU EP] Implement Remaining Reduction Ops (#24045)

154e3b7d

add bool support to EPContext schema to unblock some models (#24065)

a46d2127

[WebGPU EP] fix for reduce min/max error on MacOS CI (#24077)

b3aa5a3c

Upgrade current MacOS-13 to 14 (#23293)

e495750a

Fix CUDA EP Abs and Sign bfloat16 support (#23914)

c6a26754

Improve typing for OrtValue and other public Python interfaces (#24086)

12fea572

[webgpu] Limit that K must be divisible by 128 to apply dp4a matmul (…

a85977dd

Add macOS ARM64 pipeline for webgpu (#24060)

d98046b3

[WebNN/WebGPU JS] Fix shared Module methods overriding each other (#2…

eceae8b2

Enable multithreading on FP16 to FP32 cast operator (#23619)

7fc7d5ec

Move Android CI Pipeline to Github Actions (#24094)

3488ba39

Cleanup CoreML EP's code to remove COREML_ENABLE_MLPROGRAM (#23490)

7444feeb

webgpu ep support for argmax/argmin (#24089)

b626409e

[mobile/reactnative] Remove namespace from AndroidManifest.XML to res…

d8ed4da1

[WebGPU EP] fix implementation of Pow (#24088)

80441e4e

Increase timeout to 90min for ARM64-Xcode16-targeting-iphonesimulator…

731b27e2

[WebGPU] fix test failure in Reduce operators on macOS ARM64 (#24108)

da7874c8

[WebGPU EP] Implements CumSum Operator (#24047)

8d21bf72

[webgpu] Use 1d dispatch group size (#24084)

81a89204

[WebGPU] fix test failure in MatMulNBits on macOS ARM64 (#24109)

9dcb99cd

[QNN-EP] Add support for Sum operator with 2 inputs (#24098)

4d5e274f

[WebNN] Replace narrow with SafeInt for consistency in integer handl…

5d43f0ab

[QNN-EP] Add Lora Support with offline QNN context binary (#24026)

6bdbf08c

[TensorRT EP] support TensorRT 10.9-GA (#23905)

440d17a7

[webgpu] Apply dp4a for generation shader (#24064)

127c8503

[CUDA] Support slide window in cutlass fused attention (#24072)

db0c95c1

[MIGraphX EP] rename HIPPinnedAllocator to MIGraphXPinnedAllocator (#…

16b0b323

[MIGraphX EP] check POLICY CMP0144 availability before used (#24104)

9922d480

[JSEP] handles edge case in gridsample operator (#24121)

469fb7e3

[OpenVINO]Session Options Appended After AppendExecutionProvider (#23…

49024a1e

[webgpu]Add MaxPool and AveragePool (#23714)

7a6514c8

[webgpu EP] put GetMaxComponents and SumVector to one place. (#24122)

9e53afab

skip MOE python test when MPI is not installed (#24116)

dcc1f5ac

Integrate KleidiAI for MatMulNBits via MlasQNBitGemm (#23627)

90c5ffb5

add test cases for webgpu ep in web (#24117)

0a363d9e

Refactor Webnn IsSupported*() to use constant initializers. (#24118)

cd9406bf

Deleted the constant SKIP_CUDA_TEST_WITH_DML (#24113)

4959468a

Update T5 Onnx Export and Optimization (#23949)

d84314cb

Update package.json to make the dist available again (#23991)

3012d445

Fix attention QK linkage error (#24134)

2b3d7fb1

Bump next from 15.1.2 to 15.2.3 in /js/web/test/e2e/exports/testcases…

5ed900e9

[Shape Inference] Add shape inference for QLinearAdd and QLinearMul o…

2b5c9da6

[mobile] Add Android NuGet BrowserStack test to NuGet packaging pipel…

8eb8c2b0

[CPU] Add fp16 support to sparse attention (#24015)

828e3726

refactor mac CI pipelines (#24138)

373b9e2a

Address Windows CUDA build issue (#24149)

5244d68b

[webgpu] add option to preserve device and enable in unittest (#24115)

e03631ee

[js/web] allow bundler import condition for not bundling wasm (#24014)

78d91cdd

[js] Add API for accessing metadata of a model's input/output (#23937)

618aef7e

add cache "onnxnodetests" for node tests (#24150)

afaf4a5e

[Native WebGPU] Add Matmul (#24046)

ce65e253

Upgrade Big Model pipeline CUDA from 11.8 to 12.x (#24156)

bb005b93

Proper Error Message when fp16 model is used for Beam Search in CPU (…

de502c89

Change type len from int to size_t (#24157)

a4b8f11c

Limit the Pipeline ability to build cuda 11 (#24073)

a8fb7868

Move Linux CPU CI pipeline to Github Actions (#24154)

86806677

Bump vite from 6.2.1 to 6.2.3 in /js/web/test/e2e/exports/testcases/v…

d9c961ce

[onnxruntime_perf_test] Fix custom_allocator_ destruction order. (#24…

1ef30446

Fix layout transformer for FusedConv (#24169)

25b06f20

Migrate Zip-Nuget Package Pipeline to 1ES (#23609) Also, kleidail is …

1f6dc881

Update the min GCC version (#24148)

9dbfee91

[QNN EP] ARM64EC python package remove --vcpkg in build (#24174)

2a800d1e

[WebGPU EP] Add GEMM implementation (#24023)

a8673c6e

[wasm] remove --vcpkg in wasm build (#24179)

513e8de1

revise mac os pipeline to reduce the amount of jobs (#24177)

32b376cd

fix triggering for "Validate Gradle Wrapper" pipeline (#24181)

be1cfc4e

upgrade QNN to version 2.32.0.250228 (#23977)

5d805c23

[JSEP] adjust edge case logic for scatternd (#24172)

24ece479

Make the custom nuget packaging pipeline 1ES compliant. (#24191)

1f70fc25

Disable KleidiAI in Python Packaging pipeline MacOS build (#24194)

4d13b70f

Rolling back the python/cuda (#24170)

041674ad

Remove all CG template from pipelines (#24193)

914be22e

Move Linux ARM64 CI pipeline and Linux DNNL CI pipeline to Github Act…

bd00c39f

[webgpu-ep] Fix test_batchnorm_example (#24184)

86b4c789

Further reduce work load for Mac CI pipeline (#24197)

26566710

Generate unique names for SliceSplit fusion. (#24217)

64b0d071

Fix the pipeline that failed because of vcpkg (#24226)

25921476

Improve Shape Inference for GQA (#24143)

c756e0ab

Add React Native namespace back in for iOS (#24218)

19d8d69c

RoPE fp16 avx (#23772)

180ba8f8

Migrate Linux GPU pipelines to Github Actions (#24232)

f430dce9

Migrate Web CI into github actions (#24219)

41dde351

update the readme doc for the tool ep_weight_sharing_ctx_gen (#24233)

4a669fd1

[WebGPU EP] If Implementation for WebGPU EP (#24242)

7ef0ddc5

Update linux-dnnl.yml: rename the pipeline (#24240)

8de342ad

[webgpu] Fix test_layer_normalization_2d_axis0 (#24223)

d71aa4d8

[webgpu] fix LayerNorm with empty input (#24244)

f1d790c2

Bump actions/setup-python from 4 to 5 (#24251)

492af7a3

Bump actions/cache from 3 to 4 (#24250)

83650edc

[QNN EP] Add platform-agnostic EP option to specify QNN backend, `bac…

22787aec

[webgpu] Fix opset-12 softmax nhwc issue (#24227)

ad2e5652

Extend pyright exclude list in pyproject.toml (#24246)

528f29a8

[js/web] Add Wasm Relaxed SIMD support to wasm backend (#22794)

ba2999c5

Add shader key validation step in WebGPU CI pipeline (#24243)

4eeefd72

upgrade dawn version to 4cb1f9be152a4fa6bb695c08cd707ab078a1e2fb (#24…

30115cfe

Bump dsaltares/fetch-gh-release-asset from 1.1.0 to 1.1.2 (#24248)

5982430a

Bump vite from 6.2.3 to 6.2.4 in /js/web/test/e2e/exports/testcases/v…

e2274150

[WebGPU EP] fixes bugs in split implementation (#24259)

5068ab9b

Bump microsoft/onnxruntime-github-actions from 35f8bd42417991aa46577e…

1b48cc41

Update xcode and iphoneSimulatorVersion after MacOS-14 (#24260)

5b080558

Exclude onnxruntime-inference-examples directory from Component Gover…

24620e70

[VitisAI] Fixed include error. (#24199)

67216c89

Migrate pull:wasm to github action (#24269)

a5bc69c5

Ensure to use correct GPU device in RunSince when it's invoked by new…

b3793906

Adding build-system to pyproject.toml (#24216)

b5d15bc9

[WebGPU EP] Implements ceil mode for Average Pool (#24270)

bc7b07db

Pin vcpkg version (#24284)

55aa03c1

Support load TensorRT V3 plugin (#24211)

a14d586d

Expose TRT preview features as EP option (#24212)

21db38c1

[webgpu] test_layer_normalization_3d_axis0_epsilon (#24276)

8465ca38

[webgpu][dawn API optimization] reduce number of calls to wgpuDeviceH…

7a551887

Bump next from 15.2.3 to 15.2.4 in /js/web/test/e2e/exports/testcases…

d2388135

Bump image-size from 1.1.1 to 1.2.1 in /js/react_native/e2e (#24278)

cbaa8bc7

[QNN-EP] Enhance QNN-EP support for Softmax with opset < 13. (#24180)

a28da4b9

Update publish-nuget.yml to correct feed. (#24299)

e5e906ee

[webgpu] Optimize MatMulNBits for f16 Block32 prefill performance (#2…

3dfc2ae3

upgrade action shellcheck to v1.30.0 (#24304)

82c8e569

[QNN-EP] Fix ONNX context model helper. (#24271)

1cb53d00

[WebGPU] fix Pad cache key (#24305)

318cc87f

Bump vite from 6.2.4 to 6.2.5 in /js/web/test/e2e/exports/testcases/v…

56f10183

[WebGPU] fix cache key of AttentionProbs/VxAttentionScore (#24309)

2e94c5a4

Support Gemma3 with Clip fused attention (#24280)

e944379e

Update packaging pipeline for Nodejs binding (#24301)

11fda2ad

Add support for uint8_t as data type for GatherBlockQuantized (#24239)

a4976e33

[Native WebGPU] Add Conv, ConvTranspose and FusedConv (#24186)

9102aaee

[webgpu][dawn API optimization] reduce number of calls to wgpuDeviceG…

a7e62d63

Fix 'minimal_power' to 'minimum_power' for DirectML performance selec…

55c1a3b0

Add ConvTranspose cache key (#24317)

d6df4f29

[webgpu] Use 1D dispatch groups for attention (#24228)

a1186f63

[webgpu][dawn API optimization] reduce number of calls to buffer APIs…

73676fc5

Implement load cancellation ability (#24257)

350d1400

[webgpu] Fix ROUND_PREFER_CEIL issue of Resize operator (#24229)

ca1b32df

[Native WebGPU] Exclude WebGPU EP from ConvFp16 3D tests. (#24327)

b803429a

[VitisAI EP] export InferShapes to VitisAIEP (#23881)

554fb4ad

[webgpu] Flash attention for generation (#23808)

18f91e55

Use WASM f32x4 relaxed min/max for relaxed simd build (#24324)

04e0b50c

webgpu support for DequantizeLinear (#24268)

f83e6618

[webgpu] fix the reflect mode issue of Pad (#24202)

10e51d26

Remove explicit batch network flag for TRT 10+ (#24298)

4edada60

[webgpu] Fix bias_split_gelu (#24342)

22656134

[webgpu] fix bias-add (#24336)

34abb8b4

[webgpu] optimize SkipLayerNormalization operator (#24164)

0acb0488

ROCm: Remove -Wno-interference-size compiler flag (#24326)

d7a38a57

[web] revise flag `ort.env.wasm.simd` (#24314)

39e585ff

Merge 'main' into 'win-ort-main' @ "39e585ff2b:[web] revise flag `ort…

529a34e6

mschofie requested a review from ashrit-ms 1 year ago

mschofie requested a review 1 year ago

mschofie merged f03e3c7d into win-ort-main 1 year ago

mschofie deleted the mschofie/merge-1.22-current branch 1 year ago
