pytorch
4f3dd80c - [Vulkan] VK Timestamp Queries for op profiling (#75829)

2 years ago

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75829

Added the ability to measure pure GPU execution time using [Vulkan Timestamp Queries](https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/vkspec.html#queries-timestamps).

* Two key device limits for Vulkan Timestamp Queries:
  * `timestampComputeAndGraphics` specifies whether timestamps are supported on all graphics and compute queues.
  * `timestampPeriod` is the number of nanoseconds required for a timestamp query to be incremented by 1.
  * See [VkPhysicalDeviceProperties.VkPhysicalDeviceLimits](https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/VkPhysicalDeviceLimits.html).
  * These can be extracted from `Adapter::properties::limits`.
* The pure GPU execution time includes shader execution, recorded command buffers, and buffer-to-texture conversion. It does not include the overhead of tensor conversion between CPU and Vulkan.
* A new class `QueryPool` was introduced to manage Vulkan timestamp queries:
  * Per-thread (TLS) instance.
  * Thread-safe: verified by the `cat_op_channel_perf_gpu_only` test with 3 threads in `vulkan_perf_test.cpp`.
  * It records the pure GPU execution time (shader execution and Vulkan API calls) for each operator. It does not include some of the operator's setup code or the tensor conversion between CPU and GPU. Note that a `copy_` query is recorded only when copying a Vulkan tensor to a Vulkan tensor. The other copy operations (CPU <-> Vulkan) involve waiting for all GPU operations to finish and `memcpy()` calls.
  * `QueryPool::Configuration::kMaxQueryCount` defines the maximum number of queries. It is 65,536 for now but can be increased as needed in the future.
  * The `PerfInfo` struct contains the query name and the start, end, and execution times in microseconds.
  * `QueryPool::Configuration::kTimestampsPerQuery` defines the number of timestamps per query: 2, one for the start timestamp and one for the end timestamp.
  * `is_enabled()` returns whether the timestamp query feature is enabled.
  * `enable()` enables timestamp queries. To isolate the queries from the previous run, it first calls `submit_anypending()` to submit the pending command buffer, if one exists.
  * `disable()` stops timestamp queries and returns a collection of `PerfInfo`. To wait for any pending execution, it also first calls `submit_anypending()` to submit the pending command buffer, if one exists.
  * `begin()` starts a query for a command buffer with a query name. Call it before recording operations into the command buffer. It returns the query index assigned to the command buffer.
  * `end()` finishes the query with the given query index for the command buffer. Call it after recording all desired shader or Vulkan API calls and before calling `vkEndCommandBuffer()` or `vkQueueSubmit()`.
  * `result()` returns a collection of `PerfInfo` for the queries recorded so far. It logs a warning if the `waitfor_allqueries` argument is false and `vkGetQueryPoolResults()` returns `VK_NOT_READY` (1) because the command buffer has not been executed yet.
  * `submit_anypending()` submits the pending command buffer (`Command::Pool::stream_.buffer`), if there is one. The pending buffer waits for more calls before it is submitted. If it is not submitted, `vkGetQueryPoolResults()` never returns and the call hangs.
  * Exceptions are expected if:
    * the device's `timestampComputeAndGraphics` flag is false, which means the device does not support the timestamp query feature;
    * the query pool already exists because the `QueryPool` instance is already enabled;
    * the query index exceeds `Configuration::kMaxQueryCount`.
* Manual-time-based Vulkan performance tests:
  * With Google Benchmark, you can set each iteration's time manually with `benchmark::State::SetIterationTime()` after calling `Benchmark::UseManualTime()`.
  * See the [UseManualTime API](https://github.com/google/benchmark/blob/365670e4328beb694d0a3adaf40a5974a616bb17/include/benchmark/benchmark.h#L1013).
  * Ignore the `CPU` column. The `Time` column is the average execution time for each operator.
  * Added gtest `SetUpTestSuite` and `TearDownTestSuite` implementations to enable and disable profiling.
    {F724058968}
* The desired order of API calls:
  ```
  vkBeginCommandBuffer
  vkCmdResetQueryPool (optional; not needed since we are not reusing the same query)
  vkCmdWriteTimestamp for start
  ...
  vkCmdDispatch for shader (or other vkCmdXXX API calls, e.g., vkCmdCopyImage)
  vkCmdWriteTimestamp for end
  vkEndCommandBuffer or vkQueueSubmit
  vkGetQueryPoolResults
  ```
* Limitation:
  * Vulkan Timestamp Queries don't work on macOS because of limitations in MoltenVK, which runs on top of Apple's Metal framework to provide the Vulkan APIs. The begin and end timestamp counters return the same number.
* How it works:
  * Call `Command::Pool::stream(QueryPool* query_pool, const std::string& query_name)` to get a command buffer before recording. This internally calls `Command::Buffer::begin_query()`, which adds the start timestamp (`QueryPool::begin()`). `begin_query()` stores the `QueryPool` pointer so that `QueryPool::end()` is called when `Command::Buffer::end()` is hit. That happens when there is a fence or when `Command::Pool::stream_.counter` is greater than `Command::Pool::Configuration::kSubmit`.
  * Call `Command::Pool::submit()` after recording all command buffers. This internally calls `Command::Buffer::end()`, which in turn calls `QueryPool::end()`.
* Usage - Recording timestamps:
  * We pass a `QueryPool` instance here, instead of retrieving it with `api::context()->querypool()`, so that unit tests can inject a test instance of `QueryPool`.
  ```
  // Case 1. No query recording, for scenarios where we don't execute any shader or Vulkan API call
  api::Command::Buffer& command_buffer = command_pool.stream();

  // Case 2. Passing a pointer to the QueryPool instance together with a query name enables timestamp recording
  api::Command::Buffer& command_buffer = command_pool.stream(&context->querypool(), "aten::hardtanh");
  ```
* Usage - Getting the result for the `cat` operator with Google Benchmark:
  ```
  for (auto _ : state) {
    at::native::vulkan::api::context()->querypool().enable();
    at::cat({in_vulkan1, in_vulkan2, in_vulkan3}, 1);
    auto perf_info = at::native::vulkan::api::context()->querypool().disable(true);
    state.SetIterationTime(perf_info[0].execution_time_us / 1'000'000.); // us to sec
  }
  ```
* Test result of `vulkan_perf_test` for the `cat` operator with 4x channels (shader vs `vkCmdCopyImage`):
  * GPU only for the `cat` operator (pure GPU execution time of shaders, recorded command buffers, and buffer-to-texture conversion): 86.91% improvement on average.
    {F722343755}
* Reflected code review feedback:
  * Decoupled `QueryPool` from `Command Buffer`:
    - Added a new class `OpProfiler` as a RAII object
    - Backed out the changes to `Command.h/cpp`
    - Replaced all `stream()` calls with `OpProfiler` RAII instances
  * Removed the `submit_pending()` API.
    Test cases now call `.cpu()` to make sure all GPU operations are done.
  * Added new `copy_image_to_buffer` and `copy_buffer_to_image` op names for copy operations between CPU and GPU.
  * Reflected some minor feedback, such as `TORCH_CHECK`, `emplace_back`, and `timestamp_period_us_`.
* References:
  * [Benchmarking GPU execution by manual timing](https://chromium.googlesource.com/external/github.com/google/benchmark/#manual-timing)
  * [A micro Vulkan compute pipeline and a collection of benchmarking compute shaders](https://github.com/google/uVkCompute)
  * [Improving Vulkan Breakout](http://kylehalladay.com/blog/tutorial/vulkan/2017/08/30/Vulkan-Uniform-Buffers-pt2.html)
  * [Sample code with VK Timestamp Queries](https://fuchsia.googlesource.com/fuchsia/+/refs/heads/main/src/graphics/lib/compute/hotsort/platforms/vk/tests/hotsort_vk_bench/main.c)
  * [[mlir][vulkan-runner] Add basic timing for compute pipeline](https://reviews.llvm.org/D75531)
* Next steps:
  * Publish all Vulkan timestamp queries via `KinetoEdgeCPUProfiler` for op-level benchmarking.

Test Plan:

**Test build on Android (vulkan_perf_test)**
```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_perf_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_perf_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_perf_test
adb shell "/data/local/tmp/vulkan_perf_test"
```

**Test build on Android (vulkan_api_test)**
```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
```

**Test result on Google Pixel 5 (vulkan_perf_test)**
```
Without optimization for 4x channels:
---------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                                     Time          CPU   Iterations
---------------------------------------------------------------------------------------------------------------------------------
cat_op_channel_perf_gpu_only/N:3/C:40/H:221/W:193/iterations:100/manual_time/threads:1     30.1 ms      6.78 ms          100
cat_op_channel_perf_gpu_only/N:3/C:20/H:221/W:193/iterations:100/manual_time/threads:1     21.7 ms      3.17 ms          100
cat_op_channel_perf_gpu_only/N:3/C:39/H:221/W:193/iterations:100/manual_time/threads:1     30.0 ms      5.47 ms          100
cat_op_channel_perf_gpu_only/N:3/C:4/H:221/W:193/iterations:100/manual_time/threads:1      6.75 ms      1.05 ms          100
cat_op_channel_perf_gpu_only/N:3/C:3/H:221/W:193/iterations:100/manual_time/threads:1      5.54 ms      1.02 ms          100
cat_op_channel_perf_gpu_only/N:3/C:40/H:221/W:193/iterations:100/manual_time/threads:3     10.0 ms      15.4 ms          300
gru_op_perf/N:384/C:384/H:2/iterations:100/threads:1                                       18.0 ms      15.9 ms          100

With optimization for 4x channels:
---------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                                     Time          CPU   Iterations
---------------------------------------------------------------------------------------------------------------------------------
cat_op_channel_perf_gpu_only/N:3/C:40/H:221/W:193/iterations:100/manual_time/threads:1     5.47 ms      3.79 ms          100
cat_op_channel_perf_gpu_only/N:3/C:20/H:221/W:193/iterations:100/manual_time/threads:1     2.77 ms      1.41 ms          100
cat_op_channel_perf_gpu_only/N:3/C:39/H:221/W:193/iterations:100/manual_time/threads:1     30.0 ms      6.02 ms          100
cat_op_channel_perf_gpu_only/N:3/C:4/H:221/W:193/iterations:100/manual_time/threads:1     0.562 ms     0.557 ms          100
cat_op_channel_perf_gpu_only/N:3/C:3/H:221/W:193/iterations:100/manual_time/threads:1      5.39 ms     0.738 ms          100
cat_op_channel_perf_gpu_only/N:3/C:40/H:221/W:193/iterations:100/manual_time/threads:3     1.89 ms      8.03 ms          300
gru_op_perf/N:384/C:384/H:2/iterations:100/threads:1                                       18.2 ms      16.2 ms          100
```

**Test result on Google Pixel 5 (vulkan_api_test)**
```
Building... 15.9 sec (99%) 429/431 jobs, 1/431 updated
  - //xplat/caffe2:pt_vulkan_api_test_binAndroid#android-arm64,binary... 8.0 sec (running computing_output_hashes[4.1 sec])
//xplat/caffe2:pt_vulkan_api_test_binAndroid#android-arm64 buck-out/gen/fe3a39b8/xplat/caffe2/pt_vulkan_api_test_binAndroid#android-arm64
buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid#android-arm64: 1 file pushed, 0 skipped. 149.9 MB/s (881568472 bytes in 5.609s)
Running main() from xplat/third-party/gmock/googletest-1.10.0/googletest/src/gtest_main.cc
[==========] Running 106 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 106 tests from VulkanAPITest
[ RUN      ] VulkanAPITest.adaptive_avg_pool2d
[       OK ] VulkanAPITest.adaptive_avg_pool2d (23 ms)
[ RUN      ] VulkanAPITest.add
[       OK ] VulkanAPITest.add (63 ms)
[ RUN      ] VulkanAPITest.add_broadcast0
[       OK ] VulkanAPITest.add_broadcast0 (31 ms)
[ RUN      ] VulkanAPITest.add_broadcast1
[       OK ] VulkanAPITest.add_broadcast1 (23 ms)
[ RUN      ] VulkanAPITest.add_broadcast2
[       OK ] VulkanAPITest.add_broadcast2 (23 ms)
[ RUN      ] VulkanAPITest.add_
[       OK ] VulkanAPITest.add_ (144 ms)
[ RUN      ] VulkanAPITest.add_broadcast0_
[       OK ] VulkanAPITest.add_broadcast0_ (23 ms)
[ RUN      ] VulkanAPITest.add_broadcast1_
[       OK ] VulkanAPITest.add_broadcast1_ (14 ms)
[ RUN      ] VulkanAPITest.add_scalar
[       OK ] VulkanAPITest.add_scalar (53 ms)
[ RUN      ] VulkanAPITest.add_scalar_
[       OK ] VulkanAPITest.add_scalar_ (12 ms)
[ RUN      ] VulkanAPITest.addmm
[       OK ] VulkanAPITest.addmm (21 ms)
[ RUN      ] VulkanAPITest.addmm_expand
[       OK ] VulkanAPITest.addmm_expand (30 ms)
[ RUN      ] VulkanAPITest.avg_pool2d
[       OK ] VulkanAPITest.avg_pool2d (18 ms)
[ RUN      ] VulkanAPITest.clamp
[       OK ] VulkanAPITest.clamp (211 ms)
[ RUN      ] VulkanAPITest.clamp_
[       OK ] VulkanAPITest.clamp_ (194 ms)
[ RUN      ] VulkanAPITest.conv2d
[       OK ] VulkanAPITest.conv2d (22 ms)
[ RUN      ] VulkanAPITest.conv2d_dw
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0412 19:30:29.990865 3098789112 TensorImpl.h:1336] Warning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (function operator())
[       OK ] VulkanAPITest.conv2d_dw (22 ms)
[ RUN      ] VulkanAPITest.conv2d_pw
[       OK ] VulkanAPITest.conv2d_pw (79 ms)
[ RUN      ] VulkanAPITest.conv2d_winograd
[       OK ] VulkanAPITest.conv2d_winograd (55 ms)
[ RUN      ] VulkanAPITest.copy
[       OK ] VulkanAPITest.copy (5 ms)
[ RUN      ] VulkanAPITest.div
[       OK ] VulkanAPITest.div (74 ms)
[ RUN      ] VulkanAPITest.div_broadcast0
[       OK ] VulkanAPITest.div_broadcast0 (25 ms)
[ RUN      ] VulkanAPITest.div_broadcast1
[       OK ] VulkanAPITest.div_broadcast1 (26 ms)
[ RUN      ] VulkanAPITest.div_broadcast2
[       OK ] VulkanAPITest.div_broadcast2 (20 ms)
[ RUN      ] VulkanAPITest.div_
[       OK ] VulkanAPITest.div_ (151 ms)
[ RUN      ] VulkanAPITest.div_broadcast0_
[       OK ] VulkanAPITest.div_broadcast0_ (31 ms)
[ RUN      ] VulkanAPITest.div_broadcast1_
[       OK ] VulkanAPITest.div_broadcast1_ (3 ms)
[ RUN      ] VulkanAPITest.div_scalar
[       OK ] VulkanAPITest.div_scalar (190 ms)
[ RUN      ] VulkanAPITest.div_scalar_
[       OK ] VulkanAPITest.div_scalar_ (63 ms)
[ RUN      ] VulkanAPITest.empty
[       OK ] VulkanAPITest.empty (0 ms)
[ RUN      ] VulkanAPITest.hardsigmoid
[       OK ] VulkanAPITest.hardsigmoid (217 ms)
[ RUN      ] VulkanAPITest.hardsigmoid_
[       OK ] VulkanAPITest.hardsigmoid_ (212 ms)
[ RUN      ] VulkanAPITest.hardshrink
Max Diff allowed: 0.1
xplat/caffe2/aten/src/ATen/test/vulkan_api_test.cpp:1070: Failure
Value of: check
  Actual: false
Expected: true
[  FAILED  ] VulkanAPITest.hardshrink (1154 ms)
[ RUN      ] VulkanAPITest.hardshrink_
[       OK ] VulkanAPITest.hardshrink_ (1488 ms)
[ RUN      ] VulkanAPITest.leaky_relu
[       OK ] VulkanAPITest.leaky_relu (758 ms)
[ RUN      ] VulkanAPITest.leaky_relu_
[       OK ] VulkanAPITest.leaky_relu_ (752 ms)
[ RUN      ] VulkanAPITest.hardswish
[       OK ] VulkanAPITest.hardswish (217 ms)
[ RUN      ] VulkanAPITest.hardswish_
[       OK ] VulkanAPITest.hardswish_ (214 ms)
[ RUN      ] VulkanAPITest.max_pool2d
[       OK ] VulkanAPITest.max_pool2d (26 ms)
[ RUN      ] VulkanAPITest.mean
[       OK ] VulkanAPITest.mean (24 ms)
[ RUN      ] VulkanAPITest.mean2d
[       OK ] VulkanAPITest.mean2d (60 ms)
[ RUN      ] VulkanAPITest.mm
[       OK ] VulkanAPITest.mm (15 ms)
[ RUN      ] VulkanAPITest.mul
[       OK ] VulkanAPITest.mul (71 ms)
[ RUN      ] VulkanAPITest.mul_broadcast0
[       OK ] VulkanAPITest.mul_broadcast0 (25 ms)
[ RUN      ] VulkanAPITest.mul_broadcast1
[       OK ] VulkanAPITest.mul_broadcast1 (23 ms)
[ RUN      ] VulkanAPITest.mul_broadcast2
[       OK ] VulkanAPITest.mul_broadcast2 (19 ms)
[ RUN      ] VulkanAPITest.mul_
[       OK ] VulkanAPITest.mul_ (144 ms)
[ RUN      ] VulkanAPITest.mul_broadcast0_
[       OK ] VulkanAPITest.mul_broadcast0_ (29 ms)
[ RUN      ] VulkanAPITest.mul_broadcast1_
[       OK ] VulkanAPITest.mul_broadcast1_ (4 ms)
[ RUN      ] VulkanAPITest.mul_scalar
[       OK ] VulkanAPITest.mul_scalar (190 ms)
[ RUN      ] VulkanAPITest.mul_scalar_
[       OK ] VulkanAPITest.mul_scalar_ (55 ms)
[ RUN      ] VulkanAPITest.reflection_pad2d
[       OK ] VulkanAPITest.reflection_pad2d (8 ms)
[ RUN      ] VulkanAPITest.reshape
[       OK ] VulkanAPITest.reshape (127 ms)
[ RUN      ] VulkanAPITest.reshape_
[       OK ] VulkanAPITest.reshape_ (99 ms)
[ RUN      ] VulkanAPITest.sigmoid
[       OK ] VulkanAPITest.sigmoid (239 ms)
[ RUN      ] VulkanAPITest.sigmoid_
[       OK ] VulkanAPITest.sigmoid_ (234 ms)
[ RUN      ] VulkanAPITest.softmax
[       OK ] VulkanAPITest.softmax (89 ms)
[ RUN      ] VulkanAPITest.log_softmax
Max Diff allowed: 0.05875
xplat/caffe2/aten/src/ATen/test/vulkan_api_test.cpp:1583: Failure
Value of: check
  Actual: false
Expected: true
[  FAILED  ] VulkanAPITest.log_softmax (41 ms)
[ RUN      ] VulkanAPITest.tanh
[       OK ] VulkanAPITest.tanh (271 ms)
[ RUN      ] VulkanAPITest.tanh_
[       OK ] VulkanAPITest.tanh_ (263 ms)
[ RUN      ] VulkanAPITest.sub
[       OK ] VulkanAPITest.sub (72 ms)
[ RUN      ] VulkanAPITest.sub_broadcast0
[       OK ] VulkanAPITest.sub_broadcast0 (22 ms)
[ RUN      ] VulkanAPITest.sub_broadcast1
[       OK ] VulkanAPITest.sub_broadcast1 (23 ms)
[ RUN      ] VulkanAPITest.sub_broadcast2
[       OK ] VulkanAPITest.sub_broadcast2 (17 ms)
[ RUN      ] VulkanAPITest.sub_
[       OK ] VulkanAPITest.sub_ (129 ms)
[ RUN      ] VulkanAPITest.sub_broadcast0_
[       OK ] VulkanAPITest.sub_broadcast0_ (23 ms)
[ RUN      ] VulkanAPITest.sub_broadcast1_
[       OK ] VulkanAPITest.sub_broadcast1_ (4 ms)
[ RUN      ] VulkanAPITest.transposed_conv2d
[       OK ] VulkanAPITest.transposed_conv2d (16 ms)
[ RUN      ] VulkanAPITest.upsample_nearest2d
[       OK ] VulkanAPITest.upsample_nearest2d (6 ms)
[ RUN      ] VulkanAPITest.cat_dim1_samefeature_success
[       OK ] VulkanAPITest.cat_dim1_samefeature_success (140 ms)
[ RUN      ] VulkanAPITest.cat_dim1_difffeature_success
[       OK ] VulkanAPITest.cat_dim1_difffeature_success (127 ms)
[ RUN      ] VulkanAPITest.cat_dim1_texture2d_success
[       OK ] VulkanAPITest.cat_dim1_texture2d_success (4 ms)
[ RUN      ] VulkanAPITest.cat_dim1_singledepth_success
[       OK ] VulkanAPITest.cat_dim1_singledepth_success (8 ms)
[ RUN      ] VulkanAPITest.cat_dim1_singletensor_success
[       OK ] VulkanAPITest.cat_dim1_singletensor_success (23 ms)
[ RUN      ] VulkanAPITest.cat_dim1_twotensors_success
[       OK ] VulkanAPITest.cat_dim1_twotensors_success (71 ms)
[ RUN      ] VulkanAPITest.cat_dim1_bat1_mult4ch_success
[       OK ] VulkanAPITest.cat_dim1_bat1_mult4ch_success (18 ms)
[ RUN      ] VulkanAPITest.cat_dim1_bat2_mult4ch_success
[       OK ] VulkanAPITest.cat_dim1_bat2_mult4ch_success (36 ms)
[ RUN      ] VulkanAPITest.cat_dim1_mult4ch_mixed_success
[       OK ] VulkanAPITest.cat_dim1_mult4ch_mixed_success (105 ms)
[ RUN      ] VulkanAPITest.cat_dim1_mult4ch_nonmult4ch_success
[       OK ] VulkanAPITest.cat_dim1_mult4ch_nonmult4ch_success (124 ms)
[ RUN      ] VulkanAPITest.cat_dim2_sameheight_success
[       OK ] VulkanAPITest.cat_dim2_sameheight_success (124 ms)
[ RUN      ] VulkanAPITest.cat_dim2_diffheight_success
[       OK ] VulkanAPITest.cat_dim2_diffheight_success (125 ms)
[ RUN      ] VulkanAPITest.cat_dim2_singledepth_success
[       OK ] VulkanAPITest.cat_dim2_singledepth_success (10 ms)
[ RUN      ] VulkanAPITest.cat_dim2_invalidinputs_exceptions
[       OK ] VulkanAPITest.cat_dim2_invalidinputs_exceptions (125 ms)
[ RUN      ] VulkanAPITest.permute_2d_success
[       OK ] VulkanAPITest.permute_2d_success (28 ms)
[ RUN      ] VulkanAPITest.permute_3d_success
[       OK ] VulkanAPITest.permute_3d_success (7 ms)
[ RUN      ] VulkanAPITest.permute_4d_success
[       OK ] VulkanAPITest.permute_4d_success (13 ms)
[ RUN      ] VulkanAPITest.permute_4dmclaren_success
[       OK ] VulkanAPITest.permute_4dmclaren_success (1 ms)
[ RUN      ] VulkanAPITest.permute_4dbig_success
[       OK ] VulkanAPITest.permute_4dbig_success (211 ms)
[ RUN      ] VulkanAPITest.permute_negativedims_success
[       OK ] VulkanAPITest.permute_negativedims_success (0 ms)
[ RUN      ] VulkanAPITest.permute_1d_nochange
[       OK ] VulkanAPITest.permute_1d_nochange (1 ms)
[ RUN      ] VulkanAPITest.permute_sameDims_nochange
[       OK ] VulkanAPITest.permute_sameDims_nochange (0 ms)
[ RUN      ] VulkanAPITest.permute_invalidinputs_exceptions
[       OK ] VulkanAPITest.permute_invalidinputs_exceptions (1 ms)
[ RUN      ] VulkanAPITest.slice_width_success
[       OK ] VulkanAPITest.slice_width_success (17 ms)
[ RUN      ] VulkanAPITest.slice_height_success
[       OK ] VulkanAPITest.slice_height_success (14 ms)
[ RUN      ] VulkanAPITest.slice_feature_success
[       OK ] VulkanAPITest.slice_feature_success (20 ms)
[ RUN      ] VulkanAPITest.slice_batch_success
[       OK ] VulkanAPITest.slice_batch_success (8 ms)
[ RUN      ] VulkanAPITest.slice_invalidinputs_exceptions
[       OK ] VulkanAPITest.slice_invalidinputs_exceptions (0 ms)
[ RUN      ] VulkanAPITest.clone_success
[       OK ] VulkanAPITest.clone_success (4 ms)
[ RUN      ] VulkanAPITest.clone_invalidinputs_exceptions
[       OK ] VulkanAPITest.clone_invalidinputs_exceptions (1 ms)
[ RUN      ] VulkanAPITest.mobilenetv2
[       OK ] VulkanAPITest.mobilenetv2 (153 ms)
[ RUN      ] VulkanAPITest.gru_mclareninputs_success
[       OK ] VulkanAPITest.gru_mclareninputs_success (64 ms)
[ RUN      ] VulkanAPITest.gru_invalidinputs_exceptions
[       OK ] VulkanAPITest.gru_invalidinputs_exceptions (17 ms)
[ RUN      ] VulkanAPITest.gru_prepack_success
[       OK ] VulkanAPITest.gru_prepack_success (48 ms)
[ RUN      ] VulkanAPITest.gru_prepack_invalidinputs_exceptions
[       OK ] VulkanAPITest.gru_prepack_invalidinputs_exceptions (106 ms)
[ RUN      ] VulkanAPITest.profiling_invalideinputs_exceptions
[       OK ] VulkanAPITest.profiling_invalideinputs_exceptions (433 ms)
[ RUN      ] VulkanAPITest.profiling_result_success
[       OK ] VulkanAPITest.profiling_result_success (86 ms)
[----------] 106 tests from VulkanAPITest (11317 ms total)

[----------] Global test environment tear-down
[==========] 106 tests from 1 test suite ran. (11345 ms total)
[  PASSED  ] 104 tests.
[  FAILED  ] 2 tests, listed below:
[  FAILED  ] VulkanAPITest.hardshrink
[  FAILED  ] VulkanAPITest.log_softmax

 2 FAILED TESTS
```

**Test result on Google Pixel 5 (vulkan op profiling)**
See P496709600

Reviewed By: kimishpatel, SS-JIA

Differential Revision: D33151286

fbshipit-source-id: 2e37b3f2401134eeab6d705791d881803b07a73f
(cherry picked from commit da217f4e894f1f03738395375464beab5236fe18)
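The following is a minimal, Vulkan-free C++ sketch that makes the `QueryPool` bookkeeping described in the summary concrete. It is an illustration under stated assumptions, not the PR's code:

* `FakeQueryPool` and its `begin`/`end`/`result` signatures are hypothetical.
* The caller passes raw ticks in directly. In the real class, the GPU writes them with `vkCmdWriteTimestamp` and they are read back with `vkGetQueryPoolResults`.

The sketch shows three things:

* the two-slots-per-query layout implied by `kTimestampsPerQuery = 2`;
* the bound check against a maximum query count;
* the conversion from ticks to microseconds using `timestampPeriod` (nanoseconds per tick).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Same fields the summary lists for PerfInfo; the exact layout is assumed.
struct PerfInfo {
  std::string query_name;
  double start_time_us;
  double end_time_us;
  double execution_time_us;
};

// Hypothetical stand-in for QueryPool: ticks are passed in by the caller
// instead of being written by the GPU and read back from a VkQueryPool.
class FakeQueryPool {
 public:
  static constexpr std::size_t kTimestampsPerQuery = 2;  // start + end

  FakeQueryPool(std::size_t max_query_count, double timestamp_period_ns)
      : max_query_count_(max_query_count),
        timestamp_period_ns_(timestamp_period_ns) {}

  // Reserves the next query index; its start tick goes into slot 2 * index.
  std::size_t begin(const std::string& query_name, std::uint64_t start_tick) {
    if (names_.size() >= max_query_count_) {
      throw std::out_of_range("query index exceeds max query count");
    }
    names_.push_back(query_name);
    ticks_.push_back(start_tick);  // slot 2 * index
    ticks_.push_back(start_tick);  // slot 2 * index + 1, overwritten by end()
    return names_.size() - 1;
  }

  // Stores the end tick in slot 2 * index + 1; at() throws for a bad index.
  void end(std::size_t index, std::uint64_t end_tick) {
    ticks_.at(index * kTimestampsPerQuery + 1) = end_tick;
  }

  // Converts every start/end pair to microseconds.
  std::vector<PerfInfo> result() const {
    std::vector<PerfInfo> infos;
    infos.reserve(names_.size());
    for (std::size_t i = 0; i < names_.size(); ++i) {
      const double start_us = to_us(ticks_[i * kTimestampsPerQuery]);
      const double end_us = to_us(ticks_[i * kTimestampsPerQuery + 1]);
      infos.push_back({names_[i], start_us, end_us, end_us - start_us});
    }
    return infos;
  }

 private:
  // ticks * timestampPeriod gives nanoseconds; divide by 1000 for microseconds.
  double to_us(std::uint64_t ticks) const {
    return static_cast<double>(ticks) * timestamp_period_ns_ / 1000.0;
  }

  std::size_t max_query_count_;
  double timestamp_period_ns_;
  std::vector<std::string> names_;
  std::vector<std::uint64_t> ticks_;
};
```

Worked example: with a `timestampPeriod` of 10 ns, a query whose start and end ticks are 1000 and 6000 reports an execution time of (6000 - 1000) * 10 / 1000 = 50 us. Starting more queries than the configured maximum throws, mirroring the "query index exceeds `Configuration::kMaxQueryCount`" exception listed above.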

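The summary reports an 86.91% average improvement but does not state which rows of the Pixel 5 `vulkan_perf_test` table it averages. One reading reproduces the figure; it is an assumption, not something the commit states. Under that reading, the figure is the mean reduction in the `Time` column over the three single-threaded `cat` cases whose channel count is a multiple of 4: C:40, C:20 and C:4. The helper names below are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Relative reduction of one benchmark case: (before - after) / before.
double reduction(double before_ms, double after_ms) {
  return (before_ms - after_ms) / before_ms;
}

// Mean reduction, in percent, over the single-threaded multiple-of-4 channel
// cases. Values are the `Time` column (ms) of the Pixel 5 vulkan_perf_test
// table: {without optimization, with optimization}.
double average_reduction_percent() {
  const double cases[][2] = {
      {30.1, 5.47},   // C:40
      {21.7, 2.77},   // C:20
      {6.75, 0.562},  // C:4
  };
  const std::size_t n = sizeof(cases) / sizeof(cases[0]);
  double sum = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    sum += reduction(cases[i][0], cases[i][1]);
  }
  return 100.0 * sum / static_cast<double>(n);
}
```

This yields about 86.91%, made up of per-case reductions of roughly 81.8%, 87.2% and 91.7%. Under this reading, two kinds of rows are left out:

* the three-thread C:40 case, which goes from 10.0 ms to 1.89 ms (about 81.1%);
* the non-multiple-of-4 cases (C:39 and C:3).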
Author

beback4u

Committer

pytorchmergebot

Parents

f4200600
