PR #2533 update - SemanticDiff

Revert "Trace enter/exit of TorchFunctionModes (#135422)" (#136590)

a31c3fe9

Add nsys integration

2edf80cb

Fix bug #2458 (#2459)

0f050151

Restore FlexAttention and FlashV3 backward (#2473)

611bf702

Fix hardcoded shape in low_mem_dropout benchmark (#2475)

252a3b17

Make FA3 work in fbcode

b6b67a4f

Skip loading triton.nvidia.cublas if not found

0611c41c

Print TMA benchmark info to stderr

0cb1e96d

Modernize cutlass call for fp8 blockwise

2d9ab0b1

CSV of extra shapes for gemm benchmarks

d512e673

Add layout options to gemm

4445aa2b

Enable fp8 rowwise on AMDGPU (#2483)

f2932b74

Ignore Torchbench CI on Tritonbench paths (#2481)

a8ce4b5a

Add _dynamo.config inline_inbuilt_nn_modules and specialize_float log…

737084ec

Add non-persistent fp8 triton_rowwise kernel (#2484)

6b4f3393

Bump transformer version (#2488)

12820bcc

Add multiple ops support for --op argument (#2490)

a1f4b2e8

Add FusedLinearCrossEntropy (#2485)

dde8528b

Add user release benchmark so that we can run it on pull request (#2489)

bde24013

Install time (#2493)

eae9e50b

Add Tritonbench CI (#2494)

1ac701f7

Log compile ids to pt2_remote_cache and pt2_compile_events

85c33e5b

Trace enter/exit of TorchFunctionModes (#135422) (#137114)

79043be1

Remove ignored modes workaround (#135502) (#137115)

39d65a46

Handle torch function subclass/mode dispatch on generic tensor method…

4fd7c743

adding new configs for servicelab

533d2588

Improve release benchmark suites with a lower value of epoch (#2482)

7742ef2f

Check dyno and dcgm existence before disable them (#2496)

dcd3d319

combining CI and servicelab configs

3a7a4fea

fixing typo in fp8_gemm

b56e2eed

use 3.13 multiline traceback in get_instruction_source_311 (#137617)

f3921ca7

differentiating between some Fbsource only targets and OSS for CI

3900904e

Format `.ci/` / `.github/` / `benchmarks/` / `functorch/` / `tools/` …

f9f52f64

Add AtenOp Benchmarking (#2495)

34d4f94d

change GPT2ForSequenceClassification inference accuracy tolerance (#1…

680d64ea

making CI more flexible for extra data in tritonbench

28d301a4

Add entire _dynamo.config as a json for logging (#137216)

7cb1c0ac

skipping null values in scribe message

509e94f0

Add fbscribelogger to Dynamo benchmark runner (#137867)

12e1d263

Update the flash-attention submodule (#2500)

ea4433fd

Add host-side Triton TMA support to Dynamo (#137677)

db41e776

Add ncu report analyzer (#2497)

21cc30dc

Change default gpu metric backend (#2501)

c3961913

Update 2.5.0.yaml (#2498)

9e670cd2

Add --op-collection option (#2503)

58f3b1f3

Fix imports

2feadb6a

Add doc for adding custom ops (#2509)

d933cedc

Fix the broken gemm test

384a43d8

Test backward pass in unit test.

eec86128

Make sure all ci-enabled impls are in the output

00c9b9ed

Update AOTEagerandRecordGraphs backend (#138231)

04f0e6cc

Log is_forward field to dynamo_compile scuba table (#2511)

e89c1b37

Revamp PT2 Compile/chromium event logging [1/?]

8358f921

Revert D64438144: Log is_forward field to dynamo_compile scuba table

f7dc0c7c

adding aggregates to servicelab

0a9cd8f0

specifying logged benchmark name for tritonBench servicelab logging

e737b8fe

replace uses of np.ndarray with npt.NDArray

06e35fc5

Disable torch function compilation during guard execution and in comp…

05620407

fixing key error in aggregate data

a21b30e4

Replace __str__ with __repr__ in some places (#136316)

173774d1

Update requirements.txt (#2523)

a45e0dbf

Fixes to prep for weights_only default flip (#2514)

fb590d99

typing compile_fx.py (#138033)

11543183

Add metadata to events in progress, new `dynamo` event

8fce9c12

Log is_forward field to dynamo_compile scuba table (#138505)

e57bbe23

Compiled autograd configs in TLS (#137821)

0e038319

tls access helpers (#138061)

405ba75b

adding fp32 strict and tf32x3 benchmarks for gemm

036012ff

Support range_iterator as a function input (#138657)

367b6ef3

Support overridden __call__ on nn modules (#138619)

b5b342ba

updating hardware and device columns

3245fde9

Release 2.5.1.yaml perf test (#2525)

47ba1ed8

Account for older numpy versions in #2514 (#2524)

4f30c497

fixing gemm for amd

65e5f686

Add logger logging for remote fx graph cache get + put (#2512)

2614ca98

pytorch/benchmark:bisection

f6f1249c

pytorch/benchmark:utils

f8a4e518

Update Typeguard to TypeIs for better type inference (#133814)

bd238116

Use guard_manager consistently instead of check_fn (#138896)

34ea1a1c

Fix naming for AMD in fp8 rowwise fbgemm

713f8002

Back out "tls access helpers (#138061)" and Back out "[compiled autog…

47e3138d

Switch times to us in CompilationMetrics and improvements (#138975)

4ad2712d

add some cpython debugging methods (#138030)

438f82b4

Set use_cuda_graphs in fp8_gemm_rowwise

870be9b4

Remove hammer/generative_recommenders (#2526)

4d6e0fa0

Fix type for "--iter" flag (#2528)

a0890b09

Add start event metadata to collected metadata for PT2 Compile Events

0c8a0f68

Optimize PT2 Compile Events ingestion and column formats

a66ce044

Add isolate mode

cc094dfe

Classify miss-inplaced tensors in logs.

86a366e2

Switch OSS dashboard to use aoti_compile_and_package (#139597)

4a42e064

Specialize symfloats that flow through is_integer (#139572)

3d3b7bb5

facebook-github-bot added the `cla signed` label

Add logging for num_triton_bundles

c64ed1e2

Cleanup tl.constexpr HAS_ATTN_SCALE (#2531)

06d867a2

tune tritonbench gemm

672ee070

cut configs into separate file

779c0278

lift free symbols in example_value when create_graph_input (#138363)

abaca229

juliagmt-google closed this 1 year ago
