Revert "add Half support for flash attention on CPU (#118368)" (#119204)
This reverts commit a5a63db3bf937a6eff993d1222fab18cc63f9cb2.
Reverts #118368
This was reverted internally, but the branch got deleted so the automation didn't work.
Mildly edited stack trace:
```
...
    return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
  File "torch/_dynamo/eval_frame.py", line 453, in _fn
    return fn(*args, **kwargs)
  File "torch/_dynamo/external_utils.py", line 25, in inner
    return fn(*args, **kwargs)
  File "torch/fx/experimental/proxy_tensor.py", line 635, in dispatch_trace
    graph = tracer.trace(root, concrete_args)
  File "torch/fx/experimental/proxy_tensor.py", line 995, in trace
    res = super().trace(root, concrete_args)
  File "torch/_dynamo/eval_frame.py", line 453, in _fn
    return fn(*args, **kwargs)
  File "torch/_dynamo/external_utils.py", line 25, in inner
    return fn(*args, **kwargs)
  File "torch/fx/_symbolic_trace.py", line 793, in trace
    (self.create_arg(fn(*args)),),
  File "torch/fx/experimental/proxy_tensor.py", line 665, in wrapped
    out = f(*tensors)
  File "<string>", line 1, in <lambda>
  File "torch/_functorch/_aot_autograd/traced_function_transforms.py", line 357, in _functionalized_f_helper
    f_outs = fn(*f_args)
  File "torch/_functorch/_aot_autograd/traced_function_transforms.py", line 68, in inner_fn
    outs = fn(*args)
  File "torch/_functorch/_aot_autograd/utils.py", line 161, in flat_fn
    tree_out = fn(*args, **kwargs)
  File "torch/_functorch/_aot_autograd/traced_function_transforms.py", line 618, in functional_call
    out = PropagateUnbackedSymInts(mod).run(
  File "torch/fx/interpreter.py", line 145, in run
    self.env[node] = self.run_node(node)
  File "torch/_functorch/_aot_autograd/traced_function_transforms.py", line 593, in run_node
    result = super().run_node(n)
  File "torch/fx/interpreter.py", line 202, in run_node
    return getattr(self, n.op)(n.target, args, kwargs)
  File "torch/fx/interpreter.py", line 274, in call_function
    return target(*args, **kwargs)
  File "torch/_ops.py", line 571, in __call__
    return self_._op(*args, **kwargs)
  File "torch/_subclasses/functional_tensor.py", line 380, in __torch_dispatch__
    outs_unwrapped = func._op_dk(
  File "torch/utils/_stats.py", line 20, in wrapper
    return fn(*args, **kwargs)
  File "torch/fx/experimental/proxy_tensor.py", line 744, in __torch_dispatch__
    return self.inner_torch_dispatch(func, types, args, kwargs)
  File "torch/fx/experimental/proxy_tensor.py", line 779, in inner_torch_dispatch
    return proxy_call(self, func, self.pre_dispatch, args, kwargs)
  File "torch/fx/experimental/proxy_tensor.py", line 423, in proxy_call
    r = maybe_handle_decomp(proxy_mode, func, args, kwargs)
  File "torch/fx/experimental/proxy_tensor.py", line 1225, in maybe_handle_decomp
    return CURRENT_DECOMPOSITION_TABLE[op](*args, **kwargs)
  File "torch/_decomp/decompositions.py", line 4322, in scaled_dot_product_flash_attention_for_cpu
    torch._check(
  File "torch/__init__.py", line 1133, in _check
    _check_with(RuntimeError, cond, message)
  File "torch/__init__.py", line 1116, in _check_with
    raise error_type(message_evaluated)
RuntimeError: query must be FP32, FP64, BF16 but got torch.float16
While executing %_scaled_dot_product_flash_attention_for_cpu : [num_users=1] = call_function[target=torch.ops.aten._scaled_dot_product_flash_attention_for_cpu.default](args = (%l_q_, %l_k_, %l_v_), kwargs = {attn_mask: %l_attn_mask_})
Original traceback:
  File "executorch/backends/xnnpack/partition/graphs/sdpa.py", line 34, in forward
    return torch.nn.functional.scaled_dot_product_attention(
```
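For context, a minimal sketch of the call pattern that trips this check. This is not the internal model: the module, shapes, and the `torch.compile` entry point are assumptions for illustration (the trace above went through Dynamo/AOTAutograd tracing), and whether this exact snippet reproduces depends on the build.

```python
import torch
import torch.nn.functional as F

# Hypothetical repro sketch, mirroring the SDPA call in the
# "Original traceback" above.
class SDPA(torch.nn.Module):
    def forward(self, q, k, v, attn_mask):
        return F.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)

# float16 CPU tensors, as in the failing case; shapes are illustrative.
q = torch.randn(1, 4, 8, 16, dtype=torch.float16)
k = torch.randn(1, 4, 8, 16, dtype=torch.float16)
v = torch.randn(1, 4, 8, 16, dtype=torch.float16)
attn_mask = torch.zeros(1, 4, 8, 8, dtype=torch.float16)

# During tracing, the CPU flash-attention decomposition
# (scaled_dot_product_flash_attention_for_cpu) runs a torch._check on
# the query dtype; without Half support it raises:
#   RuntimeError: query must be FP32, FP64, BF16 but got torch.float16
out = torch.compile(SDPA(), fullgraph=True)(q, k, v, attn_mask)
```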
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119204
Approved by: https://github.com/kit1980