[SYCL] Reduce sycl/khr/split_headers/accessor.hpp compile-time overhead (#22073)
This change cuts the compile-time cost of including
`sycl/khr/split_headers/accessor.hpp` by ~27% in a minimal TU (a single
`#include <sycl/accessor.hpp>`, built with
`-fsycl -fsycl-device-only -std=c++20`):
| Metric | Before | After | Delta |
|---------------------|---------|---------|--------------|
| Transitive headers | 402 | 322 | -80 (-20%) |
| Total source-time | 887.9ms | 643.4ms | -244.5ms |
Measured with clang -ftime-trace, attributing self-time per header.
* Introduce `sycl/detail/fwd/buffer.hpp` so accessor.hpp can drop
`#include <sycl/buffer.hpp>` in favor of a forward declaration. The
primary `class buffer` template default arguments live in the new fwd
header (with the corresponding defaults removed from buffer.hpp) to keep
them in one place. Companion fwd-decls in handler.hpp,
buffer_properties.hpp, and detail/backend_traits_opencl.hpp are retired
in favor of the new header.
* Drop dead/unused includes from accessor.hpp:
- `<sycl/properties/buffer_properties.hpp>`: accessor.hpp itself
references no symbol from this header, and the SYCL spec does not
mandate that buffer-construction properties be visible from the accessor
header. The include is moved into buffer.hpp (its natural home) and into
detail/core.hpp, so existing test code that only includes accessor.hpp and
uses `property::buffer::*` still works via the canonical aggregator path.
* Drop `<algorithm>` from sycl/detail/property_list_base.hpp. The
comment claimed the include was for `iter_swap`, but no algorithm call
exists in the file. Removing it shaves ~30ms by no longer pulling in the
`<bits/ranges_*>` chain.
* Move the body of `detail::cannot_be_called_on_host` from accessor.hpp
to source/detail/accessor_impl.cpp and drop the unused `<iostream>`
include from accessor.hpp. This adds one new exported symbol. The
function is only used inside the detail namespace and looks like debug
infrastructure.
* Remove the debug-only `friend std::ostream& operator<<` from
accessor_iterator (no in-tree consumers) and drop its `<ostream>`
include.
* Split the inline `operator<<(std::ostream&, backend)` from
sycl/backend_types.hpp into a new sycl/detail/backend_types_io.hpp so
backend_types.hpp no longer pulls `<ostream>`. The full operator remains
reachable through the canonical entry points (`<sycl/backend.hpp>` and
`<sycl/detail/core.hpp>`), so user code using either of those continues
to compile.
The four ESIMD type-traits headers (bfloat16/tfloat32/bf8/hf8)
previously relied on a transitive `<ostream>` include to define their own
`operator<<`. They are now self-contained.
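
For reviewers less familiar with the forward-declaration pattern used in the
first bullet, here is a minimal single-file sketch. Everything in it is a
hypothetical stand-in, not the real SYCL code: the `demo` namespace, the
simplified `buffer` and `accessor` templates, and the three "header" sections
are assumptions made for illustration. It only shows why the default template
arguments have to live on the forward declaration and be removed from the
definition.

```cpp
#include <memory>
#include <type_traits>

// --- stand-in for a forward-declaration header (hypothetical) ---
namespace demo {
// Default template arguments are stated exactly once, on the first
// declaration that every consumer sees.
template <typename T, int Dimensions = 1,
          typename AllocatorT = std::allocator<T>>
class buffer;
} // namespace demo

// --- stand-in for a consumer header that only needs the name ---
namespace demo {
// Pointers and references to an incomplete type are fine, so the full
// buffer definition is not required here.
template <typename T, int Dimensions = 1>
class accessor {
public:
  explicit accessor(buffer<T, Dimensions> &buf) : buf_(&buf) {}
  buffer<T, Dimensions> *get_buffer() const { return buf_; }

private:
  buffer<T, Dimensions> *buf_;
};
} // namespace demo

// --- stand-in for the header with the full definition ---
namespace demo {
// No defaults here: repeating them after the forward declaration would be
// a redefinition of default arguments and fail to compile.
template <typename T, int Dimensions, typename AllocatorT>
class buffer {
public:
  using value_type = T;
  using allocator_type = AllocatorT;
  static constexpr int dimensions = Dimensions;
};
} // namespace demo
```

With this layout, `demo::buffer<int>` names the same type whether a TU sees
only the forward declaration or the full definition, which is the property
the real fwd header is meant to preserve.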