[clang][diagnostics] Stable IDs for Clang diagnostics (#168153)
Part of the implementation of [[RFC] Emitting Auditable SARIF Logs from
Clang](https://discourse.llvm.org/t/rfc-emitting-auditable-sarif-logs-from-clang/88624).
SARIF diagnostics require that each rule have a stable `id` property to
identify that rule across runs, even when the compiler or analysis tool
has changed. Previously, we set the `id` property to the numeric value of
that diagnostic's enum value within the Clang implementation; that number
changes whenever an unrelated diagnostic is inserted or removed earlier in
the list.
This change sets the `id` property to the _text_ of that same enum
value. This value would only change if someone renames the enum value
for that diagnostic, which should happen much less frequently than
renumbering.
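
As a rough illustration of the difference, here is a minimal sketch. The enums, their enumerator names, and the helpers `numericRuleId` and `textRuleId` are all hypothetical stand-ins, not Clang's generated diagnostic tables or any code in this PR. `v1` and `v2` model the same diagnostic list before and after an unrelated diagnostic is inserted ahead of `warn_unused_thing`.

```cpp
#include <string_view>

// Hypothetical, simplified stand-ins for a generated diagnostic enum.
// v2 is v1 with one unrelated diagnostic inserted earlier in the list.
namespace v1 {
enum DiagID : unsigned { warn_alpha, warn_unused_thing };
}
namespace v2 {
enum DiagID : unsigned { warn_alpha, warn_new_check, warn_unused_thing };
}

// Old scheme (illustrative): the rule id is the enum's numeric value.
template <typename E> constexpr unsigned numericRuleId(E ID) {
  return static_cast<unsigned>(ID);
}

// New scheme (illustrative): the rule id is the text of the enum value.
constexpr std::string_view textRuleId(v1::DiagID ID) {
  switch (ID) {
  case v1::warn_alpha:        return "warn_alpha";
  case v1::warn_unused_thing: return "warn_unused_thing";
  }
  return "";
}
constexpr std::string_view textRuleId(v2::DiagID ID) {
  switch (ID) {
  case v2::warn_alpha:        return "warn_alpha";
  case v2::warn_new_check:    return "warn_new_check";
  case v2::warn_unused_thing: return "warn_unused_thing";
  }
  return "";
}

// The numeric id shifts when an unrelated diagnostic is inserted...
static_assert(numericRuleId(v1::warn_unused_thing) !=
              numericRuleId(v2::warn_unused_thing));
// ...while the text id is unaffected.
static_assert(textRuleId(v1::warn_unused_thing) ==
              textRuleId(v2::warn_unused_thing));
```

The sketch assumes C++17 for `constexpr` comparison of `std::string_view`.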
For now, we will just assume that renaming happens infrequently enough
that existing consumers of SARIF will not notice. In the future, we
could take advantage of SARIF's support for `deprecatedIds`, which lets a
rule specify the IDs by which it was previously known. This would let us
rename, split, or combine diagnostics while still being able to
correlate the new diagnostic IDs with older SARIF logs and/or
suppressions.
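
To make the `deprecatedIds` idea concrete, here is a minimal sketch of how a renamed rule might be serialized and matched against an older ID. Nothing here is part of this PR: the `Rule` struct, `toReportingDescriptor`, `matchesRule`, and the ID strings in the usage note are hypothetical, and the serializer assumes IDs are plain identifiers that need no JSON escaping.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical rule record; not a Clang type.
struct Rule {
  std::string Id;                         // current stable ID
  std::vector<std::string> DeprecatedIds; // IDs the rule was previously known by
};

// Serializes the rule as a minimal SARIF reportingDescriptor object.
// Assumes the IDs contain no characters that require JSON escaping.
std::string toReportingDescriptor(const Rule &R) {
  std::string Out = "{\"id\":\"" + R.Id + "\"";
  if (!R.DeprecatedIds.empty()) {
    Out += ",\"deprecatedIds\":[";
    for (std::size_t I = 0; I < R.DeprecatedIds.size(); ++I) {
      if (I != 0)
        Out += ",";
      Out += "\"" + R.DeprecatedIds[I] + "\"";
    }
    Out += "]";
  }
  Out += "}";
  return Out;
}

// Returns true if an ID taken from an older log or suppression refers to
// this rule, either by its current ID or by one of its deprecated IDs.
bool matchesRule(const Rule &R, const std::string &LoggedId) {
  return R.Id == LoggedId ||
         std::find(R.DeprecatedIds.begin(), R.DeprecatedIds.end(), LoggedId) !=
             R.DeprecatedIds.end();
}
```

For a rule constructed as `Rule{"warn_new_name", {"warn_old_name"}}`, `matchesRule` accepts both `"warn_new_name"` and `"warn_old_name"`, which is the kind of correlation a SARIF consumer could perform after a rename.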
Nothing in this change affects how warnings are configured on the
command line or in `#pragma clang diagnostic`. Those still use warning
groups, not the stable IDs.
### Potential discussion topics
From @AaronBallman on the RFC:
> We believe some open questions remain (things like whether a unique ID
> is on the per-diagnostic level or on the diagnostic group level, whether
> the ID is explicitly spelled in the .td file or implicitly generated,
> whether we document the IDs, etc), but we think those questions are best
> decided in PR discussions with interested parties rather than an RFC.
As a starting point, this PR proposes the following answers to those
open questions:
- _whether a unique ID is on the per-diagnostic level or on the
diagnostic group level_ - Per-diagnostic level. For my justification,
see [this portion of the RFC
discussion](https://discourse.llvm.org/t/rfc-emitting-auditable-sarif-logs-from-clang/88624/11?u=dbartol.).
- _whether the ID is explicitly spelled in the .td file or implicitly
generated_ - Implicitly generated, but I'd be happy to have a way to
explicitly specify it. I just think that the in-code identifier is a
reasonable default, and manually reviewing the IDs of thousands of
existing diagnostics would add little benefit.
- _whether we document the IDs_ - For now, the IDs are only exposed to
the user (and other tools) in the SARIF file, so I don't think we need
to document these. We could certainly add this information to the output
of `diagtool` in the future if users find it relevant.