llvm-project
3da28bfb - [clang][diagnostics] Stable IDs for Clang diagnostics (#168153)

Commit
1 day ago
[clang][diagnostics] Stable IDs for Clang diagnostics (#168153)

Part of the implementation of [[RFC] Emitting Auditable SARIF Logs from Clang](https://discourse.llvm.org/t/rfc-emitting-auditable-sarif-logs-from-clang/88624)

SARIF diagnostics require that each rule have a stable `id` property to identify that rule across runs, even when the compiler or analysis tool has changed. We were previously setting the `id` property to the numeric value of the enum value for that diagnostic within the Clang implementation; this value changes whenever an unrelated diagnostic is inserted or removed earlier in the list.

This change sets the `id` property to the _text_ of that same enum value. This value would only change if someone renames the enum value for that diagnostic, which should happen much less frequently than renumbering.

For now, we will just assume that renaming happens infrequently enough that existing consumers of SARIF will not notice. In the future, we could take advantage of SARIF's support for `deprecatedIds`, which let a rule specify the IDs by which it was previously known. This would let us rename, split, or combine diagnostics while still being able to correlate the new diagnostic IDs with older SARIF logs and/or suppressions.

Nothing in this change affects how warnings are configured on the command line or in `#pragma clang diagnostic`. Those still use warning groups, not the stable IDs.

### Potential discussion topics

From @AaronBallman on the RFC:

>We believe some open questions remain (things like whether a unique ID is on the per-diagnostic level or on the diagnostic group level, whether the ID is explicitly spelled in the .td file or implicitly generated, whether we document the IDs, etc), but we think those questions are best decided in PR discussions with interested parties rather than an RFC.

As a starting point, this PR proposes the following answers to those open questions:

- _whether a unique ID is on the per-diagnostic level or on the diagnostic group level_
  - per-diagnostic level. For my justification, see [this portion of the RFC discussion](https://discourse.llvm.org/t/rfc-emitting-auditable-sarif-logs-from-clang/88624/11?u=dbartol.).
- _whether the ID is explicitly spelled in the .td file or implicitly generated_
  - Implicitly generated, but I'd be happy to have a way to explicitly specify it. I just think that the in-code identifier is a reasonable default, and manually reviewing the IDs of thousands of existing diagnostics would add little benefit.
- _whether we document the IDs_
  - For now, the IDs are only exposed to the user (and other tools) in the SARIF file, so I don't think we need to document these. We could certainly add this information to the output of `diagtool` in the future if users find it relevant.
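
### Illustration: numeric value vs. enum name

The stability argument in the commit message can be sketched in a few lines of C++17. This is a minimal illustration, not Clang's actual code: the diagnostic names (`warn_unused_thing`, `err_bad_widget`, `warn_new_check`), the `v1`/`v2` namespaces, and the `stableId` helper are all hypothetical. The two namespaces model two compiler versions, where the newer one inserts an unrelated diagnostic at the front of the list.

```cpp
#include <string_view>

// Hypothetical diagnostic lists (illustrative names, not real Clang
// diagnostics). V2 inserts one new diagnostic ahead of the existing ones.
#define DIAGS_V1(X)                                                            \
  X(warn_unused_thing)                                                         \
  X(err_bad_widget)

#define DIAGS_V2(X)                                                            \
  X(warn_new_check) /* newly inserted, unrelated diagnostic */                 \
  X(warn_unused_thing)                                                         \
  X(err_bad_widget)

// Expand each list entry either as an enumerator or as its spelling.
#define AS_ENUM(Name) Name,
#define AS_NAME(Name) #Name,

namespace v1 {
enum DiagKind : unsigned { DIAGS_V1(AS_ENUM) NumDiags };
constexpr std::string_view Names[] = {DIAGS_V1(AS_NAME)};
// Text-based ID: the spelling of the enumerator itself.
constexpr std::string_view stableId(DiagKind K) { return Names[K]; }
} // namespace v1

namespace v2 {
enum DiagKind : unsigned { DIAGS_V2(AS_ENUM) NumDiags };
constexpr std::string_view Names[] = {DIAGS_V2(AS_NAME)};
constexpr std::string_view stableId(DiagKind K) { return Names[K]; }
} // namespace v2

#undef AS_ENUM
#undef AS_NAME

// The numeric value of the same diagnostic shifts between versions...
static_assert(static_cast<unsigned>(v1::err_bad_widget) == 1);
static_assert(static_cast<unsigned>(v2::err_bad_widget) == 2);

// ...while the text of the enumerator is unchanged.
static_assert(v1::stableId(v1::err_bad_widget) ==
              v2::stableId(v2::err_bad_widget));
```

A SARIF consumer keyed on the numeric value would treat `err_bad_widget` as a different rule after the upgrade, whereas one keyed on the name would still match older logs. A rename would break the name-based ID as well, which is the case the commit message suggests `deprecatedIds` could cover in the future.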