[`ruff`] Add `non-empty-init-module` (`RUF067`) (#22143)
Summary
--
This PR adds a new rule, `non-empty-init-module`, which restricts the
kind of
code that can be included in an `__init__.py` file. By default,
docstrings,
imports, and assignments to `__all__` are allowed. When the new
configuration
option `lint.ruff.strictly-empty-init-modules` is enabled, no code at
all is
allowed.
This closes #9848, where these two variants correspond to different
rules in the
[`flake8-empty-init-modules`](https://github.com/samueljsb/flake8-empty-init-modules/)
linter. The upstream rules are EIM001, which bans all code, and EIM002,
which
bans non-import/docstring/`__all__` code. Since we discussed folding
these into
one rule on [Discord], I just added the rule to the `RUF` group instead
of
adding a new `EIM` plugin.
I'm not really sure we need to flag docstrings even when the strict
setting is
enabled, but I just followed upstream for now. Similarly, as I noted in
a TODO
comment, we could also allow more statements involving `__all__`, such
as
`__all__.append(...)` or `__all__.extend(...)`. The current version only
allows
assignments, like upstream, as well as annotated and augmented
assignments,
unlike upstream.
I think when we discussed this previously, we considered flagging the
module
itself as containing code, but for now I followed the upstream
implementation of
flagging each statement in the module that breaks the rule (actually the
upstream linter flags each _line_, including comments). This will
obviously be a
bit noisier, emitting many diagnostics for the same module. But this
also seems
preferable because it flags every statement that needs to be fixed up
front
instead of only emitting one diagnostic for the whole file that persists
as you
keep removing more lines. It was also easy to implement in
`analyze::statement`
without a separate visitor.
The first commit adds the rule and baseline tests, the second commit
adds the
option and a diff test showing the additional diagnostics when the
setting is
enabled.
I noticed a small (~2%) performance regression on our largest benchmark,
so I also added a cached `Checker::in_init_module` field and method
instead of the `Checker::path` method. This was almost the only reason
for the `Checker::path` field at all, but there's one remaining
reference in a `warn_user!`
[call](https://github.com/astral-sh/ruff/blob/main/crates/ruff_linter/src/checkers/ast/analyze/definitions.rs#L188).
Test Plan
--
New tests adapted from the upstream linter
## Ecosystem Report
I've spot-checked the ecosystem report, and the results look "correct."
This is obviously a very noisy rule if you do include code in
`__init__.py` files. We could make it less noisy by adding more
exceptions (e.g. allowing `if TYPE_CHECKING` blocks, allowing
`__getattr__` functions, allowing imports from `importlib` assignments),
but I'm sort of inclined just to start simple and see what users need.
[Discord]:
https://discord.com/channels/1039017663004942429/1082324250112823306/1440086001035771985
---------
Co-authored-by: Micha Reiser <micha@reiser.io>