[mypyc] Use faster METH_FASTCALL wrapper functions on Python 3.7+ (#9894)
Implement faster argument parsing based on METH_FASTCALL on Python 3.7
and later.
Use `vgetargskeywordsfast`, extracted from CPython 3.9, with some modifications:
* Support required keyword-only arguments, `*args` and `**kwargs`
* Only support the 'O' type (to reduce code size and speed things up)
The modifications are very similar to what we have in the old-style
argument parsing logic.
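
To illustrate the calling convention, here is a rough pure-Python model of fastcall-style argument parsing. It is only a sketch and not the C code in this PR: `parse_fastcall`, `MISSING`, and every parameter name are hypothetical. It assumes the usual fastcall layout, where the first `nargs` entries of a flat array are positional values and the rest are the values for the names in `kwnames`.

```python
MISSING = object()  # hypothetical sentinel for "argument not provided"


def parse_fastcall(args, nargs, kwnames, pos_names, required_pos,
                   kwonly_names=(), required_kwonly=(),
                   has_star_args=False, has_star_kwargs=False):
    """Model of fastcall parsing: every value is an arbitrary object ('O')."""
    kwnames = kwnames or ()
    slots = {name: MISSING for name in (*pos_names, *kwonly_names)}

    # Positional values come first in the flat array.
    if nargs > len(pos_names) and not has_star_args:
        raise TypeError("too many positional arguments")
    for name, value in zip(pos_names, args[:nargs]):
        slots[name] = value
    extra_pos = tuple(args[len(pos_names):nargs])  # collected by *args

    # Keyword values follow, matched up with kwnames by index.
    extra_kw = {}  # collected by **kwargs
    for name, value in zip(kwnames, args[nargs:]):
        if name in slots:
            if slots[name] is not MISSING:
                raise TypeError(f"multiple values for argument {name!r}")
            slots[name] = value
        elif has_star_kwargs:
            extra_kw[name] = value
        else:
            raise TypeError(f"unexpected keyword argument {name!r}")

    # Required positional and required keyword-only arguments must be set.
    for name in (*pos_names[:required_pos], *required_kwonly):
        if slots[name] is MISSING:
            raise TypeError(f"missing required argument {name!r}")
    return slots, extra_pos, extra_kw
```

For example, a call like `f(1, y=2, z=3)` would be modeled as `parse_fastcall([1, 2, 3], 1, ("y", "z"), ["x"], 1, kwonly_names=["y", "z"], required_kwonly=["y"])`. No per-call dict has to be built for the keyword arguments unless `**kwargs` is used.
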
The legacy calling convention is still used for `__init__` and `__call__`. I'll add
`__call__` support in a separate PR. I haven't looked into supporting `__init__`
yet.
Here are some benchmark results (on Python 3.8):
* keyword_args_from_interpreted: 3.5x faster than before
* positional_args_from_interpreted: 1.4x faster than before
However, the above benchmarks still run slower compiled than interpreted. I'll
continue working on further improvements after this PR.
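
For anyone who wants to try a comparable measurement, here is a minimal timing sketch. It is not the benchmark code behind the numbers above: `target` and `time_calls` are hypothetical names, and `target` is an ordinary Python function standing in for a function compiled with mypyc and called from interpreted code.

```python
import timeit


def target(a, b, c=0):
    # Hypothetical stand-in; a real measurement would import a
    # mypyc-compiled function here instead.
    return a + b + c


def time_calls(number=100_000):
    """Return (positional_seconds, keyword_seconds) for `number` calls each."""
    positional = timeit.timeit(lambda: target(1, 2, 3), number=number)
    keyword = timeit.timeit(lambda: target(a=1, b=2, c=3), number=number)
    return positional, keyword
```

Running `time_calls()` before and after a change gives a rough idea of the relative cost of positional and keyword calls. Absolute timings depend on the machine and the Python version.
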
Fixes mypyc/mypyc#578.