mypy
35e843cc - [mypyc] Add efficient librt.base64.b64decode (#20263)

Commit
204 days ago
[mypyc] Add efficient librt.base64.b64decode (#20263)

The performance can be 10x faster than stdlib if the input is valid base64, or if the input has extra non-base64 characters only at the end. Similar to the base64 encode implementation I added recently, this uses SIMD instructions when available.

The implementation first tries to decode the input optimistically, assuming valid base64. If this fails, we fall back to a slow path with a preprocessing step that removes extra characters, and then perform a strict base64 decode on the cleaned-up input.

The semantics aren't 100% compatible with stdlib. First, we raise ValueError on invalid padding instead of `binascii.Error`, since I don't want a runtime dependency on the unrelated `binascii` module. This needs to be documented, but stdlib can already raise ValueError on other conditions, so the deviation is not huge. Also, some invalid inputs are checked more strictly for padding violations. The stdlib implementation has some mysterious behaviors with invalid inputs that didn't seem worth replicating.

The function only accepts a single ASCII str or bytes argument for now, since that seems to be by far the most common use case. The stdlib function also accepts buffer objects and a `validate` argument.

The slow path is still somewhat faster than stdlib (on the order of 1.3x to 2x for longer inputs), at least if the input is much smaller than the L1 cache size.

Got the initial fast path implementation from ChatGPT, but did a bunch of manual edits afterwards and reviewed carefully.
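The two-step strategy described above (optimistic decode, then cleanup plus strict decode) can be modeled in plain Python. This is only a rough sketch built on the stdlib `base64` module, not the SIMD-based C implementation from the commit. The name `b64decode_sketch`, the exact error messages, and the handling of non-ASCII `str` input are assumptions made for illustration, and the sketch does not model the fast handling of trailing extra characters.

```python
import base64
import binascii

# Bytes of the standard base64 alphabet, plus the padding character.
_B64_CHARS = (
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789+/="
)
# Every byte value outside _B64_CHARS; used as the delete set in the slow path.
_NON_B64 = bytes(b for b in range(256) if b not in _B64_CHARS)


def b64decode_sketch(source):
    """Hypothetical pure-Python model of the fast-path/slow-path decode."""
    if isinstance(source, str):
        # Assumption: non-ASCII str is rejected. UnicodeEncodeError is a
        # subclass of ValueError, so callers still see a ValueError.
        data = source.encode("ascii")
    elif isinstance(source, bytes):
        data = source
    else:
        raise TypeError("expected an ASCII str or bytes argument")

    try:
        # Fast path: optimistically assume the input is valid base64.
        return base64.b64decode(data, validate=True)
    except binascii.Error:
        pass

    # Slow path: remove extra characters, then decode strictly.
    cleaned = data.translate(None, _NON_B64)
    try:
        return base64.b64decode(cleaned, validate=True)
    except binascii.Error as exc:
        # Raise a plain ValueError rather than binascii.Error, mirroring
        # the deviation from stdlib that the commit message describes.
        raise ValueError(str(exc)) from None
```

With this sketch, `b64decode_sketch("aGVsbG8=")` and `b64decode_sketch(b"aGVs\nbG8=\n")` both return `b"hello"`, while input with missing padding such as `"aGVsbG8"` raises a plain `ValueError`.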