turbo
f696b0f0 - refactor(turborepo): Move paths to UTF-8 (#5248)

Commit
2 years ago
### Description

We're moving all paths to UTF-8 for a whole bunch of reasons, such as:

- We know it'll be supported everywhere: across platforms, in the browser, and so on.
- We have no evidence that any user is using non-UTF-8 paths.
- It's very, very hard to manipulate paths without converting them to Rust Strings.
  - For instance, the [only way to add a trailing slash](https://users.rust-lang.org/t/trailing-in-paths/43166/8) to a path is by doing `path.push("")`.
  - `bstr` [implicitly converts](https://docs.rs/bstr/latest/bstr/#handling-of-invalid-utf-8) invalid Unicode into replacement characters, which is probably not what we want.
  - `bstr` also explicitly notes that the end result of its conversion functions (which, again, either error or implicitly convert) is [“you’re guaranteed to write correct code for Unix, at the cost of getting a corner case wrong on Windows”](https://docs.rs/bstr/latest/bstr/#file-paths-and-os-strings).
    - Considering we know that we have Windows users and are committed to supporting Windows, that should be a higher priority than supporting hypothetical users using non-UTF-8 encodings.
- To quote [camino](https://docs.rs/camino/latest/camino/):
  - “Unicode is the common subset of supported paths across Windows and Unix platforms.”
  - “The '[makefile problem](https://www.mercurial-scm.org/wiki/EncodingStrategy#The_.22makefile_problem.22)' (which also applies to `Cargo.toml`, and any other metadata file that lists the names of other files) has *no general, cross-platform solution* in systems that support non-UTF-8 paths. However, restricting paths to UTF-8 eliminates this problem.”
    - Basically, if we allow non-Unicode encodings, you could have “packages/星巴克” in your turbo.json that does not match “packages/星巴克” in your file system, because the file system is using Big5 and turbo.json is using Unicode.
  - “There are already many systems, such as Cargo, that only support UTF-8 paths. If your own tool interacts with any such system, you can assume that paths are valid UTF-8 without creating any additional burdens on consumers.”
- [npm does not allow even Unicode in package names](https://github.com/npm/validate-npm-package-name). Only URL-safe characters are allowed, i.e. letters, numbers and a few other ASCII characters.
- Next has [issues with Unicode paths too](https://github.com/vercel/next.js/issues/10084).
- How would you even import a non-Unicode JavaScript file? JavaScript strings are Unicode.
- `path-slash` also only works on `AsRef<str>` or requires a lossy conversion.
- Glob walking appears to assume UTF-8 as well.
- This simplifies our code significantly, since we can drop a lot of errors on invalid Unicode that are sprinkled throughout the codebase.

### Testing Instructions

<!-- Give a quick description of steps to test your changes. -->

---------

Co-authored-by: --global <Nicholas Yang>
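
To make the path-manipulation points in the description concrete, here is a minimal sketch using only the Rust standard library. It is not code from this commit: the helper names (`with_trailing_slash_std`, `with_trailing_slash_utf8`, `require_utf8`) are hypothetical, and `String::from_utf8_lossy` stands in for the kind of lossy conversion the description attributes to `bstr`.

```rust
use std::path::{Path, PathBuf, MAIN_SEPARATOR};

/// Appends a trailing separator using only std path APIs.
/// Pushing an empty component is the workaround mentioned in the description.
fn with_trailing_slash_std(path: &Path) -> PathBuf {
    let mut buf = path.to_path_buf();
    buf.push("");
    buf
}

/// The same operation once the path is known to be valid UTF-8:
/// ordinary string handling is enough.
fn with_trailing_slash_utf8(path: &str) -> String {
    if path.ends_with('/') {
        path.to_string()
    } else {
        format!("{path}/")
    }
}

/// Hypothetical boundary check: reject non-UTF-8 paths once, up front,
/// instead of handling invalid Unicode at every call site.
fn require_utf8(path: &Path) -> Result<&str, String> {
    path.to_str()
        .ok_or_else(|| format!("path is not valid UTF-8: {}", path.display()))
}

fn main() {
    let path = Path::new("packages/web");

    // std-only approach: works, but is not obvious to a reader.
    let slashed = with_trailing_slash_std(path);
    println!("{}", slashed.display());

    // UTF-8 approach: validate once, then treat the path as a string.
    let as_str = require_utf8(path).expect("path should be valid UTF-8");
    println!("{}", with_trailing_slash_utf8(as_str));

    // A lossy conversion silently replaces the invalid byte with U+FFFD,
    // so the resulting name no longer matches what is on disk.
    let lossy = String::from_utf8_lossy(b"packages/\xff");
    println!("{lossy}");
}
```

Note that `PathBuf` equality compares components, so a path with a trailing separator compares equal to one without it; the difference only shows up when the path is viewed as a string.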