unstructured
01dbc7b4 - fix: `nltk` data download path to prevent redundant nested directories (#3546)

Commit
1 year ago
fix: `nltk` data download path to prevent redundant nested directories (#3546) Closes #3543. ### Summary This PR addresses an issue with the NLTK data download process. Previously, when downloading NLTK data, a nested "nltk_data" directory was created within the parent "nltk_data" directory if the parent directory already existed. This redundant directory structure led to two significant problems: - errors in checking if data had already been downloaded, potentially causing redundant downloads in subsequent calls. - failures in loading models from the downloaded NLTK data due to incorrect path resolution. This fix modifies the NLTK data download logic to prevent creation of unnecessary nested directories. If the download path ends with "nltk_data" and that directory already exists, we now use the existing directory instead of creating a new nested one. ### Testing CI should pass.
Parents
Loading