yujunjun/unstructured
mirror of https://github.com/Unstructured-IO/unstructured.git synced 2025-07-16 21:45:54 +00:00
unstructured/typings/nltk/internals.pyi

4 lines
76 B
Python

fix(CVE-2024-39705): remove nltk download (#3361)

### Summary

Addresses [CVE-2024-39705](https://nvd.nist.gov/vuln/detail/CVE-2024-39705), which highlights the risk of remote code execution when running `nltk.download`. Removes `nltk.download` in favor of a `.tgz` file containing the appropriate NLTK data files, and checks the SHA256 hash to validate the download. An error is now raised if `nltk.download` is invoked. The logic for determining the NLTK download directory is borrowed from `nltk`, so users can still set `NLTK_DATA` as they did previously.

### Testing

1. Create a directory called `~/tmp/nltk_test`. Set `NLTK_DATA=${HOME}/tmp/nltk_test`.
2. From a Python interactive session, run:

```python
from unstructured.nlp.tokenize import download_nltk_packages

download_nltk_packages()
```

3. Run `ls ~/tmp/nltk_test/nltk_data`. You should see the downloaded data.

---------

Co-authored-by: Steve Canny <stcanny@gmail.com>
2024-07-08 18:55:36 -04:00
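The hash check described in the commit summary can be sketched roughly as follows. This is a minimal illustration, not the code from the `unstructured` repository: the function names `sha256_of_file` and `verify_archive` are hypothetical, and the expected digest would have to come from a trusted source that the commit message does not show.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path: str, expected_sha256: str) -> None:
    """Raise ValueError if the file's digest does not match the expected one.

    Hypothetical helper: the real project may structure this check differently.
    """
    actual = sha256_of_file(path)
    if actual != expected_sha256.lower():
        raise ValueError(
            f"SHA256 mismatch for {path}: expected {expected_sha256}, got {actual}"
        )
```

A caller would run `verify_archive` on the downloaded `.tgz` before extracting it, so that a tampered or truncated archive is rejected rather than unpacked.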
from __future__ import annotations
def is_writable(path: str) -> bool: ...
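The stub above only declares the signature of `is_writable` for type checkers. As a rough illustration of how a function with this signature might behave, here is a simplified sketch; it is an assumption, not the `nltk.internals` implementation, and the name `is_writable_sketch` is hypothetical.

```python
import os


def is_writable_sketch(path: str) -> bool:
    """Return True if `path` exists and the current process may write to it.

    Simplified assumption: delegates the permission check to os.access.
    """
    if not os.path.exists(path):
        return False
    return os.access(path, os.W_OK)
```

A check like this is useful when choosing a data directory, for example to skip a candidate location such as the one named by `NLTK_DATA` if it cannot be written to.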