Update dependency nltk to v3.9.4 [SECURITY] #76
Open
renovate[bot] wants to merge 1 commit into
Kudos, SonarCloud Quality Gate passed!
This PR contains the following updates:
==3.4.5 → ==3.9.4
Warning
Some dependencies could not be looked up. Check the Dependency Dashboard for more information.
Inefficient Regular Expression Complexity in nltk (word_tokenize, sent_tokenize)
CVE-2021-43854 / GHSA-f8m6-h2c7-8h9x
Details
Impact
The vulnerability is present in PunktSentenceTokenizer, sent_tokenize and word_tokenize. Any users of this class, or these two functions, are vulnerable to a Regular Expression Denial of Service (ReDoS) attack.
In short, a specially crafted long input to any of these vulnerable functions will cause them to take a significant amount of execution time. The effect of this vulnerability is noticeable with the following example:
Which gave the following output during testing:
I canceled the execution of the program after running it for several hours.
If your program relies on any of the vulnerable functions for tokenizing unpredictable user input, then we would strongly recommend upgrading to a version of NLTK without the vulnerability, or applying the workaround described below.
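The advisory's original test program and its output did not survive on this page. As a generic illustration of the failure mode only, the sketch below times a deliberately pathological pattern from the standard `re` module. The pattern `(a+)+$` and the helper `time_match` are assumptions chosen for illustration; this is not the expression used inside NLTK.

```python
import re
import time

# Hypothetical pattern with nested quantifiers. It is NOT the expression
# used by NLTK; it only shows how backtracking cost can explode.
VULNERABLE = re.compile(r"(a+)+$")


def time_match(length: int) -> float:
    """Return the seconds spent failing to match a crafted input."""
    text = "a" * length + "b"  # the trailing "b" forces the match to fail
    start = time.perf_counter()
    result = VULNERABLE.match(text)
    elapsed = time.perf_counter() - start
    assert result is None  # the crafted input never matches
    return elapsed


# Each extra pair of characters should roughly quadruple the running time.
for length in (14, 16, 18, 20):
    print(f"length {length}: {time_match(length):.4f}s")
```

Lengths are kept small on purpose: with this pattern the cost roughly doubles per added character, so much longer inputs can take minutes or hours.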
Patches
The problem has been patched in NLTK 3.6.6. After the fix, running the above program gives the following result:
This output shows a linear relationship in execution time versus input length, which is desirable for regular expressions.
We recommend updating to NLTK 3.6.6+ if possible.
Workarounds
The execution time of the vulnerable functions is exponential in the length of a malicious input. In other words, the execution time can be bounded by limiting the maximum length of an input to any of the vulnerable functions. Our recommendation is to implement such a limit.
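A minimal sketch of such a limit, assuming the tokenizer is passed in as a callable. The names `tokenize_bounded` and `MAX_INPUT_CHARS`, and the limit value itself, are hypothetical and should be tuned to the application.

```python
from typing import Callable, List

# Hypothetical cap; pick a value that fits the longest legitimate input.
MAX_INPUT_CHARS = 10_000


def tokenize_bounded(
    text: str,
    tokenizer: Callable[[str], List[str]],
    max_chars: int = MAX_INPUT_CHARS,
) -> List[str]:
    """Reject over-long input before it reaches the tokenizer."""
    if len(text) > max_chars:
        raise ValueError(
            f"input of {len(text)} characters exceeds the limit of {max_chars}"
        )
    return tokenizer(text)
```

With NLTK installed, the callable could be `nltk.tokenize.word_tokenize`; any function that maps a string to a list of tokens works, for example `tokenize_bounded("a b c", str.split)`.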
References
For more information
If you have any questions or comments about this advisory:
Severity
CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:N/VI:N/VA:H/SC:N/SI:N/SA:N
References
This data is provided by the GitHub Advisory Database (CC-BY 4.0).
NLTK Vulnerable to REDoS
CVE-2021-3842 / GHSA-rqjh-jp2r-59cj
Details
NLTK is vulnerable to ReDoS in some RegexpTaggers used in the functions get_pos_tagger and malt_regex_tagger.
Severity
CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:N/VI:N/VA:H/SC:N/SI:N/SA:N
References
This data is provided by the GitHub Advisory Database (CC-BY 4.0).
NLTK Vulnerable to REDoS
CVE-2021-3828 / GHSA-2ww3-fxvq-293j
Details
The nltk package is vulnerable to ReDoS (regular expression denial of service). An attacker who is able to provide input to the _read_comparison_block() function (https://github.com/nltk/nltk/blob/23f4b1c4b4006b0cb3ec278e801029557cec4e82/nltk/corpus/reader/comparative_sents.py#L259) in the file nltk/corpus/reader/comparative_sents.py may cause an application to consume an excessive amount of CPU.
Severity
CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:N/VI:N/VA:H/SC:N/SI:N/SA:N
References
This data is provided by the GitHub Advisory Database (CC-BY 4.0).
nltk unsafe deserialization vulnerability
CVE-2024-39705 / GHSA-cgvx-9447-vcch
Details
NLTK through 3.8.1 allows remote code execution if untrusted packages have pickled Python code, and the integrated data package download functionality is used. This affects, for example, averaged_perceptron_tagger and punkt.
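To illustrate why unpickling untrusted data is dangerous and how the set of loadable objects can be restricted, here is a sketch based on the "restricting globals" technique from the Python `pickle` documentation. `RestrictedUnpickler`, `restricted_loads`, and the `SAFE_BUILTINS` allow-list are illustrative assumptions; this is not presented as the way NLTK addressed the issue.

```python
import builtins
import io
import pickle

# Hypothetical allow-list of harmless builtins that a pickle may reference.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses to resolve any global outside the allow-list."""

    def find_class(self, module, name):
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        # Everything else (os.system, eval, arbitrary classes) is rejected.
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Drop-in analogue of pickle.loads() using the restricted unpickler."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

A pickle whose `__reduce__` points at a callable such as `eval` or `os.system` raises `pickle.UnpicklingError` here instead of running it. Plain data containers still load as usual.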
Severity
CVSS:4.0/AV:N/AC:H/AT:P/PR:N/UI:A/VC:H/VI:H/VA:H/SC:N/SI:N/SA:N
References
This data is provided by the GitHub Advisory Database (CC-BY 4.0).
NLTK has a Zip Slip Vulnerability
CVE-2025-14009 / GHSA-7p94-766c-hgjp
Details
A critical vulnerability exists in the NLTK downloader component of nltk/nltk, affecting all versions. The _unzip_iter function in nltk/downloader.py uses zipfile.extractall() without performing path validation or security checks. This allows attackers to craft malicious zip packages that, when downloaded and extracted by NLTK, can execute arbitrary code. The vulnerability arises because NLTK assumes all downloaded packages are trusted and extracts them without validation. If a malicious package contains Python files, such as __init__.py, these files are executed automatically upon import, leading to remote code execution. This issue can result in full system compromise, including file system access, network access, and potential persistence mechanisms.
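The kind of path validation the advisory says is missing can be sketched as follows. `safe_extract` is a hypothetical helper, not the NLTK fix, and it only covers the path-traversal aspect: it does not make the extracted files themselves trustworthy.

```python
import os
import zipfile


def safe_extract(zip_path: str, dest_dir: str) -> None:
    """Extract an archive only if every member stays inside dest_dir."""
    dest_root = os.path.realpath(dest_dir)
    with zipfile.ZipFile(zip_path) as archive:
        # Validate every member first so nothing is written on failure.
        for member in archive.infolist():
            target = os.path.realpath(os.path.join(dest_root, member.filename))
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"unsafe path in archive: {member.filename!r}")
        archive.extractall(dest_root)
```

Checking all members before extracting anything avoids leaving a partially unpacked archive behind when one entry is rejected.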
Severity
CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:H
References
This data is provided by the GitHub Advisory Database (CC-BY 4.0).
Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting') in nltk
CVE-2026-33230 / GHSA-gfwx-w7gr-fvh7
Details
Summary
nltk.app.wordnet_app contains a reflected cross-site scripting issue in the lookup_... route. A crafted lookup_<payload> URL can inject arbitrary HTML/JavaScript into the response page because attacker-controlled word data is reflected into HTML without escaping. This impacts users running the local WordNet Browser server and can lead to script execution in the browser origin of that application.
Details
The vulnerable flow is in nltk/app/wordnet_app.py:
- nltk/app/wordnet_app.py:144: lookup_ requests are handled as HTML responses: page, word = page_from_href(sp)
- nltk/app/wordnet_app.py:755: page_from_href() calls page_from_reference(Reference.decode(href))
- nltk/app/wordnet_app.py:769: word = href.word
- nltk/app/wordnet_app.py:796: word is inserted directly into the HTML body: body = "The word or words '%s' were not found in the dictionary." % word
This is inconsistent with the search route, which does escape user input:
- nltk/app/wordnet_app.py:136: word = html.escape(...)
As a result, a malicious lookup_... payload can inject script into the response page.
The issue is exploitable because:
- Reference.decode() accepts attacker-controlled base64-encoded pickle data for the URL state.
- word is reflected into HTML without html.escape().
- The server is created with HTTPServer(("", port), MyServerHandler), so it listens on all interfaces by default, not just localhost.
PoC
docker run -d --name nltk-wordnet-web -p 8002:8002 \
  nltk-sandbox \
  python -c "import nltk; nltk.download('wordnet', quiet=True); from nltk.app.wordnet_app import wnb; wnb(8002, False)"
Payload:
("<script>alert(1)</script>", {})
Encoded payload:
curl -s "http://127.0.0.1:8002/lookup_gAWVIQAAAAAAAACMGTxzY3JpcHQ-YWxlcnQoMSk8L3NjcmlwdD6UfZSGlC4="
I also validated the issue directly at function level in Docker:
Observed output:
Impact
This is a reflected XSS issue in the NLTK WordNet Browser web UI.
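The escaping step that the advisory notes is present in the search route but absent from the lookup_ route can be sketched as below. `not_found_body` is a hypothetical helper that reuses the message string quoted in the advisory; it is not the actual patch.

```python
import html


def not_found_body(word: str) -> str:
    """Build the 'not found' message with the user-supplied word escaped."""
    safe_word = html.escape(word)  # escapes <, >, &, and both quote characters
    return "The word or words '%s' were not found in the dictionary." % safe_word
```

With this in place, a payload such as `<script>alert(1)</script>` is rendered as inert text (`&lt;script&gt;alert(1)&lt;/script&gt;`) rather than being interpreted as markup.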
An attacker who can convince a user to open a crafted lookup_... URL can execute arbitrary JavaScript in the origin of the local WordNet Browser application. This can be used to:
This primarily impacts users who run nltk.app.wordnet_app as a local or self-hosted HTTP service and open attacker-controlled links.
Severity
CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:C/C:L/I:L/A:N
References
This data is provided by the GitHub Advisory Database (CC-BY 4.0).
Unauthenticated remote shutdown in nltk.app.wordnet_app
CVE-2026-33231 / GHSA-jm6w-m3j8-898g
Details
Summary
nltk.app.wordnet_app allows unauthenticated remote shutdown of the local WordNet Browser HTTP server when it is started in its default mode. A simple GET /SHUTDOWN%20THE%20SERVER request causes the process to terminate immediately via os._exit(0), resulting in a denial of service.
Details
The vulnerable logic is in nltk/app/wordnet_app.py:
- nltk/app/wordnet_app.py:242: server = HTTPServer(("", port), MyServerHandler)
- nltk/app/wordnet_app.py:87: if unquote_plus(sp) == "SHUTDOWN THE SERVER":
- nltk/app/wordnet_app.py:88: server_mode
- nltk/app/wordnet_app.py:93: in the default mode (runBrowser=True, therefore server_mode=False), the handler terminates the process directly: os._exit(0)
This means any party that can reach the listening port can stop the service with a single unauthenticated GET request when the browser is started in its normal mode.
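Because the server is created with an empty host string, it is reachable on every interface. One common way to narrow that exposure, sketched here with the standard `http.server` module, is to bind to the loopback address only. `LocalOnlyHandler` and `make_local_server` are hypothetical names, and this is not presented as the change shipped by NLTK.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class LocalOnlyHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: serves a fixed page and has no shutdown route."""

    def do_GET(self):
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_local_server(port: int = 0) -> HTTPServer:
    # Bind to the loopback address instead of "" (all interfaces).
    # Port 0 asks the operating system for any free port.
    return HTTPServer(("127.0.0.1", port), LocalOnlyHandler)
```

Binding to loopback only limits who can reach the port. A process on the same machine can still send requests, so a state-changing route such as a shutdown command would still need its own protection.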
PoC
docker run -d --name nltk-wordnet-web-default-retest -p 8004:8004 \
  nltk-sandbox \
  python -c "import nltk; nltk.download('wordnet', quiet=True); from nltk.app.wordnet_app import wnb; wnb(8004, True)"
Observed result:
Observed result:
Observed results:
Impact
This is an unauthenticated denial-of-service issue in the NLTK WordNet Browser HTTP server.
Any reachable client can terminate the service remotely when the application is started in its default mode. The impact is limited to service availability, but it is still security-relevant because:
This primarily affects users who run nltk.app.wordnet_app and expose or otherwise allow access to its listening port.
Severity
CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H
References
This data is provided by the GitHub Advisory Database (CC-BY 4.0).
NLTK has Arbitrary File Read via Absolute Path Input in nltk.util.filestring()
CVE-2026-0846 / GHSA-h8wq-7xc4-p3qx
Details
A vulnerability in the filestring() function of the nltk.util module in nltk version 3.9.2 allows arbitrary file read due to improper validation of input paths. The function directly opens files specified by user input without sanitization, enabling attackers to access sensitive system files by providing absolute paths or traversal paths. This vulnerability can be exploited locally or remotely, particularly in scenarios where the function is used in web APIs or other interfaces that accept user-supplied input.
Severity
CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:L/A:L
References
This data is provided by the GitHub Advisory Database (CC-BY 4.0).
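For callers that pass user-supplied paths to a file-reading helper such as the filestring() function described above, a containment check along these lines can reject absolute and traversal paths before anything is opened. `read_text_within` and its `base_dir` parameter are hypothetical; this is a sketch of the general technique, not the upstream fix.

```python
from pathlib import Path


def read_text_within(base_dir: str, user_path: str) -> str:
    """Read a file only if it resolves to a location inside base_dir."""
    base = Path(base_dir).resolve()
    # Joining with an absolute user_path discards base, so the check below
    # also catches inputs such as "/etc/passwd".
    candidate = (base / user_path).resolve()
    try:
        candidate.relative_to(base)
    except ValueError:
        raise PermissionError(f"path escapes {base}: {user_path!r}") from None
    return candidate.read_text(encoding="utf-8")
```

Resolving both paths before comparing them means `..` segments and symbolic links are taken into account rather than compared as raw strings.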
Release Notes
nltk/nltk (nltk)
v3.9.4
v3.9.3
v3.9.2
v3.9.1
v3.9
v3.8.1
v3.8
v3.7
v3.6.7
v3.6.6
v3.6.5
v3.6.4
v3.6.3
v3.6.2
v3.6.1
v3.6
v3.5
Configuration
📅 Schedule: (UTC)
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR was generated by Mend Renovate. View the repository job log.