Skip to content

Commit ff4b2ae

Browse files
committed
fix(core): always omit license details from full-scan diff request (#CE-224 follow-on)
The full-scan diff request (fullscans.stream_diff) now always sets include_license_details=false, decoupled from the --exclude-license-details flag. This prevents the CE-224 truncation crash (Unterminated string / JSON parse failure on large repos, reported by the tremendous org) from recurring even when the flag is not passed. Why this is safe (no output changes): the license fields the diff endpoint can embed are never consumed off the diff. With --generate-license off, the only consumer (the legal/FOSSA artifact builder) never runs. With --generate-license on, get_license_text_via_purl re-fetches license data from the dedicated PURL endpoint and overwrites whatever the diff embedded before anything reads it. Either way the embedded payload was dead weight that only bloated the response. --exclude-license-details still works but its scope is now narrower: it controls only the dashboard report URL, not the internal diff payload. Help text updated. Core.get_added_and_removed_packages(..., include_license_details=True) remains as an explicit override seam (exercised in tests). Minor bump to 2.4.0: outputs are provably unchanged, but this is a deliberate default-behavior change (2.3.0 made the flag propagate; 2.4.0 makes the lean diff the default), which warrants a minor bump per the project's semver policy. Signed-off-by: lelia <2418071+lelia@users.noreply.github.com>
1 parent 152ea21 commit ff4b2ae

7 files changed

Lines changed: 100 additions & 9 deletions

File tree

CHANGELOG.md

Lines changed: 41 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,46 @@
11
# Changelog
22

3+
## 2.4.0
4+
5+
### Changed: license details are no longer requested on the full-scan diff
6+
7+
The internal full-scan diff request (`fullscans.stream_diff`, used to compare
8+
alerts between two scans) now always sets `include_license_details=false`,
9+
regardless of the `--exclude-license-details` flag.
10+
11+
**Why this is safe (no output changes):** the license fields the diff endpoint
12+
can embed were never actually consumed off the diff:
13+
14+
- With `--generate-license` **off**, the only consumer of a package's
15+
`licenseDetails`/`licenseAttrib` — the legal/FOSSA artifact builder — is never
16+
invoked, so the embedded license data was parsed and immediately discarded.
17+
- With `--generate-license` **on**, the CLI re-fetches license data from the
18+
dedicated PURL endpoint (`get_license_text_via_purl`) and **overwrites**
19+
whatever the diff embedded before anything reads it.
20+
21+
So in every code path the diff's license payload was dead weight. On large
22+
dependency trees it inflated the diff response past ~2.3 MB and truncated it
23+
mid-string, crashing `response.json()` with
24+
`Unterminated string starting at: ...` (CE-224, reported by the `tremendous`
25+
org). Dropping it keeps the diff lean with **zero change to any output
26+
artifact** (SBOM, legal/FOSSA attribution, report contents).
27+
28+
**Why a minor bump (2.4.0), not a patch:** this is a deliberate default-behavior
29+
change. 2.3.0 fixed the `--exclude-license-details` flag so it correctly
30+
propagated to the diff; this release goes further and makes the lean diff the
31+
default so the crash cannot recur even when the flag is not passed. Per the
32+
project's semver policy a default-behavior change warrants a minor bump, even
33+
though outputs are provably unchanged.
34+
35+
**Effect on `--exclude-license-details`:** the flag still works, but its scope is
36+
now narrower — it controls only the human-facing dashboard report URL
37+
(`?include_license_details=false`), not the internal diff payload. Its `--help`
38+
text was updated to reflect this.
39+
40+
Override seam: `Core.get_added_and_removed_packages(..., include_license_details=True)`
41+
can still request embedded license details explicitly (used in tests); nothing
42+
in the CLI wires the user flag to it anymore.
43+
344
## 2.3.1
445

546
### New: brotli-compressed `.socket.facts.json` upload

pyproject.toml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -6,7 +6,7 @@ build-backend = "hatchling.build"
66

77
[project]
88
name = "socketsecurity"
9-
version = "2.3.1"
9+
version = "2.4.0"
1010
requires-python = ">= 3.11"
1111
license = {"file" = "LICENSE"}
1212
dependencies = [

socketsecurity/__init__.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,3 @@
11
__author__ = 'socket.dev'
2-
__version__ = '2.3.1'
2+
__version__ = '2.4.0'
33
USER_AGENT = f'SocketPythonCLI/{__version__}'

socketsecurity/config.py

Lines changed: 6 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -705,7 +705,12 @@ def create_argument_parser() -> argparse.ArgumentParser:
705705
"--exclude-license-details",
706706
dest="exclude_license_details",
707707
action="store_true",
708-
help="Exclude license details from the diff report (boosts performance for large repos)"
708+
help=(
709+
"Exclude license details from the dashboard report URL. "
710+
"As of 2.4.0 the internal diff request always omits license details "
711+
"(they were unused there and bloated large-repo responses), so this "
712+
"flag now only affects the report link, not diff performance."
713+
)
709714
)
710715
output_group.add_argument(
711716
"--max-purl-batch-size",

socketsecurity/core/__init__.py

Lines changed: 32 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1070,14 +1070,35 @@ def get_added_and_removed_packages(
10701070
self,
10711071
head_full_scan_id: str,
10721072
new_full_scan_id: str,
1073-
include_license_details: bool = True
1073+
include_license_details: bool = False
10741074
) -> Tuple[Dict[str, Package], Dict[str, Package], Dict[str, Package]]:
10751075
"""
10761076
Get packages that were added and removed between scans.
10771077
10781078
Args:
10791079
head_full_scan_id: Previous scan (maybe None if first scan)
10801080
new_full_scan_id: New scan just created
1081+
include_license_details: Whether to ask the diff endpoint to embed
1082+
per-package license attribution/details in the response.
1083+
1084+
Defaults to ``False`` on purpose. The diff endpoint exists to
1085+
compare alerts between two scans; the license fields it can embed
1086+
are never consumed off the diff:
1087+
* When ``--generate-license`` is OFF, the only consumer of
1088+
``Package.licenseDetails``/``licenseAttrib`` (the legal/FOSSA
1089+
artifact builder) is never invoked, so the embedded license
1090+
data is parsed and then dropped on the floor.
1091+
* When ``--generate-license`` is ON, ``get_license_text_via_purl``
1092+
re-fetches license data from the dedicated PURL endpoint and
1093+
OVERWRITES whatever the diff embedded, before anything reads it.
1094+
Either way the embedded license payload is dead weight, and on
1095+
large dependency trees it inflated the diff response past ~2.3MB
1096+
and truncated it mid-string, crashing ``response.json()``
1097+
(CE-224, customer: Tremendous). Defaulting to ``False`` keeps the
1098+
diff lean with zero change to any output artifact. The parameter
1099+
is retained as an explicit override seam, not wired to the
1100+
``--exclude-license-details`` user flag (which still governs the
1101+
human-facing dashboard report URL).
10811102
10821103
Returns:
10831104
Tuple of (added_packages, removed_packages) dictionaries
@@ -1299,15 +1320,23 @@ def create_new_diff(
12991320
except OSError as e:
13001321
log.warning(f"Failed to clean up temporary file {temp_file}: {e}")
13011322

1302-
# Handle diff generation - now we always have both scans
1323+
# Handle diff generation - now we always have both scans.
1324+
#
1325+
# Note: we intentionally do NOT forward params.include_license_details
1326+
# (the --exclude-license-details user flag) into the diff request. The
1327+
# diff path never consumes embedded license data (see
1328+
# get_added_and_removed_packages docstring), so requesting it only bloats
1329+
# the response and risks the CE-224 truncation crash on large repos. The
1330+
# user flag still controls the dashboard report URL below; it just no
1331+
# longer gates this internal diff payload.
13031332
(
13041333
added_packages,
13051334
removed_packages,
13061335
packages
13071336
) = self.get_added_and_removed_packages(
13081337
head_full_scan_id,
13091338
new_full_scan.id,
1310-
include_license_details=getattr(params, "include_license_details", True)
1339+
include_license_details=False
13111340
)
13121341

13131342
# Separate unchanged packages from added/removed for --strict-blocking support

tests/core/test_sdk_methods.py

Lines changed: 18 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -95,13 +95,17 @@ def test_get_added_and_removed_packages(core):
9595
# Get two different scans to compare
9696
added, removed, all_packages = core.get_added_and_removed_packages("head", "new")
9797

98-
# Verify SDK was called correctly
98+
# Verify SDK was called correctly.
99+
# include_license_details defaults to "false": the diff path never consumes
100+
# embedded license data (license artifacts come from the PURL endpoint), so
101+
# requesting it only bloats the response and risks the CE-224 truncation
102+
# crash on large repos.
99103
core.sdk.fullscans.stream_diff.assert_called_once_with(
100104
core.config.org_slug,
101105
"head",
102106
"new",
103107
use_types=True,
104-
include_license_details="true",
108+
include_license_details="false",
105109
)
106110

107111
# Verify the results
@@ -116,6 +120,18 @@ def test_get_added_and_removed_packages(core):
116120
assert "dp2_t1" in removed # Verify transitive dependencies are also tracked
117121
assert "pypi/direct_package_1@1.6.0" in all_packages # Unchanged package is in full package map
118122

123+
def test_get_added_and_removed_packages_license_override(core):
124+
"""The include_license_details override seam still works when explicitly requested."""
125+
core.get_added_and_removed_packages("head", "new", include_license_details=True)
126+
127+
core.sdk.fullscans.stream_diff.assert_called_once_with(
128+
core.config.org_slug,
129+
"head",
130+
"new",
131+
use_types=True,
132+
include_license_details="true",
133+
)
134+
119135
def test_empty_alerts_preserved(core):
120136
"""Test that empty alerts arrays stay as empty arrays and don't become None"""
121137
# Get the scan that contains dp2 (which has empty alerts array)

uv.lock

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

0 commit comments

Comments
 (0)