Commit d944ae3

feat(core): brotli-compress .socket.facts.json on full-scan upload
Compress the reachability facts file to a `.socket.facts.json.br` multipart part before uploading it as part of a full scan. The Socket API transparently decompresses parts named exactly `.socket.facts.json.br` and stores plain JSON, so the stored result is unchanged while the on-the-wire payload shrinks by roughly 10-40x for typical facts files.

This keeps large tier-1 reachability facts files under the API's per-file upload size cap. Previously an oversized facts file made the full-scan upload fail (surfaced as an HTTP 4xx/502 with the scan stuck and no report produced).

- Compress at the upload boundary (Core.create_full_scan); the on-disk file is left untouched so local consumers still read plain .socket.facts.json.
- Only files whose basename is exactly .socket.facts.json are compressed (the API matches that exact name); a custom --reach-output-file name and empty placeholder files are left as plain uploads.
- Never blocks an upload: any compression failure falls back to the plain file.
- Add brotli (CPython) / brotlicffi (PyPy) dependency.
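The eligibility rule in the commit message (exact basename, regular file, non-empty) can be illustrated with a small standalone sketch. The names `should_compress` and `demo` are hypothetical and exist only for this illustration; the commit itself performs the check inline in `Core._compress_facts_files_for_upload`.

```python
import os
import tempfile

# Exact basename the commit message says the API matches (after adding ".br").
FACTS_NAME = ".socket.facts.json"


def should_compress(path: str) -> bool:
    """Hypothetical helper: True only for a non-empty file named exactly FACTS_NAME."""
    return (
        os.path.basename(path) == FACTS_NAME
        and os.path.isfile(path)
        and os.path.getsize(path) > 0
    )


def demo() -> dict:
    """Create three sample files in a temp dir and report which ones qualify."""
    with tempfile.TemporaryDirectory() as tmp:
        facts = os.path.join(tmp, FACTS_NAME)
        custom = os.path.join(tmp, "custom.facts.json")
        baseline_dir = os.path.join(tmp, "baseline")
        os.mkdir(baseline_dir)
        empty = os.path.join(baseline_dir, FACTS_NAME)
        for path, data in ((facts, b'{"a": 1}'), (custom, b'{"a": 1}'), (empty, b"")):
            with open(path, "wb") as f:
                f.write(data)
        return {
            "facts": should_compress(facts),
            "custom": should_compress(custom),
            "empty": should_compress(empty),
        }


if __name__ == "__main__":
    print(demo())
```

Only the non-empty file with the exact default name qualifies; the custom-named file and the empty placeholder would be uploaded as plain files.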
1 parent cdd3bf6 commit d944ae3

6 files changed

Lines changed: 365 additions & 5 deletions

CHANGELOG.md

Lines changed: 26 additions & 0 deletions
@@ -1,5 +1,31 @@
 # Changelog
 
+## 2.3.1
+
+### New: brotli-compressed `.socket.facts.json` upload
+
+The reachability facts file (`.socket.facts.json`) is now brotli-compressed before it is
+uploaded as part of a full scan. The Socket API transparently decompresses any multipart
+part named exactly `.socket.facts.json.br` and stores it as plain `.socket.facts.json`, so
+the stored result is unchanged — but the on-the-wire payload shrinks dramatically (a
+~262 MB facts file compresses to roughly 15–30 MB).
+
+This fixes large tier‑1 reachability scans that previously failed when the uncompressed
+facts file exceeded the API's per‑file upload size cap (surfaced to the CLI as an HTTP
+4xx/“502”, leaving the scan stuck with no report).
+
+Details:
+
+- Compression happens at the upload boundary (`Core.create_full_scan`); the file on disk is
+  left untouched, so local consumers (SARIF/JSON output, tier‑1 finalize, alert selection)
+  continue to read the plain `.socket.facts.json`.
+- Only a file whose basename is exactly `.socket.facts.json` is compressed (the API matches
+  that exact name). A custom `--reach-output-file` name is uploaded uncompressed, as before.
+- Empty baseline-scan placeholder files are not compressed.
+- Compression never blocks an upload: if it fails for any reason it falls back to uploading
+  the plain file.
+- Adds a `brotli` (CPython) / `brotlicffi` (PyPy) dependency.
+
 ## 2.3.0
 
 ### New: `--exit-code-on-api-error`

pyproject.toml

Lines changed: 3 additions & 1 deletion
@@ -6,7 +6,7 @@ build-backend = "hatchling.build"
 
 [project]
 name = "socketsecurity"
-version = "2.3.0"
+version = "2.3.1"
 requires-python = ">= 3.11"
 license = {"file" = "LICENSE"}
 dependencies = [
@@ -19,6 +19,8 @@ dependencies = [
     "socketdev>=3.0.33,<4.0.0",
     "bs4>=0.0.2",
     "markdown>=3.10",
+    "brotli>=1.0.9; platform_python_implementation == 'CPython'",
+    "brotlicffi>=1.0.9; platform_python_implementation != 'CPython'",
 ]
 readme = "README.md"
 description = "Socket Security CLI for CI/CD"
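The two dependency lines added above rely on the `platform_python_implementation` environment marker, which corresponds to the value returned by `platform.python_implementation()`. As a rough sketch of how an installer would choose between them (the function name `brotli_distribution` is hypothetical and not part of this commit):

```python
import platform
from typing import Optional


def brotli_distribution(implementation: Optional[str] = None) -> str:
    """Return the distribution name the two markers would select.

    `implementation` defaults to the running interpreter's value; passing it
    explicitly makes it easy to see the result for another runtime.
    """
    impl = implementation or platform.python_implementation()
    # Marker 1: platform_python_implementation == 'CPython'  -> brotli
    # Marker 2: platform_python_implementation != 'CPython'  -> brotlicffi
    return "brotli" if impl == "CPython" else "brotlicffi"


if __name__ == "__main__":
    print(platform.python_implementation(), "->", brotli_distribution())
```

Because the markers are exact complements, exactly one of the two packages is installed on any given interpreter, which is why the code imports `brotli` first and falls back to `brotlicffi`.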

socketsecurity/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -1,3 +1,3 @@
 __author__ = 'socket.dev'
-__version__ = '2.3.0'
+__version__ = '2.3.1'
 USER_AGENT = f'SocketPythonCLI/{__version__}'

socketsecurity/core/__init__.py

Lines changed: 116 additions & 1 deletion
@@ -51,6 +51,24 @@
 
 _HUMANIZE_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")
 
+# Reachability facts-file upload compression.
+#
+# The Socket full-scan endpoint transparently brotli-decompresses any multipart part
+# whose basename is exactly ``.socket.facts.json.br`` and stores it as plain
+# ``.socket.facts.json``. Compressing the facts file on upload keeps it well under the
+# server's per-file size cap (a ~262 MB facts file compresses to roughly 15-30 MB),
+# which is required for large reachability (tier 1) scans to succeed.
+#
+# The server matches the *exact* name ``.socket.facts.json.br``, so we only compress
+# files whose basename is exactly ``.socket.facts.json`` (a custom ``--reach-output-file``
+# name would not be decompressed server-side, so it is left as a plain upload).
+SOCKET_FACTS_FILENAME = ".socket.facts.json"
+SOCKET_FACTS_BROTLI_FILENAME = ".socket.facts.json.br"
+# Brotli quality (0-11); 5 is a good speed/ratio tradeoff for large JSON payloads.
+SOCKET_FACTS_BROTLI_QUALITY = 5
+# Largest brotli window (2**24 bytes); improves the ratio on large facts files.
+SOCKET_FACTS_BROTLI_LGWIN = 24
+
 
 def _humanize_alert_type(alert_type: str) -> str:
     """Convert a camelCase/PascalCase alert type into a Title-Cased label.
@@ -544,6 +562,91 @@ def finalize_tier1_scan(self, full_scan_id: str, facts_file_path: str) -> bool:
             log.debug(f"Unable to finalize tier 1 scan: {e}")
             return False
 
+    @staticmethod
+    def _compress_facts_file(source_path: str) -> str:
+        """Brotli-compress a ``.socket.facts.json`` file to a sibling ``.socket.facts.json.br``.
+
+        The source is streamed in chunks so a large facts file (hundreds of MB) never has
+        to be held in memory at once. The compressed file is written next to the source so
+        that the multipart key the SDK derives keeps the same directory prefix, only with a
+        ``.br`` basename.
+
+        Args:
+            source_path: Path to the plain ``.socket.facts.json`` file.
+
+        Returns:
+            Path to the compressed sibling file.
+        """
+        # Imported lazily so the dependency is only needed when actually uploading a facts
+        # file. brotlicffi is the API-compatible fallback used on PyPy / non-CPython runtimes.
+        try:
+            import brotli
+        except ImportError:
+            import brotlicffi as brotli
+
+        target_path = os.path.join(os.path.dirname(source_path), SOCKET_FACTS_BROTLI_FILENAME)
+        compressor = brotli.Compressor(
+            quality=SOCKET_FACTS_BROTLI_QUALITY,
+            lgwin=SOCKET_FACTS_BROTLI_LGWIN,
+        )
+        chunk_size = 1024 * 1024  # 1 MiB
+        with open(source_path, "rb") as src, open(target_path, "wb") as dst:
+            while True:
+                chunk = src.read(chunk_size)
+                if not chunk:
+                    break
+                compressed = compressor.process(chunk)
+                if compressed:
+                    dst.write(compressed)
+            dst.write(compressor.finish())
+        return target_path
+
+    def _compress_facts_files_for_upload(self, files: List[str]) -> Tuple[List[str], List[str]]:
+        """Replace any ``.socket.facts.json`` upload entry with a brotli-compressed ``.br`` sibling.
+
+        The Socket full-scan endpoint transparently decompresses a multipart part named
+        exactly ``.socket.facts.json.br``, so compressing here keeps a large facts file under
+        the server's per-file size cap without changing the stored result. Files whose
+        basename is not exactly ``.socket.facts.json`` are left untouched (the server only
+        matches that exact name), as are empty placeholder files (e.g. baseline scans).
+
+        Compression never blocks an upload: if it fails for any reason (missing optional
+        ``brotli`` dependency, unwritable directory, etc.) the original plain file is used.
+
+        Args:
+            files: The list of file paths about to be uploaded.
+
+        Returns:
+            ``(upload_files, temp_paths)`` where ``upload_files`` is the possibly-rewritten
+            list to upload and ``temp_paths`` are compressed files the caller must delete
+            once the upload completes.
+        """
+        upload_files: List[str] = []
+        temp_paths: List[str] = []
+        for file_path in files:
+            try:
+                if (
+                    os.path.basename(file_path) == SOCKET_FACTS_FILENAME
+                    and os.path.isfile(file_path)
+                    and os.path.getsize(file_path) > 0
+                ):
+                    compressed_path = self._compress_facts_file(file_path)
+                    log.debug(
+                        f"Brotli-compressed {file_path} for upload: "
+                        f"{os.path.getsize(file_path)} -> {os.path.getsize(compressed_path)} bytes "
+                        f"(uploading as {SOCKET_FACTS_BROTLI_FILENAME})"
+                    )
+                    upload_files.append(compressed_path)
+                    temp_paths.append(compressed_path)
+                    continue
+            except Exception as e:
+                # Never let compression break an upload: fall back to the plain file.
+                log.warning(
+                    f"Failed to brotli-compress facts file {file_path}, uploading uncompressed: {e}"
+                )
+            upload_files.append(file_path)
+        return upload_files, temp_paths
+
     def create_full_scan(self, files: List[str], params: FullScanParams, base_paths: Optional[List[str]] = None) -> FullScan:
         """
         Creates a new full scan via the Socket API.
@@ -559,7 +662,19 @@ def create_full_scan(self, files: List[str], params: FullScanParams, base_paths:
         log.info("Creating new full scan")
         create_full_start = time.time()
 
-        res = self.sdk.fullscans.post(files, params, use_types=True, use_lazy_loading=True, max_open_files=50, base_paths=base_paths)
+        # Brotli-compress the reachability facts file (if present) so it is uploaded as a
+        # `.socket.facts.json.br` part. The API decompresses it server-side, keeping a large
+        # facts file under the per-file upload size cap. See _compress_facts_files_for_upload.
+        upload_files, compressed_temp_files = self._compress_facts_files_for_upload(files)
+        try:
+            res = self.sdk.fullscans.post(upload_files, params, use_types=True, use_lazy_loading=True, max_open_files=50, base_paths=base_paths)
+        finally:
+            for temp_file in compressed_temp_files:
+                try:
+                    os.unlink(temp_file)
+                    log.debug(f"Cleaned up temporary compressed facts file: {temp_file}")
+                except OSError as cleanup_error:
+                    log.debug(f"Failed to clean up temporary compressed facts file {temp_file}: {cleanup_error}")
         if not res.success:
             log.error(f"Error creating full scan: {res.message}, status: {res.status}")
             raise Exception(f"Error creating full scan: {res.message}, status: {res.status}")
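The `try`/`finally` around the upload call in `create_full_scan` is what guarantees the temporary `.br` file is removed whether the upload returns or raises. A reduced sketch of that pattern follows, with no Socket SDK involved: `send_then_cleanup`, `demo`, and the `send` callable are hypothetical stand-ins, and the temp file holds placeholder bytes rather than real brotli output.

```python
import os
import tempfile
from typing import Callable, List


def send_then_cleanup(
    upload_files: List[str],
    temp_files: List[str],
    send: Callable[[List[str]], str],
) -> str:
    """Call send(upload_files) and always delete temp_files afterwards."""
    try:
        return send(upload_files)
    finally:
        for temp_file in temp_files:
            try:
                os.unlink(temp_file)
            except OSError:
                # Cleanup is best-effort; a leftover file must not mask the real error.
                pass


def demo() -> dict:
    """Report whether the temp file still exists after a successful and a failing send."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for label, fails in (("ok", False), ("error", True)):
            temp = os.path.join(tmp, f"{label}.br")
            with open(temp, "wb") as f:
                f.write(b"stand-in bytes")  # placeholder, not real brotli data

            def send(paths: List[str], fails: bool = fails) -> str:
                if fails:
                    raise RuntimeError("simulated upload failure")
                return "sent"

            try:
                send_then_cleanup([temp], [temp], send)
            except RuntimeError:
                pass  # the original error still propagates to the caller
            results[label] = os.path.exists(temp)
    return results


if __name__ == "__main__":
    print(demo())
```

In both cases the temp file is gone afterwards, and in the failing case the upload error still reaches the caller, matching the behavior of the hunk above.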
Lines changed: 111 additions & 0 deletions
@@ -0,0 +1,111 @@
+"""Tests for brotli compression of the reachability facts file on upload.
+
+The Socket full-scan endpoint transparently decompresses a multipart part named exactly
+`.socket.facts.json.br`, so the CLI compresses the facts file before uploading it. These
+tests cover the helpers in `Core` that do that rewriting.
+"""
+import json
+import os
+
+try:
+    import brotli
+except ImportError:  # pragma: no cover - PyPy / non-CPython fallback
+    import brotlicffi as brotli
+
+from socketsecurity.core import (
+    SOCKET_FACTS_BROTLI_FILENAME,
+    SOCKET_FACTS_FILENAME,
+    Core,
+)
+
+
+def _write(path, data: bytes):
+    with open(path, "wb") as f:
+        f.write(data)
+    return path
+
+
+def test_compress_facts_file_roundtrips(tmp_path):
+    """The compressed sibling decompresses back to the exact original bytes."""
+    source = tmp_path / SOCKET_FACTS_FILENAME
+    payload = json.dumps({"components": [{"id": str(i)} for i in range(1000)]}).encode()
+    _write(str(source), payload)
+
+    compressed_path = Core._compress_facts_file(str(source))
+
+    # Compressed file is a sibling named exactly `.socket.facts.json.br`.
+    assert compressed_path == str(tmp_path / SOCKET_FACTS_BROTLI_FILENAME)
+    assert os.path.basename(compressed_path) == SOCKET_FACTS_BROTLI_FILENAME
+    # The original is untouched (other code paths still read it locally).
+    assert source.read_bytes() == payload
+    # Roundtrip matches.
+    with open(compressed_path, "rb") as f:
+        assert brotli.decompress(f.read()) == payload
+
+
+def test_compress_for_upload_rewrites_facts_entry(tmp_path):
+    """A `.socket.facts.json` entry is replaced by its `.br` sibling; others pass through."""
+    core = Core.__new__(Core)
+    facts = _write(str(tmp_path / SOCKET_FACTS_FILENAME), b'{"a": 1}')
+    manifest = _write(str(tmp_path / "package.json"), b"{}")
+
+    upload_files, temp_paths = core._compress_facts_files_for_upload([facts, manifest])
+
+    expected_br = str(tmp_path / SOCKET_FACTS_BROTLI_FILENAME)
+    assert upload_files == [expected_br, manifest]
+    assert temp_paths == [expected_br]
+    assert os.path.isfile(expected_br)
+    # Non-facts files are never compressed.
+    assert manifest in upload_files
+
+
+def test_compress_for_upload_preserves_directory_prefix(tmp_path):
+    """The `.br` sibling keeps the facts file's directory so the relative key is preserved."""
+    core = Core.__new__(Core)
+    subdir = tmp_path / "nested"
+    subdir.mkdir()
+    facts = _write(str(subdir / SOCKET_FACTS_FILENAME), b'{"a": 1}')
+
+    upload_files, temp_paths = core._compress_facts_files_for_upload([facts])
+
+    assert upload_files == [str(subdir / SOCKET_FACTS_BROTLI_FILENAME)]
+    assert temp_paths == [str(subdir / SOCKET_FACTS_BROTLI_FILENAME)]
+
+
+def test_empty_facts_file_is_not_compressed(tmp_path):
+    """Empty placeholder facts files (e.g. baseline scans) are uploaded as-is."""
+    core = Core.__new__(Core)
+    empty_facts = _write(str(tmp_path / SOCKET_FACTS_FILENAME), b"")
+
+    upload_files, temp_paths = core._compress_facts_files_for_upload([empty_facts])
+
+    assert upload_files == [empty_facts]
+    assert temp_paths == []
+    assert not (tmp_path / SOCKET_FACTS_BROTLI_FILENAME).exists()
+
+
+def test_custom_named_facts_file_is_not_compressed(tmp_path):
+    """A custom --reach-output-file name is not compressed (server only matches the exact name)."""
+    core = Core.__new__(Core)
+    custom = _write(str(tmp_path / "custom.facts.json"), b'{"a": 1}')
+
+    upload_files, temp_paths = core._compress_facts_files_for_upload([custom])
+
+    assert upload_files == [custom]
+    assert temp_paths == []
+
+
+def test_compression_failure_falls_back_to_plain_file(tmp_path, monkeypatch):
+    """If compression raises, the original plain file is uploaded instead of failing."""
+    core = Core.__new__(Core)
+    facts = _write(str(tmp_path / SOCKET_FACTS_FILENAME), b'{"a": 1}')
+
+    def boom(_source_path):
+        raise RuntimeError("brotli unavailable")
+
+    monkeypatch.setattr(Core, "_compress_facts_file", staticmethod(boom))
+
+    upload_files, temp_paths = core._compress_facts_files_for_upload([facts])
+
+    assert upload_files == [facts]
+    assert temp_paths == []
