
Commit 20957bc

refactor(manifest/bazel): walker takes injected prune policy; reuse IGNORED_DIRS
`findWorkspaceRoots` no longer hardcodes the directory-prune set. Callers pass `ignoreDirNames: ReadonlySet<string>` and `ignoreDirPrefixes: readonly string[]` via the options object. Neither has a default; when they are absent, the walk prunes nothing. This keeps the walker decoupled from any particular ignore policy and avoids duplicating the codebase-wide `IGNORED_DIRS` list.

`src/utils/glob.mts` now exports `IGNORED_DIRS` so the orchestrator can compose it with Bazel-specific extras. The orchestrator's composed policy is `IGNORED_DIRS` plus `.hg`, `.idea`, `.pnpm-store`, `.socket-auto-manifest`, `.svn`, and `.vscode` as exact names, and `bazel-` and `dist` as prefixes.

Also tighten `MAX_WALK_DEPTH` from 16 to 8. The deepest workspace marker observed across the surveyed OSS corpus is at depth 9 (bazel-self test fixtures); the deepest in realistic application code is 7 (checkmk's thirdparty layout). A cap of 8 gives one level of headroom over the realistic maximum while still guarding against pathological symlink loops that slip past whatever prefix prunes the caller supplies.

The walker test is rewritten against the new injected API. It covers the no-prune-by-default case (`node_modules/MODULE.bazel` surfaces unless the caller ignores `node_modules`), injected name and prefix prunes, and the `bazel-*` symlink case under prefix injection.
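To make the contract concrete, here is a minimal sketch of a walker that honors it. It is not the code in this commit: `findWorkspaceRootsSketch`, `WalkSketchOptions`, and `readEntries` are hypothetical names, and two behaviors are assumptions: that children of a directory at `MAX_WALK_DEPTH` are not visited, and that symlinked directories are followed (the commit's symlink-loop rationale suggests the real walker can follow them, but that code is not shown). Only the constants and the "absent policy prunes nothing" rule come from the commit.

```typescript
import { readdirSync } from 'node:fs'
import path from 'node:path'

// Values taken from this commit; the rest of this block is an illustrative sketch.
const MAX_WALK_DEPTH = 8
const WORKSPACE_MARKER_FILES: ReadonlySet<string> = new Set([
  'MODULE.bazel',
  'WORKSPACE',
  'WORKSPACE.bazel',
])

// Hypothetical options shape modelled on FindWorkspaceRootsOptions.
type WalkSketchOptions = {
  cwd: string
  ignoreDirNames?: ReadonlySet<string>
  ignoreDirPrefixes?: readonly string[]
}

// Hypothetical helper: returns null for missing or unreadable directories
// (and for symlinks that point at files), so the walk skips them instead of throwing.
function readEntries(dir: string) {
  try {
    return readdirSync(dir, { withFileTypes: true })
  } catch {
    return null
  }
}

function findWorkspaceRootsSketch(opts: WalkSketchOptions): string[] {
  // Absent policy means "prune nothing", matching the commit's contract.
  const names = opts.ignoreDirNames ?? new Set<string>()
  const prefixes = opts.ignoreDirPrefixes ?? []
  const out: string[] = []
  // Tuple stack: [absolute dir, depth from cwd].
  const stack: Array<[string, number]> = [[path.resolve(opts.cwd), 0]]
  while (stack.length) {
    const [dir, depth] = stack.pop()!
    const entries = readEntries(dir)
    if (!entries) {
      continue
    }
    if (entries.some(e => !e.isDirectory() && WORKSPACE_MARKER_FILES.has(e.name))) {
      out.push(dir)
    }
    // Assumed depth semantics: a directory at the cap is inspected, but its
    // children are never enqueued.
    if (depth >= MAX_WALK_DEPTH) {
      continue
    }
    for (const entry of entries) {
      // Symlinks are enqueued too (readdirSync follows them), which is why
      // both the bazel-* prefix prune and the depth cap matter.
      if (!entry.isDirectory() && !entry.isSymbolicLink()) {
        continue
      }
      const name = entry.name
      if (names.has(name) || prefixes.some(p => name.startsWith(p))) {
        continue
      }
      stack.push([path.join(dir, name), depth + 1])
    }
  }
  // Default sort compares UTF-16 code units, giving deterministic output.
  return out.sort()
}
```

With no policy, a marker under `node_modules` or behind a `bazel-out` symlink is reported; passing the orchestrator-style names and prefixes removes them, and a marker nine directories below `cwd` is never reached.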
1 parent fa3d3de commit 20957bc

4 files changed

Lines changed: 111 additions & 57 deletions


src/commands/manifest/bazel/bazel-workspace-walk.mts

Lines changed: 37 additions & 30 deletions
@@ -6,11 +6,13 @@
  * `examples/<name>/MODULE.bazel`); the per-workspace algorithm in the
  * orchestrator runs once per discovered root.
  *
- * Pruning matches the now-deleted `bazel-lockfile-discovery.mts`: skip
- * directories that obviously aren't Bazel workspaces (`.git`, `node_modules`,
- * `.socket-auto-manifest`, etc.) and Bazel's `bazel-*` convenience symlinks
- * that point into <output_base> (tens of GiB of generated state). Also
- * prunes `dist*` build-output directories.
+ * The walker is dependency-injected with the directory-prune policy:
+ * callers pass the set of basenames and basename prefixes the walk must
+ * refuse to descend into. This module intentionally hardcodes none of
+ * the "common" prunes (`.git`, `node_modules`, …) — Bazel callers compose
+ * the codebase-wide `IGNORED_DIRS` list (`src/utils/glob.mts`) with the
+ * Bazel-specific bits (`bazel-*` output_base symlinks,
+ * `.socket-auto-manifest`, build-output `dist*`).
  */
 
 import { readdirSync } from 'node:fs'
@@ -21,38 +23,43 @@ import { logger } from '@socketsecurity/registry/lib/logger'
 // Hard ceiling on number of workspace roots we will surface. Real monorepos
 // have well under 50; this cap is a guard against pathological inputs.
 const MAX_WORKSPACE_ROOTS = 256
-// Hard ceiling on directory walk depth. Real workspaces nest <8 deep; the
-// cap protects against pathological symlink loops that slipped past the
-// `bazel-*` prefix prune.
-const MAX_WALK_DEPTH = 16
-// Directory basenames the walk refuses to descend into. None of these
-// contain Bazel workspaces, and node_modules / .git can be enormous.
-const PRUNE_DIR_NAMES = new Set([
-  '.git',
-  '.hg',
-  '.idea',
-  '.pnpm-store',
-  '.socket-auto-manifest',
-  '.svn',
-  '.vscode',
-  'node_modules',
-])
-// Directory basename prefixes the walk refuses to descend into. Bazel's
-// `bazel-out`, `bazel-bin`, `bazel-testlogs`, and `bazel-<workspace>`
-// convenience symlinks all point into the output_base. `dist`-prefixed
-// directories are build artefacts, not workspaces.
-const PRUNE_DIR_PREFIXES = ['bazel-', 'dist']
+// Hard ceiling on directory walk depth. Deepest workspace marker observed
+// across the OSS corpus surveyed is 9 (bazel-self test fixtures); deepest
+// in realistic application code is 7 (checkmk's thirdparty layout). Cap
+// is set to 8 — one level of headroom over the realistic max, while still
+// guarding against pathological symlink loops that slipped past any
+// prefix prune.
+const MAX_WALK_DEPTH = 8
 // Files whose presence promotes a directory to a workspace root.
 const WORKSPACE_MARKER_FILES = new Set([
   'MODULE.bazel',
   'WORKSPACE',
   'WORKSPACE.bazel',
 ])
 
-// Walks the tree rooted at `cwd` and returns absolute paths to every
+export type FindWorkspaceRootsOptions = {
+  cwd: string
+  // Directory basenames to skip outright (exact match). Pass the union of
+  // the codebase-wide ignore set (`IGNORED_DIRS` in `src/utils/glob.mts`)
+  // and any caller-specific additions (e.g. `.socket-auto-manifest`).
+  ignoreDirNames?: ReadonlySet<string>
+  // Directory basename prefixes to skip. Bazel callers pass `['bazel-',
+  // 'dist']` so the walk never descends into Bazel's output_base symlinks
+  // or build-output directories.
+  ignoreDirPrefixes?: readonly string[]
+  verbose?: boolean
+}
+
+const EMPTY_SET: ReadonlySet<string> = new Set()
+const EMPTY_ARRAY: readonly string[] = []
+
+// Walks the tree rooted at `opts.cwd` and returns absolute paths to every
 // directory that contains at least one workspace marker file. Output is
 // sorted for determinism.
-export function findWorkspaceRoots(cwd: string, verbose?: boolean): string[] {
+export function findWorkspaceRoots(opts: FindWorkspaceRootsOptions): string[] {
+  const { cwd, verbose } = opts
+  const ignoreDirNames = opts.ignoreDirNames ?? EMPTY_SET
+  const ignoreDirPrefixes = opts.ignoreDirPrefixes ?? EMPTY_ARRAY
   const out: string[] = []
   // Tuple stack: [absolute dir, depth from cwd].
   const stack: Array<[string, number]> = [[cwd, 0]]
@@ -98,11 +105,11 @@ export function findWorkspaceRoots(cwd: string, verbose?: boolean): string[] {
         continue
       }
       const name = entry.name
-      if (PRUNE_DIR_NAMES.has(name)) {
+      if (ignoreDirNames.has(name)) {
         continue
       }
       let pruned = false
-      for (const prefix of PRUNE_DIR_PREFIXES) {
+      for (const prefix of ignoreDirPrefixes) {
         if (name.startsWith(prefix)) {
           pruned = true
           break
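The per-entry decision in the last hunk reads as a single predicate: exact-name match first, then prefix match. A minimal sketch, assuming a standalone helper; `shouldPrune`, `EMPTY_NAMES`, and `EMPTY_PREFIXES` are hypothetical names, not something this commit adds. The shared empty defaults mirror the commit's `EMPTY_SET` / `EMPTY_ARRAY` sentinels.

```typescript
// Shared, never-mutated defaults: an absent policy prunes nothing.
const EMPTY_NAMES: ReadonlySet<string> = new Set()
const EMPTY_PREFIXES: readonly string[] = []

// Hypothetical helper: the hunk's name check plus its prefix loop, factored out.
function shouldPrune(
  name: string,
  ignoreDirNames: ReadonlySet<string> = EMPTY_NAMES,
  ignoreDirPrefixes: readonly string[] = EMPTY_PREFIXES,
): boolean {
  // Exact basename match, e.g. 'node_modules'.
  if (ignoreDirNames.has(name)) {
    return true
  }
  // Same result as the hunk's `for ... of` loop with `pruned = true; break`.
  return ignoreDirPrefixes.some(prefix => name.startsWith(prefix))
}
```

The split matters when choosing where an entry belongs: `.git` in the name set does not prune `.github`, whereas the `dist` prefix also prunes `distribution`, which the rewritten test asserts.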

src/commands/manifest/bazel/bazel-workspace-walk.test.mts

Lines changed: 50 additions & 25 deletions
@@ -17,6 +17,22 @@ function touch(file: string): void {
   writeFileSync(file, '')
 }
 
+// Standard prune set Bazel callers pass: the codebase-wide IGNORED_DIRS
+// (.git, node_modules, etc.) plus the walker's own output dir, plus
+// `bazel-*` output_base symlinks and `dist*` build outputs. Replicated
+// inline here so the test stays decoupled from `src/utils/glob.mts`.
+const BAZEL_IGNORE_NAMES: ReadonlySet<string> = new Set([
+  '.git',
+  '.hg',
+  '.idea',
+  '.pnpm-store',
+  '.socket-auto-manifest',
+  '.svn',
+  '.vscode',
+  'node_modules',
+])
+const BAZEL_IGNORE_PREFIXES: readonly string[] = ['bazel-', 'dist']
+
 describe('bazel-workspace-walk', () => {
   let tmp: string
 
describe('bazel-workspace-walk', () => {
2137
let tmp: string
2238

@@ -31,48 +47,54 @@ describe('bazel-workspace-walk', () => {
   describe('findWorkspaceRoots', () => {
     it('returns the root when only the root has MODULE.bazel', () => {
       touch(path.join(tmp, 'MODULE.bazel'))
-      expect(findWorkspaceRoots(tmp)).toEqual([tmp])
+      expect(findWorkspaceRoots({ cwd: tmp })).toEqual([tmp])
     })
 
     it('detects WORKSPACE and WORKSPACE.bazel as root markers', () => {
       touch(path.join(tmp, 'WORKSPACE'))
-      expect(findWorkspaceRoots(tmp)).toEqual([tmp])
+      expect(findWorkspaceRoots({ cwd: tmp })).toEqual([tmp])
       rmSync(path.join(tmp, 'WORKSPACE'))
       touch(path.join(tmp, 'WORKSPACE.bazel'))
-      expect(findWorkspaceRoots(tmp)).toEqual([tmp])
+      expect(findWorkspaceRoots({ cwd: tmp })).toEqual([tmp])
     })
 
     it('finds nested workspaces at arbitrary depth', () => {
       touch(path.join(tmp, 'MODULE.bazel'))
       touch(path.join(tmp, 'examples', 'dagger', 'MODULE.bazel'))
       touch(path.join(tmp, 'examples', 'android', 'nested', 'WORKSPACE.bazel'))
-      const found = findWorkspaceRoots(tmp).map(p => path.relative(tmp, p))
-      expect(found).toEqual([
-        '',
-        'examples/android/nested',
-        'examples/dagger',
-      ])
+      const found = findWorkspaceRoots({ cwd: tmp }).map(p =>
+        path.relative(tmp, p),
+      )
+      expect(found).toEqual(['', 'examples/android/nested', 'examples/dagger'])
     })
 
     it('returns [] when there is no workspace root', () => {
       writeFileSync(path.join(tmp, 'README.md'), '')
-      expect(findWorkspaceRoots(tmp)).toEqual([])
+      expect(findWorkspaceRoots({ cwd: tmp })).toEqual([])
+    })
+
+    it('does NOT prune by default — pruning policy is caller-supplied', () => {
+      touch(path.join(tmp, 'MODULE.bazel'))
+      touch(path.join(tmp, 'node_modules', 'MODULE.bazel'))
+      const found = findWorkspaceRoots({ cwd: tmp }).map(p =>
+        path.relative(tmp, p),
+      )
+      expect(found).toEqual(['', 'node_modules'])
     })
 
-    it('prunes .git / node_modules / .socket-auto-manifest', () => {
+    it('prunes injected ignoreDirNames', () => {
       touch(path.join(tmp, 'MODULE.bazel'))
-      // Sub-MODULE.bazel files inside pruned dirs must not be surfaced.
       for (const dir of ['node_modules', '.git', '.socket-auto-manifest']) {
         touch(path.join(tmp, dir, 'sub', 'MODULE.bazel'))
       }
-      const found = findWorkspaceRoots(tmp).map(p => path.relative(tmp, p))
+      const found = findWorkspaceRoots({
+        cwd: tmp,
+        ignoreDirNames: BAZEL_IGNORE_NAMES,
+      }).map(p => path.relative(tmp, p))
       expect(found).toEqual([''])
     })
 
-    it('prunes bazel-* convenience symlinks', () => {
-      // Simulate `bazel-out` pointing at a directory that contains a copy of
-      // MODULE.bazel. The walk must skip it; otherwise discovery would
-      // surface generated workspaces from <output_base>.
+    it('prunes injected ignoreDirPrefixes (bazel-* symlinks)', () => {
       const fakeOutputBase = mkdtempSync(
         path.join(os.tmpdir(), 'sock-fake-outbase-'),
       )
@@ -83,42 +105,45 @@ describe('bazel-workspace-walk', () => {
         touch(path.join(fakeOutputBase, 'external', 'maven', 'MODULE.bazel'))
         symlinkSync(fakeOutputBase, path.join(tmp, 'bazel-out'))
         touch(path.join(tmp, 'MODULE.bazel'))
-        const found = findWorkspaceRoots(tmp).map(p => path.relative(tmp, p))
+        const found = findWorkspaceRoots({
+          cwd: tmp,
+          ignoreDirPrefixes: BAZEL_IGNORE_PREFIXES,
+        }).map(p => path.relative(tmp, p))
         expect(found).toEqual([''])
       } finally {
         rmSync(fakeOutputBase, { recursive: true, force: true })
       }
     })
 
-    it('prunes dist* build-output directories', () => {
+    it('prunes injected dist* prefix', () => {
       touch(path.join(tmp, 'MODULE.bazel'))
       touch(path.join(tmp, 'dist', 'MODULE.bazel'))
       touch(path.join(tmp, 'distribution', 'MODULE.bazel'))
-      const found = findWorkspaceRoots(tmp).map(p => path.relative(tmp, p))
+      const found = findWorkspaceRoots({
+        cwd: tmp,
+        ignoreDirPrefixes: BAZEL_IGNORE_PREFIXES,
+      }).map(p => path.relative(tmp, p))
       expect(found).toEqual([''])
     })
 
     it('returns absolute, sorted paths', () => {
       touch(path.join(tmp, 'z', 'MODULE.bazel'))
       touch(path.join(tmp, 'a', 'MODULE.bazel'))
       touch(path.join(tmp, 'm', 'MODULE.bazel'))
-      const found = findWorkspaceRoots(tmp)
+      const found = findWorkspaceRoots({ cwd: tmp })
       expect(found).toEqual([
         path.join(tmp, 'a'),
         path.join(tmp, 'm'),
         path.join(tmp, 'z'),
       ])
-      // Absolute.
       for (const p of found) {
         expect(path.isAbsolute(p)).toBe(true)
       }
     })
 
     it('handles an unreadable directory by skipping it (no throw)', () => {
       touch(path.join(tmp, 'MODULE.bazel'))
-      // Reference a path that does not exist as cwd; the walker must not
-      // throw — it should return [] (no entries to read).
-      expect(findWorkspaceRoots(path.join(tmp, 'nope'))).toEqual([])
+      expect(findWorkspaceRoots({ cwd: path.join(tmp, 'nope') })).toEqual([])
     })
   })
 })

src/commands/manifest/bazel/extract_bazel_to_maven.mts

Lines changed: 23 additions & 1 deletion
@@ -30,6 +30,7 @@ import {
 } from './bazel-workspace-detect.mts'
 import { findWorkspaceRoots } from './bazel-workspace-walk.mts'
 import { getErrorCause } from '../../../utils/errors.mts'
+import { IGNORED_DIRS } from '../../../utils/glob.mts'
 
 import type {
   CqueryRepoResult,
@@ -101,6 +102,22 @@ type Sidecar = {
 const DEFAULT_PER_REPO_TIMEOUT_MS = 60_000
 const REAP_TIMEOUT_MS = 10_000
 
+// Composed prune policy passed to the workspace walker. Reuses the
+// codebase-wide `IGNORED_DIRS` and augments it with: the walker's own
+// output dir (`.socket-auto-manifest`), VCS/IDE dirs not in the shared
+// list (`.hg`, `.svn`, `.idea`, `.vscode`, `.pnpm-store`), Bazel's
+// `bazel-*` output_base symlinks, and `dist*` build-output dirs.
+const WORKSPACE_WALK_IGNORE_NAMES: ReadonlySet<string> = new Set([
+  ...IGNORED_DIRS,
+  '.hg',
+  '.idea',
+  '.pnpm-store',
+  '.socket-auto-manifest',
+  '.svn',
+  '.vscode',
+])
+const WORKSPACE_WALK_IGNORE_PREFIXES: readonly string[] = ['bazel-', 'dist']
+
 type CoordPair = { groupArtifact: string; version: string }
 
 // Splits "g:a:v" -> { groupArtifact: "g:a", version: "v" }.
@@ -396,7 +413,12 @@ export async function extractBazelToMaven(
   const allArtifacts: ExtractedArtifact[] = []
 
   try {
-    const workspaceRoots = findWorkspaceRoots(cwd, verbose)
+    const workspaceRoots = findWorkspaceRoots({
+      cwd,
+      ignoreDirNames: WORKSPACE_WALK_IGNORE_NAMES,
+      ignoreDirPrefixes: WORKSPACE_WALK_IGNORE_PREFIXES,
+      verbose,
+    })
     if (!workspaceRoots.length) {
       logger.warn(
         `No Bazel workspace found at ${cwd} or beneath (looked for MODULE.bazel / WORKSPACE / WORKSPACE.bazel).`,
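The orchestrator's composition is a plain set union: spreading the shared list and the extras into one `Set` merges them and collapses duplicates. The sketch below is hypothetical. `IGNORED_DIRS_STANDIN` substitutes for the real `IGNORED_DIRS`, of which this diff shows only `.git` (the test's comment says it also contains `node_modules`), and `composeIgnoreNames` is an illustrative helper, not part of the commit.

```typescript
// Hypothetical stand-in for IGNORED_DIRS from src/utils/glob.mts; the real
// list is longer, but per the test comment it includes at least these two.
const IGNORED_DIRS_STANDIN: readonly string[] = ['.git', 'node_modules']

// Bazel-specific extras named in this commit.
const BAZEL_EXTRA_NAMES: readonly string[] = [
  '.hg',
  '.idea',
  '.pnpm-store',
  '.socket-auto-manifest',
  '.svn',
  '.vscode',
]

// Hypothetical helper: union of shared and caller-specific names.
// Duplicate entries collapse because a Set stores each value once.
function composeIgnoreNames(
  shared: readonly string[],
  extras: readonly string[],
): ReadonlySet<string> {
  return new Set([...shared, ...extras])
}

const composedNames = composeIgnoreNames(IGNORED_DIRS_STANDIN, BAZEL_EXTRA_NAMES)
```

With this two-entry stand-in, `composedNames` holds the same eight names the test file replicates inline as `BAZEL_IGNORE_NAMES`; with the real, longer `IGNORED_DIRS` it is a superset of them.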

src/utils/glob.mts

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@ const DEFAULT_IGNORE_FOR_GIT_IGNORE = defaultIgnore.filter(
   p => !p.endsWith('.gitignore'),
 )
 
-const IGNORED_DIRS = [
+export const IGNORED_DIRS = [
   // Taken from ignore-by-default:
   // https://github.com/novemberborn/ignore-by-default/blob/v2.1.0/index.js
   '.git', // Git repository files, see <https://git-scm.com/>
