feat(skills): add dependency-pruning skill #21
Audits a repository's dependencies across Python, JS/TS, Go, Rust, and other ecosystems to surface four categories of action:
- Remove: unused packages (confirmed via tool output + manual grep)
- Optimize: JS/TS packages with import styles that block tree-shaking
- Vendor/rewrite: packages where only ≤3 symbols are used and the package is small enough to inline (configurable thresholds)
- Migrate: deprecated, sunset, or abandoned packages with known migration targets

Includes blind-spot guidance for Django projects (deptry false positives, INSTALLED_APPS string loading), server runtime packages (check Dockerfile + git history for in-flight migrations before flagging for removal), and CLI-invoked developer tooling (ipdb, bpython, pdbpp, etc. that static analysis always marks unused).

Evaluated over 2 iterations against ocw-studio; skill achieves 93% assertion pass rate vs 60% for the no-skill baseline.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Code Review
This pull request introduces a new 'dependency-pruning' skill, including its documentation, evaluation prompts, and reference guides for unused dependency detection across ecosystems such as Python, JS/TS, Go, and Rust. The review comments below suggest fixes to the fallback scripts and shell commands: a portability issue with GNU grep, a performance bug when measuring single-file Python modules, a missing fallback for tomllib on Python versions older than 3.11, shell quoting issues in the Node.js script, and a regex parsing bug that treats Cargo.toml metadata keys as dependencies.
rg "from ${PKG}(\.\w+)? import (\w+)" --no-filename -o --include="*.py" \
  | grep -oP 'import \K\w+' | sort -u
Using grep -oP relies on GNU grep, which is not pre-installed on macOS by default and can cause portability issues. We can achieve the same result directly using ripgrep's replace feature (-r), which is cleaner and more portable.
Suggested change:
rg "from ${PKG}(?:\.\w+)? import (\w+)" -g "*.py" -o -r '$1' --no-filename | sort -u
# Python
python -c "
import importlib.util, pathlib
spec = importlib.util.find_spec('${PKG}')
if spec and spec.origin:
    root = pathlib.Path(spec.origin).parent
    lines = sum(len(f.read_text(errors='ignore').splitlines()) for f in root.rglob('*.py'))
    print(lines)
"
If the package is a single-file module (e.g., six.py), pathlib.Path(spec.origin).parent will resolve to the site-packages directory. Running root.rglob('*.py') on site-packages will scan every single installed package in the environment, causing massive performance issues and incorrect line counts. We should check if the origin is a directory package (__init__.py) or a single-file module.
Suggested change:
# Python
python -c "
import importlib.util, pathlib
spec = importlib.util.find_spec('${PKG}')
if spec and spec.origin:
    origin = pathlib.Path(spec.origin)
    if origin.name == '__init__.py':
        root = origin.parent
        lines = sum(len(f.read_text(errors='ignore').splitlines()) for f in root.rglob('*.py'))
    else:
        lines = len(origin.read_text(errors='ignore').splitlines())
    print(lines)
"
python -c "
import tomllib, pathlib, subprocess, sys

with open('pyproject.toml', 'rb') as f:
    data = tomllib.load(f)

deps = data.get('project', {}).get('dependencies', [])
# Strip version specifiers
pkgs = [d.split('[')[0].split('>=')[0].split('==')[0].split('<')[0].strip().lower().replace('-','_') for d in deps]

for pkg in pkgs:
    result = subprocess.run(['rg', '-l', pkg, '--include=*.py', '--glob=!tests/'], capture_output=True, text=True)
    if not result.stdout.strip():
        print(f'UNUSED: {pkg}')
    else:
        print(f'used: {pkg} ({len(result.stdout.strip().splitlines())} files)')
"
This script has two issues: first, tomllib is only available in Python 3.11+, so it will fail on older Python versions. We should add a fallback to pip._vendor.tomli or tomli. Second, splitting on [ and operators like >= is fragile and fails on complex PEP 508 specifiers (e.g., with environment markers or other operators). Using a simple regex to extract the package name is much more robust.
Suggested change:
python -c "
import pathlib, subprocess, sys, re
try:
    import tomllib
except ImportError:
    try:
        import pip._vendor.tomli as tomllib
    except ImportError:
        print('Error: tomllib or tomli required')
        sys.exit(1)
with open('pyproject.toml', 'rb') as f:
    data = tomllib.load(f)
deps = data.get('project', {}).get('dependencies', [])
pkgs = []
for d in deps:
    # Take the leading name token of each PEP 508 requirement string
    match = re.match(r'^([a-zA-Z0-9_.-]+)', d)
    if match:
        pkgs.append(match.group(1).lower().replace('-', '_'))
for pkg in pkgs:
    # ripgrep has no --include flag; --glob does the file filtering
    result = subprocess.run(['rg', '-l', pkg, '--glob=*.py', '--glob=!tests/'], capture_output=True, text=True)
    if not result.stdout.strip():
        print(f'UNUSED: {pkg}')
    else:
        print(f'used: {pkg} ({len(result.stdout.strip().splitlines())} files)')
"

const deps = Object.keys({...(pkg.dependencies||{}), ...(pkg.devDependencies||{})});
for (const dep of deps) {
  try {
    const out = execSync(\`rg -l '\"'\${dep}'\"\\|'\${dep}' ' src/ --include='*.{ts,tsx,js,jsx}'\`, {stdio:['pipe','pipe','pipe']}).toString();
The nested single-quotes inside the shell command can lead to syntax errors or unexpected behavior depending on the shell environment. We can simplify the pattern and use double quotes to make the command much cleaner and more robust.
Suggested change:
const out = execSync("rg -l \"['\\\"]" + dep + "['\\\"]\" src/ --include='*.{ts,tsx,js,jsx}'", {stdio:['pipe','pipe','pipe']}).toString();
python3 -c "
import re, subprocess, pathlib

cargo = pathlib.Path('Cargo.toml').read_text()
deps = re.findall(r'^(\w[\w-]*)\s*=', cargo, re.MULTILINE)

for dep in deps:
    crate_name = dep.replace('-', '_')
    result = subprocess.run(['rg', '-l', crate_name, 'src/'], capture_output=True, text=True)
    if result.stdout.strip():
        print(f'used: {dep}')
    else:
        result2 = subprocess.run(['rg', '-l', dep, 'src/'], capture_output=True, text=True)
        print(f'UNUSED: {dep}' if not result2.stdout.strip() else f'used: {dep}')
"
Using re.findall(r'^(\w[\w-]*)\s*=') on Cargo.toml will match metadata keys at the start of lines (such as name, version, edition, publish), treating them as dependencies and incorrectly flagging them as unused. We should parse Cargo.toml properly using tomllib (with a fallback to tomli) to extract actual dependencies from the relevant sections.
Suggested change:
python3 -c "
import pathlib, subprocess, sys
try:
    import tomllib
except ImportError:
    try:
        import pip._vendor.tomli as tomllib
    except ImportError:
        print('Error: tomllib or tomli required')
        sys.exit(1)
cargo_data = tomllib.loads(pathlib.Path('Cargo.toml').read_text())
deps = []
for section in ['dependencies', 'dev-dependencies', 'build-dependencies']:
    deps.extend(cargo_data.get(section, {}).keys())
for dep in sorted(set(deps)):
    crate_name = dep.replace('-', '_')
    result = subprocess.run(['rg', '-l', crate_name, 'src/'], capture_output=True, text=True)
    if result.stdout.strip():
        print(f'used: {dep}')
    else:
        result2 = subprocess.run(['rg', '-l', dep, 'src/'], capture_output=True, text=True)
        print(f'UNUSED: {dep}' if not result2.stdout.strip() else f'used: {dep}')
"
Pull request overview
Adds a new process skill, dependency-pruning, intended to help audit and reduce dependency footprint across multiple ecosystems (Python, JS/TS, Go, Rust, etc.) by producing an evidence-backed report and optionally applying safe changes.
Changes:
- Adds `skills/process/dependency-pruning/SKILL.md` defining a phased dependency-audit workflow (unused deps, vendoring candidates, tree-shaking/import-style issues, deprecation/sunset migrations).
- Adds supporting reference material and eval scenarios under `skills/process/dependency-pruning/references/` and `skills/process/dependency-pruning/evals/`.
- Registers the new skill in `skills/README.md` and `skills/process/README.md`.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| skills/README.md | Adds dependency-pruning to the top-level skills index. |
| skills/process/README.md | Adds dependency-pruning to the process skills index. |
| skills/process/dependency-pruning/SKILL.md | Introduces the new dependency-pruning skill instructions and report format. |
| skills/process/dependency-pruning/references/unused-detection.md | Adds per-ecosystem command reference for detecting unused dependencies. |
| skills/process/dependency-pruning/evals/evals.json | Adds evaluation prompts/expectations for the new skill. |
**Django / Python projects**: deptry's DEP002 false-positive rate can be very
high (sometimes 30+ flags for a single project) because PyPI package names
rarely match their Python module names:
- `djangorestframework` → `rest_framework`
- `beautifulsoup4` → `bs4`
- `pyyaml` → `yaml`
- `pygithub` → `github`
- `psycopg2-binary` → `psycopg2`

When you see many DEP002 warnings on a Django project, verify each one manually
rather than reporting them all as unused. After the audit, suggest adding a
`[tool.deptry.package_module_name_map]` section to `pyproject.toml` so future
runs are accurate.
## Remove — Unused Dependencies
| Package | Ecosystem | Evidence of non-use |
|---|---|---|
| ddt | Python | No `import ddt` or `from ddt` in any test file |

## Optimize Import Style (JS/TS)
| Package | Current import | Issue | Fix |
|---|---|---|---|
| lodash | `import _ from 'lodash'` | Prevents tree-shaking; full ~72KB ships | Switch to `lodash-es` or per-function imports |

## Vendor/Rewrite Candidates
| Package | Used symbols | Package LOC | Replacement sketch |
|---|---|---|---|
| waait | default (1) | 1 LOC | `const wait = (ms=0) => new Promise(r => setTimeout(r, ms))` |

## Migrate Away From
| Package | Status | Migration target |
|---|---|---|
| react-ga | GA3 sunset Jul 2023 | PostHog (already wired), or GA4 via gtag |

## Dev-only Misclassifications
| Package | Currently | Should be |
|---|---|---|
| ipython | dependencies | dev dependencies |
for pkg in pkgs:
    result = subprocess.run(['rg', '-l', pkg, '--include=*.py', '--glob=!tests/'], capture_output=True, text=True)
    if not result.stdout.strip():
        print(f'UNUSED: {pkg}')
    else:
        print(f'used: {pkg} ({len(result.stdout.strip().splitlines())} files)')
node -e "
const pkg = require('./package.json');
const { execSync } = require('child_process');
const deps = Object.keys({...(pkg.dependencies||{}), ...(pkg.devDependencies||{})});
for (const dep of deps) {
  try {
    const out = execSync(\`rg -l '\"'\${dep}'\"\\|'\${dep}' ' src/ --include='*.{ts,tsx,js,jsx}'\`, {stdio:['pipe','pipe','pipe']}).toString();
    console.log(out.trim() ? 'used: '+dep : 'UNUSED: '+dep);
  } catch { console.log('UNUSED: '+dep); }
}
"
Run in a temp directory to avoid mutating the real go.mod:

```bash
# Non-destructive: show what's unused
cp go.mod /tmp/go.mod.bak && cp go.sum /tmp/go.sum.bak
go mod tidy -v 2>&1 | grep "^removing"
# Restore
cp /tmp/go.mod.bak go.mod && cp /tmp/go.sum.bak go.sum
```
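The prose in this snippet mentions a temp directory, while its commands back up and restore `go.mod` in place. One way the temp-directory variant could look is sketched below in Python, under stated assumptions: the Go toolchain is on `PATH`, the module root contains `go.mod`, and copying the tree is affordable. The helper names `parse_requires` and `tidy_in_temp_copy` are hypothetical, and the `go.mod` parsing is deliberately simplified to `require` directives only.

```python
from __future__ import annotations

import pathlib
import shutil
import subprocess
import tempfile


def parse_requires(go_mod_text: str) -> set[str]:
    """Return the module paths listed in require directives of a go.mod file."""
    modules: set[str] = set()
    in_block = False
    for raw in go_mod_text.splitlines():
        # Drop trailing comments such as "// indirect" before inspecting the line.
        line = raw.split("//", 1)[0].strip()
        if not line:
            continue
        if in_block:
            if line == ")":
                in_block = False
            else:
                modules.add(line.split()[0])
        elif line == "require (":
            in_block = True
        elif line.startswith("require "):
            modules.add(line.split()[1])
    return modules


def tidy_in_temp_copy(repo: str) -> set[str]:
    """Run `go mod tidy` on a throwaway copy and return the modules it dropped."""
    src = pathlib.Path(repo)
    before = parse_requires((src / "go.mod").read_text())
    with tempfile.TemporaryDirectory() as tmp:
        work = pathlib.Path(tmp) / "module"
        # The whole tree is copied because tidy needs the source files to see imports.
        shutil.copytree(src, work, ignore=shutil.ignore_patterns(".git"))
        subprocess.run(["go", "mod", "tidy"], cwd=work, check=True)
        after = parse_requires((work / "go.mod").read_text())
    return before - after
```

Because the comparison is made on the copy, the working tree's `go.mod` and `go.sum` are never written, so no restore step is needed. Modules reported this way still deserve a manual check before removal.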
What are the relevant tickets?
N/A
Description (What does it do?)
Adds a new `dependency-pruning` skill under `skills/process/` that audits a repository's dependencies across Python, JS/TS, Go, Rust, and other ecosystems to surface four categories of action:
- Remove: unused packages, confirmed via tool output (`deptry`, `depcheck`, `cargo machete`, `go mod tidy`) plus manual grep to catch packages the tools miss
- Optimize: JS/TS packages with import styles that block tree-shaking (`import _ from 'lodash'` vs `import { debounce } from 'lodash-es'`)
- Vendor/rewrite: packages where only ≤3 symbols are used and the package is small enough to inline (configurable thresholds)
- Migrate: deprecated, sunset, or abandoned packages with known migration targets (`react-ga` → GA4/PostHog after Universal Analytics sunset)

The skill includes explicit blind-spot guidance to avoid common false positives:
- `gunicorn`, `uwsgi`, `granian`, `hypercorn` are invoked via CLI in Dockerfile/K8s: check deployment configs AND git history for in-flight server migrations before flagging for removal
- `ipdb`, `pdbpp`, `bpython`, `ptpython`, `debugpy` etc. are terminal tools, not app imports: flag as "move to dev deps" rather than "remove"

After reporting, the skill offers to execute safe changes (removals, import-style fixes, vendor stubs) and delegates PR creation to the `create-ol-pull-request` skill if available.

Evaluation
Evaluated over 2 iterations against ocw-studio (Django 5.2 + React/lodash). The skill achieves 93% assertion pass rate vs 60% for the no-skill baseline across three eval scenarios:
Key iteration-2 improvements over iteration-1: added Phase 3b (import style / tree-shaking for JS/TS), Phase 4 (deprecated/sunset detection), Django INSTALLED_APPS blind spot, server runtime caveat, and developer tooling caveat.
How can this be tested?
Point the skill at any repo with Python or JS/TS dependencies:
Verify the report:
- Covers both Python (`pyproject.toml`) and JS (`package.json`) ecosystems
- Does not flag `uwsgi` or active server runtimes for removal without checking deployment configs
- Does not flag `ipdb`/`bpython` for removal (marks as "move to dev deps")
- Flags `react-ga` (GA3 sunset)

Additional Context
The skill is designed to be conservative: it requires evidence before flagging anything as removable, and asks the user to confirm before executing any changes. The "Optimize Import Style" category is the highest-ROI output for JS/TS-heavy repos: switching from `import _ from 'lodash'` to `lodash-es` typically saves 40–70 KB gzipped.