feat!: migrate CLI to scrapegraph-js v2 API #16

Open
VinciGit00 wants to merge 2 commits into main from feat/migrate-to-sdk-v2

@VinciGit00
Member

Summary

Migrates just-scrape to the scrapegraph-js v2 SDK (head 096c110). The v2 API consolidates endpoints and drops legacy ones; the CLI now mirrors that surface.

Commands kept (rewritten against v2)

| Command | Notes |
| --- | --- |
| `scrape <url>` | Multi-format: `-f markdown,html,screenshot,branding,links,images,summary,json` (comma-separate for multi-format output) |
| `crawl <url>` | Polls `crawl.start` → `crawl.get` until completed / failed / deleted |
| `history [service] [id]` | New response shape (`data[]` + `pagination`), service filter optional |
| `credits` | Uses v2 `getCredits`; response exposes `remaining`, `used`, `plan`, `jobs.{crawl,monitor}` |
| `validate` | Uses v2 `checkHealth` (`/health`) |
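
The crawl polling behaviour noted in the table can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `pollUntilTerminal`, the `GetJob` fetcher type, the `CrawlJob` shape, and the interval and attempt limits are all hypothetical stand-ins for whatever the CLI does around the SDK's crawl calls. Only the three terminal states come from the description above.

```typescript
// Terminal states named in the PR description; any other status is treated as "still running".
const TERMINAL_STATES = new Set(["completed", "failed", "deleted"]);

// Hypothetical job shape; the real SDK response may carry more fields.
interface CrawlJob {
  id: string;
  status: string;
}

// Hypothetical fetcher standing in for the SDK call that reads a crawl job.
type GetJob = (id: string) => Promise<CrawlJob>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls the job until it reaches a terminal state or the attempt budget runs out.
async function pollUntilTerminal(
  getJob: GetJob,
  id: string,
  intervalMs = 2000, // assumed default, not taken from the PR
  maxAttempts = 150, // assumed default, not taken from the PR
): Promise<CrawlJob> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const job = await getJob(id);
    if (TERMINAL_STATES.has(job.status)) return job;
    await sleep(intervalMs);
  }
  throw new Error(`crawl ${id} did not reach a terminal state after ${maxAttempts} polls`);
}
```

Passing the fetcher in as a parameter keeps the loop testable without network access; a bounded attempt count avoids hanging forever if a job never settles.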

Commands added

  • extract <url> -p "…" — structured extraction with optional --schema
  • search "<query>" — web search with optional -p/--schema, --country, --time-range, --format
  • monitor <action>: create / list / get / update / delete / pause / resume / activity

Commands removed (not in v2 API)

| Removed | Replacement |
| --- | --- |
| `smart-scraper <url> -p "…"` | `scrape <url> -f json -p "…"` or `extract <url> -p "…"` |
| `search-scraper "…"` | `search "…"` (query is now positional; `-p` is the extraction prompt) |
| `markdownify <url>` | `scrape <url>` (markdown is the default format) |
| `scrape <url>` (raw HTML only) | `scrape <url> -f html` |
| `sitemap <url>` | Removed; use `crawl` with `--include-patterns` |
| `agentic-scraper` | Removed |
| `generate-schema` | Removed |
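
For scripts that still call the removed commands, the migration table above can be expressed as a small lookup. This is only a sketch: `LEGACY_COMMANDS` and `migrationHint` are hypothetical names, and nothing in this PR says the CLI prints hints like this. The replacement strings are copied from the table.

```typescript
// Removed v1 commands mapped to their v2 replacement; null means no replacement is listed.
const LEGACY_COMMANDS = new Map<string, string | null>([
  ["smart-scraper", 'scrape <url> -f json -p "…" or extract <url> -p "…"'],
  ["search-scraper", 'search "…"'],
  ["markdownify", "scrape <url>"],
  ["sitemap", "crawl with --include-patterns"],
  ["agentic-scraper", null],
  ["generate-schema", null],
]);

// Returns a human-readable hint for a removed command, or undefined for anything else.
function migrationHint(command: string): string | undefined {
  if (!LEGACY_COMMANDS.has(command)) return undefined;
  const replacement = LEGACY_COMMANDS.get(command);
  return replacement
    ? `"${command}" was removed in v2; use: ${replacement}`
    : `"${command}" was removed in v2 and has no replacement`;
}
```

The `scrape <url>` row is left out of the map because the command itself still exists; only its default output changed from raw HTML to markdown.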

Other changes

  • package.json: scrapegraph-js pinned to github:ScrapeGraphAI/scrapegraph-js#096c110 (scrapegraph-js PR #13 head); CLI version bumped 0.2.1 → 1.0.0 to track SDK v2.0.0.
  • src/lib/env.ts: bridges legacy SGAI_TIMEOUT_S / JUST_SCRAPE_TIMEOUT_S → SGAI_TIMEOUT (SDK v2 renamed the var). SGAI_API_URL default is now https://api.scrapegraphai.com/api/v2 (baked into the SDK).
  • README rewritten: new command docs, migration table, env-var table.
  • Smoke test updated to assert v2 exports (scrape, extract, search, crawl.start, monitor.create, ScrapeGraphAI).
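
The timeout bridging described for src/lib/env.ts might look roughly like the sketch below. It is an illustration under stated assumptions, not the file's actual contents: `bridgeTimeoutEnv` is a hypothetical name, the precedence (an explicit SGAI_TIMEOUT wins, then SGAI_TIMEOUT_S, then JUST_SCRAPE_TIMEOUT_S) is assumed, and the value is copied unchanged because the PR does not say whether the units differ between the old and new variables.

```typescript
type Env = Record<string, string | undefined>;

// Returns an env where SGAI_TIMEOUT is filled in from a legacy variable when it is not already set.
function bridgeTimeoutEnv(env: Env): Env {
  // Assumption: a value already set under the v2 name takes priority over legacy names.
  if (env.SGAI_TIMEOUT !== undefined) return env;

  // Assumption: SGAI_TIMEOUT_S is preferred over JUST_SCRAPE_TIMEOUT_S when both are set.
  const legacy = env.SGAI_TIMEOUT_S ?? env.JUST_SCRAPE_TIMEOUT_S;

  // Assumption: the value is passed through as-is; no unit conversion is applied.
  return legacy === undefined ? env : { ...env, SGAI_TIMEOUT: legacy };
}
```

Returning a new object instead of mutating the input keeps the function easy to test; a real CLI would more likely apply the result to `process.env` once at startup.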

Test plan

  • bun run check — tsc + biome clean
  • bun test — smoke test passes (v2 exports resolvable)
  • bun run build — bundles cleanly to dist/cli.mjs
  • just-scrape --help — lists 8 commands (scrape/extract/search/crawl/monitor/history/credits/validate)
  • just-scrape <cmd> --help — all subcommand help renders correctly
  • Live smoke test against the v2 API — blocked on scrapegraph-js#13 deploying to production; current /api/v2/* returns 404 in prod, which is expected while the SDK PR is open

🤖 Generated with Claude Code

VinciGit00 and others added 2 commits April 15, 2026 21:55
Aligns the CLI with scrapegraph-js PR #13 (v2 SDK). The v2 API
consolidates endpoints and drops legacy ones; the CLI follows suit.

Commands kept (rewritten against v2 types):
- scrape — multi-format (markdown/html/screenshot/branding/links/images/summary/json)
- crawl — polls until the job reaches a terminal state
- history — new response shape (data/pagination), service filter optional
- credits, validate — re-wired to getCredits / checkHealth

Commands added:
- extract — structured extraction with prompt + schema
- search — web search + optional extraction
- monitor — create/list/get/update/delete/pause/resume/activity

Commands removed (no longer in v2 API):
- smart-scraper   (use `scrape -f json -p ...` or `extract`)
- search-scraper  (use `search`)
- markdownify     (use `scrape` — markdown is the default format)
- sitemap, agentic-scraper, generate-schema

Other changes:
- package.json: scrapegraph-js pinned to github:ScrapeGraphAI/scrapegraph-js#096c110,
  CLI bumped 0.2.1 → 1.0.0 to track SDK v2.0.0
- src/lib/env.ts: bridges legacy SGAI_TIMEOUT_S / JUST_SCRAPE_TIMEOUT_S → SGAI_TIMEOUT
  (renamed by SDK v2)
- README + smoke test updated

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

scrapegraph-js is pinned to a GitHub commit (PR #13 head) that ships
without a prebuilt dist/, so module resolution fails on a fresh install.
Build it in-place after bun install so tsc/biome/bun test can resolve it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>