feat: v2 SDK rewrite with Pydantic + httpx#84

Open
FrancescoSaverioZuppichini wants to merge 16 commits into main from feat/v2-migration

@FrancescoSaverioZuppichini
Member

Summary

Complete SDK rewrite matching the JS SDK 1:1:

  • Pydantic v2 for all request/response models with automatic camelCase serialization
  • httpx for sync and async HTTP clients
  • ApiResult[T] wrapper pattern - no exceptions, just status: "success" | "error"
  • Nested resources - sgai.crawl.start(), sgai.monitor.create(), sgai.history.list()
  • uv as package manager with modern src/ layout
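The "no exceptions" wrapper described in the summary could look roughly like the sketch below. It is not taken from the PR's code: the `error` field name and the `safe_call` helper are assumptions made for illustration, and only the `status: "success" | "error"` shape comes from the description above.

```python
from typing import Generic, Literal, Optional, TypeVar

from pydantic import BaseModel

T = TypeVar("T")


class ApiResult(BaseModel, Generic[T]):
    """Result envelope: callers branch on `status` instead of catching."""

    status: Literal["success", "error"]
    data: Optional[T] = None
    error: Optional[str] = None  # assumed field name, not confirmed by the PR


def safe_call(fn) -> ApiResult:
    """Hypothetical helper: run `fn` and fold any exception into an error result."""
    try:
        return ApiResult(status="success", data=fn())
    except Exception as exc:  # deliberately broad so the wrapper never raises
        return ApiResult(status="error", error=str(exc))


ok = safe_call(lambda: {"markdown": "# Hello"})
bad = safe_call(lambda: 1 / 0)

print(ok.status, ok.data)
print(bad.status, bad.error)
```

The trade-off of this pattern is that callers must check `status` on every call, since a failure is never surfaced as an exception.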

Changes

  • Restructured from nested scrapegraph-py/ to root-level uv library
  • All Pydantic models in single schemas.py with CamelModel base class
  • Sync client (ScrapeGraphAI) and async client (AsyncScrapeGraphAI)
  • 32 examples (16 sync + 16 async) for all endpoints
  • 28 unit tests with mocked httpx
  • Simplified CI workflows for uv

API Surface

from scrapegraph_py import ScrapeGraphAI, ScrapeRequest

sgai = ScrapeGraphAI()  # reads SGAI_API_KEY from env
result = sgai.scrape(ScrapeRequest(url="https://example.com"))

if result.status == "success":
    print(result.data["results"]["markdown"]["data"])

Test plan

  • All 28 unit tests pass (uv run pytest tests/test_client.py -v)
  • Integration tests pass with real API key
  • Lint passes (uv run ruff check .)
  • CI workflow runs successfully

🤖 Generated with Claude Code

- Delete .agent/ documentation folder (unused)
- Simplify CLAUDE.md from 370 to ~90 lines
- Remove stale docs (HEALTHCHECK.md, IMPLEMENTATION_SUMMARY.md, TOON_INTEGRATION_SUMMARY.md)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
BREAKING CHANGE: Complete project restructure

- Remove nested scrapegraph-py/ folder
- Initialize as uv library with src/ layout
- Clean slate for v2 API rewrite

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add ScrapeGraphAI sync client with httpx
- Add AsyncScrapeGraphAI async client
- Add Pydantic models for all request/response types
- Add nested resources: crawl, monitor, history
- Return ApiResult wrapper (never raises)
- Support SGAI_API_KEY, SGAI_DEBUG, SGAI_TIMEOUT_S env vars

API surface:
- client.scrape(ScrapeRequest)
- client.extract(ExtractRequest)
- client.search(SearchRequest)
- client.credits()
- client.health()
- client.crawl.start/get/stop/resume/delete
- client.monitor.create/list/get/update/delete/pause/resume
- client.history.list/get
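One plausible way to get the nested `client.crawl.*` style listed above is to hang small resource objects off the client, all sharing one transport. The sketch below uses only the standard library. The class names, the `Transport` signature, and the `/crawl` paths are assumptions for illustration, not the SDK's actual internals.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical transport signature: (method, path, body) -> decoded JSON.
Transport = Callable[[str, str, Dict[str, Any]], Dict[str, Any]]


class CrawlResource:
    """Groups crawl endpoints under `client.crawl` (paths are assumptions)."""

    def __init__(self, send: Transport) -> None:
        self._send = send

    def start(self, url: str) -> Dict[str, Any]:
        return self._send("POST", "/crawl", {"url": url})

    def get(self, crawl_id: str) -> Dict[str, Any]:
        return self._send("GET", f"/crawl/{crawl_id}", {})


class Client:
    def __init__(self, send: Transport) -> None:
        # Each namespace shares the client's transport.
        self.crawl = CrawlResource(send)


calls: List[Tuple[str, str, Dict[str, Any]]] = []


def fake_send(method: str, path: str, body: Dict[str, Any]) -> Dict[str, Any]:
    """Records each call instead of performing real HTTP."""
    calls.append((method, path, body))
    return {"id": "abc"}


client = Client(fake_send)
started = client.crawl.start("https://example.com")
client.crawl.get(started["id"])
```

Injecting the transport this way also makes the resources easy to exercise without a network, as `fake_send` does here.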

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- scrape: basic, json extraction, pdf, multi-format, fetchconfig
- extract: basic, with schema
- search: basic, with extraction
- crawl: basic, with formats
- monitor: basic, with webhook
- utilities: credits, health, history

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Delete types.py, everything in schemas.py
- Remove Api prefix from response models
- Pre-compile server timing regex
- Fix json field shadowing with aliases
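For context on the `json` shadowing fix: a Pydantic field literally named `json` collides with the `BaseModel.json` attribute and triggers a shadowing warning. A common workaround, sketched below, is a differently named Python attribute that carries `alias="json"`. The model and field names are hypothetical, and the PR may have solved it differently.

```python
from typing import Any, Dict, Optional

from pydantic import BaseModel, ConfigDict, Field


class FormatOptions(BaseModel):
    """Hypothetical model; only the aliasing technique is the point."""

    # Accept both the Python name and the alias as input.
    model_config = ConfigDict(populate_by_name=True)

    # The Python attribute avoids the name `json`; the wire key stays "json".
    json_options: Optional[Dict[str, Any]] = Field(default=None, alias="json")


# Parsing accepts the wire key.
opts = FormatOptions.model_validate({"json": {"prompt": "Extract the title"}})

# Dumping with by_alias=True restores the wire key.
payload = opts.model_dump(by_alias=True)

# Construction by Python name also works because of populate_by_name.
built = FormatOptions(json_options={"a": 1}).model_dump(by_alias=True)
```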

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Follows Pydantic v2 best practices for type safety

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Test credits, scrape, extract, search, history, crawl
- Fix HttpUrl serialization (mode="json" in model_dump)
- Add python-dotenv for loading .env
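The `HttpUrl` fix mentioned above comes down to how `model_dump` treats URL fields: in the default Python mode the value remains a URL object, which `json.dumps` cannot encode, whereas `mode="json"` yields a plain string. A minimal sketch, using a hypothetical `PageRequest` model rather than the SDK's own request classes:

```python
import json

from pydantic import BaseModel, HttpUrl


class PageRequest(BaseModel):  # hypothetical stand-in for a request model
    url: HttpUrl


req = PageRequest(url="https://example.com/docs")

python_dump = req.model_dump()           # url stays a URL object
json_dump = req.model_dump(mode="json")  # url becomes a plain str

# Only the JSON-mode dump is safe to hand to a JSON encoder.
body = json.dumps(json_dump)
```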

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Replace manual _to_camel with Pydantic's built-in alias_generator
- CamelModel base class handles snake_case -> camelCase conversion
- Simplify _serialize to single model_dump call
- Add async versions of all 16 examples
- Update README with expanded async client docs and examples table
- Add banner from JS SDK
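The alias-generator approach in this commit can be sketched as follows. `to_camel` and `alias_generator` are real Pydantic v2 features, but the `FetchConfig` fields and the exact `model_config` options are assumptions and may differ from what `schemas.py` actually contains.

```python
from pydantic import BaseModel, ConfigDict
from pydantic.alias_generators import to_camel


class CamelModel(BaseModel):
    """Base class: snake_case attributes in Python, camelCase on the wire."""

    model_config = ConfigDict(alias_generator=to_camel, populate_by_name=True)


class FetchConfig(CamelModel):  # hypothetical fields for illustration
    wait_ms: int = 0
    render_js: bool = False


cfg = FetchConfig(wait_ms=500, render_js=True)

# A single model_dump call produces the camelCase payload.
wire = cfg.model_dump(by_alias=True, mode="json")

# Incoming camelCase data validates against the same model.
parsed = FetchConfig.model_validate({"waitMs": 1})
```

Setting `populate_by_name=True` lets user code keep constructing models with snake_case keyword arguments while the aliases handle the wire format.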

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add .pytest_cache/, .ruff_cache/, .mypy_cache/ to gitignore
- Add common Python build/test artifacts
- Remove obsolete update-requirements.yml workflow

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove obsolete pylint.yml and test.yml (referenced old structure)
- Add ci.yml with simple lint + test jobs using uv
- Update release.yml for root-level project
- Update python-publish.yml for uv build

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Test request construction, response parsing, error handling
- Mock httpx.Client.request instead of hitting real API
- Test all endpoints: scrape, extract, search, crawl, monitor, history
- Test HTTP errors (401, 402, 429), timeouts
- Test camelCase serialization
- Update CI to run test_client.py

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@github-actions

github-actions bot commented Apr 14, 2026

Dependency Review

The following issues were found:
  • ✅ 0 vulnerable package(s)
  • ✅ 0 package(s) with incompatible licenses
  • ✅ 0 package(s) with invalid SPDX license definitions
  • ⚠️ 3 package(s) with unknown licenses.
See the Details below.

Snapshot Warnings

⚠️: No snapshots were found for the head SHA 8c59f38.
Ensure that dependencies are being submitted on PR branches and consider enabling retry-on-snapshot-warnings. See the documentation for more information and troubleshooting advice.

License Issues

uv.lock

| Package | Version | License | Issue Type |
| --- | --- | --- | --- |
| pydantic | 2.13.0 | Null | Unknown License |
| pytest | 9.0.3 | Null | Unknown License |
| ruff | 0.15.10 | Null | Unknown License |

OpenSSF Scorecard

Scorecard details
| Package | Version | Score | Details |
| --- | --- | --- | --- |
| pip/annotated-types | 0.7.0 | Unknown | Unknown |
| pip/anyio | 4.13.0 | Unknown | Unknown |
| pip/certifi | 2026.2.25 | 🟢 6.6 | Check table below |
| pip/colorama | 0.4.6 | Unknown | Unknown |
| pip/h11 | 0.16.0 | 🟢 4.4 | Check table below |
| pip/httpcore | 1.0.9 | Unknown | Unknown |
| pip/httpx | 0.28.1 | Unknown | Unknown |
| pip/idna | 3.11 | Unknown | Unknown |
| pip/iniconfig | 2.3.0 | Unknown | Unknown |
| pip/packaging | 26.0 | Unknown | Unknown |
| pip/pluggy | 1.6.0 | Unknown | Unknown |
| pip/pydantic | 2.13.0 | Unknown | Unknown |
| pip/pydantic-core | 2.46.0 | 🟢 7.1 | Check table below |
| pip/pygments | 2.20.0 | Unknown | Unknown |
| pip/pytest | 9.0.3 | Unknown | Unknown |
| pip/pytest-asyncio | 1.3.0 | Unknown | Unknown |
| pip/python-dotenv | 1.2.2 | Unknown | Unknown |
| pip/ruff | 0.15.10 | Unknown | Unknown |
| pip/typing-extensions | 4.15.0 | Unknown | Unknown |
| pip/typing-inspection | 0.4.2 | Unknown | Unknown |

Details: pip/certifi 2026.2.25

| Check | Score | Reason |
| --- | --- | --- |
| Code-Review | 🟢 5 | Found 1/2 approved changesets -- score normalized to 5 |
| Maintained | 🟢 10 | 10 commit(s) and 2 issue activity found in the last 90 days -- score normalized to 10 |
| Binary-Artifacts | 🟢 10 | no binaries found in the repo |
| Security-Policy | 🟢 10 | security policy file detected |
| Dangerous-Workflow | 🟢 10 | no dangerous workflow patterns detected |
| Token-Permissions | 🟢 10 | GitHub workflow tokens follow principle of least privilege |
| Pinned-Dependencies | 🟢 5 | dependency not pinned by hash detected -- score normalized to 5 |
| CII-Best-Practices | ⚠️ 0 | no effort to earn an OpenSSF best practices badge detected |
| Fuzzing | ⚠️ 0 | project is not fuzzed |
| License | 🟢 9 | license file detected |
| Signed-Releases | ⚠️ -1 | no releases found |
| Packaging | 🟢 10 | packaging workflow detected |
| Branch-Protection | ⚠️ 0 | branch protection not enabled on development/release branches |
| SAST | ⚠️ 0 | SAST tool is not run on all commits -- score normalized to 0 |

Details: pip/h11 0.16.0

| Check | Score | Reason |
| --- | --- | --- |
| Dangerous-Workflow | 🟢 10 | no dangerous workflow patterns detected |
| Packaging | ⚠️ -1 | packaging workflow not detected |
| Token-Permissions | ⚠️ 0 | detected GitHub workflow tokens with excessive permissions |
| Code-Review | 🟢 5 | Found 9/18 approved changesets -- score normalized to 5 |
| Maintained | ⚠️ 0 | 0 commit(s) and 1 issue activity found in the last 90 days -- score normalized to 0 |
| Pinned-Dependencies | ⚠️ 0 | dependency not pinned by hash detected -- score normalized to 0 |
| Binary-Artifacts | 🟢 10 | no binaries found in the repo |
| CII-Best-Practices | ⚠️ 0 | no effort to earn an OpenSSF best practices badge detected |
| Security-Policy | ⚠️ 0 | security policy file not detected |
| Fuzzing | 🟢 10 | project is fuzzed |
| Signed-Releases | ⚠️ -1 | no releases found |
| License | 🟢 10 | license file detected |
| Branch-Protection | ⚠️ -1 | internal error: error during branchesHandler.setup: internal error: some github tokens can't read classic branch protection rules: https://github.com/ossf/scorecard-action/blob/main/docs/authentication/fine-grained-auth-token.md |
| SAST | ⚠️ 0 | SAST tool is not run on all commits -- score normalized to 0 |

Details: pip/pydantic-core 2.46.0

| Check | Score | Reason |
| --- | --- | --- |
| Code-Review | 🟢 10 | all changesets reviewed |
| Maintained | 🟢 10 | 30 commit(s) and 18 issue activity found in the last 90 days -- score normalized to 10 |
| CII-Best-Practices | ⚠️ 0 | no effort to earn an OpenSSF best practices badge detected |
| Dangerous-Workflow | 🟢 10 | no dangerous workflow patterns detected |
| Binary-Artifacts | 🟢 10 | no binaries found in the repo |
| Token-Permissions | ⚠️ 0 | detected GitHub workflow tokens with excessive permissions |
| Pinned-Dependencies | 🟢 8 | dependency not pinned by hash detected -- score normalized to 8 |
| Fuzzing | 🟢 10 | project is fuzzed |
| License | 🟢 10 | license file detected |
| Signed-Releases | ⚠️ -1 | no releases found |
| Security-Policy | 🟢 10 | security policy file detected |
| Branch-Protection | ⚠️ 1 | branch protection is not maximal on development and all release branches |
| Packaging | 🟢 10 | packaging workflow detected |
| SAST | ⚠️ 0 | SAST tool is not run on all commits -- score normalized to 0 |

Scanned Files

  • .github/workflows/test.yml
  • scrapegraph-py/requirements-test.txt
  • scrapegraph-py/uv.lock
  • uv.lock

- Run ruff format on src/
- Add ruff config to pyproject.toml (line-length=100, ignore E501)
- Fix import ordering

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Format test files with ruff
- Add per-file ignores for tests (F841, E402)
- Update CI to check src/ tests/ only

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add ResponseModel base class with camelCase alias generator
- Change all response models to inherit from ResponseModel
- Use TypeAdapter for proper generic type parsing
- Update all examples to use attribute access (res.data.results)
- Fix all test mocks with complete required fields

This follows industry standard SDK patterns where typed objects are
returned for IDE autocompletion and type safety.
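As a sketch of what "TypeAdapter for proper generic type parsing" might involve: a `TypeAdapter` built from a parametrized generic validates the raw JSON payload into typed objects, so callers get attribute access rather than dictionaries. The `HistoryEntry` fields and the exact shape of `ApiResult` are assumptions for illustration.

```python
from typing import Generic, List, Literal, Optional, TypeVar

from pydantic import BaseModel, ConfigDict, TypeAdapter
from pydantic.alias_generators import to_camel

T = TypeVar("T")


class ResponseModel(BaseModel):
    """Base for response models: accepts camelCase keys from the API."""

    model_config = ConfigDict(alias_generator=to_camel, populate_by_name=True)


class HistoryEntry(ResponseModel):  # hypothetical fields
    request_id: str
    elapsed_ms: int


class ApiResult(ResponseModel, Generic[T]):
    status: Literal["success", "error"]
    data: Optional[T] = None


# The adapter carries the concrete type parameter into validation.
adapter = TypeAdapter(ApiResult[List[HistoryEntry]])
res = adapter.validate_python(
    {"status": "success", "data": [{"requestId": "r1", "elapsedMs": 12}]}
)

first = res.data[0]
print(first.request_id, first.elapsed_ms)
```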

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>