examples — runnable scripts

Each script is self-contained and runs against the bundled sample/acme_invoice.pdf + sample/acme_invoice.png fixtures. The base URL defaults to http://localhost:8000 (override with TURBO_OCR_BASE_URL).

export TURBO_OCR_BASE_URL=http://localhost:8000  # optional, this is the default
python examples/00_quickstart.py
python examples/02_pdf_to_markdown.py
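
The environment-variable override described above can be sketched in a few lines of Python. This is a minimal illustration, not the SDK's own resolution logic: `resolve_base_url` and `DEFAULT_BASE_URL` are hypothetical names, and the stripping of a trailing slash is an assumption made for tidiness.

```python
import os

# Default taken from this README; the constant name is hypothetical.
DEFAULT_BASE_URL = "http://localhost:8000"


def resolve_base_url(environ=None):
    """Return TURBO_OCR_BASE_URL if set and non-empty, else the default."""
    env = os.environ if environ is None else environ
    value = env.get("TURBO_OCR_BASE_URL", "").strip()
    # Assumption: a trailing slash is dropped so paths can be appended cleanly.
    return (value or DEFAULT_BASE_URL).rstrip("/")


if __name__ == "__main__":
    print(resolve_base_url())
```

Passing a plain dict instead of `os.environ` keeps the helper easy to test without touching the real environment.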

| # | Example | What it shows | README section |
|---|---------|---------------|----------------|
| 00 | 00_quickstart.py | The smallest useful script: sync image OCR. | Quickstart |
| 01 | 01_image_ocr_with_layout.py | Image OCR with layout, reading_order, include_blocks; dump blocks as JSON. | Image OCR |
| 02 | 02_pdf_to_markdown.py | PDF -> Markdown via render_to_markdown. | PDF -> Markdown |
| 03 | 03_searchable_pdf.py | PDF -> searchable PDF, then verified with pypdf. | Searchable PDF |
| 04 | 04_async_client.py | AsyncClient + asyncio.gather for concurrent OCR. | Async |
| 05 | 05_batch.py | recognize_batch over multiple images. | Batch |
| 06 | 06_grpc.py | GrpcClient: same surface as Client, gRPC transport. | gRPC |
| 07 | 07_retry_and_timeout.py | Custom RetryPolicy + per-request timeout=. | Retry policy |
| 08 | 08_custom_httpx_client.py | Pass your own httpx.Client for TLS / limits. | Custom httpx.Client |
| 09 | 09_markdown_style.py | Register a custom layout label + renderer on MarkdownStyle. | Custom Markdown labels |
| 10 | 10_tables_and_formulas.py | Iterate response.tables / response.formulas (see caveat below). | Tables and formulas |
| 11 | 11_folder_pipeline.py | AsyncClient + asyncio.Semaphore for a bounded-concurrency folder pipeline. | |
| 12 | 12_hooks_and_logging.py | httpx event hooks + the SDK's stdlib logger. | Logging |
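
The bounded-concurrency pattern that example 11 names (asyncio.Semaphore around concurrent calls) can be sketched with the standard library alone. This is not the contents of 11_folder_pipeline.py: `fake_ocr` is a hypothetical stand-in for an SDK call and only sleeps, and `run_bounded` is an illustrative name.

```python
import asyncio


async def fake_ocr(path, delay=0.01):
    # Hypothetical stand-in for a real OCR request; it only sleeps.
    await asyncio.sleep(delay)
    return f"text from {path}"


async def run_bounded(paths, limit, worker=fake_ocr):
    """Run worker over paths with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def guarded(path):
        nonlocal in_flight, peak
        async with semaphore:  # waits here once `limit` tasks are running
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await worker(path)
            finally:
                in_flight -= 1

    # gather preserves input order in its result list.
    results = await asyncio.gather(*(guarded(p) for p in paths))
    return results, peak


if __name__ == "__main__":
    pages = [f"page_{i}.png" for i in range(10)]
    results, peak = asyncio.run(run_bounded(pages, limit=3))
    print(len(results), "results, peak concurrency", peak)
```

The `peak` counter is there only to make the bound observable; a real pipeline would likely drop it and keep just the semaphore.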

Tables and formulas — partial support today. As of server v2.2.3, the server detects table and formula regions (you get a bounding_box and row-major OCR'd text) but does not emit cell structure or LaTeX source. Table.html, Table.cells, and Formula.latex are always None in the current responses. The SDK is forward-compatible: when the server ships table-structure-recognition and LaTeX OCR, those fields will populate without any SDK code changes.
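
One way to write calling code that survives the transition described above is to prefer the structured field when it is present and fall back to the plain text otherwise. The sketch below uses hypothetical stand-in dataclasses, not the SDK's real classes: the `html`, `cells`, `latex`, and `bounding_box` names come from this README, while the `text` attribute name and the helper functions are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Table:  # hypothetical stand-in, not the SDK's class
    bounding_box: tuple
    text: str  # assumed attribute name for the row-major OCR'd text
    html: Optional[str] = None
    cells: Optional[list] = None


@dataclass
class Formula:  # hypothetical stand-in, not the SDK's class
    bounding_box: tuple
    text: str  # assumed attribute name
    latex: Optional[str] = None


def table_content(table):
    """Prefer structured HTML when present, else the row-major text."""
    return table.html if table.html is not None else table.text


def formula_content(formula):
    """Prefer LaTeX source when present, else the OCR'd text."""
    return formula.latex if formula.latex is not None else formula.text


if __name__ == "__main__":
    today = Table(bounding_box=(0, 0, 100, 40), text="Item Qty Price")
    print(table_content(today))
```

Checking `is not None` rather than truthiness means an empty string from a future server would still be treated as a structured result.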

Sample fixtures

sample/acme_invoice.pdf and sample/acme_invoice.png are a fictional two-page ACME Corp invoice (line items, totals, terms). They're committed so the examples run out of the box; regenerate with:

python examples/sample/generate.py

Requires reportlab and pypdfium2 (both are dev-time only — not runtime dependencies of the SDK itself).