Each script is self-contained and runs against the bundled `sample/acme_invoice.pdf` and `sample/acme_invoice.png` fixtures. The base URL defaults to `http://localhost:8000` (override with `TURBO_OCR_BASE_URL`).
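The default-plus-override behaviour described above can be sketched with a plain environment lookup. This is only an illustration: `resolve_base_url` is a hypothetical helper, not an SDK function, and treating an empty value as unset and trimming a trailing slash are assumptions of this sketch.

```python
import os

DEFAULT_BASE_URL = "http://localhost:8000"


def resolve_base_url(env=None):
    """Return TURBO_OCR_BASE_URL if set and non-empty, else the default.

    Hypothetical helper for illustration; not part of the SDK.
    """
    env = os.environ if env is None else env
    # Assumption: an empty or whitespace-only value counts as "unset".
    value = env.get("TURBO_OCR_BASE_URL", "").strip()
    # Assumption: a trailing slash is dropped so paths can be appended safely.
    return (value or DEFAULT_BASE_URL).rstrip("/")
```

Passing a mapping instead of reading `os.environ` directly keeps the helper easy to exercise without touching the real environment.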

    export TURBO_OCR_BASE_URL=http://localhost:8000  # optional, this is the default
    python examples/00_quickstart.py
    python examples/02_pdf_to_markdown.py

| # | Example | What it shows | README section |
|---|---|---|---|
| 00 | `00_quickstart.py` | The smallest useful script: sync image OCR. | Quickstart |
| 01 | `01_image_ocr_with_layout.py` | Image OCR with `layout`, `reading_order`, `include_blocks`; dump blocks as JSON. | Image OCR |
| 02 | `02_pdf_to_markdown.py` | PDF -> Markdown via `render_to_markdown`. | PDF -> Markdown |
| 03 | `03_searchable_pdf.py` | PDF -> searchable PDF, then verified with `pypdf`. | Searchable PDF |
| 04 | `04_async_client.py` | `AsyncClient` + `asyncio.gather` for concurrent OCR. | Async |
| 05 | `05_batch.py` | `recognize_batch` over multiple images. | Batch |
| 06 | `06_grpc.py` | `GrpcClient`: same surface as `Client`, gRPC transport. | gRPC |
| 07 | `07_retry_and_timeout.py` | Custom `RetryPolicy` + per-request `timeout=`. | Retry policy |
| 08 | `08_custom_httpx_client.py` | Pass your own `httpx.Client` for TLS / limits. | Custom httpx.Client |
| 09 | `09_markdown_style.py` | Register a custom layout label + renderer on `MarkdownStyle`. | Custom Markdown labels |
| 10 | `10_tables_and_formulas.py` | Iterate `response.tables` / `response.formulas` (see caveat below). | Tables and formulas |
| 11 | `11_folder_pipeline.py` | `AsyncClient` + `asyncio.Semaphore` for a bounded-concurrency folder pipeline. | - |
| 12 | `12_hooks_and_logging.py` | `httpx` event hooks + the SDK's stdlib logger. | Logging |
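
The bounded-concurrency pattern named in rows 04 and 11 can be sketched with the standard library alone. This is not the code in `11_folder_pipeline.py`: `fake_ocr` is a hypothetical stand-in for whatever client call you make, and `process_folder` is an illustrative name.

```python
import asyncio
from pathlib import Path


async def fake_ocr(path: Path) -> str:
    """Hypothetical stand-in for a real OCR request."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"text from {path.name}"


async def process_folder(paths, limit=4, ocr=fake_ocr):
    """Run `ocr` over every path with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def worker(path):
        # Workers wait here once `limit` of them hold the semaphore.
        async with semaphore:
            return path, await ocr(path)

    # gather returns results in the same order as its inputs.
    results = await asyncio.gather(*(worker(p) for p in paths))
    return dict(results)
```

A caller would run it with `asyncio.run(process_folder(sorted(Path("scans").glob("*.png"))))`, where `scans` is a placeholder folder name. The semaphore caps in-flight requests so a large folder does not flood the server, while `asyncio.gather` still schedules every file up front.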

Tables and formulas: partial support today. As of server v2.2.3, the server detects table and formula regions (you get a `bounding_box` and row-major OCR'd `text`) but does not emit cell structure or LaTeX source. `Table.html`, `Table.cells`, and `Formula.latex` are always `None` in the current responses. The SDK is forward-compatible: when the server ships table-structure recognition and LaTeX OCR, those fields will populate without any SDK code changes.
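
One way to write calling code that works both today and once structured output ships is to prefer the structured fields and fall back to plain text. The sketch below uses a hypothetical `Table` dataclass that mirrors only the field names mentioned above; the real SDK class, its field types, and the shape of `bounding_box` are assumptions here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Table:
    """Hypothetical stand-in with only the fields named in the caveat."""
    bounding_box: tuple           # assumed shape; check the SDK for the real type
    text: str                     # row-major OCR'd text
    html: Optional[str] = None    # None on current servers
    cells: Optional[list] = None  # None on current servers


def table_content(table: Table) -> tuple:
    """Return (kind, payload), preferring structure when it exists."""
    if table.html is not None:
        return "html", table.html
    if table.cells is not None:
        return "cells", table.cells
    # Current servers end up here: only the OCR'd text is available.
    return "text", table.text
```

Checking `is not None` rather than truthiness keeps an empty but present structure distinguishable from a missing one.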

`sample/acme_invoice.pdf` and `sample/acme_invoice.png` are a fictional two-page ACME Corp invoice (line items, totals, terms). They're committed so the examples run out of the box; regenerate with:

    python examples/sample/generate.py

Requires `reportlab` and `pypdfium2` (both are dev-time only, not runtime dependencies of the SDK itself).
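
Before regenerating the fixtures, it can help to confirm the dev-time packages are importable. The following is a small sketch using only the standard library; `missing_modules` is a hypothetical helper and is not part of `generate.py`.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Calling `missing_modules(["reportlab", "pypdfium2"])` returns an empty list when both packages are installed, and otherwise lists the ones to install.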