Merged
8 changes: 4 additions & 4 deletions skills/hotdata-analytics/references/WORKFLOWS.md
@@ -66,11 +66,11 @@ hotdata query "SELECT ..."

Land a smaller table — pick one:

**Datasets** (CSV/JSON/URL/SQL snapshot → `datasets.<schema>.<table>`):
**Datasets** (SQL query or saved query → `datasets.<schema>.<table>`):

```bash
hotdata datasets create --label "chain revenue slice" --sql "SELECT ..." [--table-name chain_revenue_slice]
hotdata datasets create --label "from saved" --query-id <query_id> [--table-name ...]
hotdata datasets create --name chain_revenue_slice [--description "chain revenue slice"] --sql "SELECT ..."
hotdata datasets create --name chain_from_saved [--description "from saved"] --query-id <query_id>
```

**Managed database** (parquet → `<database>.<schema>.<table>`):
@@ -95,7 +95,7 @@ hotdata query "SELECT * FROM datasets.main.chain_revenue_slice WHERE ..."

### Naming and documentation

- Prefer predictable `--table-name` values: `chain_<topic>_<YYYYMMDD>`.
- Prefer predictable `--name` values: `chain_<topic>_<YYYYMMDD>`.
- Record long-lived chains in **context:DATAMODEL → Derived tables (Chain)** with the **full** SQL name you use (`datasets.…` or `database.schema.table`).
- Promote join/grain findings to **context:DATAMODEL** when they should be shared or persisted (**`hotdata`** skill).
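
As an illustration of the naming convention above (not part of the diff), the dated name could be generated in the shell rather than typed by hand. This is a minimal sketch: `chain_name` is a hypothetical helper, and the `hotdata` invocation is left as a comment because only the `--name` and `--sql` flags shown in this diff are assumed.

```bash
# Hypothetical helper: prints chain_<topic>_<YYYYMMDD> for today's date.
chain_name() {
  printf 'chain_%s_%s\n' "$1" "$(date +%Y%m%d)"
}

name="$(chain_name revenue)"
echo "$name"

# The generated value would then be passed as the dataset name, e.g.:
# hotdata datasets create --name "$name" --sql "SELECT ..."
```

Deriving the date suffix from `date` keeps names sortable and avoids hand-typed dates drifting from the actual creation day.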

16 changes: 7 additions & 9 deletions skills/hotdata-search/SKILL.md
@@ -42,26 +42,24 @@ hotdata search "<query>" --table <connection.schema.table> [--type vector] [--co

## Indexes (BM25 and vector)

Indexes attach to a **connection table** (`--connection-id` + `--schema` + `--table`) or a **dataset** (`--dataset-id`). Scopes are mutually exclusive for create/delete.
Indexes attach to a **managed database table** (`--catalog`) or a **dataset** (`--dataset-id`). Create is not supported on raw connection tables via CLI. `list` and `delete` accept `--connection-id` for connection-scoped operations.

```bash
# List — workspace scan on connection tables (filter with -c / --schema / --table)
# List — workspace scan (filter by connection, schema, table, or dataset)
hotdata indexes list [--connection-id <id>] [--schema <schema>] [--table <table>] [--workspace-id <ws>] [--output table|json|yaml]
hotdata indexes list --dataset-id <dataset_id> [--workspace-id <ws>] [--output table|json|yaml]

# Managed database (catalog alias — uses the active database when the catalog matches)
# Create — managed database table (catalog alias)
hotdata indexes create --catalog <alias> --schema <schema> --table <table> \
--column <col> --type bm25|vector \
[--name <name>] [--metric l2|cosine|dot] [--async] \
[--embedding-provider-id <id>] [--dimensions <n>] [--output-column <name>] [--description <text>]

# Connection table (raw connection ID)
hotdata indexes create --connection-id <id> --schema <schema> --table <table> \
--column <col> --type bm25|vector [--name <name>] ...
hotdata indexes delete --connection-id <id> --schema <schema> --table <table> --name <name>

# Dataset
# Create — dataset
hotdata indexes create --dataset-id <dataset_id> --column <col> --type bm25|vector [--name <name>] ...

# Delete — connection table or dataset
hotdata indexes delete --connection-id <id> --schema <schema> --table <table> --name <name>
hotdata indexes delete --dataset-id <dataset_id> --name <name>
```

2 changes: 1 addition & 1 deletion skills/hotdata/SKILL.md
@@ -92,7 +92,7 @@ Catalog, skill decision tree, epic flows (onboard, chain, retrieval), and datase

Top-level subcommands (each detailed below): **`auth`**, **`datasets`**, **`query`**, **`workspaces`**, **`connections`**, **`databases`**, **`tables`**, **`skills`**, **`results`**, **`jobs`**, **`indexes`**, **`embedding-providers`**, **`search`**, **`queries`**, **`context`**, **`completions`**. Search, indexes (bm25/vector), and embedding providers are documented in **`hotdata-search`**; query history, results, Chain, and OLAP patterns in **`hotdata-analytics`**.

Global CLI options: **`--api-key`**, **`-v` / `--version`**, **`-h` / `--help`**. Hidden developer flag: **`--debug`** (verbose HTTP logs).
Global CLI options: **`--api-key`**, **`-v` / `--version`**, **`-h` / `--help`**, **`--no-input`** (disable interactive prompts; commands that require input will error instead — useful in CI or non-TTY environments). Hidden developer flag: **`--debug`** (verbose HTTP logs).

### List Workspaces
```
29 changes: 15 additions & 14 deletions skills/hotdata/references/WORKFLOWS.md
@@ -11,7 +11,7 @@ Load **`hotdata`** first for auth and workspace setup. Add a sub-skill only when
| User goal | Skill | Key commands |
|-----------|--------|----------------|
| Login, workspaces, connections, tables, context | **`hotdata`** | `auth`, `workspaces`, `connections`, `tables`, `context` |
| Upload CSV/JSON/URL or SQL-derived tables | **`hotdata`** | `datasets create`, `databases …` (see below) |
| Load parquet files or materialize SQL tables | **`hotdata`** | `databases create` + `databases load`, `datasets create --sql` |
| SQL analytics, aggregations, history, Chain | **`hotdata-analytics`** | `query`, `queries`, `results`, `datasets create --sql` |
| BM25 / vector search, retrieval indexes | **`hotdata-search`** | `search`, `indexes create`, `embedding-providers` |
| Geospatial / PostGIS-style SQL | **`hotdata-geospatial`** | `query` with `ST_*`, WKB columns |
@@ -51,8 +51,8 @@ End-to-end checklists. Use the linked sections for command detail and guardrails

1. [ ] Run base SQL: `hotdata query "SELECT …"` — poll `hotdata query status <id>` if async
2. [ ] Materialize one way:
- [ ] **Dataset:** `hotdata datasets create --label "…" --sql "SELECT …" [--table-name …]`
- [ ] **Managed DB:** `hotdata databases create --name … --table ` then `hotdata databases tables load --file ./….parquet`
- [ ] **Dataset:** `hotdata datasets create --name <name> [--description "…"] --sql "SELECT …"`
- [ ] **Managed DB:** `hotdata databases create --catalog <alias> --table <name>` then `hotdata databases load --catalog <alias> --table <name> --file ./….parquet`
3. [ ] Copy **`full_name`** from create output (or `datasets list` **FULL NAME**)
4. [ ] Chain: `hotdata query "SELECT … FROM <full_name> WHERE …"`
5. [ ] Record stable chains in **context:DATAMODEL** when they should outlive the session
@@ -84,26 +84,27 @@ Both land queryable tables in the workspace; the path depends on **format** and

| | **Datasets** | **Managed databases** |
|---|-------------|------------------------|
| **Best for** | CSV, JSON, URL import, stdin, SQL/query snapshot | Parquet files you own; catalog-style `name.schema.table` |
| **SQL prefix** | `datasets.<schema>.<table>` (often `datasets.main.*`) | `<database>.<schema>.<table>` (database = connection name) |
| **CLI** | `hotdata datasets create` | `hotdata databases create` + `databases tables load` |
| **Declare schema up front** | No | Yes — `--table` on create (required before load on current API) |
| **Parquet** | Yes (`--file`, `--url`, `--upload-id`) | **Only** parquet on `tables load` |
| **Refresh upstream** | `datasets refresh` (URL/query sources) | Replace via `tables load` again |
| **Best for** | SQL or saved-query snapshot | Parquet files you own; catalog-style `alias.schema.table` |
| **SQL prefix** | `datasets.<schema>.<table>` (often `datasets.main.*`) | `<catalog>.<schema>.<table>` where catalog = `--catalog` alias |
| **CLI** | `hotdata datasets create --sql “…”` | `hotdata databases create --catalog` + `databases load` |

Contributor

super nit: curly quotes “…” inside this inline-code command read as literal characters and aren't valid shell quoting if copied. Straight quotes match the rest of the command examples in this doc. (not blocking)

Suggested change
| **CLI** | `hotdata datasets create --sql “…”` | `hotdata databases create --catalog` + `databases load` |
| **CLI** | `hotdata datasets create --sql "…"` | `hotdata databases create --catalog` + `databases load` |
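
The quoting point in this comment can be checked directly in a shell. The sketch below uses a hypothetical `count_args` function in place of the real CLI, and `SELECT 1` as a placeholder query, to show how many arguments each quoting style produces.

```bash
# Hypothetical stand-in for a CLI: prints how many arguments it received.
count_args() { echo "$#"; }

# Curly quotes are ordinary characters to the shell, so the space still splits words.
curly=$(count_args --sql “SELECT 1”)

# Straight double quotes group the query into a single argument.
straight=$(count_args --sql "SELECT 1")

echo "curly: $curly, straight: $straight"
# prints "curly: 3, straight: 2"
```

With curly quotes the query is split at the space and the quote characters are passed through literally, which is why a copied command would not behave as intended.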

| **Declare schema up front** | No | Yes — `--table` on create (auto-declared on first `databases load`) |
| **Parquet file uploads** | Not supported via CLI | `databases load --file` / `--url` / `--upload-id` |
| **Refresh** | `datasets refresh` (re-runs source query) | Replace via `databases load` again |

**Rule of thumb:** CSV/JSON or “upload a file from a URL” → **datasets**. Parquet catalog you control as **`mydb.public.orders`** → **databases**.
**Rule of thumb:** SQL or saved-query materialization → **datasets**. Parquet files you control as **`mydb.public.orders`** → **databases**.

### Workflow: dataset upload and query

1. Authenticate and set workspace (`hotdata auth`, `hotdata workspaces set` if needed).
2. Create the dataset (one source):
2. Create the dataset — `--name` is the SQL table name (required); `--description` is the display label (optional):

```bash
hotdata datasets create --label "Orders" --file ./orders.csv
# or: --url "https://example.com/orders.parquet"
# or: --sql "SELECT ..." # materialize from a query
hotdata datasets create --name orders --sql "SELECT ..."
# or: --query-id <saved_query_id>
```

For parquet file uploads use **managed databases** instead (see below).

3. Note the printed **`full_name`** (e.g. `datasets.main.orders`) — do not assume `datasets.main`.
4. Inspect if needed: `hotdata datasets list`, `hotdata datasets <dataset_id>`.
5. Query: