From 022868e03875c36fd04f5944e2eb6bfa940936cc Mon Sep 17 00:00:00 2001 From: Eddie A Tejeda <669988+eddietejeda@users.noreply.github.com> Date: Wed, 22 Apr 2026 20:47:08 -0700 Subject: [PATCH 1/3] docs(skill): align Hotdata SKILL with current CLI flags - Document -o/--output instead of nonexistent --format - Add datasets create --upload-id and --format - Document HOTDATA_WORKSPACE lock, queries without -w, jobs list pagination - Mention global --debug; clarify workspaces list default marker --- skills/hotdata/SKILL.md | 39 ++++++++++++++++++++++++++------------- 1 file changed, 26 insertions(+), 13 deletions(-) diff --git a/skills/hotdata/SKILL.md b/skills/hotdata/SKILL.md index c94d3d6..b7e6c44 100644 --- a/skills/hotdata/SKILL.md +++ b/skills/hotdata/SKILL.md @@ -25,9 +25,17 @@ API key resolution (lowest to highest priority): API URL defaults to `https://api.hotdata.dev/v1` or overridden via `HOTDATA_API_URL`. +Optional: pass **`--debug`** on any command to print verbose HTTP request/response details. + ## Workspace ID -All commands that accept `--workspace-id` are optional. If omitted, the active workspace is used. Use `hotdata workspaces set` to switch the active workspace interactively, or pass a workspace ID directly: `hotdata workspaces set `. The active workspace is shown with a `*` marker in `hotdata workspaces list`. **Omit `--workspace-id` unless you need to target a specific workspace.** +Commands that accept `-w` / `--workspace-id` default to the active workspace from config when omitted. Use `hotdata workspaces set` to switch interactively, or `hotdata workspaces set ` for a direct choice. In `hotdata workspaces list`, the `*` marker labels the **default** workspace the CLI resolves to. + +**`hotdata queries` does not take `-w`:** query run history always uses the active workspace—set it with `workspaces set` first if needed. + +If **`HOTDATA_WORKSPACE`** is set in the environment, the workspace is **locked** to that value: passing a different `-w` / `--workspace-id` is an error, and **`hotdata workspaces set` fails** (“workspace is locked”). **`workspaces set` is also blocked** while the current process was started under **`hotdata sandbox run`** (nested workspace changes are not allowed in that tree). + +**Omit `-w` / `--workspace-id` unless you need to target a specific workspace** (and it is not locked by env or session). ## Workspace context (API) @@ -60,9 +68,9 @@ Full step-by-step procedures: [references/WORKFLOWS.md](references/WORKFLOWS.md) ### List Workspaces ``` -hotdata workspaces list [--format table|json|yaml] +hotdata workspaces list [-o table|json|yaml] ``` -Returns workspaces with `public_id`, `name`, `active`, `favorite`, `provision_status`. +Returns workspaces with `public_id`, `name`, `active`, `favorite`, `provision_status`. Table output marks the default workspace with `*`. ### List Connections ``` @@ -83,15 +91,15 @@ hotdata connections refresh [-w ] #### Step 1 — Discover available connection types ``` -hotdata connections create list [--workspace-id ] [--format table|json|yaml] +hotdata connections create list [-w ] [-o table|json|yaml] ``` Returns all available connection types with `name` and `label`. #### Step 2 — Inspect the schema for a specific type ``` -hotdata connections create list [--workspace-id ] [--format json] +hotdata connections create list [-w ] [-o json] ``` -Returns `config` and `auth` JSON Schema objects describing all required and optional fields for that connection type. Use `--format json` to get the full schema detail. 
+Returns `config` and `auth` JSON Schema objects describing all required and optional fields for that connection type. Use **`-o json`** to get the full schema detail. - `config` — connection configuration fields (host, port, database, etc.). May be `null` for services that need no configuration. - `auth` — authentication fields (password, token, credentials, etc.). May be `null` for services that need no authentication. May be a `oneOf` with multiple authentication method options. @@ -140,7 +148,7 @@ hotdata connections create \ ### List Tables and Columns ``` -hotdata tables list [--workspace-id ] [--connection-id ] [--schema ] [--table ] [--limit ] [--cursor ] [--format table|json|yaml] +hotdata tables list [-w ] [-c ] [--schema ] [--table ] [--limit ] [--cursor ] [-o table|json|yaml] ``` - Default format is `table`. - **Always use this command to inspect available tables and columns.** Do NOT use the `query` command to query `information_schema` for this purpose. @@ -156,7 +164,7 @@ Datasets are managed files uploaded to Hotdata and queryable as tables. #### List datasets ``` -hotdata datasets list [--workspace-id ] [--limit ] [--offset ] [--format table|json|yaml] +hotdata datasets list [-w ] [--limit ] [--offset ] [-o table|json|yaml] ``` - Default format is `table`. - Returns `id`, `label`, `table_name`, `created_at`. @@ -164,7 +172,7 @@ hotdata datasets list [--workspace-id ] [--limit ] [--offset #### Get dataset details ``` -hotdata datasets [--workspace-id ] [--format table|json|yaml] +hotdata datasets [-w ] [-o table|json|yaml] ``` - Shows dataset metadata and a full column listing with `name`, `data_type`, `nullable`. - Use this to inspect schema before querying. @@ -175,12 +183,14 @@ hotdata datasets create --label "My Dataset" --file data.csv [--table-name my_da hotdata datasets create --label "My Dataset" --sql "SELECT * FROM ..." [--table-name my_dataset] [--workspace-id ] hotdata datasets create --label "My Dataset" --query-id [--table-name my_dataset] [--workspace-id ] hotdata datasets create --label "My Dataset" --url "https://example.com/data.parquet" [--table-name my_dataset] [--workspace-id ] +hotdata datasets create --label "My Dataset" --upload-id [--format csv|json|parquet] [--table-name my_dataset] [-w ] ``` - `--file` uploads a local file. Omit to pipe data via stdin: `cat data.csv | hotdata datasets create --label "My Dataset"` - `--sql` creates a dataset from a SQL query result. - `--query-id` creates a dataset from a previously saved query. - `--url` imports data directly from a URL (supports csv, json, parquet). -- `--file`, `--sql`, `--query-id`, and `--url` are mutually exclusive. +- `--upload-id` uses an upload the API already accepted; **`--format`** (default `csv`) applies only with `--upload-id`. +- `--file`, `--sql`, `--query-id`, `--url`, and `--upload-id` are mutually exclusive. - Format is auto-detected from file extension (`.csv`, `.json`, `.parquet`) or file content. - `--label` is optional when `--file` is provided — defaults to the filename without extension. Required for `--sql` and `--query-id`. - `--table-name` is optional — derived from the label if omitted. @@ -251,11 +261,14 @@ hotdata results [-w ] [-o table|json|csv] hotdata queries list [--limit ] [--cursor ] [--status ] [-o table|json|yaml] hotdata queries [-o table|json|yaml] ``` +These commands use the **active workspace only** (there is no `-w` / `--workspace-id` on `queries`); set the default workspace with `workspaces set` if needed. 
- `list` shows query runs with status, creation time, duration, row count, and a truncated SQL preview (default limit 20). - `--status` filters by run status (comma-separated, e.g. `--status running,failed`). - View a run by ID to see full metadata (timings, `result_id`, snapshot, hashes) and the formatted, syntax-highlighted SQL. - If a run has a `result_id`, fetch its rows with `hotdata results `. +To create a dataset from a **saved query** still registered for the workspace, use **`hotdata datasets create --query-id `** (this CLI does not expose separate saved-query create/run subcommands). + ### Search ``` # BM25 full-text search @@ -286,8 +299,8 @@ hotdata indexes create -c --schema --table --na ### Jobs ``` -hotdata jobs list [--workspace-id ] [--job-type ] [--status ] [--all] [--format table|json|yaml] -hotdata jobs [--workspace-id ] [--format table|json|yaml] +hotdata jobs list [-w ] [--job-type ] [--status ] [--all] [--limit ] [--offset ] [-o table|json|yaml] +hotdata jobs [-w ] [-o table|json|yaml] ``` - `list` shows only active jobs (`pending`, `running`) by default. Use `--all` to see all jobs. - `--job-type`: `data_refresh_table`, `data_refresh_connection`, `create_index`. @@ -395,7 +408,7 @@ Other commands (not covered in detail above): `hotdata connections new` (interac ``` 2. Inspect the schema for the desired type: ``` - hotdata connections create list --format json + hotdata connections create list -o json ``` 3. Collect required config and auth field values from the user or environment. **Never hardcode credentials — use env vars or files.** 4. Create the connection: From 683513aeb79a7c5006a4a913936348b0b78821ac Mon Sep 17 00:00:00 2001 From: Eddie A Tejeda <669988+eddietejeda@users.noreply.github.com> Date: Wed, 22 Apr 2026 21:25:44 -0700 Subject: [PATCH 2/3] docs(skill): sandbox datasets, long flags, and WORKFLOWS - SKILL: document sandbox run vs active sandbox, datasets schema (datasets. vs main), full_name/FULL NAME, list scope, sandbox context for queries, PostgreSQL quoted identifiers - SKILL: use long CLI flags (--workspace-id, --output, --connection-id) - WORKFLOWS: new Sandboxes and datasets section; align Chain, Model refresh, History, Indexes examples; cross-cutting pointers --- skills/hotdata/SKILL.md | 103 ++++++++++++++----------- skills/hotdata/references/WORKFLOWS.md | 53 +++++++++---- 2 files changed, 97 insertions(+), 59 deletions(-) diff --git a/skills/hotdata/SKILL.md b/skills/hotdata/SKILL.md index b7e6c44..c6fe9c7 100644 --- a/skills/hotdata/SKILL.md +++ b/skills/hotdata/SKILL.md @@ -29,13 +29,13 @@ Optional: pass **`--debug`** on any command to print verbose HTTP request/respon ## Workspace ID -Commands that accept `-w` / `--workspace-id` default to the active workspace from config when omitted. Use `hotdata workspaces set` to switch interactively, or `hotdata workspaces set ` for a direct choice. In `hotdata workspaces list`, the `*` marker labels the **default** workspace the CLI resolves to. +Commands that accept `--workspace-id` default to the active workspace from config when omitted. Use `hotdata workspaces set` to switch interactively, or `hotdata workspaces set ` for a direct choice. In `hotdata workspaces list`, the `*` marker labels the **default** workspace the CLI resolves to. -**`hotdata queries` does not take `-w`:** query run history always uses the active workspace—set it with `workspaces set` first if needed. 
+**`hotdata queries` does not accept `--workspace-id`:** query run history always uses the active workspace—set it with `workspaces set` first if needed. -If **`HOTDATA_WORKSPACE`** is set in the environment, the workspace is **locked** to that value: passing a different `-w` / `--workspace-id` is an error, and **`hotdata workspaces set` fails** (“workspace is locked”). **`workspaces set` is also blocked** while the current process was started under **`hotdata sandbox run`** (nested workspace changes are not allowed in that tree). +If **`HOTDATA_WORKSPACE`** is set in the environment, the workspace is **locked** to that value: passing a different `--workspace-id` is an error, and **`hotdata workspaces set` fails** (“workspace is locked”). **`workspaces set` is also blocked** while the current process was started under **`hotdata sandbox run`** (nested workspace changes are not allowed in that tree). -**Omit `-w` / `--workspace-id` unless you need to target a specific workspace** (and it is not locked by env or session). +**Omit `--workspace-id` unless you need to target a specific workspace** (and it is not locked by env or session). ## Workspace context (API) @@ -68,21 +68,21 @@ Full step-by-step procedures: [references/WORKFLOWS.md](references/WORKFLOWS.md) ### List Workspaces ``` -hotdata workspaces list [-o table|json|yaml] +hotdata workspaces list [--output table|json|yaml] ``` Returns workspaces with `public_id`, `name`, `active`, `favorite`, `provision_status`. Table output marks the default workspace with `*`. ### List Connections ``` -hotdata connections list [-w ] [-o table|json|yaml] -hotdata connections [-w ] [-o table|json|yaml] +hotdata connections list [--workspace-id ] [--output table|json|yaml] +hotdata connections [--workspace-id ] [--output table|json|yaml] ``` - `list` returns `id`, `name`, `source_type` for each connection. - Pass a connection ID to view details (id, name, source type, table counts). ### Refresh connection schema ``` -hotdata connections refresh [-w ] +hotdata connections refresh [--workspace-id ] ``` - Refreshes the connection’s catalog so new or changed tables and columns appear in `hotdata tables list` and queries. - Use after DDL or other changes in the source database when the workspace view is stale. @@ -91,15 +91,15 @@ hotdata connections refresh [-w ] #### Step 1 — Discover available connection types ``` -hotdata connections create list [-w ] [-o table|json|yaml] +hotdata connections create list [--workspace-id ] [--output table|json|yaml] ``` Returns all available connection types with `name` and `label`. #### Step 2 — Inspect the schema for a specific type ``` -hotdata connections create list [-w ] [-o json] +hotdata connections create list [--workspace-id ] [--output json] ``` -Returns `config` and `auth` JSON Schema objects describing all required and optional fields for that connection type. Use **`-o json`** to get the full schema detail. +Returns `config` and `auth` JSON Schema objects describing all required and optional fields for that connection type. Use **`--output json`** to get the full schema detail. - `config` — connection configuration fields (host, port, database, etc.). May be `null` for services that need no configuration. - `auth` — authentication fields (password, token, credentials, etc.). May be `null` for services that need no authentication. May be a `oneOf` with multiple authentication method options. 
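A minimal sketch of the two discovery steps above. The `postgres` type name is only an assumption for illustration — substitute a `name` value returned by step 1, passed as the type argument that step 2 documents; the flags are the ones shown above.

```bash
# Step 1 — list available connection types (name + label)
hotdata connections create list --output table

# Step 2 — dump the config/auth JSON Schema for one type
# ("postgres" is a hypothetical type name; use a name from step 1)
hotdata connections create list postgres --output json
```

The required fields reported in the returned `config` and `auth` schemas are the values you then collect and pass to `hotdata connections create`.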
@@ -148,7 +148,7 @@ hotdata connections create \ ### List Tables and Columns ``` -hotdata tables list [-w ] [-c ] [--schema ] [--table ] [--limit ] [--cursor ] [-o table|json|yaml] +hotdata tables list [--workspace-id ] [--connection-id ] [--schema ] [--table ] [--limit ] [--cursor ] [--output table|json|yaml] ``` - Default format is `table`. - **Always use this command to inspect available tables and columns.** Do NOT use the `query` command to query `information_schema` for this purpose. @@ -164,18 +164,20 @@ Datasets are managed files uploaded to Hotdata and queryable as tables. #### List datasets ``` -hotdata datasets list [-w ] [--limit ] [--offset ] [-o table|json|yaml] +hotdata datasets list [--workspace-id ] [--limit ] [--offset ] [--output table|json|yaml] ``` - Default format is `table`. -- Returns `id`, `label`, `table_name`, `created_at`. +- Returns `id`, `label`, and `created_at`; table output includes a **`FULL NAME`** column (`datasets..
`). - Results are paginated (default 100). Use `--offset` to fetch further pages. +- **There is no filter for “this sandbox only.”** `datasets list` always returns **all** datasets in the workspace. To tell sandbox-scoped datasets from workspace-wide ones, read **`FULL NAME`**: the middle segment is the sandbox id (e.g. `datasets.s_ufmblmvq.tac_csat`) for sandbox data, and usually **`main`** (e.g. `datasets.main.my_table`) for ordinary uploads. #### Get dataset details ``` -hotdata datasets [-w ] [-o table|json|yaml] +hotdata datasets [--workspace-id ] [--output table|json|yaml] ``` - Shows dataset metadata and a full column listing with `name`, `data_type`, `nullable`. - Use this to inspect schema before querying. +- For the **qualified SQL name**, prefer **`FULL NAME` from `datasets list`** or the **`full_name` printed by `datasets create`**—especially for sandbox datasets, where the schema is **`datasets.`**, not `datasets.main`. #### Create a dataset ``` @@ -183,7 +185,7 @@ hotdata datasets create --label "My Dataset" --file data.csv [--table-name my_da hotdata datasets create --label "My Dataset" --sql "SELECT * FROM ..." [--table-name my_dataset] [--workspace-id ] hotdata datasets create --label "My Dataset" --query-id [--table-name my_dataset] [--workspace-id ] hotdata datasets create --label "My Dataset" --url "https://example.com/data.parquet" [--table-name my_dataset] [--workspace-id ] -hotdata datasets create --label "My Dataset" --upload-id [--format csv|json|parquet] [--table-name my_dataset] [-w ] +hotdata datasets create --label "My Dataset" --upload-id [--format csv|json|parquet] [--table-name my_dataset] [--workspace-id ] ``` - `--file` uploads a local file. Omit to pipe data via stdin: `cat data.csv | hotdata datasets create --label "My Dataset"` - `--sql` creates a dataset from a SQL query result. @@ -194,28 +196,34 @@ hotdata datasets create --label "My Dataset" --upload-id [--format c - Format is auto-detected from file extension (`.csv`, `.json`, `.parquet`) or file content. - `--label` is optional when `--file` is provided — defaults to the filename without extension. Required for `--sql` and `--query-id`. - `--table-name` is optional — derived from the label if omitted. +- After **`datasets create`**, the CLI prints a **`full_name`** line (for example `datasets.main.my_table` or `datasets.s_ufmblmvq.tac_csat`). **Always use that `full_name` in SQL**—do not assume `datasets.main`. #### Querying datasets -Datasets are queryable using the catalog `datasets` and schema `main`. Always reference dataset tables as: +Workspace-scoped datasets (created **outside** a sandbox, or the usual “main” catalog path) are referenced as **`datasets.main.`**. + +**Sandbox-created datasets** use the **sandbox id as the schema**, not `main`, for example: ``` -datasets.main. +datasets.. ``` -For example: +(e.g. `datasets.s_ufmblmvq.tac_csat`). The create output’s **`full_name`** is authoritative—copy it into `FROM` / `JOIN` clauses instead of guessing `datasets.main.…`. + +Example (workspace dataset on `main`): ``` hotdata query "SELECT * FROM datasets.main.my_dataset LIMIT 10" ``` -Use `hotdata datasets ` to look up the `table_name` before writing queries. + +Use `hotdata datasets ` to inspect schema and names before writing queries. ### Workspace context (named Markdown) Reads and writes workspace **context API** documents. **`show`** needs no local file; **`push`** / **`pull`** use **`./.md`** in the current directory only as the CLI transport format. 
See [Workspace context (API)](#workspace-context-api). ``` -hotdata context list [-w ] [--prefix ] [-o table|json|yaml] -hotdata context show [-w ] -hotdata context pull [-w ] [--force] [--dry-run] -hotdata context push [-w ] [--dry-run] +hotdata context list [--workspace-id ] [--prefix ] [--output table|json|yaml] +hotdata context show [--workspace-id ] +hotdata context pull [--workspace-id ] [--force] [--dry-run] +hotdata context push [--workspace-id ] [--dry-run] ``` - `list` — names, `updated_at`, and character counts for each stored context. Use `--prefix` to narrow names (case-sensitive). @@ -227,13 +235,13 @@ hotdata context push [-w ] [--dry-run] ### Execute SQL Query ``` -hotdata query "" [-w ] [--connection ] [-o table|json|csv] -hotdata query status [-o table|json|csv] +hotdata query "" [--workspace-id ] [--connection ] [--output table|json|csv] +hotdata query status [--output table|json|csv] ``` - Default output is `table`, which prints results with row count and execution time. - Use `--connection` to scope the query to a specific connection. - Use `hotdata tables list` to discover tables and columns — do not query `information_schema` directly. -- **Always use PostgreSQL dialect SQL.** +- **Always use PostgreSQL dialect SQL.** Column names that are **not** all-lowercase (e.g. from CSV headers like `CustomerName`) are **case-sensitive**; quote them with **double quotes** in SQL, e.g. `"CustomerName"`. - Long-running queries automatically fall back to async execution and return a `query_run_id`. - Use `hotdata query status ` to poll for results. - Exit codes for `query status`: `0` = succeeded, `1` = failed, `2` = still running (poll again). @@ -242,7 +250,7 @@ hotdata query status [-o table|json|csv] ### Query results #### List stored results ``` -hotdata results list [-w ] [--limit ] [--offset ] [-o table|json|yaml] +hotdata results list [--workspace-id ] [--limit ] [--offset ] [--output table|json|yaml] ``` - Lists recent stored query results with `id`, `status`, and `created_at`. - Results are paginated; when more are available, the CLI prints a hint with the next `--offset`. @@ -250,7 +258,7 @@ hotdata results list [-w ] [--limit ] [--offset ] [-o ta #### Get result by ID ``` -hotdata results [-w ] [-o table|json|csv] +hotdata results [--workspace-id ] [--output table|json|csv] ``` - Retrieves a previously executed query result by its result ID. - Query output also includes a `result-id` in the footer (e.g. `[result-id: rslt...]`). @@ -258,10 +266,10 @@ hotdata results [-w ] [-o table|json|csv] ### Query Run History ``` -hotdata queries list [--limit ] [--cursor ] [--status ] [-o table|json|yaml] -hotdata queries [-o table|json|yaml] +hotdata queries list [--limit ] [--cursor ] [--status ] [--output table|json|yaml] +hotdata queries [--output table|json|yaml] ``` -These commands use the **active workspace only** (there is no `-w` / `--workspace-id` on `queries`); set the default workspace with `workspaces set` if needed. +These commands use the **active workspace only** (the `queries` command has no `--workspace-id` flag); set the default workspace with `workspaces set` if needed. - `list` shows query runs with status, creation time, duration, row count, and a truncated SQL preview (default limit 20). - `--status` filters by run status (comma-separated, e.g. `--status running,failed`). - View a run by ID to see full metadata (timings, `result_id`, snapshot, hashes) and the formatted, syntax-highlighted SQL. 
@@ -272,7 +280,7 @@ To create a dataset from a **saved query** still registered for the workspace, u ### Search ``` # BM25 full-text search -hotdata search "query text" --table --column [--select ] [--limit ] [-o table|json|csv] +hotdata search "query text" --table --column [--select ] [--limit ] [--output table|json|csv] # Vector search with --model (calls OpenAI to embed the query) hotdata search "query text" --table
--column --model text-embedding-3-small [--limit ] @@ -290,8 +298,8 @@ echo '[0.1, -0.2, ...]' | hotdata search --table
--column --schema --table
[-w <workspace_id>] [-o table|json|yaml]
-hotdata indexes create -c <connection_id> --schema <schema> --table <table> --name <name> --columns <columns> [--type sorted|bm25|vector] [--metric l2|cosine|dot] [--async]
+hotdata indexes list --connection-id <connection_id> --schema <schema> --table <table> [--workspace-id <workspace_id>] [--output table|json|yaml]
+hotdata indexes create --connection-id <connection_id> --schema <schema> --table <table> --name <name> --columns <columns> [--workspace-id <workspace_id>] [--type sorted|bm25|vector] [--metric l2|cosine|dot] [--async]
```
- `list` shows indexes on a table with name, type, columns, status, and creation date.
- `create` creates an index. Use `--type bm25` for full-text search, `--type vector` for vector search (requires `--metric`).

@@ -299,8 +307,8 @@ hotdata indexes create -c --schema --table
--na ### Jobs ``` -hotdata jobs list [-w ] [--job-type ] [--status ] [--all] [--limit ] [--offset ] [-o table|json|yaml] -hotdata jobs [-w ] [-o table|json|yaml] +hotdata jobs list [--workspace-id ] [--job-type ] [--status ] [--all] [--limit ] [--offset ] [--output table|json|yaml] +hotdata jobs [--workspace-id ] [--output table|json|yaml] ``` - `list` shows only active jobs (`pending`, `running`) by default. Use `--all` to see all jobs. - `--job-type`: `data_refresh_table`, `data_refresh_connection`, `create_index`. @@ -318,15 +326,17 @@ hotdata auth logout # Remove saved auth for the default profile Sandboxes are for **ad-hoc, exploratory work** that does not need to be long-lived. They group related CLI activity (queries, dataset operations, etc.) under a single context so it can be tracked and cleaned up together. **Datasets created inside a sandbox are tied to that sandbox and will be removed when the sandbox ends.** If you need data to persist beyond the sandbox, create datasets outside of a sandbox context. +**Active sandbox in config vs `sandbox run`:** If you already have the right sandbox selected (`hotdata sandbox new` or `hotdata sandbox set ` shows it with `*` in `sandbox list`), run follow-up commands **directly** (`hotdata datasets create …`, `hotdata query …`, etc.). The CLI attaches the sandbox from saved config to API requests. **`hotdata sandbox run ` with no sandbox ID before `run` always creates a brand-new sandbox** and runs the child under that new ID—it does **not** reuse the active sandbox from config. To wrap a command in an **existing** sandbox, use **`hotdata sandbox run [args…]`**. + > **IMPORTANT: If `HOTDATA_SANDBOX` is set in the environment, you are inside an active sandbox. NEVER attempt to unset, override, or work around this variable. Do not clear it, do not start a new sandbox, do not run `sandbox run` or `sandbox new` or `sandbox set`. All your work should be attributed to the current sandbox. Attempting to nest or escape a sandbox will fail with an error.** ``` -hotdata sandbox list [-w ] [-o table|json|yaml] -hotdata sandbox [-w ] [-o table|json|yaml] -hotdata sandbox new [--name "Sandbox Name"] [-o table|json|yaml] +hotdata sandbox list [--workspace-id ] [--output table|json|yaml] +hotdata sandbox [--workspace-id ] [--output table|json|yaml] +hotdata sandbox new [--name "Sandbox Name"] [--output table|json|yaml] hotdata sandbox set [] hotdata sandbox read -hotdata sandbox update [] [--name "New Name"] [--markdown "..."] [-o table|json|yaml] +hotdata sandbox update [] [--name "New Name"] [--markdown "..."] [--output table|json|yaml] hotdata sandbox run [args...] hotdata sandbox run [args...] ``` @@ -336,8 +346,10 @@ hotdata sandbox run [args...] - `set` switches the active sandbox. Omit the ID to clear. Blocked inside an existing sandbox. - `read` prints the markdown content of the current sandbox. Use this to retrieve sandbox state at the start of work or between steps. - `update` modifies a sandbox's name or markdown. Defaults to the active sandbox if no ID is given. The `--markdown` field is for writing details about the work being done in the sandbox — observations, intermediate findings, next steps, etc. This state persists for the life of the sandbox and is the primary way to record context that should survive across commands or agent invocations within the sandbox. -- `run` launches a command with `HOTDATA_SANDBOX` and `HOTDATA_WORKSPACE` set in the child process environment. Creates a new sandbox unless a sandbox ID is provided before `run`. 
Blocked inside an existing sandbox. -- When inside a sandbox (HOTDATA_SANDBOX is set), all API requests automatically include the sandbox ID — no extra flags needed. +- `run` launches a command with `HOTDATA_SANDBOX` and `HOTDATA_WORKSPACE` set in the child process environment. **`hotdata sandbox run `** (no ID before `run`) **always POSTs a new sandbox**; it never picks up the active sandbox from `sandbox set` / `sandbox new`. Use **`hotdata sandbox run `** to run under an existing sandbox. Blocked inside an existing sandbox. +- When `HOTDATA_SANDBOX` is set **or** a sandbox is the saved default (`sandbox new` / `sandbox set`), the CLI includes sandbox scope on API calls — no extra sandbox flags on `query`, `datasets`, etc. + +**Sandbox-scoped data access:** Queries and other operations against **sandbox-only** resources must run with sandbox context attached—either the **active sandbox** in config (`sandbox set`) or a child process started with **`hotdata sandbox run …`** (which sets `HOTDATA_SANDBOX`). Running `hotdata query` or similar **with no sandbox in config and not under `sandbox … run`** can produce **access denied** for tables or datasets that exist only inside a sandbox. #### Example: Building a data model in a sandbox @@ -395,9 +407,10 @@ Other commands (not covered in detail above): `hotdata connections new` (interac ``` hotdata tables list --connection-id ``` -4. Run SQL: +4. Run SQL, quoting **mixed-case or upper-case** column names with **double quotes** (PostgreSQL treats unquoted identifiers as lowercased): ``` hotdata query "SELECT 1" + hotdata query "SELECT \"CustomerName\" FROM datasets.main.my_csv LIMIT 10" ``` ## Workflow: Creating a Connection @@ -408,7 +421,7 @@ Other commands (not covered in detail above): `hotdata connections new` (interac ``` 2. Inspect the schema for the desired type: ``` - hotdata connections create list -o json + hotdata connections create list --output json ``` 3. Collect required config and auth field values from the user or environment. **Never hardcode credentials — use env vars or files.** 4. Create the connection: diff --git a/skills/hotdata/references/WORKFLOWS.md b/skills/hotdata/references/WORKFLOWS.md index 2d269ee..b651b04 100644 --- a/skills/hotdata/references/WORKFLOWS.md +++ b/skills/hotdata/references/WORKFLOWS.md @@ -1,6 +1,6 @@ # Hotdata CLI workflows -Procedures for **Model**, **History**, **Chain**, and **Indexes**. These compose existing `hotdata` commands; they are not separate subcommands. +Procedures for **Model**, **History**, **Chain**, **Indexes**, and **sandboxes with datasets** (see **Sandboxes and datasets**). These compose existing `hotdata` commands; they are not separate subcommands. ## Where things live @@ -8,7 +8,7 @@ Procedures for **Model**, **History**, **Chain**, and **Indexes**. These compose |--------|----------| | **Model** | **Workspace context API** — stem **`DATAMODEL`** (`hotdata context show DATAMODEL`, `context push` / `pull` with `./DATAMODEL.md` in the project cwd only as the CLI file surface). Never store workspace-specific model text inside agent skill directories. | | **History** | `hotdata queries list` / `queries ` for query runs (execution history); `hotdata results list` / `results ` for row data. | -| **Chain** | Intermediate tables in **`datasets.main.*`**; document stable chains in **workspace context `DATAMODEL`** under **Derived tables (Chain)**. | +| **Chain** | Intermediate tables in **`datasets..
`** — usually **`datasets.main.*`** for workspace-wide materializations; **sandbox uploads** use **`datasets.<sandbox_id>.*`** (see **Sandboxes and datasets** below). Document stable chains in **workspace context `DATAMODEL`** under **Derived tables (Chain)**. |
| **Indexes** | Recommendations and live objects in Hotdata (`indexes list` / `indexes create`). Record rationale in **`DATAMODEL`** (e.g. Search & index summary) or a dedicated context stem if you split concerns. |

---

@@ -42,6 +42,8 @@ hotdata datasets list
hotdata datasets <dataset_id>   # schema detail per dataset
```
+`datasets list` returns **every** dataset in the workspace (no sandbox-only filter). Use the **`FULL NAME`** column (`datasets.<schema>.<table_name>
`): **`main`** in the middle segment is the usual workspace catalog; a value like **`s_…`** is the **sandbox id** for sandbox-scoped datasets. + Use output to update **Connections**, **Tables**, **Columns**, and **Datasets** in **workspace context `DATAMODEL`** (edit via `./DATAMODEL.md` + `hotdata context push DATAMODEL`, or your editor workflow). Optional: small exploratory queries once names are known: ```bash @@ -52,6 +54,22 @@ hotdata query "SELECT * FROM ..
LIMIT 5" --- +## Sandboxes and datasets + +Use this when work is isolated in a **sandbox** (exploratory runs, ephemeral datasets). + +**Active sandbox vs `sandbox run`:** After `hotdata sandbox new` or `hotdata sandbox set `, run **`hotdata datasets create`**, **`hotdata query`**, etc. **directly** — the CLI attaches the sandbox from saved config. **`hotdata sandbox run `** (no sandbox id before `run`) **always creates a new sandbox**; it does **not** reuse the active one. To wrap a command in an **existing** sandbox, use **`hotdata sandbox run [args…]`**. + +**Qualified table names:** Workspace-wide dataset tables are typically **`datasets.main.`**. Datasets created **inside** a sandbox use **`datasets..`**, not `main`. After **`datasets create`**, use the printed **`full_name`**; after **`datasets list`**, use the **`FULL NAME`** column — do not assume `datasets.main` for sandbox data. + +**Access:** Queries against sandbox-only tables need sandbox context: **active sandbox in config** (`sandbox set`) **or** commands run under **`hotdata sandbox run …`**. Otherwise you may see **access denied**. + +**Listing:** `datasets list` does not filter by sandbox; use **`FULL NAME`** to distinguish `…main…` from `…s_…` rows. + +**SQL:** Column names from uploads that are not all-lowercase are **case-sensitive** in PostgreSQL; quote with double quotes (e.g. `"CustomerName"`). + +--- + ## History **Goal:** Find prior work: query runs (execution history) and stored result rows. @@ -68,8 +86,8 @@ hotdata queries ### Results ```bash -hotdata results list [-w ] [--limit N] [--offset N] -hotdata results [-w ] +hotdata results list [--workspace-id ] [--limit N] [--offset N] +hotdata results [--workspace-id ] ``` Query footers include a `result-id` when applicable—record it for later, or pick it up from `queries `. **Prefer `hotdata results ` over re-running identical heavy SQL.** @@ -80,7 +98,7 @@ Query footers include a `result-id` when applicable—record it for later, or pi **Goal:** Follow-up analysis on a **bounded** intermediate without rescanning huge base tables. -**Pattern:** materialize → query `datasets.main.*`. +**Pattern:** materialize → query using the dataset’s **qualified name** (`datasets..
`). 1. **Base** — run SQL: @@ -101,14 +119,20 @@ Query footers include a `result-id` when applicable—record it for later, or pi hotdata datasets create --label "from saved" --query-id [--table-name ...] ``` -3. **Chain** — query the dataset: + Note the **`full_name`** line in the output (e.g. `datasets.main.chain_revenue_slice` or `datasets.s_….…` inside a sandbox). + +3. **Chain** — query the dataset using that **`full_name`** (or **`FULL NAME`** from `datasets list`); do not hardcode `datasets.main` if the schema segment is a sandbox id: ```bash - hotdata datasets list # find table_name if needed - hotdata query "SELECT * FROM datasets.main. WHERE ..." + hotdata datasets list # FULL NAME column: datasets..
+ hotdata query "SELECT * FROM datasets.main. WHERE ..." # workspace / no sandbox + # Sandbox example (use the actual full_name from create or list): + # hotdata query "SELECT * FROM datasets.s_ufmblmvq. WHERE ..." ``` -**Naming:** Prefer predictable `--table-name` values, e.g. `chain__`, and list long-lived chains in **DATAMODEL → Derived tables (Chain)** in workspace context. + For **sandbox-scoped** chain tables, ensure an **active sandbox** (`sandbox set`) or run the query inside **`hotdata sandbox run hotdata query "…"`**. Quote mixed-case columns: e.g. `"Revenue"`. + +**Naming:** Prefer predictable `--table-name` values, e.g. `chain__`, and list long-lived chains in **DATAMODEL → Derived tables (Chain)** in workspace context (record the **full** `datasets..
` you use in SQL). --- @@ -138,7 +162,7 @@ High-cardinality **text** columns (`title`, `body`, `description`, …) may warr For each `connection.schema.table` you care about: ```bash -hotdata indexes list -c --schema --table
[-w <workspace_id>]
+hotdata indexes list --connection-id <connection_id> --schema <schema> --table <table>
[--workspace-id ] ``` Skip creating a duplicate: same table + overlapping columns + same purpose (e.g. another bm25 on the same column). @@ -149,15 +173,15 @@ Use stable names (e.g. `idx_
__`). Examples:

```bash
# Sorted (default) — filters, joins, ordering on scalar columns
-hotdata indexes create -c <connection_id> --schema <schema> --table <table> \
+hotdata indexes create --connection-id <connection_id> --schema <schema> --table <table> \
  --name idx_orders_created --columns created_at --type sorted

# BM25 — full-text on one text column (required for bm25_search on that column)
-hotdata indexes create -c <connection_id> --schema <schema> --table <table> \
+hotdata indexes create --connection-id <connection_id> --schema <schema> --table <table> \
  --name idx_posts_body_bm25 --columns body --type bm25

# Vector — embeddings; requires --metric
-hotdata indexes create -c <connection_id> --schema <schema> --table <table> \
+hotdata indexes create --connection-id <connection_id> --schema <schema> --table <table>
\ --name idx_chunks_embedding --columns embedding --type vector --metric l2 ``` @@ -177,5 +201,6 @@ Re-run representative **`hotdata query`** or **`hotdata search`** workloads. Upd ## Cross-cutting -- **Workspace:** Use active workspace or `-w` / `--workspace-id` when targeting a non-default workspace. +- **Workspace:** Use active workspace or `--workspace-id` when targeting a non-default workspace. +- **Sandboxes:** See **Sandboxes and datasets** above (`sandbox run` vs direct commands, `full_name`, access denied without context). - **Jobs:** For async work (indexes, some refreshes), `hotdata jobs list` and `hotdata jobs `. From 4042f3ea2762fbbc9bb2ca7ce42d52b2d4567e01 Mon Sep 17 00:00:00 2001 From: Eddie Tejeda <669988+eddietejeda@users.noreply.github.com> Date: Thu, 23 Apr 2026 18:47:50 -0700 Subject: [PATCH 3/3] docs(skill): unify dataset SQL as datasets..
Align the Chain workflow bullet and Querying datasets section on the qualified name pattern; keep full_name guidance and workspace example. --- skills/hotdata/SKILL.md | 10 ++-------- 1 file changed, 2 insertions(+), 8 deletions(-) diff --git a/skills/hotdata/SKILL.md b/skills/hotdata/SKILL.md index c6fe9c7..0e71ffe 100644 --- a/skills/hotdata/SKILL.md +++ b/skills/hotdata/SKILL.md @@ -59,7 +59,7 @@ These are **patterns** built from the commands below—not separate CLI subcomma - **Model** — Markdown semantic map of your workspace (entities, keys, joins). **Store and read it via workspace context** (`hotdata context show DATAMODEL`, `context push DATAMODEL`); refresh content using `connections`, `connections refresh`, `tables list`, and `datasets list`. For a **deep** modeling pass (connector enrichment, indexes, per-table detail), see [references/MODEL_BUILD.md](references/MODEL_BUILD.md). - **History** — Inspect prior activity via `hotdata queries list` (query runs) and `hotdata results list` / `results ` (row data). -- **Chain** — Follow-ups via **`datasets create`** then `query` against `datasets.main.
`. +- **Chain** — Follow-ups via **`datasets create`** then `query` against `datasets.<schema>.<table_name>
`. - **Indexes** — Review SQL and schema, compare to existing indexes, create **sorted**, **bm25**, or **vector** indexes when it clearly helps; see [references/WORKFLOWS.md](references/WORKFLOWS.md#indexes). Full step-by-step procedures: [references/WORKFLOWS.md](references/WORKFLOWS.md). @@ -200,13 +200,7 @@ hotdata datasets create --label "My Dataset" --upload-id [--format c #### Querying datasets -Workspace-scoped datasets (created **outside** a sandbox, or the usual “main” catalog path) are referenced as **`datasets.main.`**. - -**Sandbox-created datasets** use the **sandbox id as the schema**, not `main`, for example: -``` -datasets.. -``` -(e.g. `datasets.s_ufmblmvq.tac_csat`). The create output’s **`full_name`** is authoritative—copy it into `FROM` / `JOIN` clauses instead of guessing `datasets.main.…`. +Qualified dataset tables are **`datasets..`**: **`main`** for workspace-scoped datasets (created outside a sandbox), or the **sandbox id** for sandbox-created data (e.g. `datasets.s_ufmblmvq.tac_csat`). The create output’s **`full_name`** is authoritative—copy it into `FROM` / `JOIN` clauses instead of guessing `datasets.main.…`. Example (workspace dataset on `main`): ```