diff --git a/skills/hotdata/SKILL.md b/skills/hotdata/SKILL.md
index a40b4eb..8cb3b35 100644
--- a/skills/hotdata/SKILL.md
+++ b/skills/hotdata/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: hotdata
-description: Use this skill when the user wants to run hotdata CLI commands, query the Hotdata API, list workspaces, list connections, create connections, list tables, manage datasets, execute SQL queries, inspect query run history, search tables, manage indexes, manage sandboxes, manage workspace context and the data model via the context API (`hotdata context`), or interact with the hotdata service. Activate when the user says "run hotdata", "query hotdata", "list workspaces", "list connections", "create a connection", "list tables", "list datasets", "create a dataset", "upload a dataset", "execute a query", "search a table", "list indexes", "create an index", "list query runs", "list past queries", "query history", "list sandboxes", "create a sandbox", "run a sandbox", "workspace context", "pull context", "push context", "data model", or asks you to use the hotdata CLI.
+description: Use this skill when the user wants to run hotdata CLI commands, query the Hotdata API, list workspaces, list connections, create connections, list tables, manage datasets, execute SQL queries, inspect query run history, search tables, manage indexes, manage sandboxes, manage workspace context and stored docs such as context:DATAMODEL via the context API (`hotdata context`), or interact with the hotdata service.
Activate when the user says "run hotdata", "query hotdata", "list workspaces", "list connections", "create a connection", "list tables", "list datasets", "create a dataset", "upload a dataset", "execute a query", "search a table", "list indexes", "create an index", "list query runs", "list past queries", "query history", "list sandboxes", "create a sandbox", "run a sandbox", "workspace context", "pull context", "push context", "data model", "context:DATAMODEL", or asks you to use the hotdata CLI.
 version: 0.1.12
 ---
@@ -39,25 +39,37 @@ If **`HOTDATA_WORKSPACE`** is set in the environment, the workspace is **locked**
 ## Workspace context (API)
-The workspace stores **named Markdown documents** only through the Hotdata **context API** (`/v1/context`). The **authoritative** copy always lives on the server under a **name** (stem) such as `DATAMODEL` or `GLOSSARY`.
+**Notation `context:`:** In this skill, **`context:DATAMODEL`**, **`context:GLOSSARY`**, and **`context:<name>`** mean the **authoritative Markdown document** stored on the server under that **stem** via the Hotdata **context API** (`/v1/context`, `hotdata context …`). That is **not** the same as generic English (“a data model”, “a glossary”), and **not** the same as local `./DATAMODEL.md` except as **pull/push transport**. **CLI commands use the bare stem** (no `context:` prefix): e.g. `hotdata context show DATAMODEL`, `hotdata context push GLOSSARY`.
-The CLI command **`hotdata context push`** reads **`./<name>.md`** and **`pull`** writes that file in the **current working directory**—those files exist only as a **transport surface** for the API, not as a second source of truth. **`hotdata context show <name>`** prints Markdown to stdout so agents can read the model **without** any local file. Context names follow SQL table–identifier rules (ASCII letters, digits, underscore; no dot in the API name; max 128 characters; SQL reserved words are not allowed).
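Since stems double as SQL-style identifiers, they can be sanity-checked locally before a push. A minimal shell sketch of the rule set above — the reserved-word list here is an abbreviated, illustrative subset (not the API's), and the no-leading-digit restriction is an assumption carried over from SQL identifier conventions:

```shell
# Validate a candidate context stem against the stated rules: ASCII letters,
# digits, underscore; no dot; max 128 characters; no SQL reserved words.
valid_stem() {
  stem=$1
  case "$stem" in
    SELECT|FROM|WHERE|TABLE|ORDER|GROUP|INDEX) return 1 ;;  # abbreviated reserved list
  esac
  [ "${#stem}" -le 128 ] || return 1
  # Leading non-digit is assumed from SQL identifier conventions.
  printf '%s' "$stem" | grep -Eq '^[A-Za-z_][A-Za-z0-9_]*$'
}

valid_stem DATAMODEL && echo "DATAMODEL: ok"
valid_stem DATA.MODEL || echo "DATA.MODEL: rejected (dot)"
```

The server remains the authority on the full reserved-word list; this check only catches obvious mistakes before a round-trip.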
+The workspace stores those documents only through the **context API**. The **authoritative** copy always lives on the server under the stem; common stems are **`context:DATAMODEL`** (semantic map) and **`context:GLOSSARY`** (glossary / runbooks).
-**Agents (Claude and similar): treat workspace context as the only store for the data model and shared narrative docs.**
+The CLI command **`hotdata context push`** reads **`./<name>.md`** and **`pull`** writes that file in the **current working directory**—those files exist only as a **transport surface** for the API, not as a second source of truth. **`hotdata context show <name>`** prints Markdown to stdout so agents can read **`context:<name>`** without any local file. Stems follow SQL table–identifier rules (ASCII letters, digits, underscore; no dot in the API name; max 128 characters; SQL reserved words are not allowed).
-1. **Before** planning queries, explaining schema, or modeling, load the workspace: `hotdata context show DATAMODEL` (and `hotdata context list` for other stems such as `GLOSSARY`). Handle a missing context by starting from [references/DATA_MODEL.template.md](references/DATA_MODEL.template.md) and pushing when ready.
-2. **After** you change the model, persist it with **`hotdata context push DATAMODEL`**. The CLI requires a local `./DATAMODEL.md` for that step: write the body there (from `context show`, the template, or your edits), then run `push` from the project directory.
-3. Use **`hotdata context pull DATAMODEL`** when you intentionally want a local `./DATAMODEL.md` copy (for example a human editor); it still reflects API state, not a parallel document.
+**Agents (Claude and similar):** use workspace context as the only durable store for **context:DATAMODEL**, **context:GLOSSARY**, and any other **`context:`** documents you introduce.
Keep transient analysis notes in **sandbox markdown** or the conversation, and **promote** them into **context:DATAMODEL** only when they should guide the whole workspace ([details below](#analysis-modeling-vs-contextdatamodel)).
-The standard stem for the workspace semantic model is **`DATAMODEL`**. Add other stems the same way (e.g. **`GLOSSARY`**) for glossary or runbooks.
+1. **Before** planning non-trivial queries, explaining schema to others, or editing **context:DATAMODEL**, load it: `hotdata context show DATAMODEL` (and `hotdata context list` for other stems such as **context:GLOSSARY**). Handle a missing document by starting from [references/DATA_MODEL.template.md](references/DATA_MODEL.template.md) and pushing when ready.
+2. **After** you change **context:DATAMODEL**, persist with **`hotdata context push DATAMODEL`**. The CLI requires a local `./DATAMODEL.md` for that step: write the body there (from `context show`, the template, or your edits), then run `push` from the project directory.
+3. Use **`hotdata context pull DATAMODEL`** when you intentionally want a local `./DATAMODEL.md` copy (for example a human editor); it still reflects API state for **context:DATAMODEL**, not a parallel document.
+The standard stem for the workspace semantic map is **`DATAMODEL`** (skill notation **context:DATAMODEL**). Add other stems the same way (e.g. **`GLOSSARY`** → **context:GLOSSARY**) for glossary or runbooks.
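The read → edit → push loop in steps 1–3 can be sketched as a shell session. Every `hotdata` call is guarded so the sketch degrades to a local-file dry run when no CLI or workspace is available, and the appended convention note is purely illustrative (in real use you would merge carefully, or start from the template in `references/`):

```shell
# 1. Load the authoritative document; fall back to a stub when none exists yet.
hotdata context show DATAMODEL > DATAMODEL.md 2>/dev/null \
  || printf '# Data model\n' > DATAMODEL.md
# 2. Edit the local transport copy; here we append one illustrative convention.
printf '\n## Conventions\n- Quote mixed-case columns, e.g. "Revenue".\n' >> DATAMODEL.md
# 3. Push the body back so the server copy stays authoritative.
hotdata context push DATAMODEL \
  || echo 'push skipped: no hotdata CLI/workspace in this sketch'
```

`./DATAMODEL.md` here is only the transport surface described above; delete it after the push if you do not want a lingering local copy.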
+
+### Analysis modeling vs context:DATAMODEL
+
+Keep two layers separate:
+
+- **Analysis modeling (day to day)** — Understanding data *for the current task*: exploratory SQL, join checks, column semantics for one report, hypotheses, scratch notes. Often conversational or short-lived. **Sandbox markdown** (`sandbox update --markdown`) is the right home while you explore; it dies with the sandbox unless you copy it elsewhere.
+
+- **context:DATAMODEL (Hotdata workspace data model)** — A **durable, workspace-wide** map stored only via the **context API**: entities and tables across connections, PK/FK relationships, how datasets tie back to sources, naming and query conventions the **whole team** should rely on. This is **higher-level shared structure**, not a transcript of one investigation.
+
+**Promotion:** When analysis findings should **outlive** the sandbox or session and **guide everyone**, merge them into **context:DATAMODEL** (`hotdata context show DATAMODEL` → reconcile → `hotdata context push DATAMODEL`). You do **not** need to update **context:DATAMODEL** after every ad-hoc query—only when the workspace story or join graph meaningfully changes.
+
+Use [references/DATA_MODEL.template.md](references/DATA_MODEL.template.md) and [references/MODEL_BUILD.md](references/MODEL_BUILD.md) for **what to write inside** the Markdown you store under **context:** stems. Never put workspace-specific model text inside agent skill install paths—only in **workspace context** (and transient `./<name>.md` for push/pull when needed).
 ## Multi-step workflows (Model, History, Chain, Indexes)
 These are **patterns** built from the commands below—not separate CLI subcommands:
-- **Model** — Markdown semantic map of your workspace (entities, keys, joins). **Store and read it via workspace context** (`hotdata context show DATAMODEL`, `context push DATAMODEL`); refresh content using `connections`, `connections refresh`, `tables list`, and `datasets list`.
For a **deep** modeling pass (connector enrichment, indexes, per-table detail), see [references/MODEL_BUILD.md](references/MODEL_BUILD.md).
+- **Model (`context:DATAMODEL`)** — The **shared** Markdown semantic map of the workspace (entities, keys, joins across connections). **Store and read it only via workspace context** (`hotdata context show DATAMODEL`, `context push DATAMODEL`); refresh using `connections`, `connections refresh`, `tables list`, and `datasets list`. For a **deep** pass (connector enrichment, indexes, per-table detail), see [references/MODEL_BUILD.md](references/MODEL_BUILD.md). Contrast **analysis modeling** in sandboxes or chat (see [Analysis modeling vs context:DATAMODEL](#analysis-modeling-vs-contextdatamodel)).
 - **History** — Inspect prior activity via `hotdata queries list` (query runs) and `hotdata results list` / `results <id>` (row data).
 - **Chain** — Follow-ups via **`datasets create`** then `query` against `datasets.<schema>.<table>`.
 - **Indexes** — Review SQL and schema, compare to existing indexes, create **sorted**, **bm25**, or **vector** indexes when it clearly helps; see [references/WORKFLOWS.md](references/WORKFLOWS.md#indexes).
@@ -225,7 +237,7 @@ hotdata context push <name> [--workspace-id <id>] [--dry-run]
 - `pull` — download context `name` to `./<name>.md`. Refuses to overwrite an existing file unless `--force`. `--dry-run` prints target path and size only.
 - `push` — upload `./<name>.md` to upsert context `name` on the server. `--dry-run` prints size only. Body size must stay within the API limit (order of 512k characters).
-**Convention:** `DATAMODEL` is the primary workspace data model; `GLOSSARY` (or other stems) for additional narrative context. Same identifier rules as SQL table names.
+**Convention:** **context:DATAMODEL** is the primary workspace semantic map; **context:GLOSSARY** (or other **`context:`** docs) for additional narrative context. Same identifier rules as SQL table names. CLI: `hotdata context show DATAMODEL` (bare stem).
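The `show` / `pull` / `push` verbs above can be exercised safely with `--dry-run` where supported (per the reference above, `pull --dry-run` prints target path and size, `push --dry-run` prints size). A guarded sketch — each call is a no-op when no hotdata CLI or workspace is configured:

```shell
# Walk the documented context verbs for the DATAMODEL stem; every call is
# guarded so the loop stays harmless without an installed/configured CLI.
attempted=""
for cmd in "show DATAMODEL" "pull DATAMODEL --dry-run" "push DATAMODEL --dry-run"; do
  echo "+ hotdata context $cmd"
  hotdata context $cmd 2>/dev/null || true   # dry-run variants print path/size only
  attempted="$attempted[$cmd]"
done
```

This ordering (inspect first, then dry-run both transport directions) avoids clobbering an existing `./DATAMODEL.md`, which `pull` refuses to overwrite without `--force`.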
 ### Execute SQL Query
 ```
@@ -345,13 +357,13 @@ hotdata sandbox run <command> [args...]
 **Sandbox-scoped data access:** Queries and other operations against **sandbox-only** resources must run with sandbox context attached—either the **active sandbox** in config (`sandbox set`) or a child process started with **`hotdata sandbox run …`** (which sets `HOTDATA_SANDBOX`). Running `hotdata query` or similar **with no sandbox in config and not under `sandbox … run`** can produce **access denied** for tables or datasets that exist only inside a sandbox.
-#### Example: Building a data model in a sandbox
+#### Example: Exploring a sales pipeline
-Use a sandbox to explore tables and iteratively build a model description in the sandbox markdown.
+Use a sandbox to explore tables and capture **analysis-oriented** notes in sandbox markdown (keys, joins, open questions)—**day-to-day modeling** for this investigation, not **context:DATAMODEL** until you promote it.
 1. Start a sandbox:
 ```
- hotdata sandbox new --name "Model: sales pipeline"
+ hotdata sandbox new --name "Sales pipeline"
 ```
 2. Inspect tables and columns:
 ```
@@ -381,14 +393,14 @@ Use a sandbox to explore tables and iteratively build a model description in the
 - check how line_items joins to deals
 - confirm revenue column semantics"
 ```
-5. Continue exploring and update the markdown as the model takes shape. The sandbox markdown is the living artifact for **that sandbox**.
-6. When the model should **outlive the sandbox** or be shared with the whole workspace, promote it to workspace context: save the consolidated Markdown as `./DATAMODEL.md` in the project directory and run `hotdata context push DATAMODEL` (or merge with `hotdata context show DATAMODEL` first, then push).
+5. Continue exploring and update the markdown as your **analysis picture** takes shape. Sandbox markdown is the living artifact for **that sandbox** only.
+6.
When that picture should become **context:DATAMODEL** (outlive the sandbox or be shared with everyone), promote it: save consolidated Markdown as `./DATAMODEL.md` and run `hotdata context push DATAMODEL` (or merge with `hotdata context show DATAMODEL` first, then push).
 Other commands (not covered in detail above): `hotdata connections new` (interactive connection wizard), `hotdata skills install|status`, `hotdata completions <shell>`.
 ## Workflow: Running a Query
-0. (Recommended for agents) Load the workspace data model when available: run `hotdata context show DATAMODEL`. If the command errors because no context exists yet, proceed without a stored model.
+0. (Recommended for agents) When the query depends on **workspace-wide** table relationships or naming conventions, load **context:DATAMODEL**: `hotdata context show DATAMODEL`. If the command errors because no document exists yet, proceed without it—ad-hoc analysis does not require a populated **context:DATAMODEL**.
 1. List connections:
 ```
 hotdata connections list
diff --git a/skills/hotdata/references/DATA_MODEL.template.md b/skills/hotdata/references/DATA_MODEL.template.md
index 6ac396e..5fc039e 100644
--- a/skills/hotdata/references/DATA_MODEL.template.md
+++ b/skills/hotdata/references/DATA_MODEL.template.md
@@ -1,6 +1,6 @@
 # Data model — `<workspace>`
-> **Storage:** This Markdown structure is kept in **workspace context** under the name **`DATAMODEL`**. Use `hotdata context show DATAMODEL` to read it; maintain `./DATAMODEL.md` in your **project directory** (where you run `hotdata`) only when editing, then `hotdata context push DATAMODEL`. Do not use `docs/DATA_MODEL.md` or other repo paths as the source of truth.
+> **Storage:** This Markdown structure is **context:DATAMODEL**—the document stored in the workspace **context API** under stem `DATAMODEL`.
Use `hotdata context show DATAMODEL` to read it; maintain `./DATAMODEL.md` in your **project directory** (where you run `hotdata`) only when editing, then `hotdata context push DATAMODEL`. Do not use `docs/DATA_MODEL.md` or other repo paths as the source of truth. (**`context:DATAMODEL`** in skills means that API document, not generic “data model” prose.)
 > Do not commit workspace-specific content into agent skill folders.
 > For a **full** build (per-table detail, connector enrichment, index summary), follow [MODEL_BUILD.md](MODEL_BUILD.md) from the installed skill’s `references/` (or this repo’s `skills/hotdata/references/`). Relative links to `MODEL_BUILD.md` below work only while this file lives next to those references; in your project, open that path separately if the link 404s.
diff --git a/skills/hotdata/references/MODEL_BUILD.md b/skills/hotdata/references/MODEL_BUILD.md
index b109b09..d0e3969 100644
--- a/skills/hotdata/references/MODEL_BUILD.md
+++ b/skills/hotdata/references/MODEL_BUILD.md
@@ -1,8 +1,10 @@
 # Building a workspace data model (advanced)
-Optional **deep pass** for a single authoritative markdown model stored in **workspace context**. For a short checklist only, use the **Model** section in [WORKFLOWS.md](WORKFLOWS.md) and [DATA_MODEL.template.md](DATA_MODEL.template.md).
+Optional **deep pass** for a single authoritative markdown document stored as **`context:DATAMODEL`** (workspace **context API**). For a short checklist only, use the **Model** section in [WORKFLOWS.md](WORKFLOWS.md) and [DATA_MODEL.template.md](DATA_MODEL.template.md).
-**Output:** The live document is **`DATAMODEL`** in the context API. Maintain it with `hotdata context show DATAMODEL`, edit `./DATAMODEL.md` in the **project directory** where you run `hotdata`, then **`hotdata context push DATAMODEL`**. Do not use `docs/`, `DATA_MODEL.md`, or other repo-only paths as the system of record. Never store workspace-specific model text inside agent skill folders.
+**Notation:** **`context:DATAMODEL`** is the live server document; **not** the same phrase as “building a data model” for a one-off analysis. **CLI** uses the bare stem: `hotdata context show DATAMODEL`.
+
+**Output:** Maintain **context:DATAMODEL** with `hotdata context show DATAMODEL`, edit `./DATAMODEL.md` in the **project directory** where you run `hotdata`, then **`hotdata context push DATAMODEL`**. Do not use `docs/`, `DATA_MODEL.md`, or other repo-only paths as the system of record. Never store workspace-specific model text inside agent skill folders.
 ---
@@ -53,7 +55,7 @@ Use **connector and tooling docs** when `source_type` (or table shapes) match:
 - **Vectors** — Columns typed as lists of floats (e.g. embedding columns) are candidates for vector search; note them.
 - **Well-known SaaS shapes** — Apply general patterns (e.g. Stripe charges/customers, HubSpot contacts/deals) only when naming and structure fit; **link** the doc you used so a human can verify.
-Do **not** invent facts: if context is missing, say so and suggest a small sample query:
+Do **not** invent facts: if **context:DATAMODEL** (or needed facts) is missing, say so and suggest a small sample query:
 ```bash
 hotdata query "SELECT * FROM <connection>.<schema>.<table> LIMIT 5"
@@ -95,7 +97,7 @@ When suggesting a new index, use the same connection/schema/table/column names a
 ## 6. Document structure
-This Markdown body is what you store under **`DATAMODEL`** (`hotdata context push DATAMODEL`). Start from [DATA_MODEL.template.md](DATA_MODEL.template.md) and extend as needed:
+This Markdown body is what you store as **context:DATAMODEL** (`hotdata context push DATAMODEL`). Start from [DATA_MODEL.template.md](DATA_MODEL.template.md) and extend as needed:
 - **Overview** — Domains and what the workspace is for.
 - **Per connection** — Optional subsection per source; for **deep** models, **repeat** one block per `connection.schema.table` (grain, column table with name/type/nullable/PK-FK/notes, relationships, queryability, caveats)—the template’s single `####` heading is a pattern to copy for each table.
diff --git a/skills/hotdata/references/WORKFLOWS.md b/skills/hotdata/references/WORKFLOWS.md
index b651b04..4cca437 100644
--- a/skills/hotdata/references/WORKFLOWS.md
+++ b/skills/hotdata/references/WORKFLOWS.md
@@ -2,26 +2,28 @@
 Procedures for **Model**, **History**, **Chain**, **Indexes**, and **sandboxes with datasets** (see **Sandboxes and datasets**). These compose existing `hotdata` commands; they are not separate subcommands.
+**Notation:** **`context:<name>`** (e.g. **`context:DATAMODEL`**, **`context:GLOSSARY`**) means the **workspace document** stored under that stem via the **context API**—not generic “data model” language and not local files except as `pull`/`push` transport. **CLI** still uses bare stems: `hotdata context show DATAMODEL`.
+
 ## Where things live
 | Concept | Location |
 |--------|----------|
-| **Model** | **Workspace context API** — stem **`DATAMODEL`** (`hotdata context show DATAMODEL`, `context push` / `pull` with `./DATAMODEL.md` in the project cwd only as the CLI file surface). Never store workspace-specific model text inside agent skill directories. |
+| **Model** | **`context:DATAMODEL`** — workspace context API (`hotdata context show DATAMODEL`, `context push` / `pull` with `./DATAMODEL.md` in the project cwd only as the CLI file surface). Never store workspace-specific model text inside agent skill directories. |
 | **History** | `hotdata queries list` / `queries <id>` for query runs (execution history); `hotdata results list` / `results <id>` for row data. |
-| **Chain** | Intermediate tables in **`datasets.<schema>.<table>`** — usually **`datasets.main.*`** for workspace-wide materializations; **sandbox uploads** use **`datasets.<sandbox-id>.*`** (see **Sandboxes and datasets** below). Document stable chains in **workspace context `DATAMODEL`** under **Derived tables (Chain)**. |
-| **Indexes** | Recommendations and live objects in Hotdata (`indexes list` / `indexes create`). Record rationale in **`DATAMODEL`** (e.g. Search & index summary) or a dedicated context stem if you split concerns. |
+| **Chain** | Intermediate tables in **`datasets.<schema>.<table>`** — usually **`datasets.main.*`** for workspace-wide materializations; **sandbox uploads** use **`datasets.<sandbox-id>.*`** (see **Sandboxes and datasets** below). Document stable chains in **context:DATAMODEL** under **Derived tables (Chain)**. |
+| **Indexes** | Recommendations and live objects in Hotdata (`indexes list` / `indexes create`). Record rationale in **context:DATAMODEL** (e.g. Search & index summary) or a dedicated **context:** stem if you split concerns. |
 ---
 ## Model
-**Goal:** A markdown map of entities, keys, grain, and how connections relate—on top of the live **catalog** from Hotdata.
+**Goal:** A markdown map of entities, keys, grain, and how connections relate—stored as **context:DATAMODEL** on top of the live **catalog** from Hotdata.
 ### Initialize
-1. Use [DATA_MODEL.template.md](DATA_MODEL.template.md) in this skill bundle as the **structure** for what you store in workspace context.
-2. In the **project directory** where you run `hotdata`, create or refresh `./DATAMODEL.md` (from the template, from `hotdata context show DATAMODEL`, or from `hotdata context pull DATAMODEL`), fill workspace-specific sections as you discover schema, then **`hotdata context push DATAMODEL`** so the workspace owns the document.
-3. Agents that skip local files: `hotdata context show DATAMODEL` to read; when updating, write `./DATAMODEL.md` then `hotdata context push DATAMODEL`.
+1. Use [DATA_MODEL.template.md](DATA_MODEL.template.md) in this skill bundle as the **structure** for what you store as **context:DATAMODEL**.
+2. In the **project directory** where you run `hotdata`, create or refresh `./DATAMODEL.md` (from the template, from `hotdata context show DATAMODEL`, or from `hotdata context pull DATAMODEL`), fill workspace-specific sections as you discover schema, then **`hotdata context push DATAMODEL`** so the server owns **context:DATAMODEL**.
+3.
Agents that skip local files: `hotdata context show DATAMODEL` to read **context:DATAMODEL**; when updating, write `./DATAMODEL.md` then `hotdata context push DATAMODEL`.
 ### Deep model pass (optional)
@@ -44,7 +46,7 @@ hotdata datasets <id>   # schema detail per dataset
 `datasets list` returns **every** dataset in the workspace (no sandbox-only filter). Use the **`FULL NAME`** column (`datasets.<schema>.<table>`): **`main`** in the middle segment is the usual workspace catalog; a value like **`s_…`** is the **sandbox id** for sandbox-scoped datasets.
-Use output to update **Connections**, **Tables**, **Columns**, and **Datasets** in **workspace context `DATAMODEL`** (edit via `./DATAMODEL.md` + `hotdata context push DATAMODEL`, or your editor workflow). Optional: small exploratory queries once names are known:
+Use output to update **Connections**, **Tables**, **Columns**, and **Datasets** in **context:DATAMODEL** (edit via `./DATAMODEL.md` + `hotdata context push DATAMODEL`, or your editor workflow). Optional: small exploratory queries once names are known:
 ```bash
 hotdata query "SELECT * FROM <connection>.<schema>.<table> LIMIT 5"
@@ -132,7 +134,7 @@ Query footers include a `result-id` when applicable—record it for later, or pi
 For **sandbox-scoped** chain tables, ensure an **active sandbox** (`sandbox set`) or run the query inside **`hotdata sandbox run hotdata query "…"`**. Quote mixed-case columns: e.g. `"Revenue"`.
-**Naming:** Prefer predictable `--table-name` values, e.g. `chain__`, and list long-lived chains in **DATAMODEL → Derived tables (Chain)** in workspace context (record the **full** `datasets.<schema>.<table>` you use in SQL).
+**Naming:** Prefer predictable `--table-name` values, e.g. `chain__`, and list long-lived chains in **context:DATAMODEL → Derived tables (Chain)** (record the **full** `datasets.<schema>.<table>` you use in SQL).
 ---
@@ -189,7 +191,7 @@ Large builds: add `--async` and track with **`hotdata jobs list`** / **`hotdata `
 ### 4. Verify
-Re-run representative **`hotdata query`** or **`hotdata search`** workloads. Update **DATAMODEL → Search & index summary** in workspace context (`hotdata context push DATAMODEL` after editing `./DATAMODEL.md`) so future agents see what exists.
+Re-run representative **`hotdata query`** or **`hotdata search`** workloads. Update **context:DATAMODEL → Search & index summary** (`hotdata context push DATAMODEL` after editing `./DATAMODEL.md`) so future agents see what exists.
 ### Guardrails