Hotdata CLI
Command line interface for Hotdata.
Homebrew
brew install hotdata-dev/tap/cli
Binary (macOS, Linux)
Download a binary from Releases.
Build from source (requires Rust)
cargo build --release
cp target/release/hotdata /usr/local/bin/hotdata
Run either of the following (they are equivalent):
hotdata auth login
# or
hotdata auth
This launches a browser window where you can authorize the CLI to access your Hotdata account.
Alternatively, authenticate with an API key using the `--api-key` flag:
hotdata <command> --api-key <api_key>
Or set the `HOTDATA_API_KEY` environment variable (also loaded from `.env` files):
export HOTDATA_API_KEY=<api_key>
hotdata <command>
API key priority (lowest to highest): config file → `HOTDATA_API_KEY` env var → `--api-key` flag.
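The priority order above can be sketched as a small shell function. This is only an illustration of the documented precedence; `resolve_api_key` is a hypothetical helper, not something the CLI ships, and the three arguments stand in for the values the CLI would read from each source.

```shell
# Hypothetical helper (not part of the hotdata CLI): mirrors the documented
# precedence config file -> HOTDATA_API_KEY -> --api-key, highest wins.
resolve_api_key() {
  config_key=$1   # value from the config file (may be empty)
  env_key=$2      # value of HOTDATA_API_KEY (may be empty)
  flag_key=$3     # value passed via --api-key (may be empty)

  if [ -n "$flag_key" ]; then
    printf '%s\n' "$flag_key"
  elif [ -n "$env_key" ]; then
    printf '%s\n' "$env_key"
  else
    printf '%s\n' "$config_key"
  fi
}
```

For example, `resolve_api_key "$cfg" "$HOTDATA_API_KEY" ""` prints the environment value whenever it is set, and falls back to the config value otherwise.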
| Command | Subcommands | Description |
|---|---|---|
| `auth` | `login`, `status`, `logout` | `login` or bare `auth` opens browser login; `status` / `logout` manage the saved profile |
| `workspaces` | `list`, `set` | Manage workspaces |
| `connections` | `list`, `create`, `refresh`, `new` | Manage connections |
| `databases` | `list`, `create`, `delete`, `tables` | Managed databases (create and load tables via parquet) |
| `tables` | `list` | List tables and columns |
| `context` | `list`, `show`, `pull`, `push` | Workspace Markdown context (e.g. data model `DATAMODEL`) via the context API |
| `query` | | Execute a SQL query |
| `queries` | `list` | Inspect query run history |
| `search` | | Full-text search across a table column |
| `indexes` | `list`, `create`, `delete` | Manage indexes on a table |
| `embedding-providers` | `list`, `get`, `create`, `update`, `delete` | Manage embedding providers used by vector indexes |
| `results` | `list` | Retrieve stored query results |
| `jobs` | `list` | Manage background jobs |
| `skills` | `install`, `status` | Manage the hotdata agent skill |
| Option | Description | Type | Default |
|---|---|---|---|
| `--api-key` | API key (overrides env var and config) | string | |
| `-v, --version` | Print version | boolean | |
| `-h, --help` | Print help | boolean | |
hotdata workspaces list [--format table|json|yaml]
hotdata workspaces set [<workspace_id>]
- `list` shows all workspaces with a `*` marker on the active one.
- `set` switches the active workspace. Omit the ID for interactive selection.
- The active workspace is used as the default for all commands that accept `--workspace-id`.
hotdata connections list [-w <id>] [-o table|json|yaml]
hotdata connections <connection_id> [-w <id>] [-o table|json|yaml]
hotdata connections refresh <connection_id> [-w <id>] [--data] [--schema <name> --table <name>] [--async] [--include-uncached]
hotdata connections new [-w <id>]
- `list` returns `id`, `name`, `source_type` for each connection.
- Pass a connection ID to view details (id, name, source type, table counts).
- `refresh` triggers a schema refresh by default. Pass `--data` to refresh cached row data instead.
- `--schema` and `--table` narrow a data refresh to a single table (must be supplied together).
- `--async` submits a data refresh as a background job and returns a job ID; poll with `hotdata jobs <job_id>`. Only valid with `--data`; schema refresh is always synchronous.
- `--include-uncached` includes tables that haven't been cached yet in a connection-wide data refresh. Only valid with `--data` and no `--table`.
- `new` launches an interactive connection creation wizard.
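The flag rules for `refresh` are easy to get wrong in scripts, so here is a minimal sketch of a pre-flight check that encodes them. `check_refresh_flags` is a hypothetical helper, not part of the CLI (which performs its own validation). It takes only the refresh flags, not the connection ID or `-w`.

```shell
# Hypothetical pre-flight check (not part of the hotdata CLI): encodes the
# documented flag rules for `connections refresh` so a script can fail early.
check_refresh_flags() {
  data=0 schema=0 table=0 async=0 uncached=0
  while [ $# -gt 0 ]; do
    case $1 in
      --data) data=1 ;;
      --async) async=1 ;;
      --include-uncached) uncached=1 ;;
      --schema|--table)
        [ $# -ge 2 ] || { echo "$1 requires a value" >&2; return 2; }
        if [ "$1" = --schema ]; then schema=1; else table=1; fi
        shift ;;                      # skip the flag's value
      *) echo "unexpected argument: $1" >&2; return 2 ;;
    esac
    shift
  done

  if [ "$schema" -ne "$table" ]; then
    echo "--schema and --table must be supplied together" >&2; return 1
  fi
  if [ "$async" -eq 1 ] && [ "$data" -eq 0 ]; then
    echo "--async is only valid with --data" >&2; return 1
  fi
  if [ "$uncached" -eq 1 ] && { [ "$data" -eq 0 ] || [ "$table" -eq 1 ]; }; then
    echo "--include-uncached needs --data and no --table" >&2; return 1
  fi
  return 0
}
```

A script could run `check_refresh_flags --data --async` before calling `hotdata connections refresh <connection_id> --data --async`, and stop if the check returns non-zero.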
# List available connection types
hotdata connections create list [--format table|json|yaml]
# Inspect schema for a connection type
hotdata connections create list <type_name> --format json
# Create a connection
hotdata connections create --name "my-conn" --type postgres --config '{"host":"...","port":5432,...}'
Managed databases are Hotdata-owned catalogs you create and populate yourself (no remote source to sync). Query them with SQL as `<catalog>.schema.table`.
hotdata databases list [-w <id>] [-o table|json|yaml]
hotdata databases create [--name <display_name>] [--catalog <alias>] [--table <table> ...] [--schema public] [--expires-at <duration|timestamp>] [-o table|json|yaml]
hotdata databases set <id>
hotdata databases unset
hotdata databases <name_or_id> [-o table|json|yaml]
hotdata databases delete <name_or_id>
hotdata databases run [--database <id>] [--name <label>] [--schema public] [--table <table> ...] [--expires-at <duration|timestamp>] <cmd> [args...]
hotdata databases <id> run <cmd> [args...]
# Preferred: load by catalog alias (auto-declares table if needed)
hotdata databases load --catalog <alias> --table <table> [--schema public] (--file <path> | --url <url> | --upload-id <id>)
# Also available: explicit database flag
hotdata databases tables list [--database <id_or_name>] [--schema <name>] [-o table|json|yaml]
hotdata databases tables load <table> [--database <id_or_name>] [--schema public] (--file <path> | --url <url> | --upload-id <id>)
hotdata databases tables delete <table> [--database <id_or_name>] [--schema public]
- `create` registers a managed connection with no external credentials. `--name` is a human-readable display name; `--catalog` sets the SQL alias used in queries (`SELECT … FROM <catalog>.schema.table`) and must be `[a-z_][a-z0-9_]*`.
- `set` / `unset`: save or clear the active database. All `databases tables` and `context` commands default to it. The active database is marked with `*` in `databases list`.
- `load` (top-level shorthand): loads a parquet file into `--catalog`.`--schema`.`--table`. If the table was not declared at create time, the CLI automatically deletes and recreates the database with the table declared, then retries the load.
- `tables load` uploads a parquet file (or uses a staged `upload_id` from `POST /v1/files`) and publishes it as the table generation (`replace` mode).
- `run` mints a database-scoped JWT and execs `<cmd>` with `HOTDATA_DATABASE_TOKEN`, `HOTDATA_DATABASE_REFRESH_TOKEN`, `HOTDATA_DATABASE`, `HOTDATA_WORKSPACE`, and `HOTDATA_API_URL` injected into its environment.
- Managed table loads accept parquet only; convert CSV/JSON to parquet first.
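Since `--catalog` must match `[a-z_][a-z0-9_]*`, a script that generates aliases may want to check them before calling `hotdata databases create`. The following is a minimal sketch; `is_valid_catalog` is a hypothetical helper and not a CLI feature.

```shell
# Hypothetical helper (not part of the hotdata CLI): succeeds only when the
# alias matches the documented pattern [a-z_][a-z0-9_]*.
is_valid_catalog() {
  # -x anchors the match to the whole line; LC_ALL=C keeps a-z ASCII-only.
  printf '%s\n' "$1" | LC_ALL=C grep -Eqx '[a-z_][a-z0-9_]*'
}
```

Usage: `is_valid_catalog "$alias" && hotdata databases create --catalog "$alias"`.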
Example:
hotdata databases create --catalog airbnb
hotdata databases load --catalog airbnb --table listings --url https://example.com/listings.parquet
hotdata query "SELECT count(*) FROM airbnb.public.listings"
hotdata tables list [--workspace-id <id>] [--connection-id <id>] [--schema <pattern>] [--table <pattern>] [--limit <n>] [--cursor <token>] [--format table|json|yaml]
- Without `--connection-id`: lists all tables with `table`, `synced`, `last_sync`.
- With `--connection-id`: includes column details (`column`, `data_type`, `nullable`).
- `--schema` and `--table` support SQL `%` wildcard patterns.
- Tables are displayed as `<connection>.<schema>.<table>`; use this format in SQL queries.
Named Markdown documents for a workspace (data model, glossary, etc.) are stored in the context API. The CLI treats the server as the source of truth; local files are only used where the tool requires a path on disk.
hotdata context list [-w <id>] [--prefix <stem>] [-o table|json|yaml]
hotdata context show <name> [-w <id>]
hotdata context pull <name> [-w <id>] [--force] [--dry-run]
hotdata context push <name> [-w <id>] [--dry-run]
- `show` prints Markdown to stdout (no local file needed). Use this to read the workspace data model in scripts or agents.
- `pull` writes `./<name>.md` in the current directory from the API. Refuses to overwrite an existing file unless `--force`.
- `push` reads `./<name>.md` and upserts that name in the workspace. Use after editing the file in your project directory.
- Names follow SQL identifier rules (ASCII letters, digits, underscore; max 128 characters; SQL reserved words are not allowed). The usual stem for the semantic data model is `DATAMODEL` (file `DATAMODEL.md` for push/pull only).
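The pull, edit, push cycle described above can be wrapped in one function. This is a sketch under stated assumptions: `edit_context` is a hypothetical helper, it assumes `$EDITOR` names a single executable, and it assumes each `hotdata context` command exits non-zero on failure. It deliberately omits `--force`, so an existing local file stops the run instead of being overwritten.

```shell
# Hypothetical helper (not part of the hotdata CLI): pull a context document,
# edit it locally, preview the upload with --dry-run, then push it.
edit_context() {
  name=${1:-DATAMODEL}                       # usual stem for the data model
  hotdata context pull "$name" || return 1   # refuses to overwrite ./<name>.md
  "${EDITOR:-vi}" "./$name.md" || return 1   # assumes EDITOR is one executable
  hotdata context push "$name" --dry-run || return 1
  hotdata context push "$name"
}
```

Call it as `edit_context GLOSSARY`, or with no argument to work on `DATAMODEL`.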
hotdata query "<sql>" [-w <id>] [-d <database>] [-o table|json|csv]
hotdata query status <query_run_id> [-o table|json|csv]
- Default output is `table`, which prints results with row count and execution time.
- Use `-d` / `--database` to run the query against a specific managed database.
- Long-running queries automatically fall back to async execution and return a `query_run_id`.
- Use `hotdata query status <query_run_id>` to poll for results.
- Exit codes for `query status`: `0` = succeeded, `1` = failed, `2` = still running (poll again).
- `json` output carries `truncated`, `preview_row_count`, and `total_row_count` so a consumer can detect a partial result from the body alone.
- If the server returns only a bounded preview that can't be completed (truncated and unfetchable), the CLI prints the preview, warns on stderr, and exits `3`, so pipelines break rather than silently ingest partial data.
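The exit codes above make polling straightforward to script. The sketch below loops on `hotdata query status` until the exit code is no longer `2`, then prints the captured output and returns the final code, so `1` and `3` still propagate to the caller. `wait_for_query` and the `POLL_INTERVAL` variable are hypothetical names, not CLI features. What `query status` prints while a run is still in progress is not specified here, so that intermediate output is discarded.

```shell
# Hypothetical wrapper (not part of the hotdata CLI): polls a query run until
# it leaves the "still running" state (exit code 2).
wait_for_query() {
  run_id=$1
  interval=${POLL_INTERVAL:-2}   # seconds between polls; assumed knob
  while :; do
    out=$(hotdata query status "$run_id" -o json)
    rc=$?
    if [ "$rc" -ne 2 ]; then
      printf '%s\n' "$out"       # final output (results or error details)
      return "$rc"               # 0 = succeeded, 1 = failed, other codes pass through
    fi
    sleep "$interval"
  done
}
```

In a pipeline, `wait_for_query "$run_id" > result.json || exit $?` stops on failures and on partial results alike.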
hotdata queries list [--limit <n>] [--cursor <token>] [--status <csv>] [-o table|json|yaml]
hotdata queries <query_run_id> [-o table|json|yaml]
- `list` shows past query executions with status, creation time, duration, row count, and a truncated SQL preview (default limit 20).
- `--status` filters by run status (comma-separated, e.g. `--status running,failed`).
- View a run by ID to see full metadata (timings, `result_id`, snapshot, hashes) and the formatted, syntax-highlighted SQL.
- If a run has a `result_id`, fetch its rows with `hotdata results <result_id>`.
Both search modes (BM25 full-text and vector) run entirely server-side. `--type` and `--column` are optional when the table has exactly one search index, in which case they are inferred automatically. Pass them explicitly when multiple indexes exist.
# BM25 full-text search (requires a BM25 index on the column)
hotdata search "<query>" --table <connection.schema.table> [--type bm25] [--column <column>] [--select <columns>] [--limit <n>] [-o table|json|csv]
# Vector search (requires a vector index with auto-embedding on the column)
hotdata search "<query>" --table <table> [--type vector] [--column <source_text_column>] [--limit <n>]
- `--type vector`: pass your query as plain text, name the source text column (e.g. `title`). The server embeds the query at the same time, using the same provider that auto-embedded the column when the index was built, so distance metric, model, and dimensions all match automatically. No `OPENAI_API_KEY`, no client-side embedding, no need to know about the auto-generated `_embedding` column. Generated SQL: `vector_distance(col, 'query')` server-side.
- `--type bm25` runs `bm25_search(table, col, 'query')`; requires a BM25 index on the column.
- No vector index, or want to use a different model than the index? Skip `hotdata search` and use raw SQL via `hotdata query` (e.g. `SELECT *, cosine_distance(col, [<your_vec>]) FROM ...`). The SQL reference covers the available distance functions and table UDFs.
- BM25 results sort by score (descending). Vector results sort by distance (ascending).
- `--select` specifies which columns to return (comma-separated, defaults to all).
- The previous `--model` flag and stdin-piped-vector path are removed: both hardcoded `l2_distance` regardless of the index's actual metric, which silently produced wrong rankings on cosine indexes. For client-side embedding or precomputed-vector workflows, use raw SQL via `hotdata query` (e.g. `SELECT *, cosine_distance(col, [<vec>]) ...`).
`create` attaches an index to a table via its `--catalog` alias (a managed-database catalog or a connection name). `list` and `delete` accept `--connection-id` (+ `--schema` + `--table`) for connection-scoped operations.
# Create — by catalog alias (resolves a managed-database catalog or a connection name)
hotdata indexes create --catalog <alias> --schema <schema> --table <table> \
--column <cols> --type bm25|vector|sorted \
[--name <name>] [--metric l2|cosine|dot] [--async] \
[--embedding-provider-id <id>] [--dimensions <n>] [--output-column <name>] [--description <text>]
# List — workspace scan, optionally filtered by connection / schema / table
hotdata indexes list [--connection-id <id>] [--schema <schema>] [--table <table>] [-o table|json|yaml]
# Delete — connection scope (--connection-id + --schema + --table)
hotdata indexes delete --connection-id <id> --schema <schema> --table <table> --name <name>
- `--type` is required: choose `sorted` (B-tree-like), `bm25` (full-text), or `vector` (similarity).
- `--type vector` requires exactly one column.
- `--async` submits index creation as a background job and returns a job ID; poll with `hotdata jobs <job_id>`.
- Auto-embedding (text → vector): when `--type vector` is used on a text column, embeddings are generated automatically. The embedding provider can be specified with `--embedding-provider-id`; if omitted, the first system provider is used. The generated column defaults to `{column}_embedding` and can be overridden with `--output-column`.
hotdata embedding-providers list [-o table|json|yaml]
hotdata embedding-providers get <id> [-o table|json|yaml]
hotdata embedding-providers create --name <name> --provider-type service|local \
[--config '<json>'] [--provider-api-key <key> | --secret-name <name>]
hotdata embedding-providers update <id> [--name <name>] [--config '<json>'] \
[--provider-api-key <key> | --secret-name <name>]
hotdata embedding-providers delete <id>
- `list` / `get` show registered providers (system providers like `sys_emb_openai` come pre-configured).
- `--provider-api-key` auto-creates a managed secret for the provider; `--secret-name` references an existing secret. They are mutually exclusive.
- `--provider-api-key` pairs with `--provider-type` and avoids colliding with the global `--api-key` (Hotdata auth).
hotdata results <result_id> [--workspace-id <id>] [--format table|json|csv]
hotdata results list [--workspace-id <id>] [--limit <n>] [--offset <n>] [--format table|json|yaml]
- Query results include a `result-id` in the table footer; use it to retrieve past results without re-running queries.
hotdata jobs list [--workspace-id <id>] [--job-type <type>] [--status <status>] [--all] [--limit <n>] [--offset <n>] [--format table|json|yaml]
hotdata jobs <job_id> [--workspace-id <id>] [--format table|json|yaml]
- `list` shows only active jobs (`pending` and `running`) by default. Use `--all` to see all jobs.
- `--job-type` accepts: `data_refresh_table`, `data_refresh_connection`, `create_index`.
- `--status` accepts: `pending`, `running`, `succeeded`, `partially_succeeded`, `failed`.
Config is stored at `~/.hotdata/config.yml` keyed by profile (default: `default`).
| Variable | Description | Default |
|---|---|---|
| `HOTDATA_API_KEY` | API key (overrides config file) | |
| `HOTDATA_API_URL` | API base URL | `https://api.hotdata.dev/v1` |
| `HOTDATA_APP_URL` | App URL for browser login | `https://app.hotdata.dev` |
Releases use a two-phase workflow wrapping cargo-release.
Phase 1 — prepare
scripts/release.sh prepare <version>
Creates a `release/<version>` branch, bumps the version, updates `CHANGELOG.md`, pushes the branch, and opens a pull request.
Phase 2 — finish
scripts/release.sh finish
Switches to `main`, pulls latest, tags the release, and triggers the dist workflow.