diff --git a/knowledge_base/metric_view/README.md b/knowledge_base/metric_view/README.md new file mode 100644 index 0000000..1466eee --- /dev/null +++ b/knowledge_base/metric_view/README.md @@ -0,0 +1,90 @@ +# Unity Catalog Metric View + +This project demonstrates how to create a [Unity Catalog Metric View](https://docs.databricks.com/metric-views/) using Declarative Automation Bundles. Once registered, the metric view becomes available to analysts and BI tools across your workspace, queryable via the `MEASURE()` SQL function. + +## `bookings_kpis` metric view + +This project defines `bookings_kpis`, a metric view over the public sample dataset `samples.wanderbricks.bookings`. + +A SQL task in the job runs `CREATE OR REPLACE VIEW … WITH METRICS LANGUAGE YAML` from [`src/bookings_kpis.metric_view.sql`](src/bookings_kpis.metric_view.sql): + +```sql +CREATE OR REPLACE VIEW bookings_kpis +WITH METRICS +LANGUAGE YAML +AS $$ +version: 1.0 +source: samples.wanderbricks.bookings +filter: status = 'confirmed' +dimensions: + - name: check_in_month + expr: date_trunc('MONTH', check_in) +measures: + - name: total_bookings + expr: COUNT(1) + - name: total_revenue + expr: SUM(total_amount) +$$; +``` + +`{{catalog}}` and `{{schema}}` in the SQL file are substituted from job parameters at run time. + +Once registered, query the metric view from any SQL editor: + +```sql +SELECT + check_in_month, + MEASURE(total_bookings) AS bookings, + MEASURE(total_revenue) AS revenue +FROM ..bookings_kpis +GROUP BY check_in_month +ORDER BY check_in_month; +``` + +## Get started + +### Prerequisites + +* Databricks workspace with Unity Catalog enabled +* A SQL warehouse on a runtime that supports Unity Catalog metric views +* Databricks CLI installed and configured + +### Setup + +1. In `databricks.yml`, replace `` (in both `targets.dev` and `targets.prod`) with one of your warehouse IDs (`databricks warehouses list`). +2. 
If you don't have write access to `main`, change `catalog:` under each target's `variables` (in both `targets.dev` and `targets.prod`) to a catalog you can write to. + +### Deployment + +Deploy to dev, then run the job: + +```bash +databricks bundle deploy --target dev +``` + +```bash +databricks bundle run bookings_kpis_metric_view --target dev +``` + +Deploy to production, then run the job: + +```bash +databricks bundle deploy --target prod +``` + +```bash +databricks bundle run bookings_kpis_metric_view --target prod +``` + +The metric view will be created at `..bookings_kpis` (dev) or `.prod.bookings_kpis` (prod). + +### Notes + +- The job has a daily `periodic` trigger so the view definition is re-applied in production. [Development mode](https://docs.databricks.com/dev-tools/bundles/deployment-modes.html) pauses the trigger automatically, so it only fires after `bundle deploy --target prod`. +- Set `source:` in the YAML body to any Unity Catalog table you can read. The sample `samples.wanderbricks.bookings` is convenient for getting started. For production, use a table from your own pipeline. + +## Learn more + +- [Unity Catalog Metric Views](https://docs.databricks.com/metric-views/) (official documentation) +- [Metric View YAML Reference](https://docs.databricks.com/metric-views/yaml-ref) +- [Declarative Automation Bundles](https://docs.databricks.com/dev-tools/bundles/index.html) diff --git a/knowledge_base/metric_view/databricks.yml b/knowledge_base/metric_view/databricks.yml new file mode 100644 index 0000000..abe5582 --- /dev/null +++ b/knowledge_base/metric_view/databricks.yml @@ -0,0 +1,38 @@ +# This example bundle defines a job that creates a Unity Catalog Metric View. +# +# Metric Views are a Unity Catalog feature that let you declare reusable +# dimensions and measures over a base table, queryable via the MEASURE() SQL +# function. 
+# +# Docs: +# https://docs.databricks.com/metric-views/ +# https://docs.databricks.com/metric-views/yaml-ref + +bundle: + name: metric_view + +include: + - resources/*.yml + +variables: + catalog: + description: The Unity Catalog where the metric view will be created. + schema: + description: The schema where the metric view will be created. + warehouse_id: + description: SQL warehouse ID used to run the CREATE VIEW statement. To list available warehouses, use `databricks warehouses list`. + +targets: + dev: + mode: development + default: true + variables: + catalog: main + schema: ${workspace.current_user.short_name} + warehouse_id: + prod: + mode: production + variables: + catalog: main + schema: prod + warehouse_id: diff --git a/knowledge_base/metric_view/resources/bookings_kpis.job.yml b/knowledge_base/metric_view/resources/bookings_kpis.job.yml new file mode 100644 index 0000000..a3a9fa5 --- /dev/null +++ b/knowledge_base/metric_view/resources/bookings_kpis.job.yml @@ -0,0 +1,25 @@ +resources: + jobs: + bookings_kpis_metric_view: + name: bookings_kpis_metric_view + description: Creates/refreshes the `bookings_kpis` Unity Catalog metric view. + + trigger: + # Re-apply the view definition daily. Dev deployment mode pauses this trigger; + # in prod the job runs once per day. 
See https://docs.databricks.com/api/workspace/jobs/create#trigger + periodic: + interval: 1 + unit: DAYS + + parameters: + - name: catalog + default: ${var.catalog} + - name: schema + default: ${var.schema} + + tasks: + - task_key: create_metric_view + sql_task: + warehouse_id: ${var.warehouse_id} + file: + path: ../src/bookings_kpis.metric_view.sql diff --git a/knowledge_base/metric_view/src/bookings_kpis.metric_view.sql b/knowledge_base/metric_view/src/bookings_kpis.metric_view.sql new file mode 100644 index 0000000..7e90400 --- /dev/null +++ b/knowledge_base/metric_view/src/bookings_kpis.metric_view.sql @@ -0,0 +1,45 @@ +-- Create (or replace) a Unity Catalog Metric View over samples.wanderbricks.bookings. +-- See https://docs.databricks.com/metric-views/yaml-ref for the YAML syntax. +-- +-- Once deployed and run, query the metric view from any SQL editor with: +-- SELECT check_in_month, MEASURE(total_bookings), MEASURE(total_revenue) +-- FROM ..bookings_kpis +-- WHERE check_in_month >= '2024-01-01' +-- GROUP BY check_in_month; + +CREATE SCHEMA IF NOT EXISTS IDENTIFIER({{catalog}} || '.' || {{schema}}); +USE CATALOG IDENTIFIER({{catalog}}); +USE IDENTIFIER({{schema}}); + +CREATE OR REPLACE VIEW bookings_kpis +WITH METRICS +LANGUAGE YAML +AS $$ +version: 1.0 +comment: Booking KPIs (count, revenue, AOV, guests) over samples.wanderbricks.bookings. +source: samples.wanderbricks.bookings + +filter: status = 'confirmed' + +dimensions: + - name: check_in_date + expr: check_in + - name: check_in_month + expr: date_trunc('MONTH', check_in) + - name: guests_count + expr: guests_count + +measures: + - name: total_bookings + expr: COUNT(1) + comment: Number of confirmed bookings. + - name: total_revenue + expr: SUM(total_amount) + comment: Total revenue across confirmed bookings. + - name: avg_booking_value + expr: AVG(total_amount) + comment: Average revenue per confirmed booking. + - name: total_guests + expr: SUM(guests_count) + comment: Total guests across confirmed bookings. 
+$$; diff --git a/knowledge_base/metric_view_dbt/README.md b/knowledge_base/metric_view_dbt/README.md new file mode 100644 index 0000000..608d63e --- /dev/null +++ b/knowledge_base/metric_view_dbt/README.md @@ -0,0 +1,91 @@ +# Unity Catalog Metric View using dbt + +This project demonstrates how to materialize a [Unity Catalog Metric View](https://docs.databricks.com/metric-views/) via dbt-databricks using Declarative Automation Bundles. The metric view is defined as a dbt model with the [`metric_view` materialization](https://github.com/databricks/dbt-databricks/pull/1285) added in dbt-databricks 1.12.0. + +For the SQL-job variant, see [`../metric_view`](../metric_view). + +## `bookings_kpis` metric view + +This project defines `bookings_kpis`, a metric view over the public sample dataset `samples.wanderbricks.bookings`. + +The model lives in [`src/models/bookings_kpis.sql`](src/models/bookings_kpis.sql): + +```sql +{{ config(materialized='metric_view') }} + +version: 1.0 +source: samples.wanderbricks.bookings +filter: status = 'confirmed' +dimensions: + - name: check_in_month + expr: date_trunc('MONTH', check_in) +measures: + - name: total_bookings + expr: COUNT(1) + - name: total_revenue + expr: SUM(total_amount) +``` + +The `{{ config(materialized='metric_view') }}` line is the only jinja; everything below it is the metric view YAML body. dbt-databricks issues the equivalent `CREATE OR REPLACE VIEW … WITH METRICS LANGUAGE YAML` against the warehouse at `dbt run` time. + +Once materialized, query the metric view from any SQL editor: + +```sql +SELECT + check_in_month, + MEASURE(total_bookings) AS bookings, + MEASURE(total_revenue) AS revenue +FROM ..bookings_kpis +GROUP BY check_in_month +ORDER BY check_in_month; +``` + +## Get started + +### Prerequisites + +* Databricks workspace with Unity Catalog enabled +* A SQL warehouse on a runtime that supports Unity Catalog metric views +* Databricks CLI installed and configured + +### Setup + +1. 
In `dbt_profiles/profiles.yml`, set `http_path` (in both `dev` and `prod`) to the HTTP path of one of your warehouses (`databricks warehouses list`) and update `catalog` to one you can write to (the default `main` is often not writable). + +### Deployment + +Deploy to dev, then run the job: + +```bash +databricks bundle deploy --target dev +``` + +```bash +databricks bundle run metric_view_dbt_job --target dev +``` + +Deploy to production, then run the job: + +```bash +databricks bundle deploy --target prod +``` + +```bash +databricks bundle run metric_view_dbt_job --target prod +``` + +The metric view will be created at `..bookings_kpis` (dev) or `.prod.bookings_kpis` (prod). + +### Notes + +- The job has a daily `periodic` trigger so `dbt run` re-applies the view definition in production. [Development mode](https://docs.databricks.com/dev-tools/bundles/deployment-modes.html) pauses the trigger automatically, so it only fires after `bundle deploy --target prod`. +- Requires `dbt-databricks >= 1.12.0` (the version that introduced the `metric_view` materialization). The job's task environment constrains the dependency to `dbt-databricks>=1.12.0,<2.0.0`. +- The model file is `.sql` even though its body is YAML, because dbt model files must use `.sql`. dbt-databricks wraps the body in `CREATE OR REPLACE VIEW … WITH METRICS LANGUAGE YAML AS $$ … $$` at run time. +- Set `source:` in the YAML body to any Unity Catalog table you can read. For production, replace `samples.wanderbricks.bookings` with a table from your own pipeline. 
+ +## Learn more + +- [Unity Catalog Metric Views](https://docs.databricks.com/metric-views/) — Official documentation +- [Metric View YAML Reference](https://docs.databricks.com/metric-views/yaml-ref) +- [`metric_view` materialization in dbt-databricks](https://github.com/databricks/dbt-databricks/pull/1285) +- [Declarative Automation Bundles](https://docs.databricks.com/dev-tools/bundles/index.html) diff --git a/knowledge_base/metric_view_dbt/databricks.yml b/knowledge_base/metric_view_dbt/databricks.yml new file mode 100644 index 0000000..bce62b0 --- /dev/null +++ b/knowledge_base/metric_view_dbt/databricks.yml @@ -0,0 +1,27 @@ +# This example bundle materializes a Unity Catalog Metric View using dbt-databricks. +# +# Metric Views are a Unity Catalog feature that let you declare reusable +# dimensions and measures over a base table, queryable via the MEASURE() SQL +# function. +# +# See also the SQL-job variant at ../metric_view. +# +# Docs: +# https://docs.databricks.com/metric-views/ +# https://docs.databricks.com/metric-views/yaml-ref +# https://docs.getdbt.com/reference/resource-configs/databricks-configs + +bundle: + name: metric_view_dbt + +include: + - resources/*.yml + +# Deployment targets. +# The schema and catalog for dbt are configured in dbt_profiles/profiles.yml. +targets: + dev: + mode: development + default: true + prod: + mode: production diff --git a/knowledge_base/metric_view_dbt/dbt_profiles/profiles.yml b/knowledge_base/metric_view_dbt/dbt_profiles/profiles.yml new file mode 100644 index 0000000..81059dc --- /dev/null +++ b/knowledge_base/metric_view_dbt/dbt_profiles/profiles.yml @@ -0,0 +1,32 @@ +# dbt profiles used by the deployed dbt job in resources/metric_view_dbt.job.yml. +# +# For local development with the dbt CLI you should create your own profile in +# ~/.dbt/profiles.yml using `dbt init`; this file is only read by the job. 
+metric_view_dbt: + target: dev # default target + outputs: + + # The 'dev' target uses the workspace's current_user.short_name as the + # schema (passed in as the `dev_schema` var from the job; see the job YAML). + dev: + type: databricks + method: http + catalog: main + schema: "{{ var('dev_schema') }}" + + http_path: /sql/1.0/warehouses/abcdef1234567890 + + # DBT_HOST / DBT_ACCESS_TOKEN are injected by Databricks at run time. + host: "{{ env_var('DBT_HOST') }}" + token: "{{ env_var('DBT_ACCESS_TOKEN') }}" + + prod: + type: databricks + method: http + catalog: main + schema: prod + + http_path: /sql/1.0/warehouses/abcdef1234567890 + + host: "{{ env_var('DBT_HOST') }}" + token: "{{ env_var('DBT_ACCESS_TOKEN') }}" diff --git a/knowledge_base/metric_view_dbt/dbt_project.yml b/knowledge_base/metric_view_dbt/dbt_project.yml new file mode 100644 index 0000000..2f408bf --- /dev/null +++ b/knowledge_base/metric_view_dbt/dbt_project.yml @@ -0,0 +1,25 @@ +name: 'metric_view_dbt' +version: '1.0.0' +config-version: 2 + +# Which 'profile' (in dbt_profiles/profiles.yml) dbt uses for this project. +profile: 'metric_view_dbt' + +# Everything dbt needs lives under src/ so that non-dbt bundle resources +# (such as the job in resources/) can sit alongside without confusing dbt. +model-paths: ["src/models"] +analysis-paths: ["src/analyses"] +test-paths: ["src/tests"] +seed-paths: ["src/seeds"] +macro-paths: ["src/macros"] +snapshot-paths: ["src/snapshots"] + +clean-targets: + - "target" + - "dbt_packages" + +# Default materialization is 'view'; the single metric_view model overrides +# this with `{{ config(materialized='metric_view') }}`. 
+models: + metric_view_dbt: + +materialized: view diff --git a/knowledge_base/metric_view_dbt/resources/metric_view_dbt.job.yml b/knowledge_base/metric_view_dbt/resources/metric_view_dbt.job.yml new file mode 100644 index 0000000..970bd1b --- /dev/null +++ b/knowledge_base/metric_view_dbt/resources/metric_view_dbt.job.yml @@ -0,0 +1,34 @@ +resources: + jobs: + metric_view_dbt_job: + name: metric_view_dbt_job + description: Materializes the `bookings_kpis` Unity Catalog metric view via dbt-databricks. + + trigger: + # Re-apply the view definition daily. Dev-mode deploys pause this trigger; + # in prod the job runs once per day. See https://docs.databricks.com/api/workspace/jobs/create#trigger + periodic: + interval: 1 + unit: DAYS + + tasks: + - task_key: dbt + environment_key: default + dbt_task: + project_directory: ../ + # The default schema/catalog are defined in ../dbt_profiles/profiles.yml. + profiles_directory: dbt_profiles/ + # `dev_schema` is consumed by the 'dev' target in profiles.yml so + # each developer gets their own schema (matches DABs dev-mode prefixing). + commands: + - 'dbt deps --target=${bundle.target}' + - 'dbt run --target=${bundle.target} --vars "{ dev_schema: ${workspace.current_user.short_name} }"' + + environments: + - environment_key: default + spec: + environment_version: "4" + dependencies: + # The metric_view materialization landed in dbt-databricks 1.12.0 + # (https://github.com/databricks/dbt-databricks/pull/1285). + - dbt-databricks>=1.12.0,<2.0.0 diff --git a/knowledge_base/metric_view_dbt/src/models/bookings_kpis.sql b/knowledge_base/metric_view_dbt/src/models/bookings_kpis.sql new file mode 100644 index 0000000..f9cf46d --- /dev/null +++ b/knowledge_base/metric_view_dbt/src/models/bookings_kpis.sql @@ -0,0 +1,54 @@ +{# + A Unity Catalog Metric View, materialized by dbt-databricks. + + Everything below the `config(...)` line is the metric-view YAML body (see + https://docs.databricks.com/metric-views/yaml-ref). 
The metric_view + materialization wraps it in: + + CREATE OR REPLACE VIEW WITH METRICS LANGUAGE YAML AS + + so the file looks like SQL to dbt but its contents are YAML. (These Jinja + comments are stripped at compile time; SQL `--` comments would be passed + through verbatim into the YAML body and break the view definition.) + + Query the resulting metric view from any SQL editor: + + SELECT + check_in_month, + MEASURE(total_bookings) AS bookings, + MEASURE(total_revenue) AS revenue, + MEASURE(avg_booking_value) AS aov + FROM ..bookings_kpis + WHERE check_in_date >= '2024-01-01' + GROUP BY check_in_month + ORDER BY check_in_month; +#} +{{ config(materialized='metric_view') }} + +version: 1.0 +comment: Booking KPIs (count, revenue, AOV, guests) over samples.wanderbricks.bookings. +source: samples.wanderbricks.bookings + +filter: status = 'confirmed' + +dimensions: + - name: check_in_date + expr: check_in + - name: check_in_month + expr: date_trunc('MONTH', check_in) + - name: guests_count + expr: guests_count + +measures: + - name: total_bookings + expr: COUNT(1) + comment: Number of confirmed bookings. + - name: total_revenue + expr: SUM(total_amount) + comment: Total revenue across confirmed bookings. + - name: avg_booking_value + expr: AVG(total_amount) + comment: Average revenue per confirmed booking. + - name: total_guests + expr: SUM(guests_count) + comment: Total guests across confirmed bookings. diff --git a/knowledge_base/metric_view_dbt/src/models/schema.yml b/knowledge_base/metric_view_dbt/src/models/schema.yml new file mode 100644 index 0000000..571b5ec --- /dev/null +++ b/knowledge_base/metric_view_dbt/src/models/schema.yml @@ -0,0 +1,8 @@ +version: 2 + +models: + - name: bookings_kpis + description: | + Unity Catalog Metric View over the public sample `samples.wanderbricks.bookings`. + Exposes booking-count, revenue, and average-value measures over confirmed bookings, + sliceable by check-in date, check-in month, and guest count. Query via `MEASURE() ... GROUP BY `.