docs: refactor service registration brief to HTTP traffic metering#181
docs: refactor service registration brief to HTTP traffic metering#181JoseSzycho wants to merge 8 commits into
Conversation
…e diagrams and Taskfile automation This changes replaces the service-catalog-registration with documentation more focused in the architecture design and implementation
scotwells
left a comment
There was a problem hiding this comment.
Directionally the architecture looks good. Let's just clean up some of the implementation details. We want to keep this document product / consumer focused.
|
|
||
| ### Goals | ||
|
|
||
| - Define a standard `Service` and `ServiceConfiguration` to register Network Services under the service domain `networking.datumapis.com`. |
There was a problem hiding this comment.
This is more of an implementation detail. I'd keep goals high level / consumer focused.
| This enhancement defines the architecture, data structures, and roadmap to bring HTTP traffic metering and catalog registration to Network Services. The work is split into two phases: | ||
| - **Phase 1 (Catalog & Metadata):** Declare a `Service` and a companion `ServiceConfiguration` resource (`services.miloapis.com/v1alpha1`) carrying the monitored-resource and meter declarations inline. This is a YAML-only delivery packaged in the `config/services/` bundle. | ||
| - **Phase 2 (Emission & Integration):** Configure Envoy Gateway proxy logging and deploy a custom Vector Agent to scrape access logs, parse billing signals into CloudEvents, and forward them to the local `billing-usage-collector-vector` DaemonSet. |
There was a problem hiding this comment.
I don't think it's important to highlight the phases in the enhancement. How the work breaks down and is sequenced is better for a project management tool (e.g. GitHub issues), not an enhancement document.
|
|
||
| Network Services is a core utility that incurs direct infrastructure costs. Capturing consumption signals is necessary for platform billing and cost-attribution. | ||
|
|
||
| Because `MeterDefinition` fields (such as `meterName` and `measurement.unit`) are immutable once published, establishing correct definitions in the `Draft`/`Provisional` phase is critical. Doing so avoids costly SDK upgrades, meter deprecation cycles, and data migrations. |
There was a problem hiding this comment.
This feels like internal / implementation detail. The motivation should be more product / consumer focused.
| - Altering the `MeterDefinition` schema or billing pipeline contract. | ||
| - Implementation of the core Billing SDK (owned by the Billing Team). | ||
| - Shared-infrastructure cost attribution or cross-project billing logic. | ||
| - **Deploying or modifying the Billing System or `billing-usage-collector-vector` DaemonSet.** These components are pre-existing, shared platform infrastructure. Our work is limited to deploying a custom Vector Agent (Log Parser) to parse logs and forward them to this existing collector. |
There was a problem hiding this comment.
I don't think this should be a non-goal, it's more of an implementation detail.
|
|
||
| - The **Monitored Resource** is the Kubernetes Gateway API `HTTPRoute` resource, representing the customer-facing HTTP endpoint. | ||
| - **Phase 1** registers the service with the service catalog via declarative YAML configurations. The service catalog fan-out controller automatically creates `MonitoredResourceType` and `MeterDefinition` resources in the billing namespace. | ||
| - **Phase 2** instruments the Envoy Gateway instances to write structured JSON access logs to stdout. A node-level Vector Agent (Log Parser) tails these logs, parses and maps the raw logs into CloudEvents, and forwards them locally via HTTP to the `billing-usage-collector-vector` DaemonSet. The billing collector then handles local disk buffering and reliably forwards them to the Billing System. |
There was a problem hiding this comment.
We should measure the resource impact this would have on the worker node. It may make sense to have the existing vector agent handle collection from envoy and transforming the results into billable usage events.
|
|
||
| --- | ||
|
|
||
| ### Service and ServiceConfiguration Definitions |
There was a problem hiding this comment.
I don't think it's important to have the actual service configuration file here. That's an implementation detail of registering the metering definitions.
|
Following on from the pipeline shape — the piece worth nailing down is how the log line gets the metadata embedded in the first place, since the project isn't something Envoy knows on its own. Project identity lives on the namespace ( The natural place to do that is the extension server proposed in #187. It already injects per-route So rather than enrich anywhere downstream, we piggyback on #187 to stamp the metadata at build time and let it fall through into the log line. |
…tions for HTTP traffic metering
|
@scotwells I just updated the enhancement, and I have a working implementation here: Could you please review the Design Details I introduced in the previous commit? I propose to different alternatives. Will wait to hear from you in order to continue with the implementation. |
| 1. **Network Services Operator (controller).** When the operator reconciles a | ||
| customer `HTTPRoute` into its downstream representation, it injects the | ||
| project name as a request header (`x-datum-project-name`) via a | ||
| `RequestHeaderModifier` filter on each route rule. The project name is read | ||
| from the upstream cluster identity (the Milo project name) that the | ||
| operator already holds while mapping upstream → downstream resources. Routes | ||
| that already define a `RequestHeaderModifier` are merged into rather than | ||
| duplicated, since Gateway API permits at most one such filter per rule. |
There was a problem hiding this comment.
While this approach works, I believe the approach in #181 (comment) would be better since it doesn't require exposing billing metadata through headers seen by end users.
With the header approach, we would have to prevent the user from ever specifying this header in their request since we use it for billing attribution. Otherwise we could potentially bill the wrong project for an HTTP route.
There was a problem hiding this comment.
thanks for the fast review. Will set this PR on draft and wait till the extension server to be on place for doing the final tweaks into the documentation.
Warning
In
pausetill #187 is on place#181 (comment)
This PR refactors and expands the initial service catalog registration design brief into a structured, standard enhancement proposal focusing on HTTP Traffic Metering for Network Services.
It addresses and resolves all open questions (OD-1 through OD-8) carried over from the previous design brief.
Related to:
Working implementation using OTLP Transport: