docs: refactor service registration brief to HTTP traffic metering #181

Draft
JoseSzycho wants to merge 8 commits into feat/service-catalog-registration-design from feat/http-traffic-metering

@JoseSzycho JoseSzycho commented Jun 10, 2026

Warning

Paused until #187 is in place.
#181 (comment)

This PR refactors and expands the initial service catalog registration design brief into a structured, standard enhancement proposal focusing on HTTP Traffic Metering for Network Services.

It addresses and resolves all open questions (OD-1 through OD-8) carried over from the previous design brief.

Related to:

Working implementation using OTLP Transport:

…e diagrams and Taskfile automation

This change replaces the service-catalog-registration with documentation focused more on the architecture design and implementation.
@JoseSzycho JoseSzycho changed the base branch from main to feat/service-catalog-registration-design June 10, 2026 15:03
@JoseSzycho JoseSzycho requested a review from scotwells June 10, 2026 15:06

@scotwells scotwells left a comment

Contributor

Directionally the architecture looks good. Let's just clean up some of the implementation details. We want to keep this document product / consumer focused.


### Goals

- Define a standard `Service` and `ServiceConfiguration` to register Network Services under the service domain `networking.datumapis.com`.

Contributor

This is more of an implementation detail. I'd keep goals high level / consumer focused.

Comment on lines +38 to +40
This enhancement defines the architecture, data structures, and roadmap to bring HTTP traffic metering and catalog registration to Network Services. The work is split into two phases:
- **Phase 1 (Catalog & Metadata):** Declare a `Service` and a companion `ServiceConfiguration` resource (`services.miloapis.com/v1alpha1`) carrying the monitored-resource and meter declarations inline. This is a YAML-only delivery packaged in the `config/services/` bundle.
- **Phase 2 (Emission & Integration):** Configure Envoy Gateway proxy logging and deploy a custom Vector Agent to scrape access logs, parse billing signals into CloudEvents, and forward them to the local `billing-usage-collector-vector` DaemonSet.

Contributor

I don't think it's important to highlight the phases in the enhancement. How the work breaks down and is sequenced is better for a project management tool (e.g. GitHub issues), not an enhancement document.


Network Services is a core utility that incurs direct infrastructure costs. Capturing consumption signals is necessary for platform billing and cost-attribution.

Because `MeterDefinition` fields (such as `meterName` and `measurement.unit`) are immutable once published, establishing correct definitions in the `Draft`/`Provisional` phase is critical. Doing so avoids costly SDK upgrades, meter deprecation cycles, and data migrations.
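
The immutability constraint described above can be illustrated with a small check. This is only a sketch: the dictionary layout, the dotted-path convention, the sample values, and the `immutable_field_changes` helper are hypothetical. Only `meterName` and `measurement.unit` are taken from the text as fields that cannot change after publication.

```python
from typing import Any

# Fields the text names as immutable once a MeterDefinition is published.
# The dotted-path notation is an illustrative convention, not the real schema.
IMMUTABLE_PATHS = ("meterName", "measurement.unit")


def get_path(obj: dict[str, Any], path: str) -> Any:
    """Return the value at a dotted path, or None if any segment is missing."""
    current: Any = obj
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def immutable_field_changes(
    published: dict[str, Any], proposed: dict[str, Any]
) -> list[str]:
    """List the immutable paths whose values differ between two specs."""
    return [
        path
        for path in IMMUTABLE_PATHS
        if get_path(published, path) != get_path(proposed, path)
    ]


# Hypothetical specs: the unit changes, which the check should flag.
published = {
    "meterName": "example-meter",
    "measurement": {"unit": "By"},
    "description": "old text",
}
proposed = {
    "meterName": "example-meter",
    "measurement": {"unit": "KiBy"},
    "description": "new text",
}

print(immutable_field_changes(published, proposed))  # ['measurement.unit']
```

Mutable fields such as the hypothetical `description` pass through untouched, which is why getting the immutable ones right before publication matters.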

Contributor

This feels like internal / implementation detail. The motivation should be more product / consumer focused.

- Altering the `MeterDefinition` schema or billing pipeline contract.
- Implementation of the core Billing SDK (owned by the Billing Team).
- Shared-infrastructure cost attribution or cross-project billing logic.
- **Deploying or modifying the Billing System or `billing-usage-collector-vector` DaemonSet.** These components are pre-existing, shared platform infrastructure. Our work is limited to deploying a custom Vector Agent (Log Parser) to parse logs and forward them to this existing collector.

Contributor

I don't think this should be a non-goal, it's more of an implementation detail.


- The **Monitored Resource** is the Kubernetes Gateway API `HTTPRoute` resource, representing the customer-facing HTTP endpoint.
- **Phase 1** registers the service with the service catalog via declarative YAML configurations. The service catalog fan-out controller automatically creates `MonitoredResourceType` and `MeterDefinition` resources in the billing namespace.
- **Phase 2** instruments the Envoy Gateway instances to write structured JSON access logs to stdout. A node-level Vector Agent (Log Parser) tails these logs, parses and maps the raw logs into CloudEvents, and forwards them locally via HTTP to the `billing-usage-collector-vector` DaemonSet. The billing collector then handles local disk buffering and reliably forwards them to the Billing System.

Contributor

We should measure the resource impact this would have on the worker node. It may make sense to have the existing vector agent handle collection from envoy and transforming the results into billable usage events.


---

### Service and ServiceConfiguration Definitions

Contributor

I don't think it's important to have the actual service configuration file here. That's an implementation detail of registering the metering definitions.

@scotwells

Contributor

Following on from the pipeline shape — the piece worth nailing down is how the log line gets the metadata embedded in the first place, since the project isn't something Envoy knows on its own. Project identity lives on the namespace (resourcemanager.miloapis.com/project-name, e.g. p-abc), so it has to be put onto the route before the access log can emit it.

The natural place to do that is the extension server proposed in #187. It already injects per-route filter_metadata for the WAF and Connector, already watches namespaces in a warm cache, and already maps the downstream ns-<uid> back to the upstream namespace — so adding the project name to route metadata is a small, additive change that stays out of the request path. Once it's on the route, the access log format references it with %METADATA(ROUTE:...)% and the line comes out with project (and whatever else we want) already embedded, ready for Vector to map to subject: projects/{name}.

So rather than enrich anywhere downstream, we piggyback on #187 to stamp the metadata at build time and let it fall through into the log line.
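
As a rough illustration of that last step, the sketch below maps one JSON access-log line, with the project name already embedded, to a CloudEvents-style envelope whose `subject` is `projects/{name}`. The log field names (`project_name`, `bytes_sent`, `start_time`), the event `type` and `source` values, and the `to_cloud_event` helper are all assumptions made for illustration. The real mapping would live in Vector configuration rather than Python.

```python
import json
import uuid
from typing import Any, Optional


def to_cloud_event(log_line: str) -> Optional[dict[str, Any]]:
    """Map one JSON access-log line to a CloudEvents 1.0 style dict.

    Returns None when the line carries no project, because such a line
    cannot be attributed to a billing subject.
    """
    record = json.loads(log_line)
    project = record.get("project_name")  # hypothetical log field
    if not project:
        return None

    event: dict[str, Any] = {
        # Required CloudEvents context attributes.
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "source": "example/envoy-access-log",  # hypothetical value
        "type": "example.http.usage",          # hypothetical value
        # Optional attributes.
        "subject": f"projects/{project}",
        "datacontenttype": "application/json",
        "data": {"bytes_sent": int(record.get("bytes_sent", 0))},
    }
    # Only emit "time" when the log line actually carries a timestamp.
    if record.get("start_time"):
        event["time"] = record["start_time"]
    return event


line = json.dumps(
    {"project_name": "p-abc", "bytes_sent": 2048, "start_time": "2026-06-17T15:00:00Z"}
)
event = to_cloud_event(line)
print(event["subject"])  # projects/p-abc
```

Dropping lines without a project keeps the sketch short. A real pipeline would more likely count or route such lines somewhere visible so that unattributed traffic does not disappear silently.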

@JoseSzycho JoseSzycho mentioned this pull request Jun 17, 2026
@JoseSzycho

Author

@scotwells I just updated the enhancement, and I have a working implementation here:

Could you please review the Design Details I introduced in the previous commit?

I propose two different alternatives.

Will wait to hear from you in order to continue with the implementation.

Comment on lines +149 to +156
1. **Network Services Operator (controller).** When the operator reconciles a
customer `HTTPRoute` into its downstream representation, it injects the
project name as a request header (`x-datum-project-name`) via a
`RequestHeaderModifier` filter on each route rule. The project name is read
from the upstream cluster identity (the Milo project name) that the
operator already holds while mapping upstream → downstream resources. Routes
that already define a `RequestHeaderModifier` are merged into rather than
duplicated, since Gateway API permits at most one such filter per rule.
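
The merge behaviour described in this item can be sketched as follows. This is not the operator's code: the `inject_project_header` helper is hypothetical and works on plain dictionaries shaped like Gateway API `HTTPRoute` rules (`filters`, `type`, `requestHeaderModifier.set`). It assumes the operator wants `set` semantics and only handles the `set` list.

```python
import copy
from typing import Any

PROJECT_HEADER = "x-datum-project-name"


def inject_project_header(rule: dict[str, Any], project: str) -> dict[str, Any]:
    """Return a copy of a route rule with the project header set.

    Reuses an existing RequestHeaderModifier filter when the rule has one,
    so the rule never ends up with two filters of that type.
    """
    rule = copy.deepcopy(rule)  # leave the caller's rule untouched
    filters = rule.setdefault("filters", [])

    existing = next(
        (f for f in filters if f.get("type") == "RequestHeaderModifier"), None
    )
    if existing is None:
        existing = {"type": "RequestHeaderModifier", "requestHeaderModifier": {}}
        filters.append(existing)

    modifier = existing.setdefault("requestHeaderModifier", {})
    # Header names are case-insensitive: drop any earlier entry, then add ours.
    entries = [
        h for h in modifier.get("set", []) if h["name"].lower() != PROJECT_HEADER
    ]
    entries.append({"name": PROJECT_HEADER, "value": project})
    modifier["set"] = entries
    return rule


# A rule that already carries a customer-defined RequestHeaderModifier.
rule = {
    "filters": [
        {
            "type": "RequestHeaderModifier",
            "requestHeaderModifier": {"set": [{"name": "x-custom", "value": "1"}]},
        }
    ]
}
merged = inject_project_header(rule, "p-abc")
print(len(merged["filters"]))  # 1
```

A fuller version would also need to decide what to do with `add` and `remove` entries that mention the same header.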

Contributor

While this approach works, I believe the approach in #181 (comment) would be better since it doesn't require exposing billing metadata through headers seen by end users.

With the header approach, we would have to prevent the user from ever specifying this header in their request since we use it for billing attribution. Otherwise we could potentially bill the wrong project for an HTTP route.
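
To make the risk concrete, here is a minimal sketch of the invariant the header approach would have to uphold: whatever value a client sends for the header is discarded, and only the platform-assigned project survives. The `trusted_headers` helper and the sample project names are hypothetical, and this models the rule only, not how Envoy or the Gateway API would enforce it.

```python
PROJECT_HEADER = "x-datum-project-name"


def trusted_headers(client_headers: dict[str, str], project: str) -> dict[str, str]:
    """Drop any client-supplied project header and set the trusted value."""
    # Header names are case-insensitive, so compare in lowercase.
    cleaned = {
        name: value
        for name, value in client_headers.items()
        if name.lower() != PROJECT_HEADER
    }
    cleaned[PROJECT_HEADER] = project
    return cleaned


# A client tries to shift billing to another project.
incoming = {"X-Datum-Project-Name": "p-other", "accept": "*/*"}
print(trusted_headers(incoming, "p-abc"))
# {'accept': '*/*', 'x-datum-project-name': 'p-abc'}
```

Keeping the project in route metadata avoids this class of problem, because the value never travels in a place the client can write to.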

Author

Thanks for the fast review. I will set this PR to draft and wait until the extension server is in place before making the final tweaks to the documentation.

@JoseSzycho JoseSzycho marked this pull request as draft June 17, 2026 15:00