feat(analytics): HMAC bearer auth #35776
Conversation
…CS_PASSWORD HMAC bearer is the only auth scheme. Token sourced from per-site app secrets via the save-flow exchange; DOT_ANALYTICS_BEARER_TOKEN env var kept as a bootstrap fallback for CI/smoke contexts. Basic-auth path and DOT_ANALYTICS_AUTH_MODE switch removed; MonitorHelper no longer requires DOT_ANALYTICS_PASSWORD.
If the credential exchange fails (event-manager unreachable, wrong password, missing tenant/base-url config), the password was being left in the app config. The YAML hints explicitly promise it is never persisted, so clear it on every failure path. User sees an error message and re-enters the password to retry.
Adds an INFO block above the bearerToken field telling site admins it's managed by the save flow and must not be edited. Apps form framework doesn't support read-only STRING fields, so the field stays hidden+STRING to survive the mask-preservation path on unrelated saves — the social signal compensates for the lack of a UI guard.
Replaces the stub test (empty body + tautological string compare) with JUnit 5 + Mockito 5 MockedStatic coverage of the listener's critical behaviors: - null event is a no-op - missing host identifier short-circuits before any HostAPI call - re-entrancy guard skips when adminPassword is absent - missing DOT_ANALYTICS_TENANT clears the password and notifies the user - missing DOT_ANALYTICS_BASE_URL clears the password and notifies the user - getKey() returns CONTENT_ANALYTICS_APP_KEY Listener constructed via reflection on the (HostAPI) constructor to avoid the APILocator singleton chain. HTTP success path remains an integration-test concern (mocking the full CircuitBreakerUrl builder chain isn't worth the complexity here).
… bearer Addresses three review findings on the catch-all event-manager proxy: - Whitelist relative paths to the event/* surface area; reject anything containing .. or \ segments. Prevents an authenticated backend user from probing upstream admin endpoints via path traversal. - Set a 10s upstream timeout to keep a slow event manager from exhausting the Jersey thread pool and degrading the admin portal. - Return Optional<String> from buildAuthHeader and omit the Authorization header entirely when no token is resolved, instead of sending the malformed 'Bearer ' (empty token) value that RFC 6750 forbids. Adds unit coverage for isAllowedRelativePath including traversal, non-event prefixes, null/blank input.
…conflict Addresses four findings from the second-round review: - Make ContentAnalyticsAppListener(HostAPI) package-private and drop the reflection helper in the test — same package, no need to bypass access. - Simplify ALLOWED_PATH_PREFIXES to ['event'] (the '/' variant was redundant given the equals + startsWith(prefix + '/') matching logic). - proxyEventRequest now short-circuits on null/blank body and also catches IllegalArgumentException alongside IOException, so malformed payloads produce a 400 with a clear message instead of a raw 500. - Drop 'envvar: DOT_ANALYTICS_BEARER_TOKEN' from the bearerToken field in the App YAML to remove the double-binding with the helper's Config.getStringProperty fallback. The env var stays as a code-level bootstrap fallback only. Deferred (per design / scope): - Per-site tenant in App config: deliberate design choice, tenant stays global via DOT_ANALYTICS_TENANT. - Async-save plaintext window: needs Apps framework pre-save hook, outside this branch's scope.
requiredBackendUser(true) lets any logged-in backend user (content editor, translator, etc.) query raw analytics across all sites. Add requireAdmin(true) so only CMS administrators reach the read proxy. Event POST path is unchanged — it uses siteAuth gating, not backend-user gating.
- ContentAnalyticsAppListener.notify now null-checks both getAppSecrets() and getSecrets() before reading, so a malformed AppSecretSavedEvent exits cleanly instead of NPE-ing through the subscriber chain. - buildAuthHeader trims whitespace from both the per-site bearer token and the env-var fallback before emitting the Authorization header. Defends against copy-paste artifacts (trailing newlines, leading spaces) that would produce an RFC 7230-illegal header value.
…esolution, health - Only set Content-Type on POST upstream calls (RFC 9110 §8.3): GETs without a body shouldn't carry a Content-Type, some gateways reject the request. - Guard exchangeToken() against a null CircuitBreakerUrl.Response (returned when setThrowWhenError(false) is paired with a circuit-open / transport-level failure). Avoids a silent NPE that swallowed the real diagnostic. - notifyError() now logs and returns when userId is blank instead of pushing a null-userId notification that would NPE in the subscriber chain and block downstream AppSecretSavedEvent listeners. - proxyGetRequest() now resolves the target site from an explicit ?siteId= query param first, falling back to the session-bound host only when absent. Eliminates the multi-tab race where two admin tabs targeting different sites could leak data across tenant scopes via the shared session host, and unblocks stateless API clients. - isContentAnalytics() health check no longer returns NOT_CONFIGURED when the probe lands on the System Host (typical for IP/LB-based monitoring probes). Global env-var config (BASE_URL + TENANT + PROJECT) is now the baseline; the reachability probe runs whenever the subsystem is globally configured, so outages of the Event Manager surface as CONFIGURATION_ERROR to monitoring.
Self-review walked back four review-driven additions that added noise without preventing a real failure mode: - Drop the appSecrets / getSecrets() null guard in notify(). The framework's constructor invariant guarantees both are non-null on dispatch; the guard was cargo-culted from a theoretical reviewer concern. - Drop the containsKey(ADMIN_PASSWORD_KEY) early-return in clearAdminPassword. All call sites are reached only after notify()'s 'password is set' guard, so the inner check is dead by construction. - Drop the trim() on the per-site bearer token. The token is produced by the upstream /v1/admin/token endpoint and JSON-parsed via JSONObject.optString; whitespace cannot land in it. The env-var fallback trim (real copy-paste risk) is kept. - Drop the dedicated info_token_managed INFO block in the YAML. The field's own hint already says 'Managed by the save flow above. Leave the masked value alone.' The separate block was the same message restated.
…save-failure notify, drop empty-POST Content-Type - buildUpstreamUrl now only appends the configured DOT_ANALYTICS_PROJECT when the caller didn't already include a project= query parameter. Eliminates the duplicate-key upstream URL (?project=foo&project=bar) whose decoding behavior is implementation-defined. - proxyGetRequest now returns 400 INVALID_SITE_ID when an explicit ?siteId= cannot be resolved, instead of silently falling back to the session host. Surprising-data-leak failure mode replaced with a clean error. - persistTokenAndClearCredentials and clearAdminPassword now notify the user via SystemMessage on saveSecrets failure, instead of swallowing the error with a log-only handler. Doesn't fix the underlying async-window issue (Apps framework pre-save hook still pending), but at least the operator sees something went wrong instead of a silent 'Save Successful'. - Drop Content-Type: application/json from exchangeToken's POST headers — the call has no body, so per RFC 9110 §8.3 the Content-Type is meaningless and can trip strict gateways.
…sts, javadoc - MonitorHelper.isContentAnalytics now null-guards the resolved Host before passing it to ContentAnalyticsUtil.getAppSecrets. IP / load-balancer probes where WebAPILocator.getCurrentHostNoThrow returns null no longer hit the catch-block NPE in the (pre-existing) utility, which would otherwise log a stack trace on every probe. - Add 5 unit tests for buildUpstreamUrl covering trailing-slash stripping, query-param forwarding, configured-project append, the new caller-supplied project dedup (round 5 #1), and the empty-query path. - Correct misleading proxyEventRequest javadoc. The example showed POSTing to /content/event/total-events which doesn't exist (the @path is an exact match and the upstream is hard-coded to event/ingest). Replace with the actual ingest example and point at proxyGetRequest for the read endpoints.
…list gap - healthCheck() now sets a 2s timeout. The method is pre-existing on main, but round-4 #6 expanded the conditions under which it runs (now on every monitor probe, including IP/LB probes that land on the System Host). Without a timeout, a hung event manager would block the monitor's calling thread. - proxyEventRequest now guards against a Java null bodyMap. JSON parsers treat the literal token 'null' as a valid value that deserializes to a null Map, which then NPEs on get(). Existing checks only caught null/blank request bodies, not the 'null' literal. - isAllowedRelativePath now rejects relative paths containing '?' or '#'. JAX-RS URL-decodes %3F / %23 in @PathParam values, so a request like /api/v1/analytics/event/foo%3Fbar passes the startsWith('event/') check but produces a malformed upstream URL (double-? on appended query params) or strips them entirely (#). Reviewer correctly distinguished this from the previously-rejected event?injected case (still rejected because it doesn't start with event/). - Extract BEARER_TOKEN_KEY constant to ContentAnalyticsUtil so the listener and the helper reference a single source-of-truth for the app-secret key name. Removes a divergence vector if the key ever gets renamed. - Add a unit test for the new allowlist rejection cases.
…eanup GET proxy now resolves the requested site as the logged-in user instead of systemUser, so PermissionAPI gates access. Dropped the requireAdmin gate that was a temporary fix; a backend user without READ on the site gets 403 SITE_ACCESS_DENIED, even if they hand-craft ?siteId= to bypass the (already-permissioned) site picker. Session fallback switched to getCurrentHost(request, user) for the same reason. 403 path mirrors the global DotForbiddenExceptionMapper by calling SecurityLogger.logInfo so audit trails stay intact. Other fixes: - Listener env-var constants promoted to import static from EventAnalyticsProxyHelper (matches round-7 BEARER_TOKEN_KEY pattern, matches MonitorHelper). - Dead 4-arg EventAnalyticsProxyHelper.proxy() overload removed; both callers use the 5-arg variant. - @Operation/@parameter on proxyEventRequest rewritten to match reality (JSON body validation, exact path, hard-coded event/ingest upstream). - @ApiResponses on proxyGetRequest gained 400/403 entries. - DONT_RESPECT_FRONT_END_ROLES used instead of literal false (matches sibling generateSiteKey usage in the same file). - persistTokenAndClearCredentials relaxed to package-private to enable a direct happy-path test of the atomic password-clear + token-persist invariant; new test asserts adminPassword absent, bearerToken replaced, unrelated secrets preserved. - buildAuthHeader gained five tests covering the security-critical branches: per-site wins over fallback, env fallback with trim, null host fallback, both-absent returns empty (no malformed "Bearer " per RFC 6750), blank site token falls through.
This comment was marked as resolved.
This comment was marked as resolved.
- ContentAnalyticsAppListener.persistTokenAndClearCredentials.onFailure
now calls clearAdminPassword as best-effort cleanup. Without this,
a failure mid-token-persist would leave the user's adminPassword
sitting in encrypted storage indefinitely (a re-save would short-
circuit on the "password not set" guard). Notification text updated
to drop the misleading "password may still be stored" claim.
- EventAnalyticsProxyHelper.isAllowedRelativePath switched from a
blanket contains("..") to a segment-equality check for "." and "..".
The previous form would have wrongly rejected legitimate future
endpoint names containing dots (e.g. "event/last..7..days").
Backslash + ?/# guards retained as defense-in-depth.
- Regenerate openapi.yaml — Swagger annotation edits in the last
commit (@Operation/@Parameter/@ApiResponses) drifted the committed
YAML; CI's "no uncommitted changes after build" gate caught it.
- New tests: failure-cleanup path on the listener, regression test
on the helper confirming dots inside segments are now accepted.
|
|
||
| /** Relative paths allowed by the catch-all proxy. Anything outside this list — including | ||
| * attempts to escape via {@code ..} segments — is rejected with 400. */ | ||
| private static final List<String> ALLOWED_PATH_PREFIXES = List.of("event"); |
There was a problem hiding this comment.
really we support more than just event url path in the event manager
There was a problem hiding this comment.
Yes sorry, meant to expand the list, this is done.
List.of("event", "conversion", "session", "health");
Am I missing any @freddyDOTCMS ? I want to keep the allowlist otherwise the proxy means any authenticated backend user can tunnel through the auth boundary.
- ALLOWED_PATH_PREFIXES expands to event/conversion/session/health to match the actual dot-ca-event-manager surface area. The prior event-only list would have 400'd every dashboard call to the conversion, session, and health endpoints. Allowlist itself kept (rather than dropped) so the proxy doesn't auto-expose admin/* or tenants/* with the tenant-scoped bearer. - proxyGetRequest's site-resolution + load-bearing javadoc extracted into a private resolveSiteForUser(request, user) helper. The request handler is now just the try/catch + response-building. - Helper tests: positive cases for each new prefix; old rejectsNonEventPrefixes renamed to rejectsDisallowedPrefixes with the security intent made explicit (admin/, tenants/) plus boundary-substring checks (healthcheck, conversionfoo). Per Freddy's PR review comments.
| @@ -1,4 +1,5 @@ | |||
| set positional-arguments := true | |||
| import? 'justfile.local' | |||
There was a problem hiding this comment.
Is this change intended?
There was a problem hiding this comment.
Yes, I have a local justfile with things to ease my personal dev process and its an optional import so it won't impact others. Just committing it to main now so I don't have to re-add it on every branch. There are some things I plan to push into the main justfile but I am fleshing them out locally.
Closes #35775
Sub-issue of #35048 (Phase 3 — Security)
Summary
Replaces Basic auth on the dotCMS-side analytics proxy with per-tenant HMAC bearer tokens, adds a save-flow that exchanges admin credentials for a token (clearing the password from the app config after the exchange), tightens the proxy/health surface, and enforces per-site permissions on the GET read endpoint.
What changes
Auth model
DOT_ANALYTICS_PASSWORDentirely. The proxy now uses bearer tokens sourced from the per-site Content Analytics App secret, withDOT_ANALYTICS_BEARER_TOKENas a bootstrap fallback for non-site-bound contexts.ContentAnalyticsAppListener: when an admin saves the Content Analytics App withadminPasswordpopulated, the listener callsPOST {DOT_ANALYTICS_BASE_URL}/v1/admin/token(Basic auth:tenant:password), stores the returnedbearerTokenas a hidden secret on the app config, and clears theadminPasswordfield. The user-entered password is written to encrypted app-secret storage by the original save and then removed automatically once the token exchange completes; on a failed exchange (or a failed token-persist) the listener still attempts to clear the password as a best-effort cleanup.AppSecretSavedEvent; the listener short-circuits via the "admin password not set" guard.adminPassword,bearerToken) with operator hints explaining the one-time exchange.Proxy hardening (
EventAnalyticsProxyHelper)event/event/...are forwarded. Path segments exactly equal to..or., encoded?/#, backslashes, and non-eventprefixes are rejected with 400.project=wins overDOT_ANALYTICS_PROJECT; the configured value is no longer appended when the URL already has one.buildAuthHeaderreturnsOptional.empty()when no token is available; the proxy then omitsAuthorizationinstead of emitting a malformedBearer(no token).nullJSON guard returns 400 with a clear error envelope. Empty-body POSTs dropContent-Typeper RFC 9110 §8.3.Per-site permission gating (
EventAnalyticsProxyResource.proxyGetRequest)HostAPI.find(siteId, user, DONT_RESPECT_FRONT_END_ROLES)for the explicit?siteId=branch andHostWebAPI.getCurrentHost(request, user)for the session fallback, soPermissionAPI.READgates access.403 SITE_ACCESS_DENIED, even if they hand-craft?siteId=to bypass the (already-permissioned) site picker.SecurityLogger.logInfo(...)to mirror what the globalDotForbiddenExceptionMapperwould log — keeps denied-site accesses visible in the audit trail.Health monitor (
MonitorHelper.isContentAnalytics)DOT_ANALYTICS_PASSWORDrequirement.getAppSecrets./v1/healthtimeout via the same 2s ceiling so monitor probes fail fast.Tests
ContentAnalyticsAppListenerTest: failure paths (missing tenant / missing base URL clear the password and notify the user) plus a direct happy-path test ofpersistTokenAndClearCredentialsthat asserts the atomic invariant —adminPasswordabsent,bearerTokenreplaced with the new value, unrelated secrets (siteAuth) preserved verbatim. Method relaxed to package-private specifically for this test.EventAnalyticsProxyHelperTest: path allowlist matrix (traversal, smuggled?/#, non-eventprefixes),buildUpstreamUrl(trailing-slash strip, project dedup, query forwarding), andbuildAuthHeadercovering the four security-critical branches (per-site wins, env fallback with trim, null host → fallback, both absent → empty Optional — the RFC 6750 branch).Files changed
dotCMS/src/main/java/com/dotcms/analytics/listener/ContentAnalyticsAppListener.java(new)dotCMS/src/main/java/com/dotcms/rest/api/v1/analytics/event/EventAnalyticsProxyHelper.javadotCMS/src/main/java/com/dotcms/rest/api/v1/analytics/event/EventAnalyticsProxyResource.javadotCMS/src/main/java/com/dotcms/rest/api/v1/analytics/content/util/ContentAnalyticsUtil.javadotCMS/src/main/java/com/dotcms/rest/api/v1/system/monitor/MonitorHelper.javadotCMS/src/main/java/com/dotcms/system/event/local/business/LocalSystemEventSubscribersInitializer.javadotCMS/src/main/resources/apps/dotContentAnalytics-config.ymldotCMS/src/test/java/com/dotcms/analytics/listener/ContentAnalyticsAppListenerTest.java(new)dotCMS/src/test/java/com/dotcms/rest/api/v1/analytics/event/EventAnalyticsProxyHelperTest.java(new)justfile(import optionaljustfile.local)Operator notes
DOT_ANALYTICS_PASSWORDis no longer read. Remove it from secrets/Terraform/k8s configs after merge.DOT_ANALYTICS_TENANTis still required. Tenant must already exist on the upstream event manager; this PR exchanges credentials, it does not provision the tenant.DOT_ANALYTICS_BEARER_TOKEN(optional) is a bootstrap fallback for non-site-bound contexts; production deployments should use the per-site App save flow instead.Behavioral changes worth flagging to consumers
403 SITE_ACCESS_DENIEDfor backend users without READ on the resolved site. The dashboard already passes?siteId=and the global site picker is already permissioned, so this only affects callers that hand-craft asiteId./api/v1/analytics/content/eventnow return400with"Request body is required". Postman tests that asserted on status alone still pass; tests that asserted on message text may need updates.NOT_CONFIGUREDrequires either global env vars OR per-site App secrets, no longer both.CircuitBreakerUrldefault.Test plan
./mvnw test -pl :dotcms-core -Dtest=ContentAnalyticsAppListenerTest,EventAnalyticsProxyHelperTestopenapi.yaml:./mvnw compile -pl :dotcms-core -DskipTests— verify the GET catch-all gains 400/403 entries and the POST endpoint description matches realityadminPassword; confirmbearerTokenis populated andadminPasswordis cleared in a single round-trip/api/v1/analytics/event/total-events?siteId=<Site B id>and confirm 403SITE_ACCESS_DENIED/api/v1/analytics/event/...against an unconfigured deployment fails fast (timeout) rather than hanging