RANGER-5654: Solr audit dispatcher fails to index after Kerberos TGT relogin (No key to store) with default useTicketCache=true by ramackri · Pull Request #1030 · apache/ranger

ramackri · 2026-06-23T07:24:34Z

Fixes RANGER-5654: Solr audit dispatcher stops indexing audits into Kerberos-protected Solr after TGT refresh/relogin when useTicketCache=true (the shipped default).

What changes were proposed in this pull request?

Set xasecure.audit.jaas.Client.option.useTicketCache=false in:

| File | Purpose |
| --- | --- |
| `audit-server/audit-dispatcher/dispatcher-solr/src/main/resources/conf/ranger-audit-dispatcher-solr-site.xml` | Shipped default (audit-dispatcher tarball) |
| `dev-support/ranger-docker/scripts/audit-dispatcher/ranger-audit-dispatcher-solr-site.xml` | Docker Tier 3 E2E harness |

No Java changes. Config-only fix.

Problem

The Solr dispatcher consumes from Kafka but eventually stops writing to Solr when Kerberos is enabled. Logs show `Failure in sending audits into Solr` and `No key to store`. Kafka offsets advance; Solr document counts stay flat until the dispatcher is restarted.

Root cause (corrected causal chain):

1. `SolrAuditDestination` uses `KerberosJAASConfigUser` + `KerberosAction`.
2. Every Solr write → `KerberosAction.execute()` → `checkTGTAndRelogin()`.
3. When the TGT passes ~80% of its lifetime (`TICKET_RENEW_WINDOW = 0.80` in `AbstractKerberosUser`), that method does `logout(); login()`. This is by design in `KerberosAction`, not caused by `useKeyTab=true`.
4. Shipped config has `useKeyTab=true` (correct for a keytab daemon) and `useTicketCache=true` (incorrect here).
5. After `logout()`, `login()` with `useTicketCache=true` makes `Krb5LoginModule` use the ticket cache; with no valid cache entry → "No key to store" → dispatcher stuck until restart.
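The timing in step 3 can be illustrated with a small sketch. This is not the `AbstractKerberosUser` code; it only assumes that the refresh point is the TGT start time plus 80% of its lifetime, which is how the `TICKET_RENEW_WINDOW = 0.80` constant reads. The class name, method names, and the 10-hour lifetime are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;

/** Illustrative only: not the Ranger AbstractKerberosUser implementation. */
final class TgtRenewWindowSketch {
    // Value quoted in this PR description for AbstractKerberosUser.
    static final double TICKET_RENEW_WINDOW = 0.80;

    /** Instant at which a TGT valid from start to end crosses the renew window. */
    static Instant refreshTime(Instant start, Instant end) {
        long lifetimeMillis = Duration.between(start, end).toMillis();
        return start.plusMillis((long) (lifetimeMillis * TICKET_RENEW_WINDOW));
    }

    /** True once "now" has reached the refresh time, i.e. a relogin would be attempted. */
    static boolean shouldRelogin(Instant start, Instant end, Instant now) {
        return !now.isBefore(refreshTime(start, end));
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2026-06-23T00:00:00Z");
        Instant end = start.plus(Duration.ofHours(10)); // hypothetical 10-hour TGT
        System.out.println("refresh at: " + refreshTime(start, end));
        System.out.println("relogin at +7h? " + shouldRelogin(start, end, start.plus(Duration.ofHours(7))));
        System.out.println("relogin at +9h? " + shouldRelogin(start, end, start.plus(Duration.ofHours(9))));
    }
}
```

Under these assumptions, a dispatcher with a 10-hour TGT would index normally for roughly eight hours and hit the failing relogin on the first Solr write after that, which would match the "works for a while, then stops" symptom.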

| JAAS option or code path | Role | Causes 80% `logout(); login()`? | Causes "No key to store"? |
| --- | --- | --- | --- |
| `KerberosAction` / `checkTGTAndRelogin()` | Proactive TGT refresh before each Solr op | Yes | No |
| `useKeyTab=true` | Login from keytab (correct for daemon) | No | No (by itself) |
| `useTicketCache=true` | Use OS ticket cache on relogin | No | Yes (with step 3) |

Fix: useTicketCache=false so step 3 still happens at ~80% TGT, but step 5 succeeds by reading the keytab again (same pattern as ingestor Kafka JAAS and Kafka plugin).

Why `useTicketCache=false`?

useTicketCache=true is appropriate when a process relies on an existing user ticket cache (kinit / KRB5CCNAME). For long-running services that authenticate from a keytab, Ranger and Hadoop convention is useTicketCache=false so relogin always uses the keytab — the same pattern already used elsewhere in the audit stack.
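As a rough illustration of the keytab-only pattern described above, the following sketch builds a programmatic JAAS configuration with `useKeyTab=true` and `useTicketCache=false` using the standard `javax.security.auth.login` classes. It is not how Ranger builds its JAAS configuration from `xasecure.audit.jaas.Client.option.*` properties; the class name is hypothetical, and the principal and keytab path in the usage comment are placeholders.

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.AppConfigurationEntry.LoginModuleControlFlag;
import javax.security.auth.login.Configuration;

/** Illustrative only: a keytab-only JAAS configuration for a long-running client. */
final class KeytabJaasConfigSketch extends Configuration {
    private final AppConfigurationEntry entry;

    KeytabJaasConfigSketch(String principal, String keytabPath) {
        Map<String, String> options = new HashMap<>();
        options.put("useKeyTab", "true");       // authenticate from the keytab
        options.put("keyTab", keytabPath);
        options.put("principal", principal);
        options.put("useTicketCache", "false"); // never fall back to the OS ticket cache
        entry = new AppConfigurationEntry(
                "com.sun.security.auth.module.Krb5LoginModule",
                LoginModuleControlFlag.REQUIRED,
                options);
    }

    /** Returns the same single entry for any application name (sketch simplification). */
    @Override
    public AppConfigurationEntry[] getAppConfigurationEntry(String name) {
        return new AppConfigurationEntry[] {entry};
    }

    public static void main(String[] args) {
        // Placeholder principal and keytab path; no login is attempted here.
        Configuration conf = new KeytabJaasConfigSketch(
                "service/host.example.com@EXAMPLE.COM", "/path/to/service.keytab");
        System.out.println(conf.getAppConfigurationEntry("Client")[0].getOptions());
    }
}
```

With a configuration like this, every `login()` after a `logout()` reads the keytab again, so the result should not depend on whatever happens to be in the process's ticket cache.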

| Component | Mechanism | `useTicketCache` | Proactive relogin |
| --- | --- | --- | --- |
| Ingestor Kafka producer | JAAS string (`AuditServerConstants`) | false | Kafka client handles refresh |
| Kafka plugin | JAAS string | false | connection-time |
| HDFS dispatcher | UGI keytab | N/A | `checkTGTAndReloginFromKeytab()` |
| Plugin → ingestor audits | UGI keytab | N/A | `checkTGTAndReloginFromKeytab()` |
| Admin Solr (postgres docker) | JAAS via site XML | false | on-demand queries |
| SPNEGO acceptor (ingestor HTTP) | UGI + Hadoop SPNEGO filter (`isInitiator=false` in unused acceptor JAAS helper) | true (acceptor role; not this code path) | per-request token validation, not `KerberosAction` relogin |
| Solr dispatcher (before fix) | JAAS client + `KerberosAction` | true (bug) | every Solr write (`logout(); login()` at ~80% TGT) |

The Solr dispatcher was the outlier: the only long-running audit daemon using outbound JAAS client login with proactive logout()/login() while useTicketCache=true. HDFS dispatcher and plugin→ingestor paths avoid this by using UGI checkTGTAndReloginFromKeytab() instead of JAAS logout()/login().

How was this patch tested?

Manual testing (Docker Tier 3 audit stack with Kerberos)

Environment: dev-support/ranger-docker Tier 3 compose — Admin, KDC, Kafka, ingestor, Solr dispatcher, HDFS/Ozone/Hive plugins.

Reproduce (master behavior):

1. Start the Tier 3 stack; trigger audits (e.g. HDFS `hdfs dfs -ls /`, Ozone volume create).
2. Confirm audits reach Kafka; after the TGT refresh window, observe in the Solr dispatcher logs: `No key to store`, `Failure in sending audits into Solr`.
3. Solr `reqUser:*` count stays flat; Kafka offset continues to grow.

Verify (with useTicketCache=false + Solr dispatcher restart):

1. Redeploy the Solr dispatcher with the updated site XML.
2. Confirm a clean JAAS login (`Successful login for rangerauditserver/...`).
3. Trigger additional audits; the Solr document count increases and the Admin accessAudit totalCount increases.
4. Full pipeline PASS: plugin → ingestor :7081 → Kafka `ranger_audits` → Solr dispatcher → Solr `ranger_audits` → Admin Access Audit UI.

Upgrade note

Existing deployments that already have useTicketCache=true in their live ranger-audit-dispatcher-solr-site.xml must set it to false and restart the Solr dispatcher (or redeploy from the updated tarball).
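For operators who want to check a live file before restarting, here is a minimal sketch that reads a Hadoop-style site XML and reports the value of the property this PR changes. It uses only JDK XML classes; the class and method names are hypothetical, and the inline XML stands in for the contents of a real `ranger-audit-dispatcher-solr-site.xml`.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Illustrative only: looks up one property in a Hadoop-style site XML string. */
final class SiteXmlCheckSketch {
    static final String KEY = "xasecure.audit.jaas.Client.option.useTicketCache";

    /** Returns the trimmed value of the named property, or null if it is absent. */
    static String propertyValue(String siteXml, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(siteXml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element prop = (Element) props.item(i);
                NodeList names = prop.getElementsByTagName("name");
                NodeList values = prop.getElementsByTagName("value");
                if (names.getLength() > 0 && values.getLength() > 0
                        && name.equals(names.item(0).getTextContent().trim())) {
                    return values.item(0).getTextContent().trim();
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse site XML", e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the contents of a live site XML with the old default.
        String xml = "<configuration><property><name>" + KEY
                + "</name><value>true</value></property></configuration>";
        String value = propertyValue(xml, KEY);
        System.out.println("false".equals(value)
                ? "OK: useTicketCache=false"
                : "Needs update: useTicketCache=" + value);
    }
}
```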


mneethiraj · 2026-06-23T11:25:19Z

    <property>
        <name>xasecure.audit.jaas.Client.option.useTicketCache</name>
-        <value>true</value>
+        <value>false</value>


Is there an alternative to disabling use of the ticket cache, such as having a thread refresh the Kerberos ticket before it expires? Is the ticket cache disabled in all Ranger modules, like Ranger admin (to fetch audit logs from Solr) and plugins (to download policies/tags/roles/...)?

ramk and others added 2 commits June 23, 2026 12:53

RANGER-5654: Solr audit dispatcher fails to index after Kerberos TGT relogin (No key to store) with default useTicketCache=true

ba62b45

Co-authored-by: Cursor <cursoragent@cursor.com>

RANGER-5654: Restore site XML descriptions and document relogin recovery

67e08e7

Co-authored-by: Cursor <cursoragent@cursor.com>

ramackri requested a review from rameeshm June 23, 2026 07:59

RANGER-5654: Drop AbstractKerberosUser change; config-only fix

ee096db

Co-authored-by: Cursor <cursoragent@cursor.com>

ramackri requested a review from mneethiraj June 23, 2026 09:14

mneethiraj reviewed Jun 23, 2026

ramackri wants to merge 3 commits into apache:master from ramackri:RANGER-5654-patch
