
RANGER-5654: Solr audit dispatcher fails to index after Kerberos TGT relogin (No key to store) with default useTicketCache=true #1030

Open
ramackri wants to merge 3 commits into apache:master from ramackri:RANGER-5654-patch


ramackri commented Jun 23, 2026


Fixes RANGER-5654: Solr audit dispatcher stops indexing audits into Kerberos-protected Solr after TGT refresh/relogin when useTicketCache=true (the shipped default).

What changes were proposed in this pull request?

Set xasecure.audit.jaas.Client.option.useTicketCache=false in:

| File | Purpose |
|---|---|
| audit-server/audit-dispatcher/dispatcher-solr/src/main/resources/conf/ranger-audit-dispatcher-solr-site.xml | Shipped default (audit-dispatcher tarball) |
| dev-support/ranger-docker/scripts/audit-dispatcher/ranger-audit-dispatcher-solr-site.xml | Docker Tier 3 E2E harness |

No Java changes. Config-only fix.
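A site-XML edit is enough because the xasecure.audit.jaas.Client.option.* properties become options of the JAAS `Client` login entry. The sketch below models that mapping under the assumption that every property named `xasecure.audit.jaas.<context>.option.<key>` turns into JAAS option `<key>`; `JaasOptionMapper` and `optionsFor` are hypothetical names, not Ranger's actual configuration loader.

```java
import java.util.Map;
import java.util.TreeMap;

public class JaasOptionMapper {
    // Property prefix used by the audit site XML (as seen in this PR).
    static final String PREFIX = "xasecure.audit.jaas.";

    // Hypothetical helper: turns "xasecure.audit.jaas.<context>.option.<key>=<value>"
    // into a JAAS option map {<key>=<value>} for one login context.
    public static Map<String, String> optionsFor(String context, Map<String, String> props) {
        String optionPrefix = PREFIX + context + ".option.";
        Map<String, String> options = new TreeMap<>(); // sorted for stable output
        for (Map.Entry<String, String> e : props.entrySet()) {
            if (e.getKey().startsWith(optionPrefix)) {
                options.put(e.getKey().substring(optionPrefix.length()), e.getValue().trim());
            }
        }
        return options;
    }

    public static void main(String[] args) {
        Map<String, String> props = new TreeMap<>();
        props.put(PREFIX + "Client.option.useKeyTab", "true");
        props.put(PREFIX + "Client.option.useTicketCache", "false");
        props.put("xasecure.audit.destination.solr", "true"); // not a JAAS option; ignored
        System.out.println(optionsFor("Client", props)); // {useKeyTab=true, useTicketCache=false}
    }
}
```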

Problem

The Solr dispatcher consumes from Kafka but eventually stops writing to Solr when Kerberos is enabled. Logs show "Failure in sending audits into Solr" and "No key to store". Kafka offsets keep advancing, but Solr document counts stay flat until the dispatcher is restarted.

Root cause (corrected causal chain):

  1. SolrAuditDestination uses KerberosJAASConfigUser + KerberosAction.
  2. Every Solr write → KerberosAction.execute() → checkTGTAndRelogin().
  3. When the TGT passes ~80% of its lifetime (TICKET_RENEW_WINDOW = 0.80 in AbstractKerberosUser), that method does logout(); login() — this is by design in KerberosAction, not caused by useKeyTab=true.
  4. Shipped config has useKeyTab=true (correct for a keytab daemon) and useTicketCache=true (incorrect here).
  5. After logout(), login() with useTicketCache=true makes Krb5LoginModule use the ticket cache; with no valid cache entry → "No key to store" → dispatcher stuck until restart.
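The refresh decision in step 3 can be modeled as a simple time check. This is a hypothetical sketch, not the AbstractKerberosUser code; it assumes the 0.80 window is measured from the ticket's start time to its end time. The check only decides *when* logout(); login() runs; whether that login() succeeds is what useTicketCache controls.

```java
import java.time.Duration;
import java.time.Instant;

public class TgtRenewWindow {
    // Fraction of the ticket lifetime after which a proactive relogin is triggered
    // (0.80, matching the TICKET_RENEW_WINDOW value cited in this PR).
    static final double TICKET_RENEW_WINDOW = 0.80;

    // True once "now" is at or past start + 80% of (end - start).
    public static boolean needsRelogin(Instant start, Instant end, Instant now) {
        long lifetimeMillis = Duration.between(start, end).toMillis();
        long renewAfterMillis = (long) (lifetimeMillis * TICKET_RENEW_WINDOW);
        Instant renewAt = start.plusMillis(renewAfterMillis);
        return !now.isBefore(renewAt);
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2026-06-23T00:00:00Z");
        Instant end = start.plus(Duration.ofHours(10)); // 10 h TGT lifetime
        System.out.println(needsRelogin(start, end, start.plus(Duration.ofHours(7)))); // false
        System.out.println(needsRelogin(start, end, start.plus(Duration.ofHours(8)))); // true
    }
}
```

With a 10-hour TGT, the relogin would therefore start after about 8 hours, which is consistent with the failure appearing only some time after the dispatcher starts.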
| JAAS option / mechanism | Role | Causes the ~80% logout(); login()? | Causes "No key to store"? |
|---|---|---|---|
| KerberosAction / checkTGTAndRelogin() | Proactive TGT refresh before each Solr op | Yes | No |
| useKeyTab=true | Login from keytab (correct for daemon) | No | No (by itself) |
| useTicketCache=true | Use OS ticket cache on relogin | No | Yes (with step 3) |

Fix: set useTicketCache=false so that step 3 still happens at ~80% of the TGT lifetime, but step 5 succeeds by reading the keytab again (the same pattern as the ingestor Kafka JAAS and the Kafka plugin).

Why useTicketCache=false?

useTicketCache=true is appropriate when a process relies on an existing user ticket cache (kinit / KRB5CCNAME). For long-running services that authenticate from a keytab, Ranger and Hadoop convention is useTicketCache=false so relogin always uses the keytab — the same pattern already used elsewhere in the audit stack.
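In plain JAAS terms, the keytab-daemon setup corresponds to a Krb5LoginModule entry like the one below. This is a sketch, not Ranger's code: the class name KeytabOnlyJaasConfig is hypothetical, the login context name "Client" is taken from the property prefix, and keyTab, principal and storeKey are standard Krb5LoginModule options included as assumptions about what a complete entry would carry.

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.AppConfigurationEntry.LoginModuleControlFlag;
import javax.security.auth.login.Configuration;

// Hypothetical in-memory JAAS configuration for a keytab-based daemon.
public class KeytabOnlyJaasConfig extends Configuration {
    private final String principal;
    private final String keytabPath;

    public KeytabOnlyJaasConfig(String principal, String keytabPath) {
        this.principal = principal;
        this.keytabPath = keytabPath;
    }

    @Override
    public AppConfigurationEntry[] getAppConfigurationEntry(String name) {
        if (!"Client".equals(name)) {
            return null; // no entry for other login contexts
        }
        Map<String, String> options = new HashMap<>();
        options.put("useKeyTab", "true");      // read credentials from the keytab
        options.put("keyTab", keytabPath);
        options.put("principal", principal);
        options.put("storeKey", "true");       // assumption: keep keys in the Subject
        options.put("useTicketCache", "false"); // never consult the OS ticket cache
        return new AppConfigurationEntry[] {
            new AppConfigurationEntry(
                "com.sun.security.auth.module.Krb5LoginModule",
                LoginModuleControlFlag.REQUIRED,
                options)
        };
    }
}
```

Such a configuration could be installed with Configuration.setConfiguration(...) before constructing a LoginContext for "Client". Because useTicketCache is false, every login(), including the one after the proactive logout(), goes back to the keytab.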

| Component | Mechanism | useTicketCache | Proactive relogin |
|---|---|---|---|
| Ingestor Kafka producer | JAAS string (AuditServerConstants) | false | Kafka client handles refresh |
| Kafka plugin | JAAS string | false | connection-time |
| HDFS dispatcher | UGI keytab | N/A | checkTGTAndReloginFromKeytab() |
| Plugin → ingestor audits | UGI keytab | N/A | checkTGTAndReloginFromKeytab() |
| Admin Solr (postgres docker) | JAAS via site XML | false | on-demand queries |
| SPNEGO acceptor (ingestor HTTP) | UGI + Hadoop SPNEGO filter (isInitiator=false in unused acceptor JAAS helper) | true (acceptor role; not this code path) | per-request token validation, not KerberosAction relogin |
| Solr dispatcher (before fix) | JAAS client + KerberosAction | true (bug) | every Solr write (logout(); login() at ~80% TGT) |

The Solr dispatcher was the outlier: the only long-running audit daemon using outbound JAAS client login with proactive logout()/login() while useTicketCache=true. HDFS dispatcher and plugin→ingestor paths avoid this by using UGI checkTGTAndReloginFromKeytab() instead of JAAS logout()/login().

How was this patch tested?

Manual testing (Docker Tier 3 audit stack with Kerberos)

Environment: dev-support/ranger-docker Tier 3 compose — Admin, KDC, Kafka, ingestor, Solr dispatcher, HDFS/Ozone/Hive plugins.

Reproduce (master behavior):

  1. Start the Tier 3 stack and trigger audits (e.g. HDFS hdfs dfs -ls /, Ozone volume create).
  2. Confirm that audits reach Kafka; after the TGT refresh window, observe "No key to store" and "Failure in sending audits into Solr" in the Solr dispatcher logs.
  3. The Solr reqUser:* count stays flat while the Kafka offset continues to grow.
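The symptom in step 3 (Kafka offset growing while the Solr count stays flat) can be turned into a simple stall check. This is a hypothetical sketch: DispatcherStallCheck and Sample are illustrative names, and how the offset and document count are collected (for example from a Solr reqUser:* count query) is left out.

```java
import java.util.List;

public class DispatcherStallCheck {
    // Hypothetical observation: Kafka offset and Solr document count at one point in time.
    public static final class Sample {
        final long kafkaOffset;
        final long solrDocCount;

        public Sample(long kafkaOffset, long solrDocCount) {
            this.kafkaOffset = kafkaOffset;
            this.solrDocCount = solrDocCount;
        }
    }

    // True when the Kafka offset grew between the first and last sample
    // while the Solr document count did not change.
    public static boolean looksStalled(List<Sample> samples) {
        if (samples.size() < 2) {
            return false; // need at least two observations to compare
        }
        Sample first = samples.get(0);
        Sample last = samples.get(samples.size() - 1);
        boolean kafkaAdvanced = last.kafkaOffset > first.kafkaOffset;
        boolean solrFlat = last.solrDocCount == first.solrDocCount;
        return kafkaAdvanced && solrFlat;
    }

    public static void main(String[] args) {
        List<Sample> stuck = List.of(new Sample(1000, 42), new Sample(1500, 42));
        List<Sample> healthy = List.of(new Sample(1000, 42), new Sample(1500, 542));
        System.out.println(looksStalled(stuck));   // true
        System.out.println(looksStalled(healthy)); // false
    }
}
```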

Verify (with useTicketCache=false + Solr dispatcher restart):

  1. Redeploy Solr dispatcher with updated site XML.
  2. Confirm clean JAAS login (Successful login for rangerauditserver/...).
  3. Trigger additional audits; Solr document count increases; Admin accessAudit totalCount increases.
  4. Full pipeline PASS: plugin → ingestor :7081 → Kafka ranger_audits → Solr dispatcher → Solr ranger_audits → Admin Access Audit UI.

Upgrade note

Existing deployments that already have useTicketCache=true in their live ranger-audit-dispatcher-solr-site.xml must set it to false and restart the Solr dispatcher (or redeploy from the updated tarball).
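To find deployments that still carry the old value, a small check over the live site XML may help. This is a hypothetical sketch (TicketCacheCheck is an illustrative name); it assumes the Hadoop-style `<configuration><property><name>/<value>` layout used by the Ranger site files.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class TicketCacheCheck {
    static final String KEY = "xasecure.audit.jaas.Client.option.useTicketCache";

    // Returns the trimmed <value> of KEY, or null if the property is absent.
    public static String ticketCacheValue(String siteXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(siteXml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element prop = (Element) props.item(i);
                if (KEY.equals(childText(prop, "name"))) {
                    return childText(prop, "value");
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse site XML", e);
        }
    }

    // Text of the first descendant element with the given tag, or null if there is none.
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String xml = "<configuration><property><name>" + KEY + "</name>"
                + "<value>true</value></property></configuration>";
        if ("true".equalsIgnoreCase(ticketCacheValue(xml))) {
            System.out.println("Set " + KEY + " to false and restart the Solr dispatcher");
        }
    }
}
```

In practice the contents of the live ranger-audit-dispatcher-solr-site.xml would be read from disk (for example with Files.readString) and passed in.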

ramk and others added 2 commits June 23, 2026 12:53
…elogin (No key to store) with default useTicketCache=true

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
ramackri requested a review from rameeshm June 23, 2026 07:59
Co-authored-by: Cursor <cursoragent@cursor.com>
ramackri requested a review from mneethiraj June 23, 2026 09:14
 <property>
   <name>xasecure.audit.jaas.Client.option.useTicketCache</name>
-  <value>true</value>
+  <value>false</value>


Is there an alternative to disabling use of the ticket cache, such as a thread that refreshes the Kerberos ticket before it expires? Is the ticket cache disabled in all Ranger modules, like Ranger Admin (to fetch audit logs from Solr) and plugins (to download policies/tags/roles/...)?
