NIFI-16028 - Support nested field paths in PutElasticsearchJson by agturley · Pull Request #11355 · apache/nifi

agturley · 2026-06-18T06:02:04Z

Summary

Allow the Index Field, Identifier Field, and Timestamp Field properties to reference a nested
field with a "/"-delimited path (e.g. "@metadata/index"). A value with no "/" behaves exactly as
before (top-level lookup), so existing flows are unaffected.

When the corresponding Retain property is false, the extracted leaf is removed and any ancestor
object left empty is pruned, so a transient envelope like "@metadata" is not indexed once its
fields are consumed; parents that still have other fields are kept.

Applies across NDJSON, JSON Array, and Single JSON, and all index operations. The NDJSON
raw-bytes fast path is bypassed only when a nested path is configured, since the streaming scan
only matches flat field names.

Builds on NIFI-15985.

Tracking

Please complete the following tracking steps prior to pull request creation.

Issue Tracking

Apache NiFi Jira issue created

Pull Request Tracking

Pull Request title starts with Apache NiFi Jira issue number, such as NIFI-00000
Pull Request commit message starts with Apache NiFi Jira issue number, as such NIFI-00000
Pull request contains commits signed with a registered key indicating Verified status

Pull Request Formatting

Pull Request based on current revision of the main branch
Pull Request refers to a feature branch with one commit containing changes

Verification

Please indicate the verification steps performed prior to pull request creation.

Build

Build completed using ./mvnw clean install -P contrib-check
- JDK 21
- JDK 25

Licensing

New dependencies are compatible with the Apache License 2.0 according to the License Policy
New dependencies are documented in applicable LICENSE and NOTICE files

Documentation

Documentation formatting appears as expected in rendered files

pvillard31 · 2026-06-22T09:23:37Z

            .name("Identifier Field")
            .description("""
                    The name of the field within each document to use as the Elasticsearch document ID. \
+                    A nested field can be referenced with a "/"-delimited path (e.g. "@metadata/id"). \


Could you rebase this branch on the latest main? The validation gate is failing on "branch not up to date", and because of that the full build and test matrix was skipped, so we do not have a green CI signal yet.

pvillard31 · 2026-06-22T09:23:37Z

+    /**
+     * Whether {@code field} is a {@code /}-delimited nested path rather than a flat field name.
+     */
+    private static boolean isNestedPath(final String field) {


A field key that literally contains a slash is now always read as a nested path. Should we note this limitation in the property descriptions so users with such field names are not surprised?

I've altered it to allow users to escape a backslash to allow a backslash in the path

pvillard31 · 2026-06-22T09:23:37Z

+    }
+
+    @Test
+    void testNestedIndexFieldKeepsParentWithRemainingSiblings() {


Could we add a test with a 3 level path (for example a/b/c) to lock in the recursive pruning of multiple empty ancestor objects?

pvillard31 · 2026-06-22T09:23:37Z

+    }
+
+    @Test
+    void testNestedIdentifierFieldNdjsonPrunesEmptyParent() {


Should we add a nested identifier test for an Update, Delete, or Upsert operation, so the resolveId nested path is covered as well?

Added testNestedIdentifierFieldUpdateOperationResolvesAndPrunes — it covers the resolveId nested path via an Update op (@metadata/id), verifying the nested ID resolves and the empty parent is pruned. Also added an escaped-slash test on the same path.

@metadata

Allow the Index Field, Identifier Field, and Timestamp Field properties to reference a nested field with a "/"-delimited path (e.g. "@metadata/index"). A value with no "/" behaves exactly as before (top-level lookup), so existing flows are unaffected. When the corresponding Retain property is false, the extracted leaf is removed and any ancestor object left empty is pruned recursively, so a transient envelope like "@metadata" is not indexed once its fields are consumed; parents that still have other fields are kept. A field name that literally contains a "/" can be addressed by escaping the slash as "\/" (and a literal backslash as "\\"); a backslash not followed by "/" or "\" is left untouched, so existing field names are unaffected. The escaping rule and worked nesting/escaping examples are documented in the Index, Identifier, and Timestamp Field property descriptions. Applies across NDJSON, JSON Array, and Single JSON, and all index operations. The NDJSON raw-bytes fast path is bypassed only when a nested path is configured, since the streaming scan only matches flat field names; the path classification and decoded flat names are resolved once per onTrigger rather than per record. Builds on NIFI-15985.

agturley · 2026-06-22T17:47:46Z

Do you think the description updates I provided are overkill?

pvillard31 · 2026-06-22T20:52:24Z

            .name("Identifier Field")
            .description("""
                    The name of the field within each document to use as the Elasticsearch document ID. \
+                    A nested field can be referenced with a "/"-delimited path: for a document \


These three properties shipped in released 2.10.0 with literal field name semantics, so an existing flow that has a slash in the configured value will now be read as a nested path after an upgrade. I am not an experienced Elasticsearch user and I do not know how common a slash is in these field names, and I agree the likelihood of a problematic value is low, but I think we should still handle this case safely with a property migration that escapes any existing slash or backslash, and add test coverage for that migration. Could we add that here?

So I started on this and ran into something I want to flag before going further.

The problem is migrateProperties runs on every flow import and restart, and there's no way to tell a 2.10 literal field name like @metadata/id apart from a 2.11 nested path @metadata/id — they're the exact same string. So if I just escape slashes, it also escapes legit nested paths and breaks them. I actually hit this on my test cluster: imported a normal 2.11 flow and every nested-path case quietly stopped working because the migration had rewritten @metadata/id to @metadata\/id on import.

The only way I found to make it safe is to rename the three properties (Identifier Field → Identifier Field Path, etc). Then the old name only exists in pre-2.11 flows, so I can escape values found under the old name and leave the new ones alone. That works, but it means renaming three properties everyone's already using.

This is feeling like a lot for the case we're protecting against — a 2.10 flow that happens to have a literal / in the field name, and slashes aren't really common in ES field names to begin with. So I'm inclined to skip the migration and just document it: on upgrade, a field name with a literal / gets read as a nested path now, and you can escape it as / if you want the old behavior.

Let me know if you'd rather I do the rename though, happy to if you think the upgrade guarantee is worth it.

@ChrisSamo632 - thoughts on this? You're way more familiar with ES so your feedback would be appreciated if you have the chance

I'm not aware of any restrictions around the use of / characters in field names in Elasticsearch or json in general, although I'd imagine the occurrence is relatively infrequent

Trying to think of examples, something like house_name/number could be the sort of thing someone might use, but I'd typically opt for an alternative in such a case as I'd see the / as unusual (and potentially problematic in some systems/parsers)

For this instance, would a workable option be to add a processor parameter that let's the user configure how a / should be treated in field names for this processor? The default could be to have the character treated as a literal, with a new mode that enables the nested object parsing?

Alternatively, would . be a better option as it matches the Elasticsearch nested query parsing, but then is further from the nifi record path parsing, which / emulates? The same concern would exist, although use of . in elasticsearch field names has been discouraged for a long time as it can confuse the Elasticsearch query parser - therefore, the chances of people using . litterals on field names seems lower

pvillard31 · 2026-06-22T20:52:24Z

            .name("Index Field")
            .description("""
                    The name of the field within each document to use as the Elasticsearch index name. \
+                    A nested field can be referenced with a "/"-delimited path: for a document \


The escape explanation is repeated in all three property descriptions. Could we move the full explanation into an additionalDetails.md for the processor and have each property description point to the processor documentation for the details instead?

agreed, done

pvillard31 · 2026-06-22T20:52:24Z

+     * metadata, e.g. {@code @metadata}, should not be indexed once its fields are extracted).
+     * Does nothing when the path is absent. A path with no slash is a direct top-level removal.
+     */
+    private static void removeAtPath(final Map<String, Object> root, final String path) {


The Map and ObjectNode versions of removeAtPath duplicate the same descend and prune loop. Could we factor the shared walk into one helper so the two cannot drift over time?

…dditionalDetails, add pass-through tests

pvillard31 requested changes Jun 22, 2026

View reviewed changes

agturley force-pushed the NIFI-16028 branch from 8e2475e to c088c42 Compare June 22, 2026 17:10

agturley requested a review from pvillard31 June 22, 2026 17:14

pvillard31 reviewed Jun 22, 2026

View reviewed changes

NIFI-16028 - Address review: factor removeAtPath, move path docs to a…

4872c2b

…dditionalDetails, add pass-through tests

agturley requested a review from pvillard31 June 23, 2026 00:14

Uh oh!

Conversation

agturley commented Jun 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Tracking

Issue Tracking

Pull Request Tracking

Pull Request Formatting

Verification

Build

Licensing

Documentation

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

agturley commented Jun 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

agturley Jun 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

agturley commented Jun 18, 2026 •

edited

Loading

agturley commented Jun 22, 2026 •

edited

Loading

agturley Jun 23, 2026 •

edited

Loading