
fix(enrichment tables): preserve memory enrichment table state on reload#25547

Open
esensar wants to merge 8 commits into vectordotdev:master from esensar:fix/memory-table-reload-state

Conversation

@esensar
Contributor

@esensar esensar commented Jun 1, 2026

Summary

While working on #25143, it was brought to my attention (#25143 (comment)) that reload was not handled for memory tables: on reload, new components were generated that were not attached to the tables being queried. This PR resolves that by making the new tables take over the state of the previous components.

Vector configuration

enrichment_tables:
  memory_table:
    type: memory
    ttl: 60
    flush_interval: 5
    inputs: ["cache_generator"]


sources:
  demo_logs_test:
    type: "demo_logs"
    format: "json"

transforms:
  demo_logs_processor:
    type: "remap"
    inputs: ["demo_logs_test"]
    source: |
      . = parse_json!(.message)
      user_id = get!(., path: ["user-identifier"])

      existing, err = get_enrichment_table_record("memory_table", { "key": user_id })

      if err == null {
        . = existing.value
        .source = "cache"
      } else {
        .referer = parse_url!(.referer)
        .referer.host = encode_punycode!(.referer.host)
        .source = "transform"
      }

  cache_generator:
    type: "remap"
    inputs: ["demo_logs_processor"]
    source: |
      existing, err = get_enrichment_table_record("memory_table", { "key": get!(., path: ["user-identifier"]) })
      if err != null {
        data = .
        . = set!(value: {}, path: [get!(data, path: ["user-identifier"])], data: data)
      } else {
        . = {}
      }

sinks:
  console:
    inputs: ["demo_logs_processor"]
    target: "stdout"
    type: "console"
    encoding:
      codec: "json"

How did you test this PR?

Ran Vector with the above configuration and the --watch-config flag. Changed the TTL a couple of times; Vector reloaded properly each time and kept its state, as shown by cached output data appearing instead of newly generated data.

Change Type

  • Bug fix
  • New feature
  • Dependencies
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

References

Notes

  • Please read our Vector contributor resources.
  • Do not hesitate to use @vectordotdev/vector to reach out to us regarding this PR.
  • Some CI checks run only after we manually approve them.
    • We recommend adding a pre-push hook, please see this template.
    • Alternatively, we recommend running the following locally before pushing to the remote branch:
      • make fmt
      • make check-clippy (if there are failures it's possible some of them can be fixed with make clippy-fix)
      • make test
  • After a review is requested, please avoid force pushes to help us review incrementally.
    • Feel free to push as many commits as you want. They will be squashed into one before merging.
    • For example, you can run git merge origin master and git push.
  • If this PR introduces changes to Vector dependencies (modifies Cargo.lock), please
    run make build-licenses to regenerate the license inventory and commit the changes (if any). More details on the dd-rust-license-tool.

Sponsored by Quad9

@esensar esensar requested a review from a team as a code owner June 1, 2026 12:51
@github-actions github-actions Bot added the domain: topology Anything related to Vector's topology code label Jun 1, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6583fd70e4

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/topology/running.rs Outdated
Comment thread src/topology/builder.rs Outdated
@pront
Member

pront commented Jun 1, 2026

Thanks @esensar! Per our new policy I will come back to this once codex comments are resolved.

@esensar esensar changed the title fix(enrichment): preserve memory enrichment table state on reload fix(enrichment tables): preserve memory enrichment table state on reload Jun 2, 2026
@esensar
Contributor Author

esensar commented Jun 2, 2026

While resolving this I found that there were quite a few typos and missed cases for reload in enrichment tables that have source/sink. I think I covered them all now.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0044cc7f03


Comment thread src/topology/builder.rs Outdated
Member

@pront pront left a comment


Thank you for this contribution. Can you please add a couple of test cases, e.g. take_state_preserves_data and reload_preserves_state?
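For illustration only, the two requested tests might take roughly the following shape. This is a minimal sketch: `MemoryTable`, `take_state`, `with_state`, and `reload` here are hypothetical stand-ins and do not reflect the actual types or signatures in this PR.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the memory table; not Vector's real type.
#[derive(Default)]
struct MemoryTable {
    entries: HashMap<String, String>,
}

impl MemoryTable {
    /// Moves the entries out, leaving this instance empty.
    fn take_state(&mut self) -> HashMap<String, String> {
        std::mem::take(&mut self.entries)
    }

    /// Builds a replacement table that adopts the given entries.
    fn with_state(entries: HashMap<String, String>) -> Self {
        Self { entries }
    }
}

/// Simulates a reload: the new table takes over the old table's state.
fn reload(old: &mut MemoryTable) -> MemoryTable {
    MemoryTable::with_state(old.take_state())
}

#[test]
fn take_state_preserves_data() {
    let mut table = MemoryTable::default();
    table.entries.insert("user-1".to_string(), "cached".to_string());

    let state = table.take_state();

    // The extracted state carries the data, and the old table is drained.
    assert_eq!(state.get("user-1").map(String::as_str), Some("cached"));
    assert!(table.entries.is_empty());
}

#[test]
fn reload_preserves_state() {
    let mut old = MemoryTable::default();
    old.entries.insert("user-1".to_string(), "cached".to_string());

    let new = reload(&mut old);

    // The replacement table can answer the same lookup as the old one.
    assert_eq!(new.entries.get("user-1").map(String::as_str), Some("cached"));
}
```

A test against the real topology would presumably drive an actual config reload rather than a helper like `reload`, which only models the hand-over.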

#[snafu(display("Table {table} not loaded"))]
TableNotLoaded { table: String },
#[snafu(display("Table configuration is not compatible for reload"))]
IncompatibleTableConfig,
Member


Is this used somewhere?

fn needs_reload(&self) -> bool;

/// Returns true if this table holds state that needs to be moved in case of reload.
fn stateful(&self) -> bool;
Member


The caller must check stateful() before calling take_state(), but nothing prevents calling take_state() on a non-stateful table.

I was thinking of a different design (but didn't validate fully). Specifically:

fn extract_state(&self) -> Option<Box<dyn std::any::Any + Send + Sync>> { None }

In combination with the comment below.
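To make the suggestion concrete, here is a rough sketch of how a default `extract_state` could replace the `stateful()` / `take_state()` pair. The `Table` trait below is a trimmed-down, hypothetical stand-in (not Vector's real trait), and `FileTable` and `MemoryTable` are illustrative types only.

```rust
use std::any::Any;
use std::collections::HashMap;

/// Opaque state handed from an old table instance to its replacement.
type TableState = Box<dyn Any + Send + Sync>;

/// Hypothetical, trimmed-down stand-in for the enrichment table trait.
trait Table {
    /// Stateless tables keep this default and hand over nothing.
    fn extract_state(&self) -> Option<TableState> {
        None
    }
}

/// Hypothetical stateless table.
struct FileTable;

impl Table for FileTable {}

/// Hypothetical in-memory table.
struct MemoryTable {
    entries: HashMap<String, String>,
}

impl Table for MemoryTable {
    fn extract_state(&self) -> Option<TableState> {
        // Cloning keeps the sketch simple; a real implementation would
        // more likely share or move the underlying storage.
        Some(Box::new(self.entries.clone()))
    }
}
```

With this shape the caller has no separate precondition to check: a `None` result simply means there is nothing to carry over.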

Comment on lines +161 to +163
fn into_any(self: Box<Self>) -> Box<dyn Any>;

fn as_any(&self) -> &dyn Any;
Member


Same as above, can we avoid all these new trait methods if we pass prev_state: Option<Box<dyn std::any::Any + Send + Sync>> in build?
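As a sketch of what passing `prev_state` into `build` could look like, assuming the state is a plain map: `MemoryConfig`, `MemoryTable`, and this `build` signature are hypothetical and much simpler than the real enrichment table config API.

```rust
use std::any::Any;
use std::collections::HashMap;

/// Opaque state handed from an old table instance to its replacement.
type TableState = Box<dyn Any + Send + Sync>;

/// Hypothetical config for a memory table; only the TTL is modeled.
struct MemoryConfig {
    ttl_secs: u64,
}

/// Hypothetical in-memory table.
struct MemoryTable {
    ttl_secs: u64,
    entries: HashMap<String, String>,
}

impl MemoryConfig {
    /// Builds a table, adopting `prev_state` when it has the expected type.
    fn build(&self, prev_state: Option<TableState>) -> MemoryTable {
        let entries = prev_state
            // A failed downcast means the state came from a different
            // table type, so the new table starts empty.
            .and_then(|state| state.downcast::<HashMap<String, String>>().ok())
            .map(|boxed| *boxed)
            .unwrap_or_default();

        MemoryTable {
            ttl_secs: self.ttl_secs,
            entries,
        }
    }
}
```

The downcast happens inside the concrete table's own `build`, so the trait itself would not need `into_any` or `as_any`; whether silently starting empty on a type mismatch is acceptable, or should surface an error instead, is a design question this sketch leaves open.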


Labels

domain: topology Anything related to Vector's topology code

2 participants