Skip to content

fix: update Multiretriever params#11823

Open
Amnah199 wants to merge 2 commits into
mainfrom
update-multiretriever
Open

fix: update Multiretriever params#11823
Amnah199 wants to merge 2 commits into
mainfrom
update-multiretriever

Conversation

@Amnah199

Copy link
Copy Markdown
Contributor

Related Issues

Proposed Changes:

  • Add top_k_per_retriever and forward it to each retriever as its top_k; filters and top_k_per_retriever are now only passed when set, so each retriever keeps its own defaults otherwise.
  • Apply the overall top_k at merge time to truncate the combined results, using RRF for a consistent ranking; fixed a sorted shadowing bug in _merge_results.
  • Mirror the same behavior in run_async.

How did you test it?

  • Add/update tests for forwarding, truncation, and RRF fallback (sync + async).

Notes for the reviewer

Checklist

  • I have read the contributors guidelines and the code of conduct.
  • I have updated the related issue with new insights and changes.
  • I have added unit tests and updated the docstrings.
  • I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test: and added ! in case the PR includes breaking changes.
  • I have documented my code.
  • I have added a release note file, following the contributors guidelines.
  • I have run pre-commit hooks and fixed any issue.

@Amnah199 Amnah199 requested a review from a team as a code owner June 29, 2026 16:22
@Amnah199 Amnah199 requested review from bogdankostic and removed request for a team June 29, 2026 16:22
@vercel

vercel Bot commented Jun 29, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
haystack-docs Ignored Ignored Preview Jun 29, 2026 4:53pm

Request Review

@github-actions github-actions Bot added topic:tests type:documentation Improvements on the docs labels Jun 29, 2026
@claude

claude Bot commented Jun 29, 2026

Copy link
Copy Markdown

Code review

Reviewed for bugs and CLAUDE.md / AGENTS.md compliance. Three issues found.


1. to_dict does not serialize the new top_k_per_retriever parameter (serialization round-trip bug)

__init__ adds and stores the new top_k_per_retriever init parameter, but to_dict was not updated to include it:

return default_to_dict(
self,
retrievers={name: component_to_dict(obj=r, name=name) for name, r in self.retrievers.items()},
filters=self.filters,
top_k=self.top_k,
max_workers=self.max_workers,
join_mode=self.join_mode,
)

Since from_dict reconstructs via default_from_dict (which calls cls(**init_parameters) using only the keys present in the serialized dict), a MultiRetriever(top_k_per_retriever=...) will silently revert top_k_per_retriever to None after a to_dict() -> from_dict() round-trip (e.g. saving and reloading a pipeline), dropping the configured per-retriever truncation.

        return default_to_dict(
            self,
            retrievers={name: component_to_dict(obj=r, name=name) for name, r in self.retrievers.items()},
            filters=self.filters,
            top_k_per_retriever=self.top_k_per_retriever,
            top_k=self.top_k,
            max_workers=self.max_workers,
            join_mode=self.join_mode,
        )

2. test_init_default_parameters now fails — default top_k changed from 10 to None

This PR changes the __init__ default of top_k from 10 to None, but the existing (unmodified) test still asserts retriever.top_k == 10 for a default-constructed instance, so it will fail in CI:

retriever = MultiRetriever(retrievers=retrievers)
assert retriever.retrievers == retrievers
assert retriever.filters is None
assert retriever.top_k == 10
assert retriever.max_workers == 4
assert retriever.join_mode == "reciprocal_rank_fusion"

        assert retriever.top_k is None

3. Missing release note (AGENTS.md adherence)

The root CLAUDE.md mandates following AGENTS.md, which requires:

Every user-facing PR (not docs, not CI) must include a release note

haystack/AGENTS.md

Lines 53 to 59 in 04abc65

## Release Notes
Every user-facing PR (not docs, not CI) must include a release note:
hatch run release-note SHORT_DESCRIPTION
Edit the generated file in `releasenotes/notes/`. Release notes use reStructuredText formatting; see the [release notes section in CONTRIBUTING.md](CONTRIBUTING.md#release-notes) for details.

This PR adds a new public parameter (top_k_per_retriever) and changes the semantics/default of the existing public top_k parameter on the MultiRetriever component, but no releasenotes/notes/*.yaml file is included. The @_experimental status does not exempt it — prior PRs to this same experimental component (its introduction and the join_mode/RRF addition) both included release notes. Please add one via hatch run release-note ....

@bogdankostic bogdankostic left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks already good in genral, let's just make sure to document the new behavior and update the serialization methods for the new parameter

Comment on lines +123 to +125
are deduplicated and re-scored using RRF, then returned in descending score order. When `top_k` is set, RRF
is always used so the combined results have a consistent global ranking, and only the top `top_k` documents
are returned.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's document this in public facing docstrings, private methods are not in shown in our API reference.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

topic:tests type:documentation Improvements on the docs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants