Skip to content

SOLR-15701: Complete configsets api#4264

Open
epugh wants to merge 14 commits intoapache:mainfrom
epugh:complete_configsets_api
Open

SOLR-15701: Complete configsets api#4264
epugh wants to merge 14 commits intoapache:mainfrom
epugh:complete_configsets_api

Conversation

@epugh
Copy link
Copy Markdown
Contributor

@epugh epugh commented Apr 4, 2026

https://issues.apache.org/jira/browse/SOLR-15701

Description

Add Download and GetFile and PutFile to ConfigSets API and SolrJ.

Solution

This was extracted with copilots help from the SchemaDesigner code base. We are going to merge this as part of ConfigSets API. In a future PR, once this is done, we'll finish the currently started SchemaDesigner work to use these APIs.

The PutFile was broken out of the Upload feature into it's own files with it's own parameters. Before it was inter twingled with Upload of a zipped configset.

Tests

Added some tests.

@epugh epugh requested a review from gerlowskija April 4, 2026 00:04
@github-actions github-actions bot added documentation Improvements or additions to documentation tests cat:api labels Apr 4, 2026
Copy link
Copy Markdown
Contributor

@VishnuPriyaChandraSekar VishnuPriyaChandraSekar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have just started reviewing the Solr code base.
I just left a minor comment other than that, PR seems to be fine.

Copy link
Copy Markdown
Contributor

@gerlowskija gerlowskija left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty good overall. I left some inline comments, mostly small things.

I do have one larger question here though: I gather much of this functionality already existed under the aegis of the SchemaDesigner, but I don't see any deletions or modifications to those files in this PR. Shouldn't this PR be deleting the "download-configset" and "get-configset-file" APIs that exist over in schema-designer land?

/**
* V2 API definition for downloading an existing configset as a ZIP archive.
*
* <p>Equivalent to GET /api/configsets/{configSetName}/download
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[-0] Not sure what you're trying to say here, but this could use some word-smithing. There's no v1 counterpart to this API which is where I typically see the "equivalent to" language.

Alternately If you're trying to highlight what the API actually looks like, then maybe "Available at" works better? (Or omit altogether since the @Path annotation is a just a line or two below)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

argh, copy n paste from the old schemadesigner.

@Path("/configsets/{configSetName}")
interface Download {
@GET
@Path("/download")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[-1] /download isn't very REST-ful. The fact that we're fetching/downloading the resource is already kindof implied by the HTTP 'GET' verb/method.

Could we drop it and make this "just" GET /api/configsets/{configsetName}?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so we could, and I was thinking about it and then asked Claude for soem suggestions, and he kind of came back to "/download" is easy. Here is the write up:
RESTful Alternatives

Option 1: Keep current design ✅ (Recommended)

GET /api/configsets                          → List (JSON)
GET /api/configsets/{name}/download          → Download ZIP
GET /api/configsets/{name}/{filePath}        → Get single file

Pros:

  • Explicit and unambiguous
  • Self-documenting (/download tells you what it does)
  • No conflicts with file paths
  • Common pattern (GitHub uses /archive, GitLab uses /repository/archive, etc.)

Cons:

  • Not purely resource-oriented REST

Option 2: Use /files sub-resource

GET /api/configsets/{name}                   → Metadata (JSON) - NEW
GET /api/configsets/{name}/files             → Download all as ZIP
GET /api/configsets/{name}/files/{path}      → Get single file

Pros:

  • More RESTful hierarchy
  • Room for metadata endpoint

Cons:

  • Breaking change to existing API
  • More complex paths

Option 3: Content negotiation (Pure REST)

GET /api/configsets/{name}
  Accept: application/json  → Metadata
  Accept: application/zip   → ZIP download

Pros:

  • True REST content negotiation

Cons:

  • Solr doesn't typically use Accept headers this way
  • Harder to use with curl/browsers
  • Still conflicts with /{filePath} pattern

Recommendation

Keep the current design (/download) because:

  1. Unambiguous - no path conflicts
  2. Self-documenting - clear what you get back
  3. Pragmatic - works well without complex content negotiation
  4. Common pattern - many APIs use action-style endpoints for different representations
  5. Backward compatible - if we add metadata later, we can use GET /api/configsets/{name} for JSON

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we had another places that we did the zip download, then I could see making them all work the same, with whatever pattern we picked...

@Produces("application/zip")
Response downloadConfigSet(
@PathParam("configSetName") String configSetName,
@QueryParam("displayName") String displayName)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Q] From looking ahead at code in this PR, it looks like 'displayName' is used primarily (solely?) to inform the "attachment name" part of a "Content-Disposition" response header.

Do I have that right? Or am I missing another usage?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is it... I looked a bit at if we really needed it, but for schema designer we do because it makes crazy "temp" names for the configset.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I leaned something new! Turns out we can spedifcy the downloaded zip file name from the JavaScript side in modern browsers, which means we can clean this API up!

summary = "Get the contents of a file in a configset.",
tags = {"configsets"})
ConfigSetFileContentsResponse getConfigSetFile(
@PathParam("configSetName") String configSetName, @QueryParam("path") String filePath)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[-0] IMO "filePath" should be a path parameter rather than a query-parameter. That would allow this API to mirror the "update-configset-file" API really well, and also bring it into line with what we've done for other similar APIs in filestore and elsewhere.

To be explicit, I'm suggesting this API be: GET /api/configsets/<configSetName>/files/some/file/path.txt

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting... That mimics the suggestion on haveing a /files/ pattern that Claude gave me re the download. Makes sense.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, done with this.

@Operation(
summary = "Get the contents of a file in a configset.",
tags = {"configsets"})
ConfigSetFileContentsResponse getConfigSetFile(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Q] Can you add a bit of context around the decision to return a structured POJO here vs. "just" the verbatim file bytes approach we take in filestore, replication handler, ZooKeeperReadAPI, etc.

The structured POJO route seems to conflict a bit with the @Produces(TEXT_PLAIN) annotation above, unless I'm missing something?

How does this behave if a user puts a small binary file in their configset?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wow, this ia a damn good question, that looked simple and then got more complex. Yeah, what if you store a model object that is binary? Or an image? I could imagine that in the near future we will have more binary objects that are part of a configset, and that you might put and get individually... for example, some of the NLP stuff has small binary objects that might be in your configset...

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, Did a pretty big reworking to bring this in line with other apis in solr... FYI, FileStore still does werid things with wt=raw, but apparently that is because it supports v1 and v2. This only does v2, so easier.

final String fileName = safeName + "_configset.zip";
return Response.ok((StreamingOutput) outputStream -> outputStream.write(zipBytes))
.type("application/zip")
.header("Content-Disposition", "attachment; filename=\"" + fileName + "\"")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Q] "Content-Disposition" is a browser-based header AFAICT - it's primarily a hint for browsers as to whether something's an attachment or not. Is it worth having here? Why would we include such a header here, but not on (e.g.) the filestore API or in ReplicationHandler, or somewhere similar that's downloading file content?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to confirm that this doesn't cause any issues with how Schema Designer works.. Which I think may be the only one who cares about this, versus the filestore api etc. You know, I may back it out until we migrate schema signer over, and if it needs it, we can add it back in, or have a discsussion...

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, turns out this is a older way of handling things, modern browsers take what the schema-designer.js specifies as the filename., so we don't need this at all.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this also addresses an earlier ocmment of yours!

import org.junit.Test;

/** Unit tests for {@link UploadConfigSet}. */
public class UploadConfigSetAPITest extends SolrTestCase {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Q] Can you share a bit of context around this test file please? I'm not going to complain about new tests, but I'm surprised to find that a third of this PR is tests for an API that wasn't touched by the PR 😛

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, so I was messing around with this PR and SOLR-16341 (blank file in configset) and realized this was lacking and fit in the "complete".

[[configsets-download]]
== Download a Configset

The `download` command downloads an entire configset as a zipped file.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[0] The "command" language here is unfortunate. It fits the v1 APIs and their action=LIST|UPLOAD|CREATE|etc syntax well, but doesn't make sense for the v2 API IMO.

Maybe something like the following would be a bit clearer:

The v2 API allows configsets to be downloaded as a single zipped file. This is useful for backing up configsets, sharing ...
The download API takes the following parameters:

[[configsets-get-file]]
== Get a Single File from a Configset

This command retrieves the contents of a single file from an existing configset.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto, re: replacing "command" with "API" or less v1 specific language

+
The path to the file within the configset (e.g., `solrconfig.xml` or `lang/stopwords_en.txt`).

The response will be a JSON object containing:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[-0] Why "JSON" here? v2 APIs in general support javabin, xml, etc. in addition to JSON. Is this API specifically JSON-only in some way? If not, then we might not want to be overly specific here. Maybe drop the word "JSON" and leave the rest as-is?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, dealt wiht in the other refacotr...

@epugh
Copy link
Copy Markdown
Contributor Author

epugh commented Apr 11, 2026

I have just started reviewing the Solr code base. I just left a minor comment other than that, PR seems to be fine.

@VishnuPriyaChandraSekar by the way, we need more code reviewers for the PR's in Solr. With AI, it's easier then ever to generate code, but reviewing it really isn't any easier. It would be great if you reviewed more PRs, as that would help contributors get some early feedback! I believe you can use the Code Review feature and approve a PR, even if you aren't a committer, you just can't merge.

@epugh
Copy link
Copy Markdown
Contributor Author

epugh commented Apr 11, 2026

Lots of progress, I think I am down to doc fixes.

@epugh
Copy link
Copy Markdown
Contributor Author

epugh commented Apr 12, 2026

Working through the docs.. I am going to split uploading a complete configset from the putting of a single file in an existing configset. Right now the methods are on the same upload interface, and it's a bit confusing.

@epugh
Copy link
Copy Markdown
Contributor Author

epugh commented Apr 12, 2026

@gerlowskija I couldn't help it, I also found some remnants of the trusted configset concept that i could remove in this... Simplified osme of the tests!

I think I'm ready for final review!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cat:api cat:cloud documentation Improvements or additions to documentation tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants