Remove n+1 fields types from Repositories #28150
| private java.util.Set<String> expandableFields() { | ||
| Set<String> childFields = childCollectionFields(); | ||
| if (childFields == null || childFields.isEmpty()) { | ||
| return allowedFields; | ||
| } | ||
| return allowedFields.stream() | ||
| .filter(f -> !childFields.contains(f)) | ||
| .collect(java.util.stream.Collectors.toCollection(java.util.LinkedHashSet::new)); | ||
| } | ||
| /** | ||
| * Repositories override to declare fields that represent unbounded child-entity collections | ||
| * (e.g. {@code tables} on {@code DatabaseSchema}, {@code apiEndpoints} on | ||
| * {@code APICollection}). These are excluded from `fields=*` expansion to prevent OOMs on | ||
| * parents with very large child counts. Returns an empty set by default. |
💡 Quality: Fully qualified class names used despite existing imports
Multiple new methods use fully qualified class names (java.util.Set, java.util.Collections, java.util.stream.Collectors, java.util.LinkedHashSet) even though these are already imported in the respective files. This applies to expandableFields() and childCollectionFields() in EntityRepository.java, the overrides in ChartRepository, ServiceEntityRepository, TeamRepository, and the fetchChildRefsForIndexing method in VectorDocBuilder.java. Per project rules: 'No fully qualified names.'
Use the already-imported class names (Set, Collections, Collectors, LinkedHashSet) instead of fully qualified references:
// In EntityRepository.java:
private Set<String> expandableFields() {
Set<String> childFields = childCollectionFields();
if (childFields == null || childFields.isEmpty()) {
return allowedFields;
}
return allowedFields.stream()
.filter(f -> !childFields.contains(f))
.collect(Collectors.toCollection(LinkedHashSet::new));
}
protected Set<String> childCollectionFields() {
return Collections.emptySet();
}
| } catch (Exception ex) { | ||
| LOG.warn( | ||
| "Failed to fetch child refs for semantic indexing of {} {}: {}", | ||
| entity.getEntityReference() != null ? entity.getEntityReference().getType() : "?", | ||
| entity.getFullyQualifiedName(), | ||
| ex.getMessage()); | ||
| return Collections.emptyList(); | ||
| } |
⚠️ Edge Case: VectorDocBuilder swallows all exceptions silently during indexing
The fetchChildRefsForIndexing method catches Exception broadly and returns an empty list with only a WARN log. If the repository or filter is misconfigured (e.g., a typo in parentFilterKey), the semantic indexing will silently produce incomplete embeddings for all affected entities with no clear signal to operators. This can degrade search quality without any visibility. At minimum, this should log at ERROR level for unexpected exceptions (not just entity-not-found scenarios), or rethrow non-recoverable errors.
Distinguish expected (entity not found) from unexpected exceptions, logging the latter at ERROR with full stack trace for operator visibility:
} catch (EntityNotFoundException ex) {
LOG.debug("No children found for {} {}: {}",
entity.getEntityReference() != null ? entity.getEntityReference().getType() : "?",
entity.getFullyQualifiedName(), ex.getMessage());
return Collections.emptyList();
} catch (Exception ex) {
LOG.error("Failed to fetch child refs for semantic indexing of {} {}",
entity.getEntityReference() != null ? entity.getEntityReference().getType() : "?",
entity.getFullyQualifiedName(), ex);
return Collections.emptyList();
}
✅ TypeScript Types Auto-Updated
The generated TypeScript types have been automatically updated based on JSON schema changes in this PR.
Pull request overview
This PR aims to prevent OOMs and reduce unbounded payload materialization by deprecating embedded “child collection” fields on several entities (e.g., schemas on databases, tables on schemas, charts on dashboards) and replacing them with lightweight *Count fields plus paginated listing endpoints/filters.
Changes:
- Updates JSON schemas to deprecate embedded child lists and introduces computed `*Count` fields (e.g., `schemaCount`, `tableCount`, `chartCount`, `endpointCount`).
- Changes backend `fields=*` expansion to exclude repository-declared child-collection fields, adds new list filters (e.g., charts by dashboard), and updates repositories/resources to serve counts instead of embedded lists.
- Updates/extends integration tests to assert wildcard fetches do not materialize large child collections.
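The wildcard behavior described in the second bullet can be sketched as follows. This is a minimal, self-contained illustration, not the actual `EntityRepository` code: the class `WildcardExpansionSketch` and its `resolve` method are hypothetical names, and silently ignoring unknown field names is an assumption made to keep the example short.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical stand-in for a repository's field handling; not the real EntityRepository.
class WildcardExpansionSketch {
  private final Set<String> allowedFields;
  private final Set<String> childCollectionFields;

  WildcardExpansionSketch(List<String> allowedFields, Set<String> childCollectionFields) {
    this.allowedFields = new LinkedHashSet<>(allowedFields);
    this.childCollectionFields = childCollectionFields;
  }

  // Resolve a `fields` query parameter into the concrete set of fields to load.
  Set<String> resolve(String fieldsParam) {
    if (fieldsParam == null || fieldsParam.isBlank()) {
      return new LinkedHashSet<>();
    }
    if ("*".equals(fieldsParam.trim())) {
      // Wildcard: every allowed field except the unbounded child collections.
      return allowedFields.stream()
          .filter(f -> !childCollectionFields.contains(f))
          .collect(Collectors.toCollection(LinkedHashSet::new));
    }
    // Explicit list: keep only allowed names, in request order (assumption: unknown names are dropped).
    Set<String> requested = new LinkedHashSet<>();
    for (String name : fieldsParam.split(",")) {
      String trimmed = name.trim();
      if (allowedFields.contains(trimmed)) {
        requested.add(trimmed);
      }
    }
    return requested;
  }
}
```

With `charts` declared as a child collection, `resolve("*")` omits it while `resolve("charts,tags")` still returns it, which mirrors the "populated only when explicitly requested" wording in the dashboard schema.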
Reviewed changes
Copilot reviewed 32 out of 38 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| openmetadata-ui/src/main/resources/ui/src/utils/DashboardDetailsUtils.tsx | Updates dashboard default fields to request counts instead of embedded chart/dataModel lists. |
| openmetadata-ui/src/main/resources/ui/src/enums/entity.enum.ts | Adds TabSpecificField enum entries for new *Count fields. |
| openmetadata-spec/src/main/resources/json/schema/entity/data/databaseSchema.json | Deprecates tables[], adds tableCount. |
| openmetadata-spec/src/main/resources/json/schema/entity/data/database.json | Deprecates databaseSchemas[], adds schemaCount. |
| openmetadata-spec/src/main/resources/json/schema/entity/data/dashboard.json | Limits embedded lists to explicit requests and adds chartCount/dataModelCount. |
| openmetadata-spec/src/main/resources/json/schema/entity/data/container.json | Deprecates children[], adds childrenCount. |
| openmetadata-spec/src/main/resources/json/schema/entity/data/chart.json | Deprecates dashboards[], adds dashboardCount. |
| openmetadata-spec/src/main/resources/json/schema/entity/data/apiCollection.json | Limits embedded endpoints to explicit requests and adds endpointCount. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/vector/VectorDocBuilder.java | Switches semantic child context to fetch children via repositories/filters rather than embedded lists. |
| openmetadata-service/src/main/java/org/openmetadata/service/resources/storages/ContainerResource.java | Adds parent query param for container listing to support child pagination. |
| openmetadata-service/src/main/java/org/openmetadata/service/resources/domains/DomainResource.java | Adds parent query param to domain listing. |
| openmetadata-service/src/main/java/org/openmetadata/service/resources/databases/DatabaseSchemaResource.java | Replaces default FIELDS/view ops from tables to tableCount. |
| openmetadata-service/src/main/java/org/openmetadata/service/resources/databases/DatabaseResource.java | Replaces default FIELDS/view ops from databaseSchemas to schemaCount. |
| openmetadata-service/src/main/java/org/openmetadata/service/resources/dashboards/DashboardResource.java | Replaces default FIELDS/view ops from embedded lists to counts. |
| openmetadata-service/src/main/java/org/openmetadata/service/resources/charts/ChartResource.java | Updates default FIELDS and adds dashboard filter for listing charts by dashboard. |
| openmetadata-service/src/main/java/org/openmetadata/service/resources/apis/APICollectionResource.java | Replaces default FIELDS/view ops from apiEndpoints to endpointCount. |
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/TeamRepository.java | Excludes users from fields=* expansion. |
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/ServiceEntityRepository.java | Excludes pipelines from fields=* expansion. |
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/ListFilter.java | Adds dashboard→charts filtering condition. |
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/EntityRepository.java | Implements fields=* expansion control via childCollectionFields(). |
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/DatabaseSchemaRepository.java | Stops materializing tables[]; computes tableCount. |
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/DatabaseRepository.java | Stops materializing databaseSchemas[]; computes schemaCount. |
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/DashboardRepository.java | Excludes embedded lists from * and adds count computation. |
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/ContainerRepository.java | Adds childrenCount and prevents children[] materialization. |
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/CollectionDAO.java | Adds relationship count helpers used by new count fields. |
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/ChartRepository.java | Stops materializing dashboards[]; computes dashboardCount. |
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/APICollectionRepository.java | Adds endpointCount computation and excludes endpoints from fields=*. |
| openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/DatabaseSchemaResourceIT.java | Updates bulk-fetch tests and adds wildcard regression test for tables[] OOM. |
| openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/DatabaseResourceIT.java | Adds wildcard regression test for databaseSchemas[] OOM. |
| openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/DashboardResourceIT.java | Adds wildcard regression test to exclude embedded lists and validate counts. |
| openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/ChartResourceIT.java | Adds wildcard regression test for excluding dashboards[] and validating dashboardCount. |
| openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/APICollectionResourceIT.java | Adds wildcard regression test for excluding apiEndpoints[] and validating endpointCount. |
Comments suppressed due to low confidence (2)
openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/APICollectionRepository.java:171
`endpointCount` is only set in `setFields(...)`, but list responses go through `setFieldsInBulk(...)` / `fetchAndSetFields(...)`. Since this repository does not register a bulk field fetcher for `endpointCount`, `GET /v1/apiCollections?fields=endpointCount` (and the default `FIELDS` value) will return null counts. Add a `fieldFetchers.put("endpointCount", ...)` implementation (ideally using a batched relationship count) or override `setFieldsInBulk` to populate it.
@Override
public void setFieldsInBulk(Fields fields, List<APICollection> entities) {
if (entities == null || entities.isEmpty()) {
return;
}
// Bulk fetch and set service for all API collections first
fetchAndSetServices(entities);
// Then call parent's implementation which handles standard fields
super.setFieldsInBulk(fields, entities);
}
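The "batched relationship count" suggested in this comment could look roughly like the sketch below. It is illustrative only: `BatchedCountSketch`, `Edge`, and `countByParent` are hypothetical names, and the in-memory pass over relationship edges stands in for what would presumably be a single grouped COUNT query in the DAO.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

class BatchedCountSketch {
  // Hypothetical row shape: one parent-to-child relationship edge.
  static final class Edge {
    final UUID parentId;
    final UUID childId;

    Edge(UUID parentId, UUID childId) {
      this.parentId = parentId;
      this.childId = childId;
    }
  }

  // Count children for all parents in one pass, instead of one count lookup per parent.
  static Map<UUID, Integer> countByParent(List<UUID> parentIds, List<Edge> edges) {
    Map<UUID, Integer> counts = new HashMap<>();
    for (UUID id : parentIds) {
      counts.put(id, 0); // parents without children still get an explicit zero
    }
    for (Edge edge : edges) {
      // Only count edges whose parent is part of the requested batch.
      counts.computeIfPresent(edge.parentId, (key, value) -> value + 1);
    }
    return counts;
  }
}
```

A bulk fetcher would then assign `counts.get(entity.getId())` to each entity in the list, so the number of count queries stays constant regardless of page size.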
openmetadata-service/src/main/java/org/openmetadata/service/search/vector/VectorDocBuilder.java:588
`fetchChildRefsForIndexing` calls `repo.listAfter(...)`, which always computes a `listCount(...)` total before fetching the page. On the indexing path this adds an extra COUNT query per parent entity (and pollutes `ListCountCache`) even though you only need up to `MAX_CHILD_NAMES_IN_CONTEXT` refs. Consider using the DAO's `listAfter(...)` directly (no total) or adding a lightweight repository method that skips the total-count calculation for this indexing use case.
org.openmetadata.service.jdbi3.EntityRepository<?> repo =
Entity.getEntityRepository(spec.childEntityType());
org.openmetadata.service.jdbi3.ListFilter filter =
new org.openmetadata.service.jdbi3.ListFilter(
org.openmetadata.schema.type.Include.NON_DELETED)
.addQueryParam(spec.parentFilterKey(), entity.getFullyQualifiedName());
var page = repo.listAfter(null, repo.getFields(""), filter, MAX_CHILD_NAMES_IN_CONTEXT, null);
List<EntityReference> refs = new ArrayList<>(page.getData().size());
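To make the cost difference in this comment concrete, here is a small sketch that counts queries on a fake DAO. Everything in it is hypothetical (`NoTotalCountSketch`, `FakeChildDao`, and both fetch methods); it only illustrates why skipping the total halves the queries per parent, and it does not reproduce the real `listAfter` signature.

```java
import java.util.List;

class NoTotalCountSketch {
  // Hypothetical in-memory DAO that records how many queries were issued.
  static final class FakeChildDao {
    private final List<String> children;
    int queries = 0;

    FakeChildDao(List<String> children) {
      this.children = children;
    }

    int listCount() {
      queries++;
      return children.size();
    }

    List<String> listAfter(int limit) {
      queries++;
      return children.subList(0, Math.min(limit, children.size()));
    }
  }

  // Repository-style page: computes the total, then fetches one page (two queries).
  static List<String> pageWithTotal(FakeChildDao dao, int limit) {
    dao.listCount(); // the total is computed even though this caller ignores it
    return dao.listAfter(limit);
  }

  // Indexing-style fetch: only the first `limit` rows (one query).
  static List<String> firstChildren(FakeChildDao dao, int limit) {
    return dao.listAfter(limit);
  }
}
```

For an indexing job that visits many parents, the second form issues one query per parent instead of two.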
| // eslint-disable-next-line max-len | ||
| export const defaultFields = `${TabSpecificField.DOMAINS},${TabSpecificField.OWNERS}, ${TabSpecificField.FOLLOWERS}, ${TabSpecificField.TAGS}, ${TabSpecificField.CHARTS},${TabSpecificField.VOTES},${TabSpecificField.DATA_PRODUCTS},${TabSpecificField.EXTENSION}`; | ||
| export const defaultFields = `${TabSpecificField.DOMAINS},${TabSpecificField.OWNERS},${TabSpecificField.FOLLOWERS},${TabSpecificField.TAGS},${TabSpecificField.CHART_COUNT},${TabSpecificField.DATA_MODEL_COUNT},${TabSpecificField.VOTES},${TabSpecificField.DATA_PRODUCTS},${TabSpecificField.EXTENSION}`; |
| String fieldsParam, | ||
| @Parameter( | ||
| description = | ||
| "Filter Domains by parent Domain FQN. Returns only direct sub-domains of the given parent. Omit to list all root-level domains plus their descendants (the legacy behavior).", |
| * direct children via the child DAO, identified by {@code childEntityType} filtered by | ||
| * {@code parentFilterKey} = parent's FQN. {@code DATA_PRODUCT} assets remain on the entity | ||
| * (many-to-many, bounded) and use {@code embeddedGetter}. |
| "chartCount": { | ||
| "description": "Number of charts linked to this dashboard. Computed on demand when `chartCount` is requested in `fields`.", | ||
| "type": "integer", | ||
| "default": null | ||
| }, | ||
| "dataModels": { | ||
| "description": "List of data models used by this dashboard or the charts contained on it.", | ||
| "description": "Dashboard data models on this Dashboard. Populated only when `fields=dataModels` is explicitly requested. Excluded from `fields=*` expansion.", | ||
| "$ref": "../../type/entityReferenceList.json", | ||
| "default": null | ||
| }, | ||
| "dataModelCount": { | ||
| "description": "Number of dashboard data models linked to this dashboard. Computed on demand when `dataModelCount` is requested in `fields`.", | ||
| "type": "integer", | ||
| "default": null | ||
| }, |
| private void fetchAndSetChartCounts(List<Dashboard> dashboards, Fields fields) { | ||
| if (!fields.contains("chartCount") || dashboards == null || dashboards.isEmpty()) { | ||
| return; | ||
| } | ||
| for (Dashboard dashboard : dashboards) { | ||
| dashboard.setChartCount(getChartCount(dashboard)); | ||
| } | ||
| } | ||
|
|
||
| private void fetchAndSetDataModelCounts(List<Dashboard> dashboards, Fields fields) { | ||
| if (!fields.contains("dataModelCount") || dashboards == null || dashboards.isEmpty()) { | ||
| return; | ||
| } | ||
| for (Dashboard dashboard : dashboards) { | ||
| dashboard.setDataModelCount(getDataModelCount(dashboard)); | ||
| } | ||
| } |
🔴 Playwright Results: 19 failure(s), 12 flaky
✅ 4096 passed · ❌ 19 failed · 🟡 12 flaky · ⏭️ 107 skipped
Genuine Failures (failed on all attempts) ❌
| const initializeCharts = useCallback(async () => { | ||
| if (!dashboardDetails?.fullyQualifiedName) { | ||
| return; | ||
| } | ||
| try { | ||
| const res = await fetchCharts( | ||
| listChartIds, | ||
| chartFilters.showDeletedCharts | ||
| // Charts are no longer embedded on the dashboard entity — fetch via the dedicated | ||
| // listing endpoint filtered by dashboard FQN. The full chart objects (incl. tags) | ||
| // are returned in one paginated call instead of N individual lookups. | ||
| const res = await getChartsByDashboard( | ||
| dashboardDetails.fullyQualifiedName, | ||
| TabSpecificField.TAGS, | ||
| undefined, | ||
| chartFilters.showDeletedCharts ? Include.Deleted : Include.NonDeleted | ||
| ); | ||
| setCharts(res); | ||
| setCharts((res.data as unknown as ChartType[]) ?? []); |
🚨 Bug: DashboardChartTable only loads first 10 charts, no pagination
The initializeCharts function calls getChartsByDashboard with the default limit = PAGE_SIZE (10) and never fetches subsequent pages. The Table component is rendered with pagination={false}. Previously, the component received all chart references from the embedded dashboard.charts list and fetched each one individually (N+1 but complete). Now, dashboards with more than 10 charts will silently truncate the chart table to only 10 entries.
This is a user-visible data loss regression — any dashboard with >10 charts will appear incomplete.
Paginate through all chart pages using the after cursor returned by the API, with a larger per-page limit to reduce round-trips:
const initializeCharts = useCallback(async () => {
if (!dashboardDetails?.fullyQualifiedName) {
return;
}
try {
let allCharts: ChartType[] = [];
let after: string | undefined;
do {
const res = await getChartsByDashboard(
dashboardDetails.fullyQualifiedName,
TabSpecificField.TAGS,
after ? { after } : undefined,
chartFilters.showDeletedCharts ? Include.Deleted : Include.NonDeleted,
100
);
allCharts = [...allCharts, ...(res.data as unknown as ChartType[])];
after = res.paging?.after;
} while (after);
setCharts(allCharts);
} catch (error) {
showErrorToast(error as AxiosError, ...);
}
}, [dashboardDetails?.fullyQualifiedName, chartFilters.showDeletedCharts]);
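The same cursor-draining pattern, sketched in Java for any backend caller that needs every child of a parent. The `Page` type, `fetchAll`, and the `maxPages` guard are assumptions introduced for illustration and are not part of the PR; the guard simply bounds the loop if a server ever keeps returning a cursor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class CursorDrainSketch {
  // Hypothetical page shape: rows plus an opaque `after` cursor (null on the last page).
  static final class Page<T> {
    final List<T> data;
    final String after;

    Page(List<T> data, String after) {
      this.data = data;
      this.after = after;
    }
  }

  // Follow `after` cursors until the last page, stopping early once maxPages pages were read.
  static <T> List<T> fetchAll(Function<String, Page<T>> fetchPage, int maxPages) {
    List<T> all = new ArrayList<>();
    String after = null; // null requests the first page
    for (int i = 0; i < maxPages; i++) {
      Page<T> page = fetchPage.apply(after);
      all.addAll(page.data);
      after = page.after;
      if (after == null) {
        break; // no further pages
      }
    }
    return all;
  }
}
```

Capping the page count trades completeness for safety; a caller that must never truncate could instead fail loudly when the cap is reached.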
❌ UI Checkstyle Failed
❌ ESLint + Prettier + Organise Imports (src)
One or more source files have linting or formatting issues.
Fix locally (fast; only checks files changed in this branch): `make ui-checkstyle-changed`
| const res = await getChartsByDashboard( | ||
| dashboardDetails.fullyQualifiedName, | ||
| TabSpecificField.TAGS, | ||
| undefined, | ||
| chartFilters.showDeletedCharts ? Include.Deleted : Include.NonDeleted | ||
| ); | ||
| setCharts(res); | ||
| setCharts((res.data as unknown as ChartType[]) ?? []); | ||
| } catch (error) { |
| // eslint-disable-next-line max-len | ||
| export const defaultFields = `${TabSpecificField.DOMAINS},${TabSpecificField.OWNERS}, ${TabSpecificField.FOLLOWERS}, ${TabSpecificField.TAGS}, ${TabSpecificField.CHARTS},${TabSpecificField.VOTES},${TabSpecificField.DATA_PRODUCTS},${TabSpecificField.EXTENSION}`; | ||
| export const defaultFields = `${TabSpecificField.DOMAINS},${TabSpecificField.OWNERS},${TabSpecificField.FOLLOWERS},${TabSpecificField.TAGS},${TabSpecificField.VOTES},${TabSpecificField.DATA_PRODUCTS},${TabSpecificField.EXTENSION}`; | ||
|
|
| @@ -143,8 +149,11 @@ public ResultList<Domain> list( | |||
| schema = @Schema(type = "string")) | |||
| @QueryParam("after") | |||
| String after) { | |||
| return listInternal( | |||
| uriInfo, securityContext, fieldsParam, new ListFilter(null), limitParam, before, after); | |||
| ListFilter filter = new ListFilter(null); | |||
| if (parent != null && !parent.isBlank()) { | |||
| filter.addQueryParam("parent", parent); | |||
| } | |||
| return listInternal(uriInfo, securityContext, fieldsParam, filter, limitParam, before, after); | |||
Code Review 🚫 Blocked (1 resolved / 4 findings)
Refactors repositories to remove N+1 field materialization and bulk counts, addressing inefficient data loading. However, the implementation is blocked by a critical pagination defect in DashboardChartTable.
🚨 Bug: DashboardChartTable only loads first 10 charts, no pagination
📄 openmetadata-ui/src/main/resources/ui/src/components/Dashboard/DashboardChartTable/DashboardChartTable.tsx:129-143
📄 openmetadata-ui/src/main/resources/ui/src/rest/chartsAPI.ts:86-100
This is a user-visible data loss regression: any dashboard with >10 charts will appear incomplete.
Paginate through all chart pages using the `after` cursor returned by the API, with a larger per-page limit to reduce round-trips.
Describe your changes:
Fixes #
I worked on ... because ...
Type of change:
High-level design:
N/A — small change.
Tests:
Use cases covered
Unit tests
Backend integration tests
Ingestion integration tests
Playwright (UI) tests
Manual testing performed
UI screen recording / screenshots:
Not applicable.
Checklist:
- Fixes <issue-number>: <short explanation>
- Fixes #<issue-number> above.

Summary by Gitar
- … (`users`, `apiEndpoints`, `dashboards`, `charts`) in parent repositories to prevent OOM errors.
- … (`schemaCount`, `chartCount`) in favor of paginated listing endpoints.
- … `Team`, `APICollection`, `Chart`, and `Dashboard` repositories to return null for previously embedded child collections.
- … (`GET /v1/users?team={fqn}`) for all child entity access.
- … `DashboardChartTable` to fetch charts via `getChartsByDashboard` rather than reading from `dashboard.charts`.