replace fan-out JOINs with EXISTS subquery for MapAttribute RSQL filt…#3088

Open
vasilchev wants to merge 1 commit into
eclipse-hawkbit:masterfrom
boschglobal:rsql/improvements
Conversation

Contributor

@vasilchev vasilchev commented May 18, 2026

Problem

RSQL queries filtering targets by map attributes (controller attributes, software module metadata) with multiple AND conditions produce catastrophically large intermediate result sets that cause queries to hang indefinitely in production.

Reported query (attribute.sw_1_version=out=(...) and attribute.sw_2_version=out=(...) and attribute.sw_3_version=out=(...)) generated this SQL:

  SELECT DISTINCT COUNT(DISTINCT(t0.id))
  FROM sp_target t0
    LEFT OUTER JOIN sp_target_attributes t2 ON (t2.target = t0.id)
    LEFT OUTER JOIN sp_target_attributes t4 ON (t4.target = t0.id)
    LEFT OUTER JOIN sp_target_attributes t6 ON (t6.target = t0.id)
  , sp_target_attributes t5
  , sp_target_attributes t3
  , sp_target_attributes t1
  WHERE (t5.attribute_value IS NULL OR t5.attribute_value NOT IN (...))
    AND (t3.attribute_value IS NULL OR t3.attribute_value NOT IN (...))
    AND (t1.attribute_value IS NULL OR t1.attribute_value NOT IN (...))
    AND t5.target = t0.id AND t5.attribute_key = 'sw_1_version'
    AND t3.target = t0.id AND t3.attribute_key = 'sw_2_version'
    AND t1.target = t0.id AND t1.attribute_key = 'sw_3_version'

This affects any map-typed RSQL field: attribute.*, metadata.*, or any custom @ElementCollection Map<String,String> field.

Solution: replace all MapAttribute non-null operator handling with a correlated EXISTS subquery.

The compare() method handles all operators:

  • EQ / IN / LIKE / GT / GTE / LT / LTE → positive predicate inside EXISTS
  • NE / NOT_IN / NOT_LIKE → IS NULL OR <negated predicate> inside EXISTS

Semantics preserved: INNER JOIN inside the subquery means targets without the attribute key are excluded — identical to the previous behaviour.

Unaffected: =is=null / =not=null checks (handled by a separate code path using getJoinOn() with LEFT JOIN), SetAttribute (tag, etc.), singular fields, entity references.
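
The exclusion semantics described above can be checked on a toy dataset. The following is a minimal sketch using Python's built-in sqlite3 module, not the hawkBit code path: the `target` and `attrs` tables are simplified stand-ins for sp_target and sp_target_attributes, and the composite primary key on (target, attribute_key) is an assumption. It runs the old join form and the new EXISTS form of attribute.sw!=1.0 and compares the matched target ids.

```python
import sqlite3

# Hypothetical, simplified schema: "target" and "attrs" stand in for
# sp_target and sp_target_attributes. The composite primary key is assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE target (id INTEGER PRIMARY KEY);
    CREATE TABLE attrs (
        target          INTEGER NOT NULL REFERENCES target(id),
        attribute_key   TEXT NOT NULL,
        attribute_value TEXT,
        PRIMARY KEY (target, attribute_key)
    );
    INSERT INTO target (id) VALUES (1), (2), (3), (4);
    INSERT INTO attrs VALUES
        (1, 'sw', '2.0'),     -- key present, value differs
        (2, 'sw', '1.0'),     -- key present, value equal
        (3, 'sw', NULL),      -- key present, value NULL
        (4, 'other', '2.0');  -- key 'sw' absent
""")

# Old shape: unconstrained "ghost" LEFT JOIN plus an inner join on the key.
BEFORE = """
    SELECT DISTINCT t.id FROM target t
      LEFT JOIN attrs ghost ON ghost.target = t.id
      JOIN attrs a ON a.target = t.id AND a.attribute_key = ?
    WHERE a.attribute_value IS NULL OR a.attribute_value <> ?
    ORDER BY t.id
"""

# New shape: correlated EXISTS with the IS NULL OR <negated predicate> inside.
AFTER = """
    SELECT t.id FROM target t
    WHERE EXISTS (
        SELECT 1 FROM attrs a
        WHERE a.target = t.id AND a.attribute_key = ?
          AND (a.attribute_value IS NULL OR a.attribute_value <> ?)
    )
    ORDER BY t.id
"""

params = ("sw", "1.0")
before_ids = [row[0] for row in conn.execute(BEFORE, params)]
after_ids = [row[0] for row in conn.execute(AFTER, params)]
print(before_ids, after_ids)  # [1, 3] [1, 3]
```

Target 4 has no 'sw' row, so both forms exclude it; target 3 is kept because of the IS NULL branch.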


SQL comparison

Single condition: attribute.key!=value

-- BEFORE

  SELECT DISTINCT t.* FROM target t
    LEFT JOIN attrs ghost ON ghost.target = t.id
    JOIN attrs a ON a.target = t.id AND a.key = ?
  WHERE a.value IS NULL OR a.value <> ?

-- AFTER

  SELECT DISTINCT t.* FROM target t
  WHERE EXISTS (
      SELECT 1 FROM attrs a
      WHERE a.target = t.id AND a.key = ?
      AND (a.value IS NULL OR a.value <> ?)
  )

Three AND conditions on different keys: attribute.k1=out=(...) and attribute.k2=out=(...) and attribute.k3=out=(...)

-- BEFORE (Hibernate; even worse on EclipseLink which produces comma cross-joins)

  SELECT DISTINCT t.* FROM target t
    LEFT JOIN attrs g1 ON g1.target = t.id
    JOIN attrs a1 ON a1.target = t.id AND a1.key = ?
    LEFT JOIN attrs g2 ON g2.target = t.id
    JOIN attrs a2 ON a2.target = t.id AND a2.key = ?
    LEFT JOIN attrs g3 ON g3.target = t.id
    JOIN attrs a3 ON a3.target = t.id AND a3.key = ?
  WHERE (a1.value IS NULL OR a1.value NOT IN (...))
    AND (a2.value IS NULL OR a2.value NOT IN (...))
    AND (a3.value IS NULL OR a3.value NOT IN (...))

-- Intermediate rows per target, assuming 100 attribute rows per target: 100 (g1) × 100 (g2) × 100 (g3) = 1,000,000

-- AFTER

  SELECT DISTINCT t.* FROM target t
  WHERE EXISTS (SELECT 1 FROM attrs WHERE target=t.id AND key=? AND value NOT IN (...))
    AND EXISTS (SELECT 1 FROM attrs WHERE target=t.id AND key=? AND value NOT IN (...))
    AND EXISTS (SELECT 1 FROM attrs WHERE target=t.id AND key=? AND value NOT IN (...))

-- Each EXISTS: 1 PK index lookup. Total: O(N_targets × 3)
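
The fan-out arithmetic above can be reproduced at a smaller scale. The following is a rough sketch with Python's sqlite3 module, assuming a simplified `target`/`attrs` schema and 10 attributes per target instead of 100 (both are illustrative choices, not taken from hawkBit). It counts the rows the join form materializes before DISTINCT collapses them and compares that with the EXISTS form.

```python
import sqlite3

ATTRS_PER_TARGET = 10  # illustrative; the comment above assumes 100

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE target (id INTEGER PRIMARY KEY);
    CREATE TABLE attrs (
        target          INTEGER NOT NULL REFERENCES target(id),
        attribute_key   TEXT NOT NULL,
        attribute_value TEXT,
        PRIMARY KEY (target, attribute_key)
    );
    INSERT INTO target (id) VALUES (1);
""")
# One target with keys k1..k10 and values v1..v10.
conn.executemany(
    "INSERT INTO attrs VALUES (1, ?, ?)",
    [(f"k{i}", f"v{i}") for i in range(1, ATTRS_PER_TARGET + 1)],
)

# Old shape: each condition adds an unconstrained LEFT JOIN (g1..g3) that
# multiplies the row count by the number of attributes on the target.
JOIN_FORM = """
    SELECT COUNT(*) FROM target t
      LEFT JOIN attrs g1 ON g1.target = t.id
      JOIN attrs a1 ON a1.target = t.id AND a1.attribute_key = 'k1'
      LEFT JOIN attrs g2 ON g2.target = t.id
      JOIN attrs a2 ON a2.target = t.id AND a2.attribute_key = 'k2'
      LEFT JOIN attrs g3 ON g3.target = t.id
      JOIN attrs a3 ON a3.target = t.id AND a3.attribute_key = 'k3'
    WHERE (a1.attribute_value IS NULL OR a1.attribute_value NOT IN ('x'))
      AND (a2.attribute_value IS NULL OR a2.attribute_value NOT IN ('x'))
      AND (a3.attribute_value IS NULL OR a3.attribute_value NOT IN ('x'))
"""

# New shape: one correlated EXISTS per condition, no row multiplication.
EXISTS_FORM = """
    SELECT COUNT(*) FROM target t
    WHERE EXISTS (SELECT 1 FROM attrs a WHERE a.target = t.id
                  AND a.attribute_key = 'k1' AND a.attribute_value NOT IN ('x'))
      AND EXISTS (SELECT 1 FROM attrs a WHERE a.target = t.id
                  AND a.attribute_key = 'k2' AND a.attribute_value NOT IN ('x'))
      AND EXISTS (SELECT 1 FROM attrs a WHERE a.target = t.id
                  AND a.attribute_key = 'k3' AND a.attribute_value NOT IN ('x'))
"""

join_rows = conn.execute(JOIN_FORM).fetchone()[0]
exists_rows = conn.execute(EXISTS_FORM).fetchone()[0]
print(join_rows, exists_rows)  # 1000 1
```

With 10 attributes the join form already yields 10^3 rows for a single matching target, while the EXISTS form yields one row per matching target regardless of attribute count.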

Three AND conditions with positive operators: attribute.k1==v1 and attribute.k2==v2 and attribute.k3==v3

-- BEFORE

  SELECT DISTINCT t.* FROM target t
    LEFT JOIN attrs a1 ON a1.target = t.id
    LEFT JOIN attrs a2 ON a2.target = t.id
    LEFT JOIN attrs a3 ON a3.target = t.id
  WHERE a1.key=? AND a1.value=?
    AND a2.key=? AND a2.value=?
    AND a3.key=? AND a3.value=?

-- Intermediate rows per target: 100^3 = 1,000,000

-- AFTER

  SELECT DISTINCT t.* FROM target t
  WHERE EXISTS (SELECT 1 FROM attrs WHERE target=t.id AND key=? AND value=?)
    AND EXISTS (SELECT 1 FROM attrs WHERE target=t.id AND key=? AND value=?)
    AND EXISTS (SELECT 1 FROM attrs WHERE target=t.id AND key=? AND value=?)

-- 3 PK index lookups
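
Whether each EXISTS resolves to an index lookup depends on the database and on the attribute table's key layout. As a rough local check only, the sketch below asks SQLite for its query plan, assuming a composite primary key on (target, attribute_key); the schema is a simplified stand-in, and the plan text varies between SQLite versions and says nothing definitive about the production database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE target (id INTEGER PRIMARY KEY);
    -- Assumed key layout: one row per (target, attribute_key).
    CREATE TABLE attrs (
        target          INTEGER NOT NULL REFERENCES target(id),
        attribute_key   TEXT NOT NULL,
        attribute_value TEXT,
        PRIMARY KEY (target, attribute_key)
    );
""")

QUERY = """
    SELECT t.id FROM target t
    WHERE EXISTS (
        SELECT 1 FROM attrs a
        WHERE a.target = t.id AND a.attribute_key = ? AND a.attribute_value = ?
    )
"""

# EXPLAIN QUERY PLAN returns rows whose fourth column is a readable step.
plan = conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("k1", "v1")).fetchall()
details = [row[3] for row in plan]
for step in details:
    print(step)
```

A step that starts with SEARCH and names an index or primary key indicates a keyed lookup rather than a full scan of the attribute table; on the real database, the equivalent check would be that engine's own EXPLAIN output.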
