MDEV-39014: FULL JOIN Phase 2 #4940
DaveGosselin-MariaDB
commented
Apr 14, 2026
|
Putting the NULL-complemented record generation into the branch where end_of_records=1 seems wrong.
create table t10 (a int, b int, index(a));
create table t11 (a int, b int, index(a));
insert into t10 select seq, seq from seq_1_to_10;
insert into t11 select seq*2, seq*2 from seq_1_to_10;
create table t20 (a varchar(100), b varchar(100), index(a));
create table t21 (a varchar(100), b varchar(100), index(a));
insert into t20 values('match','match'), ('no-match-t20', 'no-match-t20');
insert into t21 values('match','match'), ('no-match-t21', 'no-match-t21');
Building block one (no error) here:
Building block two (all is fine here, too):
The problem query:
select * from (t10 full outer join t11 on t10.a=t11.a) , (t20 full outer join t21 on t20.a=t21.a);
The row combination with (NULL-NULL-12-12-match-match-match-match) is missing, along with "similar ones" (imprecise wording but hopefully it's clear).
Similar question here. This one is evaluated at the end of execution so here we get the |
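The expected result of the problem query in the testcase above can be worked out independently of the server. The following is a minimal Python sketch, not server code: `full_join` is a hypothetical reference implementation over tuples, with `None` standing in for NULL, and the table contents are copied from the inserts above.

```python
from itertools import product

def full_join(left, right, on):
    """Reference FULL OUTER JOIN over lists of tuples; None stands for NULL."""
    null_left = (None,) * len(left[0])
    null_right = (None,) * len(right[0])
    rows, matched_right = [], set()
    for l in left:
        hits = [i for i, r in enumerate(right) if on(l, r)]
        matched_right.update(hits)
        rows.extend(l + right[i] for i in hits)
        if not hits:
            rows.append(l + null_right)  # left row without a partner
    # Right rows that never matched get NULLs for the left columns.
    rows.extend(null_left + r for i, r in enumerate(right)
                if i not in matched_right)
    return rows

t10 = [(n, n) for n in range(1, 11)]
t11 = [(2 * n, 2 * n) for n in range(1, 11)]
t20 = [("match", "match"), ("no-match-t20", "no-match-t20")]
t21 = [("match", "match"), ("no-match-t21", "no-match-t21")]

same_a = lambda l, r: l[0] == r[0]  # ON x.a = y.a; column a is never NULL here
block_one = full_join(t10, t11, same_a)
block_two = full_join(t20, t21, same_a)
# The comma in the FROM clause is a cross join of the two building blocks.
expected = [x + y for x, y in product(block_one, block_two)]

print(len(block_one), len(block_two), len(expected))
print((None, None, 12, 12, "match", "match", "match", "match") in expected)
```

Under these assumptions the building blocks have 15 and 3 rows, so the cross join should return 45 rows, and the row reported as missing is among them.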
|
What about possible join orders? Take t10 and t11 from the previous testcase.
The join order is
This happens because with empty join prefix, JOIN::get_allowed_nj_tables() calls
But why would 'two' not be allowed as the first table? |
|
(I think I wrote this before somewhere but writing here to not forget). We should consider putting some FULL OUTER JOIN code into its own file(s). Some logic is of course all over the place in |
I agree with this. Let's shake out all the other changes first, then make that code movement be the last step. |
|
Let's try this also: |
|
/gemini review |
Code Review
This pull request implements support for FULL [OUTER] JOIN and NATURAL FULL JOIN, including parser updates, optimizer logic for null-complementation, and extensive test coverage. Feedback identifies a logic error in the NATURAL FULL JOIN column coalescing loop that skips the last element and an incorrect implementation of the peek_ref() iterator method. Additionally, it is recommended to maintain the original bit value for the JOIN_TYPE_OUTER constant to avoid breaking existing logic.
|
Consider this:
create table t1 (
  a int,
  b int,
  index(a),
  index(b)
);
create table t2 like t1;
insert into t1 select seq, seq from seq_1_to_100;
insert into t2 select seq, seq from seq_95_to_195;
(Cross-database script: https://gist.github.com/spetrunia/43f3df610e5cbcd15a2f50e465edfb43)
explain
select * from t1 full outer join t2 on (t1.a=t2.a and t1.b>90 and t2.b<110)
gives
So, we will use range access for table t1. Running the query, I see
That is, the row is not there. |
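One way to see why range access derived from the ON clause is unsafe here: in a FULL JOIN, an ON predicate decides only whether two rows match, never whether a row appears at all. The Python sketch below is an illustration under that assumption. `full_join` is a hypothetical reference implementation, and the "prefiltered" variant models a plan that reads only t1 rows with b > 90. Whether this is exactly what the server did is not confirmed by the thread.

```python
def full_join(left, right, on):
    """Reference FULL OUTER JOIN over lists of tuples; None stands for NULL."""
    null_left = (None,) * len(left[0])
    null_right = (None,) * len(right[0])
    rows, matched_right = [], set()
    for l in left:
        hits = [i for i, r in enumerate(right) if on(l, r)]
        matched_right.update(hits)
        rows.extend(l + right[i] for i in hits)
        if not hits:
            rows.append(l + null_right)
    rows.extend(null_left + r for i, r in enumerate(right)
                if i not in matched_right)
    return rows

t1 = [(n, n) for n in range(1, 101)]    # seq_1_to_100
t2 = [(n, n) for n in range(95, 196)]   # seq_95_to_195

# ON (t1.a = t2.a AND t1.b > 90 AND t2.b < 110)
on = lambda l, r: l[0] == r[0] and l[1] > 90 and r[1] < 110

correct = full_join(t1, t2, on)

# Hypothetical faulty plan: t1.b > 90 applied as a range filter on t1.
prefiltered = full_join([row for row in t1 if row[1] > 90], t2, on)

missing = set(correct) - set(prefiltered)
print(len(correct), len(prefiltered), len(missing))
```

In this model the correct result has 195 rows (94 unmatched t1 rows, 6 matches, 95 unmatched t2 rows), while the prefiltered plan returns 105 and loses the 90 null-complemented rows for t1.b <= 90.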
After the "LEFT JOIN" pass completes, we start a second pass to generate the null-complement rows. Walking the
At the moment I'm writing this, the current tip of this branch allows you to use
For now, I think it's best to emit a
In general, to summarize a bit of the above:
|
Hi @spetrunia , with a bit of hacking on my end I can see how we can lift this up from the |
|
@spetrunia hmm perhaps lifting the null-complement pass out of |
|
A question about NATURAL JOIN processing.
create table t30 (
  a int not null,
  t30val varchar(32)
);
insert into t30 values
  ('1', 't30-1'),
  ('2', 't30-2-nomatch');
create table t31 (
  a int not null,
  t31val varchar(32)
);
insert into t31 values
  ('1', 't31-1'),
  ('3', 't31-3-nomatch');
Correct.
Incorrect.
It seems, |
ebcd308 to
0629a21
Syntax support for FULL JOIN, FULL OUTER JOIN, NATURAL FULL JOIN, and NATURAL FULL OUTER JOIN in the parser. While we accept full join syntax, such joins are not yet supported. Queries specifying any of the above joins will fail with ER_NOT_SUPPORTED_YET.
Allow FULL OUTER JOIN queries to proceed through name resolution. Permits limited EXPLAIN EXTENDED support so tests can prove that the JOIN_TYPE_* table markings are reflected when the query is echoed back by the server. This happens in at least two places: via a Warning message during EXPLAIN EXTENDED and during VIEW .frm file creation. While the query plan output is mostly meaningless at this point, this limited EXPLAIN support improves the SELECT_LEX print function for the new JOIN types. TODO: fix PS protocol before end of FULL OUTER JOIN development
Rewrite FULL OUTER JOIN queries as either LEFT, RIGHT, or INNER JOIN by checking if and how the WHERE clause rejects nulls. For example, the queries in each of the following two pairs are equivalent. In the first pair, the WHERE condition rejects the rows that are null-complemented on the left table, leaving the matches and the left rows paired with NULLs from the right table:
SELECT * FROM t1 FULL JOIN t2 ON t1.v = t2.v WHERE t1.v IS NOT NULL;
SELECT * FROM t1 LEFT JOIN t2 ON t1.v = t2.v WHERE t1.v IS NOT NULL;
In the second pair, the WHERE condition rejects null-complemented rows from both sides, so only matches remain:
SELECT * FROM t1 FULL JOIN t2 ON t1.v = t2.v WHERE t1.a=t2.a;
SELECT * FROM t1 INNER JOIN t2 ON t1.v = t2.v WHERE t1.a=t2.a;
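The rewrite can be checked on a small dataset. The Python sketch below is illustrative only: the tables, their (a, v) columns, and the helpers `left_join` and `full_join` are assumptions made for the example, with `None` standing in for NULL. It compares the filtered FULL JOIN against the filtered LEFT and INNER JOINs, and also shows that the LEFT JOIN form is only equivalent when the WHERE clause is kept.

```python
def left_join(left, right, on):
    """Reference LEFT JOIN over lists of tuples; None stands for NULL."""
    null_right = (None,) * len(right[0])
    rows = []
    for l in left:
        partners = [r for r in right if on(l, r)]
        rows.extend(l + r for r in partners)
        if not partners:
            rows.append(l + null_right)
    return rows

def full_join(left, right, on):
    """LEFT JOIN rows plus right rows that no left row matched."""
    null_left = (None,) * len(left[0])
    rows = left_join(left, right, on)
    rows.extend(null_left + r for r in right
                if not any(on(l, r) for l in left))
    return rows

# Columns are (a, v).
t1 = [(1, 10), (2, None), (3, 30)]
t2 = [(1, 10), (4, 40), (5, None)]

on_v = lambda l, r: l[1] is not None and l[1] == r[1]          # ON t1.v = t2.v
t1_v_not_null = lambda row: row[1] is not None                 # WHERE t1.v IS NOT NULL
a_equal = lambda row: row[0] is not None and row[0] == row[2]  # WHERE t1.a = t2.a

full_rows = full_join(t1, t2, on_v)
left_rows = left_join(t1, t2, on_v)
inner_rows = [l + r for l in t1 for r in t2 if on_v(l, r)]

full_as_left = ([r for r in full_rows if t1_v_not_null(r)]
                == [r for r in left_rows if t1_v_not_null(r)])
full_as_inner = ([r for r in full_rows if a_equal(r)]
                 == [r for r in inner_rows if a_equal(r)])
# Without the WHERE, the LEFT JOIN keeps the t1 row whose v is NULL.
left_needs_where = len(left_rows) > len([r for r in left_rows if t1_v_not_null(r)])

print(full_as_left, full_as_inner, left_needs_where)
```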
FULL JOIN yields result sets with columns from both tables participating in
the join (for the sake of explanation, assume base tables). However,
NATURAL FULL JOIN should show unique columns in the output.
Given the following query:
SELECT * FROM t1 NATURAL FULL JOIN t2;
transform it into:
SELECT COALESCE(t1.f_1, t2.f_1), ..., COALESCE(t1.f_n, t2.f_n) FROM
t1 NATURAL FULL JOIN t2;
This change applies only in the case of NATURAL FULL JOIN. Otherwise,
NATURAL JOINs work as they have in the past, which is using columns
from the left table for the resulting column set.
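The effect of the COALESCE transformation can be sketched in Python. This is a model of the intended semantics, not the server's implementation: `coalesce` and `natural_full_join` are hypothetical helpers, rows are dicts, `None` stands for NULL, and the data mirrors the t30/t31 testcase from the conversation.

```python
def coalesce(*values):
    """Return the first non-NULL value, like SQL COALESCE."""
    return next((v for v in values if v is not None), None)

def natural_full_join(left, right, left_cols, right_cols):
    common = [c for c in left_cols if c in right_cols]
    left_only = [c for c in left_cols if c not in common]
    right_only = [c for c in right_cols if c not in common]
    null_left = dict.fromkeys(left_cols)    # all-NULL left row
    null_right = dict.fromkeys(right_cols)  # all-NULL right row

    def merge(l, r):
        # Common columns appear once, taken from whichever side is not NULL.
        row = {c: coalesce(l[c], r[c]) for c in common}
        row.update({c: l[c] for c in left_only})
        row.update({c: r[c] for c in right_only})
        return row

    def matches(l, r):
        return all(l[c] is not None and l[c] == r[c] for c in common)

    out, matched = [], set()
    for l in left:
        hits = [i for i, r in enumerate(right) if matches(l, r)]
        matched.update(hits)
        out.extend(merge(l, right[i]) for i in hits)
        if not hits:
            out.append(merge(l, null_right))
    out.extend(merge(null_left, r) for i, r in enumerate(right)
               if i not in matched)
    return out

t30 = [{"a": 1, "t30val": "t30-1"}, {"a": 2, "t30val": "t30-2-nomatch"}]
t31 = [{"a": 1, "t31val": "t31-1"}, {"a": 3, "t31val": "t31-3-nomatch"}]

result = natural_full_join(t30, t31, ["a", "t30val"], ["a", "t31val"])
for row in result:
    print(row)
```

If the common column were taken from the left table only, as for other NATURAL JOINs, the row for a = 3 would show NULL in column a, which is why COALESCE is needed in the FULL case.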
Prevent elimination of tables participating in a FULL OUTER JOIN during eliminate_tables as part of phase one FULL OUTER JOIN development. Move the functionality gate for FULL JOIN further into the codebase: convert LEX::has_full_outer_join to a counter so we can see how many FULL JOINs remain, which makes the gate work correctly after simplify_joins and eliminate_tables are called. Fixes an old bug where, when running the server as a debug build and in debug mode, a null pointer dereference in Dep_analysis_context::dbug_print_deps would cause a crash.
Move the temporary gate against FULL OUTER JOIN deeper into the codebase, which causes the FULL OUTER JOIN query plans to have more relevant information (hence the change). In some cases, the join order of nested INNER JOINs within the FULL OUTER JOIN changed. Small cleanups in get_sargable_cond ahead of the feature work in the next commit.
Fetches the ON condition from the FULL OUTER JOIN as the sargable condition. We ignore the WHERE clause here because we don't want accidental conversions from FULL JOIN to INNER JOIN during, for example, range analysis, as that would produce wrong results. GCOV shows that existing FULL OUTER JOIN tests exercise this new codepath.
73265e6 to
57819e9
In phase 1, FULL [OUTER] JOIN was only supported when simplify_joins()
could rewrite it into an equivalent LEFT, RIGHT, or INNER JOIN based
on NULL-rejecting WHERE predicates. Queries that could not be
rewritten raised ER_NOT_SUPPORTED_YET. (Phase 1 was not released.)
This commit removes that restriction by adding proper support for FULL
JOIN: a "LEFT JOIN pass" emits matched rows and left rows
null-complemented on the right, then a second "null-complement" pass
rescans the right table and emits null-complemented rows for the right
rows that were never matched.
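The two passes described above can be sketched as follows. This is a minimal Python model under stated assumptions, not the server's executor: the function names are hypothetical, and a plain set of row positions stands in for whatever structure the server uses to remember which right rows were matched.

```python
def left_join_pass(left, right, on, matched_right):
    """Pass 1: emit matches and left rows null-complemented on the right."""
    null_right = (None,) * len(right[0])
    for l in left:
        found = False
        for i, r in enumerate(right):
            if on(l, r):
                matched_right.add(i)   # remember that this right row matched
                found = True
                yield l + r
        if not found:
            yield l + null_right

def null_complement_pass(left_width, right, matched_right):
    """Pass 2: rescan the right table and emit rows that never matched."""
    null_left = (None,) * left_width
    for i, r in enumerate(right):
        if i not in matched_right:
            yield null_left + r

def full_join(left, right, on):
    matched_right = set()
    # Pass 1 must finish before pass 2 starts, so the set is complete.
    rows = list(left_join_pass(left, right, on, matched_right))
    rows += list(null_complement_pass(len(left[0]), right, matched_right))
    return rows

t1 = [(1, "x"), (2, "y")]
t2 = [(2, "p"), (3, "q")]
result = full_join(t1, t2, lambda l, r: l[0] == r[0])
for row in result:
    print(row)
```

The key design point in this model is that the second pass depends only on the right table and the matched set, so it can run once after the first pass has seen every left row.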
FULL JOIN supports nested joins on the left of the FULL JOIN,
NATURAL FULL JOIN, semi-joins, CTEs / derived tables (kept
materialized when they participate in a FULL JOIN), prepared
statements, stored procedures, and aggregates. Examples:
SELECT * FROM (d1 FULL JOIN d2 ON d1.a = d2.a)
FULL JOIN t3 ON d1.a = t3.a;
SELECT * FROM t1 NATURAL FULL JOIN t2;
SELECT * FROM t1 INNER JOIN t2 FULL JOIN t3 ON t1.a = t3.a;
PREPARE st FROM
'SELECT COUNT(*) FROM t1 FULL JOIN t2 ON t1.a = t2.a';
Limitations:
- Statistics and cost estimates for the null-complement pass have
  not been fully implemented; the optimizer may under- or
  over-estimate FULL JOIN costs in plans involving multiple
  FULL JOINs. A follow-up will optimize the cost calculations.
- Optimizations for constant tables are not fully supported.
- Nested tables on the right side of a FULL JOIN are not yet supported.
If a table that's in a FULL OUTER JOIN is found to be a const table, then don't allow the constant table optimization to take place. Later, when we support FULL OUTER JOIN on the inner side of other join types then we may be able to relax this restriction.
Prevent simplify_joins from rewriting a chained FULL JOIN into a query where a FULL JOIN could end up on the inner side of another outer join. Of course, this means that we will have a null complement pass that the rewritten query would have avoided. Once we support FULL JOINs on the inner side of outer joins, in phase 3, then we can relax this constraint.
The outermost FULL JOIN's right operand can be a nested join rather than a single base table. The parser places the nest on the right when the outermost FULL JOIN's ON is the last one written, because the parser keeps the outermost FULL JOIN pending until its ON arrives, and the inner FULL JOINs reduce first into a nest that becomes the right operand. alloc_full_join_duplicate_filters allocates the fj_dups filter on a JOIN_TAB carrying JOIN_TYPE_FULL | JOIN_TYPE_RIGHT, so with the FULL|RIGHT bits on the nest, which is never a JOIN_TAB, no filter was allocated and the null complement pass never fired. The unmatched rows from the right side were never emitted, producing a result with missing rows. Add swap_full_join_sides, called from rewrite_full_outer_joins when a FULL JOIN survives simplify_joins with a leaf on the left and a nested join on the right. FULL JOIN is symmetric on its operands, so swapping does not change query semantics; after the swap the leaf carries the FULL|RIGHT bits and the rescan target is a single base table.
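The symmetry argument behind the swap can be checked with a small model. The Python sketch below is not swap_full_join_sides itself: `full_join` and `swap_sides` are hypothetical helpers, and `leaf` and `nest` are made-up row sets standing in for the base table and the already-joined nest. It joins them in both orders and compares the results as multisets once the columns are put back in the same order.

```python
from collections import Counter

def full_join(left, right, on):
    """Reference FULL OUTER JOIN over lists of tuples; None stands for NULL."""
    null_left = (None,) * len(left[0])
    null_right = (None,) * len(right[0])
    rows, matched_right = [], set()
    for l in left:
        hits = [i for i, r in enumerate(right) if on(l, r)]
        matched_right.update(hits)
        rows.extend(l + right[i] for i in hits)
        if not hits:
            rows.append(l + null_right)
    rows.extend(null_left + r for i, r in enumerate(right)
                if i not in matched_right)
    return rows

def swap_sides(rows, left_width):
    """Move the first left_width columns to the end of every row."""
    return [row[left_width:] + row[:left_width] for row in rows]

leaf = [(1,), (2,), (3,)]
nest = [(2, "n2"), (4, "n4")]
on = lambda l, n: l[0] == n[0]

leaf_first = full_join(leaf, nest, on)                     # leaf FULL JOIN nest
nest_first = full_join(nest, leaf, lambda n, l: on(l, n))  # operands swapped

same = Counter(leaf_first) == Counter(swap_sides(nest_first, left_width=2))
print(same)
```

Row order differs between the two evaluations, which is why the comparison uses multisets. The column order is restored here by hand, and the model says nothing about how the server keeps the projected column order stable after the swap.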
check_full_join_base_tables runs before simplify_joins and rejects the disallowed FULL JOIN shapes that are visible in the parse tree. simplify_joins can rewrite a FULL JOIN to a LEFT, RIGHT, or INNER JOIN, so sometimes disallowed queries appear only afterward. Add check_full_join_after_simplify, called from optimize_inner once simplify_joins is done, to reject unsupported queries after rewrite by simplify_joins.
Two EXPLAIN queries in table_elim place a nested join on the right side of a FULL JOIN. Phase 2 supports only base tables there, so check_full_join_base_tables rejects them with ER_FULL_JOIN_BASE_TABLES_ONLY.
57819e9 to
1c10432