diff options
| author | alexpaniman <[email protected]> | 2026-03-16 15:49:36 +0300 |
|---|---|---|
| committer | alexpaniman <[email protected]> | 2026-03-17 06:00:22 +0300 |
| commit | 456efb878536940cf1442d584ea367ec4bcbfa0e (patch) | |
| tree | db8864f689e7b5457736be5c17d067bb08418b8b /library/python/pytest | |
| parent | 4fbf7a5a5141851e0d61261763812e16d96aca41 (diff) | |
[CBO] Fix interesting orderings lookup: make it order-sensitive, i.e. shuffle [A, B] != [B, A]
# Description for reviewers
Interesting orderings FSM, considers shuffles "not natural" (before this PR), i.e. the order doesn't matter. Since our hash function is order-depended this leads to correctness bugs.
For example, this SQL query:
```sql
$semi_join = (
SELECT *
FROM customer c
LEFT SEMI JOIN oorder o
ON c.C_W_ID = o.O_W_ID
AND c.C_D_ID = o.O_D_ID
);
SELECT s.C_W_ID, s.C_D_ID, n.NO_O_ID
FROM $semi_join s
INNER JOIN new_order as n
ON s.C_W_ID = n.NO_W_ID
AND s.C_D_ID = n.NO_D_ID;
```
Used to produce this plan:
```
┌> ResultSet
└─┬> InnerJoin (Grace) (c.C_D_ID,c.C_W_ID = n.NO_D_ID,n.NO_W_ID)
├─┬> LeftSemiJoin (Grace) (c.C_D_ID,c.C_W_ID = o.O_W_ID,o.O_D_ID)
│ ├─┬> HashShuffle (KeyColumns: ["C_W_ID","C_D_ID"], HashFunc: "HashV2")
│ │ └──> TableFullScan (Table: customer, ReadColumns: ["C_W_ID (-∞, +∞)","C_D_ID (-∞, +∞)","C_ID (-∞, +∞)"])
│ └─┬> HashShuffle (KeyColumns: ["O_W_ID","O_D_ID"], HashFunc: "HashV2")
│ └──> TableFullScan (Table: oorder, ReadColumns: ["O_W_ID (-∞, +∞)","O_D_ID (-∞, +∞)","O_ID (-∞, +∞)"])
└─┬> HashShuffle (KeyColumns: ["NO_W_ID","NO_D_ID"], HashFunc: "HashV2")
└──> TableFullScan (Table: new_order, ReadColumns: ["NO_W_ID (-∞, +∞)","NO_D_ID (-∞, +∞)","NO_O_ID (-∞, +∞)"])
```
This plan is wrong, because InnerJoin reuses \[C\_W\_ID, C\_D\_ID\] shuffle, when it really needs \[C\_D\_ID, C\_W\_ID\] - this will lead to incorrect result if executed on more than 1 node with overwhelming probability.
After this PR, it produces a correct plan
```
┌> ResultSet
└─┬> InnerJoin (Grace) (c.C_D_ID,c.C_W_ID = n.NO_D_ID,n.NO_W_ID)
├─┬> HashShuffle (KeyColumns: ["c.C_D_ID","c.C_W_ID"], HashFunc:
"HashV2")
│ └─┬> LeftSemiJoin (Grace) (c.C_D_ID,c.C_W_ID = o.O_W_ID,o.O_D_ID)
│ ├─┬> HashShuffle (KeyColumns: ["C_W_ID","C_D_ID"], HashFunc:
"HashV2")
│ │ └──> TableFullScan (Table: customer, ReadColumns: ["C_W_ID (-∞,
+∞)","C_D_ID (-∞, +∞)","C_ID (-∞, +∞)"])
│ └─┬> HashShuffle (KeyColumns: ["O_W_ID","O_D_ID"], HashFunc:
"HashV2")
│ └──> TableFullScan (Table: oorder, ReadColumns: ["O_W_ID (-∞,
+∞)","O_D_ID (-∞, +∞)","O_ID (-∞, +∞)"])
└─┬> HashShuffle (KeyColumns: ["NO_D_ID","NO_W_ID"], HashFunc:
"HashV2")
└──> TableFullScan (Table: new_order, ReadColumns: ["NO_W_ID (-∞,
+∞)","NO_D_ID (-∞, +∞)","NO_O_ID (-∞, +∞)"])
```
This mirrors PR made on github: <https://github.com/ydb-platform/ydb/pull/35609>
# Changelog entry
Type: fix
Component: CBO
Make interesting orderings order-sensitive
commit_hash:2d8e98048cb669bfc7048a227a8b42504664a273
Diffstat (limited to 'library/python/pytest')
0 files changed, 0 insertions, 0 deletions
