summaryrefslogtreecommitdiffstats
path: root/library/python/pytest
diff options
context:
space:
mode:
authoralexpaniman <[email protected]>2026-03-16 15:49:36 +0300
committeralexpaniman <[email protected]>2026-03-17 06:00:22 +0300
commit456efb878536940cf1442d584ea367ec4bcbfa0e (patch)
treedb8864f689e7b5457736be5c17d067bb08418b8b /library/python/pytest
parent4fbf7a5a5141851e0d61261763812e16d96aca41 (diff)
[CBO] Fix interesting orderings lookup: make it order-sensitive, i.e. shuffle [A, B] != [B, A]
# Description for reviewers Interesting orderings FSM, considers shuffles "not natural" (before this PR), i.e. the order doesn't matter. Since our hash function is order-depended this leads to correctness bugs. For example, this SQL query: ```sql $semi_join = ( SELECT * FROM customer c LEFT SEMI JOIN oorder o ON c.C_W_ID = o.O_W_ID AND c.C_D_ID = o.O_D_ID ); SELECT s.C_W_ID, s.C_D_ID, n.NO_O_ID FROM $semi_join s INNER JOIN new_order as n ON s.C_W_ID = n.NO_W_ID AND s.C_D_ID = n.NO_D_ID; ``` Used to produce this plan: ``` ┌> ResultSet └─┬> InnerJoin (Grace) (c.C_D_ID,c.C_W_ID = n.NO_D_ID,n.NO_W_ID) ├─┬> LeftSemiJoin (Grace) (c.C_D_ID,c.C_W_ID = o.O_W_ID,o.O_D_ID) │ ├─┬> HashShuffle (KeyColumns: ["C_W_ID","C_D_ID"], HashFunc: "HashV2") │ │ └──> TableFullScan (Table: customer, ReadColumns: ["C_W_ID (-∞, +∞)","C_D_ID (-∞, +∞)","C_ID (-∞, +∞)"]) │ └─┬> HashShuffle (KeyColumns: ["O_W_ID","O_D_ID"], HashFunc: "HashV2") │ └──> TableFullScan (Table: oorder, ReadColumns: ["O_W_ID (-∞, +∞)","O_D_ID (-∞, +∞)","O_ID (-∞, +∞)"]) └─┬> HashShuffle (KeyColumns: ["NO_W_ID","NO_D_ID"], HashFunc: "HashV2") └──> TableFullScan (Table: new_order, ReadColumns: ["NO_W_ID (-∞, +∞)","NO_D_ID (-∞, +∞)","NO_O_ID (-∞, +∞)"]) ``` This plan is wrong, because InnerJoin reuses \[C\_W\_ID, C\_D\_ID\] shuffle, when it really needs \[C\_D\_ID, C\_W\_ID\] - this will lead to incorrect result if executed on more than 1 node with overwhelming probability. After this PR, it produces a correct plan ``` ┌> ResultSet └─┬> InnerJoin (Grace) (c.C_D_ID,c.C_W_ID = n.NO_D_ID,n.NO_W_ID) ├─┬> HashShuffle (KeyColumns: ["c.C_D_ID","c.C_W_ID"], HashFunc: "HashV2") │ └─┬> LeftSemiJoin (Grace) (c.C_D_ID,c.C_W_ID = o.O_W_ID,o.O_D_ID) │ ├─┬> HashShuffle (KeyColumns: ["C_W_ID","C_D_ID"], HashFunc: "HashV2") │ │ └──> TableFullScan (Table: customer, ReadColumns: ["C_W_ID (-∞, +∞)","C_D_ID (-∞, +∞)","C_ID (-∞, +∞)"]) │ └─┬> HashShuffle (KeyColumns: ["O_W_ID","O_D_ID"], HashFunc: "HashV2") │ └──> TableFullScan (Table: oorder, ReadColumns: ["O_W_ID (-∞, +∞)","O_D_ID (-∞, +∞)","O_ID (-∞, +∞)"]) └─┬> HashShuffle (KeyColumns: ["NO_D_ID","NO_W_ID"], HashFunc: "HashV2") └──> TableFullScan (Table: new_order, ReadColumns: ["NO_W_ID (-∞, +∞)","NO_D_ID (-∞, +∞)","NO_O_ID (-∞, +∞)"]) ``` This mirrors PR made on github: <https://github.com/ydb-platform/ydb/pull/35609> # Changelog entry Type: fix Component: CBO Make interesting orderings order-sensitive commit_hash:2d8e98048cb669bfc7048a227a8b42504664a273
Diffstat (limited to 'library/python/pytest')
0 files changed, 0 insertions, 0 deletions