path: root/yql/essentials/sql
Commit message | Author | Age | Files | Lines
...
* YQL-19747 Publish ParseUdfs, ParseTypes and othersvityaman2025-04-292-2/+25
| | | | | | | | | | | | | | A client might want to have completions of its own private UDFs. Then a client should make a JSON document and parse it to create a custom `TNameSet`. --- - Related to https://github.com/ydb-platform/ydb/issues/9056 - Related to https://github.com/vityaman/ydb/issues/36 --- Pull Request resolved: https://github.com/ytsaurus/ytsaurus/pull/1251 commit_hash:bbee9be4a480262aa788e7b242b7abdc90882ba7
* YQL-19747 Introduce SimpleSchemaGatewayvityaman2025-04-299-36/+133
| Introduce the `SimpleSchemaGateway` to make `SchemaGateway`s easier to implement.
|
| Existing schema providers do not actually support filtering, for example by name and type, so in practice they return the whole list and we have to filter it by hand. The `SimpleSchemaGateway` to `SchemaGateway` adapter does this for us: we only need to implement path splitting and folder listing.
|
| The other important feature of the `SimpleSchemaGateway` is that a caching decorator for it is simple to implement: just store a mapping `Path -> [FolderEntry]`. Caching a `SchemaGateway` with filters is far from trivial.
|
| I also added string constants for the known folder entry types, because they should be documented somewhere.
|
| ---
|
| - Related to `YQL-19747`
| - Related to https://github.com/vityaman/ydb/issues/14
| - Related to https://github.com/vityaman/ydb/issues/34
|
| ---
|
| Pull Request resolved: https://github.com/ytsaurus/ytsaurus/pull/1245
| commit_hash:dda6dcac544ca95d5e8e08f1e7de9de6b5770f25
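The two ideas in this message (filter an unfiltered listing by hand, and cache the mapping `Path -> [FolderEntry]`) can be sketched as follows. This is a minimal illustration, not the actual YQL code: `FolderEntry`, `ListFn`, `CachedLister` and `Filter` are hypothetical names, and the standard library stands in for the project's own containers.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical entry shape; the real declarations may differ.
struct FolderEntry {
    std::string Type; // e.g. "Folder" or "Table"
    std::string Name;
};

// A "simple" gateway only lists a folder and knows nothing about filters.
using ListFn = std::function<std::vector<FolderEntry>(const std::string& path)>;

// Caching decorator: remembers the full, unfiltered listing per path.
class CachedLister {
public:
    explicit CachedLister(ListFn origin)
        : Origin_(std::move(origin))
    {
        assert(Origin_ && "origin must be callable");
    }

    const std::vector<FolderEntry>& List(const std::string& path) {
        auto it = Cache_.find(path);
        if (it == Cache_.end()) {
            it = Cache_.emplace(path, Origin_(path)).first;
        }
        return it->second;
    }

private:
    ListFn Origin_;
    std::map<std::string, std::vector<FolderEntry>> Cache_;
};

// Adapter step: filtering by type and name prefix is done by hand
// on top of the unfiltered listing.
std::vector<FolderEntry> Filter(const std::vector<FolderEntry>& entries,
                                const std::string& type,
                                const std::string& namePrefix) {
    std::vector<FolderEntry> out;
    for (const auto& entry : entries) {
        if (entry.Type == type &&
            entry.Name.compare(0, namePrefix.size(), namePrefix) == 0) {
            out.push_back(entry);
        }
    }
    return out;
}
```

Because the cache key is just the path, any filter combination is served from the same cached listing, which is the property the message contrasts with caching a filtered gateway.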
* YQL-19747 Enable custom NameSet and FrequencyDatavityaman2025-04-2817-140/+196
| Clients might want to use a custom `NameSet` and `FrequencyData` in their environment, for example, to get their private UDFs and to have a more relevant ranking that includes their private UDFs and respects their usage patterns.
|
| To achieve this, I decided to load the pure `NameSet` and `FrequencyData` and provide functions for pruning. I also checked the defaults and decided that it is more common for a client to create a `StaticNameService` from a pure `NameSet` and `FrequencyData`, so that their pruning stays consistent.
|
| I also extracted a separate module, `ranking`. It will be needed when I implement `UnionNameService` to union `StaticNameService` and `SchemaNameService`. `UnionNameService` will load `Limit` names from each child and then crop them to a sorted prefix of length `Limit` using `IRanking`.
|
| ---
|
| - Related to `YQL-19747`
| - Related to https://github.com/ydb-platform/ydb/issues/9056
| - Related to https://github.com/vityaman/ydb/issues/14
|
| ---
|
| Pull Request resolved: https://github.com/ytsaurus/ytsaurus/pull/1246
| Co-authored-by: vvvv <[email protected]>
| commit_hash:cdca301a58a34e56d537a447b4ff779cd70faea6
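The planned `UnionNameService` behaviour (take up to `Limit` names from each child, then crop to a sorted prefix of length `Limit`) might look roughly like this. It is a sketch under assumptions: `Frequency` and `UnionAndCrop` are hypothetical names, a plain usage-count map stands in for `FrequencyData` and `IRanking`, and the alphabetical tie-break is an illustrative choice.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for frequency data: name -> usage count.
using Frequency = std::unordered_map<std::string, std::size_t>;

// Merge candidates from several children, then keep the `limit` best ones.
std::vector<std::string> UnionAndCrop(
    const std::vector<std::vector<std::string>>& children,
    const Frequency& frequency,
    std::size_t limit)
{
    std::vector<std::string> all;
    for (const auto& child : children) {
        // Each child contributes at most `limit` names.
        const auto take = static_cast<std::ptrdiff_t>(std::min(limit, child.size()));
        all.insert(all.end(), child.begin(), child.begin() + take);
    }

    auto weight = [&](const std::string& name) -> std::size_t {
        const auto it = frequency.find(name);
        return it == frequency.end() ? 0 : it->second;
    };
    // More frequent names first; ties are broken alphabetically (an assumption).
    auto better = [&](const std::string& a, const std::string& b) {
        const std::size_t wa = weight(a);
        const std::size_t wb = weight(b);
        return wa != wb ? wa > wb : a < b;
    };

    // Only the first `kept` positions need to be sorted.
    const std::size_t kept = std::min(limit, all.size());
    std::partial_sort(all.begin(),
                      all.begin() + static_cast<std::ptrdiff_t>(kept),
                      all.end(), better);
    all.resize(kept);
    return all;
}
```

`std::partial_sort` only orders the prefix that is kept, which matches the "sorted prefix of length `Limit`" wording.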
* YQL-19747 Use TMaybevityaman2025-04-285-18/+22
| | | | | | | | | | | | | | Just replaced `std::optional` usages with `TMaybe` to prevent this refactoring noise in future PRs. --- - Related to `YQL-19747` - Related to https://github.com/ydb-platform/ydb/issues/9056 --- Pull Request resolved: https://github.com/ytsaurus/ytsaurus/pull/1247 commit_hash:dca8a6849e5ba9cb614d8350996f9423a1dc2373
* YQL-19747: Refactor sql/v1/complete usage of Future and Ptrvityaman2025-04-2531-330/+262
| - [x] Chained futures and added the `CompleteAsync` method (the YDB CLI will then be migrated to it).
| - [x] Removed the deadlined and fallback `NameService`s as unused.
| - [x] Annotated thread-safe methods with `const` and used `AtomicSharedPtr` for them.
| - [x] Moved `name` to `name/service`, keeping backward compatibility with the YDB CLI.
|
| ---
|
| `CompletionEngine` is left thread-unsafe because of the dependency chain `CompletionEngine -> LocalSyntaxAnalysis -> C3Engine`, which is thread-unsafe, but read-only indexed data structures such as `Ranking` and `NameService` are annotated with `const` and distributed via shared pointers.
|
| I removed the deadlined and fallback name services because the first is stupid and the second is ahead of its time; it is better to add it later and keep the interfaces as minimal as possible.
|
| ---
|
| The plan for migrating to async completion:
|
| 1. Introduce `CompleteAsync`
| 2. Migrate clients to `CompleteAsync`
| 3. Make `Complete` return a `Future`
| 4. Migrate clients to `Complete`
| 5. Remove `CompleteAsync`
|
| ---
|
| - Related to https://github.com/ydb-platform/ydb/issues/9056
| - Related to https://github.com/vityaman/ydb/issues/33
| - Related to https://github.com/vityaman/ydb/issues/31
|
| ---
|
| Pull Request resolved: https://github.com/ytsaurus/ytsaurus/pull/1241
| Co-authored-by: vvvv <[email protected]>
| commit_hash:497cc081ab78bebf7354e0acfaa418d936cc8240
* Primitives for case insensitive simple pattern matchzverevgeny2025-04-255-8/+25
| | | | commit_hash:5f4bdb090c2f60459073e3e95ccd39ec58b95232
* YQL-19747 Detect a token at the caret positionvityaman2025-04-2310-115/+302
| When I tried to implement folder and object name completion at `ID_QUOTED`, I ran into a problem: I could not detect that the cursor is at an `ID_QUOTED` token, because `TCompletionInput::Text` was cut at `TCompletionInput::CursorPosition`. Therefore, for the input
|
| ```
| SELECT * FROM `#`
| ```
|
| the prefix was
|
| ```
| SELECT * FROM `#
| ```
|
| and the lexer failed.
|
| Although we want to tokenize the whole current statement, `C3` still needs to receive a prefix as input. I tried to tokenize the whole statement, but then on the input `SELECT Optional<#>` I got nothing, because `<>` is a single token in SQL. The only fix I found is to cut the query down to the prefix that ends at the cursor position.
|
| BTW, the current implementation is not very efficient, as we tokenize the input multiple times. `SplitQueryToStatements` in particular seems heavy. In the future we will parse the whole input anyway, so we will need to design APIs that accept ready token streams, for example for statement splitting, so as not to do the work twice.
|
| ![image](https://github.com/user-attachments/assets/114804d3-f311-4a46-be84-8ed4650bc9dd)
|
| So I introduce the following changes:
|
| - [x] Select the whole current statement, not just the prefix.
| - [x] Find the token at the caret and output no candidates when the caret is at `STRING_VALUE`, `DIGITS` and so on.
| - [x] Change the `C3` wrapper interface to take `TCompletionInput`, to hide the implementation detail that it runs on the cut prefix.
| - [x] `#`-annotated queries in unit tests.
| - [x] Detect `CaretTokenPosition`: whether the caret is enclosed by a token or lies between two tokens.
| - [x] Ensure that `maxErrors` in `ILexer::Tokenize` is positive. Just a tiny bugfix.
|
| ---
|
| - Related to https://github.com/ytsaurus/ytsaurus/pull/1209
| - Related to https://github.com/ydb-platform/ydb/issues/9056
| - Related to https://github.com/vityaman/ydb/issues/14
|
| ---
|
| Pull Request resolved: https://github.com/ytsaurus/ytsaurus/pull/1225
| commit_hash:a434b9888ec8a7356247d63d9f1420e256ae4fca
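The caret check described above (is the caret enclosed by a token or between two, and is that token a literal) can be illustrated with a small sketch. The `Token` layout and the functions `EnclosingToken` and `ShouldComplete` are assumptions made for illustration; `std::optional` is used here only to keep the example self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical token shape: half-open range [Begin, End) in the statement text.
struct Token {
    std::string Name;
    std::size_t Begin;
    std::size_t End;
};

// Returns the index of the token that strictly encloses the caret, or nothing
// when the caret sits on a boundary between tokens or outside all of them.
std::optional<std::size_t> EnclosingToken(const std::vector<Token>& tokens,
                                          std::size_t caret) {
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        assert(tokens[i].Begin <= tokens[i].End);
        if (tokens[i].Begin < caret && caret < tokens[i].End) {
            return i;
        }
    }
    return std::nullopt;
}

// No candidates are offered when the caret is inside a literal.
bool ShouldComplete(const std::vector<Token>& tokens, std::size_t caret) {
    const std::optional<std::size_t> enclosing = EnclosingToken(tokens, caret);
    if (!enclosing) {
        return true;
    }
    const std::string& name = tokens[*enclosing].Name;
    return name != "STRING_VALUE" && name != "DIGITS";
}
```

Note that this check needs tokens for the whole statement, not only for the prefix: with the text cut at the caret, the closing backtick of an `ID_QUOTED` token is never seen.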
* YQL-19747 Normalize names for ranking and filteringvityaman2025-04-2210-22/+183
| I was too lazy to search for the most frequently used name among those equivalent under the relation `(a ~ b) iff (NormalizeName(a) = NormalizeName(b))`, because the names we receive from the JSONs seem to be canonicalized and therefore already in the style preferred by the YQL language designers.
|
| However, because of duplicates in `statements_opensource.json`, we have, for example, both `IGNORETYPEV3` and `IGNORE_TYPE_V3` in the candidates list. I think we should just remove `IGNORETYPEV3` from the JSON.
|
| ---
|
| - Related to https://github.com/ydb-platform/ydb/issues/9056
| - Related to https://github.com/vityaman/ydb/issues/21
|
| ---
|
| Pull Request resolved: https://github.com/ytsaurus/ytsaurus/pull/1229
| commit_hash:fe73374ae27df1fcacb0adccda930ec98ed1d7a6
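The equivalence relation above can be made concrete with a small sketch. The exact normalization is not spelled out in the message; the version below assumes it drops underscores and lowercases ASCII letters, which is enough to make `IGNORETYPEV3` and `IGNORE_TYPE_V3` equivalent. `Equivalent` is a hypothetical helper.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Assumed normalization: remove underscores, lowercase ASCII letters.
std::string NormalizeName(const std::string& name) {
    std::string out;
    out.reserve(name.size());
    for (unsigned char c : name) {
        if (c != '_') {
            out.push_back(static_cast<char>(std::tolower(c)));
        }
    }
    return out;
}

// (a ~ b) iff NormalizeName(a) == NormalizeName(b)
bool Equivalent(const std::string& a, const std::string& b) {
    return NormalizeName(a) == NormalizeName(b);
}
```

Filtering and ranking can then compare normalized keys while the candidate list keeps the original spelling from the JSON.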
* Intermediate changesrobot-piglet2025-04-229-0/+293
| | | | commit_hash:a1ac242ea5619c1e83dd638a82653e1d5c385821
* YQL-19845 CurrentLanguageVersion funcvvvv2025-04-221-0/+1
| | | | commit_hash:2af511a18740c931b471dc1f2ff36a8b4ce573a8
* YQL-19864 docsvvvv2025-04-184-5/+16
| | | | commit_hash:cc82995d04c8bd3b7ca4d6fe69e91edc092e1b32
* YQL-19864 sql flag + test with explicit flag & by versionvvvv2025-04-174-3/+21
| | | | commit_hash:902cfa0c1b574c1addb5df96a4b38c792ae82258
* YQL-19845 support of lang version checking inside facadevvvv2025-04-162-0/+3
| | | | commit_hash:5cfb2a0aa2904106df4ae69b9311bcc5a695928d
* YQL-19747 Remove default completion engine factoryvityaman2025-04-153-25/+0
| | | | | | | | | | | | | This is to decouple from the `sql/v1/lexer` implementation. - Related to https://github.com/ydb-platform/ydb/issues/9056 - Related to https://github.com/vityaman/ydb/issues/18 - Following https://github.com/ydb-platform/ydb/pull/16820 --- Pull Request resolved: https://github.com/ytsaurus/ytsaurus/pull/1196 commit_hash:4995f3a7cb14b0b735a48e211111764f65be8033
* YQL-19616 refactor test lexers from sql2yql, supported facade run toolsvvvv2025-04-144-0/+113
| | | | commit_hash:fb1727dda2b8c7d2ff42d4436c54cb7aa1ce4bc2
* YQL-19747 Rank keywords just by plain usagesvityaman2025-04-1419-91/+198
| - [x] Rank keywords just by plain usages.
| - [x] `LocalSyntaxAnalysis` now returns a mapping `:: Keyword -> [Following Keywords]`.
| - [x] Extracted keyword sequence formatting from `syntax/local` to `syntax/format`.
| - [x] Extracted token display logic from `syntax/local` to `antlr4/vocabulary`, as it is ANTLR dependent.
|
| ---
|
| Example
|
| ```python
| $ ./yql_complete <<< "select "
| [Keyword] CAST(
| [Keyword] NULL
| [Keyword] NOT
| [FunctionName] If(
| [FunctionName] Yson::ConvertToString(
| [FunctionName] Count(
| [FunctionName] Sum(
| [FunctionName] Unwrap(
| [FunctionName] Coalesce(
| [Keyword] DISTINCT
| [Keyword] ALL
| [Keyword] CASE
| [FunctionName] Max(
| [Keyword] FALSE
| [FunctionName] Some(
| ```
|
| ---
|
| - Related to https://github.com/ydb-platform/ydb/issues/9056
| - Related to https://github.com/vityaman/ydb/issues/17
|
| ---
|
| Pull Request resolved: https://github.com/ytsaurus/ytsaurus/pull/1197
| commit_hash:f42cb4aaffe6de7c9137069c4d9c635ee110a805
* Intermediate changesrobot-piglet2025-04-144-3/+50
| | | | commit_hash:b6187f8eba6e8debc23f1928b2e44a396f3511ad
* YQL-19616 Fix lexer/regex STRING_VALUE and TSKIP recognitionvityaman2025-04-113-10/+40
| | | | | | | | | | - Related to https://github.com/ydb-platform/ydb/issues/15129 - Related to https://github.com/vityaman/ydb/issues/11 --- Pull Request resolved: https://github.com/ytsaurus/ytsaurus/pull/1201 commit_hash:53ef677a35649a6dc77d8c4269a8aceefcd15026
* YQL-19790 allow distinct over keysvvvv2025-04-104-1/+9
| | | | commit_hash:5f778a5600a05b527c9ff0b07dcf55e207782165
* YQL-19747 Complete read hints on PROCESS and REDUCEvityaman2025-04-104-13/+33
| | | | | | | | | | | | Forgot to support it. - Related to https://github.com/ydb-platform/ydb/issues/9056 - Related to https://github.com/vityaman/ydb/issues/24 --- Pull Request resolved: https://github.com/ytsaurus/ytsaurus/pull/1195 commit_hash:a2f5937dbca5712f3ecbfccdf66662ce99e70619
* YQL-19747 Complete select and insert hintsvityaman2025-04-0920-28/+220
| | | | | | | | | | - Related to https://github.com/ydb-platform/ydb/issues/9056 - Related to https://github.com/vityaman/ydb/issues/19 --- Pull Request resolved: https://github.com/ytsaurus/ytsaurus/pull/1189 commit_hash:7f1cb1dcf0617aa2c94c3f2188fc9bd481380252
* YQL-19616 Fix regex lexervityaman2025-04-096-9/+27
| Fixed regex lexer issues:
|
| - `TSKIP` token recognition
| - `HEXDIGITS` number recognition
| - `EOF` token content
|
| ---
|
| - Related to https://github.com/ydb-platform/ydb/issues/15129
| - Related to https://github.com/vityaman/ydb/issues/11
|
| ---
|
| Pull Request resolved: https://github.com/ytsaurus/ytsaurus/pull/1190
| commit_hash:497c39efcbbe4e387da523b5e2c8abaa6485d93b
* YQL-19747 Complete after PRAGMA and multi-token namesvityaman2025-04-0815-30/+286
| - [x] Complete after PRAGMA.
| - [x] Complete multi-token names correctly: for example, `yt.` returns only `DisableStrict`, not `yt.DisableStrict`, and `DateTime::` returns `Split`, not `DateTime::Split`.
|
| I tried to implement it by editing `CompletedToken`, but not all completion environments support candidates with varying `contextLen` (`Replxx` does not). So I decided that completions should rewrite only the current token, not sequences. For example, on `DateTime::Spl` only `Spl` is rewritten. This makes sense because multi-token names have a namespace separated by punctuation, so the user types only the namespace and gets the names inside it.
|
| ---
|
| Pull Request resolved: https://github.com/ytsaurus/ytsaurus/pull/1181
| commit_hash:9d8967ac43b9348f6dbb53837d92a9dcc9b51f48
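The "rewrite only the current token" rule can be illustrated as follows. This is a hedged sketch, not the engine's code: `Qualified`, `SplitQualified` and `CompleteCurrentToken` are hypothetical names, only `::` and `.` are treated as separators, and matching is case-sensitive for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Qualified {
    std::string Namespace; // text before the last separator, may be empty
    std::string Prefix;    // the current token being typed
};

// Splits at the last "::" or ".", whichever comes later in the text.
Qualified SplitQualified(const std::string& text) {
    const std::size_t npos = std::string::npos;
    const std::size_t colons = text.rfind("::");
    const std::size_t dot = text.rfind('.');
    if (colons != npos && (dot == npos || colons > dot)) {
        return {text.substr(0, colons), text.substr(colons + 2)};
    }
    if (dot != npos) {
        return {text.substr(0, dot), text.substr(dot + 1)};
    }
    return {"", text};
}

// Candidates contain only the part that replaces the current token.
std::vector<std::string> CompleteCurrentToken(
    const std::string& typed,
    const std::vector<std::string>& knownNames)
{
    const Qualified input = SplitQualified(typed);
    std::vector<std::string> out;
    for (const auto& full : knownNames) {
        const Qualified name = SplitQualified(full);
        if (name.Namespace == input.Namespace &&
            name.Prefix.compare(0, input.Prefix.size(), input.Prefix) == 0) {
            out.push_back(name.Prefix);
        }
    }
    return out;
}
```

Since every candidate replaces exactly one token, all candidates share the same context length, which is what an environment like `Replxx` requires according to the message.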
* Fix keeping aggregate columns in case of distinct aggregation over windowziganshinmr2025-04-081-11/+11
| | | | commit_hash:6aa8a8297542455d107d7debbfaac3f30f48d885
* YQL-19747 Improve yql_complete tool and add input validationvityaman2025-04-082-9/+32
| | | | | | | | | No description --- Pull Request resolved: https://github.com/ytsaurus/ytsaurus/pull/1185 commit_hash:1def5874ff6a9a5b3dcdd0ad285d2e64b16c9306
* YQL-19747 statements (hints etc)vvvv2025-04-072-0/+12
| | | | commit_hash:1288e94c1f35aed35f40ac5e9b59e708b7cfafad
* Intermediate changesrobot-piglet2025-04-078-14/+34
| | | | commit_hash:6768768ea3a3962231d3fabdffb2ce0db44e9347
* YQL-19747 Complete type name as a function argumentvityaman2025-04-074-6/+52
| As I understand it, a type name should not be completed at `SELECT |`, so I added a check that we are in an `invoke_expr` context.
|
| Currently, composite type keywords are suggested at `SELECT |` and are also uppercased. I will fix this separately, once this is merged, as part of
|
| - https://github.com/users/vityaman/projects/5?pane=issue&itemId=105056723&issue=vityaman%7Cydb%7C8
|
| ---
|
| - Related to https://github.com/ydb-platform/ydb/issues/9056
| - Related to https://github.com/users/vityaman/projects/5/views/1?pane=issue&itemId=105056423&issue=vityaman%7Cydb%7C7
|
| ---
|
| Pull Request resolved: https://github.com/ytsaurus/ytsaurus/pull/1182
| commit_hash:e87565867cf9fa82d9ac49a88d59b293d6686fe7
* YQL-19747 pragmasvvvv2025-04-034-0/+73
| | | | commit_hash:7aaa06cd58cc9563a1656a7118c14a461e7f4e2d
* YQL-19747 Complete token sequencesvityaman2025-04-027-82/+120
| | | | | | | | | | | | | Token sequences plan - [x] [Easy] Support `GROUP BY`, `ORDER BY`. - [x] [Easy] Support `Optional<`, `List<`, `Dict<`. - [x] [Easy] Support `Avg(`, `Sum(`. --- Co-authored-by: Victor Smirnov [[email protected]] Pull Request resolved: https://github.com/ytsaurus/ytsaurus/pull/1173 commit_hash:a443dec666c486fef7f891be04d68a786be83049
* YQL-19747 Introduce types and functions rankingvityaman2025-04-0218-147/+526
| | | | | | | | | | | | - [x] Fix bug with incorrect no-case sorting. - [x] Get names from `sql_functions.json` and `types.json`. - [x] Add types and functions ranking according to `rules_corr_basic.json` data via a `PartialSort`. - [x] Add benchmark workspace. --- Pull Request resolved: https://github.com/ytsaurus/ytsaurus/pull/1167 commit_hash:84d93265fb69bf5651f905d6af038056657e9a16
* Intermediate changesrobot-piglet2025-04-022-5/+7
| | | | commit_hash:28e9e4bd6b02e8914d82b2aafe9f341b5492421f
* Fix DISTINCT over window over joinziganshinmr2025-04-011-0/+4
| | | | commit_hash:d7101ec6fbc95dde360e2a18ac52159dd4535764
* Intermediate changesrobot-piglet2025-04-018-86/+327
| | | | commit_hash:e57b3e95787cc8037f200f1b6b6073e35403b27e
* view: support show create in parserdeminds2025-03-317-11/+51
| | | | commit_hash:1cf0e84327c47568687a689e091a6efbc8286bed
* Error for batch operations with RETURNINGditimizhev2025-03-312-4/+28
| | | | commit_hash:23ea6a6a3224e161a1998aceb2162dfe84744831
* YQL-19747 Complete Function Namesvityaman2025-03-2832-291/+517
| | | | | | | | | | | - Function names are suggested now - Changed the module structure - Checking ruleIndex independence on mode (ansi | default) via unit tests --- Pull Request resolved: https://github.com/ytsaurus/ytsaurus/pull/1163 commit_hash:1b1a27d2cff8db663c5c7e8efb57896476823315
* YQL-19747 sql functionsvvvv2025-03-282-394/+437
| | | | commit_hash:9f628fe1894ee7dcdcbdd161855b668ca6e7380f
* YQL-19616 Convert YQL lexer grammar to regexesvityaman2025-03-2820-64/+1263
| | | | | | | | | | | | | | | | | - [x] Parse YQL grammar to extract lexer grammar into `TLexerGrammar`. - [x] Translate `TLexerGrammar` into regexes. - [x] Implement a lexer via regexes `TRegexLexer` to test generated regexes validity. - [x] Test on `Default` syntax mode. - [x] Test on `ANSI` syntax mode. --- - Related to https://github.com/ydb-platform/ydb/issues/15129 - Requirement for https://github.com/ytsaurus/ytsaurus/pull/1112 --- Pull Request resolved: https://github.com/ytsaurus/ytsaurus/pull/1127 commit_hash:03ffffe81cdafe7f93a4d3fd9a3212fe67f1c72d
* YQL-19747 Complete a simple type namevityaman2025-03-2819-38/+544
| | | | | | | | | - Related to https://github.com/ydb-platform/ydb/issues/9056 --- Pull Request resolved: https://github.com/ytsaurus/ytsaurus/pull/1151 commit_hash:6e1e429a2ea805016bf00a1e60b501b7fc8dc8de
* YQL-19747 Split statementsvityaman2025-03-279-32/+127
| When we run the completion engine on a multi-statement query in which preceding statements are syntactically incorrect, `antlr4-c3` does not return candidates. Running the engine only on the current statement is a best-effort attempt to provide candidates.
|
| - Related to https://github.com/ydb-platform/ydb/issues/9056
| - Depends on https://github.com/ytsaurus/ytsaurus/pull/1127 (`ELexerFlavor`)
|
| ---
|
| Pull Request resolved: https://github.com/ytsaurus/ytsaurus/pull/1144
| commit_hash:0ced9443a9712191f5420246531f781ca4bc5f42
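Selecting only the current statement can be sketched like this. It is a deliberately naive illustration: it splits on raw `;` characters and ignores string literals and comments, whereas the real implementation works on lexer tokens. `Statement` and `CurrentStatement` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct Statement {
    std::string Text;  // the statement that contains the caret
    std::size_t Caret; // caret offset relative to the start of Text
};

// Naive assumption: every ';' is a statement separator.
Statement CurrentStatement(const std::string& query, std::size_t caret) {
    assert(caret <= query.size());
    const std::size_t npos = std::string::npos;

    // The statement starts right after the closest ';' before the caret.
    std::size_t begin = 0;
    if (caret > 0) {
        const std::size_t previous = query.rfind(';', caret - 1);
        if (previous != npos) {
            begin = previous + 1;
        }
    }

    // The statement ends at the closest ';' at or after the caret.
    std::size_t end = query.find(';', caret);
    if (end == npos) {
        end = query.size();
    }

    return {query.substr(begin, end - begin), caret - begin};
}
```

With this, a broken earlier statement such as `SELEC 1;` never reaches the completion engine when the caret is in a later statement.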
* YQL-19716: Fix Block Readermrlolthe1st2025-03-211-1/+6
| | | | commit_hash:61891dc030f4c526542b5e7e070d1660880e6c08
* YQL-19746 fixvvvv2025-03-204-6/+17
| | | | commit_hash:49e5e33ed38f7fd623cef80d2765d464a353ff9c
* YQL-19616 Implement ILexer via antlr_astVictor Smirnov2025-03-1917-23/+172
| | | | | | | | | | | | | - [x] Added `antlr_ast/antlr4` module and moved `TLexerTokensCollector4` there from `proto_ast/antlr4`. - [x] Moved stuff around back and forth. Ready for a review. --- Co-authored-by: vityaman [[email protected]] Pull Request resolved: https://github.com/ytsaurus/ytsaurus/pull/1128 commit_hash:e08785c3408ef813505bdc7511560e9536f4ab79
* Added settings of create transfer statement: flush_interval, batch_size, ↵tesseract2025-03-191-1/+12
| | | | | | | consumer_name <https://github.com/ydb-platform/ydb/pull/15770> commit_hash:f9e7a01a29938b4bec4aacdaa4e116101326f7da
* YQL fix complete logicvityaman2025-03-182-23/+4
| ## Why are these changes proposed?
|
| I figured out that all this time I did not understand how `antlr4-c3` really works.
|
| It is configured with
|
| ```ts
| ignoredTokens: Set<number>
| preferredRules: Set<number>
| ```
|
| On completion, it returns
|
| ```ts
| class CandidatesCollection {
|   tokens: Map<number, TokenList>
|   rules: Map<number, ICandidateRule>
| }
|
| class ICandidateRule {
|   startTokenIndex: number
|   ruleList: RuleList
| }
|
| type TokenList = number[]
| type RuleList = number[]
| ```
|
| I thought that `rules` was a mapping from a `TokenId` to a `ParserCallStack`, but that is not true at all: the key of `rules` is in fact a `RuleId`.
|
| The documentation says
|
| > Whenever the c3 engine hits a __lexer token__ ... it will check the call stack for it and, if that contains any of the **preferred rules**, will select that **instead** of the __lexer token__.
|
| > [Preferred] Rules which replace any candidate token they contain.
|
| So when we add the rule `X` to `preferredRules` and `C3` hits a lexer token with a `Parser Call Stack` containing the rule `X`, it will not add the token to `CandidatesCollection::tokens`. Instead, it will add an entry to `CandidatesCollection::rules` with a `Parser Call Stack`.
|
| This is used when we have `ID_PLAIN` in a grammar, but this `ID_PLAIN` has a different meaning depending on the context (`Parser Call Stack`), e.g. it can be a table name, a column name and so on. So we can ask C3 not to include `ID_PLAIN` in `CandidatesCollection::tokens`, but instead emit rules like `table_id`, `column_id` and so on into `CandidatesCollection::rules`.
|
| So we have a `Parser Call Stack` for `rules`, but not for `tokens`.
|
| ## How does it work correctly now, then?
|
| Described in the comments section.
|
| ## How to live on?
|
| - [BAD ] Make a rule for each token to capture keywords at `CandidatesCollection::rules`.
| - [GOOD] Extend `antlr4-c3` to include `Parser Call Stack` for `CandidatesCollection::tokens`.
|
| ---
|
| Pull Request resolved: https://github.com/ytsaurus/ytsaurus/pull/1131
| commit_hash:1a954f55098b9c090ab87e88f8bee61d9ff319ed
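The preferred-rules behaviour described above can be modelled with a toy collector. This is not `antlr4-c3` code and does not use its API; `Reachable`, `Candidates` and `Collect` are hypothetical names, and the model only captures the one property discussed here: a token whose call stack contains a preferred rule is reported under that rule, with the enclosing rules, instead of under `tokens`.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <vector>

using TokenId = int;
using RuleId = int;

// A token reachable at the caret plus the parser rules on the stack.
struct Reachable {
    TokenId Token;
    std::vector<RuleId> CallStack; // outermost rule first
};

struct Candidates {
    std::set<TokenId> Tokens;                    // no call stack is kept here
    std::map<RuleId, std::vector<RuleId>> Rules; // preferred rule -> enclosing rules
};

Candidates Collect(const std::vector<Reachable>& reachable,
                   const std::set<RuleId>& preferredRules,
                   const std::set<TokenId>& ignoredTokens) {
    Candidates out;
    for (const auto& item : reachable) {
        // Look for the outermost preferred rule on the call stack.
        const auto hit = std::find_if(
            item.CallStack.begin(), item.CallStack.end(),
            [&](RuleId rule) { return preferredRules.count(rule) != 0; });
        if (hit != item.CallStack.end()) {
            // The rule replaces the token; keep the path that led to it.
            out.Rules.emplace(*hit, std::vector<RuleId>(item.CallStack.begin(), hit));
        } else if (ignoredTokens.count(item.Token) == 0) {
            out.Tokens.insert(item.Token);
        }
    }
    return out;
}
```

In this model a keyword that is not covered by any preferred rule ends up in `Tokens` without any context, which is the limitation the message proposes to address.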
* YQL-19712 RuntimeLogLevel setting, mrjob loggervvvv2025-03-183-0/+23
| | | | | init commit_hash:6178c9e20a737d499b13f1b38fdfb621f2d8db2f
* YQL-19701 linter extension for unknown clustersvvvv2025-03-132-3/+7
| | | | commit_hash:79c042af0cf2c51389b5a22bd866cd211b6acf64
* view: use parent context for query AST buildingdeminds2025-03-114-69/+70
| Issues:
|
| * <HIDDEN_URL>
| * <https://github.com/ydb-platform/ydb/issues/14709>
|
| Previously it was impossible to create a simple view like the following:
|
| ```sql
| CREATE VIEW demo_view WITH (security_invoker = TRUE) AS
| SELECT "bbb" LIKE Unwrap("aaa")
| ```
|
| The problem was caused by the fact that a separate parsing context was used to build a view select AST for later validation. The list of UDFs is global for the whole parsing process to optimize the calculations. This global list was lost in the separate context and the view validation failed in such cases.
|
| We fix the issue by using the same context for the view query validation in which the `CREATE VIEW` statement is executed. Moreover, we have started capturing all the statements in the view query text that can affect the compilation (see the `NeedUseForAllStatements` function).
| commit_hash:f84d54ff1688fb43af7170a4db35f6729f2c4f10
* YQL-19212 more fixes to translator for case insensitive udf namesvvvv2025-03-101-12/+18
| | | | commit_hash:b73ddecc5bdf4eb52cb62f9ba551c72e2d2d3e73