author | alextarazanov <[email protected]> | 2022-09-21 14:39:43 +0300
---|---|---
committer | alextarazanov <[email protected]> | 2022-09-21 14:39:43 +0300
commit | 37a16126ac62d0af47b33d50c786b9adc009f6f3 (patch) |
tree | 3c29af4070bc696e36066bbd6df7385a0f5e6a60 |
parent | cc0c2266f4c100fd5cf2ee93b34244c2d2552daf (diff) |
[review] [YDB] Check translations
Checked all "stuck" and current in-progress translations: everything matches, and all the files are in place.
The local build log shows no errors.
25 files changed, 385 insertions, 426 deletions
diff --git a/ydb/docs/en/core/best_practices/_includes/batch_upload.md b/ydb/docs/en/core/best_practices/_includes/batch_upload.md index e7ab3fd4c3b..34863e4f07f 100644 --- a/ydb/docs/en/core/best_practices/_includes/batch_upload.md +++ b/ydb/docs/en/core/best_practices/_includes/batch_upload.md @@ -6,32 +6,32 @@ There are anti-patterns and non-optimal settings for uploading data. They don't To accelerate data uploads, consider the following recommendations: * Shard a table when creating it. This lets you effectively use the system bandwidth as soon as you start uploading data. - By default, a new table consists of a single shard. {{ ydb-short-name }} supports automatic table sharding by data volume. This means that a table shard is divided into two shards when it reaches a certain size. - The acceptable size for splitting a table shard is 2 GB. As the number of shards grows, the data upload bandwidth increases, but it remains low for some time at first. - Therefore, when uploading a large amount of data for the first time, we recommend initially creating a table with the desired number of shards. You can calculate the number of shards based on 1 GB of data per shard in a resulting set. + By default, a new table consists of a single shard. {{ ydb-short-name }} supports automatic table sharding by data volume. This means that a table shard is divided into two shards when it reaches a certain size. + The acceptable size for splitting a table shard is 2 GB. As the number of shards grows, the data upload bandwidth increases, but it remains low for some time at first. + Therefore, when uploading a large amount of data for the first time, we recommend initially creating a table with the desired number of shards. You can calculate the number of shards based on 1 GB of data per shard in a resulting set. * Insert multiple rows in each transaction to reduce the overhead of the transactions themselves. - Each transaction in {{ ydb-short-name }} has some overhead. To reduce the total overhead, you should make transactions that insert multiple rows. Good performance indicators terminate a transaction when it reaches 1 MB of data or 100,000 rows. - When uploading data, avoid transactions that insert a single row. + Each transaction in {{ ydb-short-name }} has some overhead. To reduce the total overhead, you should make transactions that insert multiple rows. Good performance indicators terminate a transaction when it reaches 1 MB of data or 100,000 rows. + When uploading data, avoid transactions that insert a single row. * Within each transaction, insert rows from the primary key-sorted set to minimize the number of shards that the transaction affects. - In {{ ydb-short-name }}, transactions that span multiple shards have higher overhead compared to transactions that involve exactly one shard. Moreover, this overhead increases with the growing number of table shards involved in the transaction. - We recommend selecting rows to be inserted in a particular transaction so that they're located in a small number of shards, ideally, in one. + In {{ ydb-short-name }}, transactions that span multiple shards have higher overhead compared to transactions that involve exactly one shard. Moreover, this overhead increases with the growing number of table shards involved in the transaction. + We recommend selecting rows to be inserted in a particular transaction so that they're located in a small number of shards, ideally, in one. 
* If you need to push data to multiple tables, we recommend pushing data to a single table within a single query. * If you need to push data to a table with a synchronous secondary index, we recommend that you first push data to a table and, when done, build a secondary index. * You should avoid writing data sequentially in ascending or descending order of the primary key. - Writing data to a table with a monotonically increasing key causes all new data to be written to the end of the table, since all tables in YDB are sorted by ascending primary key. As YDB splits table data into shards based on key ranges, inserts are always processed by the same server that is responsible for the "last" shard. Concentrating the load on a single server results in slow data uploading and inefficient use of a distributed system. + Writing data to a table with a monotonically increasing key causes all new data to be written to the end of the table, since all tables in YDB are sorted by ascending primary key. As YDB splits table data into shards based on key ranges, inserts are always processed by the same server that is responsible for the "last" shard. Concentrating the load on a single server will result in slow data uploading and inefficient use of a distributed system. * Some use cases require writing the initial data (often large amounts) to a table before enabling OLTP workloads. In this case, transactionality at the level of individual queries is not required and you can use ```BulkUpsert``` calls in the API and SDK. Since no transactionality is used, this approach has much lower overhead as compared to YQL queries. In case of a successful response to the query, the ```BulkUpsert``` method guarantees that all data added within this query is committed. {% note warning %} -The ```BulkUpsert``` method isn't supported on tables with secondary indexes. +The ```BulkUpsert``` method isn't supported for tables with secondary indexes. {% endnote %} -We recommend the following algorithm for efficiently uploading data to {{ ydb-short-name }}: - 1. Create a table with the desired number of shards based on 1 GB of data per shard. - 2. Sort the source data set by the expected primary key. - 3. Partition the resulting data set by the number of shards in the table. Each part will contain a set of consecutive rows. - 4. Upload the resulting parts to the table shards concurrently. - 5. Make a ```COMMIT``` after every 100,000 rows or 1 MB of data. +We recommend the following algorithm for efficiently uploading data to {{ ydb-short-name }}: +1. Create a table with the desired number of shards based on 1 GB of data per shard. +2. Sort the source data set by the expected primary key. +3. Partition the resulting data set by the number of shards in the table. Each part will contain a set of consecutive rows. +4. Upload the resulting parts to the table shards concurrently. +5. Make a ```COMMIT``` after every 100,000 rows or 1 MB of data. diff --git a/ydb/docs/en/core/best_practices/_includes/pk_scalability.md b/ydb/docs/en/core/best_practices/_includes/pk_scalability.md index a1f1786f23d..4d6142b2ca8 100644 --- a/ydb/docs/en/core/best_practices/_includes/pk_scalability.md +++ b/ydb/docs/en/core/best_practices/_includes/pk_scalability.md @@ -1,46 +1,46 @@ -# Choosing a primary key for maximum performance +# Selecting a primary key for maximum performance -Proper design of the table's primary key is important for the performance for both data access and load operations. 
+The way columns are selected for a table's primary key defines YDB's ability to scale load and improve performance. General recommendations for choosing a primary key: -* Avoid situations when the significant part of the workload falls on a single [partition](../../concepts/datamodel/table.md#partitioning) of a table. The more evenly the workload is distributed across the partitions, the higher the performance. -* Reduce the number of table partitions that are affected by a single request. Moreover, if the request affects no more than one partition, it is executed using a special simplified protocol. This significantly increases the speed of execution and conserves the resources. +* Avoid situations where the main load falls on one [partition](../../concepts/datamodel/table.md#partitioning) of a table. The more evenly load is distributed across partitions, the better the performance. +* Reduce the number of partitions that can be affected in a single request. Moreover, if the request affects no more than one partition, it is performed using a special simplified protocol. This significantly increases the speed and saves the resources. All {{ ydb-short-name }} tables are sorted by primary key in ascending order. In a table with a monotonically increasing primary key, this will result in new data being added at the end of a table. As {{ ydb-short-name }} splits table data into partitions based on key ranges, inserts are always processed by the same server that is responsible for the "last" partition. Concentrating the load on a single server results in slow data uploading and inefficient use of a distributed system. As an example, let's take logging of user events to a table with the ```( timestamp, userid, userevent, PRIMARY KEY (timestamp, userid) )``` schema. -The values in the ```timestamp``` column increase monotonically resulting in all new records being added at the end of a table, and the final partition, which is responsible for this range of keys, handles all the table inserts. This makes the data ingestion process unscalable, because the performance will be limited by the single process servicing the trailing partition of the table. +The values in the ```timestamp``` column increase monotonically resulting in all new records being added at the end of a table, and the final partition, which is responsible for this range of keys, handles all the table inserts. This makes scaling insert loads impossible and performance will be limited by the single process servicing this partition and won't increase as new servers are added to a cluster. -{{ ydb-short-name }} supports automatic splitting of table partitions based on thresholds for the data volume or workload. However, in this scenario, once the split occurs, the new trailing partition will start handling all the inserts again, and the situation will recur. +{{ ydb-short-name }} supports further automatic partition splitting upon a threshold size or load being reached. However, in this situation, once it splits off, the new partition will again begin handling all the inserts, and the situation will recur. 
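A minimal illustrative sketch, not from the patch: the table name and column types below are assumed. It shows the event-log layout discussed above, whose leading `timestamp` component concentrates all inserts in the trailing partition; the techniques in the next section rework exactly this key.

```sql
-- Hypothetical event-log table from the discussion above.
-- Because `timestamp` grows monotonically and leads the key,
-- every insert lands in the trailing partition of the table.
CREATE TABLE user_events (
    timestamp Datetime,
    userid    Uint64,
    userevent Utf8,
    PRIMARY KEY (timestamp, userid)
);
```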
-## Techniques to evenly distribute the workload across table partitions {#balance-shard-load} +## Techniques that let you evenly distribute load across table partitions {#balance-shard-load} ### Changing the sequence of key components {#key-order} -Writing data to a table with the ```( timestamp, userid, userevent, PRIMARY KEY (timestamp, userid) )``` schema results in an uneven load on table partitions due to a monotonically increasing primary key. Changing the sequence of key components so that the monotonically increasing part isn't the first component can help distribute the workload more evenly. Redefining the table's primary key as ```PRIMARY KEY (userid, timestamp)``` allows {{ ydb-short-name }} to distribute writes with the same timestamp values across multiple table partitions, if there is a sufficient number of users generating events. +Writing data to a table with the ```( timestamp, userid, userevent, PRIMARY KEY (timestamp, userid) )``` schema results in an uneven load on table partitions due to a monotonically increasing primary key. Changing the sequence of key components so that the monotonically increasing part isn't the first component can help distribute the load more evenly. If you redefine a table's primary key as ```PRIMARY KEY (userid, timestamp)```, the DB writes will distribute more evenly across the partitions, provided there is a sufficient number of users generating events. ### Using a hash of key column values as a primary key {#key-hash} To obtain a more even distribution of inserts across a table's partitions, make the primary key "prefix" (initial part) values more varied. To do this, make the primary key include the value of a hash of the entire primary key or a part of the primary key. -For instance, the table with the schema ```( timestamp, userid, userevent, PRIMARY KEY (userid, timestamp) )``` can be modified to include the additional field computed as a hash: ```userhash = HASH(userid)```. This would change the table schema as follows: +For instance, the schema of a table ```( timestamp, userid, userevent, PRIMARY KEY (userid, timestamp) )``` might be extended to include an additional field computed as a hash: ```userhash = HASH(userid)```. This would change the table schema as follows: ``` ( userhash, userid, timestamp, userevent, PRIMARY KEY (userhash, userid, timestamp) ) ``` -With the proper choice of a hash function, table rows will be distributed evenly throughout the entire key space, which will result in a more even workload. At the same time, the fact that the key includes ```userid, timestamp``` after ```userhash``` keeps the data local and sorted by time for a specific user. +If you select the hash function properly, rows will be distributed fairly evenly throughout the entire key space, which will result in a more even load on the system. At the same time, the fact that the key includes ```userid, timestamp``` after ```userhash``` keeps the data local and sorted by time for a specific user. The ```userhash``` field in the example above must be computed by the application and specified explicitly both for inserting new records into the table and for data access by primary key. ### Reducing the number of partitions affected by a single query {#decrease-shards} -Let's assume that the main scenario for working with table data is to read all events by a specific ```userid```.
Querying the table with schema ```( timestamp, userid, userevent, PRIMARY KEY (timestamp, userid) )``` with the filter over the ```userid``` column requires to access all the partitions of the table. Moreover, each partition will have to be fully scanned, since the rows related to a specific ```userid``` are located in an order that isn't known in advance. Changing the sequence of ```( timestamp, userid, userevent, PRIMARY KEY (userid, timestamp) )``` key components causes all rows related to a specific ```userid``` to follow each other. This row distribution will be useful for reading data by ```userid``` and will reduce load. +Let's assume that the main scenario for working with table data is to read all events by a specific ```userid```. Then, when you use the ```( timestamp, userid, userevent, PRIMARY KEY (timestamp, userid) )``` table schema, each read affects all the partitions of the table. Moreover, each partition is fully scanned, since the rows related to a specific ```userid``` are located in an order that isn't known in advance. Changing the sequence of ```( timestamp, userid, userevent, PRIMARY KEY (userid, timestamp) )``` key components causes all rows related to a specific ```userid``` to follow each other. This row distribution will be useful for reading data by ```userid``` and will reduce load. ## NULL value in a key column {#key-null} -In {{ ydb-short-name }}, all columns, including key ones, may contain a NULL value. Using NULL as values in key columns isn't recommended. According to the SQL standard (ISO/IEC 9075), NULL values cannot be compared with other values. Therefore, the use of concise SQL statements with simple comparison operators may lead, for example, to skipping rows containing NULL during filtering. +In {{ ydb-short-name }}, all columns, including key ones, may contain a NULL value. Using NULL as values in key columns isn't recommended. According to the SQL standard (ISO/IEC 9075), you can't compare NULL with other values. Therefore, the use of concise SQL statements with simple comparison operators may lead, for example, to skipping rows containing NULL during filtering. ## Row size limit {#limit-string} diff --git a/ydb/docs/en/core/best_practices/_includes/secondary_indexes.md b/ydb/docs/en/core/best_practices/_includes/secondary_indexes.md index c911d292ade..dfd2a008137 100644 --- a/ydb/docs/en/core/best_practices/_includes/secondary_indexes.md +++ b/ydb/docs/en/core/best_practices/_includes/secondary_indexes.md @@ -1,14 +1,14 @@ # Secondary indexes -[Indexes]{% if lang == "ru" %}(https://ru.wikipedia.org/wiki/Индекс_(базы_данных)){% endif %}{% if lang == "en" %}(https://en.wikipedia.org/wiki/Database_index){% endif %} are auxiliary database structures that are used to locate data based on specific criteria without searching the entire database. They are also used to retrieve sorted samples without sorting, which would require processing the full dataset. +[Indexes]{% if lang == "ru" %}(https://ru.wikipedia.org/wiki/Индекс_(базы_данных)){% endif %}{% if lang == "en" %}(https://en.wikipedia.org/wiki/Database_index){% endif %} are auxiliary structures within databases that help find data by certain criteria without having to search an entire database, and retrieve sorted samples without actually sorting, which would require processing the entire dataset. -Data in YDB tables is always sorted by the primary key. 
This means that regardless of the total number of table entries, retrieving an entry from the database with specific values in primary key fields will always take the minimum amount of time. Indexing by the primary key makes it possible to retrieve any consecutive range of entries in ascending or descending order of the primary key. The execution time for this operation depends only on the number of retrieved entries rather than the total number of table values. +Data in a YDB table is always sorted by the primary key. That means that retrieving any entry from the table with specified field values comprising the primary key always takes the minimum fixed time, regardless of the total number of table entries. Indexing by the primary key makes it possible to retrieve any consecutive range of entries in ascending or descending order of the primary key. Execution time for this operation depends only on the number of retrieved entries rather than on the total number of table values. -To use a similar feature with any field or combination of fields, additional indexes, called **secondary indexes**, can be created for them. +To use a similar feature with any field or combination of fields, additional indexes called **secondary indexes** can be created for them. In transactional systems, indexes are used to limit or avoid performance degradation and increase of query cost as your data grows. -This article describes basic operations with secondary indexes and gives references to a detailed description of each operation. For more information about various types of secondary indexes and their specific features, see [Secondary indexes](../../concepts/secondary_indexes.md) in the Concepts section. +This article describes the main operations with secondary indexes and gives references to detailed information on each operation. For more information about various types of secondary indexes and their specifics, see [Secondary indexes](../../concepts/secondary_indexes.md) in the Concepts section. ## Creating secondary indexes {#create} @@ -16,16 +16,16 @@ A secondary index is a data schema object that can be set when creating a table The [`table index add` command](../../reference/ydb-cli/commands/secondary_index.md#add) is supported in the YDB CLI. -Since an index contains its own data derived from table data, when creating an index on an existing table with data, an operation is performed to initially build an index. This may take a long time. This operation is executed in the background and you can continue to work with the table while it's in progress. However, you can't use a new index until it's created. +Since an index contains its own data derived from table data, when creating an index on an existing table with data, an operation is performed to initially build an index. This may take a long time. This operation is executed in the background and you can keep working with the table while it's in progress. However, you can't use the new index until it's created. An index can only be used in the order of the fields included in it. If an index contains two fields, such as `a` and `b`, you can effectively use it for queries such as: -* `WHERE a = $var1 AND b = $var2`. -* `WHERE a = $var1`. +* `WHERE a = $var1 AND b = $var2`; +* `WHERE a = $var1`; * `WHERE a > $var1` and other comparison operators. * `WHERE a = $var1 AND b > $var2` and any other comparison operators in which the first field must be checked for equality. This index can't be used in the following queries: -* `WHERE b = $var1`.
+* `WHERE b = $var1`; * `WHERE a > $var1 AND b > $var2`, which is equivalent to `WHERE a > $var1` in terms of applying the index. * `WHERE b > $var1`. @@ -56,7 +56,7 @@ If you use the YDB CLI, select the `--stats` option to enable printing statistic ## Updating data using a secondary index {#update} -The [`UPDATE`](../../yql/reference/syntax/update.md), [`UPSERT`](../../yql/reference/syntax/upsert_into.md), and [`REPLACE`](../../yql/reference/syntax/replace_into.md) YQL statements don't allow indicating the use of a secondary index to perform a search for data, so an attempt to make an `UPDATE ... WHERE indexed_field = $value` will result in a full scan of the table. To avoid this, you can first run `SELECT` by index to get the primary key value and then `UPDATE` by the primary key. You can also use `UPDATE ON`. +The [`UPDATE`](../../yql/reference/syntax/update.md), [`UPSERT`](../../yql/reference/syntax/upsert_into.md), and [`REPLACE`](../../yql/reference/syntax/replace_into.md) YQL statements don't permit indicating the use of a secondary index to perform a search for data, so an attempt to make an `UPDATE ... WHERE indexed_field = $value` will result in a full scan of the table. To avoid this, you can first run `SELECT` by index to get the primary key value and then `UPDATE` by the primary key. You can also use `UPDATE ON`. To update data in the `table1` table, run the query: @@ -84,8 +84,8 @@ WHERE views = 0; ## Performance of data writes to tables with secondary indexes {#write_performance} -You need additional data structures to enable secondary indexes. Support for these structures increases the cost of table data update operations. +You need additional data structures to enable secondary indexes. Support for these structures makes table data update operations more costly. -During synchronous index updates, a transaction is only committed after all the necessary data is written both in a table and synchronous indexes. As a result, it takes longer to execute it and makes it necessary to use [distributed transactions](../../concepts/transactions#distributed-tx) even if adding or updating entries in a single partition. +During synchronous index updates, a transaction is only committed after all the necessary data is written in both a table and synchronous indexes. As a result, it takes longer to execute it and makes it necessary to use [distributed transactions](../../concepts/transactions#distributed-tx) even if adding or updating entries in a single partition. -Indexes that are updated asynchronously let you use single-shard transactions. However, they only guarantee eventual consistency and still create a load on the database. +Indexes that are updated asynchronously let you use single-shard transactions. However, they only guarantee eventual consistency and still put a load on the database. diff --git a/ydb/docs/en/core/best_practices/cdc.md b/ydb/docs/en/core/best_practices/cdc.md index 5a042267faa..fc1ee597065 100644 --- a/ydb/docs/en/core/best_practices/cdc.md +++ b/ydb/docs/en/core/best_practices/cdc.md @@ -4,7 +4,7 @@ With [Change Data Capture](../concepts/cdc.md) (CDC), you can track changes in t ## Enabling and disabling CDC {#add-drop} -CDC is represented as a data schema object: a changefeed that can be added to a table or deleted from it using the [ADD CHANGEFEED and DROP CHANGEFEED](../yql/reference/syntax/alter_table.md#changefeed) directives of the YQL `ALTER TABLE` statement. 
+CDC is represented as a data schema object: a changefeed that can be added to a table or deleted from it using the [ADD CHANGEFEED and DROP CHANGEFEED](../yql/reference/syntax/alter_table.md#changefeed) directives of the YQL `ALTER TABLE` statement. ## Reading data from a topic {#read} @@ -14,7 +14,7 @@ You can read data using an [SDK](../reference/ydb-sdk) or the [{{ ydb-short-name path/to/table/changefeed_name ``` -> For example, if a table named `table` contains a changefeed named `updates_feed` in the `my` directory, its path looks like this: +> For example, if a table named `table` contains a changefeed named `updates_feed` in the `my` directory, its path looks as follows: > > ```text > my/table/updates_feed @@ -50,7 +50,7 @@ As a result, queries may take longer to execute and size limits for stored data In real-world use cases, enabling CDC has virtually no impact on the query execution time (whatever the mode), since almost all data required for making records is stored in the cache, while the records themselves are sent to a topic asynchronously. However, record delivery background activity slightly (by 1% to 10%) increases CPU utilization. -In addition, a changefeed is currently stored to a topic which has limited elasticity. This means that if the table partitioning scheme changes significantly, there arises an imbalance between the table partitions and topic partitions. This imbalance may also increase the time it takes to execute queries or lead to additional overheads for storing a changefeed. +When creating a changefeed for a table, the number of partitions of its storage (topic) is determined based on the current number of table partitions. If the number of source table partitions changes significantly (for example, after uploading a large amount of data or as a result of intensive access), an imbalance occurs between the table partitions and the topic partitions. This imbalance can also result in longer execution time for queries to modify data in the table or in unnecessary storage overheads for the changefeed. You can recreate the changefeed to correct the imbalance. ## Load testing {#workload} diff --git a/ydb/docs/en/core/best_practices/toc_i.yaml b/ydb/docs/en/core/best_practices/toc_i.yaml index f2e213ba550..a3f13c2f008 100644 --- a/ydb/docs/en/core/best_practices/toc_i.yaml +++ b/ydb/docs/en/core/best_practices/toc_i.yaml @@ -1,7 +1,7 @@ items: - name: Overview href: index.md -- name: Choosing a primary key for maximum performance +- name: Selecting a primary key for maximum performance href: pk_scalability.md - name: Schema design href: schema_design.md @@ -13,9 +13,10 @@ items: href: secondary_indexes.md - name: Change Data Capture href: cdc.md + when: feature_changefeed - name: Paginated output href: paging.md -- name: Loading large data volumes +- name: Uploading large data volumes href: batch_upload.md - name: Using timeouts - href: timeouts.md
\ No newline at end of file + href: timeouts.md diff --git a/ydb/docs/en/core/concepts/cdc.md b/ydb/docs/en/core/concepts/cdc.md index ef837e417f3..ac52c0b452f 100644 --- a/ydb/docs/en/core/concepts/cdc.md +++ b/ydb/docs/en/core/concepts/cdc.md @@ -15,9 +15,9 @@ When adding, updating, or deleting a table row, CDC generates a change record by * The number of topic partitions is fixed as of changefeed creation and remains unchanged (unlike tables, topics are not elastic). * Changefeeds support records of the following types of operations: * Updates - * Erases + * Deletes - Adding rows is a special update case, and a record of adding a row in a changefeed will look similar to an update record. + Adding rows is a special case of updates, and a record of adding a row in a changefeed will look similar to an update record. ## Record structure {#record-structure} @@ -37,9 +37,9 @@ A [JSON](https://en.wikipedia.org/wiki/JSON) record has the following structure: * `key`: An array of primary key component values. Always present. * `update`: Update flag. Present if a record matches the update operation. In `UPDATES` mode, it also contains the names and values of updated columns. -* `erase`: Erase flag. Present if a record matches the erase operation. -* `newImage`: Row snapshot that results from its being changed. Present in `NEW_IMAGE` and `NEW_AND_OLD_IMAGES` modes. Contains column names and values. -* `oldImage`: Row snapshot before the change. Present in `OLD_IMAGE` and `NEW_AND_OLD_IMAGES` modes. Contains column names and values. +* `erase`: Erase flag. Present if a record matches the delete operation. +* `newImage`: Row snapshot that results from its change. Present in `NEW_IMAGE` and `NEW_AND_OLD_IMAGES` modes. Contains column names and values. +* `oldImage`: Row snapshot before its change. Present in `OLD_IMAGE` and `NEW_AND_OLD_IMAGES` modes. Contains column names and values. > Sample record of an update in `UPDATES` mode: > @@ -81,7 +81,7 @@ A [JSON](https://en.wikipedia.org/wiki/JSON) record has the following structure: {% note info %} -* The same record may not contain the `update` and `erase` fields simultaneously, since these fields are operation flags (you can't update and erase a table row at the same time). However, each record contains one of these fields (any operation is either an update or an erase). +* The same record may not contain the `update` and `erase` fields simultaneously, since these fields are operation flags (you can't update and erase a table row at the same time). However, each record contains one of these fields (any operation is either an update or erase). * In `UPDATES` mode, the `update` field for update operations is an operation flag (update) and contains the names and values of updated columns. * JSON object fields containing column names and values (`newImage`, `oldImage`, and `update` in `UPDATES` mode), *do not include* the columns that are primary key components. * If a record contains the `erase` field (indicating that the record matches the erase operation), this is always an empty JSON object (`{}`). @@ -90,7 +90,7 @@ A [JSON](https://en.wikipedia.org/wiki/JSON) record has the following structure: ## Creating and deleting a changefeed {#ddl} -You can add a changefeed to an existing table or erase it using the [ADD CHANGEFEED and DROP CHANGEFEED](../yql/reference/syntax/alter_table.md#changefeed) directives of the YQL `ALTER TABLE` statement. When erasing a table, the changefeed added to it is also deleted. 
+You can add a changefeed to an existing table or delete it using the [ADD CHANGEFEED and DROP CHANGEFEED](../yql/reference/syntax/alter_table.md#changefeed) directives of the YQL `ALTER TABLE` statement. When deleting a table, the changefeed added to it is also deleted. ## CDC purpose and use {#best_practices} diff --git a/ydb/docs/en/core/concepts/cluster/_includes/common_scheme_ydb/tablets.md b/ydb/docs/en/core/concepts/cluster/_includes/common_scheme_ydb/tablets.md index a5dfb9e5a55..e50a0df12b4 100644 --- a/ydb/docs/en/core/concepts/cluster/_includes/common_scheme_ydb/tablets.md +++ b/ydb/docs/en/core/concepts/cluster/_includes/common_scheme_ydb/tablets.md @@ -16,13 +16,13 @@ A basic tablet is an LSM tree that holds all of its table data. One level below To learn more about blobs and distributed storages, see [here](../../distributed_storage.md). -For BlobStorage, blobs are an opaque entity. A tablet can store several types of blobs. The most frequently written blob is a (recovery) log blob. A tablet's log is arranged in a list of blobs, each of which contains information about the change being made to the tables. When run, the tablet finds the last blob in the log and then recursively reads all related blobs following the links. The log may also mention snapshot blobs, which are a type of blob that contain data from multiple log blobs after a merge (the merge operation in the LSM tree). +For BlobStorage, blobs are an opaque entity. A tablet can store several types of blobs. The most frequently written blob is a (recovery) log blob. A tablet's log is arranged as a list of blobs, each containing information about the change being made to the tables. When run, the tablet finds the last blob in the log and then recursively reads all related blobs following the links. The log may also contain links to snapshot blobs, which contain data from multiple log blobs after a merge (the merge operation in the LSM tree). The tablet writes blobs of different types to different *channels*. A channel specifies the branch of storage to store blobs in and performs various functions, such as: 1. Selecting a storage type (different channels may be linked to different types of storage devices: SSD, HDD, or NVMe). -1. Load balancing, because each channel has a limit on IOPS, available space and bandwidth. -1. Specifying the data type. When restoring the log, only the blobs from the null channel are read, which lets you distinguish them from other blobs. +2. Load balancing, because each channel has a limit on IOPS, available space and bandwidth. +3. Specifying the data type. When restoring the log, only the blobs from the null channel are read, which lets you distinguish them from other blobs. ### Tablet channel history {#history} @@ -35,4 +35,3 @@ This mechanism works as follows: For each channel, the TTabletStorageInfo structure contains the TTabletChannelInfo substructure with generation ranges and the group number corresponding to each range. The ranges are strictly adjacent to each other, the last range is open. Group numbers may overlap in different ranges and even across different channels: this is legal and quite common. When writing a blob, a tablet selects the most recent range for the corresponding channel since a write is always performed on behalf of a tablet's current generation. When reading a blob, the group number is fetched based on the BlobId.Generation of the blob being read. 
- diff --git a/ydb/docs/en/core/concepts/cluster/_includes/distributed_storage/distributed_storage_interface.md b/ydb/docs/en/core/concepts/cluster/_includes/distributed_storage/distributed_storage_interface.md index 0f0937693af..160247242e8 100644 --- a/ydb/docs/en/core/concepts/cluster/_includes/distributed_storage/distributed_storage_interface.md +++ b/ydb/docs/en/core/concepts/cluster/_includes/distributed_storage/distributed_storage_interface.md @@ -15,7 +15,7 @@ Each blob has a 192-bit ID consisting of the following fields (in the order used Two blobs are considered different if at least one of the first five parameters (TabletId, Channel, Generation, Step, or Cookie) differs in their IDs. So it is impossible to write two blobs that only differ in BlobSize and/or CrcMode. -For debugging purposes, there is string blob ID formatting that has interactions `[TabletId:Generation:Step:Channel:Cookie:BlobSize:PartId]`, for example, `[12345:1:1:0:0:1000:0]`. +For debugging purposes, there is a string blob ID representation in `[TabletId:Generation:Step:Channel:Cookie:BlobSize:PartId]` format, for example, `[12345:1:1:0:0:1000:0]`. When writing a blob, the tablet selects the Channel, Step, and Cookie parameters. TabletId is fixed and must point to the tablet performing the write operation, while Generation must indicate the generation that the tablet performing the operation is running in. diff --git a/ydb/docs/en/core/concepts/datamodel/_includes/table.md b/ydb/docs/en/core/concepts/datamodel/_includes/table.md index 9573c5695af..32b168de30e 100644 --- a/ydb/docs/en/core/concepts/datamodel/_includes/table.md +++ b/ydb/docs/en/core/concepts/datamodel/_includes/table.md @@ -143,7 +143,7 @@ Each column group has a unique name within a table. You can set the composition A column family may contain any number of columns of its table, including none. Each table column can belong to a single column group (that is, column groups can't overlap). Column groups are set up when creating a table, but can be modified later. -Each table has a `default` column group that includes all the columns that don't belong to any other column group. Primary-key columns are always in the default column group and can't be moved to another group. +Each table has a `default` column group that includes all the columns that don't belong to any other column group. Primary-key columns are always in the default column group and can't be moved to another group.
Column groups are assigned attributes that affect data storage: diff --git a/ydb/docs/en/core/concepts/toc_i.yaml b/ydb/docs/en/core/concepts/toc_i.yaml index 021b1428b45..2583de2e5c0 100644 --- a/ydb/docs/en/core/concepts/toc_i.yaml +++ b/ydb/docs/en/core/concepts/toc_i.yaml @@ -1,20 +1,19 @@ items: - { name: Overview, href: index.md } -- { name: Terms and definitions, href: databases.md } -- { name: Connecting to a database, href: connect.md } +- { name: Terms and definitions, href: databases.md } +- { name: Connecting to a database, href: connect.md } - name: Authentication href: auth.md - name: Data model and schema include: { path: datamodel/toc_p.yaml, mode: link } -- { name: Topic, href: topic.md } -- { name: Serverless and Dedicated operation modes, href: serverless_and_dedicated.md } -- { name: Data types, href: datatypes.md, hidden: true } # Deprecated -- { name: Transactions, href: transactions.md } -- { name: Secondary indexes, href: secondary_indexes.md } -- { name: Change Data Capture (CDC), href: cdc.md, when: feature_changefeed } -- { name: Time to Live (TTL), href: ttl.md } -- { name: Scan queries, href: scan_query.md } -- { name: Database limits, href: limits-ydb.md } +- { name: Serverless and Dedicated operation modes, href: serverless_and_dedicated.md } +- { name: Data types, href: datatypes.md, hidden: true } # Deprecated +- { name: Transactions, href: transactions.md } +- { name: Secondary indexes, href: secondary_indexes.md } +- { name: Change Data Capture (CDC), href: cdc.md, when: feature_changefeed } +- { name: Time to Live (TTL), href: ttl.md } +- { name: Scan queries, href: scan_query.md } +- { name: Database limits, href: limits-ydb.md } - name: YDB cluster items: - name: Overview diff --git a/ydb/docs/en/core/concepts/topic.md b/ydb/docs/en/core/concepts/topic.md index 0ae1f257eeb..b5889513c37 100644 --- a/ydb/docs/en/core/concepts/topic.md +++ b/ydb/docs/en/core/concepts/topic.md @@ -49,7 +49,7 @@ The recommended maximum number of <producer ID, message group ID> pairs is up to Let's consider a finance application that calculates the balance on a user's account and permits or prohibits debiting the funds. -For such tasks, you can use a message queue. When you top up your account, debit funds, or make a purchase, a message with the account ID, amount, and transaction type is registered in the queue. The application processes incoming messages and calculates the balance. +For such tasks, you can use a [message queue](https://en.wikipedia.org/wiki/Message_queue). When you top up your account, debit funds, or make a purchase, a message with the account ID, amount, and transaction type is registered in the queue. The application processes incoming messages and calculates the balance. To accurately calculate the balance, the message processing order is crucial. If a user first tops up their account and then makes a purchase, messages with details about these transactions must be processed by the app in the same order. Otherwise there may be an error in the business logic and the app will reject the purchase as a result of insufficient funds. There are guaranteed delivery order mechanisms, but they cannot ensure a message order within a single queue on an arbitrary data amount. @@ -95,7 +95,7 @@ A message group ID is an arbitrary string up to 2048 characters long. This is us ## Message sequence numbers {#seqno} -All messages from the same source have a [`sequence number`](#seqno) used for their deduplication. 
A message sequence number should monotonically increase within a `topic`, `source` pair. If the server receives a message whose sequence number is less than or equal to the maximum number written for the `topic`, `source` pair, the message will be skipped as a duplicate. Some sequence numbers in the sequence may be skipped. Message sequence numbers must be unique within the `topic`, `source` pair. +All messages from the same source have a [`sequence number`](#seqno) used for their deduplication. A message sequence number should monotonically increase within a `topic`, `source` pair. If the server receives a message whose sequence number is less than or equal to the maximum number written for the `topic`, `source` pair, the message will be skipped as a duplicate. Some sequence numbers in the sequence may be skipped. Message sequence numbers must be unique within the `topic`, `source` pair. ### Sample message sequence numbers {#seqno-examples} @@ -104,11 +104,11 @@ All messages from the same source have a [`sequence number`](#seqno) used for th | File | Offset of transferred data from the beginning of a file | You can't delete lines from the beginning of a file, since this will lead to skipping some data as duplicates or losing some data. | | DB table | Auto-increment record ID | -## Message retention period { #retention-time } +## Message retention period {#retention-time} The message retention period is set for each topic. After it expires, messages are automatically deleted. An exception is data that hasn't been read by an [important](#important-consumer) consumer: this data will be stored until it's read. -## Data compression { #message-codec } +## Data compression {#message-codec} When transferring data, the producer app indicates that a message can be compressed using one of the supported codecs. The codec name is passed while writing a message, saved along with it, and returned when reading the message. Compression applies to each individual message, no batch message compression is supported. Data is compressed and decompressed on the producer and consumer apps end. @@ -123,15 +123,15 @@ Supported codecs are explicitly listed in each topic. When making an attempt to {% endif %} `zstd` | [zstd](https://en.wikipedia.org/wiki/Zstd) compression. -## Consumer { #consumer } +## Consumer {#consumer} A consumer is a named entity that reads data from a topic. A consumer contains committed consumer offsets for each topic read on their behalf. -### Consumer offset { #consumer-offset } +### Consumer offset {#consumer-offset} A consumer offset is a saved [offset](#offset) of a consumer by each topic partition. It's saved by a consumer after sending commits of the data read. When a new read session is established, messages are delivered to the consumer starting with the saved consumer offset. This lets users avoid saving the consumer offset on their end. -### Important consumer { #important-consumer } +### Important consumer {#important-consumer} A consumer may be flagged as "important". This flag indicates that messages in a topic won't be removed until the consumer reads and confirms them. You can set this flag for most critical consumers that need to handle all data even if there's a long idle time. 
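A hedged usage sketch, not from the patch: assuming a recent YDB CLI that provides the `topic` command group and the `db1` profile used elsewhere in these docs, the settings described above (partitions, retention period, codecs, and a consumer) might be configured as follows; verify the flag names against your CLI version.

```bash
# Create a topic with 4 partitions, a 24-hour retention period,
# and an explicit list of supported codecs (assumed flag names).
{{ ydb-cli }} --profile db1 topic create my_topic \
  --partitions-count 4 \
  --retention-period-hours 24 \
  --supported-codecs raw,gzip,zstd

# Register a consumer; its committed offsets are then tracked server-side.
{{ ydb-cli }} --profile db1 topic consumer add my_topic --consumer my_consumer
```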
diff --git a/ydb/docs/en/core/getting_started/_includes/yql.md b/ydb/docs/en/core/getting_started/_includes/yql.md index b38f904fc1f..4134e06331a 100644 --- a/ydb/docs/en/core/getting_started/_includes/yql.md +++ b/ydb/docs/en/core/getting_started/_includes/yql.md @@ -24,10 +24,10 @@ In {{ ydb-short-name }}, you can make YQL queries to a database using: ### {{ ydb-short-name }} CLI {#cli} -To enable scripts execution using the {{ ydb-short-name }} CLI, ensure you have completed the following prerequisites: +To execute scripts using the {{ ydb-short-name }} CLI, first do the following: * [Install the CLI](../cli.md#install). -* Define and check [DB connection settings](../cli#scheme-ls) +* Define and check [DB connection parameters](../cli#scheme-ls). * [Create a `db1` profile](../cli.md#profile) configured to connect to your database. Save the text of the scripts below to a file. Name it `script.yql` to be able to run the statements given in the examples by simply copying them through the clipboard. Next, run `{{ ydb-cli }} yql` indicating the use of the `db1` profile and reading the script from the `script.yql` file: @@ -40,15 +40,15 @@ Save the text of the scripts below to a file. Name it `script.yql` to be able to ### Creating tables {#create-table} -A table with the specified columns is created [using the YQL `CREATE TABLE`](../../yql/reference/syntax/create_table.md) statement. Make sure the primary key is defined in the table. Column data types are described in [YQL data types](../../yql/reference/types/index.md). +A table with the specified columns is created [using the YQL `CREATE TABLE` command](../../yql/reference/syntax/create_table.md). Make sure the primary key is defined in the table. Column data types are described in [YQL data types](../../yql/reference/types/index.md). -Currently, {{ ydb-short-name }} doesn't support the `NOT NULL` constraint, all columns allow null values, including the primary key columns. In addition, {{ ydb-short-name }} doesn't support the `FOREIGN KEY` constraint. +All columns are optional by default and can contain `NULL`. You can specify a `NOT NULL` constraint for the columns that are part of the primary key. {{ ydb-short-name }} does not support `FOREIGN KEY` constraints. Create series directory tables named `series`, `seasons`, and `episodes` by running the following script: ```sql CREATE TABLE series ( - series_id Uint64, + series_id Uint64 NOT NULL, title Utf8, series_info Utf8, release_date Date, @@ -76,11 +76,11 @@ CREATE TABLE episodes ( For a description of everything you can do when working with tables, review the relevant sections of the YQL documentation: -* [CREATE TABLE](../../yql/reference/syntax/create_table.md): Create a table and define its initial properties. -* [ALTER TABLE](../../yql/reference/syntax/alter_table.md): Modify a table's column structure and properties. +* [CREATE TABLE](../../yql/reference/syntax/create_table.md): Create a table and define its initial parameters. +* [ALTER TABLE](../../yql/reference/syntax/alter_table.md): Modify a table's column structure and parameters. * [DROP TABLE](../../yql/reference/syntax/drop_table.md): Delete a table. -To execute a script via the {{ ydb-short-name }} CLI, follow the instructions provided under [Executing YQL scripts in the {{ ydb-short-name }} CLI](#cli) above. +To execute the script via the {{ ydb-short-name }} CLI, follow the instructions given under [Executing YQL queries in the {{ ydb-short-name }} CLI](#cli) in this article.
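A minimal usage sketch, not from the patch, for the `ALTER TABLE` statement referenced above; the `genre` column is hypothetical and not part of the tutorial schema.

```sql
-- Add an optional column to the `series` table created above,
-- then remove it again.
ALTER TABLE series ADD COLUMN genre Utf8;
ALTER TABLE series DROP COLUMN genre;
```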
### Getting a list of existing DB tables {#scheme-ls} @@ -88,7 +88,7 @@ Check that the tables are actually created in the database. {% include [yql/ui_scheme_ls.md](yql/ui_scheme_ls.md) %} -To get a list of existing DB tables via the {{ ydb-short-name }} CLI, make sure that the prerequisites under [Executing YQL scripts in the {{ ydb-short-name }} CLI](#cli) above are complete and run the [`scheme ls` statement](../cli.md#ping): +To get a list of existing DB tables via the {{ ydb-short-name }} CLI, make sure that the prerequisites under [Executing YQL scripts in the {{ ydb-short-name }} CLI](#cli) are complete and run the [`scheme ls` command](../cli.md#ping): ```bash {{ ydb-cli }} --profile db1 scheme ls ``` ## Operations with data {#dml} -Commands for running YQL queries and scripts in the YDB CLI and the web interface run in Autocommit mode meaning that a transaction is committed automatically after it is completed. +Commands for running YQL queries and scripts in the YDB CLI and the web interface run in Autocommit mode, meaning that a transaction is committed automatically after it is completed. ### UPSERT: Adding data {#upsert} -The most efficient way to add data to {{ ydb-short-name }} is through the [`UPSERT`](../../yql/reference/syntax/upsert_into.md) statement. It inserts new data by primary keys regardless of whether data by these keys previously existed in the table. As a result, unlike regular `INSERT` and `UPDATE`, it does not require a data pre-fetch on the server to verify that a key is unique. When working with {{ ydb-short-name }}, always consider `UPSERT` as the main way to add data and only use other statements when absolutely necessary. +The most efficient way to add data to {{ ydb-short-name }} is through the [`UPSERT` command](../../yql/reference/syntax/upsert_into.md). It inserts new data by primary keys regardless of whether data by these keys previously existed in the table. As a result, unlike regular `INSERT` and `UPDATE`, it does not require a data pre-fetch from the server to verify that a key is unique before it runs. When working with {{ ydb-short-name }}, always consider `UPSERT` as the main way to add data and only use other statements when absolutely necessary. -All statements that write data to {{ ydb-short-name }} support working with both subqueries and multiple entries passed directly in a query. +All commands that write data to {{ ydb-short-name }} support working with both subqueries and multiple records passed directly in a query. Let's add data to the previously created tables: ```sql UPSERT INTO series (series_id, title, release_date, series_info) @@ -139,21 +139,21 @@ VALUES ; ``` -To execute a script via the {{ ydb-short-name }} CLI, follow the instructions provided under [Executing YQL scripts in the {{ ydb-short-name }} CLI](#cli) above. +To execute the script via the {{ ydb-short-name }} CLI, follow the instructions given under [Executing YQL queries in the {{ ydb-short-name }} CLI](#cli) in this article. To learn more about commands for writing data, see the YQL reference: -* [INSERT](../../yql/reference/syntax/insert_into.md): Add records. -* [REPLACE](../../yql/reference/syntax/replace_into.md): Add/update records. +* [INSERT](../../yql/reference/syntax/insert_into.md): Add records. +* [REPLACE](../../yql/reference/syntax/replace_into.md): Add/update records. * [UPDATE](../../yql/reference/syntax/update.md): Update specified fields. -* [UPSERT](../../yql/reference/syntax/upsert_into.md): Add records/modify specified fields.
+* [UPSERT](../../yql/reference/syntax/upsert_into.md): Add records/update specified fields. ### SELECT : Data retrieval {#select} Make a select of the data added in the previous step: ```sql -SELECT +SELECT series_id, title AS series_title, release_date @@ -168,25 +168,25 @@ SELECT * FROM episodes; If there are several `SELECT` statements in the YQL script, its execution will return several samples, each of which can be accessed separately. Run the above `SELECT` statements as a single script. -To execute a script via the {{ ydb-short-name }} CLI, follow the instructions provided under [Executing YQL scripts in the {{ ydb-short-name }} CLI](#cli) above. +To execute the script via the {{ ydb-short-name }} CLI, follow the instructions given under [Executing YQL queries in the {{ ydb-short-name }} CLI](#cli) in this article. To learn more about the commands for selecting data, see the YQL reference: * [SELECT](../../yql/reference/syntax/select.md): Select data. -* [SELECT ... JOIN](../../yql/reference/syntax/join.md): Join tables in a select. -* [SELECT ... GROUP BY](../../yql/reference/syntax/group_by.md): Group data in a select. +* [SELECT ... JOIN](../../yql/reference/syntax/join.md): Join tables when selecting data. +* [SELECT ... GROUP BY](../../yql/reference/syntax/group_by.md): Group data when selecting it. ### Parameterized queries {#param} -Transactional applications working with a database are characterized by the execution of multiple similar queries that only differ in parameters. Like most databases, {{ ydb-short-name }} will work more efficiently if you define variable parameters and their types and then initiate the execution of a query by passing the parameter values separately from its text. +Transactional applications working with a database are characterized by the execution of multiple similar queries that only differ in parameters. Like most databases, {{ ydb-short-name }} will work more efficiently if you define variable parameters and their types and then initiate the execution of a query by passing the parameter values separately from its text. -To define parameters in the text of a YQL query, use the [DECLARE](../../yql/reference/syntax/declare.md) statement. +To define parameters in the text of a YQL query, use the [DECLARE](../../yql/reference/syntax/declare.md) statement. -Methods for executing parameterized queries in the {{ ydb-short-name }} SDK are described in the [Test case](../../reference/ydb-sdk/example/index.md) section under Parameterized queries for the appropriate programming language. +A description of how to execute parameterized queries in the {{ ydb-short-name }} SDK is available in the [Test example](../../reference/ydb-sdk/example/index.md) section under Parameterized queries for the desired programming language. When debugging a parameterized query in the {{ ydb-short-name }} SDK, you can test it by calling the {{ ydb-short-name }} CLI, copying the full text of the query without any edits, and setting parameter values.
-Save the parameterized query script in a text file named`script.yql`: +Save the parameterized query script in a text file named `script.yql`: ```sql DECLARE $seriesId AS Uint64; @@ -198,13 +198,13 @@ INNER JOIN series AS sr ON sa.series_id = sr.series_id WHERE sa.series_id = $seriesId AND sa.season_id = $seasonId; ``` -To run a parameterized select query, make sure to complete the prerequisites under [Executing YQL scripts in the {{ ydb-short-name }} CLI](#cli) above and run: +To make a parameterized select query, make sure the prerequisites of the [Executing YQL scripts in the {{ ydb-short-name }} CLI](#cli) section of this article are met, then run: ```bash {{ ydb-cli }} --profile db1 yql -f script.yql -p '$seriesId=1' -p '$seasonId=1' ``` -For a full description of the ways to pass parameters, see [the {{ ydb-short-name }} CLI reference](../../reference/ydb-cli/index.md). +For a full description of the ways to pass parameters, see the [{{ ydb-short-name }} CLI reference](../../reference/ydb-cli/index.md). ## YQL tutorial {#tutorial} @@ -213,4 +213,3 @@ You can learn more about YQL use cases by completing tasks from the [YQL tutoria ## Learn more about YDB {#next} Proceed to the [YDB SDK - Getting started](../sdk.md) article to learn more about YDB. - diff --git a/ydb/docs/en/core/maintenance/manual/cluster_expansion.md b/ydb/docs/en/core/maintenance/manual/cluster_expansion.md index 9e7cd2b0b67..171cf3edfd3 100644 --- a/ydb/docs/en/core/maintenance/manual/cluster_expansion.md +++ b/ydb/docs/en/core/maintenance/manual/cluster_expansion.md @@ -1,78 +1,78 @@ # Cluster extension -You can expand the {{ ydb-short-name }} cluster by adding new nodes to the cluster configuration. +You can extend a {{ ydb-short-name }} cluster by adding new nodes to its configuration. -1. Specify the parameters of additional nodes in the file `names.txt ` NameserviceConfig configuration: +1. Specify the parameters of the additional nodes in the `names.txt` configuration file of NameserviceConfig: - ```protobuf - Node { - NodeId: 1 - Port: <ic-port> - Host: "<existing-host>" - InterconnectHost: "<existing-host>" - Location { - DataCenter: "DC1" - Module: "M1" - Rack: "R1" - Unit: "U1" - } - } - Node { - NodeId: 2 - Port: <ic-port> - Host: "<new-host>" - InterconnectHost: "<new-host>" - Location { - DataCenter: "DC1" - Module: "M2" - Rack: "R2" - Unit: "U2" - } - } - ClusterUUID: "<cluster-UUID>" - AcceptUUID: "<cluster-UUID>" - ``` + ```protobuf + Node { + NodeId: 1 + Port: <ic-port> + Host: "<existing-host>" + InterconnectHost: "<existing-host>" + Location { + DataCenter: "DC1" + Module: "M1" + Rack: "R1" + Unit: "U1" + } + } + Node { + NodeId: 2 + Port: <ic-port> + Host: "<new-host>" + InterconnectHost: "<new-host>" + Location { + DataCenter: "DC1" + Module: "M2" + Rack: "R2" + Unit: "U2" + } + } + ClusterUUID: "<cluster-UUID>" + AcceptUUID: "<cluster-UUID>" + ``` -1. [Update the NameserviceConfig](./cms.md) via CMS. +1. [Update the configuration](./cms.md) of NameserviceConfig using a CMS. -1. Add new nodes to DefineBox +1. 
Add the new nodes to DefineBox: - ```protobuf - Command { - DefineHostConfig { - HostConfigId: 1 - Drive { - Path: "<device-path>" - Type: SSD - PDiskConfig { - ExpectedSlotCount: 2 - } - } - } - } - Command { - DefineBox { - BoxId: 1 - Host { - Key { - Fqdn: "<existing-host>" - IcPort: <ic-port> - } - HostConfigId: 1 - } - Host { - Key { - Fqdn: "<new-host>" - IcPort: <ic-port> - } - HostConfigId: 1 - } - } - } - ``` + ```protobuf + Command { + DefineHostConfig { + HostConfigId: 1 + Drive { + Path: "<device-path>" + Type: SSD + PDiskConfig { + ExpectedSlotCount: 2 + } + } + } + } + Command { + DefineBox { + BoxId: 1 + Host { + Key { + Fqdn: "<existing-host>" + IcPort: <ic-port> + } + HostConfigId: 1 + } + Host { + Key { + Fqdn: "<new-host>" + IcPort: <ic-port> + } + HostConfigId: 1 + } + } + } + ``` 1. Run the command: - ```protobuf - kikimr -s <endpoint> admin bs config invoke --proto-file DefineBox.txt - ``` + ```protobuf + kikimr -s <endpoint> admin bs config invoke --proto-file DefineBox.txt + ``` diff --git a/ydb/docs/en/core/reference/ydb-cli/commands/workload/_includes/stock.md b/ydb/docs/en/core/reference/ydb-cli/commands/workload/_includes/stock.md index bc67361789b..ca39991d06e 100644 --- a/ydb/docs/en/core/reference/ydb-cli/commands/workload/_includes/stock.md +++ b/ydb/docs/en/core/reference/ydb-cli/commands/workload/_includes/stock.md @@ -5,17 +5,15 @@ Simulates a warehouse of an online store: creates multi-product orders, gets a l ## Types of load {#workload_types} This load test runs 5 types of load: - -* [user-hist](#getCustomerHistory) reads the specified number of orders for the customer with id = 10000. This creates a workload to read the same rows from different threads. -* [rand-user-hist](#getRandomCustomerHistory) reads the specified number of orders made by a randomly selected customer. A load that reads data from different threads is created. -* [add-rand-order](#insertRandomOrder) creates a random order. For example, a customer has created an order of 2 products, but hasn't yet paid for it, hence the quantities in stock aren't decreased for the products. The database writes the data about the order and products. The read/write load is created (the INSERT checks for an existing entry before inserting the data). -* [put-rand-order](#submitRandomOrder) creates and processes a randomly generated order. For example, a customer has created and paid an order of 2 products. The data about the order and products is written to the database, product availability is checked and quantities in stock are decreased. A mixed data load is created. +* [user-hist](#getCustomerHistory): Reads the specified number of orders made by the customer with id = 10000. This creates a workload to read the same rows from different threads. +* [rand-user-hist](#getRandomCustomerHistory): Reads the specified number of orders made by a randomly selected customer. A load that reads data from different threads is created. +* [add-rand-order](#insertRandomOrder): Generates an order at random. For example, a customer has created an order of 2 products, but hasn't yet paid for it, hence the quantities in stock aren't decreased for the products. The database writes the data about the order and products. The read/write load is created (the INSERT checks for an existing entry before inserting the data). +* [put-rand-order](#submitRandomOrder): Generates an order at random and processes it. For example, a customer has created and paid an order of 2 products. 
The data about the order and products is written to the database, product availability is checked and quantities in stock are decreased. A mixed data load is created.
* [put-same-order](#submitSameOrder): Creates orders with the same set of products. For example, all customers buy the same set of products (a newly released phone and a charger). This creates a workload of competing updates of the same rows in the table.

-## Load test initialization
+## Load test initialization {#init}

To get started, create tables and populate them with data:
-
```bash
{{ ydb-cli }} workload stock init [init options...]
```
@@ -31,22 +29,21 @@ See the description of the command to run the data load:

### Available parameters {#init_options}

| Parameter name | Short name | Parameter description |
-| --- | --- | --- |
-| `--products <value>` | `-p <value>` | Number of products. Valid values: between 1 and 500000. The default value is 100. |
-| `--quantity <value>` | `-q <value>` | Quantity of each product in stock. Default value: 1000. |
-| `--orders <value>` | `-o <value>` | Initial number of orders in the database. The default value is 100. |
-| `--min-partitions <value>` | - | Minimum number of shards for tables. Default value: 40. |
+---|---|---
+| `--products <value>` | `-p <value>` | Number of products. Valid values: between 1 and 500000. Default: 100. |
+| `--quantity <value>` | `-q <value>` | Quantity of each product in stock. Default: 1000. |
+| `--orders <value>` | `-o <value>` | Initial number of orders in the database. Default: 100. |
+| `--min-partitions <value>` | - | Minimum number of shards for tables. Default: 40. |
| `--auto-partition <value>` | - | Enabling/disabling auto-sharding. Possible values: 0 or 1. Default: 1. |

3 tables are created using the following DDL statements:
-
```sql
CREATE TABLE `stock`(product Utf8, quantity Int64, PRIMARY KEY(product)) WITH (AUTO_PARTITIONING_BY_LOAD = ENABLED, AUTO_PARTITIONING_MIN_PARTITIONS_COUNT = <min-partitions>);
CREATE TABLE `orders`(id Uint64, customer Utf8, created Datetime, processed Datetime, PRIMARY KEY(id), INDEX ix_cust GLOBAL ON (customer, created)) WITH (READ_REPLICAS_SETTINGS = "per_az:1", AUTO_PARTITIONING_BY_LOAD = ENABLED, AUTO_PARTITIONING_MIN_PARTITIONS_COUNT = <min-partitions>, UNIFORM_PARTITIONS = <min-partitions>, AUTO_PARTITIONING_MAX_PARTITIONS_COUNT = 1000);
CREATE TABLE `orderLines`(id_order Uint64, product Utf8, quantity Int64, PRIMARY KEY(id_order, product)) WITH (AUTO_PARTITIONING_BY_LOAD = ENABLED, AUTO_PARTITIONING_MIN_PARTITIONS_COUNT = <min-partitions>, UNIFORM_PARTITIONS = <min-partitions>, AUTO_PARTITIONING_MAX_PARTITIONS_COUNT = 1000);
```

-### Load initialization examples {#init-stock-examples}
+### Examples of load initialization {#init-stock-examples}

Creating a database with 1000 products, 10000 items of each product, and no orders:

@@ -55,19 +52,16 @@ Creating a database with 1000 products, 10000 items of each product, and no orde
```

Creating a database with 10 products, 100 items of each product, 10 orders, and a minimum number of shards equal to 100:
-
```bash
{{ ydb-cli }} workload stock init -p 10 -q 100 -o 10 --min-partitions 100
```

-## Running a load test
+## Running a load test {#run}

To run the load, execute the command:
-
```bash
{{ ydb-cli }} workload stock run [workload type...] [global workload options...] [specific workload options...]
```
-
During this test, workload statistics for each time window are displayed on the screen.

* `workload type`: The [types of workload](#workload_types).
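For instance, a minimal run of the lightest workload type could look like this (a sketch: the `db1` profile is an assumption reused from the profile-based examples elsewhere in the CLI reference):

```bash
# read 20 orders of the customer with id = 10000, for 5 seconds, in 10 threads
{{ ydb-cli }} --profile db1 workload stock run user-hist -s 5 -t 10 -l 20
```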
@@ -83,9 +77,9 @@ See the description of the command to run the data load: ### Global parameters for all types of load {#global_workload_options} | Parameter name | Short name | Parameter description | -| --- | --- | --- | -| `--seconds <value>` | `-s <value>` | Duration of the test, in seconds. Default value: 10. | -| `--threads <value>` | `-t <value>` | The number of parallel threads creating the load. Default value: 10. | +---|---|--- +| `--seconds <value>` | `-s <value>` | Duration of the test, in seconds. Default: 10. | +| `--threads <value>` | `-t <value>` | The number of parallel threads creating the load. Default: 10. | | `--quiet` | - | Outputs only the final test result. | | `--print-timestamp` | - | Print the time together with the statistics of each time window. | | `--client-timeout` | - | [Transport timeout in milliseconds](../../../../../best_practices/timeouts.md). | @@ -93,12 +87,12 @@ See the description of the command to run the data load: | `--cancel-after` | - | [Timeout for canceling an operation in milliseconds](../../../../../best_practices/timeouts.md). | | `--window` | - | Statistics collection window in seconds. Default: 1. | -## user-hist load{#getCustomerHistory} + +## The user-hist workload {#getCustomerHistory} This type of load reads the specified number of orders for the customer with id = 10000. YQL query: - ```sql DECLARE $cust AS Utf8; DECLARE $limit AS UInt32; @@ -110,26 +104,23 @@ SELECT id, customer, created FROM orders view ix_cust ``` To run this type of load, execute the command: - ```bash {{ ydb-cli }} workload stock run user-hist [global workload options...] [specific workload options...] ``` * `global workload options`: The [global options for all types of load](#global_workload_options). -* `specific workload options`: [Parameters of a specific type of load](#customer_history_options) +* `specific workload options`: [Options of a specific load type](#customer_history_options). ### Parameters for user-hist {#customer_history_options} - | Parameter name | Short name | Parameter description | -| --- | --- | --- | -| `--limit <value>` | `-l <value>` | The required number of orders. Default value: 10. | +---|---|--- +| `--limit <value>` | `-l <value>` | The required number of orders. Default: 10. | -## rand-user-hist load{#getRandomCustomerHistory} +## The rand-user-hist workload {#getRandomCustomerHistory} This type of load reads the specified number of orders from randomly selected customers. YQL query: - ```sql DECLARE $cust AS Utf8; DECLARE $limit AS UInt32; @@ -141,26 +132,23 @@ SELECT id, customer, created FROM orders view ix_cust ``` To run this type of load, execute the command: - ```bash {{ ydb-cli }} workload stock run rand-user-hist [global workload options...] [specific workload options...] ``` * `global workload options`: The [global options for all types of load](#global_workload_options). -* `specific workload options`: [Parameters of a specific type of load](#random_customer_history_options) +* `specific workload options`: [Options of a specific load type](#random_customer_history_options). ### Parameters for rand-user-hist {#random_customer_history_options} - | Parameter name | Short name | Parameter description | -| --- | --- | --- | +---|---|--- | `--limit <value>` | `-l <value>` | The required number of orders. Default: 10. | -## add-rand-order load{#insertRandomOrder} +## The add-rand-order workload {#insertRandomOrder} This type of load creates a randomly generated order. 
The order includes several different products, 1 item per product. The number of products in the order is generated randomly based on an exponential distribution.

YQL query:
-
```sql
DECLARE $ido AS UInt64;
DECLARE $cust AS Utf8;
@@ -174,26 +162,23 @@ UPSERT INTO `orderLines`(id_order, product, quantity)
```

To run this type of load, execute the command:
-
```bash
{{ ydb-cli }} workload stock run add-rand-order [global workload options...] [specific workload options...]
```

* `global workload options`: The [global options for all types of load](#global_workload_options).
-* `specific workload options`: [Parameters of a specific type of load](#insert_random_order_options)
+* `specific workload options`: [Options of a specific load type](#insert_random_order_options).

### Parameters for add-rand-order {#insert_random_order_options}
-
| Parameter name | Short name | Parameter description |
-| --- | --- | --- |
-| `--products <value>` | `-p <value>` | Number of products in the test. The default value is 100. |
+---|---|---
+| `--products <value>` | `-p <value>` | Number of products in the test. Default: 100. |

-## put-rand-order load {#submitRandomOrder}
+## The put-rand-order workload {#submitRandomOrder}

This type of load creates a randomly generated order and processes it. The order includes several different products, 1 item per product. The number of products in the order is generated randomly based on an exponential distribution. Order processing consists of decreasing the number of ordered products in stock.

YQL query:
-
```sql
DECLARE $ido AS UInt64;
DECLARE $cust AS Utf8;
@@ -231,26 +216,23 @@ SELECT * FROM $newq AS q WHERE q.quantity < 0
```

To run this type of load, execute the command:
-
```bash
{{ ydb-cli }} workload stock run put-rand-order [global workload options...] [specific workload options...]
```

* `global workload options`: The [global options for all types of load](#global_workload_options).
-* `specific workload options`: [Parameters of a specific type of load](#submit_random_order_options)
+* `specific workload options`: [Options of a specific load type](#submit_random_order_options).

### Parameters for put-rand-order {#submit_random_order_options}
-
| Parameter name | Short name | Parameter description |
-| --- | --- | --- |
-| `--products <value>` | `-p <value>` | Number of products in the test. The default value is 100. |
+---|---|---
+| `--products <value>` | `-p <value>` | Number of products in the test. Default: 100. |

-## put-same-order load{#submitSameOrder}
+## The put-same-order workload {#submitSameOrder}

This type of load creates an order with the same set of products and processes it. Order processing consists of decreasing the number of ordered products in stock.

YQL query:
-
```sql
DECLARE $ido AS UInt64;
DECLARE $cust AS Utf8;
@@ -275,7 +257,7 @@ $newq =
LEFT JOIN stock AS s ON s.product = p.product;

-$check = SELECT COUNT(*) as cntd FROM $newq AS q WHERE q.quantity >= 0;
+$check = SELECT COUNT(*) AS cntd FROM $newq AS q WHERE q.quantity >= 0;

UPSERT INTO stock
SELECT product, quantity FROM $newq WHERE $check=$cnt;
@@ -288,30 +270,25 @@ SELECT * FROM $newq AS q WHERE q.quantity < 0
```

To run this type of load, execute the command:
-
```bash
{{ ydb-cli }} workload stock run put-same-order [global workload options...] [specific workload options...]
```

* `global workload options`: The [global options for all types of load](#global_workload_options).
-* `specific workload options`: [Parameters of a specific type of load](#submit_same_order_options)
+* `specific workload options`: [Options of a specific load type](#submit_same_order_options).

### Parameters for put-same-order {#submit_same_order_options}
-
| Parameter name | Short name | Parameter description |
-| --- | --- | --- |
-| `--products <value>` | `-p <value>` | Number of products per order. The default value is 100. |
+---|---|---
+| `--products <value>` | `-p <value>` | Number of products per order. Default: 100. |

## Examples of running the loads

-* Run the load `add-rand-order` for 5 seconds across 10 threads with 1000 products.
-
+* Run the `add-rand-order` workload for 5 seconds across 10 threads with 1000 products.
```bash
{{ ydb-cli }} workload stock run add-rand-order -s 5 -t 10 -p 1000
```
-
Possible result:
-
```text
Elapsed Txs/Sec Retries Errors  p50(ms) p95(ms) p99(ms) pMax(ms)
1       132     0       0       69      108     132     157
@@ -324,27 +301,21 @@ Txs Txs/Sec Retries Errors  p50(ms) p95(ms) p99(ms) pMax(ms)
779     155.8   0       0       62      89      108     157
```

-* Run the `put-same-order` load for 5 seconds across 5 threads with 2 products per order, printing out only final results.
-
+* Run the `put-same-order` workload for 5 seconds across 5 threads with 2 products per order, printing out only final results.
```bash
{{ ydb-cli }} workload stock run put-same-order -s 5 -t 5 -p 2 --quiet
```
-
Possible result:
-
```text
Txs Txs/Sec Retries Errors  p50(ms) p95(ms) p99(ms) pMax(ms)
16      3.2     67      3       855     1407    1799    1799
```

-* Run the `rand-user-hist` load for 5 seconds across 100 threads, printing out time for each time window.
-
+* Run the `rand-user-hist` workload for 5 seconds across 10 threads, printing the timestamp of each time window.
```bash
{{ ydb-cli }} workload stock run rand-user-hist -s 5 -t 10 --print-timestamp
```
-
Possible result:
-
```text
Elapsed Txs/Sec Retries Errors  p50(ms) p95(ms) p99(ms) pMax(ms) Timestamp
1       1046    0       0       7       16      25      50       2022-02-08T17:47:26Z
@@ -358,7 +329,6 @@ Txs Txs/Sec Retries Errors  p50(ms) p95(ms) p99(ms) pMax(ms)
```

## Interpretation of results
-
* `Elapsed`: Time window ID. By default, a time window is 1 second.
* `Txs/sec`: Number of successful load transactions in the time window.
* `Retries`: The number of repeat attempts to execute the transaction by the client in the time window.
@@ -368,4 +338,3 @@
* `p99(ms)`: 99th percentile of request latency, in ms.
* `pMax(ms)`: 100th percentile of request latency, in ms.
* `Timestamp`: Timestamp of the end of the time window.
-
diff --git a/ydb/docs/en/core/reference/ydb-cli/topic.md b/ydb/docs/en/core/reference/ydb-cli/topic.md
index 4e597bf2c63..a71492e6517 100644
--- a/ydb/docs/en/core/reference/ydb-cli/topic.md
+++ b/ydb/docs/en/core/reference/ydb-cli/topic.md
@@ -138,7 +138,7 @@ Delete the [previously created](#topic-create) topic:
{{ ydb-cli }} -p db1 topic drop my-topic
```

-## Adding a consumer for a topic {#consumer-add}
+## Creating a consumer for a topic {#consumer-add}

You can use the `topic consumer add` subcommand to create a consumer for a [previously created](#topic-create) topic.

@@ -232,62 +232,3 @@ Delete the [previously created](#consumer-add) consumer with the `my-consumer` n
  --consumer-name my-consumer \
  my-topic
```
-
-## Reading data from a topic {#topic-read}
-
-Use the `topic read` subcommand to read messages from a topic.
-
-Before reading, [create a topic](#topic-create) and [add a consumer](#consumer-add).
- -General format of the command: - -```bash -{{ ydb-cli }} [global options...] topic read [options...] <topic-path> -``` - -* `global options`: [Global parameters](commands/global-options.md). -* `options`: [Parameters of the subcommand](#options). -* `topic-path`: Topic path. - -View a description of the read command from the topic: - -```bash -{{ ydb-cli }} topic read --help -``` - -### Parameters of the subcommand {#topic-read} - -| Name | Description | ----|--- -| `-c VAL`, `--consumer-name VAL` | Topic consumer name. | -| `--format STRING` | Result format.<br>Possible values:<ul><li>`pretty`: Result is printed to a pseudo-graphic table.</li><li>`newline-delimited`: The `0x0A` control character is printed at the end of each message.</li><li>`concatenated`: Result is printed without separators.</li></ul> | -| `-f VAL`, `--file VAL` | Write readable data to the specified file.<br>If the parameter is not specified, messages are printed to `stdout`. | -| `--idle-timeout VAL` | Maximum waiting time for a new message.<br>If no messages are received during the waiting time, reading stops.<br>The default value is `1s` (1 second). | -| `--commit VAL` | Sending confirmation for message processing.<br>The default value is `true`.<br>Possible values: `true`, `false`. | -| `--limit VAL` | The number of messages to be read.<br>The default value is `0` (no limits). | -| `-w`, `--wait` | Endless wait for the first message.<br>If the parameter is not specified, the first message is waited for for no more than `--idle-timeout`. | -| `--timestamp VAL` | Time in [UNIX timestamp](https://en.wikipedia.org/wiki/Unix_time) format. Consumption starts as soon as the first [message](../../concepts/topic.md#message) is received after the specified time. | -| `--with-metadata-fields VAL` | A list of [message attributes](../../concepts/topic.md#message) whose values are to be printed.<br>Possible values:<ul><li>`write_time`: The time a message is written to the server in [UNIX timestamp](https://en.wikipedia.org/wiki/Unix_time) format.</li><li>`meta`: Message metadata.</li><li>`create_time`: The time a message is created by the source in [UNIX timestamp](https://en.wikipedia.org/wiki/Unix_time) format.</li><li>`seq_no`: Message [sequence number](../../concepts/topic.md#seqno).</li><li>`offset`: [Message sequence number within a partition](../../concepts/topic.md#offset).</li><li>`message_group_id`: [Message group ID](../../concepts/topic.md#producer-id).</li><li>`body`: Message body.</li></ul> | -| `--transform VAL` | Specify the format of the message body to be converted.<br>The default value is `none`.<br>Possible values:<ul><li>`base64`: Convert to [Base64](https://en.wikipedia.org/wiki/Base64).</li><li>`none`: Do not convert.</li></ul> | - -### Examples {#topic-read} - -Read all messages from the `my-topic` topic through the `my-consumer` consumer and print each of them on a separate line: - -```bash -{{ ydb-cli }} topic read \ - --consumer-name my-consumer \ - --format newline-delimited \ - my-topic -``` - -The following command will read the first 10 messages from the `my-topic` topic through the `my-consumer` consumer and print each of them on a separate line. 
Before that, the message body will be converted to Base64:
-
-```bash
-{{ ydb-cli }} topic read \
-  --consumer-name my-consumer \
-  --format newline-delimited
-  --limit 10 \
-  --transform base64 \
-  my-topic
-```
diff --git a/ydb/docs/en/core/reference/ydb-sdk/example/_includes/example-cpp.md b/ydb/docs/en/core/reference/ydb-sdk/example/_includes/example-cpp.md
index a9f0329c9c7..0e1148eafff 100644
--- a/ydb/docs/en/core/reference/ydb-sdk/example/_includes/example-cpp.md
+++ b/ydb/docs/en/core/reference/ydb-sdk/example/_includes/example-cpp.md
@@ -1,6 +1,6 @@
# App in C++

-This page contains a detailed description of the code of a [test app](https://github.com/ydb-platform/ydb/tree/main/ydb/public/sdk/cpp/examples/basic_example) that is available as part of the {{ ydb-short-name }} [C++ SDK](https://github.com/ydb-platform/ydb/tree/main/ydb/public/sdk/cpp).
+This page contains a detailed description of the code of a [test app](https://github.com/ydb-platform/ydb/tree/main/ydb/public/sdk/cpp/examples/basic_example) that is available as part of the {{ ydb-short-name }} [C++ SDK](https://github.com/ydb-platform/ydb/tree/main/ydb/public/sdk/cpp).

{% include [init.md](steps/01_init.md) %}

@@ -29,7 +29,7 @@ To create tables, use the `CreateTable` method:
//! Creates sample tables with CreateTable API.
ThrowOnError(client.RetryOperationSync([path](TSession session) {
    auto seriesDesc = TTableBuilder()
-        .AddNullableColumn("series_id", EPrimitiveType::Uint64)
+        .AddNonNullableColumn("series_id", EPrimitiveType::Uint64)
        .AddNullableColumn("title", EPrimitiveType::Utf8)
        .AddNullableColumn("series_info", EPrimitiveType::Utf8)
        .AddNullableColumn("release_date", EPrimitiveType::Uint64)
@@ -40,7 +40,7 @@ To create tables, use the `CreateTable` method:
}));
```

-Use the `describeTable` method to output information about the table structure and make sure that it was properly created.
+Use the `describeTable` method to view details about the table structure and make sure that it was properly created.

```c++
TMaybe<TTableDescription> desc;
@@ -61,11 +61,11 @@ Use the `describeTable` method to output information about the table structure a
}
```

-The given code snippet outputs the following text to the console at startup:
+The given code snippet prints the following text to the console at startup:

```bash
> Describe table: series
-Column, name: series_id, type: Uint64?
+Column, name: series_id, type: Uint64
Column, name: title, type: Utf8?
Column, name: series_info, type: Utf8?
Column, name: release_date, type: Uint64?
@@ -73,7 +73,7 @@ Column, name: release_date, type: Uint64?

{% include [steps/03_write_queries.md](steps/03_write_queries.md) %}

-Code snippet for inserting and updating data:
+Code snippet for data insert/update:

```c++
//! Shows basic usage of mutating operations.
@@ -132,27 +132,30 @@ static TStatus SelectSimpleTransaction(TSession session, const TString& path,

{% include [steps/05_results_processing.md](steps/05_results_processing.md) %}

-The `TResultSetParser` class is used for processing query results.
+The `TResultSetParser` class is used for processing query execution results.
+ The code snippet below shows how to process query results using the `parser` object: ```c++ TResultSetParser parser(*resultSet); if (parser.TryNextRow()) { Cout << "> SelectSimple:" << Endl << "Series" - << ", Id: " << parser.ColumnParser("series_id").GetOptionalUint64() + << ", Id: " << parser.ColumnParser("series_id").GetUint64() << ", Title: " << parser.ColumnParser("title").GetOptionalUtf8() << ", Release date: " << parser.ColumnParser("release_date").GetOptionalString() << Endl; } ``` -The given code snippet outputs the following text to the console at startup: +The given code snippet prints the following text to the console at startup: ```bash > SelectSimple: series, Id: 1, title: IT Crowd, Release date: 2006-02-03 ``` + + {% include [param_queries.md](steps/06_param_queries.md) %} The code snippet shows the use of parameterized queries and the `GetParamsBuilder` to generate parameters and pass them to the `ExecuteDataQuery` method. @@ -198,7 +201,7 @@ static TStatus SelectWithParamsTransaction(TSession session, const TString& path } ``` -The given code snippet outputs the following text to the console at startup: +The given code snippet prints the following text to the console at startup: ```bash > SelectWithParams: @@ -262,7 +265,7 @@ static TStatus PreparedSelectTransaction(TSession session, const TString& path, } ``` -The given code snippet outputs the following text to the console at startup: +The given code snippet prints the following text to the console at startup: ```bash > PreparedSelect: diff --git a/ydb/docs/en/core/reference/ydb-sdk/example/_includes/example-dotnet.md b/ydb/docs/en/core/reference/ydb-sdk/example/_includes/example-dotnet.md index 66eddc8c040..04e5119c6ff 100644 --- a/ydb/docs/en/core/reference/ydb-sdk/example/_includes/example-dotnet.md +++ b/ydb/docs/en/core/reference/ydb-sdk/example/_includes/example-dotnet.md @@ -1,6 +1,6 @@ # App in C# (.NET) -This page contains a detailed description of the code of a [test app](https://github.com/ydb-platform/ydb-dotnet-examples) that uses the [C# (.NET) SDK](https://github.com/ydb-platform/ydb-dotnet-sdk) {{ ydb-short-name }}. +This page contains a detailed description of the code of a [test app](https://github.com/ydb-platform/ydb-dotnet-examples) that uses the {{ ydb-short-name }} [C# (.NET) SDK](https://github.com/ydb-platform/ydb-dotnet-sdk). {% include [addition.md](auxilary/addition.md) %} @@ -43,7 +43,7 @@ var response = await tableClient.SessionExec(async session => { return await session.ExecuteSchemeQuery(@" CREATE TABLE series ( - series_id Uint64, + series_id Uint64 NOT NULL, title Utf8, series_info Utf8, release_date Date, @@ -75,7 +75,7 @@ response.Status.EnsureSuccess(); {% include [steps/03_write_queries.md](steps/03_write_queries.md) %} -Code snippet for inserting and updating data: +Code snippet for data insert/update: ```c# var response = await tableClient.SessionExec(async session => @@ -147,12 +147,14 @@ The result of query execution (resultset) consists of an organized set of rows. 
foreach (var row in resultSet.Rows)
{
    Console.WriteLine($"> Series, " +
-        $"series_id: {(ulong?)row["series_id"]}, " +
+        $"series_id: {(ulong)row["series_id"]}, " +
        $"title: {(string?)row["title"]}, " +
        $"release_date: {(DateTime?)row["release_date"]}");
}
```
+
+
{% include [scan_query.md](steps/08_scan_query.md) %}

```c#
public void executeScanQuery()
{
@@ -175,7 +177,7 @@ public void executeScanQuery()
    foreach (var row in resultSet.Rows)
    {
        Console.WriteLine($"> ScanQuery, " +
-            $"series_id: {(ulong?)row["series_id"]}, " +
+            $"series_id: {(ulong)row["series_id"]}, " +
            $"season_id: {(ulong?)row["season_id"]}, " +
            $"episodes_count: {(ulong)row["episodes_count"]}");
    }
@@ -183,4 +185,3 @@ public void executeScanQuery()
    }
}
```
-
diff --git a/ydb/docs/en/core/reference/ydb-sdk/example/_includes/example-nodejs.md b/ydb/docs/en/core/reference/ydb-sdk/example/_includes/example-nodejs.md
index fb380281b76..0a04e2a9982 100644
--- a/ydb/docs/en/core/reference/ydb-sdk/example/_includes/example-nodejs.md
+++ b/ydb/docs/en/core/reference/ydb-sdk/example/_includes/example-nodejs.md
@@ -37,7 +37,7 @@ async function createTables(session: Session, logger: Logger) {
    new TableDescription()
        .withColumn(new Column(
            'series_id',
-            Types.optional(Types.UINT64),
+            Types.UINT64, // not null column
        ))
        .withColumn(new Column(
            'title',
@@ -108,7 +108,7 @@ async function createTables(session: Session, logger: Logger) {
}
```

-You can use the `Session.DescribeTable()` method to output information about the table structure and make sure that it was properly created:
+You can use the `Session.DescribeTable()` method to view information about the table structure and make sure that it was properly created:

```ts
async function describeTable(session: Session, tableName: string, logger: Logger) {
@@ -125,7 +125,7 @@ await describeTable(session, 'episodes', logger);

{% include [steps/03_write_queries.md](steps/03_write_queries.md) %}

-Code snippet for inserting and updating data:
+Code snippet for data insert/update:

```ts
async function upsertSimple(session: Session, logger: Logger): Promise<void> {
@@ -161,7 +161,8 @@ WHERE series_id = 1;`;

{% include [param_queries.md](steps/06_param_queries.md) %}

-The code snippet below shows the use of queries prepared with `Session.prepareQuery()` and parameters in the `Session.executeQuery()` method.
+Here's a code sample that shows how to execute a query prepared with `Session.prepareQuery()`, passing parameters to it via the `Session.executeQuery()` method.
```ts
async function selectPrepared(session: Session, data: ThreeIds[], logger: Logger): Promise<void> {
@@ -201,7 +202,7 @@ async function executeScanQueryWithParams(session: Session, logger: Logger): Pro
    const query = `
    ${SYNTAX_V1}
    DECLARE $value AS Utf8;
-
+
    SELECT key
    FROM ${TABLE}
    WHERE value = $value;`;
@@ -223,7 +224,7 @@ async function executeScanQueryWithParams(session: Session, logger: Logger): Pro

{% include [transaction-control.md](steps/10_transaction_control.md) %}

-Code snippet for `Session.beginTransaction()` and `Session.commitTransaction()` calls for beginning and ending a transaction:
+Here's a code sample that demonstrates how to explicitly use the `Session.beginTransaction()` and `Session.commitTransaction()` calls to begin and commit a transaction:

```ts
async function explicitTcl(session: Session, ids: ThreeIds, logger: Logger) {
@@ -258,4 +259,3 @@ async function explicitTcl(session: Session, ids: ThreeIds, logger: Logger) {
```

{% include [error-handling.md](steps/50_error_handling.md) %}
-
diff --git a/ydb/docs/en/core/reference/ydb-sdk/example/go/index.md b/ydb/docs/en/core/reference/ydb-sdk/example/go/index.md
index b9f4a97e2ee..8ed7fe660c7 100644
--- a/ydb/docs/en/core/reference/ydb-sdk/example/go/index.md
+++ b/ydb/docs/en/core/reference/ydb-sdk/example/go/index.md
@@ -4,7 +4,7 @@ This page contains a detailed description of the code of a [test app](https://gi

## Downloading and starting {#download}

-The startup script given below uses [git](https://git-scm.com/downloads) and [Go](https://go.dev/doc/install). Be sure to install the [YDB Go SDK](../../install.md) first.
+The following execution scenario is based on [git](https://git-scm.com/downloads) and [Go](https://go.dev/doc/install). Be sure to install the [YDB Go SDK](../../install.md).

Create a working directory and, from the command line in it, run the command to clone the GitHub repository:

```bash
git clone https://github.com/ydb-platform/ydb-go-examples.git
```

Next, from the same working directory, run the command to start the test app.
{% include [run_options.md](_includes/run_options.md) %}

+
+
{% include [init.md](../_includes/steps/01_init.md) %}

To work with `YDB` in `Go`, import the `ydb-go-sdk` driver package:

```go
import (
    "context"
    "path"

-    // imports of ydb-go-sdk packages
+    // import the ydb-go-sdk packages
    "github.com/ydb-platform/ydb-go-sdk/v3"
-    "github.com/ydb-platform/ydb-go-sdk/v3/table" // to work with the table service
-    "github.com/ydb-platform/ydb-go-sdk/v3/table/options" // to work with the table service
-    "github.com/ydb-platform/ydb-go-sdk/v3/table/result" // to work with the table service
-    "github.com/ydb-platform/ydb-go-sdk/v3/table/result/named" // to work with the table service
-    "github.com/ydb-platform/ydb-go-sdk-auth-environ" // for authentication using environment variables
-    "github.com/ydb-platform/ydb-go-yc" // to work with YDB in Yandex.Cloud
+    "github.com/ydb-platform/ydb-go-sdk/v3/table" // needed to work with the table service
+    "github.com/ydb-platform/ydb-go-sdk/v3/table/options" // needed to work with the table service
+    "github.com/ydb-platform/ydb-go-sdk/v3/table/result" // needed to work with the table service
+    "github.com/ydb-platform/ydb-go-sdk/v3/table/result/named" // needed to work with the table service
+    "github.com/ydb-platform/ydb-go-sdk-auth-environ" // needed to authenticate using the environment variables
+    "github.com/ydb-platform/ydb-go-yc" // to work with YDB in Yandex Cloud
)
```

@@ -45,20 +47,20 @@
ctx := context.Background()
dsn := "grpcs://ydb.serverless.yandexcloud.net:2135/?database=/ru-central1/b1g8skpblkos03malf3s/etn01f8gv9an9sedo9fu"
// IAM token
token := "t1.9euelZrOy8aVmZKJm5HGjceMkMeVj-..."
-// creating a DB connection object, which is the input point for YDB services
+// create a connection object called db; it is an entry point for YDB services
db, err := ydb.Open(ctx, dsn,
-//  yc.WithInternalCA(), // using Yandex.Cloud certificates
-    ydb.WithAccessTokenCredentials(token), // token-based authentication
-//  ydb.WithAnonimousCredentials(), // anonymous authentication (for example, in docker ydb)
-//  yc.WithMetadataCredentials(token), // authentication from inside a VM in Yandex.Cloud or a function in Yandex Functions
-//  yc.WithServiceAccountKeyFileCredentials("~/.ydb/sa.json"), // authentication in Yandex.Cloud using a service account file
-//  environ.WithEnvironCredentials(ctx), // authentication using environment variables
+//  yc.WithInternalCA(), // use Yandex Cloud certificates
+    ydb.WithAccessTokenCredentials(token), // authenticate using the token
+//  ydb.WithAnonymousCredentials(), // authenticate anonymously (for example, using docker ydb)
+//  yc.WithMetadataCredentials(token), // authenticate from inside a VM in Yandex Cloud or Yandex Function
+//  yc.WithServiceAccountKeyFileCredentials("~/.ydb/sa.json"), // authenticate in Yandex Cloud using a service account file
+//  environ.WithEnvironCredentials(ctx), // authenticate using environment variables
)
if err != nil {
-    // connection error handling
+    // handle a connection error
}
-// closing the driver at the end of the program is mandatory
+// driver must be closed when done
defer func() {
    _ = db.Close(ctx)
}()

@@ -67,7 +69,7 @@ The `db` object is an input point for working with `YDB` services. To work with the table service, use the `db.Table()` client. The client of the table service provides an `API` for making queries to tables.

-The most popular method is `db.Table().Do(ctx, op)`.
It implements background session creation and repeated attempts to perform the `op` user operation where the created session is passed to the user-defined code.
+The most popular method is `db.Table().Do(ctx, op)`. It creates sessions in the background and retries the user operation `op`, passing the created session to the user's code. The session has an exhaustive `API` that lets you perform `DDL`, `DML`, `DQL`, and `TCL` requests.

{% include [steps/02_create_table.md](../_includes/steps/02_create_table.md) %}

```go
err = db.Table().Do(
    ctx,
    func(ctx context.Context, s table.Session) (err error) {
        return s.CreateTable(ctx, path.Join(db.Name(), "series"),
-            options.WithColumn("series_id", types.Optional(types.TypeUint64)),
+            options.WithColumn("series_id", types.TypeUint64), // not null column
            options.WithColumn("title", types.Optional(types.TypeUTF8)),
            options.WithColumn("series_info", types.Optional(types.TypeUTF8)),
            options.WithColumn("release_date", types.Optional(types.TypeDate)),
@@ -89,7 +91,7 @@ err = db.Table().Do(
    },
)
if err != nil {
-    // handling the situation when the request failed
+    // handling query execution failure
}
```

@@ -111,7 +113,7 @@ err = db.Table().Do(
    }
)
if err != nil {
-    // handling the situation when the request failed
+    // handling a situation when a query has failed
}
```

@@ -134,9 +136,9 @@ err := db.Table().Do(
    func(ctx context.Context, s table.Session) (err error) {
        var (
            res   result.Result
-            id    *uint64 // pointer - for optional results
-            title *string // pointer - for optional results
-            date  *time.Time // pointer - for optional results
+            id    uint64 // a variable for required results
+            title *string // a pointer for optional results
+            date  *time.Time // a pointer for optional results
        )
        _, res, err = s.Execute(
            ctx,
@@ -153,22 +155,22 @@
            series_id = $seriesID;
        `,
            table.NewQueryParameters(
-                table.ValueParam("$seriesID", types.Uint64Value(1)), // substitution in the query condition
+                table.ValueParam("$seriesID", types.Uint64Value(1)), // insert into the query criteria
            ),
        )
        if err != nil {
            return err
        }
        defer func() {
-            _ = res.Close() // making sure the result is closed
+            _ = res.Close() // result must be closed
        }()
        log.Printf("> select_simple_transaction:\n")
        for res.NextResultSet(ctx) {
            for res.NextRow() {
-                // passing column names from the scanning line to ScanNamed,
-                // addresses (and data types) to assign query results to
+                // use ScanNamed to pass column names from the scan string,
+                // addresses (and data types) to be assigned the query results
                err = res.ScanNamed(
-                    named.Optional("series_id", &id),
+                    named.Required("series_id", &id),
                    named.Optional("title", &title),
                    named.Optional("release_date", &date),
                )
                if err != nil {
@@ -177,7 +179,7 @@
                }
                log.Printf(
                    "  > %d %s %s\n",
-                    *id, *title, *date,
+                    id, *title, *date,
                )
            }
        }
@@ -185,7 +187,7 @@
    },
)
if err != nil {
-    // handling the query execution error
+    // handle a query execution error
}
```

@@ -220,7 +222,7 @@ err = c.Do(
        return err
    }
    defer func() {
-        _ = res.Close() // making sure the result is closed
+        _ = res.Close() // result must be closed
    }()
    var (
        seriesID uint64
@@ -234,10 +236,10 @@ err = c.Do(
        return err
    }
    for res.NextRow() {
-        // named.OptionalOrDefault lets you "deploy" optional
-        // results or use the default value of the go type
+        // named.OptionalOrDefault enables you to "deploy" optional
+        // results or use the default type value in Go
        err = res.ScanNamed(
-            named.OptionalOrDefault("series_id", &seriesID),
+            named.Required("series_id", &seriesID),
            named.OptionalOrDefault("season_id", &seasonID),
            named.OptionalOrDefault("title", &title),
            named.OptionalOrDefault("first_aired", &date),
        )
        if err != nil {
            return err
        }
@@ -252,16 +254,15 @@ err = c.Do(
    },
)
if err != nil {
-    // handling the query execution error
+    // handling a query execution error
}
```
+
{% note info %}

Sample code of a test app that uses archived versions of the Go SDK:
-
-- [github.com/yandex-cloud/ydb-go-sdk](https://github.com/yandex-cloud/ydb-go-sdk/tree/v1.5.1) is available at this [link](../archive/example-go-v1.md),
-- [github.com/yandex-cloud/ydb-go-sdk/v2](https://github.com/yandex-cloud/ydb-go-sdk/tree/v2.11.2) is available at this [link](../archive/example-go-v2.md).
+- [github.com/yandex-cloud/ydb-go-sdk](https://github.com/yandex-cloud/ydb-go-sdk/tree/v1.5.1) is available via the [link](../archive/example-go-v1.md),
+- [github.com/yandex-cloud/ydb-go-sdk/v2](https://github.com/yandex-cloud/ydb-go-sdk/tree/v2.11.2) is available via the [link](../archive/example-go-v2.md).

{% endnote %}
-
diff --git a/ydb/docs/en/core/reference/ydb-sdk/example/python/index.md b/ydb/docs/en/core/reference/ydb-sdk/example/python/index.md
index 65b5a1cd211..4a478ba4c9e 100644
--- a/ydb/docs/en/core/reference/ydb-sdk/example/python/index.md
+++ b/ydb/docs/en/core/reference/ydb-sdk/example/python/index.md
@@ -4,11 +4,11 @@ This page contains a detailed description of the code of a [test app](https://gi

## Downloading and starting {#download}

-The start scenario given below uses [git](https://git-scm.com/downloads) and [Python3](https://www.python.org/downloads/). Be sure to install the [YDB Python SDK](../../install.md) first.
+The following execution scenario is based on [git](https://git-scm.com/downloads) and [Python3](https://www.python.org/downloads/). Be sure to install the [YDB Python SDK](../../install.md).
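The SDK itself is published on PyPI, so a typical installation is a single pip command (a sketch: the package name `ydb` is an assumption based on what the ydb-python-sdk repository publishes):

```bash
# install the YDB Python SDK from PyPI
python3 -m pip install ydb
```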
Create a working directory and, from the command line in it, run the commands to clone the GitHub repository and install the required Python packages:

-``` bash
+```bash
git clone https://github.com/ydb-platform/ydb-python-sdk.git
python3 -m pip install iso8601
```
@@ -52,7 +52,7 @@ def create_tables(session, path):
    session.create_table(
        os.path.join(path, 'series'),
        ydb.TableDescription()
-        .with_column(ydb.Column('series_id', ydb.OptionalType(ydb.PrimitiveType.Uint64)))
+        .with_column(ydb.Column('series_id', ydb.PrimitiveType.Uint64))  # not null column
        .with_column(ydb.Column('title', ydb.OptionalType(ydb.PrimitiveType.Utf8)))
        .with_column(ydb.Column('series_info', ydb.OptionalType(ydb.PrimitiveType.Utf8)))
        .with_column(ydb.Column('release_date', ydb.OptionalType(ydb.PrimitiveType.Uint64)))
@@ -60,7 +60,7 @@
    )
```

-The absolute path from the root is passed in the path parameter:
+The path parameter accepts the absolute path starting from the root:

```python
full_path = os.path.join(database, path)
@@ -76,7 +76,7 @@ def describe_table(session, path, name):
        print("column, name:", column.name, ",", str(column.type.item).strip())
```

-The given code snippet outputs the following text to the console at startup:
+The given code snippet prints the following text to the console at startup:

```bash
> describe table: series
@@ -87,7 +87,7 @@
```

{% include [steps/03_write_queries.md](../_includes/steps/03_write_queries.md) %}

-Code snippet for inserting and updating data:
+Code snippet for data insert/update:

```python
def upsert_simple(session, path):
@@ -108,7 +108,7 @@
To execute YQL queries, use the `session.transaction().execute()` method.
The SDK lets you explicitly control the execution of transactions and configure the transaction execution mode using the `TxControl` class.

-In the code snippet below, the transaction is executed using the `transaction().execute()` method. The transaction execution mode set is `ydb.SerializableReadWrite()`. When all the queries in the transaction are completed, the transaction is automatically committed by explicitly setting the flag: `commit_tx=True`. The query body is described using the YQL syntax and is passed to the `execute` method as a parameter.
+In the code snippet below, the transaction is executed using the `transaction().execute()` method. The transaction execution mode is set to `ydb.SerializableReadWrite()`. When all the queries in the transaction are completed, the transaction is automatically committed by explicitly setting the flag `commit_tx=True`. The query body is described using YQL syntax and is passed to the `execute` method as a parameter.
```python
def select_simple(session, path):
@@ -173,7 +173,7 @@ def select_prepared(session, path, series_id, season_id, episode_id):

    return result_sets[0]
```

-The given code snippet outputs the following text to the console at startup:
+The given code snippet prints the following text to the console at startup:

```bash
> select_prepared_transaction:
diff --git a/ydb/docs/en/core/reference/ydb-sdk/toc_i.yaml b/ydb/docs/en/core/reference/ydb-sdk/toc_i.yaml
index f277c552e97..67caea54c24 100644
--- a/ydb/docs/en/core/reference/ydb-sdk/toc_i.yaml
+++ b/ydb/docs/en/core/reference/ydb-sdk/toc_i.yaml
@@ -5,8 +5,8 @@ items:
    href: install.md
  - name: Authentication
    href: auth.md
-#  - name: Working with topics
-#    href: topic/topic.md
+  - name: Working with topics
+    href: topic.md
  - name: Test app
    include: { mode: link, path: example/toc_p.yaml }
  - name: Handling errors in the API
@@ -19,4 +19,3 @@ items:
    href: health-check-api.md
  - name: Code recipes
    include: { mode: link, path: recipes/toc_p.yaml }
-
diff --git a/ydb/docs/en/core/reference/ydb-sdk/topic.md b/ydb/docs/en/core/reference/ydb-sdk/topic.md
index ad4d6a3ff67..2019e813e65 100644
--- a/ydb/docs/en/core/reference/ydb-sdk/topic.md
+++ b/ydb/docs/en/core/reference/ydb-sdk/topic.md
@@ -1,6 +1,6 @@
# Working with topics

-This article provides examples of how to use {{ ydb-short-name }} SDK to work with [topics](../../concepts/topic.md).
+This article provides examples of how to use the {{ ydb-short-name }} SDK to work with [topics](../../concepts/topic.md).

Before performing the examples, [create a topic](../ydb-cli/topic-create.md) and [add a consumer](../ydb-cli/topic-consumer-add.md).

@@ -21,7 +21,7 @@ To create a connection to the existing `my-topic` topic via the added `my-consum

{% endlist %}

-You can also use the advanced connection creation option to specify multiple topics and set reading parameters. The following code will create a connection to the `my-topic` and `my-specific-topic` topics via the `my-consumer` consumer and also set the time to start reading messages from:
+You can also use the advanced connection creation option to specify multiple topics and set read parameters. The following code will create a connection to the `my-topic` and `my-specific-topic` topics via the `my-consumer` consumer and also set the time to start reading messages:

@@ -47,15 +47,15 @@ You can also use the advanced connection creation option to specify multiple top

## Reading messages {#reading-messages}

-The server stores the [consumer offset](../../concepts/topic.md#consumer-offset). After reading a message, the client can [send a processing confirmation to the server](#commit). The consumer offset will change and only unconfirmed messages will be read in case of a new connection.
+The server stores the [consumer offset](../../concepts/topic.md#consumer-offset). After reading a message, the client can [send a commit to the server](#commit). The consumer offset will change and only uncommitted messages will be read in case of a new connection.

-You can read messages without a [processing confirmation](#no-commit) as well. In this case, all unconfirmed messages, including those processed, will be read if there is a new connection.
+You can read messages without a [commit](#no-commit) as well. In this case, all uncommitted messages, including those processed, will be read if there is a new connection.
Information about which messages have already been processed can be [saved on the client side](#client-commit) by sending the starting consumer offset to the server when creating a new connection. This does not change the consumer offset on the server.

The SDK receives data from the server in batches and buffers it. Depending on the task, the client code can read messages from the buffer one by one or in batches.

-### Reading without a message processing confirmation {#no-commit}
+### Reading without a commit {#no-commit}

To read messages one by one, use the following code:

@@ -97,9 +97,9 @@ To read message batches, use the following code:

{% endlist %}

-### Reading with a message processing confirmation {#commit}
+### Reading with a commit {#commit}

-To confirm the processing of messages one by one, use the following code:
+To commit messages one by one, use the following code:

{% list tabs %}

- Go

  ```go
  for {
      mess, err := r.ReadMessage(ctx)
      if err != nil {
          return err
      }
-      processMessage(mess)
+      processMessage(mess)
      r.Commit(mess.Context(), mess)
  }
  }
  ```

{% endlist %}

-To confirm the processing of message batches, use the following code:
+To commit message batches, use the following code:

{% list tabs %}

@@ -195,9 +195,9 @@ When reading starts, the client code must transmit the starting consumer offset

{{ ydb-short-name }} uses server-based partition balancing between clients. This means that the server can interrupt the reading of messages from random partitions.

-In case of a _soft interruption_, the client receives a notification that the server has finished sending messages from the partition and messages will no longer be read. The client can finish message processing and send a confirmation to the server.
+In case of a _soft interruption_, the client receives a notification that the server has finished sending messages from the partition and messages will no longer be read. The client can finish processing messages and send a commit to the server.

-In case of a _hard interruption_, the client receives a notification that it is no longer possible to work with partitions. The client must stop processing the read messages. Unconfirmed messages will be transferred to another consumer.
+In case of a _hard interruption_, the client receives a notification that it is no longer possible to work with partitions. The client must stop processing the read messages. Uncommitted messages will be transferred to another consumer.

### Soft reading interruption {#soft-stop}
diff --git a/ydb/docs/en/core/yql/reference/yql-core/syntax/_includes/lexer.md b/ydb/docs/en/core/yql/reference/yql-core/syntax/_includes/lexer.md
index acfc6669cc1..ea92d1c75cf 100644
--- a/ydb/docs/en/core/yql/reference/yql-core/syntax/_includes/lexer.md
+++ b/ydb/docs/en/core/yql/reference/yql-core/syntax/_includes/lexer.md
@@ -1,3 +1,4 @@
+
# Lexical structure

The {% if feature_mapreduce %}program {% else %}query {% endif %} in the YQL language is a valid UTF-8 text consisting of _commands_ (statements) separated by semicolons (`;`).
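For example, a trivial program consisting of two statements, shown only to illustrate the separator:

```sql
SELECT 1; -- first statement
SELECT 2; -- second statement
```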
@@ -9,11 +10,10 @@ Tokens are separated by whitespace characters (space, tab, line feed) or _commen
## Syntax compatibility modes {#lexer-modes}

Two syntax compatibility modes are supported:
-
* Advanced C++ (default)
* ANSI SQL

-ANSI SQL mode is enabled with a special comment `--!ansi-lexer` that must be in the beginning of the {% if feature_mapreduce %}program{% else %}query{% endif %}.
+ANSI SQL mode is enabled with a special comment `--!ansi-lexer`, which must be at the beginning of the {% if feature_mapreduce %}program{% else %}query{% endif %}.

Specifics of interpretation of lexical elements in different compatibility modes are described below.

@@ -30,7 +30,6 @@ SELECT 1; -- A single-line comment
Some multi-line comment
*/
```
-
In C++ syntax compatibility mode (default), a multiline comment ends with the _nearest_ `*/`. The ANSI SQL syntax compatibility mode accounts for nesting of multiline comments:

@@ -46,7 +45,6 @@ The list of keywords is not fixed and is going to expand as the language develop

**Identifiers** are tokens that identify the names of tables, columns, and other objects in YQL. Identifiers in YQL are always case-sensitive.

An identifier can be written in the body of the program without any special formatting, if the identifier:
-
* Is not a keyword
* Begins with a Latin letter or underscore
* Is followed by a Latin letter, an underscore, or a number

```sql
SELECT my_column FROM my_table; -- my_column and my_table are identifiers
```

To include an arbitrary ID in the body of a {% if feature_mapreduce %}program{% else %}query{% endif %}, the ID is enclosed in backticks:
-
```sql
SELECT `column with space` from T;
SELECT * FROM `my_dir/my_table`
```

IDs in backticks are never interpreted as keywords:

```sql
SELECT `select` FROM T; -- select - Column name in the T table
```
-
When using backticks, you can use the standard C escaping:

```sql
@@ -81,16 +77,70 @@ In ANSI SQL syntax compatibility mode, arbitrary IDs can also be enclosed in dou
SELECT 1 as "column with "" double quote"; -- column name will be: column with " double quote
```

+## SQL hints {#sql-hints}
+
+SQL hints are special settings with which a user can modify a query execution plan
+(for example, enable/disable specific optimizations or force the JOIN execution strategy).
+Unlike [PRAGMA](../pragma.md), SQL hints act locally: they are attached to a specific point in the YQL query (usually right after a keyword)
+and affect only the corresponding statement or even a part of it.
+SQL hints are lists of name-value settings defined inside special comments;
+a comment with SQL hints must have `+` as its first character:
+```sql
+--+ Name1(Value1 Value2 Value3) Name2(Value4) ...
+```
+An SQL hint name must consist of ASCII alphanumeric characters and start with a letter. Hint names are case-insensitive.
+A hint name must be followed by any number of space-separated values. A value is an arbitrary sequence of characters.
+If a value contains spaces or parentheses, it must be enclosed in single quotation marks:
+
+```sql
+--+ foo('value with space and paren)')
+```
+
+```sql
+--+ foo('value1' value2)
+-- equivalent to
+--+ foo(value1 value2)
+```
+
+To escape a single quotation mark within a value, double it:
+
+```sql
+--+ foo('value with single quote '' inside')
+```
+
+If two or more hints with the same name appear in the list, the last one is used:
+```sql
+--+ foo(v1 v2) bar(v3) foo()
+-- equivalent to
+--+ bar(v3) foo()
+```
+
+Unknown SQL hint names (or syntactically incorrect hints) never result in errors; they're simply ignored:
+```sql
+--+ foo(value1) bar(value2 baz(value3)
+-- due to a missing closing parenthesis in bar, is equivalent to
+--+ foo(value1)
+```
+Thanks to this behavior, previously valid YQL queries with comments that merely look like hints keep working.
+A syntactically correct SQL hint in a place where YQL does not expect one results in a warning:
+
+```sql
+-- presently, hints after SELECT are not supported
+SELECT /*+ foo(123) */ 1; -- warning 'Hint foo will not be used'
+```
+
+What's important is that SQL hints are only hints for the optimizer, so:
+* Hints never affect query results.
+* As the YQL optimizers improve, a hint may become outdated and be ignored (for example, the algorithm behind a given hint may change completely, or the optimizer may become sophisticated enough to be trusted to choose the best plan, so manual settings would only interfere).
+
## String literals {#string-literals}

A string literal (constant) is expressed as a sequence of characters enclosed in single quotes. Inside a string literal, you can use the C-style escaping rules:
-
```yql
SELECT 'string with\n newline, \x0a newline and \' backtick ';
```

In the C++ syntax compatibility mode (default), you can use double quotes instead of single quotes:
-
```yql
SELECT "string with\n newline, \x0a newline and \" backtick ";
```

@@ -99,10 +149,10 @@ In ANSI SQL compatibility mode, double quotes are used for IDs, and the only esc
```sql
--!ansi_lexer
-SELECT 'string with '' quote'; -- result: a string with a ' quote
+SELECT 'string with '' quote'; -- result: string with ' quote
```

-String literals can be used to produce [primitive type literals](../../builtins/basic#data-type-literals).
+You can use string literals to produce [primitive type literals](../../builtins/basic#data-type-literals).

### Multi-line string literals {#multiline-string-literals}

@@ -126,14 +176,13 @@ SELECT $text;

### Typed string literals {#typed-string-literals}

-* For string literals, for example, [multiline](#multiline-string-literals) literals, the `String` type is used by default.
+* For string literals, including [multiline](#multiline-string-literals) ones, the `String` type is used by default.
* You can use the following suffixes to explicitly control the literal type:
-  * `u`: `Utf8`.
-  * `y`: `Yson`.
-  * `j`: `Json`.
+  * `u`: `Utf8`.
+  * `y`: `Yson`.
+  * `j`: `Json`.

**Example:**
-
```yql
SELECT "foo"u, '[1;2]'y, @@{"a":null}@@j;
```

## Numeric literals {#literal-numbers}

* Integer literals have the default type `Int32`, if they fit within the Int32 range. Otherwise, they automatically expand to `Int64`.
* You can use the following suffixes to explicitly control the literal type:
-  * `l`: `Int64`.
-  * `s`: `Int16`.
-  * `t`: `Int8`.
+  * `l`: `Int64`.
+  * `s`: `Int16`.
+  * `t`: `Int8`.
* Add the suffix `u` to convert a type to its corresponding unsigned type:
-  * `ul`: `Uint64`.
-  * `u`: `Uint32`.
-  * `us`: `Uint16`.
-  * `ut`: `Uint8`.
+  * `ul`: `Uint64`.
+  * `u`: `Uint32`.
+  * `us`: `Uint16`.
+  * `ut`: `Uint8`.
* You can also use hexadecimal, octal, and binary format for integer literals using the prefixes `0x`, `0o` and `0b`, respectively. You can arbitrarily combine them with the above-mentioned suffixes.
-* Floating point literals have the `Double` type by default, but you can use the suffix `f` to narrow it down to `Float`.
+* Floating point literals have the `Double` type by default, but you can use the suffix `f` to narrow it down to `Float`.

```sql
SELECT
@@ -162,4 +211,3 @@
  456s AS `Int16`,
  1.2345f AS `Float`;
```
-
diff --git a/ydb/docs/en/core/yql/reference/yql-core/types/_includes/optional.md b/ydb/docs/en/core/yql/reference/yql-core/types/_includes/optional.md
index 71bb96d346e..2d01114405f 100644
--- a/ydb/docs/en/core/yql/reference/yql-core/types/_includes/optional.md
+++ b/ydb/docs/en/core/yql/reference/yql-core/types/_includes/optional.md
@@ -6,12 +6,12 @@ Optional data types in the [text format](../type_string.md) use the question mar
The following operations are most often performed on optional data types:

* [IS NULL](../../syntax/expressions.md#is-null): Matching an empty value
-* [COALESCE](../../builtins/basic.md#coalesce): Leaves the filled values unchanged and replaces `NULL` with the default value that follows
-* [UNWRAP](../../builtins/basic.md#optional-ops): Extract the value of the source type from the optional data type, `T?` is converted to `T`
-* [JUST](../../builtins/basic#optional-ops) Change the data type to the optional type of the current one, `T` converts to`T?`
+* [COALESCE](../../builtins/basic.md#coalesce): Leave the filled values unchanged and replace `NULL` with the default value that follows
+* [UNWRAP](../../builtins/basic.md#optional-ops): Extract the value of the original type from the optional data type; `T?` is converted to `T`
+* [JUST](../../builtins/basic#optional-ops): Add optionality to the current type; `T` is converted to `T?`
* [NOTHING](../../builtins/basic.md#optional-ops): Create an empty value with the specified type.

-`Optional` (nullable) isn't a property of a data type or column, but a [container](../containers.md) type where containers can be arbitrarily nested into each other. For example, a column with the type `Optional<Optional<Boolean>>` can accept 4 values: `NULL` of the overarching container, `NULL` of the inner container, `TRUE`, and `FALSE`. The above-declared type differs from `List<List<Boolean>>`, because it uses `NULL` as an empty list, and you can't put more than one non-null element in it. You can also use `Optional<Optional<T>>` as a key [lookup](/docs/s_expressions/functions#lookup) in the dictionary (`Dict(k,v)`) with `Optional<T>` values. Using this type of result data, you can distinguish between a `NULL` value in the dictionary and a missing key.
+`Optional` (nullable) isn't a property of a data type or column, but a container type where [containers](../containers.md) can be arbitrarily nested into each other. For example, a column with the type `Optional<Optional<Boolean>>` can accept 4 values: `NULL` of the whole container, `NULL` of the inner container, `TRUE`, and `FALSE`. The above-declared type differs from `List<List<Boolean>>`, because it uses `NULL` as an empty list, and you can't put more than one non-null element in it. In addition, `Optional<Optional<T>>` type values are returned as results when searching by the key in the `Dict(k,v)` dictionary with `Optional<T>` type values. Using this type of result data, you can distinguish between a `NULL` value in the dictionary and a situation when the key is missing.

**Example**

@@ -30,21 +30,20 @@ Result:

## Logical and arithmetic operations with NULL {#null_expr}

-The `NULL` literal has a separate singular `Null` type and can be implicitly converted to any optional type (for example, the nested type `OptionalOptional<T>...>>`). In ANSI SQL, `NULL` means "an unknown value", that's why logical and arithmetic operations involving `NULL` or empty `Optional` have certain specifics.
+The `NULL` literal has a separate singular `Null` type and can be implicitly converted to any optional type (for example, the nested type `Optional<Optional<...Optional<T>...>>`). In ANSI SQL, `NULL` means "an unknown value", that's why logical and arithmetic operations involving `NULL` or empty `Optional` have certain specifics.

**Examples**
-
```
SELECT
    True OR NULL,        -- Just(True) (works the same way as True OR <unknown value of type Bool>)
    False AND NULL,      -- Just(False)
-    True AND NULL,       -- NULL (to be more precise, Nothing<Bool?> – <unknown value of type Bool>)
-    NULL OR NOT NULL,    -- NULL (all NULLs are considered "different")
+    True AND NULL,       -- NULL (to be more precise, Nothing<Bool?>, an unknown value of type Bool)
+    NULL OR NOT NULL,    -- NULL (all NULLs are "different")
    1 + NULL,            -- NULL (Nothing<Int32?>) - the result of adding 1 together with
-                         -- an unknown Int value)
-    1 == NULL,           -- NULL (the result of comparing 1 with an unknown Int value)
-    (1, NULL) == (1, 2), -- NULL (composite elements are compared by component comparison
-                         -- using `AND')
-    (2, NULL) == (1, 3), -- Just(False) (the expression is equivalent to 2 == 1 AND NULL == 3)
-```
+                         -- an unknown value of type Int)
+    1 == NULL,           -- NULL (the result of comparing 1 with an unknown value of type Int)
+    (1, NULL) == (1, 2), -- NULL (composite elements are compared by component
+                         -- through `AND`)
+    (2, NULL) == (1, 3), -- Just(False) (the expression is equivalent to 2 == 1 AND NULL == 3)
+```

diff --git a/ydb/docs/ru/core/yql/reference/yql-core/syntax/_includes/create_table.md b/ydb/docs/ru/core/yql/reference/yql-core/syntax/_includes/create_table.md
index b8bfff4601b..dd92b5f9a72 100644
--- a/ydb/docs/ru/core/yql/reference/yql-core/syntax/_includes/create_table.md
+++ b/ydb/docs/ru/core/yql/reference/yql-core/syntax/_includes/create_table.md
@@ -113,7 +113,7 @@ WITH (

Here, key is the parameter name and value is its value.

-The list of valid parameter names and their values is provided on the [{{ backend_name }} table description page]({{ concept_table }})
+The list of valid parameter names and their values is provided on the [{{ backend_name }} table description page]({{ concept_table }}).

For example, the following code creates a table with automatic partitioning by partition size enabled and a preferred size of 512 MB for each partition:
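A minimal sketch (the table name and columns are illustrative; only the two `WITH` parameters matter here):

```sql
CREATE TABLE my_table (
    id Uint64,
    title Utf8,
    PRIMARY KEY (id)
)
WITH (
    AUTO_PARTITIONING_BY_SIZE = ENABLED,
    AUTO_PARTITIONING_PARTITION_SIZE_MB = 512
);
```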