author | alextarazanov <alextarazanov@yandex-team.com> | 2022-09-16 16:17:27 +0300 |
---|---|---|
committer | alextarazanov <alextarazanov@yandex-team.com> | 2022-09-16 16:17:27 +0300 |
commit | 2b5de74ad81ce19e5d49724b3d1943c542a9a96e (patch) | |
tree | 0a58752b7f4bbde1c3b7ecee1a5116c185486a06 | |
parent | adb84b77f034b6ea45b1fa3873faccb32668c27e (diff) | |
download | ydb-2b5de74ad81ce19e5d49724b3d1943c542a9a96e.tar.gz |
[review] [YDB] Check translate's
39 files changed, 1670 insertions, 43 deletions
diff --git a/ydb/docs/en/core/_includes/ydb-cli-profile.md b/ydb/docs/en/core/_includes/ydb-cli-profile.md new file mode 100644 index 00000000000..120a755baa0 --- /dev/null +++ b/ydb/docs/en/core/_includes/ydb-cli-profile.md @@ -0,0 +1,5 @@ +{% note info %} + +The examples use the `db1` profile. To learn more, see [{#T}](../getting_started/cli.md#profile). + +{% endnote %} diff --git a/ydb/docs/en/core/best_practices/_includes/pk_scalability.md b/ydb/docs/en/core/best_practices/_includes/pk_scalability.md index e228e4a5b48..a1f1786f23d 100644 --- a/ydb/docs/en/core/best_practices/_includes/pk_scalability.md +++ b/ydb/docs/en/core/best_practices/_includes/pk_scalability.md @@ -4,7 +4,7 @@ Proper design of the table's primary key is important for the performance for bo General recommendations for choosing a primary key: -* Avoid situations when the significant part of the workload falls on a single [partition](../../concepts/datamodel.md#partitioning) of a table. The more evenly the workload is distributed across the partitions, the higher the performance. +* Avoid situations when the significant part of the workload falls on a single [partition](../../concepts/datamodel/table.md#partitioning) of a table. The more evenly the workload is distributed across the partitions, the higher the performance. * Reduce the number of table partitions that are affected by a single request. Moreover, if the request affects no more than one partition, it is executed using a special simplified protocol. This significantly increases the speed of execution and conserves the resources. All {{ ydb-short-name }} tables are sorted by primary key in ascending order. In a table with a monotonically increasing primary key, this will result in new data being added at the end of a table. As {{ ydb-short-name }} splits table data into partitions based on key ranges, inserts are always processed by the same server that is responsible for the "last" partition. 
Concentrating the load on a single server results in slow data uploading and inefficient use of a distributed system. diff --git a/ydb/docs/en/core/best_practices/_includes/table_sharding.md b/ydb/docs/en/core/best_practices/_includes/table_sharding.md index 64d3861f80a..57291253c34 100644 --- a/ydb/docs/en/core/best_practices/_includes/table_sharding.md +++ b/ydb/docs/en/core/best_practices/_includes/table_sharding.md @@ -1,2 +1,2 @@ -This article has been deleted. Its content has been moved to [Partitioning tables](../../concepts/datamodel.md#partitioning) in the article about data schema objects in the "Concepts" section. +This article has been deleted. Its content has been moved to [Partitioning tables](../../concepts/datamodel/table.md#partitioning) in the article about data schema objects in the "Concepts" section. diff --git a/ydb/docs/en/core/best_practices/cdc.md b/ydb/docs/en/core/best_practices/cdc.md new file mode 100644 index 00000000000..5a042267faa --- /dev/null +++ b/ydb/docs/en/core/best_practices/cdc.md @@ -0,0 +1,97 @@ +# Change Data Capture + +With [Change Data Capture](../concepts/cdc.md) (CDC), you can track changes in table data. {{ ydb-short-name }} provides access to changefeeds so that data consumers can monitor changes in near real time. + +## Enabling and disabling CDC {#add-drop} + +CDC is represented as a data schema object: a changefeed that can be added to a table or deleted from it using the [ADD CHANGEFEED and DROP CHANGEFEED](../yql/reference/syntax/alter_table.md#changefeed) directives of the YQL `ALTER TABLE` statement. + +## Reading data from a topic {#read} + +You can read data using an [SDK](../reference/ydb-sdk) or the [{{ ydb-short-name }} CLI](../reference/ydb-cli). 
As with any other data schema object, you can access a changefeed using its path that has the following format: + +```txt +path/to/table/changefeed_name +``` + +> For example, if a table named `table` contains a changefeed named `updates_feed` in the `my` directory, its path looks like this: +> +> ```text +> my/table/updates_feed +> ``` + +Before reading data, add a [consumer](../concepts/topic.md#consumer). Below is a sample command that adds a consumer named `my_consumer` to the `updates_feed` changefeed of the `table` table in the `my` directory: + +```bash +{{ ydb-cli }} topic consumer add \ + my/table/updates_feed \ + --consumer-name=my_consumer +``` + +Next, you can use the created consumer to start tracking changes. Below is a sample command for tracking data changes in the CLI: + +```bash +{{ ydb-cli }} topic read \ + my/table/updates_feed \ + --consumer=my_consumer \ + --format=newline-delimited \ + --wait +``` + +## Impact on table write performance {#performance-considerations} + +When writing data to a table with CDC enabled, there are additional overheads for the following operations: + +* Making records and saving them to a changefeed. +* Storing records in a changefeed. +* In some [modes](../yql/reference/syntax/alter_table.md#changefeed-options) (such as `OLD_IMAGE` and `NEW_AND_OLD_IMAGES`), data needs to be pre-fetched even if a user query doesn't require this. + +As a result, queries may take longer to execute and size limits for stored data may be exceeded. + +In real-world use cases, enabling CDC has virtually no impact on the query execution time (whatever the mode), since almost all data required for making records is stored in the cache, while the records themselves are sent to a topic asynchronously. However, the background delivery of records slightly increases CPU utilization (by 1% to 10%). + +In addition, a changefeed is currently stored in a topic, which has limited elasticity.
This means that if the table partitioning scheme changes significantly, an imbalance arises between the table partitions and the topic partitions. This imbalance may also increase the time it takes to execute queries or lead to additional overheads for storing a changefeed. + +## Load testing {#workload} + +As a load generator, you can use the feature of [emulating an online store](../reference/ydb-cli/commands/workload/stock) built into the {{ ydb-short-name }} CLI: + +1. [Initialize](../reference/ydb-cli/commands/workload/stock#init) a test. +1. Add a changefeed: + + ```sql + ALTER TABLE `orders` ADD CHANGEFEED `updates` WITH ( + FORMAT = 'JSON', + MODE = 'UPDATES' + ); + ``` + +1. Create a consumer: + + ```bash + {{ ydb-cli }} topic consumer add \ + orders/updates \ + --consumer-name=my_consumer + ``` + +1. Start tracking changes: + + ```bash + {{ ydb-cli }} topic read \ + orders/updates \ + --consumer=my_consumer \ + --format=newline-delimited \ + --wait + ``` + +1. [Generate](../reference/ydb-cli/commands/workload/stock#run) a load. + + The following changefeed records appear in the CLI: + + ```text + ... + {"update":{"created":"2022-06-24T11:35:00.000000Z","customer":"Name366"},"key":[13195699997286404932]} + {"update":{"created":"2022-06-24T11:35:00.000000Z","customer":"Name3894"},"key":[452209497351143909]} + {"update":{"created":"2022-06-24T11:35:00.000000Z","customer":"Name7773"},"key":[2377978894183850258]} + ...
+ ``` diff --git a/ydb/docs/en/core/best_practices/toc_i.yaml b/ydb/docs/en/core/best_practices/toc_i.yaml index 626d8b58f99..f2e213ba550 100644 --- a/ydb/docs/en/core/best_practices/toc_i.yaml +++ b/ydb/docs/en/core/best_practices/toc_i.yaml @@ -11,6 +11,8 @@ items: hidden: true - name: Secondary indexes href: secondary_indexes.md +- name: Change Data Capture + href: cdc.md - name: Paginated output href: paging.md - name: Loading large data volumes diff --git a/ydb/docs/en/core/concepts/_includes/index/intro.md b/ydb/docs/en/core/concepts/_includes/index/intro.md index 30fa9eb378b..d599cfa3ad7 100644 --- a/ydb/docs/en/core/concepts/_includes/index/intro.md +++ b/ydb/docs/en/core/concepts/_includes/index/intro.md @@ -16,7 +16,7 @@ description: "Yandex Database (YDB): is a horizontally scalable distributed faul To interact with {{ ydb-short-name }}, you can use the [{{ ydb-short-name }} CLI](../../../reference/ydb-cli/index.md) or [SDK](../../../reference/ydb-sdk/index.md) for {% if oss %}C++,{% endif %} Java, Python, Node.js, PHP, and Go. -{{ ydb-short-name }} supports a relational [data model](../../../concepts/datamodel.md) and manages tables with a predefined schema. To make it easier to organize tables, directories can be created like in the file system. +{{ ydb-short-name }} supports a relational [data model](../../../concepts/datamodel/table.md) and manages tables with a predefined schema. To make it easier to organize tables, directories can be created like in the file system. Database commands are mainly written in YQL, an SQL dialect. This gives the user a powerful and familiar way to interact with the database. 
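The intro.md change above keeps the statement that database commands are mainly written in YQL, an SQL dialect. A minimal sketch of what that looks like in practice; the `series` table and its columns here are hypothetical illustrations, not taken from the patched docs:

```sql
-- Hypothetical schema, for illustration only.
CREATE TABLE series (
    series_id Uint64,
    title Utf8,
    release_date Date,
    PRIMARY KEY (series_id)
);

-- Reads use familiar SQL syntax.
SELECT series_id, title
FROM series
WHERE series_id = 1;
```

As the intro says, this gives users a familiar, SQL-like way to interact with the database while {{ ydb-short-name }} handles distribution underneath.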
diff --git a/ydb/docs/en/core/concepts/_includes/ttl.md b/ydb/docs/en/core/concepts/_includes/ttl.md index 644c96c6e6c..6901ff29ddf 100644 --- a/ydb/docs/en/core/concepts/_includes/ttl.md +++ b/ydb/docs/en/core/concepts/_includes/ttl.md @@ -31,7 +31,7 @@ Data is deleted by the *Background Removal Operation* (*BRO*), consisting of two The *BRO* has the following properties: -* The concurrency unit is a [table partition](../datamodel.md#partitioning). +* The concurrency unit is a [table partition](../datamodel/table.md#partitioning). * For tables with [secondary indexes](../secondary_indexes.md), the delete stage is a [distributed transaction](../transactions.md#distributed-tx). ## Guarantees {#guarantees} diff --git a/ydb/docs/en/core/concepts/cdc.md b/ydb/docs/en/core/concepts/cdc.md new file mode 100644 index 00000000000..ef837e417f3 --- /dev/null +++ b/ydb/docs/en/core/concepts/cdc.md @@ -0,0 +1,97 @@ +# Change Data Capture (CDC) + +Change Data Capture (CDC) captures changes to {{ ydb-short-name }} table rows, uses these changes to generate a _changefeed_, writes them to distributed storage, and provides access to these records for further processing. It uses a [topic](topic.md) as distributed storage to efficiently store the table change log. + +When adding, updating, or deleting a table row, CDC generates a change record by specifying the [primary key](datamodel/table.md) of the row and writes it to the topic partition corresponding to this key. + +## Guarantees {#guarantees} + +* Change records are sharded across topic partitions by primary key. +* Each change is only delivered once (exactly-once delivery). +* Changes by the same primary key are delivered to the same topic partition in the order they took place in the table. + +## Limitations {#restrictions} + +* The number of topic partitions is fixed as of changefeed creation and remains unchanged (unlike tables, topics are not elastic). 
+* Changefeeds support records of the following types of operations: + * Updates + * Erases + + Adding rows is a special update case, and a record of adding a row in a changefeed will look similar to an update record. + +## Record structure {#record-structure} + +Depending on the [changefeed parameters](../yql/reference/syntax/alter_table.md#changefeed-options), the structure of a record may differ. + +A [JSON](https://en.wikipedia.org/wiki/JSON) record has the following structure: + +```json +{ + "key": [<key components>], + "update": {<columns>}, + "erase": {}, + "newImage": {<columns>}, + "oldImage": {<columns>} +} +``` + +* `key`: An array of primary key component values. Always present. +* `update`: Update flag. Present if a record matches the update operation. In `UPDATES` mode, it also contains the names and values of updated columns. +* `erase`: Erase flag. Present if a record matches the erase operation. +* `newImage`: Row snapshot that results from its being changed. Present in `NEW_IMAGE` and `NEW_AND_OLD_IMAGES` modes. Contains column names and values. +* `oldImage`: Row snapshot before the change. Present in `OLD_IMAGE` and `NEW_AND_OLD_IMAGES` modes. Contains column names and values. + +> Sample record of an update in `UPDATES` mode: +> +> ```json +> { +> "key": [1, "one"], +> "update": { +> "payload": "lorem ipsum", +> "date": "2022-02-22" +> } +> } +> ``` +> +> Record of an erase: +> ```json +> { +> "key": [2, "two"], +> "erase": {} +> } +> ``` +> +> Record with row snapshots: +> ```json +> { +> "key": [1, 2, 3], +> "update": {}, +> "newImage": { +> "textColumn": "value1", +> "intColumn": 101, +> "boolColumn": true +> }, +> "oldImage": { +> "textColumn": null, +> "intColumn": 100, +> "boolColumn": false +> } +> } +> ``` + +{% note info %} + +* The same record may not contain the `update` and `erase` fields simultaneously, since these fields are operation flags (you can't update and erase a table row at the same time). 
However, each record contains one of these fields (any operation is either an update or an erase). +* In `UPDATES` mode, the `update` field for update operations is an operation flag (update) and contains the names and values of updated columns. +* JSON object fields containing column names and values (`newImage`, `oldImage`, and `update` in `UPDATES` mode), *do not include* the columns that are primary key components. +* If a record contains the `erase` field (indicating that the record matches the erase operation), this is always an empty JSON object (`{}`). + +{% endnote %} + +## Creating and deleting a changefeed {#ddl} + +You can add a changefeed to an existing table or erase it using the [ADD CHANGEFEED and DROP CHANGEFEED](../yql/reference/syntax/alter_table.md#changefeed) directives of the YQL `ALTER TABLE` statement. When erasing a table, the changefeed added to it is also deleted. + +## CDC purpose and use {#best_practices} + +For information about using CDC when developing apps, see [best practices](../best_practices/cdc.md). diff --git a/ydb/docs/en/core/concepts/datamodel/_includes/blockdevice.md b/ydb/docs/en/core/concepts/datamodel/_includes/blockdevice.md new file mode 100644 index 00000000000..a573ef7a81c --- /dev/null +++ b/ydb/docs/en/core/concepts/datamodel/_includes/blockdevice.md @@ -0,0 +1,3 @@ +# Network Block Store Volume + +Implementing a [network block device](https://en.wikipedia.org/wiki/Network_block_device) built on {{ ydb-short-name }} is an example of how {{ ydb-short-name }} can be used as a platform for creating a wide range of data storage and processing systems. Network block devices implement an interface for a local block device, as well as ensure fault-tolerance (through redundancy) and good scalability in terms of volume size and the number of input/output operations per unit of time. 
The downside of a network block device is that any input/output operation on such device requires network interaction, which might increase the latency of the network device compared to the local device. You can deploy a common file system on a network block device and/or run an application directly on the block device, such as a database management system. diff --git a/ydb/docs/en/core/concepts/datamodel/_includes/dir.md b/ydb/docs/en/core/concepts/datamodel/_includes/dir.md new file mode 100644 index 00000000000..29d929a4d74 --- /dev/null +++ b/ydb/docs/en/core/concepts/datamodel/_includes/dir.md @@ -0,0 +1,3 @@ +# Directory + +For convenience, the service supports creating directories like in a file system, meaning the entire database consists of a directory tree, while tables and other entities are in the leaves of this tree (similar to files in the file system). A directory can host multiple subdirectories and tables. The names of the entities they contain are unique. diff --git a/ydb/docs/en/core/concepts/datamodel/_includes/index.md b/ydb/docs/en/core/concepts/datamodel/_includes/index.md new file mode 100644 index 00000000000..4c707f88a7b --- /dev/null +++ b/ydb/docs/en/core/concepts/datamodel/_includes/index.md @@ -0,0 +1,7 @@ +# Data model and schema + +This section describes the entities that {{ ydb-short-name }} uses within DBs. The {{ ydb-short-name }} core lets you flexibly implement various storage primitives, so new entities may appear in the future. 
+ +* [Directory](../dir.md) +* [Table](../table.md) +* [Topic](../../topic.md) diff --git a/ydb/docs/en/core/concepts/datamodel/_includes/table.md b/ydb/docs/en/core/concepts/datamodel/_includes/table.md new file mode 100644 index 00000000000..9573c5695af --- /dev/null +++ b/ydb/docs/en/core/concepts/datamodel/_includes/table.md @@ -0,0 +1,157 @@ +# Table + +A table in {{ ydb-short-name }} is a [relational table](https://en.wikipedia.org/wiki/Table_(database)) containing a set of related data and made up of rows and columns. Each row is a set of cells that are used for storing specific types of values according to the data schema. The data schema defines the names and types of table columns. An example of the data schema is shown below. The `Series` table consists of four columns named `SeriesId`, `ReleaseDate`, `SeriesInfo`, and `Title` and holding data of type `Uint64?` for the first two and `String?` for the latter two. The `SeriesId` column is declared the primary key. + + + +{{ ydb-short-name }} uses [YQL](../../datatypes.md) data types. [Simple YQL data types](../../../yql/reference/types/primitive.md) can be used as column types. All columns are [optional](../../../yql/reference/types/optional.md) by default and can be assigned `NULL` values. When creating a table, you can set `NOT NULL` for columns included into the primary key. Such columns won't accept NULL in this case. + +{{ ydb-short-name }} tables always have one or more columns that make up the key ([primary key](https://en.wikipedia.org/wiki/Unique_key)). Each table row has a unique key value, so there can be no more than one row per key value. {{ ydb-short-name }} tables are always ordered by key. This means that you can efficiently make point reads by key and range-based queries by key or key prefix (actually using an index). In the example above, the key columns are highlighted in gray and marked with a special sign. Tables consisting only of key columns are supported. 
However, you can't create tables without a primary key. + +Often, when you design a table schema, you already have a set of fields that can naturally serve as the primary key. Be careful when selecting the key to avoid hotspots. For example, if you insert data into a table with a monotonically increasing key, you write the data to the end of the table. But since {{ ydb-short-name }} splits table data by key range, your inserts are always processed by the same server, so you lose the main benefits of a distributed database. To distribute the load evenly across different servers and to avoid hotspots when processing large tables, we recommend hashing the natural key and using the hash as the first component of the primary key, or changing the order of the primary key components. + +## Partitioning {#partitioning} + +A database table can be sharded by primary key value ranges. Each shard of the table is responsible for a specific range of primary keys. Key ranges maintained by different shards do not overlap. Different table shards can be served by different distributed database servers (including ones in different locations). They can also move independently between servers to enable rebalancing or ensure shard operability if servers or network equipment goes offline. + +If there is not a lot of data or load, the table may consist of a single shard. As the amount of data served by the shard or the load on the shard grows, {{ ydb-short-name }} automatically splits this shard into two shards. The data is split by the median value of the primary key if the shard size exceeds the threshold. If partitioning by load is used, the shard first collects a sample of the requested keys (that can be read, written, and deleted) and, based on this sample, selects a key for partitioning to evenly distribute the load across new shards. So in the case of load-based partitioning, the size of new shards may significantly vary.
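The hotspot-avoidance recommendation above (hash the natural key and lead the primary key with the hash) can be sketched in YQL. The table and the `Digest::NumericHash` call are illustrative assumptions; check the YQL reference for the exact hash functions available in your cluster:

```sql
-- Natural key: (user_id, event_time). To spread inserts evenly across
-- partitions, a hash of user_id leads the primary key (illustrative sketch).
CREATE TABLE user_events (
    user_hash Uint64,      -- e.g. filled as Digest::NumericHash(user_id) on insert
    user_id Uint64,
    event_time Timestamp,
    payload Utf8,
    PRIMARY KEY (user_hash, user_id, event_time)
);
```

With a monotonically increasing `event_time` alone, all inserts would land on the "last" partition; leading with the hash distributes them across key ranges.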
+ +The size-based shard split threshold and automatic splitting can be configured (enabled/disabled) individually for each database table. + +In addition to automatically splitting shards, you can create an empty table with a predefined number of shards. You can manually set the exact shard key split range or evenly split it into a predefined number of shards. In this case, ranges are created based on the first component of the primary key. You can set even splitting for tables that have a Uint64 or Uint32 integer as the first component of the primary key. + +Partitioning parameters refer to the table itself rather than to secondary indexes built from its data. Each index is served by its own set of shards and decisions to split or merge its partitions are made independently based on the default settings. These settings may become user-configurable in the future, like the settings of the main table. + +A split or a merge usually takes about 500 milliseconds. During this time, the data involved in the operation becomes temporarily unavailable for reads and writes. Special wrapper methods in the {{ ydb-short-name }} SDK automatically retry a query when they discover that a shard is being split or merged, without surfacing the error to the application. Please note that if the system is overloaded for some reason (for example, due to a general shortage of CPU or insufficient DB disk throughput), split and merge operations may take longer. + +The following table partitioning parameters are defined in the data schema: + +#### AUTO_PARTITIONING_BY_SIZE + +* Type: `Enum` (`ENABLED`, `DISABLED`). +* Default value: `ENABLED`. + +Automatic partitioning by partition size. If a partition size exceeds the value specified by the [AUTO_PARTITIONING_PARTITION_SIZE_MB](#auto_partitioning_partition_size_mb) parameter, it is enqueued for splitting.
If the total size of two or more adjacent partitions is less than 50% of the [AUTO_PARTITIONING_PARTITION_SIZE_MB](#auto_partitioning_partition_size_mb) value, they are enqueued for merging. + +#### AUTO_PARTITIONING_BY_LOAD + +* Type: `Enum` (`ENABLED`, `DISABLED`). +* Default value: `DISABLED`. + +Automatic partitioning by load. If a shard consumes more than 50% of the CPU for a few dozen seconds, it is enqueued for splitting. If the total load on two or more adjacent shards uses less than 35% of a single CPU core within an hour, they are enqueued for merging. + +Performing split or merge operations uses the CPU and takes time. Therefore, when dealing with a variable load, we recommend both enabling this mode and setting [AUTO_PARTITIONING_MIN_PARTITIONS_COUNT](#auto_partitioning_min_partitions_count) to a value other than 1 so that the number of partitions doesn't drop below this value as the load decreases, and YDB doesn't have to split them again when the load returns. + +When choosing the minimum number of partitions, it makes sense to consider that one table partition can only be hosted on one server and use no more than 1 CPU core for data update operations. Hence, you can set the minimum number of partitions for a table on which a high load is expected to at least the number of nodes (servers) or, preferably, to the number of CPU cores allocated to the database. + +#### AUTO_PARTITIONING_PARTITION_SIZE_MB + +* Type: `Uint64`. +* Default value: `2000 MB` (`2 GB`). + +Partition size threshold in MB. If exceeded, a shard splits. Takes effect when [AUTO_PARTITIONING_BY_SIZE](#auto_partitioning_by_size) is enabled. + +#### AUTO_PARTITIONING_MIN_PARTITIONS_COUNT + +* Type: `Uint64`. +* Default value: `1`. + +Partitions are only merged if their actual number exceeds the value specified by this parameter.
When using automatic partitioning by load, we recommend that you set this parameter to a value other than 1, so that periodic load drops don't lead to a decrease in the number of partitions below the required level. + +#### AUTO_PARTITIONING_MAX_PARTITIONS_COUNT + +* Type: `Uint64`. +* Default value: `50`. + +Partitions are only split if their number doesn't exceed the value specified by this parameter. With any automatic partitioning mode enabled, we recommend that you set a meaningful value for this parameter and monitor when the actual number of partitions approaches this value, otherwise partition splitting will sooner or later stop as data or load grows, which will lead to a failure. + +#### UNIFORM_PARTITIONS + +* Type: `Uint64`. +* Default value: Not applicable + +The number of partitions for uniform initial table partitioning. The primary key's first column must have type `Uint64` or `Uint32`. A created table is immediately divided into the specified number of partitions. + +When automatic partitioning is enabled, make sure to set a correct value for [AUTO_PARTITIONING_MIN_PARTITIONS_COUNT](#auto_partitioning_min_partitions_count) so as not to merge all partitions into one immediately after creating the table. + +#### PARTITION_AT_KEYS + +* Type: `Expression`. +* Default value: Not applicable + +Boundary values of keys for initial table partitioning. It's a list of boundary values separated by commas and surrounded with brackets. Each boundary value can be either a set of values of key columns (also separated by commas and surrounded with brackets) or a single value if only the values of the first key column are specified. Examples: `(100, 1000)`, `((100, "abc"), (1000, "cde"))`. + +When automatic partitioning is enabled, make sure to set a correct value for [AUTO_PARTITIONING_MIN_PARTITIONS_COUNT](#auto_partitioning_min_partitions_count) so as not to merge all partitions into one immediately after creating the table.
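The partitioning parameters above can be combined in a single `WITH` clause at table creation. A hedged sketch; the table name and the specific values are illustrative, not taken from the patched docs:

```sql
CREATE TABLE orders_sharded (
    order_hash Uint64,     -- Uint64 first key column, as required by UNIFORM_PARTITIONS
    order_id Uint64,
    payload Utf8,
    PRIMARY KEY (order_hash, order_id)
)
WITH (
    AUTO_PARTITIONING_BY_SIZE = ENABLED,
    AUTO_PARTITIONING_PARTITION_SIZE_MB = 2000,
    AUTO_PARTITIONING_BY_LOAD = ENABLED,
    -- Keep a floor under the partition count so load drops don't trigger
    -- merges that later have to be undone by splits.
    AUTO_PARTITIONING_MIN_PARTITIONS_COUNT = 8,
    AUTO_PARTITIONING_MAX_PARTITIONS_COUNT = 64,
    -- Initial uniform split by the first primary key component.
    UNIFORM_PARTITIONS = 8
);
```

Note that setting `AUTO_PARTITIONING_MIN_PARTITIONS_COUNT` alongside `UNIFORM_PARTITIONS` follows the caution in the text: without it, the initial partitions could be merged back into one right after creation.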
+ +## Reading data from replicas {#read_only_replicas} + +When making queries in {{ ydb-short-name }}, the actual execution of a query to each shard is performed at a single point serving the distributed transaction protocol. By storing data in shared storage, you can run one or more shard followers without allocating additional storage space: the data is already stored in replicated format, and you can serve more than one reader (but there is still only one writer at any given moment). + +Reading data from followers allows you: + +* To serve clients demanding minimal delay, which is otherwise unachievable in a multi-DC cluster. This is accomplished by executing queries soon after they are formulated, which eliminates the delay associated with inter-DC transfers. As a result, you can both preserve all the storage reliability guarantees of a multi-DC cluster and respond to occasional read queries in milliseconds. +* To handle read queries from followers without affecting modifying queries running on a shard. This can be useful both for isolating different scenarios and for increasing the partition bandwidth. +* To ensure continued service when moving a partition leader (both in a planned manner for load balancing and in an emergency). This lets cluster processes continue without affecting the reading clients. +* To increase the overall shard read performance if many read queries access the same keys. + +You can enable running read replicas for each shard of the table in the table data schema. The read replicas (followers) are typically accessed without leaving the data center network, which ensures response delays in milliseconds.
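A sketch of enabling followers for an existing table with the `READ_REPLICAS_SETTINGS` parameter described in the table that follows; `my_table` is an assumed name:

```sql
-- One read replica (follower) per availability zone for each shard.
ALTER TABLE my_table SET (READ_REPLICAS_SETTINGS = "PER_AZ:1");
```

Reads from these followers then use the stale read-only transaction mode mentioned below.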
+ +| Parameter name | Description | Type | Acceptable values | Update possibility | Reset capability | +| ------------- | --------- | --- | ------------------- | --------------------- | ------------------ | +| `READ_REPLICAS_SETTINGS` | `PER_AZ` means using the specified number of replicas in each AZ and `ANY_AZ` in all AZs in total. | String | `"PER_AZ:<count>"`, `"ANY_AZ:<count>"`, where `<count>` is the number of replicas | Yes | No | + +The internal state of each of the followers is restored exactly and fully consistently from the leader state. + +Besides the data state in storage, followers also receive a stream of updates from the leader. Updates are sent in real time, immediately after the commit to the log. However, they are sent asynchronously, resulting in some delay (usually no more than dozens of milliseconds, but sometimes longer in the event of cluster connectivity issues) in applying updates to followers relative to their commit on the leader. Therefore, reading data from followers is only supported in the `StaleReadOnly()` [transaction mode](../transactions#modes). + +If there are multiple followers, their delay from the leader may vary: although each follower of each of the shards retains internal consistency, artifacts may be observed from shard to shard. Please account for this in your application code. For that same reason, it's currently impossible to perform cross-shard transactions from followers. + +## Deleting expired data (TTL) {#ttl} + +{{ ydb-short-name }} supports automatic background deletion of expired data. A table data schema may define a column containing a `Datetime` or a `Timestamp` value. A comparison of this value with the current time for all rows will be performed in the background. Rows for which the current time becomes greater than the column value, factoring in the specified delay, will be deleted.
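A hedged sketch of setting the `TTL` expression described in the parameter table below; the table name and the `expire_at` column are hypothetical:

```sql
-- Rows are deleted one hour after the time stored in the `expire_at`
-- column (a Datetime or Timestamp column, per the text above).
ALTER TABLE my_table SET (TTL = Interval("PT1H") ON expire_at);
```

The interval literal uses the ISO 8601 duration format; `Interval("PT0S")` would delete rows as soon as the stored time passes.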
+ +| Parameter name | Type | Acceptable values | Update possibility | Reset capability | +| ------------- | --- | ------------------- | --------------------- | ------------------ | +| `TTL` | Expression | `Interval("<literal>") ON <column>` | Yes | Yes | + +For more information about deleting expired data, see [Time to Live (TTL)](../../../concepts/ttl.md). + +## Renaming {#rename} + +{{ ydb-short-name }} lets you rename an existing table, move it to another directory of the same database, or replace one table with another, deleting the data in the replaced table. These operations change only the table's metadata (for example, its path and name). The table data is neither moved nor overwritten. + +Operations are performed in isolation; an external process sees only two states of the table: before and after the operation. This is critical, for example, for table replacement: the data of the replaced table is deleted by the same transaction that renames the replacing table. During the replacement, queries to the replaced table might return errors with [retryable statuses](../../../reference/ydb-sdk/error_handling.md#termination-statuses). + +The speed of renaming is determined by the type of data transactions currently running against the table and doesn't depend on the table size. + +* [Renaming a table in YQL](../../../yql/reference/syntax/alter_table.md#rename) +* [Renaming a table via the CLI](../../../reference/ydb-cli/commands/tools/rename.md) + +## Bloom filter {#bloom-filter} + +With a [Bloom filter](https://en.wikipedia.org/wiki/Bloom_filter), you can more efficiently determine if some keys are missing in a table when making multiple single queries by the primary key. This reduces the number of required disk I/O operations but increases the amount of memory consumed.
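A sketch of toggling the Bloom filter parameter listed in the table below; `my_table` is an assumed name:

```sql
-- Speeds up point lookups for keys that are absent from the table,
-- at the cost of extra memory.
ALTER TABLE my_table SET (KEY_BLOOM_FILTER = ENABLED);
```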
+ +| Parameter name | Type | Acceptable values | Update possibility | Reset capability | +| ------------- | --- | ------------------- | --------------------- | ------------------ | +| `KEY_BLOOM_FILTER` | Enum | `ENABLED`, `DISABLED` | Yes | No | + +## Column groups {#column-groups} + +You can group table columns to optimize their storage and use in {{ ydb-short-name }}. Column grouping enables you to improve the performance of data selections by introducing grouped column storage. The most commonly used strategy is to create a separate column group for rarely used attributes (possibly also with compression and on slower storage). + +Each column group has a unique name within a table. You can set the composition of column groups when [creating a table](../../../yql/reference/syntax/create_table.md#column-family) and [change](../../../yql/reference/syntax/alter_table.md#column-family) it later. You cannot remove column groups from an existing table. + +A column group may contain any number of columns of its table, including none. Each table column can belong to a single column group (that is, column groups can't overlap). + +Each table has a `default` column group that includes all the columns that don't belong to any other column group. Primary-key columns are always in the default column group and can't be moved to another group. + +Column groups are assigned attributes that affect data storage: + +* The type of data storage device used (SSD or HDD, availability depends on the {{ ydb-short-name }} cluster configuration). +* Data compression mode (without compression or compression using the [LZ4](https://en.wikipedia.org/wiki/LZ4_(compression_algorithm)) algorithm). + +Attributes for a column group are set when creating a table (for example, they can be explicitly set for a default column group) and can be changed afterwards.
Changes in storage attributes aren't applied to the data immediately, but later, at manual or automatic LSM compaction. + +The data stored in the fields of the default column group is accessed faster: this requires less resources than accessing the fields from additional column groups of the same table row. When you search by the primary key, the default column group is always used. When accessing fields from other column groups, besides searching by the primary key, you need additional search operations to determine specific storage positions for these fields. + +This way, by creating a separate column group for certain table columns, you can accelerate read operations for the most important frequently used columns of the default column group by slightly slowing down access to other columns. Furthermore, at the column group level, you can control the data storage parameters: for example, you can select the storage device type and data compression mode. diff --git a/ydb/docs/en/core/concepts/datamodel/blockdevice.md b/ydb/docs/en/core/concepts/datamodel/blockdevice.md new file mode 100644 index 00000000000..8940c9e97df --- /dev/null +++ b/ydb/docs/en/core/concepts/datamodel/blockdevice.md @@ -0,0 +1 @@ +{% include [blockdevice.md](_includes/blockdevice.md) %}
\ No newline at end of file diff --git a/ydb/docs/en/core/concepts/datamodel/dir.md b/ydb/docs/en/core/concepts/datamodel/dir.md new file mode 100644 index 00000000000..965048e35c3 --- /dev/null +++ b/ydb/docs/en/core/concepts/datamodel/dir.md @@ -0,0 +1 @@ +{% include [dir.md](_includes/dir.md) %}
\ No newline at end of file diff --git a/ydb/docs/en/core/concepts/datamodel/index.md b/ydb/docs/en/core/concepts/datamodel/index.md new file mode 100644 index 00000000000..aabc0437429 --- /dev/null +++ b/ydb/docs/en/core/concepts/datamodel/index.md @@ -0,0 +1 @@ +{% include [index.md](_includes/index.md) %}
\ No newline at end of file diff --git a/ydb/docs/en/core/concepts/datamodel/table.md b/ydb/docs/en/core/concepts/datamodel/table.md new file mode 100644 index 00000000000..e2c74aeefbe --- /dev/null +++ b/ydb/docs/en/core/concepts/datamodel/table.md @@ -0,0 +1 @@ +{% include [table.md](_includes/table.md) %}
\ No newline at end of file diff --git a/ydb/docs/en/core/concepts/datamodel/toc_i.yaml b/ydb/docs/en/core/concepts/datamodel/toc_i.yaml new file mode 100644 index 00000000000..00dba1eb9f1 --- /dev/null +++ b/ydb/docs/en/core/concepts/datamodel/toc_i.yaml @@ -0,0 +1,5 @@ +items: +- { name: Overview, href: index.md } +- { name: Directory, href: dir.md } +- { name: Table, href: table.md } +- { name: Topic, href: ../topic.md } diff --git a/ydb/docs/en/core/concepts/datamodel/toc_p.yaml b/ydb/docs/en/core/concepts/datamodel/toc_p.yaml new file mode 100644 index 00000000000..5bfec4365de --- /dev/null +++ b/ydb/docs/en/core/concepts/datamodel/toc_p.yaml @@ -0,0 +1,2 @@ +items: +- include: { mode: link, path: toc_i.yaml }
\ No newline at end of file diff --git a/ydb/docs/en/core/concepts/toc_i.yaml b/ydb/docs/en/core/concepts/toc_i.yaml index 130c3394bdc..021b1428b45 100644 --- a/ydb/docs/en/core/concepts/toc_i.yaml +++ b/ydb/docs/en/core/concepts/toc_i.yaml @@ -4,11 +4,14 @@ items: - { name: Connecting to a database, href: connect.md } - name: Authentication href: auth.md -- { name: Data model and schema, href: datamodel.md } +- name: Data model and schema + include: { path: datamodel/toc_p.yaml, mode: link } +- { name: Topic, href: topic.md } - { name: Serverless and Dedicated operation modes, href: serverless_and_dedicated.md } - { name: Data types, href: datatypes.md, hidden: true } # Deprecated - { name: Transactions, href: transactions.md } - { name: Secondary indexes, href: secondary_indexes.md } +- { name: Change Data Capture (CDC), href: cdc.md, when: feature_changefeed } - { name: Time to Live (TTL), href: ttl.md } - { name: Scan queries, href: scan_query.md } - { name: Database limits, href: limits-ydb.md } diff --git a/ydb/docs/en/core/concepts/topic.md b/ydb/docs/en/core/concepts/topic.md new file mode 100644 index 00000000000..0ae1f257eeb --- /dev/null +++ b/ydb/docs/en/core/concepts/topic.md @@ -0,0 +1,142 @@ +# Topic + +A topic in {{ ydb-short-name }} is an entity for storing unstructured messages and delivering them to multiple subscribers. Basically, a topic is a named set of messages. + +A producer app writes messages to a topic. Consumer apps are independent of each other; they receive and read messages from the topic in the order they were written. Topics implement the [publish-subscribe]{% if lang == "en" %}(https://en.wikipedia.org/wiki/Publish–subscribe_pattern){% endif %}{% if lang == "ru" %}(https://ru.wikipedia.org/wiki/Издатель-подписчик_(шаблон_проектирования)){% endif %} architectural pattern. + +{{ ydb-short-name }} topics have the following properties: + +* At-least-once delivery guarantees when messages are read by subscribers.
+* Exactly-once delivery guarantees when publishing messages (to ensure there are no duplicate messages). +* [FIFO](https://en.wikipedia.org/wiki/Message_queue) message processing guarantees for messages published with the same [source ID](#producer-id). +* Message delivery bandwidth scaling for messages published with different source IDs. + +## Messages {#message} + +Data is transferred as message streams. A message is the minimum atomic unit of user information. A message consists of a body, attributes, and additional system properties. The content of a message is an array of bytes that {{ydb-short-name}} doesn't interpret in any way. + +Messages may contain user-defined attributes in "key-value" format. They are returned along with the message body when reading the message. User-defined attributes let the consumer decide whether it should process the message without unpacking the message body. Message attributes are set when initializing a write session. This means that all messages written within a single write session will have the same attributes when reading them. + +## Partitioning {#partitioning} + +To enable horizontal scaling, a topic is divided into `partitions` that are units of parallelism. Each partition has a limited bandwidth. The recommended write speed is 1 MBps. + +{% note info %} + +For now, you can reduce the number of partitions in a topic only by deleting and recreating the topic with a smaller number of partitions. + +{% endnote %} + +### Offset {#offset} + +All messages within a partition have a unique sequence number called an `offset`. Offsets increase monotonically as new messages are written. + +## Message sources and groups {#producer-id} + +Messages are ordered using the `producer_id` and `message_group_id`. The order of written messages is maintained within pairs: <producer ID, message group ID>.
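As a conceptual illustration (a plain-Python sketch of the behavior described above, not {{ ydb-short-name }} internals): pinning each new <producer ID, message group ID> pair to a partition round-robin preserves per-pair write order while spreading distinct pairs across partitions to scale write bandwidth.

```python
# Conceptual sketch only: models how a topic could pin each
# (producer_id, message_group_id) pair to one partition, so messages
# sharing a pair keep their write order, while distinct pairs are
# spread round-robin across partitions.
from itertools import cycle


class TopicSketch:
    def __init__(self, partitions_count: int) -> None:
        self.partitions = [[] for _ in range(partitions_count)]
        self._round_robin = cycle(range(partitions_count))
        # (producer_id, message_group_id) -> partition index
        self._assignment = {}

    def write(self, producer_id: str, message_group_id: str, payload: str) -> None:
        key = (producer_id, message_group_id)
        if key not in self._assignment:  # first use: assign round-robin
            self._assignment[key] = next(self._round_robin)
        self.partitions[self._assignment[key]].append(payload)


topic = TopicSketch(partitions_count=2)
for i in range(3):
    topic.write("billing-app-1", "account-42", f"tx-{i}")
topic.write("billing-app-2", "account-7", "tx-0")

# All "account-42" messages land in one partition, in write order:
print(topic.partitions[0])  # ['tx-0', 'tx-1', 'tx-2']
print(topic.partitions[1])  # ['tx-0']
```

The real service also expires the pair-to-partition link after 14 days without new messages, which this sketch omits.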
+ +When used for the first time, a pair of <producer ID, message group ID> is linked to a topic's [partition](#partitioning) using the round-robin algorithm, and all messages with this pair of IDs get into the same partition. The link is removed if no new messages arrive with this producer ID for 14 days. + +{% note warning %} + +We recommend using no more than 100 thousand <producer ID, message group ID> pairs per partition within the last 14 days. + +{% endnote %} + +{% cut "Why and when the message processing order is important" %} + +**When the message processing order is important** + +Let's consider a finance application that calculates the balance on a user's account and permits or prohibits debiting the funds. + +For such tasks, you can use a message queue. When you top up your account, debit funds, or make a purchase, a message with the account ID, amount, and transaction type is registered in the queue. The application processes incoming messages and calculates the balance. + +To accurately calculate the balance, the message processing order is crucial. If a user first tops up their account and then makes a purchase, messages with details about these transactions must be processed by the app in the same order. Otherwise, the business logic may go wrong and the app will reject the purchase due to insufficient funds. There are mechanisms that guarantee the delivery order, but they can't ensure message ordering within a single queue for an arbitrary amount of data. + +When several application instances read messages from a stream, a message about an account top-up can be received by one instance and a message about debiting by another. In this case, no single instance is guaranteed to have accurate balance information. To avoid this issue, you can, for example, save data in the DBMS, share information between application instances, and implement a distributed cache.
+ +{{ ydb-short-name }} can write data so that messages from one source (for example, about transactions from one account) arrive at the same application instance. The source of a message is identified by its `source_id`, while the sequence number of a message from the source is used to ensure there are no duplicate messages. {{ydb-short-name}} arranges data streams so that messages from the same source arrive at the same partition. As a result, transaction messages for a given account will always arrive at the same partition and be processed by the application instance linked to this partition. Each of the instances processes its own subset of partitions, and there's no need to synchronize the instances. + +**When the processing order is not important** + +For some tasks, the message processing order is not critical. For example, it's sometimes important to simply deliver data that will then be ordered by the storage system. + +Although message ordering is not important in this case, the protocol used for writing messages to a persistent queue requires that a source ID be specified. {{ydb-short-name}} remembers the link between the source ID and the partition that the message was written to. If source IDs are selected randomly, this will produce a great number of different source IDs and, hence, a large amount of stored data for the links between sources and partitions, which may overload {{ydb-short-name}}. + +{% note warning %} + +We strongly recommend that you don't use random or pseudo-random source IDs. We recommend using a maximum of 100 thousand different source IDs per partition. + +{% endnote %} + +{% endcut %} + +#### Source ID {#source-id} + +A source ID is an arbitrary string up to 2048 characters long. This is usually a server ID or some other identifier. + +#### Sample source IDs {#source-id-examples} + +| Type | ID | Description | +--- | --- | --- +| File | Server ID | Files are used to store application logs.
In this case, it's convenient to use the server ID as a source ID. | +| User actions | ID of the class of user actions, such as "viewing a page", "making a purchase", and so on. | It's important to handle user actions in the order they were performed by the user. At the same time, there is no need to handle every single user action in one application. In this case, it's convenient to group user actions by class. | + +### Message group ID {#group-id} + +A message group ID is an arbitrary string up to 2048 characters long. This is usually a file name or user ID. + +#### Sample message group IDs {#group-id-examples} + +| Type | ID | Description | +--- | --- | --- +| File | Full file path | All data from the server and the file it hosts will be sent to the same partition. | +| User actions | User ID | It's important to handle user actions in the order they were performed. In this case, it's convenient to use the user ID as a message group ID. | + +## Message sequence numbers {#seqno} + +All messages from the same source have a [`sequence number`](#seqno) used for their deduplication. A message sequence number should increase monotonically within a `topic`, `source` pair. If the server receives a message whose sequence number is less than or equal to the maximum number already written for the `topic`, `source` pair, the message will be skipped as a duplicate. Some sequence numbers in the sequence may be skipped. Message sequence numbers must be unique within the `topic`, `source` pair. + +### Sample message sequence numbers {#seqno-examples} + +| Type | Example | Description | +--- | --- | --- +| File | Offset of transferred data from the beginning of a file | You can't delete lines from the beginning of a file, since this will lead to skipping some data as duplicates or losing some data. | +| DB table | Auto-increment record ID | | + +## Message retention period { #retention-time } + +The message retention period is set for each topic.
After it expires, messages are automatically deleted. An exception is data that hasn't been read by an [important](#important-consumer) consumer: this data is stored until it's read. + +## Data compression { #message-codec } + +When transferring data, the producer app indicates that a message can be compressed using one of the supported codecs. The codec name is passed while writing a message, saved along with it, and returned when reading the message. Compression applies to each individual message; no batch message compression is supported. Data is compressed and decompressed on the producer and consumer side. + +Supported codecs are explicitly listed in each topic. An attempt to write data to a topic with a codec that is not supported results in a write error. + +| Codec | Description | +--- | --- +| `raw` | No compression. | +| `gzip` | [Gzip](https://en.wikipedia.org/wiki/Gzip) compression. | +{% if audience != "external" %} +| `lzop` | [lzop](https://en.wikipedia.org/wiki/Lzop) compression. | +{% endif %} +| `zstd` | [zstd](https://en.wikipedia.org/wiki/Zstd) compression. | + +## Consumer { #consumer } + +A consumer is a named entity that reads data from a topic. A consumer contains committed consumer offsets for each topic read on its behalf. + +### Consumer offset { #consumer-offset } + +A consumer offset is a saved [offset](#offset) of a consumer for each topic partition. It's saved by the consumer after committing the data it has read. When a new read session is established, messages are delivered to the consumer starting with the saved consumer offset. This lets users avoid saving the consumer offset on their end. + +### Important consumer { #important-consumer } + +A consumer may be flagged as "important". This flag indicates that messages in a topic won't be removed until the consumer reads and confirms them. You can set this flag for the most critical consumers that need to handle all data even if there's a long idle time.
+ +{% note warning %} + +A long idle time of an important consumer may cause unread messages to use up all the available free space, so be sure to monitor important consumers' read lags. + +{% endnote %} diff --git a/ydb/docs/en/core/faq/_includes/common.md b/ydb/docs/en/core/faq/_includes/common.md index f80e16ed010..231499bb6c8 100644 --- a/ydb/docs/en/core/faq/_includes/common.md +++ b/ydb/docs/en/core/faq/_includes/common.md @@ -22,7 +22,7 @@ To read data, {{ ydb-short-name }} uses a model of strict data consistency. To design a primary key properly, follow the rules below. -* Avoid situations where the main load falls on a single [partition](../../concepts/datamodel.md#partitioning) of a table. With even load distribution, it's easier to achieve high overall performance. +* Avoid situations where the main load falls on a single [partition](../../concepts/datamodel/table.md#partitioning) of a table. With even load distribution, it's easier to achieve high overall performance. This rule implies that you shouldn't use a monotonically increasing sequence, such as timestamp, as a table's primary key. diff --git a/ydb/docs/en/core/reference/ydb-cli/_includes/commands.md b/ydb/docs/en/core/reference/ydb-cli/_includes/commands.md index feac8b9e77c..5a82582d0cb 100644 --- a/ydb/docs/en/core/reference/ydb-cli/_includes/commands.md +++ b/ydb/docs/en/core/reference/ydb-cli/_includes/commands.md @@ -8,7 +8,7 @@ General syntax for calling {{ ydb-short-name }} CLI commands: , where: -- `{{ ydb-cli}}` is the command to run the {{ ydb-short-name }} CLI from the OS command line. +- `{{ ydb-cli}}` is the command to run the {{ ydb-short-name }} CLI from the OS command line. - `[global options]` are [global options](../commands/global-options.md) that are common for all {{ ydb-short-name }} CLI commands. - `<command>` is the command. - `[<subcommand> ...]` are subcommands specified if the selected command contains subcommands.
@@ -18,10 +18,10 @@ General syntax for calling {{ ydb-short-name }} CLI commands: You can learn about the necessary commands by selecting the subject section in the menu on the left or using the alphabetical list below. -Any command can be run from the command line with the `--help` option to get help on it. You can get a list of all supported {{ ydb-short-name }} CLI of commands by running the {{ ydb-short-name }} CLI with the `--help` option [with no command specified](../commands/service.md). +Any command can be run from the command line with the `--help` option to get help on it. You can get a list of all commands supported by the {{ ydb-short-name }} CLI by running the {{ ydb-short-name }} CLI with the `--help` option [without specifying any command](../commands/service.md). | Command / subcommand | Brief description | -| --- | --- | +--- | --- | [config profile activate](../profile/activate.md) | Activating a [profile](../profile/index.md) | | [config profile create](../profile/create.md) | Creating a [profile](../profile/index.md) | | [config profile delete](../profile/create.md) | Deleting a [profile](../profile/index.md) | @@ -35,9 +35,9 @@ Any command can be run from the command line with the `--help` option to get hel | [import file tsv](../export_import/import-file.md) | Importing data from a TSV file | | [import s3](../export_import/s3_import.md) | Importing data from S3 storage | | [init](../profile/create.md) | Initializing the CLI, creating a [profile](../profile/index.md) | -| operation cancel | Aborting a background operation | -| operation forget | Removing a background operation from history | -| operation get | Background operation status | +| operation cancel | Aborting background operations | +| operation forget | Deleting background operations from history | +| operation get | Status of background operations | | operation list | List of background operations | | [scheme describe](../commands/scheme-describe.md) | Description of a data schema
object | | [scheme ls](../commands/scheme-ls.md) | List of data schema objects | @@ -66,9 +66,16 @@ Any command can be run from the command line with the `--help` option to get hel | [tools dump](../export_import/tools_dump.md) | Dumping a directory or table to the file system | | [tools rename](../commands/tools/rename.md) | Renaming tables | | [tools restore](../export_import/tools_restore.md) | Restoring data from the file system | +| [topic create](../topic-create.md) | Creating a topic | +| [topic alter](../topic-alter.md) | Updating topic parameters and consumers | +| [topic drop](../topic-drop.md) | Deleting a topic | +| [topic consumer add](../topic-consumer-add.md) | Adding a consumer to a topic | +| [topic consumer drop](../topic-consumer-drop.md) | Deleting a consumer from a topic | +| [topic read](../topic-read.md) | Reading messages from a topic | +| [topic write](../topic-write.md) | Writing messages to a topic | {% if ydb-cli == "ydb" %} -[update](../commands/service.md) | Updating the {{ ydb-short-name }} CLI -[version](../commands/service.md) | Displaying the version of the {{ ydb-short-name }} CLI +[update](../commands/service.md) | Update the {{ ydb-short-name }} CLI +[version](../commands/service.md) | Output details about the {{ ydb-short-name }} CLI version {% endif %} -[workload](../commands/workload/index.md) | Generating YQL load | Running a YQL script (with streaming support) +[workload](../commands/workload/index.md) | Generating a YQL workload | Executing a YQL script (with streaming support) diff --git a/ydb/docs/en/core/reference/ydb-cli/commands/_includes/dir.md b/ydb/docs/en/core/reference/ydb-cli/commands/_includes/dir.md index bd19db854c8..029e0bf89b0 100644 --- a/ydb/docs/en/core/reference/ydb-cli/commands/_includes/dir.md +++ b/ydb/docs/en/core/reference/ydb-cli/commands/_includes/dir.md @@ -1,6 +1,6 @@ # Directories -The {{ ydb-short-name }} database maintains an internal hierarchical structure of
[directories](../../../../concepts/datamodel.md#dir) that can host database objects. +The {{ ydb-short-name }} database maintains an internal hierarchical structure of [directories](../../../../concepts/datamodel/dir.md) that can host database objects. {{ ydb-short-name }} CLI supports operations to change the directory structure and to access schema objects by their directory name. diff --git a/ydb/docs/en/core/reference/ydb-cli/commands/_includes/tools/rename.md b/ydb/docs/en/core/reference/ydb-cli/commands/_includes/tools/rename.md index ee82fd9b59e..b6eb2fca7db 100644 --- a/ydb/docs/en/core/reference/ydb-cli/commands/_includes/tools/rename.md +++ b/ydb/docs/en/core/reference/ydb-cli/commands/_includes/tools/rename.md @@ -1,6 +1,6 @@ # Renaming a table -Using the `tools rename` subcommand, you can [rename](../../../../../concepts/datamodel.md#rename) one or more tables at the same time, move a table to another directory within the same database, replace one table with another one within the same transaction. +Using the `tools rename` subcommand, you can [rename](../../../../../concepts/datamodel/table.md#rename) one or more tables at the same time, move a table to another directory within the same database, or replace one table with another within the same transaction.
General command format: diff --git a/ydb/docs/en/core/reference/ydb-cli/toc_i.yaml b/ydb/docs/en/core/reference/ydb-cli/toc_i.yaml index 93b858516a1..919da924171 100644 --- a/ydb/docs/en/core/reference/ydb-cli/toc_i.yaml +++ b/ydb/docs/en/core/reference/ydb-cli/toc_i.yaml @@ -1,7 +1,7 @@ items: - - name: Install + - name: Installation href: install.md - - name: Structure of YDB CLI commands + - name: All commands in alphabetical order href: commands.md - name: Service commands href: commands/service.md @@ -25,7 +25,7 @@ items: items: - name: Making a DB query href: commands/query.md - - name: Query execution plan + - name: Getting a query execution plan and AST href: commands/explain-plan.md - name: Streaming table reads href: commands/readtable.md @@ -33,6 +33,26 @@ items: href: commands/scan-query.md - name: Importing and exporting data include: { mode: link, path: export_import/toc_p.yaml } + - name: Working with topics + items: + - name: Commands for topics + href: topic-overview.md + - name: Creating a topic + href: topic-create.md + - name: Updating a topic + href: topic-alter.md + - name: Deleting a topic + href: topic-drop.md + - name: Adding a topic consumer + href: topic-consumer-add.md + - name: Deleting a topic consumer + href: topic-consumer-drop.md + - name: Reading messages from a topic + href: topic-read.md + - name: Writing messages to a topic + href: topic-write.md + - name: Message pipeline processing + href: topic-pipeline.md # - name: Utilities # items: # - name: Copy tables @@ -49,11 +69,13 @@ items: href: commands/discovery-list.md - name: Authentication href: commands/discovery-whoami.md - - name: YDB CLI version output + - name: Getting the YDB CLI version href: version.md - name: Load testing items: - name: Overview href: commands/workload/index.md - name: Stock load - href: commands/workload/stock.md
\ No newline at end of file + href: commands/workload/stock.md + + diff --git a/ydb/docs/en/core/reference/ydb-cli/topic-alter.md b/ydb/docs/en/core/reference/ydb-cli/topic-alter.md new file mode 100644 index 00000000000..4b3f0ce609f --- /dev/null +++ b/ydb/docs/en/core/reference/ydb-cli/topic-alter.md @@ -0,0 +1,56 @@ +# Updating a topic + +You can use the `topic alter` subcommand to update a [previously created](topic-create.md) topic. + +General format of the command: + +```bash +{{ ydb-cli }} [global options...] topic alter [options...] <topic-path> +``` + +* `global options`: [Global parameters](commands/global-options.md). +* `options`: [Parameters of the subcommand](#options). +* `topic-path`: Topic path. + +View the description of the update topic command: + +```bash +{{ ydb-cli }} topic alter --help +``` + +## Parameters of the subcommand {#options} + +The command changes the values of parameters specified in the command line. The other parameter values remain unchanged. + +| Name | Description | +---|--- +| `--partitions-count VAL` | The number of topic [partitions](../../concepts/topic.md#partitioning). You can only increase the number of partitions. | +| `--retention-period-hours VAL` | The retention period for topic data, in hours. | +| `--supported-codecs STRING` | Supported data compression methods. 
<br>Possible values:<ul><li>`RAW`: Without compression.</li><li>`ZSTD`: [zstd](https://ru.wikipedia.org/wiki/Zstandard) compression.</li><li>`GZIP`: [gzip](https://ru.wikipedia.org/wiki/Gzip) compression.</li><li>`LZOP`: [lzop](https://ru.wikipedia.org/wiki/Lzop) compression.</li></ul> | + +## Examples {#examples} + +{% include [ydb-cli-profile](../../_includes/ydb-cli-profile.md) %} + +Add a partition and the `lzop` compression method to the [previously created](topic-create.md) topic: + +```bash +{{ ydb-cli }} -p db1 topic alter \ + --partitions-count 3 \ + --supported-codecs raw,gzip,lzop \ + my-topic +``` + +Make sure that the topic parameters have been updated: + +```bash +{{ ydb-cli }} -p db1 scheme describe my-topic +``` + +Result: + +```text +RetentionPeriod: 2 hours +PartitionsCount: 3 +SupportedCodecs: RAW, GZIP, LZOP +``` diff --git a/ydb/docs/en/core/reference/ydb-cli/topic-consumer-add.md b/ydb/docs/en/core/reference/ydb-cli/topic-consumer-add.md new file mode 100644 index 00000000000..cb453f46249 --- /dev/null +++ b/ydb/docs/en/core/reference/ydb-cli/topic-consumer-add.md @@ -0,0 +1,60 @@ +# Adding a topic consumer + +You can use the `topic consumer add` command to add a consumer for a [previously created](topic-create.md) topic. + +General format of the command: + +```bash +{{ ydb-cli }} [global options...] topic consumer add [options...] <topic-path> +``` + +* `global options`: [Global parameters](commands/global-options.md). +* `options`: [Parameters of the subcommand](#options). +* `topic-path`: Topic path. + +View the description of the add consumer command: + +```bash +{{ ydb-cli }} topic consumer add --help +``` + +## Parameters of the subcommand {#options} + +| Name | Description | +---|--- +| `--consumer-name VAL` | Name of the consumer to be added. | +| `--starting-message-timestamp VAL` | Time in [UNIX timestamp](https://en.wikipedia.org/wiki/Unix_time) format. 
Consumption starts as soon as the first [message](../../concepts/topic.md#message) is received after the specified time. If the time is not specified, consumption will start from the oldest message in the topic. | + +## Examples {#examples} + +{% include [ydb-cli-profile](../../_includes/ydb-cli-profile.md) %} + +Create a consumer with the `my-consumer` name for the [previously created](topic-create.md) `my-topic` topic. Consumption will start as soon as the first message is received after August 15, 2022 13:00:00 GMT: + +```bash +{{ ydb-cli }} -p db1 topic consumer add \ + --consumer-name my-consumer \ + --starting-message-timestamp 1660568400 \ + my-topic +``` + +Make sure the consumer was created: + +```bash +{{ ydb-cli }} -p db1 scheme describe my-topic +``` + +Result: + +```text +RetentionPeriod: 2 hours +PartitionsCount: 2 +SupportedCodecs: RAW, GZIP + +Consumers: +┌──────────────┬─────────────────┬───────────────────────────────┬───────────┐ +| ConsumerName | SupportedCodecs | ReadFrom | Important | +├──────────────┼─────────────────┼───────────────────────────────┼───────────┤ +| my-consumer | RAW, GZIP | Mon, 15 Aug 2022 16:00:00 MSK | 0 | +└──────────────┴─────────────────┴───────────────────────────────┴───────────┘ +``` diff --git a/ydb/docs/en/core/reference/ydb-cli/topic-consumer-drop.md b/ydb/docs/en/core/reference/ydb-cli/topic-consumer-drop.md new file mode 100644 index 00000000000..bc230dcb1e4 --- /dev/null +++ b/ydb/docs/en/core/reference/ydb-cli/topic-consumer-drop.md @@ -0,0 +1,37 @@ +# Deleting a topic consumer + +You can use the `topic consumer drop` command to delete a [previously added](topic-consumer-add.md) consumer. + +General format of the command: + +```bash +{{ ydb-cli }} [global options...] topic consumer drop [options...] <topic-path> +``` + +* `global options`: [Global parameters](commands/global-options.md). +* `options`: [Parameters of the subcommand](#options). +* `topic-path`: Topic path. 
+ +View the description of the delete consumer command: + +```bash +{{ ydb-cli }} topic consumer drop --help +``` + +## Parameters of the subcommand {#options} + +| Name | Description | +---|--- +| `--consumer-name VAL` | Name of the consumer to be deleted. | + +## Examples {#examples} + +{% include [ydb-cli-profile](../../_includes/ydb-cli-profile.md) %} + +Delete the [previously added](topic-consumer-add.md) consumer named `my-consumer` for the `my-topic` topic: + +```bash +{{ ydb-cli }} -p db1 topic consumer drop \ + --consumer-name my-consumer \ + my-topic +``` diff --git a/ydb/docs/en/core/reference/ydb-cli/topic-create.md b/ydb/docs/en/core/reference/ydb-cli/topic-create.md new file mode 100644 index 00000000000..061dab9f763 --- /dev/null +++ b/ydb/docs/en/core/reference/ydb-cli/topic-create.md @@ -0,0 +1,55 @@ +# Creating a topic + +You can use the `topic create` subcommand to create a new topic. + +General format of the command: + +```bash +{{ ydb-cli }} [global options...] topic create [options...] <topic-path> +``` + +* `global options`: [Global parameters](commands/global-options.md). +* `options`: [Parameters of the subcommand](#options). +* `topic-path`: Topic path. + +View the description of the create topic command: + +```bash +{{ ydb-cli }} topic create --help +``` + +## Parameters of the subcommand {#options} + +| Name | Description | +---|--- +| `--partitions-count VAL` | The number of topic [partitions](../../concepts/topic.md#partitioning).<br>The default value is `1`. | +| `--retention-period-hours VAL` | Data retention time in a topic, set in hours.<br>The default value is `18`.
| +| `--supported-codecs STRING` | Supported data compression methods.<br>The default value is `raw,zstd,gzip,lzop`.<br>Possible values:<ul><li>`RAW`: Without compression.</li><li>`ZSTD`: [zstd](https://ru.wikipedia.org/wiki/Zstandard) compression.</li><li>`GZIP`: [gzip](https://ru.wikipedia.org/wiki/Gzip) compression.</li><li>`LZOP`: [lzop](https://ru.wikipedia.org/wiki/Lzop) compression.</li></ul> | + +## Examples {#examples} + +{% include [ydb-cli-profile](../../_includes/ydb-cli-profile.md) %} + +Create a topic with 2 partitions, `RAW` and `GZIP` compression methods, message retention time of 2 hours, and the `my-topic` path: + +```bash +{{ ydb-cli }} -p db1 topic create \ + --partitions-count 2 \ + --supported-codecs raw,gzip \ + --retention-period-hours 2 \ + my-topic +``` + +View parameters of the created topic: + +```bash +{{ ydb-cli }} -p db1 scheme describe my-topic +``` + +Result: + +```text +RetentionPeriod: 2 hours +PartitionsCount: 2 +SupportedCodecs: RAW, GZIP +``` diff --git a/ydb/docs/en/core/reference/ydb-cli/topic-drop.md b/ydb/docs/en/core/reference/ydb-cli/topic-drop.md new file mode 100644 index 00000000000..e3bce7884f3 --- /dev/null +++ b/ydb/docs/en/core/reference/ydb-cli/topic-drop.md @@ -0,0 +1,34 @@ +# Deleting a topic + +You can use the `topic drop` subcommand to delete a [previously created](topic-create.md) topic. + +{% note info %} + +Deleting a topic also deletes all the consumers added for it. + +{% endnote %} + +General format of the command: + +```bash +{{ ydb-cli }} [global options...] topic drop <topic-path> +``` + +* `global options`: [Global parameters](commands/global-options.md). +* `topic-path`: Topic path.
+ +View the description of the delete topic command: + +```bash +{{ ydb-cli }} topic drop --help +``` + +## Examples {#examples} + +{% include [ydb-cli-profile](../../_includes/ydb-cli-profile.md) %} + +Delete the [previously created](topic-create.md) topic: + +```bash +{{ ydb-cli }} -p db1 topic drop my-topic +``` diff --git a/ydb/docs/en/core/reference/ydb-cli/topic-overview.md b/ydb/docs/en/core/reference/ydb-cli/topic-overview.md new file mode 100644 index 00000000000..f6cb2d50382 --- /dev/null +++ b/ydb/docs/en/core/reference/ydb-cli/topic-overview.md @@ -0,0 +1,11 @@ +# Commands for topics + +Using {{ ydb-short-name }} CLI commands, you can perform the following operations: + +* [{#T}](topic-create.md). +* [{#T}](topic-alter.md). +* [{#T}](topic-drop.md). +* [{#T}](topic-consumer-add.md). +* [{#T}](topic-consumer-drop.md). +* [{#T}](topic-read.md). +* [{#T}](topic-write.md). diff --git a/ydb/docs/en/core/reference/ydb-cli/topic-pipeline.md b/ydb/docs/en/core/reference/ydb-cli/topic-pipeline.md new file mode 100644 index 00000000000..618f7a6625e --- /dev/null +++ b/ydb/docs/en/core/reference/ydb-cli/topic-pipeline.md @@ -0,0 +1,40 @@ +# Message pipeline processing + +The use of the `topic read` and `topic write` commands with standard I/O devices and support for reading messages in streaming mode lets you build full-featured integration scenarios with message transfer across topics and their conversion. This section describes a number of these scenarios. + +{% include [ydb-cli-profile](../../_includes/ydb-cli-profile.md) %} + +* Transferring a single message from `topic1` in `db1` to `topic2` in `db2`, waiting for it to appear in the source topic + + ```bash + {{ ydb-cli }} -p db1 topic read topic1 -c c1 -w | {{ ydb-cli }} -p db2 topic write topic2 + ``` + +* Transferring all one-line messages that appear in `topic1` in `db1` to `topic2` in `db2` in background mode. 
You can use this scenario if it's guaranteed that there are no `0x0A` bytes (newline) in source messages. + + ```bash + {{ ydb-cli }} -p db1 topic read topic1 -c c1 --format newline-delimited -w | \ + {{ ydb-cli }} -p db2 topic write topic2 --format newline-delimited + ``` + +* Transferring an exact binary copy of all messages that appear in `topic1` in `db1` to `topic2` in `db2` in background mode with base64-encoding of messages in the transfer stream. + + ```bash + {{ ydb-cli }} -p db1 topic read topic1 -c c1 --format newline-delimited -w --transform base64 | \ + {{ ydb-cli }} -p db2 topic write topic2 --format newline-delimited --transform base64 + ``` + +* Transferring a limited batch of one-line messages filtered by the `ERROR` substring + + ```bash + {{ ydb-cli }} -p db1 topic read topic1 -c c1 --format newline-delimited | \ + grep ERROR | \ + {{ ydb-cli }} -p db2 topic write topic2 --format newline-delimited + ``` + +* Writing YQL query results as messages to `topic1` + + ```bash + {{ ydb-cli }} -p db1 yql -s "select * from series" --format json-unicode | \ + {{ ydb-cli }} -p db1 topic write topic1 --format newline-delimited + ``` diff --git a/ydb/docs/en/core/reference/ydb-cli/topic-read.md b/ydb/docs/en/core/reference/ydb-cli/topic-read.md new file mode 100644 index 00000000000..606a8ca547e --- /dev/null +++ b/ydb/docs/en/core/reference/ydb-cli/topic-read.md @@ -0,0 +1,117 @@ +# Reading messages from a topic + +The `topic read` command reads messages from a topic and outputs them to a file or the command-line terminal: + +```bash +{{ ydb-cli }} [connection options] topic read <topic-path> --consumer-name STR \ + [--format STR] [--wait] [--limit INT] \ + [--transform STR] [--file STR] [--commit BOOL] \ + [additional parameters...] +``` + +{% include [conn_options_ref.md](commands/_includes/conn_options_ref.md) %} + +Three command modes are supported: + +1. **Single message**. No more than one message is read from a topic. +2. **Batch mode**. 
Messages are read from the topic either until it runs out of messages to process or until the number of messages read reaches the set limit.
+3. **Streaming mode**. Messages are read from the topic as they appear, with the command waiting for new messages, until you terminate it with `Ctrl+C` or the number of messages read reaches the optional limit.
+
+## Parameters {#options}
+
+### Required parameters
+
+| Name | Description |
+---|---
+| `<topic-path>` | Topic path |
+| `-c VAL`, `--consumer-name VAL` | Topic consumer name.<br>Message consumption starts from the current offset for this consumer (if the `--timestamp` parameter is not specified).<br>The current offset is shifted as messages are consumed and output (if `--commit=false` is not set). |
+
+### Basic optional parameters
+
+`--format STR`: Output format.
+
+- Specifies how to format messages at the output. Some formats don't support streaming mode.
+- List of supported formats:
+
+  | Name | Description | Is<br>streaming mode supported? |
+  ---|---|---
+  | `single-message`<br>(default) | The contents of no more than one message are output without formatting. | - |
+  | `pretty` | Output to a pseudo-graphic table with columns containing message metadata. The message itself is output to the `body` column. | No |
+  | `newline-delimited` | Messages are output with a delimiter (`0x0A` newline character) added after each message. | Yes |
+  | `concatenated` | Messages are output one after another with no delimiter added. | Yes |
+
+`--wait` (`-w`): Waiting for new messages to arrive.
+
+- Enables waiting for the first message to appear in a topic. If not set and the topic has no messages to handle, the command terminates right after it starts. If the flag is set, the command waits for the first message to arrive for processing.
+- Enables streaming selection mode for the formats that support it; otherwise, batch mode is used.
+ +`--limit INT`: The maximum number of messages that can be consumed from a topic. + +- The default and acceptable values depend on the selected output format: + + | Does the format<br>support streaming selection mode? | Default limit value | Acceptable values | + ---|---|--- + | No | 10 | 1-500 | + | Yes | 0 (no limit) | 0-500 | + +`--transform VAL`: Method for transforming messages. + +- Defaults to `none`. +- Possible values: + `base64`: A message is transformed into [Base64](https://ru.wikipedia.org/wiki/Base64) + `none`: The contents of a message are output byte by byte without transforming them. + +`--file VAL` (`-f VAL`): Write the messages read to the specified file. If not set, messages are output to `stdout`. + +`--commit BOOL`: Commit message reads. + +1. If `true` (by default), a consumer's current offset is shifted as topic messages are consumed. +2. Possible values: `true` or `false`. + +### Other optional parameters + +| Name | Description | +---|--- +| `--idle-timeout VAL` | Timeout for deciding if a topic is empty, meaning that it contains no messages for processing. <br>The time is counted from the point when a connection is established once the command is run or when the last message is received. If no new messages arrive from the server during the specified timeout, the topic is considered to be empty.<br>Defaults to `1s` (1 second). | +| `--timestamp VAL` | Message consumption starts from the point in time specified in [UNIX timestamp](https://en.wikipedia.org/wiki/Unix_time) format.<br>If not set, messages are consumed starting from the consumer's current offset in the topic.<br>If set, consumption starts from the first [message](../../concepts/topic.md#message) received after the specified time. | +| `--with-metadata-fields VAL` | List of [message attributes](../../concepts/topic.md#message) whose values should be output in columns with metadata in `pretty` format. If not set, columns with all attributes are output. 
<br>Possible values:<ul><li>`write_time`: The time a message is written to the server in [UNIX timestamp](https://en.wikipedia.org/wiki/Unix_time) format.</li><li>`meta`: Message metadata.</li><li>`create_time`: The time a message is created by the source in [UNIX timestamp](https://en.wikipedia.org/wiki/Unix_time) format.</li><li>`seq_no`: Message [sequence number](../../concepts/topic.md#seqno).</li><li>`offset`: [Message sequence number within a partition](../../concepts/topic.md#offset).</li><li>`message_group_id`: [Message group ID](../../concepts/topic.md#producer-id).</li><li>`body`: Message body.</li></ul> | + +## Examples {#examples} + +{% include [ydb-cli-profile](../../_includes/ydb-cli-profile.md) %} + +In all the examples below, a topic named `topic1` and a consumer named `c1` are used. + +* Reading a single message with output to the terminal: If the topic doesn't contain new messages for this consumer, the command terminates with no data output: + ```bash + {{ ydb-cli }} -p db1 topic read topic1 -c c1 + ``` + +* Waiting for and reading a single message written to a file named `message.bin`. The command keeps running until new messages appear in the topic for this consumer. However, you can terminate it with `Ctrl+C`: + ```bash + {{ ydb-cli }} -p db1 topic read topic1 -c c1 -w -f message.bin + ``` + +* Viewing information about messages waiting to be handled by the consumer without committing them. Up to 10 first messages are output: + ```bash + {{ ydb-cli }} -p db1 topic read topic1 -c c1 --format pretty --commit false + ``` + +* Output messages to the terminal as they appear, using newline delimiter characters and transforming messages into Base64. 
The command will be running until you terminate it with `Ctrl+C`: + ```bash + {{ ydb-cli }} -p db1 topic read topic1 -c c1 -w --format newline-delimited --transform base64 + ``` + +* Track when new messages with the `ERROR` text appear in the topic and output them to the terminal once they arrive: + ```bash + {{ ydb-cli }} -p db1 topic read topic1 -c c1 --format newline-delimited -w | grep ERROR + ``` + +* Receive another non-empty batch of no more than 150 messages transformed into base64, delimited with newline characters, and written to the `batch.txt` file: + ```bash + {{ ydb-cli }} -p db1 topic read topic1 -c c1 \ + --format newline-delimited -w --limit 150 \ + --transform base64 -f batch.txt + ``` + +* [Examples of YDB CLI command integration](topic-pipeline.md) diff --git a/ydb/docs/en/core/reference/ydb-cli/topic-write.md b/ydb/docs/en/core/reference/ydb-cli/topic-write.md new file mode 100644 index 00000000000..449545e84c5 --- /dev/null +++ b/ydb/docs/en/core/reference/ydb-cli/topic-write.md @@ -0,0 +1,70 @@ +# Writing messages to a topic + +The `topic write` command writes messages to a topic from a file or `stdin`: + +```bash +{{ ydb-cli }} [connection options] topic write <topic-path> \ + [--file STR] [--format STR] [--transform STR] \ + [additional parameters...] +``` + +{% include [conn_options_ref.md](commands/_includes/conn_options_ref.md) %} + +## Parameters {#options} + +### Basic parameters + +`<topic-path>`: Topic path, the only required parameter. + +`--file VAL` (`-f VAL`): Read a stream of incoming messages and write them to a topic from the specified file. If not set, messages are read from `stdin`. + +`--format STR`: Format of the incoming message stream. +* Supported formats + + | Name | Description | + ---|--- + | `single-message`<br>(default) | The entire input stream is treated as a single message to be written to the topic. 
|
+| `newline-delimited` | A stream at the input contains multiple messages delimited with the `0x0A` newline character. |
+
+`--transform VAL`: Method for transforming messages.
+
+- Defaults to `none`.
+- Possible values:
+  `base64`: Decode each message in the input stream from [Base64](https://ru.wikipedia.org/wiki/Base64) and write the output to the topic. If decoding fails, the command is aborted with an error.
+  `none`: Write the contents of a message from the input stream to the topic byte by byte without transforming them.
+
+### Additional parameters
+
+| Name | Description |
+---|---
+| `--delimiter STR` | Delimiter byte. The input stream is delimited into messages with the specified byte. Specified only if no `--format` is set. Specified as an escaped string. |
+| `--message-group-id STR` | Message group string ID. If not set, all messages generated from the input stream are assigned the same ID value as a hexadecimal string representation of a random three-byte integer. |
+| `--codec STR` | Codec used for message compression on the client before sending them to the server. Possible values: `RAW` (no compression, default), `GZIP`, and `ZSTD`. Compression causes higher CPU utilization on the client when reading and writing messages, but usually lets you reduce the volume of data transferred over the network and stored. When consumers read messages, they're automatically decompressed with the codec used when writing them, without specifying any special options. Make sure the specified codec is listed in the [topic parameters](topic-create.md#options) as supported. |
+
+## Examples {#examples}
+
+{% include [ydb-cli-profile](../../_includes/ydb-cli-profile.md) %}
+
+All the examples given below use a topic named `topic1`.
+
+* Writing terminal input as a single message. Once the command is run, you can type any multi-line text and press `Ctrl+D` to submit it.
+ ```bash + {{ ydb-cli }} -p db1 topic write topic1 + ``` + +* Writing the contents of the `message.bin` file to a single message compressed with the GZIP codec + ```bash + {{ ydb-cli }} -p db1 topic write topic1 -f message.bin --codec GZIP + ``` + +* Writing the contents of the `example.txt` file delimited into messages line by line + ```bash + {{ ydb-cli }} -p db1 topic write topic1 -f example.txt --format newline-delimited + ``` + +* Writing a resource downloaded via HTTP and delimited into messages with tab characters + ```bash + curl http://example.com/resource | {{ ydb-cli }} -p db1 topic write topic1 --delimiter "\t" + ``` + +* [Examples of YDB CLI command integration](topic-pipeline.md) diff --git a/ydb/docs/en/core/reference/ydb-cli/topic.md b/ydb/docs/en/core/reference/ydb-cli/topic.md new file mode 100644 index 00000000000..4e597bf2c63 --- /dev/null +++ b/ydb/docs/en/core/reference/ydb-cli/topic.md @@ -0,0 +1,293 @@ +# Working with topics + +You can use the `topic` subcommand to create, update, or delete a [topic](../../concepts/topic.md) as well as to create or delete a [consumer](../../concepts/topic.md#consumer). + +The examples use the `db1` profile. To learn more, see [{#T}](../../getting_started/cli.md#profile). + +## Creating a topic {#topic-create} + +You can use the `topic create` subcommand to create a new topic. + +General format of the command: + +```bash +{{ ydb-cli }} [global options...] topic create [options...] <topic-path> +``` + +* `global options`: [Global parameters](commands/global-options.md). +* `options`: [Parameters of the subcommand](#options). +* `topic-path`: Topic path. + +View the description of the create topic command: + +```bash +{{ ydb-cli }} topic create --help +``` + +### Parameters of the subcommand {#topic-create-options} + +| Name | Description | +---|--- +| `--partitions-count VAL` | The number of topic [partitions](../../concepts/topic.md#partitioning).<br>The default value is `1`. 
| +| `--retention-period-hours VAL` | Data retention time in a topic, set in hours.<br>The default value is `18`. | +| `--supported-codecs STRING` | Supported data compression methods.<br>The default value is `raw,zstd,gzip,lzop`.<br>Possible values:<ul><li>`RAW`: Without compression.</li><li>`ZSTD`: [zstd](https://ru.wikipedia.org/wiki/Zstandard) compression.</li><li>`GZIP`: [gzip](https://ru.wikipedia.org/wiki/Gzip) compression.</li><li>`LZOP`: [lzop](https://ru.wikipedia.org/wiki/Lzop) compression.</li></ul> | + +### Examples {#topic-create-examples} + +Create a topic with 2 partitions, `RAW` and `GZIP` compression methods, message retention time of 2 hours, and the `my-topic` path: + +```bash +{{ ydb-cli }} -p db1 topic create \ + --partitions-count 2 \ + --supported-codecs raw,gzip \ + --retention-period-hours 2 \ + my-topic +``` + +View parameters of the created topic: + +```bash +{{ ydb-cli }} -p db1 scheme describe my-topic +``` + +Result: + +```text +RetentionPeriod: 2 hours +PartitionsCount: 2 +SupportedCodecs: RAW, GZIP +``` + +## Updating a topic {#topic-alter} + +You can use the `topic alter` subcommand to update a [previously created](#topic-create) topic. + +General format of the command: + +```bash +{{ ydb-cli }} [global options...] topic alter [options...] <topic-path> +``` + +* `global options`: [Global parameters](commands/global-options.md). +* `options`: [Parameters of the subcommand](#options). +* `topic-path`: Topic path. + +View the description of the update topic command: + +```bash +{{ ydb-cli }} topic alter --help +``` + +### Parameters of the subcommand {#topic-alter-options} + +| Name | Description | +---|--- +| `--partitions-count VAL` | The number of topic [partitions](../../concepts/topic.md#partitioning).<br>The default value is `1`. | +| `--retention-period-hours VAL` | Data retention time in a topic, set in hours.<br>The default value is `18`. 
| +| `--supported-codecs STRING` | Supported data compression methods.<br>The default value is `raw,zstd,gzip,lzop`.<br>Possible values:<ul><li>`RAW`: Without compression.</li><li>`ZSTD`: [zstd](https://ru.wikipedia.org/wiki/Zstandard) compression.</li><li>`GZIP`: [gzip](https://ru.wikipedia.org/wiki/Gzip) compression.</li><li>`LZOP`: [lzop](https://ru.wikipedia.org/wiki/Lzop) compression.</li></ul> | + +### Examples {#topic-alter-examples} + +Add a partition and the `lzop` compression method to the [previously created](#topic-create) topic: + +```bash +{{ ydb-cli }} -p db1 topic alter \ + --partitions-count 3 \ + --supported-codecs raw,gzip,lzop \ + my-topic +``` + +Make sure that the topic parameters have been updated: + +```bash +{{ ydb-cli }} -p db1 scheme describe my-topic +``` + +Result: + +```text +RetentionPeriod: 2 hours +PartitionsCount: 3 +SupportedCodecs: RAW, GZIP, LZOP +``` + +## Deleting a topic {#topic-drop} + +You can use the `topic drop` subcommand to delete a [previously created](#topic-create) topic. + +General format of the command: + +```bash +{{ ydb-cli }} [global options...] topic drop <topic-path> +``` + +* `global options`: [Global parameters](commands/global-options.md). +* `topic-path`: Topic path. + +View the description of the delete topic command: + +```bash +{{ ydb-cli }} topic drop --help +``` + +### Examples {#topic-drop-examples} + +Delete the [previously created](#topic-create) topic: + +```bash +{{ ydb-cli }} -p db1 topic drop my-topic +``` + +## Adding a consumer for a topic {#consumer-add} + +You can use the `topic consumer add` subcommand to create a consumer for a [previously created](#topic-create) topic. + +General format of the command: + +```bash +{{ ydb-cli }} [global options...] topic consumer add [options...] <topic-path> +``` + +* `global options`: [Global parameters](commands/global-options.md). +* `options`: [Parameters of the subcommand](#options). +* `topic-path`: Topic path. 
+ +View the description of the add consumer command: + +```bash +{{ ydb-cli }} topic consumer add --help +``` + +### Parameters of the subcommand {#consumer-add-options} + +| Name | Description | +---|--- +| `--consumer-name VAL` | Name of the consumer to be created. | +| `--starting-message-timestamp VAL` | Time in [UNIX timestamp](https://en.wikipedia.org/wiki/Unix_time) format. Consumption starts as soon as the first [message](../../concepts/topic.md#message) is received after the specified time. If the time is not specified, consumption will start from the oldest message in the topic. | + +### Examples {#consumer-add-examples} + +Create a consumer with the `my-consumer` name for the [previously created](#topic-create) `my-topic` topic. Consumption will start as soon as the first message is received after August 15, 2022 13:00:00 GMT: + +```bash +{{ ydb-cli }} -p db1 topic consumer add \ + --consumer-name my-consumer \ + --starting-message-timestamp 1660568400 \ + my-topic +``` + +Make sure the consumer was created: + +```bash +{{ ydb-cli }} -p db1 scheme describe my-topic +``` + +Result: + +```text +RetentionPeriod: 2 hours +PartitionsCount: 2 +SupportedCodecs: RAW, GZIP + +Consumers: +┌──────────────┬─────────────────┬───────────────────────────────┬───────────┐ +| ConsumerName | SupportedCodecs | ReadFrom | Important | +├──────────────┼─────────────────┼───────────────────────────────┼───────────┤ +| my-consumer | RAW, GZIP | Mon, 15 Aug 2022 16:00:00 MSK | 0 | +└──────────────┴─────────────────┴───────────────────────────────┴───────────┘ +``` + +## Deleting a consumer {#consumer-drop} + +You can use the `topic consumer drop` subcommand to delete a [previously created](#consumer-add) consumer. + +General format of the command: + +```bash +{{ ydb-cli }} [global options...] topic consumer drop [options...] <topic-path> +``` + +* `global options`: [Global parameters](commands/global-options.md). +* `options`: [Parameters of the subcommand](#options). 
+* `topic-path`: Topic path.
+
+View the description of the delete consumer command:
+
+```bash
+{{ ydb-cli }} topic consumer drop --help
+```
+
+### Parameters of the subcommand {#consumer-drop-options}
+
+| Name | Description |
+---|---
+| `--consumer-name VAL` | Name of the consumer to be deleted. |
+
+### Examples {#consumer-drop-examples}
+
+Delete the [previously created](#consumer-add) consumer with the `my-consumer` name for the `my-topic` topic:
+
+```bash
+{{ ydb-cli }} -p db1 topic consumer drop \
+  --consumer-name my-consumer \
+  my-topic
+```
+
+## Reading data from a topic {#topic-read}
+
+Use the `topic read` subcommand to read messages from a topic.
+
+Before reading, [create a topic](#topic-create) and [add a consumer](#consumer-add).
+
+General format of the command:
+
+```bash
+{{ ydb-cli }} [global options...] topic read [options...] <topic-path>
+```
+
+* `global options`: [Global parameters](commands/global-options.md).
+* `options`: [Parameters of the subcommand](#topic-read-options).
+* `topic-path`: Topic path.
+
+View the description of the topic read command:
+
+```bash
+{{ ydb-cli }} topic read --help
+```
+
+### Parameters of the subcommand {#topic-read-options}
+
+| Name | Description |
+---|---
+| `-c VAL`, `--consumer-name VAL` | Topic consumer name. |
+| `--format STRING` | Result format.<br>Possible values:<ul><li>`pretty`: Result is printed to a pseudo-graphic table.</li><li>`newline-delimited`: The `0x0A` control character is printed at the end of each message.</li><li>`concatenated`: Result is printed without separators.</li></ul> |
+| `-f VAL`, `--file VAL` | Write the data read to the specified file.<br>If the parameter is not specified, messages are printed to `stdout`. |
+| `--idle-timeout VAL` | Maximum waiting time for a new message.<br>If no messages are received during the waiting time, reading stops.<br>The default value is `1s` (1 second).
|
+| `--commit VAL` | Send confirmation of message processing.<br>The default value is `true`.<br>Possible values: `true`, `false`. |
+| `--limit VAL` | The number of messages to be read.<br>The default value is `0` (no limits). |
+| `-w`, `--wait` | Endless wait for the first message.<br>If the parameter is not specified, the command waits for the first message for no more than `--idle-timeout`. |
+| `--timestamp VAL` | Time in [UNIX timestamp](https://en.wikipedia.org/wiki/Unix_time) format. Consumption starts as soon as the first [message](../../concepts/topic.md#message) is received after the specified time. |
+| `--with-metadata-fields VAL` | A list of [message attributes](../../concepts/topic.md#message) whose values are to be printed.<br>Possible values:<ul><li>`write_time`: The time a message is written to the server in [UNIX timestamp](https://en.wikipedia.org/wiki/Unix_time) format.</li><li>`meta`: Message metadata.</li><li>`create_time`: The time a message is created by the source in [UNIX timestamp](https://en.wikipedia.org/wiki/Unix_time) format.</li><li>`seq_no`: Message [sequence number](../../concepts/topic.md#seqno).</li><li>`offset`: [Message sequence number within a partition](../../concepts/topic.md#offset).</li><li>`message_group_id`: [Message group ID](../../concepts/topic.md#producer-id).</li><li>`body`: Message body.</li></ul> |
+| `--transform VAL` | Method for transforming the message body.<br>The default value is `none`.<br>Possible values:<ul><li>`base64`: Convert to [Base64](https://en.wikipedia.org/wiki/Base64).</li><li>`none`: Do not convert.</li></ul> |
+
+### Examples {#topic-read-examples}
+
+Read all messages from the `my-topic` topic through the `my-consumer` consumer and print each of them on a separate line:
+
+```bash
+{{ ydb-cli }} topic read \
+  --consumer-name my-consumer \
+  --format newline-delimited \
+  my-topic
+```
+
+The following command will read the first 10 messages from the `my-topic` topic through the 
`my-consumer` consumer and print each of them on a separate line. Each message body will be converted to Base64 before output:
+
+```bash
+{{ ydb-cli }} topic read \
+  --consumer-name my-consumer \
+  --format newline-delimited \
+  --limit 10 \
+  --transform base64 \
+  my-topic
+```
diff --git a/ydb/docs/en/core/reference/ydb-sdk/overview-grpc-api.md b/ydb/docs/en/core/reference/ydb-sdk/overview-grpc-api.md
index b1b1eaa08fb..286bc6e904e 100644
--- a/ydb/docs/en/core/reference/ydb-sdk/overview-grpc-api.md
+++ b/ydb/docs/en/core/reference/ydb-sdk/overview-grpc-api.md
@@ -1,6 +1,6 @@
 # gRPC API overview
 
-{{ ydb-short-name }} provides the gRPC API, which you can use to manage your DB [resources](../../concepts/datamodel.md) and data. API methods and data structures are described using [Protocol Buffers](https://developers.google.com/protocol-buffers/docs/proto3) (proto 3). For more information, see [.proto specifications with comments](https://github.com/ydb-platform/ydb-api-protos).
+{{ ydb-short-name }} provides the gRPC API, which you can use to manage your DB [resources](../../concepts/datamodel/index.md) and data. API methods and data structures are described using [Protocol Buffers](https://developers.google.com/protocol-buffers/docs/proto3) (proto 3). For more information, see [.proto specifications with comments](https://github.com/ydb-platform/ydb-api-protos).
 
 The following services are available:
diff --git a/ydb/docs/en/core/reference/ydb-sdk/topic.md b/ydb/docs/en/core/reference/ydb-sdk/topic.md
new file mode 100644
index 00000000000..ad4d6a3ff67
--- /dev/null
+++ b/ydb/docs/en/core/reference/ydb-sdk/topic.md
@@ -0,0 +1,248 @@
+# Working with topics
+
+This article provides examples of how to use the {{ ydb-short-name }} SDK to work with [topics](../../concepts/topic.md).
+
+Before performing the examples, [create a topic](../ydb-cli/topic-create.md) and [add a consumer](../ydb-cli/topic-consumer-add.md).
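+
+As a minimal preparation sketch, the two prerequisite steps above can be performed with the CLI commands referenced there (the `my-topic` and `my-consumer` names and the `db1` profile are illustrative and match the examples below):
+
+```bash
+{{ ydb-cli }} -p db1 topic create my-topic
+{{ ydb-cli }} -p db1 topic consumer add --consumer-name my-consumer my-topic
+```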
+ +## Connecting to a topic {#start-reader} + +To create a connection to the existing `my-topic` topic via the added `my-consumer` consumer, use the following code: + +{% list tabs %} + +- Go + + ```go + reader, err := db.Topic().StartReader("my-consumer", topicoptions.ReadTopic("my-topic")) + if err != nil { + return err + } + ``` + +{% endlist %} + +You can also use the advanced connection creation option to specify multiple topics and set reading parameters. The following code will create a connection to the `my-topic` and `my-specific-topic` topics via the `my-consumer` consumer and also set the time to start reading messages from: + +{% list tabs %} + +- Go + + ```go + reader, err := db.Topic().StartReader("my-consumer", []topicoptions.ReadSelector{ + { + Path: "my-topic", + }, + { + Path: "my-specific-topic", + ReadFrom: time.Date(2022, 7, 1, 10, 15, 0, 0, time.UTC), + }, + }, + ) + if err != nil { + return err + } + ``` + +{% endlist %} + +## Reading messages {#reading-messages} + +The server stores the [consumer offset](../../concepts/topic.md#consumer-offset). After reading a message, the client can [send a processing confirmation to the server](#commit). The consumer offset will change and only unconfirmed messages will be read in case of a new connection. + +You can read messages without a [processing confirmation](#no-commit) as well. In this case, all unconfirmed messages, including those processed, will be read if there is a new connection. + +Information about which messages have already been processed can be [saved on the client side](#client-commit) by sending the starting consumer offset to the server when creating a new connection. This does not change the consumer offset on the server. + +The SDK receives data from the server in batches and buffers it. Depending on the task, the client code can read messages from the buffer one by one or in batches. 
+
+### Reading without a message processing confirmation {#no-commit}
+
+To read messages one by one, use the following code:
+
+{% list tabs %}
+
+- Go
+
+  ```go
+  func SimpleReadMessages(ctx context.Context, r *topicreader.Reader) error {
+      for {
+          mess, err := r.ReadMessage(ctx)
+          if err != nil {
+              return err
+          }
+          processMessage(mess)
+      }
+  }
+  ```
+
+{% endlist %}
+
+To read message batches, use the following code:
+
+{% list tabs %}
+
+- Go
+
+  ```go
+  func SimpleReadBatches(ctx context.Context, r *topicreader.Reader) error {
+      for {
+          batch, err := r.ReadMessageBatch(ctx)
+          if err != nil {
+              return err
+          }
+          processBatch(batch)
+      }
+  }
+  ```
+
+{% endlist %}
+
+### Reading with a message processing confirmation {#commit}
+
+To confirm the processing of messages one by one, use the following code:
+
+{% list tabs %}
+
+- Go
+
+  ```go
+  func SimpleReadMessages(ctx context.Context, r *topicreader.Reader) error {
+      for {
+          mess, err := r.ReadMessage(ctx)
+          if err != nil {
+              return err
+          }
+          processMessage(mess)
+          r.Commit(mess.Context(), mess)
+      }
+  }
+  ```
+
+{% endlist %}
+
+To confirm the processing of message batches, use the following code:
+
+{% list tabs %}
+
+- Go
+
+  ```go
+  func SimpleReadMessageBatch(ctx context.Context, r *topicreader.Reader) error {
+      for {
+          batch, err := r.ReadMessageBatch(ctx)
+          if err != nil {
+              return err
+          }
+          processBatch(batch)
+          r.Commit(batch.Context(), batch)
+      }
+  }
+  ```
+
+{% endlist %}
+
+### Reading with consumer offset storage on the client side {#client-commit}
+
+When reading starts, the client code must transmit the starting consumer offset to the server:
+
+{% list tabs %}
+
+- Go
+
+  ```go
+  func ReadWithExplicitPartitionStartStopHandlerAndOwnReadProgressStorage(ctx context.Context, db ydb.Connection) error {
+      readContext, stopReader := context.WithCancel(context.Background())
+      defer stopReader()
+
+      readStartPosition := func(
+          ctx context.Context,
+          req topicoptions.GetPartitionStartOffsetRequest,
+      ) (res topicoptions.GetPartitionStartOffsetResponse, err error) {
+          offset, err := readLastOffsetFromDB(ctx, req.Topic, req.PartitionID)
+          res.StartFrom(offset)
+
+          // The reader will stop if err != nil is returned
+          return res, err
+      }
+
+      r, err := db.Topic().StartReader("my-consumer", topicoptions.ReadTopic("my-topic"),
+          topicoptions.WithGetPartitionStartOffset(readStartPosition),
+      )
+      if err != nil {
+          return err
+      }
+
+      go func() {
+          <-readContext.Done()
+          _ = r.Close(ctx)
+      }()
+
+      for {
+          batch, err := r.ReadMessageBatch(readContext)
+          if err != nil {
+              return err
+          }
+
+          processBatch(batch)
+          _ = externalSystemCommit(batch.Context(), batch.Topic(), batch.PartitionID(), batch.EndOffset())
+      }
+  }
+  ```
+
+{% endlist %}
+
+## Processing a server read interrupt {#stop}
+
+{{ ydb-short-name }} uses server-side partition balancing between clients. This means that the server can interrupt the reading of messages from random partitions.
+
+In case of a _soft interruption_, the client receives a notification that the server has finished sending messages from the partition and they will no longer be read. The client can finish processing the messages and send a confirmation to the server.
+
+In case of a _hard interruption_, the client receives a notification that it is no longer possible to work with the partition. The client must stop processing the read messages. Unconfirmed messages will be transferred to another consumer.
+
+### Soft reading interruption {#soft-stop}
+
+{% list tabs %}
+
+- Go
+
+  The client code immediately receives all messages from the buffer (on the SDK side), even if there are not enough of them to form a batch during batch processing.
+
+  ```go
+  r, _ := db.Topic().StartReader("my-consumer", nil,
+      topicoptions.WithBatchReadMinCount(1000),
+  )
+
+  for {
+      batch, _ := r.ReadMessageBatch(ctx) // <- on a partition soft stop, the batch may contain fewer than 1000 messages
+      processBatch(batch)
+      _ = r.Commit(batch.Context(), batch)
+  }
+  ```
+
+{% endlist %}
+
+### Hard reading interruption {#hard-stop}
+
+{% list tabs %}
+
+- Go
+
+  When reading is interrupted, the message or message batch context is canceled.
+
+  ```go
+  ctx := batch.Context() // batch.Context() will be canceled if the partition is revoked by the server or the connection breaks
+  if len(batch.Messages) == 0 {
+      return
+  }
+
+  buf := &bytes.Buffer{}
+  for _, mess := range batch.Messages {
+      buf.Reset()
+      _, _ = buf.ReadFrom(mess) // each message is read exactly once
+      writeMessagesToDB(ctx, buf.Bytes())
+  }
+  ```
+
+{% endlist %}
diff --git a/ydb/docs/en/core/yql/reference/yql-core/syntax/_includes/alter_table.md b/ydb/docs/en/core/yql/reference/yql-core/syntax/_includes/alter_table.md
index f57d319135d..85a69ea2bdb 100644
--- a/ydb/docs/en/core/yql/reference/yql-core/syntax/_includes/alter_table.md
+++ b/ydb/docs/en/core/yql/reference/yql-core/syntax/_includes/alter_table.md
@@ -12,7 +12,7 @@ ALTER TABLE table_name action1, action2, ..., actionN;
 
 {{ backend_name }} lets you add columns to a table and delete non-key columns from it.
 
-```ADD COLUMN```: Adds a column with the specified name and type. The code below adds the ```is_deleted``` column with the ```Bool data``` type to the ```episodes``` table.
+```ADD COLUMN```: Adds a column with the specified name and type. The code below adds the ```is_deleted``` column with the ```Bool``` data type to the ```episodes``` table.
 ```sql
 ALTER TABLE episodes ADD COLUMN is_deleted Bool;
@@ -44,7 +44,44 @@ Deleting an index:
 ALTER TABLE `series` DROP INDEX `title_index`;
 ```
 
+You can also add or remove a secondary index using the {{ ydb-short-name }} CLI [table index](https://ydb.tech/en/docs/reference/ydb-cli/commands/secondary_index) command.
+
+{% endif %}
+
+{% if feature_changefeed %}
+
+## Adding and deleting a changefeed {#changefeed}
+
+`ADD CHANGEFEED <name> WITH (option = value[, ...])` adds a [changefeed](../../../../concepts/cdc) with the specified name and parameters.
+
+### Changefeed parameters {#changefeed-options}
+
+* `MODE`: Operation mode. Specifies what exactly is written to the changefeed each time the table data is altered.
+  * `KEYS_ONLY`: Only the primary key components and the change flag are written.
+  * `UPDATES`: Updated column values that result from updates are written.
+  * `NEW_IMAGE`: Any column values resulting from updates are written.
+  * `OLD_IMAGE`: Any column values before updates are written.
+  * `NEW_AND_OLD_IMAGES`: A combination of the `NEW_IMAGE` and `OLD_IMAGE` modes. Any column values _prior to_ and _resulting from_ updates are written.
+* `FORMAT`: Data write format.
+  * `JSON`: The record structure is described on the [changefeed description](../../../../concepts/cdc#record-structure) page.
+
+The code below adds a changefeed named `updates_feed`, where the values of updated table columns will be exported in JSON format:
+
+```sql
+ALTER TABLE `series` ADD CHANGEFEED `updates_feed` WITH (
+    FORMAT = 'JSON',
+    MODE = 'UPDATES'
+);
+```
+
+`DROP CHANGEFEED`: Deletes the changefeed with the specified name. The code below deletes the `updates_feed` changefeed:
+
+```sql
+ALTER TABLE `series` DROP CHANGEFEED `updates_feed`;
+```
+
 {% endif %}
+
 {% if feature_map_tables %}
 
 ## Renaming a table {#rename}
 
@@ -97,11 +134,18 @@ Using the ```ALTER FAMILY``` command, you can change the parameters of the colum
 ALTER TABLE series_with_families ALTER FAMILY default SET DATA "hdd";
 ```
 
-You can specify any column family parameters from the [`CREATE TABLE`](create_table#column-family) command.
+{% note info %}
+
+Available types of storage devices depend on the {{ ydb-short-name }} cluster configuration.
+
+{% endnote %}
+
+You can specify any column group parameters from the [`CREATE TABLE`](create_table#column-family) command.
+
 ## Changing additional table parameters {#additional-alter}
 
-Most of the table parameters in YDB described on the [table description]({{ concept_table }}) page can be changed with the ```ALTER``` command.
+Most of the table parameters in YDB specified on the [table description]({{ concept_table }}) page can be changed with the ```ALTER``` command.
 
 In general, the command to change any table parameter looks like this:
 
@@ -134,6 +178,4 @@ For example, this command resets (deletes) TTL settings for the table:
 ```sql
 ALTER TABLE series RESET (TTL);
 ```
-
 {% endif %}
-
diff --git a/ydb/docs/en/core/yql/reference/yql-core/syntax/_includes/create_table.md b/ydb/docs/en/core/yql/reference/yql-core/syntax/_includes/create_table.md
index 33bd86b8253..429102c139c 100644
--- a/ydb/docs/en/core/yql/reference/yql-core/syntax/_includes/create_table.md
+++ b/ydb/docs/en/core/yql/reference/yql-core/syntax/_includes/create_table.md
@@ -10,7 +10,7 @@ The `CREATE TABLE` call creates a {% if concept_table %}[table]({{ concept_table
 CREATE TABLE table_name (
     column1 type1,
-{% if feature_not_null == true %}    column2 type2 NOT NULL,{% else %}    column2 type2,{% endif %}
+{% if feature_not_null == true %}    column2 type2 NOT NULL,{% else %}    column2 type2,{% endif %}
     ...
     columnN typeN,
{% if feature_secondary_index == true %}
@@ -18,8 +18,8 @@ The `CREATE TABLE` call creates a {% if concept_table %}[table]({{ concept_table
     INDEX index2_name GLOBAL ON ( column1, column2, ... ),
 {% endif %}
 {% if feature_map_tables %}
-    PRIMARY KEY (column, ...),
-    FAMILY column_family ()
+    PRIMARY KEY ( column, ... ),
+    FAMILY column_family ( family_options, ... )
 {% else %}
     ...
 {% endif %}
@@ -33,13 +33,15 @@ The `CREATE TABLE` call creates a {% if concept_table %}[table]({{ concept_table
 {% if feature_column_container_type == true %}
 In non-key columns, you can use any data types, but for key columns, only [primitive ones](../../types/primitive.md). When specifying complex types (for example, `List<String>`), the type is enclosed in double quotes.
 {% else %}
-For key columns and non-key columns, only [primitive](../../types/primitive.md) data types are allowed. {% endif %}
+For both key and non-key columns, you can only use [primitive](../../types/primitive.md) data types.
+{% endif %}
 
 {% if feature_not_null == true %}
-Without additional modifiers, the column is assigned the [optional type](../../types/optional.md) and can accept `NULL` values. To create a non-optional type, use `NOT NULL`.
+Without additional modifiers, a column gets an [optional type](../../types/optional.md) and allows `NULL` values to be written. To create a non-optional type, use `NOT NULL`.
 {% else %}
+
 {% if feature_not_null_for_pk %}
-By default, all columns are [optional](../../types/optional.md) and can accept `NULL` values. `NOT NULL` constraint is supported only for primary keys.
+All columns are [optional](../../types/optional.md) by default and can be assigned `NULL` values. The `NOT NULL` constraint can only be specified for columns that are part of the primary key.
 {% else %}
 All columns allow writing `NULL` values, that is, they are [optional](../../types/optional.md).
 {% endif %}
@@ -48,6 +50,7 @@ All columns allow writing `NULL` values, that is, they are [optional](../../type
 It is mandatory to specify the `PRIMARY KEY` with a non-empty list of columns. Those columns become part of the key in the listed order.
 {% endif %}
+
 **Example**
 
     CREATE TABLE my_table (
@@ -60,14 +63,15 @@ It is mandatory to specify the `PRIMARY KEY` with a non-empty list of columns. T
 {% endif %}
 )
 
-{% if feature_secondary_index %}
+
+{% if feature_secondary_index %}
 ## Secondary indexes {#secondary_index}
 
 The INDEX construct is used to define a {% if concept_secondary_index %}[secondary index]({{ concept_secondary_index }}){% else %}secondary index{% endif %} in a table:
 
 ```sql
-CREATE TABLE table_name (
+CREATE TABLE table_name (
     ...
     INDEX <Index_name> GLOBAL [SYNC|ASYNC] ON ( <Index_columns> ) COVER ( <Cover_columns> ),
     ...
@@ -75,7 +79,6 @@ CREATE TABLE table_name (
 ```
 
 where:
-
 * **Index_name** is the unique name of the index to be used to access data.
 * **SYNC/ASYNC** indicates synchronous/asynchronous data writes to the index. If not specified, synchronous.
 * **Index_columns** is a list of comma-separated names of columns in the created table to be used for a search in the index.
@@ -94,11 +97,9 @@ CREATE TABLE my_table (
     PRIMARY KEY (a)
 )
 ```
-
 {% endif %}
 
 {% if feature_map_tables and concept_table %}
-
 ## Additional parameters {#additional}
 
 You can also specify a number of {{ backend_name }}-specific parameters for the table. When creating a table using YQL, such parameters are listed in the ```WITH``` section:
 
@@ -114,7 +115,7 @@ WITH (
 
 Here, key is the name of the parameter and value is its value.
 
-For a list of possible parameter names and their values, see [{{ backend_name }} table description]({{ concept_table }}).
+For a list of possible parameter names and their values, see the [{{ backend_name }} table description]({{ concept_table }}).
 For example, this code will create a table with automatic partitioning by partition size enabled and a preferred partition size of 512 MB:
 
@@ -136,12 +137,12 @@ WITH (
 
 Columns of the same table can be grouped to set the following parameters:
 
-* `DATA`: A storage type for the data in this column group. Acceptable values: ```ssd```, ```hdd```.
+* `DATA`: A storage device type for the data in this column group. Acceptable values: ```ssd```, ```hdd```.
 * `COMPRESSION`: A data compression codec. Acceptable values: ```off```, ```lz4```.
 
 By default, all columns are in the same group named ```default```. If necessary, the parameters of this group can also be redefined.
 
-In the example below, for the created table, the ```family_large``` group of columns is added and set for the ```series_info``` column, and the parameters for the ```default``` group, which is set by default for all other columns, are also redefined.
+In the example below, for the created table, the ```family_large``` group of columns is added and set for the ```series_info``` column, and the parameters of the ```default``` group, which applies by default to all other columns, are also redefined.
 
 ```sql
 CREATE TABLE series_with_families (
@@ -161,7 +162,14 @@ CREATE TABLE series_with_families (
 );
 ```
 
-{% endif %}
+{% note info %}
+
+Available types of storage devices depend on the {{ ydb-short-name }} cluster configuration.
+
+{% endnote %}
 
 {% endif %}
+
+
+{% endif %}
\ No newline at end of file
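The `CREATE TABLE` and `ALTER TABLE` constructs documented in the patch above compose naturally. The sketch below is illustrative only: the `series_demo` table and its columns are invented for this example and are not part of the patch, and it assumes a {{ ydb-short-name }} backend where secondary indexes, column families, and changefeeds are all enabled:

```sql
-- Illustrative only: combines the constructs described in the patch.
-- A table with a secondary index, a column family, and size-based auto-partitioning.
CREATE TABLE series_demo (
    series_id Uint64,
    title Utf8,
    series_info Utf8 FAMILY family_large,
    INDEX title_index GLOBAL ON (title),
    PRIMARY KEY (series_id),
    FAMILY family_large (
        DATA = "hdd",
        COMPRESSION = "lz4"
    )
)
WITH (
    AUTO_PARTITIONING_BY_SIZE = ENABLED,
    AUTO_PARTITIONING_PARTITION_SIZE_MB = 512
);

-- Add a changefeed that exports updated column values in JSON format.
ALTER TABLE series_demo ADD CHANGEFEED updates_feed WITH (
    FORMAT = 'JSON',
    MODE = 'UPDATES'
);
```

Whether the `hdd` storage type is accepted depends on the cluster configuration, as the note added by the patch points out.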