diff options
author | pseudolukian <[email protected]> | 2023-10-04 12:37:55 +0300 |
---|---|---|
committer | pseudolukian <[email protected]> | 2023-10-04 13:21:51 +0300 |
commit | f195b66080a41cc1c0683bc2ddaa0fc44345227d (patch) | |
tree | ee93b234a9155330c18b95ae3a81a6acdfd8ed87 | |
parent | 9612d0efbc1b15eb7424b03733876d090ec11a21 (diff) |
I merged two articles about tables and corrected the related YQL article.
3 files changed, 119 insertions, 71 deletions
diff --git a/ydb/docs/en/core/concepts/datamodel/_includes/table.md b/ydb/docs/en/core/concepts/datamodel/_includes/table.md index 12b1f474e8e..e6a402e435b 100644 --- a/ydb/docs/en/core/concepts/datamodel/_includes/table.md +++ b/ydb/docs/en/core/concepts/datamodel/_includes/table.md @@ -1,18 +1,38 @@ # Table -A table in {{ ydb-short-name }} is a [relational table](https://en.wikipedia.org/wiki/Table_(database)) containing a set of related data and made up of rows and columns. Each row is a set of cells that are used for storing specific types of values according to the data schema. The data schema defines the names and types of table columns. An example of the data schema is shown below. The `Series` table consists of four columns named `SeriesId`, `ReleaseDate`, `SeriesInfo`, and `Title` and holding data of type `Uint64?` for the first two and `String?` for the latter two. The `SeriesId` column is declared the primary key. +A table is a relational [table](https://en.wikipedia.org/wiki/Table_(database)) containing a set of related data, composed of rows and columns. Tables represent entities. For instance, a blog article can be represented by a table named article with columns: `id`, `date_create`, `title`, `author`, `body` and so on. - +Rows in the table hold the data, while columns define the data types. For example, the id column cannot be empty (`NOT NULL`) and should contain only unique integer values. A record in YQL might look like this: +```sql +CREATE TABLE article ( + id Int64 NOT NULL, + date_create Date, + author String, + title String, + body String, + PRIMARY KEY (id) +) +``` -{{ ydb-short-name }} uses [YQL](../../datatypes.md) data types. [Simple YQL data types](../../../yql/reference/types/primitive.md) can be used as column types. All columns are [optional](../../../yql/reference/types/optional.md) by default and can be assigned `NULL` values. When creating a table, you can set `NOT NULL` for columns included into the primary key. Such columns won't accept NULL in this case. +Please note that currently, the `NOT NULL` constraint can only be applied to columns that are part of the primary key. -{{ ydb-short-name }} tables always have one or more columns that make up the key ([primary key](https://en.wikipedia.org/wiki/Unique_key)). Each table row has a unique key value, so there can be no more than one row per key value. {{ ydb-short-name }} tables are always ordered by key. This means that you can efficiently make point reads by key and range-based queries by key or key prefix (actually using an index). In the example above, the key columns are highlighted in gray and marked with a special sign. Tables consisting only of key columns are supported. However, you can't create tables without a primary key. +YDB supports the creation of both row-oriented and column-oriented tables. The primary difference between them lies in their use-cases and how data is stored on the disk drive. In row-oriented tables, data is stored sequentially in the form of rows, while in column-oriented tables, data is stored in the form of columns. Each table type has its own specific purpose. -Often, when you design a table schema, you already have a set of fields, which can naturally serve as the primary key. Be careful when selecting the key to avoid hotspots. For example, if you insert data into a table with a monotonically increasing key, you write the data to the end of the table. But since {{ ydb-short-name }} splits table data by key range, your inserts are always processed by the same server, so you lose the main benefits of a distributed database. To distribute the load evenly across different servers and to avoid hotspots when processing large tables, we recommend hashing the natural key and using the hash as the first component of the primary key as well as changing the order of the primary key components. +## Row-oriented tables {#row-orineted_table} -## Partitioning {#partitioning} +Row-oriented tables are well-suited for transactional queries generated by Online Transaction Processing (OLTP) systems, such as weather service backends or online stores. Row-oriented tables offer efficient access to a large number of columns simultaneously. Lookups in row-oriented tables are optimized due to the utilization of indexes. -A database table can be sharded by primary key value ranges. Each shard of the table is responsible for a specific range of primary keys. Key ranges maintained by different shards do not overlap. Different table shards can be served by different distributed database servers (including ones in different locations). They can also move independently between servers to enable rebalancing or ensure shard operability if servers or network equipment goes offline. +An index is a data structure that improves the speed of data retrieval operations based on one or several columns. It's analogous to an index in a book: instead of scanning every page of the book to find a specific chapter, you can refer to the index at the back of the book and quickly navigate to the desired page. + +Searching using an index allows you to swiftly locate the required rows without scanning through all the data. For instance, if you have an index on the “author” column and you're looking for articles written by “Gray”, the DBMS leverages this index to quickly identify all rows associated with that surname. + +You can create a row-oriented table through the YDB web interface, CLI, or SDK. Regardless of the method you choose to interact with YDB, it's important to keep in mind the following rule: the table must have at least one primary key column, and it's permissible to create a table consisting solely of primary key columns. + +By default, when creating a row-oriented table, all columns are optional and can have `NULL` values. This behavior can be modified by setting the `NOT NULL` conditions for key columns that are part of the primary key. Primary keys are unique, and row-oriented tables are always sorted by this key. This means that point reads by the key, as well as range queries by key or key prefix, are efficiently executed (essentially using an index). It's permissible to create a table consisting solely of key columns. When choosing a key, it's crucial to be careful, so we recommend reviewing the article: ["Choosing a Primary Key for Maximum Performance"](../../../best_practices/pk_scalability.md). + +### Partitioning row-oriented tables {#partitioning} + +A row-oriented database table can be partitioned by primary key value ranges. Each partition of the table is responsible for the specific section of primary keys. Key ranges served by different partitions do not overlap. Different table partitions can be served by different cluster nodes (including ones in different locations). Partitions can also move independently between servers to enable rebalancing or ensure partition operability if servers or network equipment goes offline. If there is not a lot of data or load, the table may consist of a single shard. As the amount of data served by the shard or the load on the shard grows, {{ ydb-short-name }} automatically splits this shard into two shards. The data is split by the median value of the primary key if the shard size exceeds the threshold. If partitioning by load is used, the shard first collects a sample of the requested keys (that can be read, written, and deleted) and, based on this sample, selects a key for partitioning to evenly distribute the load across new shards. So in the case of load-based partitioning, the size of new shards may significantly vary. @@ -20,109 +40,102 @@ The size-based shard split threshold and automatic splitting can be configured ( In addition to automatically splitting shards, you can create an empty table with a predefined number of shards. You can manually set the exact shard key split range or evenly split it into a predefined number of shards. In this case, ranges are created based on the first component of the primary key. You can set even splitting for tables that have a `Uint64` or `Uint32` integer as the first component of the primary key. -Partitioning parameters refer to the table itself rather than to secondary indexes built from its data. Each index is served by its own set of shards and decisions to split or merge its partitions are made independently based on the default settings. These settings may become available to users in the future like the settings of the main table. +Partitioning parameters refer to the table itself rather than to secondary indexes built on its data. Each index is served by its own set of shards and decisions to split or merge its partitions are made independently based on the default settings. These settings may become available to users in the future like the settings of the main table. A split or a merge usually takes about 500 milliseconds. During this time, the data involved in the operation becomes temporarily unavailable for reads and writes. Without raising it to the application level, special wrapper methods in the {{ ydb-short-name }} SDK make automatic retries when they discover that a shard is being split or merged. Please note that if the system is overloaded for some reason (for example, due to a general shortage of CPU or insufficient DB disk throughput), split and merge operations may take longer. The following table partitioning parameters are defined in the data schema: -#### AUTO_PARTITIONING_BY_SIZE +#### AUTO_PARTITIONING_BY_SIZE {#auto_partitioning_by_size} * Type: `Enum` (`ENABLED`, `DISABLED`). * Default value: `ENABLED`. -Automatic partitioning by partition size. If a partition size exceeds the value specified by the [AUTO_PARTITIONING_PARTITION_SIZE_MB](#auto_partitioning_partition_size_mb) parameter, it is enqueued for splitting. If the total size of two or more adjacent partitions is less than 50% of the [AUTO_PARTITIONING_PARTITION_SIZE_MB](#auto_partitioning_partition_size_mb) value, they are enqueued for merging. +Automatic partitioning by partition size. If a partition size exceeds the value specified by the [`AUTO_PARTITIONING_PARTITION_SIZE_MB`](#auto_partitioning_partition_size_mb) parameter, it is enqueued for splitting. If the total size of two or more adjacent partitions is less than 50% of the [`AUTO_PARTITIONING_PARTITION_SIZE_MB`](#auto_partitioning_partition_size_mb) value, they are enqueued for merging. -#### AUTO_PARTITIONING_BY_LOAD +#### AUTO_PARTITIONING_BY_LOAD {#auto_partitioning_by_load} * Type: `Enum` (`ENABLED`, `DISABLED`). * Default value: `DISABLED`. Automatic partitioning by load. If a shard consumes more than 50% of the CPU for a few dozens of seconds, it is enqueued for splitting. If the total load on two or more adjacent shards uses less than 35% of a single CPU core within an hour, they are enqueued for merging. -Performing split or merge operations uses the CPU and takes time. Therefore, when dealing with a variable load, we recommend both enabling this mode and setting [AUTO_PARTITIONING_MIN_PARTITIONS_COUNT](#auto_partitioning_min_partitions_count) to a value other than 1 so that the number of partitions doesn't drop below this value as the load decreases, and YDB doesn't have to split them again when the load is back. +Performing split or merge operations uses the CPU and takes time. Therefore, when dealing with a variable load, we recommend both enabling this mode and setting [`AUTO_PARTITIONING_MIN_PARTITIONS_COUNT`](#auto_partitioning_min_partitions_count) to a value other than 1. This ensures that a decreased load does not cause the number of partitions to drop below the required value, resulting in a need to split them again when the load increases. When choosing the minimum number of partitions, it makes sense to consider that one table partition can only be hosted on one server and use no more than 1 CPU core for data update operations. Hence, you can set the minimum number of partitions for a table on which a high load is expected to at least the number of nodes (servers) or, preferably, to the number of CPU cores allocated to the database. -#### AUTO_PARTITIONING_PARTITION_SIZE_MB +#### AUTO_PARTITIONING_PARTITION_SIZE_MB {##auto_partitioning_by_size_mb} * Type: `Uint64`. * Default value: `2000 MB` ( `2 GB` ). -Partition size threshold in MB. If exceeded, a shard splits. Takes effect when the [AUTO_PARTITIONING_BY_SIZE](#auto_partitioning_by_size) is enabled. +Partition size threshold in MB. If exceeded, a shard splits. Takes effect when the [`AUTO_PARTITIONING_BY_SIZE`](#auto_partitioning_by_size) mode is enabled. -#### AUTO_PARTITIONING_MIN_PARTITIONS_COUNT +#### AUTO_PARTITIONING_MIN_PARTITIONS_COUNT {#auto_partitioning_min_partitions_count} * Type: `Uint64`. -* The default value is `1`. +* Default value: `1`. Partitions are only merged if their actual number exceeds the value specified by this parameter. When using automatic partitioning by load, we recommend that you set this parameter to a value other than 1, so that periodic load drops don't lead to a decrease in the number of partitions below the required one. -#### AUTO_PARTITIONING_MAX_PARTITIONS_COUNT +#### AUTO_PARTITIONING_MAX_PARTITIONS_COUNT {#auto_partitioning_max_parririons_count} * Type: `Uint64`. -* The default value is `50`. +* Default value: `50`. -Partitions are only split if their number doesn't exceed the value specified by this parameter. With any automatic partitioning mode enabled, we recommend that you set a meaningful value for this parameter and monitor when the actual number of partitions approaches this value, otherwise splitting of partitions will sooner or later stop under an increase in data or load, which will lead to a failure. +Partitions are only split if their number doesn't exceed the value specified by this parameter. With any automatic partitioning mode enabled, we recommend that you set a meaningful value for this parameter and monitor when the actual number of partitions approaches this value, otherwise splitting of partitions will stop sooner or later under an increase in data or load, which will lead to a failure. -#### UNIFORM_PARTITIONS +#### UNIFORM_PARTITIONS {uniform_partitions} * Type: `Uint64`. -* Default value: Not applicable +* Default value: Not applicable. The number of partitions for uniform initial table partitioning. The primary key's first column must have type `Uint64` or `Uint32`. A created table is immediately divided into the specified number of partitions. -When automatic partitioning is enabled, make sure to set a correct value for [AUTO_PARTITIONING_MIN_PARTITIONS_COUNT](#auto_partitioning_min_partitions_count) so as not to merge all partitions into one immediately after creating the table. +When automatic partitioning is enabled, make sure to set the correct value for [AUTO_PARTITIONING_MIN_PARTITIONS_COUNT](#auto_partitioning_min_partitions_count) to avoid merging all partitions into one immediately after creating the table. -#### PARTITION_AT_KEYS +#### PARTITION_AT_KEYS {#partition_at_keys} * Type: `Expression`. -* Default value: Not applicable +* Default value: Not applicable. Boundary values of keys for initial table partitioning. It's a list of boundary values separated by commas and surrounded with brackets. Each boundary value can be either a set of values of key columns (also separated by commas and surrounded with brackets) or a single value if only the values of the first key column are specified. Examples: `(100, 1000)`, `((100, "abc"), (1000, "cde"))`. -When automatic partitioning is enabled, make sure to set a correct value for [AUTO_PARTITIONING_MIN_PARTITIONS_COUNT](#auto_partitioning_min_partitions_count) so as not to merge all partitions into one immediately after creating the table. +When automatic partitioning is enabled, make sure to set the correct value for [AUTO_PARTITIONING_MIN_PARTITIONS_COUNT](#auto_partitioning_min_partitions_count) to avoid merging all partitions into one immediately after creating the table. -## Reading data from replicas {#read_only_replicas} +### Reading data from replicas {#read_only_replicas} When making queries in {{ ydb-short-name }}, the actual execution of a query to each shard is performed at a single point serving the distributed transaction protocol. By storing data in shared storage, you can run one or more shard followers without allocating additional storage space: the data is already stored in replicated format, and you can serve more than one reader (but there is still only one writer at any given moment). Reading data from followers allows you: -* To serve clients demanding minimal delay, which is otherwise unachievable in a multi-DC cluster. This is accomplished by executing queries soon after they are formulated, which eliminates the delay associated with inter-DC transfers. As a result, you can both preserve all the storage reliability guarantees of a multi-DC cluster and respond to occasional read queries in milliseconds. +* To serve clients demanding minimal delay, which is otherwise unachievable in a multi-DC cluster. This is accomplished by executing queries soon after they are formulated, which eliminates the delay associated with inter-DC transfers. As a result, you can both preserve all the storage reliability guarantees of a multi-DC cluster and respond to point read queries in milliseconds. * To handle read queries from followers without affecting modifying queries running on a shard. This can be useful both for isolating different scenarios and for increasing the partition bandwidth. -* To ensuring continued service when moving a partition leader (both in a planned manner for load balancing and in an emergency). It lets the processes in the cluster survive without affecting the reading clients. -* To increasing the overall shard read performance if many read queries access the same keys. +* To ensure continued service when moving a partition leader (both in a planned manner for load balancing and in an emergency). It lets the processes in the cluster survive without affecting the reading clients. +* To increase the overall shard read performance if many read queries access the same keys. You can enable running read replicas for each shard of the table in the table data schema. The read replicas (followers) are typically accessed without leaving the data center network, which ensures response delays in milliseconds. -| Option name | Description | Type | Acceptable values | Update<br>capability | Reset<br>capability | +| Parameter name | Description | Type | Acceptable values | Update capability | Reset capability | | ------------- | --------- | --- | ------------------- | --------------------- | ------------------ | | `READ_REPLICAS_SETTINGS` | `PER_AZ` means using the specified number of replicas in each AZ and `ANY_AZ` in all AZs in total. | String | `"PER_AZ:<count>"`, `"ANY_AZ:<count>"`, where `<count>` is the number of replicas | Yes | No | The internal state of each of the followers is restored exactly and fully consistently from the leader state. -Besides the data status in storage, followers also receive a stream of updates from the leader. Updates are sent in real time, immediately after the commit to the log. However, they are sent asynchronously, resulting in some delay (usually no more than dozens of milliseconds, but sometimes longer in the event of cluster connectivity issues) in applying updates to followers relative to their commit on the leader. Therefore, reading data from followers is only supported in the`StaleReadOnly()` [transaction mode](../../transactions.md#modes). - +Besides the data state in storage, followers also receive a stream of updates from the leader. Updates are sent in real time, immediately after the commit to the log. However, they are sent asynchronously, resulting in some delay (usually no more than dozens of milliseconds, but sometimes longer in the event of cluster connectivity issues) in applying updates to followers relative to their commit on the leader. Therefore, reading data from followers is only supported in the [transaction mode](../../transactions.md#modes) `StaleReadOnly()`. If there are multiple followers, their delay from the leader may vary: although each follower of each of the shards retains internal consistency, artifacts may be observed from shard to shard. Please provide for this in your application code. For that same reason, it's currently impossible to perform cross-shard transactions from followers. -## Deleting expired data (TTL) {#ttl} +### Deleting expired data (TTL) {#ttl} -{{ ydb-short-name }} supports automatic background deletion of expired data. A table data schema may define a column of the [appropriate type](../../../concepts/ttl.md#restrictions). The column value for all rows will be compared with the current time in the background. Rows for which the current time becomes greater than the column value, factoring in the specified delay, will be deleted. +{{ ydb-short-name }} supports automatic background deletion of expired data. A table data schema may define a column containing a `Datetime` or a `Timestamp` value. A comparison of this value with the current time for all rows will be performed in the background. Rows for which the current time becomes greater than the column value plus specified delay, will be deleted. -| Option name | Type | Acceptable values | Update<br>capability | Reset<br>capability | +| Parameter name | Type | Acceptable values | Update capability | Reset capability | | ------------- | --- | ------------------- | --------------------- | ------------------ | -| `TTL` | Expression | `Interval("<literal>") ON <column> [AS <unit>]` | Yes | Yes | - -Where `<unit>`: -* `SECONDS`; -* `MILLISECONDS`; -* `MICROSECONDS`; -* `NANOSECONDS`. +| `TTL` | Expression | `Interval("<literal>") ON <column>` | Yes | Yes | For more information about deleting expired data, see [Time to Live (TTL)](../../../concepts/ttl.md). -## Renaming {#rename} +### Renaming a table {#rename} {{ ydb-short-name }} lets you rename an existing table, move it to another directory of the same database, or replace one table with another, deleting the data in the replaced table. Only the metadata of the table is changed by operations (for example, its path and name). The table data is neither moved nor overwritten. @@ -133,45 +146,82 @@ The speed of renaming is determined by the type of data transactions currently r * [Renaming a table in YQL](../../../yql/reference/syntax/alter_table.md#rename) * [Renaming a table via the CLI](../../../reference/ydb-cli/commands/tools/rename.md) -## Bloom filter {#bloom-filter} +### Bloom filter {#bloom-filter} -With a [Bloom filter](https://en.wikipedia.org/wiki/Bloom_filter), you can more efficiently determine if some keys are missing in a table when making multiple single queries by the primary key. This reduces the number of required disk I/O operations but increases the amount of memory consumed. +Using a [Bloom filter](https://en.wikipedia.org/wiki/Bloom_filter) lets you more efficiently determine if some keys are missing in a table when making multiple point queries by primary key. This reduces the number of required disk I/O operations but increases the amount of memory consumed. -| Option name | Type | Acceptable values | Update<br>capability | Reset<br>capability | +| Parameter name | Type | Acceptable values | Update capability | Reset capability | | ------------- | --- | ------------------- | --------------------- | ------------------ | | `KEY_BLOOM_FILTER` | Enum | `ENABLED`, `DISABLED` | Yes | No | -## Column groups {#column-groups} +## Column-oriented tables {#olap-data-types} +{% note warning %} + +Column-oriented {{ ydb-short-name }} tables are in the Preview mode. -You can group table columns to optimize their storage and use in {{ ydb-short-name }}. Column grouping enables you to improve the performance of data selections by introducing grouped column storage. The most commonly used strategy is to create a separate column group for rarely used attributes (possibly also with compression and on a slower storage). +{% endnote %} -Each column group has a unique name within a table. You can set the composition of column groups when [creating a table](../../../yql/reference/syntax/create_table.md#column-family) and [change](../../../yql/reference/syntax/alter_table.md#column-family) it later. You cannot remove column groups from an existing table. +YDB's column-oriented tables stores data of each column separately (independently) from each other. This data storage principle is optimized for handling Online Analytical Processing (OLAP) workloads, as only the columns directly involved in the query are read during its execution. One of the key advantages of this approach is the high data compression ratios since columns often contain repetitive or similar data. A downside, however, is that operations on whole rows become more resource-intensive. -A column family may contain any number of columns of its table, including none. Each table column can belong to a single column group (that is, column groups can't overlap). Column groups are set up when creating a table, but can be modified later. +At the moment, the main use case for YDB column-oriented tables is writing data with an increasing primary key (for example, event time), analyzing this data, and deleting outdated data based on TTL. The optimal way to add data to YDB column-oriented tables is [batch upload](../../../best_practices/batch_upload.md), performed in MB-sized blocks. Data packet insertion is atomic: data will be written either to all partitions or none. -Each table has a `default` column group that includes all the columns that don't belong to any other column group. Primary-key columns are always in the default column group and can't be moved to another group. +In most cases, working with YDB column-oriented tables is similar to working with row tables, but there are differences: +* Only `NOT NULL` columns can be used as the primary key. +* Data is partitioned not by the primary key, but by the hash of the partitioning columns, to evenly distribute the data across the hosts. +* Column-oriented tables support a limited set of data types: + + Available in both the primary key and other columns: `Date`, `Datetime`, `Timestamp`, `Int32`, `Int64`, `Uint8`, `Uint16`, `Uint32`, `Uint64`, `Utf8`, `String`; + + Available only in columns not included in the primary key: `Bool`, `Decimal`, `Double`, `Float`, `Int8`, `Int16`, `Interval`, `JsonDocument`, `Json`, `Uuid`, `Yson`. -Column groups are assigned attributes that affect data storage: +Let's recreate the "article" table, this time in column-oriented format, using the following YQL command: +```sql +CREATE TABLE article_column_table ( + id Int64 NOT NULL, + author String, + title String, + body String, + PRIMARY KEY (id) +) +WITH (STORE = COLUMN); +``` +At the moment, not all functionality of column-oriented tables is implemented. The following features are not currently supported: -* The type of the data storage device used (SSD or HDD, availability depends on the {{ ydb-short-name }} cluster configuration). -* Data compression mode (without compression or compression using the [LZ4](https://en.wikipedia.org/wiki/LZ4) algorithm). +* Reading from replicas. +* Secondary indexes. +* Bloom filters. +* Change Data Capture. +* Table renaming. +* Custom table attributes. +* Modifying the list of columns. +* Adding data to column-oriented tables using the SQL INSERT statement. +* Deleting data from column-oriented tables using the SQL DELETE statement. In fact, deletion is only possible after the TTL data retention time has expired. -Attributes for a column group are set when creating a table (for example, they can be explicitly set for a default column group) and changed afterwards. Changes in storage attributes aren't applied to the data immediately, but later, at manual or automatic LSM compaction. +## Partitioning of a column-oriented table {#olap-tables-partitioning} -The data stored in the fields of the default column group is accessed faster: this requires less resources than accessing the fields from additional column groups of the same table row. When you search by the primary key, the default column group is always used. When accessing fields from other column groups, besides searching by the primary key, you need additional search operations to determine specific storage positions for these fields. +Unlike row-oriented {{ ydb-short-name }} tables, you cannot partition column-oriented tables by primary keys but only by specially designated partitioning keys. Partitioning keys constitute a subset of the table's primary keys. -This way, by creating a separate column group for certain table columns, you can accelerate read operations for the most important frequently used columns of the default column group by slightly slowing down access to other columns. Furthermore, at the column group level, you can control the data storage parameters: for example, you can select the storage device type and data compression mode. +Example of column-oriented partitioning: +```sql +CREATE TABLE article_column_table ( + id Int64 NOT NULL, + author String, + title String, + body String, + PRIMARY KEY (id) +) +PARTITION BY HASH(id) +WITH (STORE = COLUMN); +``` -## Custom attributes {#users-attr} +Unlike data partitioning in row-oriented {{ ydb-short-name }} tables, key values are not used to partition data in column-oriented tables. This way, you can uniformly distribute data across all your existing partitions. This kind of partitioning enables you to avoid hotspots at data inserta and speeding up analytical queries that process (that is, read) large amounts of data. -By using custom attributes, you can add any data to your table metadata. This data isn't interpreted by the server, but it can be used by a DB client (a human or, most frequently, a program). -The attributes are key-value pairs. As the attribute keys and values, you can only use strings or other data types that can be cast into strings (for example, using BASE64 encoding). +How you select partitioning keys substantially affects the performance of queries to your column-oriented tables. Learn more in [{#T}](../../../best_practices/pk-olap-scalability.md). -The following restrictions apply to keys and values of custom attributes: +To manage data partitioning, use the `AUTO_PARTITIONING_MIN_PARTITIONS_COUNT` additional parameter. The system ignores other partitioning parameters for column-oriented tables. -* Key length: 1–100 bytes. -* Value length: 1–4096 bytes. -* The maximum total attribute size (a total length of all keys and values): 10,240 bytes. +`AUTO_PARTITIONING_MIN_PARTITIONS_COUNT` sets the minimum physical number of partitions used to store data. + +* Type: `Uint64`. +* The default value is `1`. -To learn how to add, edit, or get the current values of attributes or drop attributes, see [{#T}](../../../operations/manage-users-attr.md). +Because it ignores all the other partitioning parameters, the system uses the same value as the upper partition limit. diff --git a/ydb/docs/en/core/concepts/datamodel/toc_i.yaml b/ydb/docs/en/core/concepts/datamodel/toc_i.yaml index f46cb49ccc8..0f42349ff3f 100644 --- a/ydb/docs/en/core/concepts/datamodel/toc_i.yaml +++ b/ydb/docs/en/core/concepts/datamodel/toc_i.yaml @@ -2,5 +2,4 @@ items: - { name: Overview, href: index.md } - { name: Directory, href: dir.md } - { name: Table, href: table.md } -- { name: Сolumn-oriented table, href: ../column-table.md } - { name: Topic, href: ../topic.md } diff --git a/ydb/docs/en/core/yql/reference/yql-core/syntax/_includes/create_table.md b/ydb/docs/en/core/yql/reference/yql-core/syntax/_includes/create_table.md index 28374cc6e38..41b62a6e2bf 100644 --- a/ydb/docs/en/core/yql/reference/yql-core/syntax/_includes/create_table.md +++ b/ydb/docs/en/core/yql/reference/yql-core/syntax/_includes/create_table.md @@ -5,7 +5,7 @@ {{ ydb-short-name }} supports two types of tables: * [Row-oriented](../../../../concepts/datamodel/table.md) -* [Column-oriented](../../../../concepts/column-table.md). +* [Column-oriented](../../../../concepts/datamodel/table.md#olap-data-types). When you create a table, the table type is specified by the `STORE` parameter, with `ROW` creating a [row-oriented table](#row) and `COLUMN` creating a [column](#olap-tables)-oriented table. If the `STORE` parameter is omitted, a row-oriented table is created by default. @@ -79,7 +79,6 @@ It is mandatory to specify the `PRIMARY KEY` with a non-empty list of columns. T ) - {% if feature_secondary_index %} {% if feature_olap_tables %}#{% endif %}## Secondary indexes {#secondary_index} @@ -197,7 +196,7 @@ Column-oriented {{ ydb-short-name }} tables are in the Preview mode. {% endnote %} -The `CREATE TABLE` statement creates a [column-oriented](../../../../concepts/column-table.md) table with the specified data schema and key columns (`PRIMARY KEY`). +The `CREATE TABLE` statement creates a [column-oriented](../../../../concepts/datamodel/table.md#olap-data-types) table with the specified data schema and key columns (`PRIMARY KEY`). ```sql CREATE TABLE table_name ( @@ -219,7 +218,7 @@ WITH ( ### Columns {#olap-columns} -Data types supported by column-oriented tables and constraints imposed on data types in primary keys or data columns are described in the [supported data types](../../../../concepts/column-table.md#olap-data-types) section for column-oriented tables. +Data types supported by column-oriented tables and constraints imposed on data types in primary keys or data columns are described in the [supported data types](../../../../concepts/datamodel/table.md#olap-data-types) section for column-oriented tables. Make sure to add the `PRIMARY KEY` and `PARTITION BY` clauses with a non-empty list of columns. @@ -257,7 +256,7 @@ Here, `key` is the name of the parameter and `value` is its value. Supported parameters in column-oriented tables: -* `AUTO_PARTITIONING_MIN_PARTITIONS_COUNT` sets the minimum physical number of partitions used to store data (see [{#T}](../../../../concepts/column-table.md#olap-tables-partitioning)). +* `AUTO_PARTITIONING_MIN_PARTITIONS_COUNT` sets the minimum physical number of partitions used to store data (see [{#T}](../../../../concepts/datamodel/table.md#olap-tables-partitioning)). For example, the following code creates a column-oriented table with ten partitions: |