diff options
author | alextarazanov <alextarazanov@yandex-team.com> | 2023-05-15 15:09:55 +0300
committer | alextarazanov <alextarazanov@yandex-team.com> | 2023-05-15 15:09:55 +0300
commit | c4c3c0bdf21defb26ef61e48455e78ea27161792 (patch)
tree | 1727d79a4f2b41f1f6013265126f4bd794d07eff
parent | dad0c8aa91c4dc2a764dc9585b7efbf1b789b98e (diff)
download | ydb-c4c3c0bdf21defb26ef61e48455e78ea27161792.tar.gz
17 files changed, 409 insertions, 165 deletions
diff --git a/ydb/docs/en/core/_includes/olap-data-types.md b/ydb/docs/en/core/_includes/olap-data-types.md new file mode 100644 index 00000000000..c3869d86b50 --- /dev/null +++ b/ydb/docs/en/core/_includes/olap-data-types.md @@ -0,0 +1,24 @@ +| Data type | Can be used in<br>column-oriented tables | Can be used<br>as primary key | +---|---|--- +| `Bool` | ☓ | ☓ | +| `Int8` | ✓ | ☓ | +| `Int16` | ✓ | ☓ | +| `Int32` | ✓ | ✓ | +| `Int64` | ✓ | ✓ | +| `Uint8` | ✓ | ✓ | +| `Uint16` | ✓ | ✓ | +| `Uint32` | ✓ | ✓ | +| `Uint64` | ✓ | ✓ | +| `Float` | ✓ | ☓ | +| `Double` | ✓ | ☓ | +| `Decimal` | ☓ | ☓ | +| `String` | ✓ | ✓ | +| `Utf8` | ✓ | ✓ | +| `Json` | ✓ | ☓ | +| `JsonDocument` | ✓ | ☓ | +| `Yson` | ✓ | ☓ | +| `Uuid` | ☓ | ☓ | +| `Date` | ✓ | ✓ | +| `Datetime` | ✓ | ✓ | +| `Timestamp` | ✓ | ✓ | +| `Interval` | ☓ | ☓ | diff --git a/ydb/docs/en/core/_includes/storage-device-requirements.md b/ydb/docs/en/core/_includes/storage-device-requirements.md index 823f961d4f0..0d99109dacb 100644 --- a/ydb/docs/en/core/_includes/storage-device-requirements.md +++ b/ydb/docs/en/core/_includes/storage-device-requirements.md @@ -6,6 +6,6 @@ The minimum disk size is 80 GB, otherwise the {{ ydb-short-name }} node won't be Configurations with disks less than 800 GB or any types of storage system virtualization cannot be used for production services or system performance testing. -We don't recommend storing {{ ydb-short-name }} data on disks used by other processes (including the operating system). +We don't recommend storing {{ ydb-short-name }} data on disks shared with other processes (for example, the operating system). 
{% endnote %} diff --git a/ydb/docs/en/core/best_practices/pk-olap-scalability.md b/ydb/docs/en/core/best_practices/pk-olap-scalability.md new file mode 100644 index 00000000000..67a62060611 --- /dev/null +++ b/ydb/docs/en/core/best_practices/pk-olap-scalability.md @@ -0,0 +1,23 @@ +# Selecting a primary key for maximum column-oriented table performance + +Unlike row-oriented {{ ydb-short-name }} tables, you can't partition column-oriented tables by primary keys but only by specially designated partitioning keys. Inside each partition, data is distributed by the table's primary key. + +In this context, to ensure high performance of column-oriented {{ ydb-short-name }} tables, you need to properly select both the primary key and partitioning key. + +Because the partitioning key determines the partition where the data is stored, select it so that `Hash(partition_key)` distributes data uniformly across partitions. The optimal situation is when the partitioning key includes 100 to 1,000 times more unique values than the number of partitions in the table. + +Keys with many unique values have high cardinality. On the other hand, keys with few unique values have low cardinality. Using a partitioning key with low cardinality might result in uneven distribution of data across partitions, and some partitions might become overloaded. Overloaded partitions might degrade query performance and cap the maximum insert throughput. + +The primary key determines how the data will be stored inside the partition. That's why, when selecting a primary key, you need to keep in mind both the effectiveness of reading data from the partition and the effectiveness of inserting data into the partition. The optimum insert use case is to write data to the beginning or end of the table, making rare local updates of previously inserted data. 
For example, an effective use case would be to store application logs by timestamps, adding records to the end of the partition using the current time in the primary key. + +{% note warning %} + +Currently, you need to select the primary key so that the first column in the primary key has high cardinality. An example of an effective first column is a column of the `Timestamp` type. By contrast, a first column of the `Uint16` type with only 1,000 unique values is ineffective. + +{% endnote %} + +Column-oriented tables do not support automatic repartitioning at the moment. That's why it's important to specify a realistic number of partitions at table creation. You can evaluate the number of partitions you need based on the expected data amounts you are going to add to the table. The average insert throughput for a partition is 1 MB/s. The throughput is mostly affected by the selected primary keys (the need to sort data inside the partition when inserting data). We do not recommend setting up more than 128 partitions for small data streams. + +Example: + +When your data stream is 1 GB per second, an analytical table with 1,000 partitions is an optimal choice. Nevertheless, it is not advisable to create tables with an excessive number of partitions: this could raise resource consumption in the cluster and negatively impact the query rate. 
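To illustrate the guidance above, such a log table could be sketched in YQL as follows (a hypothetical example: the table and column names are invented, and the options follow the `CREATE TABLE` syntax for column-oriented tables; the partition count is derived from the 1 MB/s-per-partition estimate above):

```sql
CREATE TABLE app_log (
    ts Timestamp NOT NULL,   -- high-cardinality first primary key column
    host Utf8 NOT NULL,
    message Utf8,
    PRIMARY KEY (ts, host)
)
PARTITION BY HASH(ts)        -- high-cardinality partitioning key for uniform Hash(partition_key)
WITH (
    STORE = COLUMN,
    -- ~1 GB/s of inserts at ~1 MB/s per partition => about 1,000 partitions
    AUTO_PARTITIONING_MIN_PARTITIONS_COUNT = 1000
);
```

New records carry the current time in `ts`, so inserts land at the end of the table, matching the optimal insert pattern described above.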
diff --git a/ydb/docs/en/core/best_practices/toc_i.yaml b/ydb/docs/en/core/best_practices/toc_i.yaml index a3f13c2f008..2f92347f1a5 100644 --- a/ydb/docs/en/core/best_practices/toc_i.yaml +++ b/ydb/docs/en/core/best_practices/toc_i.yaml @@ -3,6 +3,8 @@ items: href: index.md - name: Selecting a primary key for maximum performance href: pk_scalability.md +- name: Selecting a primary key for maximum analytical table performance + href: pk-olap-scalability.md - name: Schema design href: schema_design.md hidden: true diff --git a/ydb/docs/en/core/cluster/system-requirements.md b/ydb/docs/en/core/cluster/system-requirements.md index 7ee4013ec02..e9ff0dba651 100644 --- a/ydb/docs/en/core/cluster/system-requirements.md +++ b/ydb/docs/en/core/cluster/system-requirements.md @@ -26,10 +26,12 @@ The number of servers and disks is determined by the fault-tolerance requirement {{ ydb-short-name }} health and performance weren't tested on any types of virtual or network storage devices. - When planning space, remember that {{ ydb-short-name }} uses some disk space for its own internal needs. For example, on a medium-sized cluster of 8 nodes, you can expect approximately 100 GB to be consumed for a static group on the whole cluster. On a large cluster with >1500 nodes, this will be about 200 GB. There are also logs of 25.6 GB on each Pdisk and a system area on each Pdisk. Its size depends on the size of the Pdisk, but is no less than 0.2 GB. + When planning space, remember that {{ ydb-short-name }} uses some disk space for its own internal needs. For example, on a medium-sized cluster of 8 nodes, you can expect approximately 100 GB to be consumed for a static group on the whole cluster. On a large cluster with more than 1500 nodes, this will be about 200 GB. There are also logs of 25.6 GB on each PDisk and a system area on each PDisk. Its size depends on the size of the PDisk, but is no less than 0.2 GB. 
## Software configuration {#software} -A {{ ydb-short-name }} server can be run on servers running a Linux operating system with kernel 4.19 and higher and libc 2.30 (Ubuntu 20.04, Debian 11, Fedora34). +A {{ ydb-short-name }} server can be run on servers running a Linux operating system with kernel 4.19 and higher and libc 2.30 (Ubuntu 20.04, Debian 11, Fedora 34). We recommend enabling hugepages or transparent hugepages. + +If the server hosts more than 32 CPU cores, to increase {{ ydb-short-name }} performance, it makes sense to run each dynamic node in a separate taskset/cpuset of 10 to 32 cores. For example, in the case of 128 CPU cores, the best choice is to run four 32-CPU dynamic nodes, each in its own taskset. macOS and Windows operating systems are currently not supported. diff --git a/ydb/docs/en/core/cluster/topology.md b/ydb/docs/en/core/cluster/topology.md index 563010ca027..16aee6b285e 100644 --- a/ydb/docs/en/core/cluster/topology.md +++ b/ydb/docs/en/core/cluster/topology.md @@ -1,9 +1,9 @@ # Topology -{{ ydb-short-name }} cluster is built from nodes of two types - static and dynamic: +A {{ ydb-short-name }} cluster consists of static and dynamic nodes. -* static nodes store data, implementing one of the supported redundancy modes depending on the operating mode configured; -* dynamic nodes execute queries, handle transaction coordination and perform other data management functions. +* Static nodes enable data storage, implementing one of the supported redundancy schemes depending on the established operating mode. +* Dynamic nodes enable query execution, transaction coordination, and other data control functionality. Cluster topology is determined by the fault tolerance requirements. 
The following operating modes are available: diff --git a/ydb/docs/en/core/concepts/_includes/limits-ydb.md b/ydb/docs/en/core/concepts/_includes/limits-ydb.md index bd538cd82f0..6147597ba6b 100644 --- a/ydb/docs/en/core/concepts/_includes/limits-ydb.md +++ b/ydb/docs/en/core/concepts/_includes/limits-ydb.md @@ -2,12 +2,12 @@ This section describes the parameters of limits set in {{ ydb-short-name }}. -## Schema object limits +## Schema object limits {#schema-object} The table below shows the limits that apply to schema objects: tables, databases, and columns. The _Object_ column specifies the type of schema object that the limit applies to. The _Error type_ column shows the status that the query ends with if an error occurs. For more information about statuses, see [Error handling in the API](../../reference/ydb-sdk/error_handling.md). -| Object | Limit | Value | Explanation | Internal<br>name | Error<br>type | +| Objects | Limit | Value | Explanation | Internal<br>name | Error<br>type | | :--- | :--- | :--- | :--- | :---: | :---: | | Database | Maximum path depth | 32 | Maximum number of nested path elements (directories, tables). | MaxDepth | SCHEME_ERROR | | Database | Maximum number of paths (schema objects) | 10,000 | Maximum number of path elements (directories, tables) in a database. | MaxPaths | GENERIC_ERROR | @@ -23,27 +23,34 @@ The _Error type_ column shows the status that the query ends with if an error oc | Table | Maximum number of followers | 3 | Maximum number of read-only replicas that can be specified when creating a table with followers. 
| MaxFollowersCount | GENERIC_ERROR | | Table | Maximum number of tables to copy | 10,000 | Limit on the size of the table list for persistent table copy operations | MaxConsistentCopyTargets | GENERIC_ERROR | -## Size limits for stored data +## Size limits for stored data {#data-size} | Parameter | Value | Error type | | :--- | :--- | :---: | | Maximum total size of all columns in a primary key | 1 MB | GENERIC_ERROR | | Maximum size of a string column value | 8 MB | GENERIC_ERROR | -## Limits that apply when executing queries +## Analytical table limits + +| Parameter | Value | +:--- | :--- +| Maximum row size | 8 MB | +| Maximum size of an inserted data block | 8 MB | + +## Limits on query execution {#query} The table below lists the limits that apply to query execution. The _Call_ column specifies the public API call that will end with the error status specified in the _Status_ column. | Parameter | Value | Call | Explanation | Status<br>in case of<br>a violation<br>of the limit | | :--- | :--- | :--- | :--- | :---: | -| Maximum number of rows in query results | 1000 | ExecuteDataQuery | Complete results of some queries executed using the `ExecuteDataQuery` method may contain more rows than allowed. In this case, a query will return the maximum number of rows allowed, and the result will have the `truncated` flag set. | SUCCESS | +| Maximum number of rows in query results | 1,000 | ExecuteDataQuery | Complete results of some queries executed using the `ExecuteDataQuery` method may contain more rows than allowed. In this case, a query will return the maximum number of rows allowed, and the result will have the `truncated` flag set. | SUCCESS | | Maximum query result size | 50 MB | ExecuteDataQuery | Complete results of some queries may exceed the set limit. In this case, a query will fail returning no data. 
| PRECONDITION_FAILED | -| Maximum number of sessions per cluster node | 1000 | CreateSession | Using the library for working with {{ ydb-short-name }}, an application can create sessions within a connection. Sessions are linked to a node. You can create a limited number of sessions with a single node. | OVERLOADED | +| Maximum number of sessions per cluster node | 1,000 | CreateSession | Using the library for working with {{ ydb-short-name }}, an application can create sessions within a connection. Sessions are linked to a node. You can create a limited number of sessions with a single node. | OVERLOADED | | Maximum query text length | 10 KB | ExecuteDataQuery | Limit on the length of YQL query text. | BAD_REQUEST | | Maximum size of parameter values | 50 MB | ExecuteDataQuery | Limit on the total size of the parameters passed when executing a previously prepared query. | BAD_REQUEST | -## Topics limits +## Topic limits {#topic} -| Parameter | Value | -| :--- | :--- | -| Maximum size of the transmitted message | 12 MB | +| Parameter | Value | +| :--- | :--- | +| Maximum size of a transmitted message | 12 MB | diff --git a/ydb/docs/en/core/concepts/column-table.md b/ydb/docs/en/core/concepts/column-table.md new file mode 100644 index 00000000000..7951a01864c --- /dev/null +++ b/ydb/docs/en/core/concepts/column-table.md @@ -0,0 +1,81 @@ +# Column-oriented table + +{% note warning %} + +Column-oriented {{ ydb-short-name }} tables are in the Preview mode. + +{% endnote %} + +A column-oriented table in {{ ydb-short-name }} is a relational table containing a set of related data and made up of rows and columns. Unlike regular [row-oriented {{ ydb-short-name }} tables](#table) designed for [OLTP loads](https://ru.wikipedia.org/wiki/OLTP), column-oriented tables are optimized for data analytics and [OLAP loads](https://ru.wikipedia.org/wiki/OLAP). 
+ +The current primary use case for column-oriented tables is writing data with an increasing primary key, for example, event time, analyzing this data, and deleting expired data based on TTL. The optimal method of inserting data to column-oriented tables is batch writing in blocks of several megabytes. + +The data batches are inserted atomically: the data will be written either to all partitions or to none of them. Read operations analyze only the data fully written to your column-oriented tables. + +In most cases, working with column-oriented {{ ydb-short-name }} tables is similar to row-oriented tables. However, there are the following distinctions: + +* You can only use NOT NULL columns as your key columns. +* Data is not partitioned by the primary key but by the hash from the [partitioning columns](#olap-tables-partitioning). +* A [limited set](#olap-data-types) of data types is supported. + +What's currently not supported: + +* Reading data from replicas +* Secondary indexes +* Bloom filters +* Change Data Capture +* Renaming tables +* Custom attributes in tables +* Updating data column lists in column-oriented tables +* Adding data to column-oriented tables by the SQL `INSERT` operator +* Deleting data from column-oriented tables using the SQL `DELETE` operator. The data is actually deleted on TTL expiry. 
+ +## Supported data types {#olap-data-types} + +| Data type | Can be used in<br>column-oriented tables | Can be used<br>as primary key | +---|---|--- +| `Bool` | ☓ | ☓ | +| `Date` | ✓ | ✓ | +| `Datetime` | ✓ | ✓ | +| `Decimal` | ☓ | ☓ | +| `Double` | ✓ | ☓ | +| `Float` | ✓ | ☓ | +| `Int16` | ✓ | ☓ | +| `Int32` | ✓ | ✓ | +| `Int64` | ✓ | ✓ | +| `Int8` | ✓ | ☓ | +| `Interval` | ☓ | ☓ | +| `JsonDocument` | ✓ | ☓ | +| `Json` | ✓ | ☓ | +| `String` | ✓ | ✓ | +| `Timestamp` | ✓ | ✓ | +| `Uint16` | ✓ | ✓ | +| `Uint32` | ✓ | ✓ | +| `Uint64` | ✓ | ✓ | +| `Uint8` | ✓ | ✓ | +| `Utf8` | ✓ | ✓ | +| `Uuid` | ☓ | ☓ | +| `Yson` | ✓ | ☓ | + +Learn more in [{#T}](../yql/reference/types/index.md). + +## Partitioning {#olap-tables-partitioning} + +Unlike row-oriented {{ ydb-short-name }} tables, you cannot partition column-oriented tables by primary keys but only by specially designated partitioning keys. Partitioning keys constitute a subset of the table's primary keys. + +Unlike data partitioning in row-oriented {{ ydb-short-name }} tables, key values are not used to partition data in column-oriented tables. Hash values from keys are used instead. This way, you can uniformly distribute data across all your existing partitions. This kind of partitioning enables you to avoid hotspots at data insert, streamlining analytical queries that process (that is, read) large data amounts. + +How you select partitioning keys substantially affects the performance of your column-oriented tables. Learn more in [{#T}](../best_practices/pk-olap-scalability.md). + +To manage data partitioning, use the `AUTO_PARTITIONING_MIN_PARTITIONS_COUNT` additional parameter. The system ignores other partitioning parameters for column-oriented tables. + +`AUTO_PARTITIONING_MIN_PARTITIONS_COUNT` sets the minimum physical number of partitions used to store data. + +* Type: `Uint64`. +* The default value is `1`. 
+ +Because the system ignores all other partitioning parameters, it also uses this value as the upper limit on the number of partitions. + +## See also {#see-also} + +* [{#T}](../yql/reference/syntax/create_table.md#olap-tables) diff --git a/ydb/docs/en/core/concepts/datamodel/_includes/table.md b/ydb/docs/en/core/concepts/datamodel/_includes/table.md index f5e2a860a23..2fdea68b072 100644 --- a/ydb/docs/en/core/concepts/datamodel/_includes/table.md +++ b/ydb/docs/en/core/concepts/datamodel/_includes/table.md @@ -26,7 +26,7 @@ A split or a merge usually takes about 500 milliseconds. During this time, the d The following table partitioning parameters are defined in the data schema: -#### AUTO_PARTITIONING_BY_SIZE {#auto-part-by-load} +#### AUTO_PARTITIONING_BY_SIZE * Type: `Enum` (`ENABLED`, `DISABLED`). * Default value: `ENABLED`. @@ -148,7 +148,7 @@ Each table has a `default` column group that includes all the columns that don't Column groups are assigned attributes that affect data storage: * The type of the data storage device used (SSD or HDD, availability depends on the {{ ydb-short-name }} cluster configuration). -* Data compression mode (without compression or compression using the [LZ4](https://en.wikipedia.org/wiki/LZ4_(compression_algorithm)) algorithm). +* Data compression mode (without compression or compression using the [LZ4](https://en.wikipedia.org/wiki/LZ4) algorithm). Attributes for a column group are set when creating a table (for example, they can be explicitly set for a default column group) and changed afterwards. Changes in storage attributes aren't applied to the data immediately, but later, at manual or automatic LSM compaction. 
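For instance, a column group with such storage attributes could be declared at table creation like this (a hypothetical sketch: the table, column, and group names are invented for illustration):

```sql
CREATE TABLE series (
    series_id Uint64,
    title Utf8,
    series_info Utf8 FAMILY family_large,   -- rarely read column placed in its own group
    PRIMARY KEY (series_id),
    FAMILY family_large (
        DATA = "ssd",          -- storage device type for this column group
        COMPRESSION = "lz4"    -- LZ4 compression for the group's data
    )
);
```

Changing `DATA` or `COMPRESSION` later takes effect only at the next manual or automatic LSM compaction, as described above.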
diff --git a/ydb/docs/en/core/concepts/datamodel/toc_i.yaml b/ydb/docs/en/core/concepts/datamodel/toc_i.yaml index 0f42349ff3f..f46cb49ccc8 100644 --- a/ydb/docs/en/core/concepts/datamodel/toc_i.yaml +++ b/ydb/docs/en/core/concepts/datamodel/toc_i.yaml @@ -2,4 +2,5 @@ items: - { name: Overview, href: index.md } - { name: Directory, href: dir.md } - { name: Table, href: table.md } +- { name: Column-oriented table, href: ../column-table.md } - { name: Topic, href: ../topic.md } diff --git a/ydb/docs/en/core/deploy/manual/_includes/prepare-configs.md b/ydb/docs/en/core/deploy/manual/_includes/prepare-configs.md index 39cabfd4010..ca254caf421 100644 --- a/ydb/docs/en/core/deploy/manual/_includes/prepare-configs.md +++ b/ydb/docs/en/core/deploy/manual/_includes/prepare-configs.md @@ -42,10 +42,10 @@ Prepare a configuration file for {{ ydb-short-name }}: rack: '1' ``` -1. In the `blob_storage_config` section, update the FQDN of each node used to store the static storage group: +1. Under `blob_storage_config`, edit the FQDNs of all the nodes accommodating your static storage group: - * in the `mirror-3-dc` mode, FQDNs for 9 nodes are needed; - * in the `block-4-2` mode, FQDNs for 8 nodes are needed. + * For the `mirror-3-dc` scheme, specify FQDNs for nine nodes. + * For the `block-4-2` scheme, specify FQDNs for eight nodes. 1. Enable user authentication (optional). diff --git a/ydb/docs/en/core/deploy/manual/deploy-ydb-on-premises.md b/ydb/docs/en/core/deploy/manual/deploy-ydb-on-premises.md index 043835eab5b..719313b8a53 100644 --- a/ydb/docs/en/core/deploy/manual/deploy-ydb-on-premises.md +++ b/ydb/docs/en/core/deploy/manual/deploy-ydb-on-premises.md @@ -2,7 +2,7 @@ This document describes how to deploy a multi-tenant {{ ydb-short-name }} cluster on multiple bare-metal or virtual servers. 
-## Before you begin {#before-start} +## Getting started {#before-start} ### Prerequisites {#requirements} @@ -10,51 +10,51 @@ Review the [system requirements](../../cluster/system-requirements.md) and the [ Make sure you have SSH access to all servers. This is required to install artifacts and run the {{ ydb-short-name }} executable. -The network configuration must allow TCP connections on the following ports (by default, can be changed if necessary): +The network configuration must allow TCP connections on the following ports (these are the defaults; you can change them in the configuration if necessary): -* 22: SSH service. +* 22: SSH service * 2135, 2136: GRPC for client-cluster interaction. -* 19001, 19002: Interconnect for intra-cluster node interaction. -* 8765, 8766: The HTTP interface of {{ ydb-short-name }} Embedded UI. +* 19001, 19002: Interconnect for intra-cluster node interaction +* 8765, 8766: HTTP interface of {{ ydb-short-name }} Embedded UI. -Ensure the clock synchronization for the servers within the cluster, using `ntpd` or `chrony` tools. Ideally all servers should be synced to the same time source, to ensure that leap seconds are handled in the same way. +Make sure that the system clocks running on all the cluster's servers are synchronized by `ntpd` or `chrony`. We recommend using the same time source for all servers in the cluster to maintain consistent leap second processing. -If your servers' Linux flavor uses `syslogd` for logging, configure logfiles rotation using the `logrotate` or similar tools. {{ ydb-short-name }} services may generate a significant amount of log data, specifically when the logging level is increased for diagnostical purposes, so system log files rotation is important to avoid the overflows of the `/var` filesystem. +If the Linux flavor running on the cluster servers uses `syslogd` for logging, set up log file rotation using `logrotate` or similar tools. 
{{ ydb-short-name }} services can generate substantial amounts of system logs, particularly when you elevate the logging level for diagnostic purposes. That's why it's important to enable system log file rotation to prevent the `/var` file system overflow. Select the servers and disks to be used for storing data: -* Use the `block-4-2` fault tolerance model for cluster deployment in one availability zone (AZ). Use at least 8 nodes to be able to withstand the loss of 2 of them. -* Use the `mirror-3-dc` fault tolerance model for cluster deployment in three availability zones (AZ). To survive the loss of a single AZ and of 1 node in another AZ, use at least 9 nodes. The number of nodes in each AZ should be the same. +* Use the `block-4-2` fault tolerance model for cluster deployment in one availability zone (AZ). Use at least eight servers to safely survive the loss of two servers. +* Use the `mirror-3-dc` fault tolerance model for cluster deployment in three availability zones (AZ). To survive the loss of one AZ and one server in another AZ, use at least nine servers. Make sure that the number of servers running in each AZ is the same. {% note info %} -Run each static node on a separate server. Static and dynamic nodes may run on the same server. Multiple dynamic nodes may run on the same server, provided that it has sufficient compute resources. +Run each static node (data node) on a separate server. Both static and dynamic nodes can run together on the same server. A server can also run multiple dynamic nodes if it has enough computing power. {% endnote %} -For more information about the hardware requirements, see [{#T}](../../cluster/system-requirements.md). +For more information about hardware requirements, see [{#T}](../../cluster/system-requirements.md). 
-### TLS keys and certificates preparation {#tls-certificates} +### Preparing TLS keys and certificates {#tls-certificates} -Traffic protection and {{ ydb-short-name }} server node authentication is implemented using the TLS protocol. Before installing the cluster, the list of nodes, their naming scheme and particular names should be defined, and used to prepare the TLS keys and certificates. +The TLS protocol provides traffic protection and authentication for {{ ydb-short-name }} server nodes. Before you install your cluster, determine which servers will host its nodes, establish the node naming convention, come up with node names, and prepare your TLS keys and certificates. -The existing or new TLS certificates can be used. The following PEM-encoded key and certificate files are needed to run the cluster: -* `ca.crt` - public certificate of the Certification Authority (CA), used to sign all other TLS certificate (same file on all servers in the cluster); -* `node.key` - secret keys for each of the cluster nodes (separate key for each server); -* `node.crt` - public certificate for each of the cluster nodes (the certificate for the corresponding private key); -* `web.pem` - node secret key, node public certificate and Certification Authority certificate concatenation, to be used by the internal HTTP monitoring service (separate file for each server). +You can use existing certificates or generate new ones. Prepare the following files with TLS keys and certificates in the PEM format: +* `ca.crt`: CA-issued certificate used to sign the other TLS certificates (this file is the same on all the cluster nodes). +* `node.key`: Secret TLS keys for each cluster node (one key per cluster server). +* `node.crt`: TLS certificates for each cluster node (each certificate corresponds to a key). +* `web.pem`: Concatenation of the node secret key, node certificate, and the CA certificate needed for the monitoring HTTP interface (a separate file is used for each server in the cluster). 
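As an illustration, these four files could be produced for a single node with plain OpenSSL as follows (a hedged sketch: the host name is a placeholder, and the parameters follow the recommendations in this section rather than the project's own certificate-generation script):

```shell
# Example host name for one node; substitute each node's real FQDN.
NODE_FQDN="node1.ydb.example.net"

# 1. CA key and self-signed CA certificate (here: 5 years validity).
openssl req -x509 -newkey rsa:4096 -nodes -sha256 -days 1825 \
    -subj "/CN=Example YDB CA" \
    -addext "basicConstraints=critical,CA:TRUE" \
    -addext "keyUsage=digitalSignature,nonRepudiation,keyEncipherment,keyCertSign" \
    -keyout ca.key -out ca.crt

# 2. Node key and certificate signing request.
openssl req -newkey rsa:2048 -nodes -sha256 \
    -subj "/CN=${NODE_FQDN}" \
    -keyout node.key -out node.csr

# 3. Sign the node certificate (1 year); the SAN must match the host name,
#    and both server and client authentication must be enabled.
printf '%s\n' \
    "keyUsage=digitalSignature,keyEncipherment" \
    "extendedKeyUsage=serverAuth,clientAuth" \
    "subjectAltName=DNS:${NODE_FQDN}" > node.ext
openssl x509 -req -in node.csr -CA ca.crt -CAkey ca.key -CAcreateserial \
    -sha256 -days 365 -extfile node.ext -out node.crt

# 4. Bundle the node key, node certificate, and CA certificate
#    for the HTTP monitoring service.
cat node.key node.crt ca.crt > web.pem
```

Repeat steps 2–4 for every server, reusing the same `ca.crt`/`ca.key` pair.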
-Certificate parameters are typically defined by the organizational policies. Typically {{ ydb-short-name }} certificates are generated with the following parameters: -* 2048 or 4096 bit RSA keys; -* SHA-256 with RSA encryption algorithm for certificate signing; -* node certificates validity period - 1 year; -* CA certificate validity period - 3 years or more. +Your organization should define the parameters required for certificate generation in its policy. The following parameters are commonly used for generating certificates and keys for {{ ydb-short-name }}: +* 2048-bit or 4096-bit RSA keys +* Certificate signing algorithm: SHA-256 with RSA encryption +* Validity period of node certificates: at least 1 year +* CA certificate validity period: at least 3 years -The CA certificate must be marked appropriately: it needs the CA sign, and the usage for "Digital Signature, Non Repudiation, Key Encipherment, Certificate Sign" enabled. +Make sure that the CA certificate is appropriately labeled, with the CA property enabled along with the "Digital Signature, Non Repudiation, Key Encipherment, Certificate Sign" usage types. -For node certificates, it is important that the actual host name (or names) matches the values specified in the "Subject Alternative Name" field. Node certificates should have "Digital Signature, Key Encipherment" usage enabled, as well as "TLS Web Server Authentication, TLS Web Client Authentication" extended usage. Node certificates should support both server and client authentication (`extendedKeyUsage = serverAuth,clientAuth` option in the OpenSSL settings). +For node certificates, it's essential that the actual host name (or names) match the values in the "Subject Alternative Name" field. Enable both the regular usage types ("Digital Signature, Key Encipherment") and advanced usage types ("TLS Web Server Authentication, TLS Web Client Authentication") for the certificates. 
Node certificates must support both server authentication and client authentication (the `extendedKeyUsage = serverAuth,clientAuth` option in the OpenSSL settings). -{{ ydb-short-name }} repository on Github contains the [sample script](https://github.com/ydb-platform/ydb/blob/main/ydb/deploy/tls_cert_gen/) which can be used to automate the batch generation or renewal of TLS certificates for the whole cluster. The script can build the key and certificate files for the list of cluster nodes in a single operation, which simplifies the installation preparation. +For batch generation or update of {{ ydb-short-name }} cluster certificates using OpenSSL, you can use the [sample script](https://github.com/ydb-platform/ydb/blob/main/ydb/deploy/tls_cert_gen/) from the {{ ydb-short-name }} GitHub repository. Using the script, you can streamline preparation for installation, automatically generating all the key files and certificate files for all your cluster nodes in a single step. ## Create a system user and a group to run {{ ydb-short-name }} {#create-user} @@ -65,22 +65,22 @@ sudo groupadd ydb sudo useradd ydb -g ydb ``` -To make sure that {{ ydb-short-name }} has access to block disks to run, the new system user needs to be added to the `disk` group: +To ensure that {{ ydb-short-name }} can access block disks, add the user that will run {{ ydb-short-name }} processes to the `disk` group: ```bash sudo usermod -aG disk ydb ``` -## Install {{ ydb-short-name }} software on each server {#install-binaries} +## Install {{ ydb-short-name }} software on each server {#install-binaries} -1. Download and unpack the archive with the `ydbd` executable and the required libraries: +1. Download and unpack an archive with the `ydbd` executable and the libraries required for {{ ydb-short-name }} to run: ```bash mkdir ydbd-stable-linux-amd64 curl -L https://binaries.ydb.tech/ydbd-stable-linux-amd64.tar.gz | tar -xz --strip-component=1 -C ydbd-stable-linux-amd64 ``` -1. 
Create the directories to install the {{ ydb-short-name }} binaries: +1. Create directories for {{ ydb-short-name }} software: ```bash sudo mkdir -p /opt/ydb /opt/ydb/cfg @@ -93,51 +93,51 @@ sudo usermod -aG disk ydb sudo cp -iR ydbd-stable-linux-amd64/lib /opt/ydb/ ``` -1. Set the file and directory ownership: +1. Set the owner of files and folders: - ```bash - sudo chown -R root:bin /opt/ydb - ``` + ```bash + sudo chown -R root:bin /opt/ydb + ``` ## Prepare and format disks on each server {#prepare-disks} {% include [_includes/storage-device-requirements.md](../../_includes/storage-device-requirements.md) %} -1. Create a partition on the selected disk: +1. Create partitions on the selected disks: {% note alert %} - The following step will delete all partitions on the specified disks. Make sure that you specified the disks that have no other data! + The next operation will delete all partitions on the specified disk. Make sure that you specified a disk that contains no external data. {% endnote %} ```bash - DISK=/dev/nvme0n1 - sudo parted ${DISK} mklabel gpt -s - sudo parted -a optimal ${DISK} mkpart primary 0% 100% - sudo parted ${DISK} name 1 ydb_disk_ssd_01 - sudo partx --u ${DISK} + DISK=/dev/nvme0n1 + sudo parted ${DISK} mklabel gpt -s + sudo parted -a optimal ${DISK} mkpart primary 0% 100% + sudo parted ${DISK} name 1 ydb_disk_ssd_01 + sudo partx --u ${DISK} ``` - As a result, a disk labeled `/dev/disk/by-partlabel/ydb_disk_ssd_01` will appear in the system. + As a result, a disk labeled `/dev/disk/by-partlabel/ydb_disk_ssd_01` will appear on the system. - If you plan to use more than one disk on each server, replace `ydb_disk_ssd_01` with a unique label for each one. Disk labels must be unique within a single server, and are used in the configuration files, as shown in the subsequent instructions. + If you plan to use more than one disk on each server, replace `ydb_disk_ssd_01` with a unique label for each one. 
Disk labels should be unique within each server; they are used in configuration files, as shown in the following guides.

-    For cluster servers having similar disk configuration it is convenient to use exacty the same disk labels, to simplify the subsequent configuration.
+    To streamline the next setup step, it makes sense to use the same disk labels on cluster servers having the same disk configuration.

-2. Format the disk with the builtin command below:
+2. Format the disk using this command built into the `ydbd` executable:

   ```bash
   sudo LD_LIBRARY_PATH=/opt/ydb/lib /opt/ydb/bin/ydbd admin bs disk obliterate /dev/disk/by-partlabel/ydb_disk_ssd_01
   ```

-   Perform this operation for each disk that will be used to store {{ ydb-short-name }} data.
+   Perform this operation for each disk to be used for {{ ydb-short-name }} data storage.

## Prepare configuration files {#config}

{% include [prepare-configs.md](_includes/prepare-configs.md) %}

-When TLS traffic protection is to be used (which is the default), ensure that {{ ydb-short-name }} configuration file contains the proper paths to key and certificate files in the `interconnect_config` and `grpc_config` sections, as shown below:
+If traffic encryption is enabled (the default), make sure that the {{ ydb-short-name }} configuration file specifies paths to key files and certificate files under `interconnect_config` and `grpc_config`:

```json
interconnect_config:
@@ -154,13 +154,13 @@ grpc_config:
    - legacy
```

-Save the {{ ydb-short-name }} configuration file as `/opt/ydb/cfg/config.yaml` on each server of the cluster.
+Save the {{ ydb-short-name }} configuration file as `/opt/ydb/cfg/config.yaml` on each cluster node.

-For more detailed information about creating configurations, see [Cluster configurations](../configuration/config.md).
+For more detailed information about creating the configuration file, see [Cluster configurations](../configuration/config.md). 
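Before copying the configuration file to the cluster servers, it can help to verify that every certificate and key path it references actually exists on the node. Below is a minimal sketch of such a check; the `check_tls_files` name and the assumption that all TLS files end in `.crt`, `.key`, or `.pem` are illustrative, not part of {{ ydb-short-name }} itself.

```bash
# Hypothetical helper: report TLS file paths referenced in a config file
# that are missing or unreadable on this node.
# Usage: check_tls_files /opt/ydb/cfg/config.yaml
check_tls_files() {
    local config="$1" path missing=0
    # Extract absolute paths ending in .crt/.key/.pem (an assumption
    # about how the key and certificate files were named).
    for path in $(grep -Eo '/[^" ]+\.(crt|key|pem)' "$config" | sort -u); do
        if [ ! -r "$path" ]; then
            echo "MISSING: $path"
            missing=1
        fi
    done
    [ "$missing" -eq 0 ] && echo "All referenced TLS files are present"
    return "$missing"
}
```

Running `check_tls_files /opt/ydb/cfg/config.yaml` on each node before starting any services catches misplaced certificate files early.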
-## Copy TLS keys and certificates to each server {#tls-copy-cert}
+## Copy the TLS keys and certificates to each server {#tls-copy-cert}

-The TLS keys and certificates prepared need to be copied into the protected directory on each node of the {{ ydb-short-name }} cluster. An example of commands to create of the protected directory and copy the key and certificate files into it is shown below.
+Make sure to copy the generated TLS keys and certificates to a protected folder on each {{ ydb-short-name }} cluster node. Below are sample commands that create a protected folder and copy the key and certificate files into it.

```bash
sudo mkdir -p /opt/ydb/certs
@@ -178,7 +178,7 @@ sudo chmod 700 /opt/ydb/certs

- Manually

-   Run {{ ydb-short-name }} storage service on each static node:
+   Run a {{ ydb-short-name }} data storage service on each static cluster node:

   ```bash
   sudo su - ydb
@@ -190,7 +190,7 @@ sudo chmod 700 /opt/ydb/certs

- Using systemd

-   On each static node, create a `/etc/systemd/system/ydbd-storage.service` systemd configuration file with the following contents. Sample file is also available [in the repository](https://github.com/ydb-platform/ydb/blob/main/ydb/deploy/systemd_services/ydbd-storage.service).
+   On each server that will host a static cluster node, create a systemd `/etc/systemd/system/ydbd-storage.service` configuration file based on the template below. You can also [download](https://github.com/ydb-platform/ydb/blob/main/ydb/deploy/systemd_services/ydbd-storage.service) the sample file from the repository. 
```text
[Unit]
@@ -223,7 +223,7 @@ sudo chmod 700 /opt/ydb/certs
       WantedBy=multi-user.target
   ```

-   Run {{ ydb-short-name }} storage service on each static node:
+   Run the service on each static {{ ydb-short-name }} node:

   ```bash
   sudo systemctl start ydbd-storage
@@ -233,17 +233,17 @@ sudo chmod 700 /opt/ydb/certs

## Initialize a cluster {#initialize-cluster}

-Cluster initialization configures the set of static nodes defined in the cluster configuration file to store {{ ydb-short-name }} data.
+The cluster initialization operation sets up the static nodes listed in the cluster configuration file for storing {{ ydb-short-name }} data.

-To perform the cluster initialization, the path to the `ca.crt` file containing the Certification Authority certificate has to be specified in the corresponding commands. Copy the `ca.crt` file to the host where those commands will be executed.
+To initialize the cluster, you'll need the `ca.crt` file issued by the Certificate Authority. Copy it to the server where you will run the initialization commands and pass its path in those commands.

-Cluster initialization actions sequence depends on whether user authentication mode is enabled in the {{ ydb-short-name }} configuration file.
+Cluster initialization actions depend on whether the user authentication mode is enabled in the {{ ydb-short-name }} configuration file.

{% list tabs %}

- Authentication enabled

-   To execute the administrative commands (including cluster initialization, database creation, disk management, and others) in a cluster with user authentication enabled, an authentication token has to be obtained using the {{ ydb-short-name }} CLI client version 2.0.0 or higher. The {{ ydb-short-name }} CLI client can be installed on any computer with network access to the cluster nodes (for example, on one of the cluster nodes) by following the [installation instructions](../../reference/ydb-cli/install.md). 
+   To execute administrative commands (including cluster initialization, database creation, disk management, and others) in a cluster with user authentication mode enabled, you must first get an authentication token using the {{ ydb-short-name }} CLI client version 2.0.0 or higher. You can install the {{ ydb-short-name }} CLI client on any computer with network access to the cluster nodes (for example, on one of the cluster nodes) by following the [installation instructions](../../reference/ydb-cli/install.md).

   When the cluster is first installed, it has a single `root` account with a blank password, so the command to get the token is the following:

   ```bash
@@ -252,9 +252,9 @@ Cluster initialization actions sequence m
      --user root --no-password auth get-token --force >token-file
   ```

-   Any static node's address can be specified as the endpoint (the `-e` or `--endpoint` parameter).
+   You can specify any storage server in the cluster as an endpoint (the `-e` or `--endpoint` parameter).

-   If the command above is executed successfully, the authentication token will be written to `token-file`. This token file needs to be copied to one of the cluster storage nodes. Next, run the following commands on this cluster node:
+   If the command above is executed successfully, the authentication token will be written to `token-file`. 
Copy the token file to one of the storage servers in the cluster, then run the following commands on the server:

   ```bash
   export LD_LIBRARY_PATH=/opt/ydb/lib
@@ -265,7 +265,7 @@ Cluster initialization actions sequence depends on whether user authentication m

- Authentication disabled

-   On one of the cluster storage nodes, run the commands:
+   On one of the storage servers in the cluster, run these commands:

   ```bash
   export LD_LIBRARY_PATH=/opt/ydb/lib
@@ -276,60 +276,60 @@ Cluster initialization actions sequence m

{% endlist %}

-Upon successful cluster initialization, the command execution status code shown on the screen should be zero.
+You will see that the cluster was initialized successfully when the cluster initialization command returns a zero code.

## Create a database {#create-db}

-To work with tables, you need to create at least one database and run a process (or processes) to service this database (a dynamic node).
+To work with tables, you need to create at least one database and run a process (or processes) to serve this database (dynamic nodes).

-In order to run the database creation administrative command, the `ca.crt` file with the CA certificate is needed, similar to the cluster initialization steps shown above.
+To execute the administrative command for database creation, you will need the `ca.crt` certificate file issued by the Certificate Authority (see the above description of cluster initialization).

-On database creation the initial number of storage groups is configured, which determines the available input/output throughput and data storage capacity. The number of storage groups can be increased after the database creation, if needed.
+When creating your database, you set an initial number of storage groups that determines the available input/output throughput and maximum storage capacity. For an existing database, you can increase the number of storage groups when needed. 
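The administrative calls in this guide are each followed by `echo $?` to confirm a zero exit status. When scripting the installation, a small wrapper makes that check explicit and stops at the first failure; the `run_admin` name below is illustrative, and the command passed to it is whatever `ydbd` invocation the relevant step requires.

```bash
# Illustrative wrapper: run an administrative command and fail loudly
# on a non-zero exit status instead of inspecting `echo $?` by hand.
run_admin() {
    "$@"
    local status=$?
    if [ "$status" -ne 0 ]; then
        echo "FAILED (status $status): $*" >&2
        return "$status"
    fi
    echo "OK: $*"
}
```

For example, the database creation step shown below could be invoked as `run_admin /opt/ydb/bin/ydbd -f token-file --ca-file ca.crt -s grpcs://$(hostname -s):2135 admin database /Root/testdb create ssd:1`.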
-Database creation actions sequence depends on whether user authentication mode is enabled in the {{ ydb-short-name }} configuration file. +The database creation procedure depends on whether you enabled user authentication in the {{ ydb-short-name }} configuration file. {% list tabs %} - Authentication enabled - The authentication token is needed. The existing token file obtained at [cluster initialization stage](#initialize-cluster) can be used, or the new token can be obtained. + Get an authentication token. Use the authentication token file that you obtained when [initializing the cluster](#initialize-cluster) or generate a new token. - The authentication token file needs to be copied to one of the static nodes. Next, run the following commands on this cluster node: + Copy the token file to one of the storage servers in the cluster, then run the following commands on the server: - ```bash - export LD_LIBRARY_PATH=/opt/ydb/lib - /opt/ydb/bin/ydbd -f token-file --ca-file ca.crt -s grpcs://`hostname -s`:2135 \ - admin database /Root/testdb create ssd:1 - echo $? - ``` + ```bash + export LD_LIBRARY_PATH=/opt/ydb/lib + /opt/ydb/bin/ydbd -f token-file --ca-file ca.crt -s grpcs://`hostname -s`:2135 \ + admin database /Root/testdb create ssd:1 + echo $? + ``` - Authentication disabled - On one of the static nodes, run the commands: + On one of the storage servers in the cluster, run these commands: - ```bash - export LD_LIBRARY_PATH=/opt/ydb/lib - /opt/ydb/bin/ydbd --ca-file ca.crt -s grpcs://`hostname -s`:2135 \ - admin database /Root/testdb create ssd:1 - echo $? - ``` + ```bash + export LD_LIBRARY_PATH=/opt/ydb/lib + /opt/ydb/bin/ydbd --ca-file ca.crt -s grpcs://`hostname -s`:2135 \ + admin database /Root/testdb create ssd:1 + echo $? + ``` {% endlist %} -The command examples above use the following parameters: -* `/Root`: The name of the root domain, must match the `domains_config`.`domain`.`name` setting in the cluster configuration file. 
-* `testdb`: The name of the created database.
-* `ssd:1`: The name of the storage pool and the number of the storage groups to be used by the database. The pool name usually means the type of data storage devices and must match the `storage_pool_types`.`kind` setting inside the `domains_config`.`domain` element of the configuration file.
+You will see that the database was created successfully when the command returns a zero code.

-Upon successful database creation, the command execution status code shown on the screen should be zero.
+The command example above uses the following parameters:
+* `/Root`: Name of the root domain, must match the `domains_config`.`domain`.`name` setting in the cluster configuration file.
+* `testdb`: Name of the created database.
+* `ssd:1`: Name of the storage pool and the number of storage groups allocated. The pool name usually indicates the type of data storage devices and must match the `storage_pool_types`.`kind` setting inside the `domains_config`.`domain` element of the configuration file.

-## Start the dynamic nodes {#start-dynnode}
+## Run dynamic nodes {#start-dynnode}

{% list tabs %}

- Manually

-   Start the {{ ydb-short-name }} dynamic node for the `/Root/testdb` database:
+   Run the {{ ydb-short-name }} dynamic node for the `/Root/testdb` database:

   ```bash
   sudo su - ydb
@@ -344,11 +344,11 @@ Upon successful database creation, the command execution status code shown on th
      --node-broker grpcs://<ydb3>:2135
   ```

-   In the command shown above `<ydbN>` entries correspond to the FQDNs of any three servers running the static nodes.
+   In the command example above, replace `<ydbN>` with the FQDNs of any three servers running the cluster's static nodes.

- Using systemd

-   Create a systemd configuration file named `/etc/systemd/system/ydbd-testdb.service` with the following content. Sample file is also available [in the repository](https://github.com/ydb-platform/ydb/blob/main/ydb/deploy/systemd_services/ydbd-testdb.service). 
+   Create a systemd configuration file named `/etc/systemd/system/ydbd-testdb.service` based on the following template. You can also [download](https://github.com/ydb-platform/ydb/blob/main/ydb/deploy/systemd_services/ydbd-testdb.service) the sample file from the repository.

   ```text
   [Unit]
@@ -385,9 +385,9 @@ Upon successful database creation, the command execution status code shown on th
       WantedBy=multi-user.target
   ```

-   In the file shown above `<ydbN>` entries correspond to the FQDNs of any three servers running the static nodes.
+   In the file example above, replace `<ydbN>` with the FQDNs of any three servers running the cluster's static nodes.

-   Start the {{ ydb-short-name }} dynamic node for the `/Root/testdb` database:
+   Run the {{ ydb-short-name }} dynamic node for the `/Root/testdb` database:

   ```bash
   sudo systemctl start ydbd-testdb
@@ -395,15 +395,15 @@ Upon successful database creation, the command execution status code shown on th

{% endlist %}

-Start the additional dynamic nodes on other servers to scale and to ensure database and availability.
+Run additional dynamic nodes on other servers to ensure database scalability and fault tolerance.

-## Initial user accounts setup {#security-setup}
+## Initial account setup {#security-setup}

-If authentication mode is enabled in the cluster configuration file, initial user accounts setup must be done before working with the {{ ydb-short-name }} cluster.
+If authentication mode is enabled in the cluster configuration file, initial account setup must be done before working with the {{ ydb-short-name }} cluster.

The initial installation of the {{ ydb-short-name }} cluster automatically creates a `root` account with a blank password, as well as a standard set of user groups described in the [Access management](../../cluster/access.md) section. 
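The systemd unit templates shown earlier contain `<ydb1>`..`<ydb3>` placeholders that must be filled in on every server. If you maintain several such files, stamping the FQDNs in with `sed` avoids repetitive manual editing. A sketch (the `render_unit` name, the `.template` file suffix, and the `example.com` hostnames are assumptions for the example):

```bash
# Sketch: fill the <ydb1>..<ydb3> placeholders in a systemd unit template.
# Usage: render_unit TEMPLATE OUTPUT HOST1 HOST2 HOST3
render_unit() {
    local template="$1" output="$2" h1="$3" h2="$4" h3="$5"
    sed -e "s/<ydb1>/$h1/g" -e "s/<ydb2>/$h2/g" -e "s/<ydb3>/$h3/g" \
        "$template" > "$output"
}
```

For example: `render_unit ydbd-testdb.service.template /etc/systemd/system/ydbd-testdb.service ydb1.example.com ydb2.example.com ydb3.example.com` (run with sufficient privileges to write under `/etc/systemd/system`).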
-To perform the initial user accounts setup in the created {{ ydb-short-name }} cluster, run the following operations:
+To perform initial account setup in the created {{ ydb-short-name }} cluster, run the following operations:

1. Install the {{ ydb-short-name }} CLI as described in the [documentation](../../reference/ydb-cli/install.md).

@@ -416,21 +416,21 @@ To perform the initial user accounts setup in the created {{ ydb-short-name }} c

   Replace the `passw0rd` value with the required password.

-1. Create the additional accounts:
+1. Create additional accounts:

   ```bash
   ydb --ca-file ca.crt -e grpcs://<node.ydb.tech>:2136 -d /Root/testdb --user root \
     yql -s 'CREATE USER user1 PASSWORD "passw0rd"'
   ```

-1. Set the account permissions by including it into the security groups:
+1. Set the account permissions by adding the account to the built-in groups:

   ```bash
   ydb --ca-file ca.crt -e grpcs://<node.ydb.tech>:2136 -d /Root/testdb --user root \
     yql -s 'ALTER GROUP `ADMINS` ADD USER user1'
   ```

-In the command examples above, `<node.ydb.tech>` is the FQDN of the server running the dynamic node that supports the `/Root/testdb` database.
+In the command examples above, `<node.ydb.tech>` is the FQDN of the server running any dynamic node that serves the `/Root/testdb` database.

When running the account creation and group assignment commands, the {{ ydb-short-name }} CLI client will request the `root` user's password. You can avoid multiple password entries by creating a connection profile as described in the [{{ ydb-short-name }} CLI documentation](../../reference/ydb-cli/profile/index.md).

@@ -445,19 +445,19 @@ When running the account creation and group assignment commands, the {{ ydb-shor

   yql -s 'CREATE TABLE `testdir/test_table` (id Uint64, title Utf8, PRIMARY KEY (id));'
   ```

-   Where `<node.ydb.tech>` is the FQDN of the server running the dynamic node that supports the `/Root/testdb` database. 
+   Here, `<node.ydb.tech>` is the FQDN of the server running the dynamic node that serves the `/Root/testdb` database.

-## Validate the access to the embedded UI
+## Check access to the built-in web interface

-To validate the access to {{ ydb-short-name }} embedded UI a Web browser should be used, opening the address `https://<node.ydb.tech>:8765`, where `<node.ydb.tech>` should be replaced with the FQDN of any static node server.
+To check access to the {{ ydb-short-name }} built-in web interface, open the `https://<node.ydb.tech>:8765` URL in your browser, where `<node.ydb.tech>` is the FQDN of the server running any static {{ ydb-short-name }} node.

-Web browser should be configured to trust the CA used to generate the cluster node certificates, otherwise a warning will be shown that the certificate is not trusted.
+Configure the web browser to trust the certificate authority that issued the certificates for the {{ ydb-short-name }} cluster. Otherwise, you will see a warning about an untrusted certificate.

-In case the authentication is enabled, the Web browser will display the login and password prompt. After entering the correct credentials, the initial {{ ydb-short-name }} embedded UI page will be shown. The available functions and user interface are described in the following document: [{#T}](../../maintenance/embedded_monitoring/index.md).
+If authentication is enabled in the cluster, the web browser should prompt you for a login and password. Enter your credentials, and you'll see the built-in interface welcome page. The user interface and its features are described in [{#T}](../../maintenance/embedded_monitoring/index.md).

{% note info %}

-Highly available HTTP load balancer, based on `haproxy`, `nginx` or similar software, is typically used to enable access to the {{ ydb-short-name }} embedded UI. The configuration details for HTTP load balancer are out of scope for the basic {{ ydb-short-name }} installation instruction. 
+A common way to provide access to the {{ ydb-short-name }} built-in web interface is to set up a fault-tolerant HTTP balancer running `haproxy`, `nginx`, or similar software. A detailed description of the HTTP balancer setup is beyond the scope of the standard {{ ydb-short-name }} installation guide.

{% endnote %}

@@ -466,39 +466,39 @@ Highly available HTTP load balancer, based on `haproxy`, `nginx` or similar soft

{% note warning %}

-We DO NOT recommend to run {{ ydb-short-name }} in the unprotected mode for any purpose.
+We do not recommend using the unprotected {{ ydb-short-name }} mode for development or production environments.

{% endnote %}

-The installation procedure described above assumes that {{ ydb-short-name }} runs in its default protected mode.
+The above installation procedure assumes that {{ ydb-short-name }} was deployed in the standard protected mode.

-The unprotected {{ ydb-short-name }} mode is also available, and is intended for internal purposes, mainly for the development and testing of {{ ydb-short-name }} software. When running in the unprotected mode:
-* all traffic is passed in the clear text, including the intra-cluster communications and cluster-client communications;
-* user authentication is not used (enabling authentication without TLS traffic protection does not make much sense, as login and password are both passed unprotected through the network).
+The unprotected {{ ydb-short-name }} mode is primarily intended for test scenarios associated with {{ ydb-short-name }} software development and testing. In the unprotected mode:
+* Traffic between cluster nodes and between applications and the cluster runs over an unencrypted connection.
+* Users are not authenticated (it doesn't make sense to enable authentication when the traffic is unencrypted because the login and password in such a configuration would be transmitted over the network in clear text). 
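In practice, the difference is visible in every connection string: the protected installation uses `grpcs://` endpoints together with certificate options, while the unprotected one uses plain `grpc://` with no certificate or credential options. A toy illustration of rewriting an endpoint for the unprotected mode (the function name and hostname are made up for the example):

```bash
# Toy helper: rewrite a protected endpoint into its unprotected form.
to_unprotected() {
    printf '%s\n' "$1" | sed 's|^grpcs://|grpc://|'
}

to_unprotected "grpcs://ydb1.example.com:2135"   # prints grpc://ydb1.example.com:2135
```

Remember that in the unprotected mode the certificate options (`--ca-file` and related) are dropped as well, not just the URL scheme.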
-Installing {{ ydb-short-name }} for the unprotected mode is performed according with the general procedure described above, with the exceptions listed below:
+When installing {{ ydb-short-name }} to run in the unprotected mode, follow the above procedure, with the following exceptions:

-1. TLS keys and certificates generation is skipped. No need to copy the key and certificate files to cluster servers.
+1. When preparing for the installation, you do not need to generate TLS certificates and keys and copy the certificates and keys to the cluster nodes.

-1. Subsection `security_config` of section `domains_config` is excluded from the configuration file. Sections `interconnect_config` and `grpc_config` are excluded, too.
+1. In the configuration files, remove the `security_config` subsection under `domains_config`. Remove the `interconnect_config` and `grpc_config` sections entirely.

-1. The syntax of commands to start static and dynamic nodes is reduced: the options referring to TLS key and certificate files are excluded, `grpc` protocol name is used instead of `grpcs` for connection points.
+1. Use simplified commands to run static and dynamic cluster nodes: omit the options that specify file names for certificates and keys; use the `grpc` protocol instead of `grpcs` when specifying the connection points.

-1. The step to obtain the authentication token before cluster initialization and database creation is skipped.
+1. Skip the step of obtaining an authentication token before cluster initialization and database creation because it's not needed in the unprotected mode.

-1. Cluster initialization is performed with the following command:
+1. The cluster initialization command has the following format:

-   ```bash
-   export LD_LIBRARY_PATH=/opt/ydb/lib
-   /opt/ydb/bin/ydbd admin blobstorage config init --yaml-file /opt/ydb/cfg/config.yaml
-   echo $? 
-   ```
+   ```bash
+   export LD_LIBRARY_PATH=/opt/ydb/lib
+   /opt/ydb/bin/ydbd admin blobstorage config init --yaml-file /opt/ydb/cfg/config.yaml
+   echo $?
+   ```

-1. Database creation is performed with the following command:
+1. The database creation command has the following format:

-   ```bash
-   export LD_LIBRARY_PATH=/opt/ydb/lib
-   /opt/ydb/bin/ydbd admin database /Root/testdb create ssd:1
-   ```
+   ```bash
+   export LD_LIBRARY_PATH=/opt/ydb/lib
+   /opt/ydb/bin/ydbd admin database /Root/testdb create ssd:1
+   ```

-1. `grpc` protocol is used instead of `grpcs` when configuring the connections to the database in {{ ydb-short-name }} CLI and applications. Authentication is not used.
+1. When accessing your database from the {{ ydb-short-name }} CLI and applications, use `grpc` instead of `grpcs` and skip authentication.

diff --git a/ydb/docs/en/core/yql/reference/yql-core/syntax/_includes/create_table.md b/ydb/docs/en/core/yql/reference/yql-core/syntax/_includes/create_table.md
index 1374eaabeff..d5605c195b1 100644
--- a/ydb/docs/en/core/yql/reference/yql-core/syntax/_includes/create_table.md
+++ b/ydb/docs/en/core/yql/reference/yql-core/syntax/_includes/create_table.md
@@ -1,5 +1,22 @@
# CREATE TABLE

+{% if feature_olap_tables %}
+
+{{ ydb-short-name }} supports two types of tables:
+
+* [Row-oriented](../../../../concepts/datamodel/table.md)
+* [Column-oriented](../../../../concepts/column-table.md)
+
+When you create a table, the table type is specified by the `STORE` parameter, with `ROW` creating a [row-oriented table](#row) and `COLUMN` creating a [column-oriented table](#olap-tables). If the `STORE` parameter is omitted, a row-oriented table is created by default.
+
+{% endif %}
+
+{% if feature_olap_tables %}
+
+## Row-oriented tables {#row}
+
+{% endif %}
+
{% if feature_bulk_tables %}

The table is created automatically during the first [INSERT INTO](insert_into.md){% if feature_mapreduce %} in the database specified in [USE](../use.md){% endif %}. 
The schema is determined automatically. @@ -28,7 +45,7 @@ The `CREATE TABLE` call creates a {% if concept_table %}[table]({{ concept_table WITH ( key = value, ... ) {% endif %} -## Columns {#columns} +{% if feature_olap_tables %}#{% endif %}## Columns {#row-columns} {% if feature_column_container_type == true %} In non-key columns, you can use any data types, but for key columns, only [primitive ones](../../types/primitive.md). When specifying complex types (for example, `List<String>`), the type is enclosed in double quotes. @@ -64,19 +81,19 @@ It is mandatory to specify the `PRIMARY KEY` with a non-empty list of columns. T {% if feature_secondary_index %} -## Secondary indexes {#secondary_index} +{% if feature_olap_tables %}#{% endif %}## Secondary indexes {#secondary_index} The INDEX construct is used to define a {% if concept_secondary_index %}[secondary index]({{ concept_secondary_index }}){% else %}secondary index{% endif %} in a table: ```sql -CREATE TABLE table_name ( +CREATE TABLE table_name ( ... INDEX <index_name> GLOBAL [SYNC|ASYNC] ON ( <index_columns> ) COVER ( <cover_columns> ), ... ) ``` -where: +Where: * **Index_name** is the unique name of the index to be used to access data. * **SYNC/ASYNC** indicates synchronous/asynchronous data writes to the index. If not specified, synchronous. * **Index_columns** is a list of comma-separated names of columns in the created table to be used for a search in the index. @@ -95,13 +112,12 @@ CREATE TABLE my_table ( PRIMARY KEY (a) ) ``` - {% endif %} {% if feature_map_tables and concept_table %} -## Additional parameters {#additional} +{% if feature_olap_tables %}#{% endif %}## Additional parameters {#row-additional} -You can also specify a number of {{ backend_name }}-specific parameters for the table. When creating a table using YQL, such parameters are listed in the ```WITH``` section: +You can also specify a number of {{ backend_name }}-specific parameters for the table. 
When you create a table, those parameters are listed in the ```WITH``` clause:

```sql
CREATE TABLE table_name (...)
WITH (
@@ -132,7 +148,7 @@ WITH (
);
```

-## Column groups {#column-family}
+{% if feature_olap_tables %}#{% endif %}## Column groups {#column-family}

Columns of the same table can be grouped to set the following parameters:

@@ -170,3 +186,91 @@ Available types of storage devices depend on the {{ ydb-short-name }} cluster co

{% endif %}

{% endif %}
+
+{% if feature_olap_tables %}
+
+## Column-oriented tables {#olap-tables}
+
+{% note warning %}
+
+Column-oriented {{ ydb-short-name }} tables are in the Preview mode.
+
+{% endnote %}
+
+The `CREATE TABLE` statement creates a [column-oriented](../../../../concepts/column-table.md) table with the specified data schema and key columns (`PRIMARY KEY`).
+
+```sql
+CREATE TABLE table_name (
+    column1 type1,
+    column2 type2 NOT NULL,
+    column3 type3,
+    ...
+    columnN typeN,
+    PRIMARY KEY ( column, ... ),
+    ...
+)
+PARTITION BY HASH(column1, column2, ...)
+WITH (
+    STORE = COLUMN,
+    key = value,
+    ...
+)
+```
+
+### Columns {#olap-columns}
+
+Data types supported by column-oriented tables and constraints imposed on data types in primary keys or data columns are described in the [supported data types](../../../../concepts/column-table.md#olap-data-types) section for column-oriented tables.
+
+Make sure to add the `PRIMARY KEY` and `PARTITION BY` clauses with a non-empty list of columns.
+
+If you omit modifiers, a column is assigned an [optional](../../types/optional.md) type and can accept `NULL` values. To create a non-optional type, use `NOT NULL`.
+
+**Example**
+
+```sql
+CREATE TABLE my_table (
+    a Uint64 NOT NULL,
+    b String,
+    c Float,
+    PRIMARY KEY (b, a)
+)
+PARTITION BY HASH(b)
+WITH (
+    STORE = COLUMN
+)
+```
+
+### Additional parameters {#olap-additional}
+
+You can also specify a number of {{ backend_name }}-specific parameters for the table. 
When you create a table, those parameters are listed in the ```WITH``` clause: + +```sql +CREATE TABLE table_name (...) +WITH ( + key1 = value1, + key2 = value2, + ... +) +``` + +Here, `key` is the name of the parameter and `value` is its value. + +Supported parameters in column-oriented tables: + +* `AUTO_PARTITIONING_MIN_PARTITIONS_COUNT` sets the minimum physical number of partitions used to store data (see [{#T}](../../../../concepts/column-table.md#olap-tables-partitioning)). + +For example, the following code creates a column-oriented table with ten partitions: + +```sql +CREATE TABLE my_table ( + id Uint64, + title Utf8, + PRIMARY KEY (id) +) +PARTITION BY HASH(id) +WITH ( + AUTO_PARTITIONING_MIN_PARTITIONS_COUNT = 10 +); +``` + +{% endif %} diff --git a/ydb/docs/ru/core/best_practices/pk-olap-scalability.md b/ydb/docs/ru/core/best_practices/pk-olap-scalability.md index c49f157de84..181a9e777c1 100644 --- a/ydb/docs/ru/core/best_practices/pk-olap-scalability.md +++ b/ydb/docs/ru/core/best_practices/pk-olap-scalability.md @@ -16,8 +16,8 @@ {% endnote %} -В настоящий момент колоночные таблицы не поддерживают автоматического репартицирования, поэтому важно указывать правильное число партиций при создании таблицы. Оценить необходимое количество партиций можно по предполагаемому объему добавляемых в таблицу данных. Средняя пропускная способность партции на добавление данных — 1 МБ/c. При этом пропускная способность в первую очередь определяется выбранным первичным ключом (необходимостью сортировки данных внутри партиции при добавлении новых данных). Для небольших потоков данных не рекомендуется задавать больше 128 партиций. +В настоящий момент колоночные таблицы не поддерживают автоматического репартицирования, поэтому важно указывать правильное число партиций при создании таблицы. Оценить необходимое количество партиций можно по предполагаемому объему добавляемых в таблицу данных. Средняя пропускная способность партиции на добавление данных — 1 МБ/c. 
При этом пропускная способность в первую очередь определяется выбранным первичным ключом (необходимостью сортировки данных внутри партиции при добавлении новых данных). Для небольших потоков данных не рекомендуется задавать больше 128 партиций. Пример: -При потоке данных в 1 ГБ/с оптимально использовать аналитическую таблицу с 1000 партиций. При этом создавать таблицы с значительным запасом партиций не стоит, т.к. это приведет к росту потребляемых ресурсов в кластере и итоговому замедлению скорости выполнения запросов. +При потоке данных в 1 ГБ/с оптимально использовать аналитическую таблицу с 1000 партиций. При этом создавать таблицы со значительным запасом партиций не стоит, т.к. это приведет к росту потребляемых ресурсов в кластере и итоговому замедлению скорости выполнения запросов. diff --git a/ydb/docs/ru/core/concepts/column-table.md b/ydb/docs/ru/core/concepts/column-table.md index 0f291bc3447..8318dac8a38 100644 --- a/ydb/docs/ru/core/concepts/column-table.md +++ b/ydb/docs/ru/core/concepts/column-table.md @@ -12,7 +12,7 @@ Вставка пакетов данных производится атомарно: данные будут записаны или во все партиции, или ни в одну. При чтении анализируются только полностью записанные в колоночные таблицы данные. -В большинстве случаев работа со колоночными таблицами {{ ydb-short-name }} аналогична работе со строковыми. Имеются следующие отличия: +В большинстве случаев работа с колоночными таблицами {{ ydb-short-name }} аналогична работе со строковыми. Имеются следующие отличия: * В качестве ключевых колонок можно использовать только NOT NULL колонки. * Данные партицируются не по первичному ключу, а по Hash от колонок [партицирования](#olap-tables-partitioning). @@ -25,7 +25,7 @@ * Фильтры Блума. * Change Data Capture. * Переименование таблиц. -* Пользовательские аттрибуты таблиц. +* Пользовательские атрибуты таблиц. * Изменение списка колонок данных в колоночных таблицах. * Добавление данных в колоночные таблицы с помощью SQL-оператора `INSERT`. 
 * Удаление данных из колоночных таблиц с помощью SQL-оператора `DELETE`. Фактически, удаление возможно только по истечению TTL времени хранения данных.
@@ -63,7 +63,7 @@

 В отличие от строковых таблиц {{ ydb-short-name }}, колоночные таблицы партицируют данные не по первичным ключам, а по специально выделенным ключам — ключам партицирования. Ключи партицирования являются подмножеством первичных ключей таблицы.

-В отличие от партицирования данных в строковых таблиц {{ ydb-short-name }}, партицирование данных для колоночных таблиц выполняется не по значениям ключей, а по hash-значениям от ключей, что позволяет равномерно распределить данные во все существующие партиции. Такое партицирование позволяет избежать хотспотов при вставке и ускоряет аналитические запросы, обрабатывающие (считывающие) большие объемы данных.
+В отличие от партицирования данных в строковых таблицах {{ ydb-short-name }}, партицирование данных для колоночных таблиц выполняется не по значениям ключей, а по hash-значениям от ключей, что позволяет равномерно распределить данные во все существующие партиции. Такое партицирование позволяет избежать хотспотов при вставке и ускоряет аналитические запросы, обрабатывающие (считывающие) большие объемы данных.

 Выбор ключей партицирования существенно влияет на производительность колоночных таблиц. Подробнее смотрите [{#T}](../best_practices/pk-olap-scalability.md).
diff --git a/ydb/docs/ru/core/deploy/manual/deploy-ydb-on-premises.md b/ydb/docs/ru/core/deploy/manual/deploy-ydb-on-premises.md
index a6ab806af22..679ffaee73a 100644
--- a/ydb/docs/ru/core/deploy/manual/deploy-ydb-on-premises.md
+++ b/ydb/docs/ru/core/deploy/manual/deploy-ydb-on-premises.md
@@ -54,7 +54,7 @@

 Для сертификатов узлов важно соответствие фактического имени хоста (или имён хостов) значениям, указанным в поле "Subject Alternative Name".
 Для сертификатов должны быть включены виды использования "Digital Signature, Key Encipherment" и расширенные виды использования "TLS Web Server Authentication, TLS Web Client Authentication". Необходимо, чтобы сертификаты узлов поддерживали как серверную, так и клиентскую аутентификацию (опция `extendedKeyUsage = serverAuth,clientAuth` в настройках OpenSSL).

-Для пакетной генерации или обновления сертификатов кластера {{ ydb-short-name }} с помощью программного обеспечения OpenSSL можно воспользоваться [примером скрипта](https://github.com/ydb-platform/ydb/blob/main/ydb/deploy/tls_cert_gen/), размещённым в репозитории {{ ydb-short-name }} на Github. Скрипт позволяет автоматически сформировать необходимые файлы ключей и сертификатов для всего набора узлов кластера за одну операцию, облегчая подготовку к установке.
+Для пакетной генерации или обновления сертификатов кластера {{ ydb-short-name }} с помощью программного обеспечения OpenSSL можно воспользоваться [примером скрипта](https://github.com/ydb-platform/ydb/blob/main/ydb/deploy/tls_cert_gen/), размещённым в репозитории {{ ydb-short-name }} на GitHub. Скрипт позволяет автоматически сформировать необходимые файлы ключей и сертификатов для всего набора узлов кластера за одну операцию, облегчая подготовку к установке.

 ## Создайте системного пользователя и группу, от имени которых будет работать {{ ydb-short-name }} {#create-user}
@@ -474,7 +474,7 @@ sudo chmod 700 /opt/ydb/certs

 Незащищённый режим работы {{ ydb-short-name }} предназначен для решения тестовых задач, преимущественно связанных с разработкой и тестированием программного обеспечения {{ ydb-short-name }}. В незащищенном режиме:

 * трафик между узлами кластера, а также между приложениями и кластером использует незашифрованные соединения;
-* не используется аутентификация пользователей (включение аутентификации при отсутстви шифрования трафика не имеет смысла, поскольку логин и пароль в такой конфигурации передавались бы через сеть в открытом виде).
+* не используется аутентификация пользователей (включение аутентификации при отсутствии шифрования трафика не имеет смысла, поскольку логин и пароль в такой конфигурации передавались бы через сеть в открытом виде).

 Установка {{ ydb-short-name }} для работы в незащищенном режиме производится в порядке, описанном выше, со следующими исключениями:
diff --git a/ydb/docs/ru/core/how_to_edit_docs/_includes/content.md b/ydb/docs/ru/core/how_to_edit_docs/_includes/content.md
index 448c788f08b..e87e5c2c7bb 100644
--- a/ydb/docs/ru/core/how_to_edit_docs/_includes/content.md
+++ b/ydb/docs/ru/core/how_to_edit_docs/_includes/content.md
@@ -264,10 +264,10 @@

 Текст внутри квадратных скобок, отображаемый при рендеринге документации, должен быть достаточно длинным, чтобы в него можно было легко попасть мышью или пальцем при клике.

-Существуют ситуации, когда URL ресурса имеет самостоятельную ценность, и должен быть отображен в документации, например, в случае публикации ссылок на репозиторий в github. В таких случаях его необходимо дублировать как внутри квадратных скобок, так и внутри обычных, так как YFM, в отличие от стандартного Markdown, не распознает автоматом URL в тексте:
+Существуют ситуации, когда URL ресурса имеет самостоятельную ценность, и должен быть отображен в документации, например, в случае публикации ссылок на репозиторий в GitHub. В таких случаях его необходимо дублировать как внутри квадратных скобок, так и внутри обычных, так как YFM, в отличие от стандартного Markdown, не распознает автоматом URL в тексте:

 ``` md
-Github репозиторий {{ ydb-short-name }}: [{{ ydb-doc-repo }}]({{ ydb-doc-repo }})
+GitHub репозиторий {{ ydb-short-name }}: [{{ ydb-doc-repo }}]({{ ydb-doc-repo }})
 ```

 ## Картинки {#pictures}
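The partition-sizing guidance changed in `pk-olap-scalability.md` above (each column-table partition sustains roughly 1 MB/s of writes, so a 1 GB/s stream maps to about 1000 partitions) can be sketched as a quick back-of-the-envelope estimate. This is an illustrative calculation only, not a YDB API; the helper name and the rounding rule are assumptions:

```python
import math

# Rule of thumb from the docs above (an assumption of this sketch, not an API):
# one column-table partition handles roughly 1 MB/s of incoming data.
PER_PARTITION_MB_PER_S = 1.0


def estimate_partition_count(ingest_mb_per_s: float) -> int:
    """Rough estimate of how many partitions a given ingest rate needs.

    The docs also advise not exceeding 128 partitions for small data
    streams, and not over-provisioning partitions in general, since spare
    partitions consume cluster resources and slow down queries.
    """
    return max(1, math.ceil(ingest_mb_per_s / PER_PARTITION_MB_PER_S))


# A 1 GB/s stream (~1000 MB/s) comes out to about 1000 partitions,
# matching the example in the documentation change above.
print(estimate_partition_count(1000))
```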