summaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authorgithub-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>2026-07-01 06:38:52 +0000
committerGitHub <[email protected]>2026-07-01 06:38:52 +0000
commitef2d3050e43ca4bdd410fdedb2aa69a456e5bb7a (patch)
tree0b9f1000f85356b55287ab55a7e229e9fcdeb816
parent013d0f2d43328d5cd91c81a281fef74e922a03b2 (diff)
Critic fixes for #43860 (#43866)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
-rw-r--r--ydb/docs/en/core/dev/secondary-indexes.md96
1 files changed, 91 insertions, 5 deletions
diff --git a/ydb/docs/en/core/dev/secondary-indexes.md b/ydb/docs/en/core/dev/secondary-indexes.md
index d0fb72803d3..8b65d925a2d 100644
--- a/ydb/docs/en/core/dev/secondary-indexes.md
+++ b/ydb/docs/en/core/dev/secondary-indexes.md
@@ -4,7 +4,7 @@
Data in a {{ ydb-short-name }} table is always sorted by the primary key. That means that retrieving any entry from the table with specified field values comprising the primary key always takes the minimum fixed time, regardless of the total number of table entries. Indexing by the primary key makes it possible to retrieve any consecutive range of entries in ascending or descending order of the primary key. Execution time for this operation depends only on the number of retrieved entries rather than on the total number of table records.
-To use a similar feature with any field or combination of fields, additional indexes called **secondary indexes** can be created for them
+To use a similar feature with any field or combination of fields, additional indexes called **secondary indexes** can be created for them.
In transactional systems, indexes are used to limit or avoid performance degradation and increase of query cost as your data grows.
@@ -16,7 +16,7 @@ A secondary index is a data schema object that can be defined when creating a ta
The [`table index add` command](../reference/ydb-cli/commands/secondary_index.md#add) is supported in the {{ ydb-short-name }} CLI.
-Since an index contains its own data derived from table data, when creating an index on an existing table with data, an operation is performed to initially build an index. This may take a long time. This operation is executed in the background and you can keep working with the table while it's in progress. However, you can't use the new index until it's build is completed.
+Since an index contains its own data derived from table data, when creating an index on an existing table with data, an operation is performed to initially build an index. This may take a long time. This operation is executed in the background and you can keep working with the table while it's in progress. However, you can't use the new index until its build is completed.
An index can only be used in the order of the fields included in it. If an index contains two fields, such as `a` and `b`, you can effectively use it for queries such as:
@@ -34,8 +34,7 @@ This index can't be used in the following queries:
To effectively limit the result of a query using the [`LIMIT` command](../yql/reference/syntax/select/limit_offset.md), you must use one-way sorting in the [`ORDER BY` command](../yql/reference/syntax/select/order_by.md):
- `WHERE a = $var1 AND b > $var2 ORDER BY a, b DESC LIMIT 1`, all index rows that match the filter conditions will be read;
-
-- `WHERE a = $var1 AND b > $var2 ORDER BY a DESC, b DESC LIMIT 1`, only one line will be read.
+- `WHERE a = $var1 AND b > $var2 ORDER BY a DESC, b DESC LIMIT 1`, only one row will be read.
Considering the above, there's no use in pre-indexing all possible combinations of table columns to speed up the execution of any query. An index is always a compromise between the lookup and write speed and the storage space occupied by the data. Indexes are created for specific queries and search criteria made by an app in the database.
@@ -43,6 +42,7 @@ Considering the above, there's no use in pre-indexing all possible combinations
For a table to be accessed by a secondary index, its name must be explicitly specified in the `VIEW` section after the table name as described in the article about the YQL [`SELECT` statement](../yql/reference/syntax/select#secondary_index). For example, a query to retrieve orders from the `orders` table by the specified customer ID (`id_customer`) looks like this:
+
```yql
DECLARE $customer_id AS Uint64;
@@ -51,12 +51,94 @@ FROM orders VIEW idx_customer AS o
WHERE o.id_customer = $customer_id
```
+
Where `idx_customer` is the name of the secondary index on the `orders` table with the `id_customer` field specified first.
If no `VIEW` section is specified, making a query like this results in a full scan of the `orders` table.
In transactional applications, such information queries are executed with paginated data output. This eliminates an increase in the cost and time of query execution if the number of entries that meet the filtering conditions grows. The described approach to writing [paginated queries](../dev/paging.md) using the primary key can also be applied to columns that are part of a secondary index.
+An experimental capability to automatically select a secondary index for use in a query is also available. The selection algorithm is currently rule-based and uses only the query text to automatically select a secondary index.
+
+### Automatic use of indexes when selecting data
+
+Explicitly specifying the `VIEW` section takes priority over the optimizer's decision to use secondary indexes. That is, the query
+
+
+```yql
+SELECT * FROM `Table` VIEW Index
+```
+
+
+is guaranteed to perform selection using the `Index` index.
+
+To explicitly specify reading using the primary key, use the following construct:
+
+
+```yql
+SELECT * FROM `Table` VIEW PRIMARY KEY
+```
+
+
+#### Criteria for selecting a secondary index
+
+The index used for reading is selected during query optimization when determining the row ranges that need to be read (predicate pushdown). Indexes, like the main table, are a set of rows ordered by a set of key columns.
+
+Selection between reading using an index and reading using the primary key is based on the following metrics:
+
+1. Need for additional reads from the main table. If the index contains all columns required for the query, no additional reads are needed.
+2. Length of the point prefix of the predicate for the corresponding table's key. That is, the predicate constrains a set of columns that are the first components of the key with point conditions: `=`, `IN`, `IS NULL`. Priority is given to indexes for which all indexed columns are fixed, or to the main table if the entire primary key is point-fixed.
+3. Number of columns used in the read range bounds. In the following query to the `Table` table with primary key (Key1, Key2, Key3)
+
+
+```yql
+SELECT * FROM `Table` WHERE (Key1, Key2, Key3) < ($param1, $param2, $param3) AND (Key1, Key2) > ($param4, $param5)
+```
+
+
+reading will be performed in the range `(($param4, $param5), ($param1, $param2, $param3))`, and thus the number of used columns will be 3. Similarly, preference is given to indexes for which all indexed columns are used.
+
+Read methods are ranked against each other according to criterion 2, with ties broken by criterion 3, and criterion 1 is additionally taken into account.
+
+#### Examples of automatic index selection
+
+
+```yql
+CREATE TABLE `Table` (
+ Key Int32,
+ SubKey1 Int32,
+ SubKey2 String,
+ Value1 String,
+ Value2 String,
+ PRIMARY KEY (Key, SubKey1, SubKey2),
+ INDEX Index12 GLOBAL ON (SubKey1, SubKey2),
+ INDEX Index21 GLOBAL ON (SubKey2, Value1),
+ INDEX Index212 GLOBAL ON (SubKey2) COVER (Value2)
+);
+
+```
+
+
+`SELECT * FROM Table WHERE SubKey1 = $p1 and SubKey2 > $p2` — `Index12` will be used. The range expression is `(($p1; $p2), ($p1)]`. The point prefix length for `Index12` is 1; for other indexes, it is 0.
+
+`SELECT * FROM Table WHERE Key = $p1 and SubKey1 = $p2 And SubKey2 = $p2` — no index will be used. When selecting a scan of the main table, all 3 columns are used: `[Key, Fk1, Fk2]`, and the point prefix length is 3.
+
+`SELECT * FROM Table WHERE Key = $p1 and SubKey2 = $p2` — secondary indexes will not be used. When selecting any secondary index, 1 column is used, and the point prefix is also at most 1 for any index choice.
+
+`SELECT * FROM Table WHERE Key >= $p1 and SubKey1 = $p2 And SubKey2 = $p3` — `Index12` should be selected, because when it is selected, the resulting range `[[Fk1; Fk2; Key], [Fk1; Fk2])` yields a point prefix length of 2, and 3 columns are used.
+
+`SELECT * FROM Table WHERE Key = 2 and SubKey2 = 3` — secondary indexes should not be used. When reading by `PK` or using any of the secondary indexes, the point prefix consists of at most one column. Also, at most one column is used.
+
+`SELECT * FROM Table WHERE SubKey1 > 2` — `Index12` should be selected. Only when using `Index12` will the read range be non-trivial.
+
+`SELECT * FROM Table WHERE SubKey2 = 2` — either `Index21` or `Index212` may be selected. When using the indexes mentioned above, the point prefix length will be 1. The number of used columns is also maximized when selecting `Index21` and `Index212`.
+
+`SELECT Value2 FROM Table WHERE SubKey2 = 2` — `Index212` should be selected. When using `Index21` and `Index212`, the point prefix length will be 1, but when using `Index212`, no read from the main table is required.
+
+`SELECT * FROM Table WHERE SubKey2 > 2` — `Index21` or `Index212` will be used, because the read range is non-trivial only when using them.
+
+`SELECT * FROM Table WHERE SubKey1 = 2` — `Index12` will be used, because when using it, the point prefix length will be 1, whereas in other cases it is 0.
+
## Checking the cost of queries {#cost}
Any query made in a transactional application should be checked in terms of the number of I/O operations it performed in the database and how much CPU was used to run it. You should also make sure these indicators don't continuously grow as the database volume grows. {{ ydb-short-name }} returns statistics required for the analysis after running each query.
@@ -65,10 +147,11 @@ If you use the {{ ydb-short-name }} CLI, select the `--stats` option to enable p
## Updating data using a secondary index {#update}
-The [`UPDATE`](../yql/reference/syntax/update.md), [`UPSERT`](../yql/reference/syntax/upsert_into.md), and [`REPLACE`](../yql/reference/syntax/replace_into.md) YQL statements don't permit specifying a secondary index to perform a search for data, so an attempt to make an `UPDATE ... WHERE indexed_field = $value` will result in a full scan of the table. To avoid this, you can first run `SELECT` by index to get the primary key value and then `UPDATE` by the primary key. You can also use `UPDATE ON`.
+The [`UPDATE`](../yql/reference/syntax/update.md), [`UPSERT`](../yql/reference/syntax/upsert_into.md), and [`REPLACE`](../yql/reference/syntax/replace_into.md) YQL record modification statements don't permit specifying a secondary index to perform a search for data, so an attempt to make an `UPDATE ... WHERE indexed_field = $value` will result in a full scan of the table. To avoid this, you can first run `SELECT` by index to get the primary key value and then run `UPDATE` by the primary key. You can also use `UPDATE ON`.
To update data in the `table1` table, run the query:
+
```yql
$to_update = (
SELECT pk_field, $f1 AS field1, $f2 AS field2, ...
@@ -78,6 +161,7 @@ $to_update = (
UPDATE table1 ON SELECT * FROM $to_update
```
+
{% note info %}
Currently, data updating is possible only using a synchronous secondary index. This limitation exists because data modification is permitted only in [Serializable](../concepts/transactions.md#modes) transactions, and accessing asynchronous indices would violate the guarantees of this transaction mode.
@@ -90,6 +174,7 @@ To delete data by secondary index, use `SELECT` with a predicate by secondary in
To delete all data about series with zero views from the `series` table, run the query:
+
```yql
DELETE FROM series ON
SELECT series_id
@@ -97,6 +182,7 @@ FROM series VIEW views_index
WHERE views = 0;
```
+
{% note info %}
Currently, deleting data is possible only using a synchronous secondary index. This is because data removal is permitted only in [Serializable](../concepts/transactions.md#modes) transactions, and accessing asynchronous indices would violate the guarantees of this transaction mode.