diff options
| author | github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> | 2026-06-08 21:32:47 +0000 |
|---|---|---|
| committer | GitHub <[email protected]> | 2026-06-08 21:32:47 +0000 |
| commit | bd2dd5673ee5a8f1a3e2d5566bf889d16afa389b (patch) | |
| tree | 3b7884040befede279b8f6c62e6dcd94fcc66e7c | |
| parent | da31258ef601d51d98bee0a08a1cb784bf245df5 (diff) | |
Auto-translate docs from PR #37955 (#42886)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
| -rw-r--r-- | ydb/docs/en/core/dev/streaming-query/enrichment.md | 62 | ||||
| -rw-r--r-- | ydb/docs/en/core/dev/streaming-query/index.md | 18 | ||||
| -rw-r--r-- | ydb/docs/en/core/recipes/streaming_queries/topics.md | 118 |
3 files changed, 114 insertions, 84 deletions
diff --git a/ydb/docs/en/core/dev/streaming-query/enrichment.md b/ydb/docs/en/core/dev/streaming-query/enrichment.md index 8a2f7fde1e5..13e79a5c753 100644 --- a/ydb/docs/en/core/dev/streaming-query/enrichment.md +++ b/ydb/docs/en/core/dev/streaming-query/enrichment.md @@ -1,24 +1,25 @@ # Data enrichment -**Data enrichment** means attaching additional information from a lookup to events in the stream. For example, an event may only contain an ID, while a lookup provides a name or other attributes. Lookups can come from a [local table](#enrichment-local-table) or from [S3 object storage](#enrichment-s3). +Data enrichment is adding additional information from a reference to events from a stream. For example, an event contains only an identifier, and the reference allows adding a name or other attributes to it. As a reference, you can use data from a [local table](#enrichment-local-table) or from an [S3 object storage](#enrichment-s3). -In [streaming queries](../../concepts/streaming-query.md), you attach a lookup with `JOIN`. The stream must be on the left, the lookup on the right. +In [streaming queries](../../concepts/streaming-query.md), the reference is connected using the `JOIN` construct. The stream must be on the left, the reference on the right. {% note warning %} -The entire lookup is loaded into memory when the query starts. If the lookup data changes, restart the query to pick up fresh data — delete it with [DROP STREAMING QUERY](../../yql/reference/syntax/drop-streaming-query.md) and create it again with [CREATE STREAMING QUERY](../../yql/reference/syntax/create-streaming-query.md). +The reference is fully loaded into memory when the query starts. If the data in the reference has changed, to get the current version of the reference, you need to restart the query — delete it using [DROP STREAMING QUERY](../../yql/reference/syntax/drop-streaming-query.md) and recreate it using [CREATE STREAMING QUERY](../../yql/reference/syntax/create-streaming-query.md). {% endnote %} -## Prepare a data source for topics +## Preparing a data source for working with topics + +Create an external data source for working with topics. A [secret](../../yql/reference/syntax/create-secret.md) is used to store the token, and the source is created using [CREATE EXTERNAL DATA SOURCE](../../yql/reference/syntax/create-external-data-source.md). -Create an external data source for topics. Store tokens in a [secret](../../yql/reference/syntax/create-secret.md) and create the source with [CREATE EXTERNAL DATA SOURCE](../../yql/reference/syntax/create-external-data-source.md). ```yql --- Secret with token for YDB +-- Секрет с токеном для подключения к YDB CREATE SECRET `secrets/ydb_token` WITH (value = "<ydb_token>"); --- YDB data source for reading/writing topics +-- Источник данных YDB для чтения/записи топиков CREATE EXTERNAL DATA SOURCE ydb_source WITH ( SOURCE_TYPE = "Ydb", LOCATION = "<ydb_endpoint>", @@ -28,16 +29,17 @@ CREATE EXTERNAL DATA SOURCE ydb_source WITH ( ); ``` + Where: -- `<ydb_endpoint>` — {{ ydb-short-name }} endpoint, for example `grpcs://<ydb_host>:2135`. -- `<db_name>` — path to the {{ ydb-short-name }} database, for example `/Root/database`. +- `<ydb_endpoint>` — endpoint of {{ ydb-short-name }}, for example `grpcs://<ydb_host>:2135`. +- `<db_name>` — path to the database {{ ydb-short-name }}, for example `/Root/database`. -## Streaming queries for enrichment +## Streaming queries for data enrichment -The examples below read events from an input topic, join each event with a service name from the lookup on `ServiceId`, and write the result to an output topic. +The queries in the examples below read events from the input topic, attach the service name from the reference by `ServiceId` to each event, and write the result to the output topic. -Functions used in the queries: +More details about the functions used in the queries: - [TableRow](../../yql/reference/builtins/basic.md#tablerow) - [Yson::From](../../yql/reference/udf/list/yson.md#ysonfrom) @@ -47,15 +49,16 @@ Functions used in the queries: ### Enrichment from a local table {#enrichment-local-table} -Here the lookup is stored in table `services_dict` in the current database ([table](../../concepts/datamodel/table.md)). +In this example, the reference is stored in the [table](../../concepts/datamodel/table.md) `services_dict` in the current database. + +Create a [streaming query](../../concepts/streaming-query.md) that performs enrichment: -Create a [streaming query](../../concepts/streaming-query.md) that performs the enrichment: ```yql CREATE STREAMING QUERY query_with_table_join AS DO BEGIN --- Read events from input topic +-- Чтение событий из входного топика $topic_data = SELECT * FROM @@ -69,7 +72,7 @@ WITH ( ) ); --- Join lookup to stream on ServiceId +-- Присоединение справочника к потоку по ServiceId $joined_data = SELECT s.Name AS Name, t.* @@ -80,7 +83,7 @@ LEFT JOIN ON t.ServiceId = s.ServiceId; --- Write to output topic (JSON) +-- Запись в выходной топик (JSON) INSERT INTO ydb_source.output_topic SELECT @@ -91,14 +94,16 @@ FROM END DO ``` + ### Enrichment from S3 {#enrichment-s3} -The lookup is stored in S3 and connected through an [external data source](../../concepts/query_execution/federated_query/s3/external_data_source.md). +The reference is stored in S3 and connected via an [external data source](../../concepts/query_execution/federated_query/s3/external_data_source.md). + +Create an additional [external data source](../../yql/reference/syntax/create-external-data-source.md) to read the reference from S3: -Create another [external data source](../../yql/reference/syntax/create-external-data-source.md) to read the lookup from S3: ```yql --- S3 data source for lookup data +-- Источник данных S3 для чтения справочника CREATE EXTERNAL DATA SOURCE s3_source WITH ( SOURCE_TYPE = "ObjectStorage", LOCATION = "<s3_endpoint>", @@ -106,17 +111,19 @@ CREATE EXTERNAL DATA SOURCE s3_source WITH ( ); ``` + Where: -- `<s3_endpoint>` — S3 URL, for example `https://storage.yandexcloud.net/<bucket>/` in Yandex Cloud. +- `<s3_endpoint>` — URL of the S3 storage, for example `https://storage.yandexcloud.net/<bucket>/` for Yandex Cloud. + +Create a [streaming query](../../concepts/streaming-query.md) that performs enrichment: -Create a [streaming query](../../concepts/streaming-query.md) that performs the enrichment: ```yql CREATE STREAMING QUERY query_with_join AS DO BEGIN --- Read events from input topic +-- Чтение событий из входного топика $topic_data = SELECT * FROM @@ -130,7 +137,7 @@ WITH ( ) ); --- Read service lookup from S3 +-- Чтение справочника сервисов из S3 $s3_data = SELECT * FROM @@ -143,7 +150,7 @@ WITH ( ) ); --- Join lookup to stream on ServiceId +-- Присоединение справочника к потоку по ServiceId $joined_data = SELECT s.Name AS Name, t.* @@ -154,7 +161,7 @@ LEFT JOIN ON t.ServiceId = s.ServiceId; --- Write JSON to output topic +-- Запись результата в выходной топик в формате JSON INSERT INTO ydb_source.output_topic SELECT @@ -165,4 +172,5 @@ FROM END DO ``` -For supported data formats (`json_each_row`, `csv_with_names`, etc.), see [{#T}](streaming-query-formats.md). + +More details about data formats (`json_each_row`, `csv_with_names`, etc.): [{#T}](streaming-query-formats.md). diff --git a/ydb/docs/en/core/dev/streaming-query/index.md b/ydb/docs/en/core/dev/streaming-query/index.md index 0cf1cca4418..8376cb6f1d1 100644 --- a/ydb/docs/en/core/dev/streaming-query/index.md +++ b/ydb/docs/en/core/dev/streaming-query/index.md @@ -1,15 +1,15 @@ # Streaming queries -Practical guidance for working with [streaming queries](../../concepts/glossary.md#streaming-query): +Practical aspects of working with [streaming queries](../../concepts/glossary.md#streaming-query): -- [Common patterns](patterns.md) — minimal examples to get started quickly -- [Writing to tables](table-writing.md) — how streaming queries write into {{ ydb-short-name }} tables in near real time -- [Data enrichment](enrichment.md) — enriching the stream using external sources -- [Topic read/write formats](streaming-query-formats.md) — supported formats when working with topics and usage examples -- [Delivery guarantees](guarantees.md) — guarantee levels, windowing anomalies, and recommendations -- [Checkpoints](checkpoints.md) — persisting processing state for fault tolerance and recovery +- [Typical patterns](patterns.md) — minimal examples for quick start +- [Writing to tables](table-writing.md) — how streaming queries allow writing data to {{ ydb-short-name }} tables in real time. +- [Data enrichment](enrichment.md) — methods of enriching data in a stream using external sources. +- [Data formats for reading/writing topics](streaming-query-formats.md) — supported data formats when working with topics, examples of their use. +- [Data delivery guarantees](guarantees.md) — level of guarantees, observed anomalies in window aggregation, and recommendations. +- [Checkpoints](checkpoints.md) — a mechanism for saving the state of stream processing to ensure fault tolerance and recovery capability. ## See also -- [Recipes for streaming queries](../../recipes/streaming_queries/index.md) -- [Streaming queries overview](../../concepts/streaming-query.md) +- [Recipes for working with streaming queries](../../recipes/streaming_queries/index.md) +- [Description of streaming queries](../../concepts/streaming-query.md) diff --git a/ydb/docs/en/core/recipes/streaming_queries/topics.md b/ydb/docs/en/core/recipes/streaming_queries/topics.md index a3f2f2e585b..ed2e0080fe4 100644 --- a/ydb/docs/en/core/recipes/streaming_queries/topics.md +++ b/ydb/docs/en/core/recipes/streaming_queries/topics.md @@ -1,32 +1,32 @@ -# Quickstart: reading and writing topics +# Quick start: reading and writing to topics -This tutorial walks you through your first [streaming query](../../concepts/streaming-query.md). +In this guide, you will create your first [streaming query](../../concepts/streaming-query.md). The query will: -- Read events from an input [topic](../../concepts/datamodel/topic.md); -- Keep only errors; -- Count errors per server over 10-minute windows; -- Write results to an output topic. +- read events from the input [topic](../../concepts/datamodel/topic.md) +- filter only errors +- count the number of errors per server over 10 minutes +- write the result to the output topic. -Events are JSON with timestamp, log level, and host name. +Events arrive in JSON format with fields: time, logging level, and server name. -Steps: +You will perform the following steps: -* [Create topics](#step1); -* [Create an external data source](#step2); -* [Create the streaming query](#step3); -* [Check query state](#step4); -* [Produce sample input](#step5); -* [Read the output topic](#step6); -* [Delete the streaming query](#step7). +* [creating topics](#step1) +* [creating an external data source](#step2) +* [creating a streaming query](#step3) +* [viewing the query status](#step4) +* [populating the input topic with data](#step5) +* [checking the output topic contents](#step6) +* [deleting the streaming query](#step7). ## Prerequisites {#requirements} -You need: +To run the examples, you will need: -* A running {{ ydb-short-name }} database — see [quick start](../../quickstart.md); -* Feature flags `enable_external_data_sources` and `enable_streaming_queries` enabled. +* a running {{ ydb-short-name }} database — see [quick start](../../quickstart.md) +* enabled flags `enable_external_data_sources` and `enable_streaming_queries`. {% list tabs %} @@ -56,24 +56,29 @@ You need: {% include [ydb-cli-profile](../../_includes/ydb-cli-profile.md) %} -## Step 1. Create topics {#step1} +## Step 1. Creating topics {#step1} + +Create the input and output [topics](../../concepts/datamodel/topic.md): -Create input and output [topics](../../concepts/datamodel/topic.md): ```sql CREATE TOPIC input_topic; CREATE TOPIC output_topic; ``` -Verify: + +Verify that the topics are created: + ```bash ./ydb --profile quickstart scheme ls ``` -## Step 2. Create an external data source {#step2} -Create an [external data source](../../concepts/datamodel/external_data_source.md) with [CREATE EXTERNAL DATA SOURCE](../../yql/reference/syntax/create-external-data-source.md): +## Step 2. Creating an external data source {#step2} + +Create an [external data source](../../concepts/datamodel/external_data_source.md) using [CREATE EXTERNAL DATA SOURCE](../../yql/reference/syntax/create-external-data-source.md): + ```sql CREATE EXTERNAL DATA SOURCE ydb_source WITH ( @@ -84,15 +89,17 @@ CREATE EXTERNAL DATA SOURCE ydb_source WITH ( ); ``` + {% note info %} -Set `LOCATION` and `DATABASE_NAME` to match your {{ ydb-short-name }} deployment. +Specify the `LOCATION` and `DATABASE_NAME` values that correspond to your {{ ydb-short-name }} database. {% endnote %} -## Step 3. Create the streaming query {#step3} +## Step 3. Creating a streaming query {#step3} + +Create a [streaming query](../../concepts/streaming-query.md) using [CREATE STREAMING QUERY](../../yql/reference/syntax/create-streaming-query.md): -Create a [streaming query](../../concepts/streaming-query.md) with [CREATE STREAMING QUERY](../../yql/reference/syntax/create-streaming-query.md): ```sql CREATE STREAMING QUERY query_example AS @@ -101,7 +108,7 @@ DO BEGIN $number_errors = SELECT Host, COUNT(*) AS ErrorCount, - CAST(HOP_START() AS String) AS Ts -- Window start time for the aggregate row + CAST(HOP_START() AS String) AS Ts -- Время начала окна, соответствующего результату агрегации FROM ydb_source.input_topic WITH ( @@ -115,28 +122,30 @@ WITH ( WHERE Level = "error" GROUP BY - HOP(CAST(Time AS Timestamp), "PT600S", "PT600S", "PT0S"), -- Non-overlapping 10-minute windows + HOP(CAST(Time AS Timestamp), "PT600S", "PT600S", "PT0S"), -- Число ошибок на неперекрывающихся окнах длиной 10 минут Host; INSERT INTO ydb_source.output_topic SELECT - ToBytes(Unwrap(Yson::SerializeJson(Yson::From(TableRow())))) -- Serialize columns to JSON + ToBytes(Unwrap(Yson::SerializeJson(Yson::From(TableRow())))) -- Сериализация всех колонок в JSON FROM $number_errors; END DO ``` -More detail: -- `GROUP BY HOP` and `HOP_START` — [{#T}](../../yql/reference/syntax/select/group-by.md#group-by-hop). -- Writing to topics — [{#T}](../../dev/streaming-query/streaming-query-formats.md#write_formats). -- JSON serialization: [TableRow](../../yql/reference/builtins/basic#tablerow), [Yson::From](../../yql/reference/udf/list/yson#ysonfrom), [Yson::SerializeJson](../../yql/reference/udf/list/yson#ysonserializejson), [Unwrap](../../yql/reference/builtins/basic#unwrap), [ToBytes](../../yql/reference/builtins/basic#to-from-bytes). +Details: + +- Aggregation `GROUP BY HOP` and function `HOP_START` — [{#T}](../../yql/reference/syntax/select/group-by.md#group-by-hop). +- Writing data to a topic — [{#T}](../../dev/streaming-query/streaming-query-formats.md#write_formats). +- Serialization to JSON: [TableRow](../../yql/reference/builtins/basic#tablerow), [Yson::From](../../yql/reference/udf/list/yson#ysonfrom), [Yson::SerializeJson](../../yql/reference/udf/list/yson#ysonserializejson), [Unwrap](../../yql/reference/builtins/basic#unwrap), [ToBytes](../../yql/reference/builtins/basic#to-from-bytes). -## Step 4. Check query state {#step4} +## Step 4. Viewing the query status {#step4} + +Check the query status via the [streaming_queries](../../dev/system-views.md#streaming_queries) system table: -Inspect the `.sys/streaming_queries` system view [{#T}](../../dev/system-views.md#streaming_queries): ```sql SELECT @@ -148,13 +157,15 @@ FROM `.sys/streaming_queries` ``` -`Status` should be `RUNNING`. Otherwise inspect `Issues`. -If the query is `SUSPENDED` or `Issues` contains errors, see troubleshooting documentation. +Make sure that the `Status` field has the value `RUNNING`. Otherwise, check the `Issues` field. + +If the query is in the `SUSPENDED` status or there are errors in the `Issues` field, refer to the error diagnostics section. -## Step 5. Produce sample input {#step5} +## Step 5. Populating the input topic with data {#step5} + +Write test messages to the topic using the [{{ ydb-short-name }} CLI](../../reference/ydb-cli/index.md): -Write test messages with [{{ ydb-short-name }} CLI](../../reference/ydb-cli/index.md): ```bash echo '{"Time": "2025-01-01T00:00:00.000000Z", "Level": "error", "Host": "host-1"}' | ./ydb --profile quickstart topic write input_topic @@ -164,34 +175,45 @@ echo '{"Time": "2025-01-01T00:12:00.000000Z", "Level": "error", "Host": "host-2" echo '{"Time": "2025-01-01T00:12:00.000000Z", "Level": "error", "Host": "host-1"}' | ./ydb --profile quickstart topic write input_topic ``` -Results appear in the output topic after the 10-minute aggregation window closes. -## Step 6. Read the output topic {#step6} +The result will appear in the output topic after the 10-minute aggregation window closes. + +## Step 6. Checking the output topic contents {#step6} + +Read data from the output topic: + ```bash ./ydb --profile quickstart topic read output_topic --partition-ids 0 --start-offset 0 --limit 10 --format newline-delimited ``` -Expected output: + +Expected result: + ```json {"ErrorCount":1,"Host":"host-2","Ts":"2025-01-01T00:00:00Z"} {"ErrorCount":2,"Host":"host-1","Ts":"2025-01-01T00:00:00Z"} ``` -## Step 7. Delete the query {#step7} + +## Step 7. Deleting the query {#step7} + +Delete the query using [DROP STREAMING QUERY](../../yql/reference/syntax/drop-streaming-query.md): + ```sql DROP STREAMING QUERY query_example; ``` -## Next steps {#next-steps} -- [Data formats](../../dev/streaming-query/streaming-query-formats.md) supported in streaming queries. -- [Enrich with a lookup](../../dev/streaming-query/enrichment.md) from a local table or S3. -- [Write results to tables](../../dev/streaming-query/table-writing.md). +## What's next {#next-steps} + +- Explore the [data formats](../../dev/streaming-query/streaming-query-formats.md) supported in streaming queries. +- Learn how to [enrich data with a reference](../../dev/streaming-query/enrichment.md) from a local table or from S3. +- Learn how to [write results to tables](../../dev/streaming-query/table-writing.md). ## See also -* [{#T}](../../concepts/streaming-query.md); +* [{#T}](../../concepts/streaming-query.md) * [{#T}](../../dev/streaming-query/streaming-query-formats.md). |
