# YDB Roadmap
## Intro
This document contains the high-level roadmap for YDB. See also the [👑 Epics Project](https://github.com/orgs/ydb-platform/projects/46/).
## Legend
We use the following symbols as abbreviations:

1. ㉓ - feature appeared in the Roadmap for 2023;
1. ㉔ - feature appeared in the Roadmap for 2024;
1. ✅ - feature has been released;
1. 🚧 - feature is partially available and is under development;
1. ❌ - feature has been refused;
1. 🔥 - not yet released, but we are in a rush.

## Query Processor

1. ㉔ **Unique secondary indexes**
1. ㉔ Apply **indexes automatically** to optimize data fetching
1. ㉔ **Default values** for table columns
1. ㉔ **Asynchronous LLVM JIT** query compilation
1. ㉔ **Parameters in DECLARE clause are becoming optional**, better SQL compatibility
1. 🚧 ㉔ **Cost-based optimizer** for join order and join algorithm selection
1. ㉔ **``INSERT INTO table FROM SELECT``** for large datasets
1. ㉔ Support for **transactional writes into both row and column tables**
1. ㉔ Support for **computed columns in a table**
1. ㉔ Support for **temporary tables**
1. ㉔ Support for **VIEW** SQL clause
1. ㉔ **Data Spilling** in case there is an insufficient amount of RAM
1. ㉔ **TPC-H, TPC-DS for 100TB** dataset
1. ✅ ㉓ Support for **Snapshot Readonly** transactions mode
1. 🚧 ㉓ **Better resource management** for KQP Resource Manager (share information about nodes resources, avoid OOMs)
1. ✅ ㉓ Switch to **New Engine** for OLTP queries
1. ✅ ㉓ Support **`not null` for PK (primary key) table columns**
1. ✅ ㉓ **Aggregates and predicates push down to column-oriented tables**
1. ✅ ㉓ **Optimize data formats** for data transition between query phases
1. ✅ ㉓ **Index Rename/Rebuild**
1. ✅ ㉓ **KQP Session Actor** as a replacement for KQP Worker Actor (optimize to reduce CPU usage)
1. **PostgreSQL compatibility**
    * ✅ ㉓ Support PostgreSQL datatypes **serialization/deserialization** in YDB Public API
    * 🚧 ㉓ PostgreSQL compatible **query execution** (TPC-C, TPC-H queries should work)
    * ✅ ㉓ Support for PostgreSQL **wire protocol**
1. ㉓ Support a single **Database connection string** instead of multiple parameters
1. ㉓ Support **constraints in query optimizer**
1. **Query Processor 3.0** (a set of tasks to behave more like a traditional database in terms of query execution functionality)
    * ㉓ Support for **Streaming Lookup Join** via MVCC snapshots (avoids distributed transactions, scales better)
    * ㉓ **Universal API call for DML, DDL with unlimited results size for OLTP/OLAP workload** (aka ExecuteQuery)
    * ✅ ㉓ Support for **secondary indexes in ScanQuery**
    * ✅ ㉓ **Transaction can see its own updates** (updates made during transaction execution are not buffered in RAM anymore, but rather are written to disk and available to read by this transaction)
1. ✅ ㉓ **Computation graphs caching (compute/datashard programs)** (optimize CPU usage)
1. 🚧 ㉓ **RPC Deadline & Cancellation propagation** (smooth timeout management)
1. ✅ ㉓ **DDL for column-oriented tables**

## Database Core (Tablets, etc)
1. ✅ ㉔ **Exact Nearest Neighbor Vector Search**
1. ㉔ **Approximate Nearest Neighbor Vector Search**. [Global vector index](https://github.com/ydb-platform/ydb/issues/8967)
1. ㉔ **Volatile transactions**. YDB Distributed transactions 2.0, minimize network round trips in the happy path
1. ㉔ **Table statistics** for cost-based optimizer
1. ㉔ **Memory optimization for row tables** (avoid full [SST index loading](https://github.com/ydb-platform/ydb/issues/1483), dynamic cache adjusting)
1. ㉔ Reduce minimum requirements for **the number of cores to 2** for a YDB node
1. ㉔ **Incremental backup** and **Point-in-time recovery**
1. ㉔ **``ALTER CHANGEFEED``**
1. ㉔ **Async Replication** between YDB databases (column tables, topics)
1. ㉔ **Async Replication** between YDB databases (schema changes)
1. ㉔ Support for **Debezium** format
1. ㉔ **Topics autoscaling** (increase/decrease number of partitions in the topic automatically)
1. ㉔ **Extended Kafka API** protocol support for YDB Topics (balance reads, support for v19)
1. ㉔ **Schema for YDB Topics**
1. ㉔ **Message-level parallelism** in YDB Topics
1. ✅ ㉓ Get **YDB topics** (aka pers queue, streams) ready for production
1. ✅ ㉓ Turn on **MVCC support** by default
1. ✅ ㉓ Enable **Snapshot read mode** by default (take and use MVCC snapshot for reads instead of running distributed transaction for reads)
1. ✅ ㉓ **Change Data Capture** (be able to get change feed of table updates)
1. 🔥 ㉓ **Async Replication** between YDB databases  (first version, row tables, w/o schema changes)
1. ✅ ㉓ **Background compaction for DataShards**
1. ✅ ㉓ **Compressed Backups**. Add functionality to compress backup data
1. ㉓ Process of **Extending State Storage** without cluster downtime. If a cluster grows from, say, 9 nodes to 900 while the State Storage configuration stays the same (9 nodes), this leads to a performance bottleneck.
1. **Split/Merge DataShards *BY LOAD* by default**. Most users require this feature turned on by default
1. ✅ ㉓ Support **PostgreSQL datatypes** in tablet local database
1. **Basic histogram for DataShards** (first step towards cost-based optimizations)
1. ✅ ㉓ **Transaction can see its own updates** (updates made during transaction execution are not buffered in RAM anymore, but rather are written to disk and available to read by this transaction)
1. ㉓ **Data Ingestion from topic to table** (implement built-in compatibility to ingest data to YDB tables from topics)
1. ㉓ Support **snapshot read over read replicas** (consistent reads against read replicas)
1. ㉓ 🚧 **Transactions between topics and tables**
1. ✅ ㉓ Support for **Kafka API compatible protocol** to YDB Topics

### Hardcore or system wide
1. ㉔ **Tracing** capabilities
1. ㉔ Automatically **balance tablet channels** via BlobStorage groups
1. ✅ ㉓ **Datashard iterator reads via MVCC**
1. ❌ *(refused)* ㉓ **Switch to TRope** (or don't use TString/std::string directly, provide zero-copy data passing between components)
1. ㉓ **Avoid Node Broker as SPOF** (NBS must work without Node Broker under emergency conditions)
1. ㉓ **Subscriptions in SchemeBoard** (optimize interaction with SchemeBoard via subscription to updates)

## Security
1. ✅ ㉓ Basic LDAP Support
1. ㉔ Support for OpenID Connect
1. ㉔ Authentication via Keycloak
1. ㉔ Support for SASL framework

## BlobStorage
1. ㉔ BlobStorage **latency optimization** (p999), less CPU consumption
1. ㉔ **ActorSystem performance optimizations**
1. ㉔ Optimize **ActorSystem for ARM processors**
1. ㉔ **Effortless initial cluster deployment** (provide only nodes and disks description)
1. ㉔ **Reduce number of BlobStorage groups** for a database (add ability to remove unneeded groups)
1. ㉓ **"One leg" storage migration without downtime** (migrate 1/3 of the cluster from one AZ to another for mirror3-dc erasure encoding)
1. ✅ ㉓ **ActorSystem 1.5** (dynamically reassign threads in different thread pools)
1. ✅ ㉓ **Publish a utility for BlobStorage management** (it's called ds_tool for now, improve it and open)
1. ㉓ **Self-heal for degraded BlobStorage groups** (automatic self-heal for groups with two broken disks, get VDisk Donors production ready)
1. ㉓ **BlobDepot** (a component for smooth blobs management between groups)
1. ㉓ **Avoid BSC (BlobStorage Controller) as SPOF** (be able to run the cluster without BSC in emergency cases)
1. ㉓ **BSC manages static group** (reconfiguration of the static BlobStorage group must be done by BlobStorage Controller as for any other group)
1. ㉓ **(Semi-)Hard disk space separation** (Better guarantees for disk space usage by VDisks on a single PDisk)
1. ㉓ **Reduce space amplification** (Optimize storage layer)
1. ✅ ㉓ **Storage nodes decommission** (Add ability to remove storage nodes)

## Analytical Capabilities
1. ㉔ **Backup** for column tables
1. ㉔ Column tables **autosharding**
1. ㉓ 🚧 **Log Store** (log-friendly column-oriented storage that allows creating 1+ million tables for storing logs)
1. ㉓ 🚧 **Column-oriented Tables** (introduce Column-oriented tables in addition to Row-oriented tables)
1. ㉓ **Tiered Storage for Column-oriented Tables** (with the ability to store the data in S3)

## Federated Query
1. ✅ ㉓ **Run the first version**

## Embedded UI
A detailed roadmap can be found at [YDB Embedded UI repo](https://github.com/ydb-platform/ydb-embedded-ui/blob/main/ROADMAP.md).

## Command Line Utility
1. 🚧 ㉓ Use a **single `ydb yql`** instead of `ydb table query` or `ydb scripting`
1. ✅ ㉓ Interactive CLI

## Tests and Benchmarks
1. ㉓ **Built-in load test for DataShards** in YCSB manner
1. ✅ ㉓ **`ydb workload` for topics**
1. ✅ ㉔ **Jepsen tests support** [Blog post](https://blog.ydb.tech/hardening-ydb-with-jepsen-lessons-learned-e3238a7ef4f2)

## Experiments
1. ❌ *(refused)* Try **RTMR-tablet** for key-value workload