YT-25951: Introduce the OffshoreDataGateway component - ydb

author	Pavel Bashkirov <[email protected]>	2026-05-15 01:31:11 +0300
committer	robot-piglet <[email protected]>	2026-05-15 02:10:41 +0300
commit	e07b7f1cc70445eb5b848b71b85fba285045d2f2 (patch)
tree	ab4c2067772c4e9ed1f0d54bba0a8f98bcf7eb0f /library/cpp/threading/equeue
parent	4e50aca37d1b5ca6ddb1aa7c7ac23a22275fef41 (diff)

YT-25951: Introduce the OffshoreDataGateway component

* Changelog entry
Type: feature
Component: offshore data gateway

This PR adds the `OffshoreDataGateway` component, which is used when reading data from S3. The component is also integrated into the replication reader, which uses it to actually perform the reads.

One of the most important points in this work is the new proto message `TChunkReplicaSpec` and its usage in the `OffshoreDataGateway`'s RPC requests. This structure lets us pass the medium index, which is required to find the specific offshore medium where the chunk is located. There may be other solutions; I'll be happy to discuss them.

Another point is how the replication reader handles different media. Right now the replication reader uses only addresses to uniquely identify the peers to read from. This works because we read from data nodes only and do not care about the medium there. Now reads may also go through `OffshoreDataGateway`-s, and the medium starts to matter. Imagine a chunk that has two offshore replicas on two different media, one pointing to Google S3 storage and the other to AWS S3. We must be able to tell those two replicas apart even though their address is the same: the sentinel `OffshoreNodeAddress`. This is why I introduce a structure called `TPeerId`, which includes both the address and the medium index; the replication reader now uses it to distinguish replicas.

The last point is testing. It is impossible to implement an "honest" integration test at the moment, because writes to offshore replicas are not implemented and masters know nothing about them. I have implemented a C++ unit test that checks the behaviour of the replication reader only: `test_s3_data.cpp`.
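As a rough illustration of the `TPeerId` idea described above (not the code from this PR), the sketch below keys peers by address plus medium index. The field names, the hash functor, the helper functions, and the placeholder value of `OffshoreNodeAddress` are all assumptions made for this example; the commit message only states that `TPeerId` contains the address and the medium index.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>
#include <vector>

// Placeholder for the sentinel address; the real value is not given
// in the commit message, so this string is purely illustrative.
static const std::string OffshoreNodeAddress = "<offshore>";

// Hypothetical shape of a peer identity: address plus medium index.
struct TPeerId
{
    std::string Address;
    int MediumIndex;

    bool operator==(const TPeerId& other) const
    {
        return Address == other.Address && MediumIndex == other.MediumIndex;
    }
};

// Hash functor so that TPeerId can be used as a key in unordered containers.
struct TPeerIdHash
{
    std::size_t operator()(const TPeerId& id) const
    {
        std::size_t h = std::hash<std::string>()(id.Address);
        // Mix in the medium index (a common hash-combine formula).
        h ^= std::hash<int>()(id.MediumIndex) + 0x9e3779b9 + (h << 6) + (h >> 2);
        return h;
    }
};

// Number of peers that are distinguishable by address alone.
std::size_t CountDistinctAddresses(const std::vector<TPeerId>& peers)
{
    std::unordered_set<std::string> unique;
    for (const auto& peer : peers) {
        unique.insert(peer.Address);
    }
    return unique.size();
}

// Number of peers that are distinguishable by (address, medium index).
std::size_t CountDistinctPeers(const std::vector<TPeerId>& peers)
{
    std::unordered_set<TPeerId, TPeerIdHash> unique(peers.begin(), peers.end());
    return unique.size();
}

// Two offshore replicas share the sentinel address but live on different media.
void DemoTwoOffshoreReplicas()
{
    std::vector<TPeerId> peers = {
        {OffshoreNodeAddress, 10}, // e.g. a medium backed by one S3 provider
        {OffshoreNodeAddress, 11}, // e.g. a medium backed by another provider
    };
    assert(CountDistinctAddresses(peers) == 1); // addresses alone collide
    assert(CountDistinctPeers(peers) == 2);     // the medium index separates them
}
```

Keying by address alone collapses the two offshore replicas into one peer, while the composite key keeps them apart. That is the scenario the commit message uses to motivate `TPeerId`.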
---
Pull Request resolved: <https://github.com/ytsaurus/ytsaurus/pull/1688>

Co-authored-by: cherepashka <[email protected]>
commit_hash:7941b82f5735c5788fefec1ccf5175ddd86528a5

Diffstat (limited to 'library/cpp/threading/equeue')

0 files changed, 0 insertions, 0 deletions

