<feed xmlns='http://www.w3.org/2005/Atom'>
<title>ydb/library/cpp/yt, branch CLI_2.32.0</title>
<subtitle>Mirror of YDB github repos</subtitle>
<id>https://code.mastervirt.ru/ydb/atom?h=CLI_2.32.0</id>
<link rel='self' href='https://code.mastervirt.ru/ydb/atom?h=CLI_2.32.0'/>
<link rel='alternate' type='text/html' href='https://code.mastervirt.ru/ydb/'/>
<updated>2026-06-19T17:53:29Z</updated>
<entry>
<title>YT-28504: Support heterogeneous lookup in caches</title>
<updated>2026-06-19T17:53:29Z</updated>
<author>
<name>babenko</name>
<email>babenko@yandex-team.com</email>
</author>
<published>2026-06-19T17:35:37Z</published>
<link rel='alternate' type='text/html' href='https://code.mastervirt.ru/ydb/commit/?id=77f13a22a1d248303c4d9a6ad19c6adbae202dbb'/>
<id>urn:sha1:77f13a22a1d248303c4d9a6ad19c6adbae202dbb</id>
<content type='text'>
commit_hash:acb3e84437f5bdb125d7c1807847eb5edecbb11f
</content>
</entry>
<entry>
<title>Add lock-free per-CPU primitives to library/cpp/yt/rseq</title>
<updated>2026-06-19T12:12:00Z</updated>
<author>
<name>babenko</name>
<email>babenko@yandex-team.com</email>
</author>
<published>2026-06-19T11:27:43Z</published>
<link rel='alternate' type='text/html' href='https://code.mastervirt.ru/ydb/commit/?id=89c0e29c8f9ba29ecdc736fefda87286482ac213'/>
<id>urn:sha1:89c0e29c8f9ba29ecdc736fefda87286482ac213</id>
<content type='text'>
Introduce AddPerCpu and StorePerCpu over an rseq-sharded per-CPU array.

On the x86-64 Linux fast path the update is committed by a hand-rolled
rseq critical section (non-atomic, migration-safe): addq for the 8-byte
accumulate, movq / movdqu for the 8- or 16-byte store. The kernel
restarts the sequence on preemption or migration, and only one thread
runs on a CPU at a time, so no atomic or lock is needed. Off the fast
path (other arches, no kernel rseq) the operation falls back to an
atomic on the slot indexed by sched_getcpu().

A naturally-aligned 8-byte store is single-copy atomic on x86-64, so it
is never observed torn; the 16-byte store may be torn, which is acceptable
for a last-writer-wins gauge.
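
The fallback path described above can be sketched roughly as follows. This is
an illustration only, not the code from this commit: the shard count, the
TShard layout, the relaxed memory order and the names AddPerCpuFallback,
GetShardIndex and SumShards are all assumptions.

<![CDATA[
```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

#if defined(__linux__)
#include <sched.h> // sched_getcpu (glibc extension)
#endif

// Hypothetical shard count; a real implementation would size this from the CPU count.
constexpr std::size_t ShardCount = 64;

// One cache line per shard to avoid false sharing between CPUs.
struct alignas(64) TShard
{
    std::atomic<std::int64_t> Value{0};
};

static TShard Shards[ShardCount];

// Picks the shard for the calling thread; falls back to shard 0 when no CPU id is available.
inline std::size_t GetShardIndex()
{
#if defined(__linux__)
    int cpu = sched_getcpu();
    return cpu < 0 ? 0 : static_cast<std::size_t>(cpu) % ShardCount;
#else
    return 0;
#endif
}

// Fallback accumulate. The thread may migrate between reading the CPU id and
// the update, so the update itself has to be atomic (unlike the rseq fast path).
inline void AddPerCpuFallback(std::int64_t delta)
{
    Shards[GetShardIndex()].Value.fetch_add(delta, std::memory_order_relaxed);
}

// Reads the accumulated total by summing all shards.
inline std::int64_t SumShards()
{
    std::int64_t total = 0;
    for (const TShard& shard : Shards) {
        total += shard.Value.load(std::memory_order_relaxed);
    }
    return total;
}
```
]]>

Readers sum over all shards, so the total stays correct even if a thread
migrates and lands its update in a shard other than its current CPU's.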
commit_hash:6250f6e9e35cf3895ebafe0b534ec12cca50b03b
</content>
</entry>
<entry>
<title>Add TTscp::GetApproximate</title>
<updated>2026-06-18T10:04:31Z</updated>
<author>
<name>babenko</name>
<email>babenko@yandex-team.com</email>
</author>
<published>2026-06-18T09:20:33Z</published>
<link rel='alternate' type='text/html' href='https://code.mastervirt.ru/ydb/commit/?id=50fd836ab1e51d127495bb37dab3888b27e0ce09'/>
<id>urn:sha1:50fd836ab1e51d127495bb37dab3888b27e0ce09</id>
<content type='text'>
TTscp::GetApproximate takes the processor id from the rseq fast path
(GetCurrentCpuId) and the instant from a non-serializing rdtsc, instead of the
single serializing rdtscp of TTscp::Get.

TPerCpuGauge::Update switches to it: the per-shard timestamp only orders writes
across shards to pick the freshest value, so the lower precision is fine. Update
is virtual and is now also marked YT_PREVENT_TLS_CACHING, which provides the
fiber-TLS boundary that the inlined rseq read needs.

#### Benchmark

sas2-2769 (glibc 2.31 + tcmalloc, rseq fast path), median of 5:

| primitive | time |
|---|---|
| TTscp::Get() (rdtscp) | 14.1 ns |
| TTscp::GetApproximate() (rseq + rdtsc) | 10.6 ns (-25%) |
commit_hash:b277b6551accd6d0b879f8ffb168bcbe8d9fbb74
</content>
</entry>
<entry>
<title>Make library/cpp/yt/rseq a Linux-only dependency of library/cpp/yt/system</title>
<updated>2026-06-15T08:32:51Z</updated>
<author>
<name>babenko</name>
<email>babenko@yandex-team.com</email>
</author>
<published>2026-06-15T08:13:32Z</published>
<link rel='alternate' type='text/html' href='https://code.mastervirt.ru/ydb/commit/?id=12be02b42fc24cf7bd990d56da8cf8908a35db2d'/>
<id>urn:sha1:12be02b42fc24cf7bd990d56da8cf8908a35db2d</id>
<content type='text'>
Make library/cpp/yt/rseq a Linux-only dependency of library/cpp/yt/system
commit_hash:7d6f5e738658447529440425b55b2891f6664d81
</content>
</entry>
<entry>
<title>Add YT_DEFINE_SENTINEL_OPTIONAL macro</title>
<updated>2026-06-14T23:36:20Z</updated>
<author>
<name>babenko</name>
<email>babenko@yandex-team.com</email>
</author>
<published>2026-06-14T23:16:08Z</published>
<link rel='alternate' type='text/html' href='https://code.mastervirt.ru/ydb/commit/?id=f8cd3660cc9c26e7dcbc92b9c226dece12649d90'/>
<id>urn:sha1:f8cd3660cc9c26e7dcbc92b9c226dece12649d90</id>
<content type='text'>
Add a macro for defining `TSentinelOptional` type aliases when the value type is not structural (e.g. has protected members) and therefore cannot use `TValueSentinel&lt;V&gt;`. The macro collapses the boilerplate sentinel struct plus `using` declaration into a single line:

```cpp
// before
struct TInstantSentinel
{
    static constexpr auto Sentinel = TInstant::Zero();
};
using TSentinelOptionalInstant = TSentinelOptional&lt;TInstant, TInstantSentinel&gt;;

// after
YT_DEFINE_SENTINEL_OPTIONAL(TSentinelOptionalInstant, TInstant, TInstant::Zero());
```

Also convert the existing call sites in `service_detail.cpp` and `inferrum/block_cache.cpp`.
commit_hash:5dcdeb8db215736b0ce5a5b71f30aead91c7b8e8
</content>
</entry>
<entry>
<title>YT-18571: Move TTscp to library/cpp/yt/system</title>
<updated>2026-06-14T21:29:58Z</updated>
<author>
<name>babenko</name>
<email>babenko@yandex-team.com</email>
</author>
<published>2026-06-14T20:32:15Z</published>
<link rel='alternate' type='text/html' href='https://code.mastervirt.ru/ydb/commit/?id=972b95687df432234ca66ad90d59ac74ae6048e3'/>
<id>urn:sha1:972b95687df432234ca66ad90d59ac74ae6048e3</id>
<content type='text'>
commit_hash:33721d8fd9919cec2c217db529145c881baf144b
</content>
</entry>
<entry>
<title>Fix rseq fast path on glibc &lt; 2.35: read the shared __rseq_abi area</title>
<updated>2026-06-14T16:17:43Z</updated>
<author>
<name>babenko</name>
<email>babenko@yandex-team.com</email>
</author>
<published>2026-06-14T15:52:20Z</published>
<link rel='alternate' type='text/html' href='https://code.mastervirt.ru/ydb/commit/?id=52f00b11c4259dfa810ecdcba576a801dd043a87'/>
<id>urn:sha1:52f00b11c4259dfa810ecdcba576a801dd043a87</id>
<content type='text'>
The own-area approach did not deliver the fast path on glibc 2.31 (YT's current
runtime). There tcmalloc registers the conventional `__rseq_abi` area for every
thread; our attempt to register a separate area was rejected by the kernel with
EINVAL (a thread may have only one rseq area), so `cpu_id` stayed -1 and every
`GetCurrentCpuId()` fell back to `sched_getcpu()` (~17-20 ns, slower than the
rdtscp it replaced).

Read the shared `__rseq_abi` symbol instead -- the area tcmalloc, librseq and
pre-2.35 glibc all register. Our definition is weak, so it coalesces with theirs
when present (the common case -- tcmalloc owns it) and stands alone otherwise
(e.g. musl), with us registering it. We register with the conventional signature
`0x53053053` and size 32, so re-registering an already-registered area returns
EBUSY (treated as success) rather than EINVAL -- coexisting cleanly with tcmalloc.

glibc &gt;= 2.35 still takes the `__rseq_offset` path unchanged.

Measured on sas2-2769 (glibc 2.31 + tcmalloc): `GetCurrentCpuId()` went from
20.0 ns to 0.60 ns. An strace run confirms that our registration now returns EBUSY
against tcmalloc's `__rseq_abi` (it was EINVAL against a separate area).
commit_hash:509809deeb5f7c671817fcd9ebcc8499eabf096e
</content>
</entry>
<entry>
<title>Fold TEnumTraits GetMinValue/GetMaxValue to compile time</title>
<updated>2026-06-14T16:10:22Z</updated>
<author>
<name>babenko</name>
<email>babenko@yandex-team.com</email>
</author>
<published>2026-06-14T15:50:09Z</published>
<link rel='alternate' type='text/html' href='https://code.mastervirt.ru/ydb/commit/?id=969f69bfb876d8108e12cd731d24d3b6f984916b'/>
<id>urn:sha1:969f69bfb876d8108e12cd731d24d3b6f984916b</id>
<content type='text'>
GetMinValue()/GetMaxValue() are constexpr, but when called from a runtime
context for a large-domain enum, clang does not fold the min/max_element and
emits a runtime scan over the whole domain on every call. This is hot on the
master replay path: TEnumIndexedArray::operator[] bounds-checks against these
(e.g. TCypressManager::FindHandler), and TCompositeAutomaton::RememberReign
hits GetCurrentReign() = GetMaxValue() over the ~3300-entry EMasterReign domain
per mutation.

Bind the result to a constexpr local to force compile-time evaluation. Verified
by disasm on a 240-value sample enum: getmin() goes from a ~44-instruction
runtime scan to a single 'mov $const'. No behavior change.
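
A minimal sketch of the technique, assuming a small hypothetical enum and
domain table in place of the real TEnumTraits machinery (EColor, ColorDomain
and ScanMaxValue are illustrative names, and the hand-written loop stands in
for max_element):

<![CDATA[
```cpp
// Hypothetical small enum standing in for a large-domain one such as EMasterReign.
enum class EColor : int
{
    Red = 3,
    Green = 7,
    Blue = 5,
};

// Hypothetical domain table; the real TEnumTraits layout is not shown in this message.
constexpr EColor ColorDomain[] = {EColor::Red, EColor::Green, EColor::Blue};

// A constexpr scan. When called from a runtime context, the compiler is
// allowed to emit it as an actual loop over the domain.
constexpr EColor ScanMaxValue()
{
    EColor best = ColorDomain[0];
    for (EColor value : ColorDomain) {
        if (value > best) {
            best = value;
        }
    }
    return best;
}

// The constexpr local must be initialized by a constant expression, so the
// scan is evaluated at compile time and the body only returns the result.
constexpr EColor GetMaxValue()
{
    constexpr EColor result = ScanMaxValue();
    return result;
}

static_assert(GetMaxValue() == EColor::Green, "evaluated at compile time");
```
]]>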

Part of YT-28453 (master replay-speed optimizations).
commit_hash:7cdb969e00ba219415d80c5c8c984aa8bbde99d2
</content>
</entry>
<entry>
<title>Add library/cpp/yt/rseq: NYT::GetCurrentCpuId() via Linux rseq</title>
<updated>2026-06-13T22:35:19Z</updated>
<author>
<name>babenko</name>
<email>babenko@yandex-team.com</email>
</author>
<published>2026-06-13T22:15:56Z</published>
<link rel='alternate' type='text/html' href='https://code.mastervirt.ru/ydb/commit/?id=450b2fac082943e2029b3390c99501c365795d64'/>
<id>urn:sha1:450b2fac082943e2029b3390c99501c365795d64</id>
<content type='text'>
Self-contained current-CPU-id reader backed by Linux **rseq** (restartable
sequences), with **no third-party dependency** (no librseq):

* The rseq ABI is hand-defined; the calling thread is registered lazily via the
  rseq syscall.
* Fast path is a single inlined, **branch-free** thread-local read. The offset
  always points at a readable `cpu_id` -- the glibc-owned area when glibc registers
  rseq (&gt;= 2.35, via the weak `__rseq_offset`/`__rseq_size`), otherwise our own
  area -- so an unregistered thread reads `-1` and routes to the slow path.
* Falls back to `sched_getcpu()` (Linux) or `0` (darwin/windows). Works on glibc
  **and musl** alike (librseq does not build on musl).

Fiber-TLS contract: the inlined read must be reached only via a non-inlinable,
fiber-switch-free frame (a virtual call or `YT_PREVENT_TLS_CACHING`).

#### Benchmark -- cost of one cpu-id read

| source | time / call |
|---|---|
| `GetCurrentCpuId()` (rseq) | **0.34 ns** |
| `sched_getcpu()` (vDSO) | 3.5 ns |
| `rdtscp` (what `TTscp::Get()` does) | 23 ns |

This is an alternative to the librseq-based review/13886037 -- same speed, but no
contrib dependency and it also covers musl. The unit test pins to each allowed CPU
and asserts the reported id matches.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
commit_hash:09d282c2f48755836b1cd68cedbffc3c6a662eed
</content>
</entry>
<entry>
<title>Include port.h from logger.h</title>
<updated>2026-06-13T20:49:35Z</updated>
<author>
<name>babenko</name>
<email>babenko@yandex-team.com</email>
</author>
<published>2026-06-13T20:31:30Z</published>
<link rel='alternate' type='text/html' href='https://code.mastervirt.ru/ydb/commit/?id=732b297f8ec8809088cf3e34459d71307104ae24'/>
<id>urn:sha1:732b297f8ec8809088cf3e34459d71307104ae24</id>
<content type='text'>
YT_LOG_TRACE is gated on YT_ENABLE_TRACE_LOGGING, defined in
library/cpp/yt/misc/port.h. logger.h relies on that macro but does not
include port.h, so in a TU that does not pull in port.h before logger.h,
YT_LOG_TRACE silently compiles to a no-op regardless of the configured
log level. Make logger.h self-contained by including the header that
defines the macro it depends on.
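
The failure mode can be sketched in a single file as below. The macro names
EXAMPLE_ENABLE_TRACE_LOGGING and EXAMPLE_LOG_TRACE, the Emitted vector and
EmitOne are hypothetical stand-ins; the real definitions in port.h and
logger.h are not reproduced here.

<![CDATA[
```cpp
#include <cstddef>
#include <string>
#include <vector>

// Collects emitted messages so the effect of the macro is observable.
static std::vector<std::string> Emitted;

// Stands in for port.h: defines the gating macro. If this line is missing
// when the block below is processed, the #ifdef silently takes the no-op branch.
#define EXAMPLE_ENABLE_TRACE_LOGGING

// Stands in for logger.h: the trace macro depends on the gating macro.
#ifdef EXAMPLE_ENABLE_TRACE_LOGGING
    #define EXAMPLE_LOG_TRACE(message) Emitted.push_back(message)
#else
    #define EXAMPLE_LOG_TRACE(message) ((void)0)
#endif

// Emits one trace message and returns how many messages have been recorded.
inline std::size_t EmitOne()
{
    EXAMPLE_LOG_TRACE("trace event");
    return Emitted.size();
}
```
]]>

Removing the gating define produces no compiler diagnostic; the trace call
simply vanishes, which is why the header that uses the macro should include
the header that defines it.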
commit_hash:c53f26a7dff9d3f9c5a9d9aab8ea7fa31d11ec49
</content>
</entry>
</feed>
