ydb - Mirror of YDB github repos

	Commit message (Collapse)	Author	Age	Files	Lines
*	YT-28458: Make per-CPU rseq fast path dlopen-safe	babenko	7 days	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Hardens `library/cpp/yt/rseq` for the case where it is linked into a dlopen'd, position-independent module (e.g. a YQL UDF `.so`). Extracted from the profiling work that enables the rseq fast path by default. TLS model. The weak `__rseq_abi` gets `global-dynamic` linkage under `__PIC__/__PIE__` (`initial-exec` otherwise), mirroring `contrib/libs/tcmalloc`. `initial-exec` needs a slot in the static TLS block reserved at startup, which the loader cannot grant a module dlopen'd later — the module would fail to load with "cannot allocate memory in static TLS block". This only changes the cold `&__rseq_abi` accesses; the hot path still reads `(thread_pointer + CpuIdFieldOffset)`. Runtime safety probe `IsPerCpuFastPathSafe()`.* The cached thread-pointer offset is valid only when `__rseq_abi` sits at a fixed offset from the thread pointer — a glibc-owned area or the static TLS block (incl. tcmalloc), the common case. When our `__rseq_abi` instead lands in a dlopen'd module's dynamically allocated TLS, the offset is valid only on the thread that computed it; on other threads the hot path's first store (`area->rseq_cs`) would corrupt unrelated memory. The probe spawns one thread and checks — by pointer comparison, never dereferencing the suspect offset — that the offset names that thread's rseq area; if not, callers use the atomic fallback. Decided once and cached (one thread spawn at first use).= commit_hash:633f58f500d9d097800da81f526c56283445ffc7
*	Add library/cpp/yt/rseq: NYT::GetCurrentCpuId() via Linux rseq	babenko	2026-06-14	1	-0/+44
	Self-contained current-CPU-id reader backed by Linux rseq (restartable sequences), with no third-party dependency (no librseq): * The rseq ABI is hand-defined; the calling thread is registered lazily via the rseq syscall. * Fast path is a single inlined, branch-free thread-local read. The offset always points at a readable `cpu_id` -- the glibc-owned area when glibc registers rseq (>= 2.35, via the weak `__rseq_offset`/`__rseq_size`), otherwise our own area -- so an unregistered thread reads `-1` and routes to the slow path. * Falls back to `sched_getcpu()` (Linux) or `0` (darwin/windows). Works on glibc and musl alike (librseq does not build on musl). Fiber-TLS contract: the inlined read must be reached only via a non-inlinable, fiber-switch-free frame (a virtual call or `YT_PREVENT_TLS_CACHING`). #### Benchmark -- cost of one cpu-id read \| source \| time / call \| \|---\|---\| \| `GetCurrentCpuId()` (rseq) \| 0.34 ns \| \| `sched_getcpu()` (vDSO) \| 3.5 ns \| \| `rdtscp` (what `TTscp::Get()` does) \| 23 ns \| This is an alternative to the librseq-based review/13886037 -- same speed, but no contrib dependency and it also covers musl. The unit test pins to each allowed CPU and asserts the reported id matches. 🤖 Generated with [Claude Code](https://claude.com/claude-code) commit_hash:09d282c2f48755836b1cd68cedbffc3c6a662eed