path: root/library/cpp/yt/rseq/unittests/per_cpu_ut.cpp
Commit message, Author, Age, Files, Lines
* YT-28458: Make per-CPU rseq fast path dlopen-safe (babenko, 7 days, 1 file, -0/+8 lines)
Hardens `library/cpp/yt/rseq` for the case where it is linked into a dlopen'd, position-independent module (e.g. a YQL UDF `.so`). Extracted from the profiling work that enables the rseq fast path by default.

**TLS model.** The weak `__rseq_abi` gets the `global-dynamic` TLS model under `__PIC__/__PIE__` (`initial-exec` otherwise), mirroring `contrib/libs/tcmalloc`. `initial-exec` needs a slot in the static TLS block reserved at startup, which the loader cannot grant to a module dlopen'd later; such a module would fail to load with "cannot allocate memory in static TLS block". This only changes the cold `&__rseq_abi` accesses; the hot path still reads `*(thread_pointer + CpuIdFieldOffset)`.

**Runtime safety probe `IsPerCpuFastPathSafe()`.** The cached thread-pointer offset is valid only when `__rseq_abi` sits at a fixed offset from the thread pointer, that is, in a glibc-owned area or in the static TLS block (incl. tcmalloc), which is the common case. When our `__rseq_abi` instead lands in a dlopen'd module's *dynamically allocated* TLS, the offset is valid only on the thread that computed it; on other threads the hot path's first store (`area->rseq_cs`) would corrupt unrelated memory. The probe spawns one thread and checks, by pointer comparison and without ever dereferencing the suspect offset, that the offset names that thread's rseq area; if it does not, callers use the atomic fallback. The result is decided once and cached (one thread spawn at first use).

commit_hash:633f58f500d9d097800da81f526c56283445ffc7
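The TLS model selection described in this commit message can be illustrated with a small sketch. This is not the code from the commit: `TSketchRseqArea`, `SketchRseqArea`, `SketchTlsModelName` and the `SKETCH_*` macros are hypothetical names, and the sketch assumes a GCC- or Clang-compatible compiler targeting ELF, where the `tls_model` attribute is available. The real `__rseq_abi` is additionally declared weak, which is omitted here.

```cpp
#include <cstdint>

// Pick the TLS model the same way the commit message describes:
// global-dynamic for position-independent code (safe in a dlopen'd module),
// initial-exec otherwise (cheaper access, but needs static TLS space).
#if defined(__PIC__) || defined(__PIE__)
    #define SKETCH_TLS_MODEL "global-dynamic"
#else
    #define SKETCH_TLS_MODEL "initial-exec"
#endif

// The attribute is a GNU extension; on other toolchains the sketch simply
// falls back to the compiler's default TLS model.
#if defined(__GNUC__) && defined(__ELF__)
    #define SKETCH_TLS_ATTRIBUTE __attribute__((tls_model(SKETCH_TLS_MODEL)))
#else
    #define SKETCH_TLS_ATTRIBUTE
#endif

// Hypothetical stand-in for the per-thread rseq area (not the kernel layout).
struct TSketchRseqArea {
    std::uint32_t CpuId;
};

// One instance per thread, placed according to the selected TLS model.
thread_local TSketchRseqArea SketchRseqArea SKETCH_TLS_ATTRIBUTE = {0};

// Reports which model this translation unit was compiled with.
inline const char* SketchTlsModelName() {
    return SKETCH_TLS_MODEL;
}
```

Compiling the same file with and without `-fPIC` changes only how `&SketchRseqArea` is resolved, which matches the note above that only the cold address-taking accesses are affected.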
* Add lock-free per-CPU primitives to library/cpp/yt/rseq (babenko, 10 days, 1 file, -0/+229 lines)
Introduce AddPerCpu and StorePerCpu over an rseq-sharded per-CPU array. On the x86-64 Linux fast path the update is committed by a hand-rolled rseq critical section (non-atomic, migration-safe): addq for the 8-byte accumulate, movq / movdqu for the 8- or 16-byte store. The kernel restarts the sequence on preemption or migration, and only one thread runs on a CPU at a time, so no atomic or lock is needed.

Off the fast path (other arches, no kernel rseq) the operation falls back to an atomic on the slot indexed by sched_getcpu(). A naturally-aligned 8-byte store is single-copy atomic on x86-64, so it is never observed torn; the 16-byte store may be, which is acceptable for a last-writer-wins gauge.

commit_hash:6250f6e9e35cf3895ebafe0b534ec12cca50b03b
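The atomic fallback mentioned in this commit message (an atomic update on the slot indexed by `sched_getcpu()`) might look roughly like the sketch below. It does not reproduce the rseq fast path or the library's actual interface: `TPaddedSlot`, `TShardedCounter`, `Add` and `Sum` are hypothetical names chosen for illustration, and the 64-byte padding is an assumed cache-line size.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>
#if defined(__linux__)
#include <sched.h>
#endif

// One slot per shard, padded to an assumed 64-byte cache line so that
// updates from different CPUs do not share a line.
struct alignas(64) TPaddedSlot {
    std::atomic<std::uint64_t> Value{0};
};

// Hypothetical sharded counter: the fallback path only, no rseq.
class TShardedCounter {
public:
    explicit TShardedCounter(std::size_t slotCount)
        : Slots_(slotCount)
    {
        assert(slotCount > 0);
    }

    // Adds to the slot of the CPU the caller is currently running on.
    // The thread may migrate between sched_getcpu() and the add, which is
    // why the update has to be atomic here.
    void Add(std::uint64_t delta) {
        Slots_[CurrentSlot()].Value.fetch_add(delta, std::memory_order_relaxed);
    }

    // Sums all shards; the result is a snapshot, not a linearizable read.
    std::uint64_t Sum() const {
        std::uint64_t total = 0;
        for (const auto& slot : Slots_) {
            total += slot.Value.load(std::memory_order_relaxed);
        }
        return total;
    }

private:
    std::size_t CurrentSlot() const {
#if defined(__linux__)
        int cpu = sched_getcpu();
        if (cpu >= 0) {
            return static_cast<std::size_t>(cpu) % Slots_.size();
        }
#endif
        // Other platforms, or sched_getcpu() failure: use a single shard.
        return 0;
    }

    std::vector<TPaddedSlot> Slots_;
};
```

Sharding by CPU keeps contention low even on the fallback path, because threads running on different CPUs usually hit different slots; the atomic is only there to cover the window in which a thread migrates after reading its CPU number.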