path: root/yt/cpp/mapreduce/client/init.cpp
author: babenko <[email protected]> 2026-06-12 23:55:04 +0300
committer: babenko <[email protected]> 2026-06-13 00:27:42 +0300
commit: a554ee99d3a734792487291a49115a44ec24e8c6
tree: 0d874fdd1142dcf0710475ecbbaeb452730a6978 /yt/cpp/mapreduce/client/init.cpp
parent: 42e4b751702f065de932ef765bc6948c5e7d1e4b
Speed up TRefCountedTracker allocation accounting

Every `TRefCounted` allocation/free bumps a per-thread counter in `TRefCountedTracker`. That hot path went through two `Y_NO_INLINE` TLS accessors (one per thread-local) plus a facade call. This reworks it:

1. Merge the three per-thread thread-locals (`...Slots`, `...SlotsBegin`, `...SlotsSize`) into one `constinit thread_local` struct -- one cache line, one TLS base, no dynamic-init guard.
2. Read it directly instead of through the generated `Y_NO_INLINE` accessor, collapsing the per-call accessor `call` into a couple of `%fs`-relative instructions.
3. Keep it fiber-safe by construction: the counter entry points are now `private` and `friend`ed only to `TRefCountedTrackerFacade`, the single cross-library boundary, which is pinned `YT_PREVENT_TLS_CACHING`. The hot path inlines into that out-of-line function, so the thread pointer is re-read on every call and can never be cached across a fiber migration. Every other function that touches the TLS (`*Slow`, `GetLocalSlot`, the thread-exit reclaimer) is pinned the same way.

Isolated microbenchmark (`BM_RefCountedTracker_AllocateFreeInstance`, added here), pinned core, allocate+free pair:

| | ns/pair |
|---|---|
| before | 11.5 |
| after | 4.4 |

`AllocateInstance` body: 8 instructions, 0 internal calls, no jmp.

commit_hash:8601e1eaa2ee25e3e43a8792fb10b4c02c1b2cef
Diffstat (limited to 'yt/cpp/mapreduce/client/init.cpp')
0 files changed, 0 insertions, 0 deletions