| field | value | date |
|---|---|---|
| author | babenko <[email protected]> | 2026-06-12 23:55:04 +0300 |
| committer | babenko <[email protected]> | 2026-06-13 00:27:42 +0300 |
| commit | a554ee99d3a734792487291a49115a44ec24e8c6 | |
| tree | 0d874fdd1142dcf0710475ecbbaeb452730a6978 | |
| parent | 42e4b751702f065de932ef765bc6948c5e7d1e4b | |
Speed up TRefCountedTracker allocation accounting
Every `TRefCounted` allocation/free bumps a per-thread counter in `TRefCountedTracker`. That hot path went through two `Y_NO_INLINE` TLS accessors (one per thread-local) plus a facade call. This reworks it:
1. Merge the three thread-locals (`...Slots`, `...SlotsBegin`, `...SlotsSize`) into a single `constinit thread_local` struct: it fits in one cache line, needs one TLS base computation, and has no dynamic-initialization guard.
2. Read the struct directly instead of through the generated `Y_NO_INLINE` accessor, so the accessor `call` on every counter bump collapses into a couple of `%fs`-relative instructions.
3. Keep it fiber-safe by construction: the counter entry points are now `private` and `friend`ed only to `TRefCountedTrackerFacade`, the single cross-library boundary, which is pinned `YT_PREVENT_TLS_CACHING`. The hot path inlines into that out-of-line function, so the thread pointer is re-read on every call and can never be cached across a fiber migration. Every other function that touches the TLS (`*Slow`, `GetLocalSlot`, the thread-exit reclaimer) is pinned the same way.
Isolated microbenchmark (`BM_RefCountedTracker_AllocateFreeInstance`, added in this commit) on a pinned core, timing one allocate+free pair:
| | ns/pair |
|---|---|
| before | 11.5 |
| after | 4.4 |
`AllocateInstance` body: 8 instructions, 0 internal calls, no jmp.
commit_hash:8601e1eaa2ee25e3e43a8792fb10b4c02c1b2cef
