| author | pavook <[email protected]> | 2026-02-11 04:03:48 +0300 |
|---|---|---|
| committer | pavook <[email protected]> | 2026-02-11 04:27:50 +0300 |
| commit | 72e2e4fb5634001484795bce3f6ca27e21b24905 (patch) | |
| tree | 3848cf51a83d59e11461a4ab32176770eb4aaddf /library/python/testing | |
| parent | 7f819b0c89c679ccc44e86c56b407dbeff61e787 (diff) | |
YT-27167: Better TBitwiseUnversionedValueHash
+20-30% throughput on UpdateColumnarStatistics benchmark (with large statistics enabled)
- Do not factor in value.Id when calculating column digest
- Pack metadata directly instead of multiple HashCombine calls
- Use SplitMix64 finalizer for proper bit distribution
- Use cheaper xor with metadata instead of HashCombine
- Use XXH3 for strings
- Remove unnecessary copy
- Measured quality improved: across 20 sequences `{nc | n \in [1..10^6]}` (one per c = 1..20), MAE dropped from ~36% to ~20%
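
A minimal Python sketch of the scheme the list above describes, for fixed-width values only. The field layout in `pack_metadata`, the function names, and the order of the xor and the finalizer are assumptions made for illustration, not taken from the actual change. The string path (XXH3) is left out because it needs an external library.

```python
MASK64 = (1 << 64) - 1  # emulate 64-bit unsigned wraparound


def splitmix64_finalize(z: int) -> int:
    """Mixing steps of SplitMix64: two xorshift-multiply rounds and a final xorshift."""
    z &= MASK64
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
    return z ^ (z >> 31)


def pack_metadata(value_type: int, flags: int, length: int) -> int:
    """Pack metadata into one 64-bit word (hypothetical layout).

    Bits 0..7 hold the type, bits 8..15 the flags, bits 32..63 the length.
    The column id is deliberately not part of the word.
    """
    return (value_type & 0xFF) | ((flags & 0xFF) << 8) | ((length & 0xFFFFFFFF) << 32)


def hash_fixed_value(value_type: int, flags: int, payload: int) -> int:
    """Xor the packed metadata into the payload, then finalize once."""
    return splitmix64_finalize((payload & MASK64) ^ pack_metadata(value_type, flags, 0))
```

A single xor followed by one finalizer pass replaces a chain of combine calls, and because the finalizer is a bijection on 64-bit words, two inputs that differ after the xor can never collide.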
HLL digest estimates might temporarily increase by up to 2x when merged with previously computed digests, because the same values now hash differently.
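
One way to see where the 2x figure can come from, sketched in Python. An HLL digest estimates the number of distinct hash values it has seen, so exact sets of hashes stand in for the sketches here. `old_hash` is a made-up stand-in, not the previous YT hash, and `new_hash` is just the SplitMix64 finalizer.

```python
MASK64 = (1 << 64) - 1


def old_hash(v: int) -> int:
    # Hypothetical "previous" hash: one multiplication by an odd constant.
    return (v * 0x9E3779B97F4A7C15) & MASK64


def new_hash(v: int) -> int:
    # SplitMix64 finalizer, standing in for the new hash.
    z = v & MASK64
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
    return z ^ (z >> 31)


def distinct_after_merge(values):
    """Return (distinct hashes in the old digest, distinct hashes after merging)."""
    old_digest = {old_hash(v) for v in values}
    new_digest = {new_hash(v) for v in values}
    return len(old_digest), len(old_digest | new_digest)


before, after = distinct_after_merge(range(1, 10_001))
print(before, after)
```

The same 10,000 values are counted almost twice after the merge, because each value contributes one hash under the old function and a different one under the new function. The effect fades as digests built with the old hash are recomputed or expire.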
commit_hash:0bf661245cf1848ba9ef8b6c840c18dfd05bd2a4
Diffstat (limited to 'library/python/testing')
0 files changed, 0 insertions, 0 deletions
