ydb - Mirror of YDB github repos

author	apollo1321 <apollo1321@yandex-team.com>	2025-05-01 13:35:41 +0300
committer	apollo1321 <apollo1321@yandex-team.com>	2025-05-01 13:51:43 +0300
commit	e1b2809b60d8b79857b8515832a51056101516e2 (patch)
tree	fc29e43b2c99a914d90745e576f2f1d39f7b6040 /contrib/python/matplotlib/py2/src
parent	4463ac0859eddd33f1129c64ec620719ef364cca (diff)
download	ydb-e1b2809b60d8b79857b8515832a51056101516e2.tar.gz

YT-10317: Simplify unordered chunk pool slicing algorithm

This PR simplifies the calculation of `data_weight_per_job` within the `TUnorderedChunkPool`.

**Current Workflow:**

1. **TJobSizeConstraints:**
   - Users define constraints in the job specification, such as `data_weight_per_job`, `job_count`, etc.
   - These user constraints are transformed into `job_count`.
   - `data_weight_per_job` is then calculated based on this `job_count`.
2. **TUnorderedChunkPool:**
   - Within this pool, `data_weight_per_job` is again transformed into `job_count`.
   - The ideal `data_weight_per_job` for slicing is calculated as `remaining_data_weight / remaining_job_count`.

**Proposed Changes:**

This PR simplifies the algorithm by directly using the `data_weight_per_job` from `TJobSizeConstraints` in the `TUnorderedChunkPool`.

Previously, the approach could lead to an increase or a decrease in `data_weight_per_job` during the slicing process. For instance, with an initial `data_weight_per_job` of `400`, the previous algorithm might split inputs into jobs with data weights of `[433, 433, 394, 394, 394]`. In contrast, the updated algorithm consistently maintains job sizes, resulting in a distribution of `[433, 433, 433, 433, 316]`.

**Additional Notes:**

- The current algorithm has special handling for the AutoMerge task, using `data_weight_per_job` directly from `TJobSizeConstraints`.
- Although the current algorithm might provide speed improvements in certain specific scenarios, it is not a consistently reliable solution overall. To more effectively reduce tail latency in operations, it is preferable to use a job splitting mechanism.
- The simplified logic facilitates the future introduction of slicing mechanisms based on compressed data size, which the old approach would complicate.

commit_hash:2d450fb007e35c6a59dc136f504e2e77f46db625
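
The difference between the two slicing strategies described in the commit message can be sketched with a toy model. This is not the `TUnorderedChunkPool` code: the function names `slice_adaptive` and `slice_fixed`, the notion of indivisible "stripes", and the greedy close-the-job-when-the-target-is-reached rule are all assumptions made for illustration, and the sketch does not attempt to reproduce the exact `[433, ...]` figures quoted above.

```python
import math


def slice_fixed(stripes, data_weight_per_job):
    """Hypothetical model of the simplified approach: the target size
    stays equal to data_weight_per_job for the whole slicing pass."""
    jobs, current = [], 0
    for weight in stripes:
        current += weight
        if current >= data_weight_per_job:
            jobs.append(current)
            current = 0
    if current > 0:
        jobs.append(current)  # whatever is left forms the last, smaller job
    return jobs


def slice_adaptive(stripes, data_weight_per_job):
    """Hypothetical model of the previous approach: data_weight_per_job is
    turned into a job count, and the target is recomputed after every job
    as remaining_data_weight / remaining_job_count."""
    remaining_weight = sum(stripes)
    remaining_jobs = max(1, math.ceil(remaining_weight / data_weight_per_job))
    target = remaining_weight / remaining_jobs
    jobs, current = [], 0
    for weight in stripes:
        current += weight
        if current >= target:
            jobs.append(current)
            remaining_weight -= current
            remaining_jobs = max(1, remaining_jobs - 1)
            target = remaining_weight / remaining_jobs  # target drifts here
            current = 0
    if current > 0:
        jobs.append(current)
    return jobs


if __name__ == "__main__":
    stripes = [150] * 10  # ten indivisible stripes, total weight 1500
    print("fixed:   ", slice_fixed(stripes, 400))
    print("adaptive:", slice_adaptive(stripes, 400))
```

With ten stripes of weight 150 and a `data_weight_per_job` of 400, the fixed variant yields `[450, 450, 450, 150]`, while the adaptive variant yields `[450, 450, 300, 300]`. The pattern matches the one in the commit message: the adaptive target shrinks after early jobs overshoot, whereas the fixed target keeps job sizes uniform and leaves a single smaller tail job.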

Diffstat (limited to 'contrib/python/matplotlib/py2/src')

0 files changed, 0 insertions, 0 deletions

