Restoring authorship annotation for <thegeorg@yandex-team.ru>. Commit 2 of 2.

author: thegeorg <thegeorg@yandex-team.ru> 2022-02-10 16:45:12 +0300
committer: Daniil Cherednik <dcherednik@yandex-team.ru> 2022-02-10 16:45:12 +0300
commit: 49116032d905455a7b1c994e4a696afc885c1e71 (patch)
tree: be835aa92c6248212e705f25388ebafcf84bc7a1 /contrib/libs/zstd/lib/zdict.h
parent: 4e839db24a3bbc9f1c610c43d6faaaa99824dcca (diff)
download: ydb-49116032d905455a7b1c994e4a696afc885c1e71.tar.gz
1 files changed, 151 insertions, 151 deletions
diff --git a/contrib/libs/zstd/lib/zdict.h b/contrib/libs/zstd/lib/zdict.h
index cce8285b77..f1e139a40d 100644
--- a/contrib/libs/zstd/lib/zdict.h
+++ b/contrib/libs/zstd/lib/zdict.h
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) Yann Collet, Facebook, Inc. 
+ * Copyright (c) Yann Collet, Facebook, Inc.
  * All rights reserved.
  *
  * This source code is licensed under both the BSD-style license (found in the
@@ -36,146 +36,146 @@ extern "C" {
 #  define ZDICTLIB_API ZDICTLIB_VISIBILITY
 #endif
 
-/******************************************************************************* 
- * Zstd dictionary builder 
- * 
- * FAQ 
- * === 
- * Why should I use a dictionary? 
- * ------------------------------ 
- * 
- * Zstd can use dictionaries to improve compression ratio of small data. 
- * Traditionally small files don't compress well because there is very little 
+/*******************************************************************************
+ * Zstd dictionary builder
+ *
+ * FAQ
+ * ===
+ * Why should I use a dictionary?
+ * ------------------------------
+ *
+ * Zstd can use dictionaries to improve compression ratio of small data.
+ * Traditionally small files don't compress well because there is very little
  * repetition in a single sample, since it is small. But, if you are compressing
- * many similar files, like a bunch of JSON records that share the same 
- * structure, you can train a dictionary on ahead of time on some samples of 
- * these files. Then, zstd can use the dictionary to find repetitions that are 
- * present across samples. This can vastly improve compression ratio. 
- * 
- * When is a dictionary useful? 
- * ---------------------------- 
- * 
- * Dictionaries are useful when compressing many small files that are similar. 
- * The larger a file is, the less benefit a dictionary will have. Generally, 
- * we don't expect dictionary compression to be effective past 100KB. And the 
- * smaller a file is, the more we would expect the dictionary to help. 
- * 
- * How do I use a dictionary? 
- * -------------------------- 
- * 
- * Simply pass the dictionary to the zstd compressor with 
- * `ZSTD_CCtx_loadDictionary()`. The same dictionary must then be passed to 
- * the decompressor, using `ZSTD_DCtx_loadDictionary()`. There are other 
- * more advanced functions that allow selecting some options, see zstd.h for 
- * complete documentation. 
- * 
- * What is a zstd dictionary? 
- * -------------------------- 
- * 
- * A zstd dictionary has two pieces: Its header, and its content. The header 
- * contains a magic number, the dictionary ID, and entropy tables. These 
- * entropy tables allow zstd to save on header costs in the compressed file, 
- * which really matters for small data. The content is just bytes, which are 
- * repeated content that is common across many samples. 
- * 
- * What is a raw content dictionary? 
- * --------------------------------- 
- * 
- * A raw content dictionary is just bytes. It doesn't have a zstd dictionary 
- * header, a dictionary ID, or entropy tables. Any buffer is a valid raw 
- * content dictionary. 
- * 
- * How do I train a dictionary? 
- * ---------------------------- 
- * 
- * Gather samples from your use case. These samples should be similar to each 
- * other. If you have several use cases, you could try to train one dictionary 
- * per use case. 
- * 
- * Pass those samples to `ZDICT_trainFromBuffer()` and that will train your 
- * dictionary. There are a few advanced versions of this function, but this 
- * is a great starting point. If you want to further tune your dictionary 
- * you could try `ZDICT_optimizeTrainFromBuffer_cover()`. If that is too slow 
- * you can try `ZDICT_optimizeTrainFromBuffer_fastCover()`. 
- * 
- * If the dictionary training function fails, that is likely because you 
- * either passed too few samples, or a dictionary would not be effective 
- * for your data. Look at the messages that the dictionary trainer printed, 
- * if it doesn't say too few samples, then a dictionary would not be effective. 
- * 
- * How large should my dictionary be? 
- * ---------------------------------- 
- * 
- * A reasonable dictionary size, the `dictBufferCapacity`, is about 100KB. 
- * The zstd CLI defaults to a 110KB dictionary. You likely don't need a 
- * dictionary larger than that. But, most use cases can get away with a 
- * smaller dictionary. The advanced dictionary builders can automatically 
- * shrink the dictionary for you, and select a the smallest size that 
- * doesn't hurt compression ratio too much. See the `shrinkDict` parameter. 
- * A smaller dictionary can save memory, and potentially speed up 
- * compression. 
- * 
- * How many samples should I provide to the dictionary builder? 
- * ------------------------------------------------------------ 
- * 
- * We generally recommend passing ~100x the size of the dictionary 
- * in samples. A few thousand should suffice. Having too few samples 
- * can hurt the dictionaries effectiveness. Having more samples will 
- * only improve the dictionaries effectiveness. But having too many 
- * samples can slow down the dictionary builder. 
- * 
- * How do I determine if a dictionary will be effective? 
- * ----------------------------------------------------- 
- * 
- * Simply train a dictionary and try it out. You can use zstd's built in 
- * benchmarking tool to test the dictionary effectiveness. 
- * 
- *   # Benchmark levels 1-3 without a dictionary 
- *   zstd -b1e3 -r /path/to/my/files 
+ * many similar files, like a bunch of JSON records that share the same
+ * structure, you can train a dictionary on ahead of time on some samples of
+ * these files. Then, zstd can use the dictionary to find repetitions that are
+ * present across samples. This can vastly improve compression ratio.
+ *
+ * When is a dictionary useful?
+ * ----------------------------
+ *
+ * Dictionaries are useful when compressing many small files that are similar.
+ * The larger a file is, the less benefit a dictionary will have. Generally,
+ * we don't expect dictionary compression to be effective past 100KB. And the
+ * smaller a file is, the more we would expect the dictionary to help.
+ *
+ * How do I use a dictionary?
+ * --------------------------
+ *
+ * Simply pass the dictionary to the zstd compressor with
+ * `ZSTD_CCtx_loadDictionary()`. The same dictionary must then be passed to
+ * the decompressor, using `ZSTD_DCtx_loadDictionary()`. There are other
+ * more advanced functions that allow selecting some options, see zstd.h for
+ * complete documentation.
+ *
+ * What is a zstd dictionary?
+ * --------------------------
+ *
+ * A zstd dictionary has two pieces: Its header, and its content. The header
+ * contains a magic number, the dictionary ID, and entropy tables. These
+ * entropy tables allow zstd to save on header costs in the compressed file,
+ * which really matters for small data. The content is just bytes, which are
+ * repeated content that is common across many samples.
+ *
+ * What is a raw content dictionary?
+ * ---------------------------------
+ *
+ * A raw content dictionary is just bytes. It doesn't have a zstd dictionary
+ * header, a dictionary ID, or entropy tables. Any buffer is a valid raw
+ * content dictionary.
+ *
+ * How do I train a dictionary?
+ * ----------------------------
+ *
+ * Gather samples from your use case. These samples should be similar to each
+ * other. If you have several use cases, you could try to train one dictionary
+ * per use case.
+ *
+ * Pass those samples to `ZDICT_trainFromBuffer()` and that will train your
+ * dictionary. There are a few advanced versions of this function, but this
+ * is a great starting point. If you want to further tune your dictionary
+ * you could try `ZDICT_optimizeTrainFromBuffer_cover()`. If that is too slow
+ * you can try `ZDICT_optimizeTrainFromBuffer_fastCover()`.
+ *
+ * If the dictionary training function fails, that is likely because you
+ * either passed too few samples, or a dictionary would not be effective
+ * for your data. Look at the messages that the dictionary trainer printed,
+ * if it doesn't say too few samples, then a dictionary would not be effective.
+ *
+ * How large should my dictionary be?
+ * ----------------------------------
+ *
+ * A reasonable dictionary size, the `dictBufferCapacity`, is about 100KB.
+ * The zstd CLI defaults to a 110KB dictionary. You likely don't need a
+ * dictionary larger than that. But, most use cases can get away with a
+ * smaller dictionary. The advanced dictionary builders can automatically
+ * shrink the dictionary for you, and select a the smallest size that
+ * doesn't hurt compression ratio too much. See the `shrinkDict` parameter.
+ * A smaller dictionary can save memory, and potentially speed up
+ * compression.
+ *
+ * How many samples should I provide to the dictionary builder?
+ * ------------------------------------------------------------
+ *
+ * We generally recommend passing ~100x the size of the dictionary
+ * in samples. A few thousand should suffice. Having too few samples
+ * can hurt the dictionaries effectiveness. Having more samples will
+ * only improve the dictionaries effectiveness. But having too many
+ * samples can slow down the dictionary builder.
+ *
+ * How do I determine if a dictionary will be effective?
+ * -----------------------------------------------------
+ *
+ * Simply train a dictionary and try it out. You can use zstd's built in
+ * benchmarking tool to test the dictionary effectiveness.
+ *
+ *   # Benchmark levels 1-3 without a dictionary
+ *   zstd -b1e3 -r /path/to/my/files
  *   # Benchmark levels 1-3 with a dictionary
- *   zstd -b1e3 -r /path/to/my/files -D /path/to/my/dictionary 
- * 
- * When should I retrain a dictionary? 
- * ----------------------------------- 
- * 
- * You should retrain a dictionary when its effectiveness drops. Dictionary 
- * effectiveness drops as the data you are compressing changes. Generally, we do 
- * expect dictionaries to "decay" over time, as your data changes, but the rate 
- * at which they decay depends on your use case. Internally, we regularly 
- * retrain dictionaries, and if the new dictionary performs significantly 
- * better than the old dictionary, we will ship the new dictionary. 
- * 
- * I have a raw content dictionary, how do I turn it into a zstd dictionary? 
- * ------------------------------------------------------------------------- 
- * 
- * If you have a raw content dictionary, e.g. by manually constructing it, or 
- * using a third-party dictionary builder, you can turn it into a zstd 
- * dictionary by using `ZDICT_finalizeDictionary()`. You'll also have to 
- * provide some samples of the data. It will add the zstd header to the 
- * raw content, which contains a dictionary ID and entropy tables, which 
- * will improve compression ratio, and allow zstd to write the dictionary ID 
- * into the frame, if you so choose. 
- * 
- * Do I have to use zstd's dictionary builder? 
- * ------------------------------------------- 
- * 
- * No! You can construct dictionary content however you please, it is just 
- * bytes. It will always be valid as a raw content dictionary. If you want 
- * a zstd dictionary, which can improve compression ratio, use 
- * `ZDICT_finalizeDictionary()`. 
- * 
- * What is the attack surface of a zstd dictionary? 
- * ------------------------------------------------ 
- * 
- * Zstd is heavily fuzz tested, including loading fuzzed dictionaries, so 
- * zstd should never crash, or access out-of-bounds memory no matter what 
- * the dictionary is. However, if an attacker can control the dictionary 
- * during decompression, they can cause zstd to generate arbitrary bytes, 
- * just like if they controlled the compressed data. 
- * 
- ******************************************************************************/ 
-
- 
+ *   zstd -b1e3 -r /path/to/my/files -D /path/to/my/dictionary
+ *
+ * When should I retrain a dictionary?
+ * -----------------------------------
+ *
+ * You should retrain a dictionary when its effectiveness drops. Dictionary
+ * effectiveness drops as the data you are compressing changes. Generally, we do
+ * expect dictionaries to "decay" over time, as your data changes, but the rate
+ * at which they decay depends on your use case. Internally, we regularly
+ * retrain dictionaries, and if the new dictionary performs significantly
+ * better than the old dictionary, we will ship the new dictionary.
+ *
+ * I have a raw content dictionary, how do I turn it into a zstd dictionary?
+ * -------------------------------------------------------------------------
+ *
+ * If you have a raw content dictionary, e.g. by manually constructing it, or
+ * using a third-party dictionary builder, you can turn it into a zstd
+ * dictionary by using `ZDICT_finalizeDictionary()`. You'll also have to
+ * provide some samples of the data. It will add the zstd header to the
+ * raw content, which contains a dictionary ID and entropy tables, which
+ * will improve compression ratio, and allow zstd to write the dictionary ID
+ * into the frame, if you so choose.
+ *
+ * Do I have to use zstd's dictionary builder?
+ * -------------------------------------------
+ *
+ * No! You can construct dictionary content however you please, it is just
+ * bytes. It will always be valid as a raw content dictionary. If you want
+ * a zstd dictionary, which can improve compression ratio, use
+ * `ZDICT_finalizeDictionary()`.
+ *
+ * What is the attack surface of a zstd dictionary?
+ * ------------------------------------------------
+ *
+ * Zstd is heavily fuzz tested, including loading fuzzed dictionaries, so
+ * zstd should never crash, or access out-of-bounds memory no matter what
+ * the dictionary is. However, if an attacker can control the dictionary
+ * during decompression, they can cause zstd to generate arbitrary bytes,
+ * just like if they controlled the compressed data.
+ *
+ ******************************************************************************/
+
+
 /*! ZDICT_trainFromBuffer():
  *  Train a dictionary from an array of samples.
  *  Redirect towards ZDICT_optimizeTrainFromBuffer_fastCover() single-threaded, with d=8, steps=4,
@@ -203,14 +203,14 @@ ZDICTLIB_API size_t ZDICT_trainFromBuffer(void* dictBuffer, size_t dictBufferCap
 typedef struct {
     int      compressionLevel;   /*< optimize for a specific zstd compression level; 0 means default */
     unsigned notificationLevel;  /*< Write log to stderr; 0 = none (default); 1 = errors; 2 = progression; 3 = details; 4 = debug; */
-    unsigned dictID;             /*< force dictID value; 0 means auto mode (32-bits random value) 
-                                  *   NOTE: The zstd format reserves some dictionary IDs for future use. 
-                                  *         You may use them in private settings, but be warned that they 
-                                  *         may be used by zstd in a public dictionary registry in the future. 
-                                  *         These dictionary IDs are: 
-                                  *           - low range  : <= 32767 
-                                  *           - high range : >= (2^31) 
-                                  */ 
+    unsigned dictID;             /*< force dictID value; 0 means auto mode (32-bits random value)
+                                  *   NOTE: The zstd format reserves some dictionary IDs for future use.
+                                  *         You may use them in private settings, but be warned that they
+                                  *         may be used by zstd in a public dictionary registry in the future.
+                                  *         These dictionary IDs are:
+                                  *           - low range  : <= 32767
+                                  *           - high range : >= (2^31)
+                                  */
 } ZDICT_params_t;
 
 /*! ZDICT_finalizeDictionary():
@@ -410,11 +410,11 @@ typedef struct {
  *  Note: ZDICT_trainFromBuffer_legacy() will send notifications into stderr if instructed to, using notificationLevel>0.
  */
 ZDICTLIB_API size_t ZDICT_trainFromBuffer_legacy(
-    void* dictBuffer, size_t dictBufferCapacity, 
-    const void* samplesBuffer, const size_t* samplesSizes, unsigned nbSamples, 
+    void* dictBuffer, size_t dictBufferCapacity,
+    const void* samplesBuffer, const size_t* samplesSizes, unsigned nbSamples,
     ZDICT_legacy_params_t parameters);
 
- 
+
 /* Deprecation warnings */
 /* It is generally possible to disable deprecation warnings from compiler,
    for example with -Wno-deprecated-declarations for gcc
@@ -426,7 +426,7 @@ ZDICTLIB_API size_t ZDICT_trainFromBuffer_legacy(
 #  define ZDICT_GCC_VERSION (__GNUC__ * 100 + __GNUC_MINOR__)
 #  if defined (__cplusplus) && (__cplusplus >= 201402) /* C++14 or greater */
 #    define ZDICT_DEPRECATED(message) [[deprecated(message)]] ZDICTLIB_API
-#  elif defined(__clang__) || (ZDICT_GCC_VERSION >= 405) 
+#  elif defined(__clang__) || (ZDICT_GCC_VERSION >= 405)
 #    define ZDICT_DEPRECATED(message) ZDICTLIB_API __attribute__((deprecated(message)))
 #  elif (ZDICT_GCC_VERSION >= 301)
 #    define ZDICT_DEPRECATED(message) ZDICTLIB_API __attribute__((deprecated))
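
Note on the `dictID` comment in `ZDICT_params_t` above: it reserves IDs <= 32767 and >= 2^31 for possible future use by zstd. The sketch below shows one way a caller could check a forced ID against those ranges before passing it in. The helper and macro names are hypothetical and are not part of zdict.h. Treating 0 as "not reserved" is an assumption made here because the header gives 0 the separate meaning of auto mode.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration; none of these exist in zdict.h. */
#define EXAMPLE_DICTID_LOW_RESERVED_MAX   32767u       /* low range  : <= 32767  */
#define EXAMPLE_DICTID_HIGH_RESERVED_MIN  0x80000000u  /* high range : >= (2^31) */

/* Returns 1 if a forced dictID lies in a range reserved by the zstd format,
 * 0 otherwise. Assumption: 0 is reported as not reserved, because
 * ZDICT_params_t uses dictID == 0 to request auto mode. */
int example_dictid_is_reserved(uint32_t dictID)
{
    if (dictID == 0) return 0;
    return dictID <= EXAMPLE_DICTID_LOW_RESERVED_MAX
        || dictID >= EXAMPLE_DICTID_HIGH_RESERVED_MIN;
}

/* Maps an arbitrary 32-bit seed into the unreserved range [32768, 2^31 - 1]. */
uint32_t example_dictid_from_seed(uint32_t seed)
{
    const uint32_t lo   = EXAMPLE_DICTID_LOW_RESERVED_MAX + 1u;   /* 32768 */
    const uint32_t span = EXAMPLE_DICTID_HIGH_RESERVED_MIN - lo;  /* 2^31 - 32768 */
    const uint32_t id   = lo + (seed % span);
    assert(!example_dictid_is_reserved(id));
    return id;
}
```

A caller could assign the result of `example_dictid_from_seed()` to the `dictID` field, or reject a user-supplied ID when `example_dictid_is_reserved()` returns 1. Reserved IDs remain usable in private settings, as the header comment says, so a warning may be more appropriate than a hard error.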