diff options
author | somov <somov@yandex-team.ru> | 2022-02-10 16:45:47 +0300 |
---|---|---|
committer | Daniil Cherednik <dcherednik@yandex-team.ru> | 2022-02-10 16:45:47 +0300 |
commit | a5950576e397b1909261050b8c7da16db58f10b1 (patch) | |
tree | 7ba7677f6a4c3e19e2cefab34d16df2c8963b4d4 /contrib/libs/farmhash/README | |
parent | 81eddc8c0b55990194e112b02d127b87d54164a9 (diff) | |
download | ydb-a5950576e397b1909261050b8c7da16db58f10b1.tar.gz |
Restoring authorship annotation for <somov@yandex-team.ru>. Commit 1 of 2.
Diffstat (limited to 'contrib/libs/farmhash/README')
-rw-r--r-- | contrib/libs/farmhash/README | 326 |
1 files changed, 163 insertions, 163 deletions
diff --git a/contrib/libs/farmhash/README b/contrib/libs/farmhash/README index 146936d21a..a0b2414a2c 100644 --- a/contrib/libs/farmhash/README +++ b/contrib/libs/farmhash/README @@ -1,163 +1,163 @@ -FarmHash, a family of hash functions. -Version 1.1 - -Introduction -============ - -A general overview of hash functions and their use is available in the file -Understanding_Hash_Functions in this directory. It may be helpful to read it -before using FarmHash. - -FarmHash provides hash functions for strings and other data. The functions -mix the input bits thoroughly but are not suitable for cryptography. See -"Hash Quality," below, for details on how FarmHash was tested and so on. - -We provide reference implementations in C++, with a friendly MIT license. - -All members of the FarmHash family were designed with heavy reliance on -previous work by Jyrki Alakuijala, Austin Appleby, Bob Jenkins, and others. - - -Recommended Usage -================= - -Our belief is that the typical hash function is mostly used for in-memory hash -tables and similar. That use case allows hash functions that differ on -different platforms, and that change from time to time. For this, I recommend -using wrapper functions in a .h file with comments such as, "may change from -time to time, may differ on different platforms, and may change depending on -NDEBUG." - -Some projects may also require a forever-fixed, portable hash function. Again -we recommend using wrapper functions in a .h, but in this case the comments on -them would be very different. - -We have provided a sample of these wrapper functions in src/farmhash.h. Our -hope is that most people will need nothing more than src/farmhash.h and -src/farmhash.cc. Those two files are a usable and relatively portable library. -(One portability snag: if your compiler doesn't have __builtin_expect then -you may need to define FARMHASH_NO_BUILTIN_EXPECT.) For those that prefer -using a configure script (perhaps because they want to "make install" later), -FarmHash has one, but for many people it's best to ignore it. - -Note that the wrapper functions such as Hash() in src/farmhash.h can select -one of several hash functions. The selection is done at compile time, based -on your machine architecture (e.g., sizeof(size_t)) and the availability of -vector instructions (e.g., SSE4.1). - -To get the best performance from FarmHash, one will need to think a bit about -when to use compiler flags that allow vector instructions and such: -maes, --msse4.2, -mavx, etc., or their equivalents for other compilers. Those are -the g++ flags that make g++ emit more types of machine instructions than it -otherwise would. For example, if you are confident that you will only be -using FarmHash on systems with SSE4.2 and/or AES, you may communicate that to -the compiler as explained in src/farmhash.cc. If not, use -maes, -mavx, etc., -when you can, and the appropriate choices will be made by via conditional -compilation in src/farmhash.cc. - -It may be beneficial to try -O3 or other compiler flags as well. I also have -found feedback-directed optimization (FDO) to improve the speed of FarmHash. - -The "configure" script: creating config.h -========================================= - -We provide reference implementations of several FarmHash functions, written in -C++. The build system is based on autoconf. It defaults the C++ compiler -flags to "-g -O2", which may or may not be best. - -If you are planning to use the configure script, I generally recommend -trying this first, unless you know that your system lacks AVX and/or AESNI: - - ./configure CXXFLAGS="-g -mavx -maes -O3" - make all check - -If that fails, you can retry with -mavx and/or -maes removed, or with -mavx replaced by --msse4.1 or -msse4.2. - -Please see below for thoughts on cross-platform testing, if that is a concern. -Finally, if you want to install a library, you may use - - make install - -Some useful flags for configure include: - - --enable-optional-builtin-expect: This causes __builtin_expect to be optional. - If you don't use this flag, the assumption is that FarmHash will be compiled - with compilers that provide __builtin_expect. In practice, some FarmHash - variants may be slightly faster if __builtin_expect is available, but it - isn't very important and affects speed only. - -Further Details -=============== - -The above instructions will produce a single source-level library that -includes multiple hash functions. It will use conditional compilation, and -perhaps GCC's multiversioning, to select among the functions. In addition, -"make all check" will create an object file using your chosen compiler, and -test it. The object file won't necessarily contain all the code that would be -used if you were to compile the code on other platforms. The downside of this -is obvious: the paths not tested may not actually work if and when you try -them. The FarmHash developers try hard to prevent such problems; please let -us know if you find bugs. - -To aid your cross-platform testing, for each relevant platform you may -compile your program that uses farmhash.cc with the preprocessor flag -FARMHASHSELFTEST equal to 1. This causes a FarmHash self test to run -at program startup; the self test writes output to stdout and then -calls std::exit(). You can see this in action by running "make check": -see src/farm-test.cc for details. - -There's also a trivial workaround to force particular functions to be used: -modify the wrapper functions in hash.h. You can prevent choices being made via -conditional compilation or multiversioning by choosing FarmHash variants with -names like farmhashaa::Hash32, farmhashab::Hash64, etc.: those compute the same -hash function regardless of conditional compilation, multiversioning, or -endianness. Consult their comments and ifdefs to learn their requirements: for -example, they are not all guaranteed to work on all platforms. - -Known Issues -============ - -1) FarmHash was developed with little-endian architectures in mind. It should -work on big-endian too, but less work has gone into optimizing for those -platforms. To make FarmHash work properly on big-endian platforms you may -need to modify the wrapper .h file and/or your compiler flags to arrange for -FARMHASH_BIG_ENDIAN to be defined, though there is logic that tries to figure -it out automatically. - -2) FarmHash's implementation is fairly complex. - -3) The techniques described in dev/INSTRUCTIONS to let hash function -developers regenerate src/*.cc from dev/* are hacky and not so portable. - -Hash Quality -============ - -We like to test hash functions with SMHasher, among other things. -SMHasher isn't perfect, but it seems to find almost any significant flaw. -SMHasher is available at http://code.google.com/p/smhasher/ - -SMHasher is designed to pass a 32-bit seed to the hash functions it tests. -For our functions that accept a seed, we use the given seed directly (padded -with zeroes as needed); for our functions that don't accept a seed, we hash -the concatenation of the given seed and the input string. - -Some minor flaws in 32-bit and 64-bit functions are harmless, as we -expect the primary use of these functions will be in hash tables. We -may have gone slightly overboard in trying to please SMHasher and other -similar tests, but we don't want anyone to choose a different hash function -because of some minor issue reported by a quality test. - -If your setup is similar enough to mine, it's easy to use SMHasher and other -tools yourself via the "builder" in the dev directory. See dev/INSTRUCTIONS. -(Improvements to that directory are a relatively low priority, and code -there is never going to be as portable as the other parts of FarmHash.) - -For more information -==================== - -http://code.google.com/p/farmhash/ - -farmhash-discuss@googlegroups.com - -Please feel free to send us comments, questions, bug reports, or patches. +FarmHash, a family of hash functions. +Version 1.1 + +Introduction +============ + +A general overview of hash functions and their use is available in the file +Understanding_Hash_Functions in this directory. It may be helpful to read it +before using FarmHash. + +FarmHash provides hash functions for strings and other data. The functions +mix the input bits thoroughly but are not suitable for cryptography. See +"Hash Quality," below, for details on how FarmHash was tested and so on. + +We provide reference implementations in C++, with a friendly MIT license. + +All members of the FarmHash family were designed with heavy reliance on +previous work by Jyrki Alakuijala, Austin Appleby, Bob Jenkins, and others. + + +Recommended Usage +================= + +Our belief is that the typical hash function is mostly used for in-memory hash +tables and similar. That use case allows hash functions that differ on +different platforms, and that change from time to time. For this, I recommend +using wrapper functions in a .h file with comments such as, "may change from +time to time, may differ on different platforms, and may change depending on +NDEBUG." + +Some projects may also require a forever-fixed, portable hash function. Again +we recommend using wrapper functions in a .h, but in this case the comments on +them would be very different. + +We have provided a sample of these wrapper functions in src/farmhash.h. Our +hope is that most people will need nothing more than src/farmhash.h and +src/farmhash.cc. Those two files are a usable and relatively portable library. +(One portability snag: if your compiler doesn't have __builtin_expect then +you may need to define FARMHASH_NO_BUILTIN_EXPECT.) For those that prefer +using a configure script (perhaps because they want to "make install" later), +FarmHash has one, but for many people it's best to ignore it. + +Note that the wrapper functions such as Hash() in src/farmhash.h can select +one of several hash functions. The selection is done at compile time, based +on your machine architecture (e.g., sizeof(size_t)) and the availability of +vector instructions (e.g., SSE4.1). + +To get the best performance from FarmHash, one will need to think a bit about +when to use compiler flags that allow vector instructions and such: -maes, +-msse4.2, -mavx, etc., or their equivalents for other compilers. Those are +the g++ flags that make g++ emit more types of machine instructions than it +otherwise would. For example, if you are confident that you will only be +using FarmHash on systems with SSE4.2 and/or AES, you may communicate that to +the compiler as explained in src/farmhash.cc. If not, use -maes, -mavx, etc., +when you can, and the appropriate choices will be made by via conditional +compilation in src/farmhash.cc. + +It may be beneficial to try -O3 or other compiler flags as well. I also have +found feedback-directed optimization (FDO) to improve the speed of FarmHash. + +The "configure" script: creating config.h +========================================= + +We provide reference implementations of several FarmHash functions, written in +C++. The build system is based on autoconf. It defaults the C++ compiler +flags to "-g -O2", which may or may not be best. + +If you are planning to use the configure script, I generally recommend +trying this first, unless you know that your system lacks AVX and/or AESNI: + + ./configure CXXFLAGS="-g -mavx -maes -O3" + make all check + +If that fails, you can retry with -mavx and/or -maes removed, or with -mavx replaced by +-msse4.1 or -msse4.2. + +Please see below for thoughts on cross-platform testing, if that is a concern. +Finally, if you want to install a library, you may use + + make install + +Some useful flags for configure include: + + --enable-optional-builtin-expect: This causes __builtin_expect to be optional. + If you don't use this flag, the assumption is that FarmHash will be compiled + with compilers that provide __builtin_expect. In practice, some FarmHash + variants may be slightly faster if __builtin_expect is available, but it + isn't very important and affects speed only. + +Further Details +=============== + +The above instructions will produce a single source-level library that +includes multiple hash functions. It will use conditional compilation, and +perhaps GCC's multiversioning, to select among the functions. In addition, +"make all check" will create an object file using your chosen compiler, and +test it. The object file won't necessarily contain all the code that would be +used if you were to compile the code on other platforms. The downside of this +is obvious: the paths not tested may not actually work if and when you try +them. The FarmHash developers try hard to prevent such problems; please let +us know if you find bugs. + +To aid your cross-platform testing, for each relevant platform you may +compile your program that uses farmhash.cc with the preprocessor flag +FARMHASHSELFTEST equal to 1. This causes a FarmHash self test to run +at program startup; the self test writes output to stdout and then +calls std::exit(). You can see this in action by running "make check": +see src/farm-test.cc for details. + +There's also a trivial workaround to force particular functions to be used: +modify the wrapper functions in hash.h. You can prevent choices being made via +conditional compilation or multiversioning by choosing FarmHash variants with +names like farmhashaa::Hash32, farmhashab::Hash64, etc.: those compute the same +hash function regardless of conditional compilation, multiversioning, or +endianness. Consult their comments and ifdefs to learn their requirements: for +example, they are not all guaranteed to work on all platforms. + +Known Issues +============ + +1) FarmHash was developed with little-endian architectures in mind. It should +work on big-endian too, but less work has gone into optimizing for those +platforms. To make FarmHash work properly on big-endian platforms you may +need to modify the wrapper .h file and/or your compiler flags to arrange for +FARMHASH_BIG_ENDIAN to be defined, though there is logic that tries to figure +it out automatically. + +2) FarmHash's implementation is fairly complex. + +3) The techniques described in dev/INSTRUCTIONS to let hash function +developers regenerate src/*.cc from dev/* are hacky and not so portable. + +Hash Quality +============ + +We like to test hash functions with SMHasher, among other things. +SMHasher isn't perfect, but it seems to find almost any significant flaw. +SMHasher is available at http://code.google.com/p/smhasher/ + +SMHasher is designed to pass a 32-bit seed to the hash functions it tests. +For our functions that accept a seed, we use the given seed directly (padded +with zeroes as needed); for our functions that don't accept a seed, we hash +the concatenation of the given seed and the input string. + +Some minor flaws in 32-bit and 64-bit functions are harmless, as we +expect the primary use of these functions will be in hash tables. We +may have gone slightly overboard in trying to please SMHasher and other +similar tests, but we don't want anyone to choose a different hash function +because of some minor issue reported by a quality test. + +If your setup is similar enough to mine, it's easy to use SMHasher and other +tools yourself via the "builder" in the dev directory. See dev/INSTRUCTIONS. +(Improvements to that directory are a relatively low priority, and code +there is never going to be as portable as the other parts of FarmHash.) + +For more information +==================== + +http://code.google.com/p/farmhash/ + +farmhash-discuss@googlegroups.com + +Please feel free to send us comments, questions, bug reports, or patches. |