aboutsummaryrefslogtreecommitdiffstats
path: root/contrib/libs/farmhash/README
blob: a0b2414a2c62842fa36519cf32b8196a38557f4e (plain) (blame)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
FarmHash, a family of hash functions. 
Version 1.1 
 
Introduction 
============ 
 
A general overview of hash functions and their use is available in the file 
Understanding_Hash_Functions in this directory.  It may be helpful to read it 
before using FarmHash. 
 
FarmHash provides hash functions for strings and other data.  The functions 
mix the input bits thoroughly but are not suitable for cryptography.  See 
"Hash Quality," below, for details on how FarmHash was tested and so on. 
 
We provide reference implementations in C++, with a friendly MIT license. 
 
All members of the FarmHash family were designed with heavy reliance on 
previous work by Jyrki Alakuijala, Austin Appleby, Bob Jenkins, and others. 
 
 
Recommended Usage 
================= 
 
Our belief is that the typical hash function is mostly used for in-memory hash 
tables and similar.  That use case allows hash functions that differ on 
different platforms, and that change from time to time.  For this, I recommend 
using wrapper functions in a .h file with comments such as, "may change from 
time to time, may differ on different platforms, and may change depending on 
NDEBUG." 
 
Some projects may also require a forever-fixed, portable hash function.  Again 
we recommend using wrapper functions in a .h, but in this case the comments on 
them would be very different. 
 
We have provided a sample of these wrapper functions in src/farmhash.h.  Our 
hope is that most people will need nothing more than src/farmhash.h and 
src/farmhash.cc.  Those two files are a usable and relatively portable library. 
(One portability snag: if your compiler doesn't have __builtin_expect then 
you may need to define FARMHASH_NO_BUILTIN_EXPECT.)  For those that prefer 
using a configure script (perhaps because they want to "make install" later), 
FarmHash has one, but for many people it's best to ignore it. 
 
Note that the wrapper functions such as Hash() in src/farmhash.h can select 
one of several hash functions.  The selection is done at compile time, based 
on your machine architecture (e.g., sizeof(size_t)) and the availability of 
vector instructions (e.g., SSE4.1). 
 
To get the best performance from FarmHash, one will need to think a bit about 
when to use compiler flags that allow vector instructions and such: -maes, 
-msse4.2, -mavx, etc., or their equivalents for other compilers.  Those are 
the g++ flags that make g++ emit more types of machine instructions than it 
otherwise would.  For example, if you are confident that you will only be 
using FarmHash on systems with SSE4.2 and/or AES, you may communicate that to 
the compiler as explained in src/farmhash.cc.  If not, use -maes, -mavx, etc., 
when you can, and the appropriate choices will be made by via conditional 
compilation in src/farmhash.cc. 
 
It may be beneficial to try -O3 or other compiler flags as well.  I also have 
found feedback-directed optimization (FDO) to improve the speed of FarmHash. 
 
The "configure" script: creating config.h 
========================================= 
 
We provide reference implementations of several FarmHash functions, written in 
C++.  The build system is based on autoconf.  It defaults the C++ compiler 
flags to "-g -O2", which may or may not be best. 
 
If you are planning to use the configure script, I generally recommend 
trying this first, unless you know that your system lacks AVX and/or AESNI: 
 
  ./configure CXXFLAGS="-g -mavx -maes -O3" 
  make all check 
 
If that fails, you can retry with -mavx and/or -maes removed, or with -mavx replaced by 
-msse4.1 or -msse4.2. 
 
Please see below for thoughts on cross-platform testing, if that is a concern. 
Finally, if you want to install a library, you may use 
 
  make install 
 
Some useful flags for configure include: 
 
  --enable-optional-builtin-expect: This causes __builtin_expect to be optional. 
    If you don't use this flag, the assumption is that FarmHash will be compiled 
    with compilers that provide __builtin_expect.  In practice, some FarmHash 
    variants may be slightly faster if __builtin_expect is available, but it 
    isn't very important and affects speed only. 
 
Further Details 
=============== 
 
The above instructions will produce a single source-level library that 
includes multiple hash functions.  It will use conditional compilation, and 
perhaps GCC's multiversioning, to select among the functions.  In addition, 
"make all check" will create an object file using your chosen compiler, and 
test it.  The object file won't necessarily contain all the code that would be 
used if you were to compile the code on other platforms.  The downside of this 
is obvious: the paths not tested may not actually work if and when you try 
them.  The FarmHash developers try hard to prevent such problems; please let 
us know if you find bugs. 
 
To aid your cross-platform testing, for each relevant platform you may 
compile your program that uses farmhash.cc with the preprocessor flag 
FARMHASHSELFTEST equal to 1.  This causes a FarmHash self test to run 
at program startup; the self test writes output to stdout and then 
calls std::exit().  You can see this in action by running "make check": 
see src/farm-test.cc for details. 
 
There's also a trivial workaround to force particular functions to be used: 
modify the wrapper functions in hash.h.  You can prevent choices being made via 
conditional compilation or multiversioning by choosing FarmHash variants with 
names like farmhashaa::Hash32, farmhashab::Hash64, etc.: those compute the same 
hash function regardless of conditional compilation, multiversioning, or 
endianness.  Consult their comments and ifdefs to learn their requirements: for 
example, they are not all guaranteed to work on all platforms. 
 
Known Issues 
============ 
 
1) FarmHash was developed with little-endian architectures in mind.  It should 
work on big-endian too, but less work has gone into optimizing for those 
platforms.  To make FarmHash work properly on big-endian platforms you may 
need to modify the wrapper .h file and/or your compiler flags to arrange for 
FARMHASH_BIG_ENDIAN to be defined, though there is logic that tries to figure 
it out automatically. 
 
2) FarmHash's implementation is fairly complex. 
 
3) The techniques described in dev/INSTRUCTIONS to let hash function 
developers regenerate src/*.cc from dev/* are hacky and not so portable. 
 
Hash Quality 
============ 
 
We like to test hash functions with SMHasher, among other things. 
SMHasher isn't perfect, but it seems to find almost any significant flaw. 
SMHasher is available at http://code.google.com/p/smhasher/ 
 
SMHasher is designed to pass a 32-bit seed to the hash functions it tests. 
For our functions that accept a seed, we use the given seed directly (padded 
with zeroes as needed); for our functions that don't accept a seed, we hash 
the concatenation of the given seed and the input string. 
 
Some minor flaws in 32-bit and 64-bit functions are harmless, as we 
expect the primary use of these functions will be in hash tables.  We 
may have gone slightly overboard in trying to please SMHasher and other 
similar tests, but we don't want anyone to choose a different hash function 
because of some minor issue reported by a quality test. 
 
If your setup is similar enough to mine, it's easy to use SMHasher and other 
tools yourself via the "builder" in the dev directory.  See dev/INSTRUCTIONS. 
(Improvements to that directory are a relatively low priority, and code 
there is never going to be as portable as the other parts of FarmHash.) 
 
For more information 
==================== 
 
http://code.google.com/p/farmhash/ 
 
farmhash-discuss@googlegroups.com 
 
Please feel free to send us comments, questions, bug reports, or patches.