blob: 346791d3d39e5abc0cba6d58de593a04418cb8ac (
plain) (
blame)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
|
fast approximate function of float exp(float) and float log(float)
-----------------------------------------------------------------------------
<How to use>
Include fmath.hpp and use fmath::log() and fmath::exp().
fmath::PowGenerator is a class to generate a function to compute pow(x, y)
of x >= 0 for a given fixed y > 0.
eg.
fmath::PowGenerator f(1.234);
f.get(x) returns pow(x, 1.234);
<Prototype>
-----------------------------------------------------------------------------
float fmath::exp(float);
float fmath::log(float);
__m128 fmath::exp_ps(__m128);
__m128 fmath::log_ps(__m128);
-----------------------------------------------------------------------------
<Experimental>
If you install xbyak(https://github.com/herumi/xbyak/)
and define FMATH_USE_XBYAK before including fmath.hpp,
then fmath::exp() and fmath::exp_ps() will be about 10~20 % faster.
Xbyak version uses SSE4.1 if available.
# AVX version of fmath::exp is experimental
<Benchmark>
-----------------------------------------------------------------------------
compiler Visual Studio 2010RC / icc 11.1 / gcc 4.3.2 on cygwin / gcc 4.4.1 on 64bit Linux
option
cl(icl):
/Ox /Ob2 /GS- /Zi /D_SECURE_SCL=0 /MD /Oy /arch:SSE2 /fp:fast /DNOMINMAX
gcc:
-O3 -fomit-frame-pointer -DNDEBUG -fno-operator-names -msse2 -mfpmath=sse -ffast-math -march=core2
unit ; <clocks without dummy loop> / <clocks with loop>
Core i7-2600 3.4GHz
Windows7 SP1 ubuntu 10.10
VC10(32bit) VC10(64bit) gcc4.5(64bit)
std::exp 91.8/101.0 14.1/ 23.5 639.8/650.8
fmath::exp 7.6/ 16.9 1.3/ 10.7 3.1/ 14.1
std::expx4 340.5/351.6 86.9/ 95.5 2608.2/2617.2
fmath::expx4 61.0/ 72.1 42.6/ 51.3 46.5/ 55.5
fmath::exp_ps 22.9/ 34.1 19.9/ 28.6 58.8/ 67.9
std log 57.2/ 66.3 15.9/ 24.1 41.4/ 49.6
fmath log 9.5/ 18.6 4.4/ 12.6 4.8/ 13.0
std logx4 230.8/241.9 94.7/103.5 204.5/213.5
fmath logx4 34.1/ 45.2 28.6/ 37.4 27.8/ 36.8
fmath log_ps 21.1/ 32.2 15.3/ 24.1 56.9/ 65.9
Xeon X5650 2.67GHz
ubuntu 10.04
gcc4.4.3-4(64bit)
std::exp 559.9/568.7
fmath::exp 5.1/ 13.8
std::expx4 2257.1/2266.0
fmath::expx4 44.5/ 53.3
fmath::exp_ps 59.8/ 68.7
std log 45.4/ 54.1
fmath log 4.2/ 12.9
std logx4 212.1/220.8
fmath logx4 27.2/ 35.9
fmath log_ps 58.0/ 66.7
Core i7 2.8GHz
VC2010 icc11.1
Xp(32bit) Xp(64bit) Xp(32bit) Xp(64bit)
std::exp 88.2/ 97.2 14.6/ 22.4 19.1/ 28.8 13.7/ 21.7
fmath::exp 10.2/ 19.2 3.6/ 11.4 8.2/ 17.9 3.5/ 11.5
std::exp x 4 342.0/357.5 94.8/102.8 79.1/ 91.9 94.3/102.6
fmath::exp_ps 25.7/ 41.2 25.2/ 33.3 31.8/ 44.6 25.6/ 34.0
icl::exp_ps 34.8/ 47.6 31.6/ 39.9
std::log 57.9/ 67.1 18.8/ 27.0 41.4/ 51.8 19.6/ 28.2
fmath::log 8.5/ 17.7 3.4/ 11.6 13.3/ 23.7 1.2/ 9.8
std::log x 4 241.2/255.8 114.2/122.5 113.0/127.0 102.0/110.6
fmath::log_ps 23.9/ 38.4 20.5/ 28.8 26.2/ 40.2 23.9/ 32.5
icl::log_ps 34.2/ 48.2 34.8/ 43.4
Core2Duo 2.6GHz
VC2010 icc11.1 gcc 4.4.3 gcc 4.3.2 on cygwin
Xp(32bit) Xp(64bit) Xp(32bit) Xp(64bit) Linux(64bit) Xp(32bit)
std::exp 139.9/150.1 24.5/ 33.0 27.4/ 38.4 18.0/ 27.1 586.0/591.5 157.8/167.9
fmath::exp 10.1/ 20.3 5.6/ 14.1 10.1/ 21.1 5.8/ 14.9 9.1/ 14.6 10.8/ 20.9
std::exp x 4 572.9/585.5 122.5/133.3 107.7/124.2 100.7/111.4 2583.6/2608.5 658.7/694.6
fmath::exp_ps 41.7/ 54.3 35.9/ 46.7 48.4/ 64.9 38.8/ 49.4 49.0/ 73.9 52.8/ 88.7
icl::exp_ps 45.1/ 61.6 47.8/ 58.5
std::log 66.3/ 77.0 22.8/ 31.8 42.8/ 53.9 18.8/ 27.8 82.8/ 91.7 114.7/124.8
fmath::log 10.3/ 21.1 4.2/ 13.3 12.3/ 23.4 2.6/ 11.7 5.0/ 13.9 13.1/ 23.2
std::log x 4 273.3/286.1 125.2/136.1 123.4/139.1 104.7/115.1 329.2/356.3 473.2/509.2
fmath::log_ps 38.4/ 51.2 29.3/ 40.1 36.3/ 52.0 31.8/ 42.3 28.8/ 55.8 50.5/ 86.4
icl::log_ps 58.7/ 74.4 56.3/ 66.7
Quad-Core AMD Opteron 2376
gcc 4.4.1 gcc.4.41
Linux(32bit) Linux(64bit)
std::exp 112.9/128.9 528.7/542.2
fmath::exp 23.7/ 39.6 17.1/ 30.7
std::exp x 4 540.7/562.4 2127.3/2159.7
fmath::exp_ps 108.6/130.3 71.3/103.7
std::log 182.4/198.7 110.7/124.0
fmath::log 23.0/ 39.3 11.6/ 24.9
std::log x 4 827.1/848.6 464.8/497.2
fmath::log_ps 102.0/123.5 76.5/108.9
-----------------------------------------------------------------------------
<Remark>
gcc puts warnings such as "dereferencing type-punned pointer will break strict-aliasing rules."
It is no problem.
Please change #if 1 in fmath.hpp:423 if you worry about it. But it causes a little slower.
-----------------------------------------------------------------------------
<License>
modified new BSD License
http://www.opensource.org/licenses/bsd-license.php
-----------------------------------------------------------------------------
<History>
2011/Mar/25 exp supports AVX
2011/Mar/25 exp, exp_ps support avx
2010/Feb/16 add fmath::exp_ps, log_ps and optimize functions
2010/Jan/10 add fmath::PowGenerator
2009/Dec/28 add fmath::log()
2009/Dec/09 support cygwin
2009/Dec/08 first version
-----------------------------------------------------------------------------
<Author>
http://herumi.in.coocan.jp/
MITSUNARI Shigeo(herumi@nifty.com)
|