1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
|
FFmpeg & evaluating performance on the PowerPC Architecture HOWTO
(c) 2003-2004 Romain Dolbeau <romain@dolbeau.org>
I - Introduction
The PowerPC architecture and its SIMD extension AltiVec offer some
interesting tools to evaluate performance and improve the code.
This document try to explain how to use those tools with FFmpeg.
The architecture itself offers two ways to evaluate the performance of
a given piece of code:
1) The Time Base Registers (TBL)
2) The Performance Monitor Counter Registers (PMC)
The firsts are always available, always active, but they're not very
accurate : the registers increment by one every four *bus* cycle. On
my 667 Mhz tibook (ppc7450) , this means once every twenty *processor*
cycle. So we won't use that.
The PMC are much more useful : not only they can report cycle-accurate
timing, but they can also be used to monitor many other parameters,
such as the number of AltiVec stalls for every kind of instructions,
or instruction cache misses. The downside is that not all processors
support the PMC (all G3, all G4 and the 970 do support them), and
they're inactive by default - you need to activate them with a
dedicated tool. Also, the number of available PMC depend on the
procesor : the various 604 have 2, the various 75x (aka. G3) have 4,
anbd the various 74xx (aka G4) have 6.
*WARNING*: The powerpc 970 is not very well documented, and its PMC
registers are 64bits wide. To properly notify the code, you *must*
tune for the 970 (using --tune=970), or the code will assume 32bits
registers.
II - Enabling FFmpeg PowerPC performance support
This need to be done by hand. First, you need to configure FFmpeg as
usual, plus using the "--powerpc-perf-enable". for instance :
#####
./configure --prefix=/usr/local/ffmpeg-cvs --cc=gcc-3.3 --tune=7450 --powerpc-perf-enable
#####
This will configure FFmpeg to install inside /usr/local/ffmpeg-cvs,
compiling with gcc-3.3 (you should try to use this one or a newer
gcc), and tuning for the PowerPC7450 (i.e. the newer G4 ; as a rule of
thumb, those at 550Mhz and more). It will also enables the PMCs.
You may also edit the file "config.h" to enable the following line:
#####
// #define ALTIVEC_USE_REFERENCE_C_CODE 1
#####
If you enable this line, then the code will not make use of AltiVec,
but will use the reference C code instead. This is useful to compare
performance between the two versions of the code.
Also, the number of enabled PMC is defined in "libavcodec/ppc/dsputil_ppc.h" :
#####
#define POWERPC_NUM_PMC_ENABLED 4
#####
If you have a G4 cpus, you can enable all 6 PMCs. DO NOT enable more
PMCs than available on your cpu !
Then, simply compile ffmpeg as usual (make && make install).
III - Using FFmpeg PowerPC performance support
This FFmeg can be used exactly as usual. But before exiting, Ffmpeg
will dump a per-function report that looks like this:
#####
PowerPC performance report
Values are from the PMC registers, and represent whatever the
registers are set to record.
Function "gmc1_altivec" (pmc1):
min: 231
max: 1339867
avg: 558.25 (255302)
Function "gmc1_altivec" (pmc2):
min: 93
max: 2164
avg: 267.31 (255302)
Function "gmc1_altivec" (pmc3):
min: 72
max: 1987
avg: 276.20 (255302)
(...)
#####
In this example, PMC1 was set to record CPU cycles, PMC2 was set to
record AltiVec Permute Stall Cycle, and PMC3 was set to record AltiVec
Issue Stalls.
The function "gmc1_altivec" was monitored 255302 times, and the
minimum execution time was 231 processor cycles. The max and average
aren't much use, as it's very likely the OS interrupted execution for
reasons of it's own :-(
With the exact same setting and source file, but using the reference C
code we get :
#####
PowerPC performance report
Values are from the PMC registers, and represent whatever the
registers are set to record.
Function "gmc1_altivec" (pmc1):
min: 592
max: 2532235
avg: 962.88 (255302)
Function "gmc1_altivec" (pmc2):
min: 0
max: 33
avg: 0.00 (255302)
Function "gmc1_altivec" (pmc3):
min: 0
max: 350
avg: 0.03 (255302)
(...)
#####
592 cycles, so the fastest AltiVec execution is about 2.5x faster than
the fastest C execution in this example. It's not perfect but it's not
bad (well I wrote this function so I can't say otherwise :-).
Once you have that kind of report, you can try to improve things by
finding what goes wrong and fixing it ; in the example above, one
shoud try to diminish the number of AltiVec stalls, as this *may*
improve performances.
IV) Enabling the PMC in MacOS X
This is easy. Use "Monster" and "monster". Those tools come from
Apple's CHUD package, and can be found hidden in the developer web
site & ftp site. "MONster" is the graphical application, use it to
generate a config file specifying what each register should
monitor. Then use the command-line application "monster" to use that
config file, and enjoy the results.
Note that "MONster" can be used for many other stuff, but it's
documented by Apple, it's not my subject.
V) Enabling the PMC in Linux
I don't know how to do it, sorry :-) Any idea very much welcome.
--
Romain Dolbeau
<romain@dolbeau.org>
|