mirror of
				https://github.com/nyanmisaka/ffmpeg-rockchip.git
				synced 2025-10-27 02:41:54 +08:00 
			
		
		
		
	
		
			
				
	
	
		
			173 lines
		
	
	
		
			5.6 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
			
		
		
	
	
			173 lines
		
	
	
		
			5.6 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
| FFmpeg & evaluating performance on the PowerPC Architecture HOWTO
 | |
| 
 | |
| (c) 2003-2004 Romain Dolbeau <romain@dolbeau.org>
 | |
| 
 | |
| 
 | |
| 
 | |
| I - Introduction
 | |
| 
 | |
| The PowerPC architecture and its SIMD extension AltiVec offer some
 | |
| interesting tools to evaluate performance and improve the code.
 | |
| This document tries to explain how to use those tools with FFmpeg.
 | |
| 
 | |
| The architecture itself offers two ways to evaluate the performance of
 | |
| a given piece of code:
 | |
| 
 | |
| 1) The Time Base Registers (TBL)
 | |
| 2) The Performance Monitor Counter Registers (PMC)
 | |
| 
 | |
| The first ones are always available, always active, but they're not very
 | |
| accurate: the registers increment by one every four *bus* cycles. On
 | |
| my 667 Mhz tiBook (ppc7450), this means once every twenty *processor*
 | |
| cycles. So we won't use that.
 | |
| 
 | |
| The PMC are much more useful: not only can they report cycle-accurate
 | |
| timing, but they can also be used to monitor many other parameters,
 | |
| such as the number of AltiVec stalls for every kind of instruction,
 | |
| or instruction cache misses. The downside is that not all processors
 | |
| support the PMC (all G3, all G4 and the 970 do support them), and
 | |
| they're inactive by default - you need to activate them with a
 | |
| dedicated tool. Also, the number of available PMC depends on the
 | |
| procesor: the various 604 have 2, the various 75x (aka. G3) have 4,
 | |
| and the various 74xx (aka G4) have 6.
 | |
| 
 | |
| *WARNING*: The PowerPC 970 is not very well documented, and its PMC
 | |
| registers are 64 bits wide. To properly notify the code, you *must*
 | |
| tune for the 970 (using --tune=970), or the code will assume 32 bit
 | |
| registers.
 | |
| 
 | |
| 
 | |
| II - Enabling FFmpeg PowerPC performance support
 | |
| 
 | |
| This needs to be done by hand. First, you need to configure FFmpeg as
 | |
| usual, but add the "--powerpc-perf-enable" option. For instance:
 | |
| 
 | |
| #####
 | |
| ./configure --prefix=/usr/local/ffmpeg-svn --cc=gcc-3.3 --tune=7450 --powerpc-perf-enable
 | |
| #####
 | |
| 
 | |
| This will configure FFmpeg to install inside /usr/local/ffmpeg-svn,
 | |
| compiling with gcc-3.3 (you should try to use this one or a newer
 | |
| gcc), and tuning for the PowerPC 7450 (i.e. the newer G4; as a rule of
 | |
| thumb, those at 550Mhz and more). It will also enable the PMC.
 | |
| 
 | |
| You may also edit the file "config.h" to enable the following line:
 | |
| 
 | |
| #####
 | |
| // #define ALTIVEC_USE_REFERENCE_C_CODE 1
 | |
| #####
 | |
| 
 | |
| If you enable this line, then the code will not make use of AltiVec,
 | |
| but will use the reference C code instead. This is useful to compare
 | |
| performance between two versions of the code.
 | |
| 
 | |
| Also, the number of enabled PMC is defined in "libavcodec/ppc/dsputil_ppc.h":
 | |
| 
 | |
| #####
 | |
| #define POWERPC_NUM_PMC_ENABLED 4
 | |
| #####
 | |
| 
 | |
| If you have a G4 CPU, you can enable all 6 PMC. DO NOT enable more
 | |
| PMC than available on your CPU!
 | |
| 
 | |
| Then, simply compile FFmpeg as usual (make && make install).
 | |
| 
 | |
| 
 | |
| 
 | |
| III - Using FFmpeg PowerPC performance support
 | |
| 
 | |
| This FFmeg can be used exactly as usual. But before exiting, FFmpeg
 | |
| will dump a per-function report that looks like this:
 | |
| 
 | |
| #####
 | |
| PowerPC performance report
 | |
|  Values are from the PMC registers, and represent whatever the
 | |
|  registers are set to record.
 | |
|  Function "gmc1_altivec" (pmc1):
 | |
|         min: 231
 | |
|         max: 1339867
 | |
|         avg: 558.25 (255302)
 | |
|  Function "gmc1_altivec" (pmc2):
 | |
|         min: 93
 | |
|         max: 2164
 | |
|         avg: 267.31 (255302)
 | |
|  Function "gmc1_altivec" (pmc3):
 | |
|         min: 72
 | |
|         max: 1987
 | |
|         avg: 276.20 (255302)
 | |
| (...)
 | |
| #####
 | |
| 
 | |
| In this example, PMC1 was set to record CPU cycles, PMC2 was set to
 | |
| record AltiVec Permute Stall Cycles, and PMC3 was set to record AltiVec
 | |
| Issue Stalls.
 | |
| 
 | |
| The function "gmc1_altivec" was monitored 255302 times, and the
 | |
| minimum execution time was 231 processor cycles. The max and average
 | |
| aren't much use, as it's very likely the OS interrupted execution for
 | |
| reasons of its own :-(
 | |
| 
 | |
| With the exact same settings and source file, but using the reference C
 | |
| code we get:
 | |
| 
 | |
| #####
 | |
| PowerPC performance report
 | |
|  Values are from the PMC registers, and represent whatever the
 | |
|  registers are set to record.
 | |
|  Function "gmc1_altivec" (pmc1):
 | |
|         min: 592
 | |
|         max: 2532235
 | |
|         avg: 962.88 (255302)
 | |
|  Function "gmc1_altivec" (pmc2):
 | |
|         min: 0
 | |
|         max: 33
 | |
|         avg: 0.00 (255302)
 | |
|  Function "gmc1_altivec" (pmc3):
 | |
|         min: 0
 | |
|         max: 350
 | |
|         avg: 0.03 (255302)
 | |
| (...)
 | |
| #####
 | |
| 
 | |
| 592 cycles, so the fastest AltiVec execution is about 2.5x faster than
 | |
| the fastest C execution in this example. It's not perfect but it's not
 | |
| bad (well I wrote this function so I can't say otherwise :-).
 | |
| 
 | |
| Once you have that kind of report, you can try to improve things by
 | |
| finding what goes wrong and fixing it; in the example above, one
 | |
| should try to diminish the number of AltiVec stalls, as this *may*
 | |
| improve performance.
 | |
| 
 | |
| 
 | |
| 
 | |
| IV) Enabling the PMC in Mac OS X
 | |
| 
 | |
| This is easy. Use "Monster" and "monster". Those tools come from
 | |
| Apple's CHUD package, and can be found hidden in the developer web
 | |
| site & FTP site. "MONster" is the graphical application, use it to
 | |
| generate a config file specifying what each register should
 | |
| monitor. Then use the command-line application "monster" to use that
 | |
| config file, and enjoy the results.
 | |
| 
 | |
| Note that "MONster" can be used for many other things, but it's
 | |
| documented by Apple, it's not my subject.
 | |
| 
 | |
| If you are using CHUD 4.4.2 or later, you'll notice that MONster is
 | |
| no longer available. It's been superseeded by Shark, where
 | |
| configuration of PMCs is available as a plugin.
 | |
| 
 | |
| 
 | |
| 
 | |
| V) Enabling the PMC on Linux
 | |
| 
 | |
| On linux you may use oprofile from http://oprofile.sf.net, depending on the
 | |
| version and the cpu you may need to apply a patch[1] to access a set of the
 | |
| possibile counters from the userspace application. You can always define them
 | |
| using the kernel interface /dev/oprofile/* .
 | |
| 
 | |
| [1] http://dev.gentoo.org/~lu_zero/development/oprofile-g4-20060423.patch
 | |
| 
 | |
| --
 | |
| Romain Dolbeau <romain@dolbeau.org>
 | |
| Luca Barbato <lu_zero@gentoo.org>
 | 
