July 24, 2009, 7:35 am
MPlayer is a popular video playback application which is available for Linux. The h264 video codec provides good visual quality and small file sizes, but requires a fair amount of processing power to decode in real time.
In this two-part article series I'll look at two ways to possibly improve your h264 playback with MPlayer: using multiple CPU cores and using the GPU to offload the decoding. I'll include benchmarks so you can see the effect each of these has on performance. Hopefully after reading these articles you can make an educated guess if either method might be enough to solve your high-definition h264-decoding jitters.
By default MPlayer uses a single thread to decode an h264 video stream. This means that on most dual and quad core machines you are artificially limited in what video files you can smoothly decode with MPlayer. If you have a multicore CPU and are experiencing jittery playback of h264 streams, then using the ffmpeg-mt library for decoding might save you from having to purchase new hardware to play your video files. As the name implies, ffmpeg-mt is a multithreaded version of ffmpeg. In theory, if you have four cores then they can all be fully used to decode h264 streams. In practice there is always some coordination overhead to multithreading, so you are unlikely to achieve twice the performance when using two cores.
For this article series I'll use two freely available video files: Big Buck Bunny (BBB) and Elephants Dream (ED). Both can be downloaded at 1920x1080. BBB is available as an h264 file, while ED is an MPEG-4 encoded video.
Unfortunately the high definition 1920x1080 version of ED is encoded as MPEG4. I used the mencoder from the Fedora 9 packages (mencoder-1.0-0.100.20090204svn.fc9.1.x86_64) to transcode the video file into h264 with the following command. The x264 encoding options in x264encopts are taken from the recommendations on MPlayer's website with the addition of crf=16 to tell mencoder to select a bitrate for good quality output.
/usr/bin/mencoder -frames 5000 -oac copy \
-ovc x264 \
-o /tmp/Elephant.avi /.../Elephants_Dream_HD.avi
Notice that I write only the first 5,000 frames of the movie to the output file and copy its audio stream verbatim. I'll refer to the h264 encode of ED as ED264. With 5,000 frames, ED264 runs for 208.3 seconds when played back at normal speed (24 frames per second).
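The running time quoted above is simply the frame count divided by the frame rate. As a minimal sketch (the function name clip_seconds is made up for this illustration), the same arithmetic can be done from the shell:

```shell
#!/bin/sh
# Playback duration in seconds for a clip of a given length.
# Arguments: frame count, frames per second.
# (clip_seconds is a hypothetical helper name, not part of MPlayer.)
clip_seconds() {
    awk -v frames="$1" -v fps="$2" 'BEGIN { printf "%.1f\n", frames / fps }'
}

clip_seconds 5000 24   # the 5,000-frame ED264 clip at 24 frames per second
```

The same helper can be reused to work out how long any -frames limit will run at a given frame rate.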
To supplement the computer-generated BBB and ED video files in benchmarking, movie trailers for "The Bourne Ultimatum" (TBU) and "I Am Legend" (IAL) were used from h264info.com. Other sample h264 files can be obtained from apple.com and yahoo.com.
To use MPlayer with ffmpeg-mt, you will probably have to compile from source. First check out MPlayer from its source repository, then check out ffmpeg-mt from its source repository and replace the ffmpeg code in the MPlayer checkout with the code from the ffmpeg-mt checkout. The commands are listed both on MPlayer's news page and in this Ubuntu forums thread.
The commands I used to build mplayer-mt are shown below. Note that the executable is renamed so it can be installed in parallel with the normal mplayer package from your Linux distribution. I used the --disable-vdpau configure option to disable any advantage that might be obtained during decoding from the GPU as that is the subject of the next article. On my system mplayer would not build without disabling the libdirac codec.
$ svn co svn://svn.mplayerhq.hu/mplayer/trunk mplayer
Checked out external at revision 1174.
Checked out revision 29408.
$ git clone git://gitorious.org/ffmpeg/ffmpeg-mt.git
Initialized empty Git repository in
remote: Counting objects: 96524, done.
$ rm -rf mplayer/libavcodec mplayer/libavformat mplayer/libavutil
$ cp -a ffmpeg-mt/libavcodec ffmpeg-mt/libavformat ffmpeg-mt/libavutil mplayer/
$ cd mplayer
$ ./configure --disable-vdpau --disable-libdirac-lavc
$ nice make -j 4
$ sudo install -m 755 mplayer /usr/local/bin/mplayer-mt
When grafting the libraries from one project into another, you should expect that things may have changed in subtle ways and that a little handiwork may be needed. The versions I used were current as of 1 July 2009, and in the ffmpeg-mt branch the ff_codec_wav_tags and ff_codec_bmp_tags symbols had been renamed to drop the "ff_" prefix. Removing the "ff_" prefix from both references to these symbols in mplayer's libmpdemux/mp_taglists.c allowed the build to complete. Before that change the build failed as follows.
cc -Wundef -Wdisabled-optimization -Wno-pointer-sign
-Wdeclaration-after-statement -std=gnu99 -Wall -Wno-switch
-Wpointer-arith -Wredundant-decls -O4 -march=native -mtune=native
-pipe -ffast-math -fomit-frame-pointer -D_LARGEFILE_SOURCE
-D_FILE_OFFSET_BITS=64 -D_LARGEFILE64_SOURCE -Ilibdvdread4 -I.
-D_REENTRANT -I/usr/include/directfb -I/usr/include/
-I/usr/include/SDL -D_REENTRANT -pthread -I/usr/include/kde/artsc
-I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -D_REENTRANT
-c -o libmpdemux/mp_taglists.o libmpdemux/mp_taglists.c
libmpdemux/mp_taglists.c:54: error: 'ff_codec_wav_tags' undeclared here (not in a function)
libmpdemux/mp_taglists.c:99: error: 'ff_codec_bmp_tags' undeclared here (not in a function)
libmpdemux/mp_taglists.c:99: error: initializer element is not constant
libmpdemux/mp_taglists.c:99: error: (near initialization for 'mp_bmp_taglists')
make: *** [libmpdemux/mp_taglists.o] Error 1
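The rename described above can be applied by hand in an editor, or scripted. The following is only a sketch: it assumes GNU sed (for the -i in-place option) and that it is run from the top of the mplayer checkout. The function name fix_taglists is made up for this illustration.

```shell
#!/bin/sh
# Strip the "ff_" prefix from the two tag-table symbols in the given file.
# Assumes GNU sed, whose -i option edits the file in place.
# (fix_taglists is a hypothetical helper name.)
fix_taglists() {
    sed -i \
        -e 's/ff_codec_wav_tags/codec_wav_tags/g' \
        -e 's/ff_codec_bmp_tags/codec_bmp_tags/g' \
        "$1"
}

# Only touch the file if we really are in the mplayer source tree.
if [ -f libmpdemux/mp_taglists.c ]; then
    fix_taglists libmpdemux/mp_taglists.c
fi
```

After the substitution, rerunning make should get past mp_taglists.c, provided the symbol names in your ffmpeg-mt checkout match the ones I saw.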
For comparison, a second mplayer binary was built using the same mplayer source version with the same configure options, the difference being that the normal build uses the default single threaded ffmpeg. The mplayer commands below use the single threaded version while the ffmpeg-mt version is called mplayer-mt.
All playback commands use the following options to mplayer. The codec is forced to h264, video output is disabled so that frames are simply thrown away after they have been decoded, -benchmark runs the video as fast as possible instead of at 24 or 25 frames per second, and -nosound is used to take sound decoding out of the benchmark.
mplayer -vc ffh264 -vo null -benchmark -nosound
In addition, two runs were done with mplayer-mt, using either 2 or 4 threads. I would have liked to run the ffmpeg-mt code with a single thread to see what overhead, if any, was introduced, but attempts at this caused mplayer to crash. The only additional option was -lavdopts threads=N where N is 2 or 4. For simplicity I'll refer to the standard mplayer run as mplayer-single, the 2 thread ffmpeg-mt mplayer as mplayer-mt2 and the 4 thread version as mplayer-mt4.
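To make the three runs concrete, here is a small sketch that prints the full command line for each of them. The helper name bench_cmd and the VIDEO variable are made up for this illustration, and the default file path is just the mencoder output from earlier. The options themselves are the ones described above.

```shell
#!/bin/sh
# Print the benchmark command for a given binary, thread count and file.
# A thread count of 0 means "do not pass -lavdopts" (the single-threaded run).
# (bench_cmd and VIDEO are hypothetical names used only in this sketch.)
bench_cmd() {
    bin=$1
    threads=$2
    file=$3
    opts="-vc ffh264 -vo null -benchmark -nosound"
    if [ "$threads" -gt 0 ]; then
        opts="$opts -lavdopts threads=$threads"
    fi
    echo "$bin $opts $file"
}

VIDEO=${VIDEO:-/tmp/Elephant.avi}   # adjust to the file you want to test

bench_cmd mplayer    0 "$VIDEO"   # mplayer-single
bench_cmd mplayer-mt 2 "$VIDEO"   # mplayer-mt2
bench_cmd mplayer-mt 4 "$VIDEO"   # mplayer-mt4
```

The script only prints the commands. Piping its output to sh would actually run them, one after another.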
The ED264 5,000 frame file was the first to be tested. The VC Runtime is how much time was spent in the video codec as reported by mplayer -benchmark. A higher FPS is obviously a good thing, and having some breathing room above real time (24 or 25 FPS, depending on the file) makes it more likely that difficult scenes will still be decoded in real time. The last column shows how much faster the multithreaded versions are than the single threaded mplayer.
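For readers who want to derive the same kind of figures from their own runs, here is a rough sketch. It assumes that mplayer -benchmark prints a summary line beginning with "BENCHMARKs: VC:", that FPS is taken as frames divided by VC time, and that the speedup is the single threaded VC time divided by the multithreaded one. The helper names and the sample line are made up for illustration and are not measured results.

```shell
#!/bin/sh
# Extract the video codec (VC) time in seconds from mplayer -benchmark output
# read on standard input. Assumes a line of the form "BENCHMARKs: VC: 12.345s ...".
vc_seconds() {
    sed -n 's/^BENCHMARKs: VC: *\([0-9.]*\)s.*/\1/p'
}

# Frames decoded per second of codec time. Arguments: frames, VC seconds.
decode_fps() {
    awk -v frames="$1" -v secs="$2" 'BEGIN { printf "%.1f\n", frames / secs }'
}

# Speedup of a multithreaded run. Arguments: single VC seconds, multi VC seconds.
speedup() {
    awk -v single="$1" -v multi="$2" 'BEGIN { printf "%.2f\n", single / multi }'
}

# Made-up sample line, for illustration only (not a measured result).
sample='BENCHMARKs: VC:  50.000s VO:   0.020s A:   0.000s Sys:   0.500s =   50.520s'
vc=$(printf '%s\n' "$sample" | vc_seconds)
decode_fps 5000 "$vc"     # FPS for a 5,000 frame clip
speedup "$vc" 25.000      # versus a hypothetical 25 second multithreaded run
```

In real use the sample line would be replaced by the captured output of an actual mplayer -benchmark run.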
Only the first three minutes of BBB were used for the benchmarks shown below, limited with the -endpos 3:00 option.
The benchmarks for the entire movie trailers for "The Bourne Ultimatum" (TBU) and "I Am Legend" (IAL) are shown below.
The figures for the above trailers would seem to indicate that there is no problem to solve with ffmpeg-mt. Getting 90 FPS using just a single core should be enough. The test below limits the benchmark to a 20 second interval starting 64 seconds into the "I Am Legend" trailer. The extra command line options are -ss 64 -endpos 20. Notice that the FPS for single core decoding has dropped from 90 to 53 for this segment. The last thing you want during video playback is for an action-packed scene to slow to below real time.
Although the streams used did not have difficult enough action to cause a single core of the Intel Q6600 to generate an average FPS below real time, we are most interested in the relative performance of using ffmpeg-mt in this article. Using two cores does not give you twice the performance but 1.6 to 1.8 times. Likewise, four cores gives you 2.25 to 2.6 times the playback performance. Clearly the gain is more noticeable from one to two cores, but if you have four you might as well take advantage of the extra decoding headroom available to you.
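One way to read these speedup figures is as parallel efficiency: the speedup divided by the number of cores used. The sketch below does that arithmetic for the ranges quoted above. The helper name efficiency is made up for this illustration.

```shell
#!/bin/sh
# Parallel efficiency as a whole percentage. Arguments: cores, speedup.
# (efficiency is a hypothetical helper name.)
efficiency() {
    awk -v cores="$1" -v speedup="$2" \
        'BEGIN { printf "%.0f\n", 100 * speedup / cores }'
}

efficiency 2 1.6    # low end with two cores
efficiency 2 1.8    # high end with two cores
efficiency 4 2.25   # low end with four cores
efficiency 4 2.6    # high end with four cores
```

Two cores come out at roughly 80 to 90 percent efficiency and four cores at roughly 56 to 65 percent, which is another way of saying that each extra core helps, but less than the one before it.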
Tune in next time when we'll take a look at how well GPU accelerated h264 decoding works. If using multiple CPU cores is not enough, perhaps a new graphics card can enable your current machine to smoothly decode your video streams.