Extracting License Information From RPM Files and Distributions



I’ve noticed what seems to be a growing demand from downstream recipients of open source software for a list of all the licenses found in the software they receive. Generally, what is meant is a ‘license overview’ of the various packages in the software, not a comprehensive analysis of every license in each source package.
About RPM packages and distributions
What a downstream recipient normally receives and installs is a compiled binary package. In the case of ‘RPM’ distributions such as openSUSE, Fedora and Mandriva, this is a binary ‘RPM’ package (e.g. mypackage-0.1.i586.rpm would refer to a package called ‘mypackage’ in version 0.1 for an i586 (32-bit) architecture).
By receiving a compiled binary ‘RPM’ the downstream recipient has the advantage of not needing to compile the software himself/herself. Furthermore, the RPM package generally knows what other package ‘dependencies’ or ‘requirements’ it has. If, for example, mypackage-0.1.i586.rpm requires a particular library from yourpackage version 0.2, then whoever builds mypackage-0.1.i586.rpm would insert a ‘tag’ into the RPM, instructing the rpm package manager to only install the package if yourpackage is present on the system, in the appropriate version. Thus, if I try to install mypackage-0.1.i586.rpm without yourpackage-0.2.i586.rpm already installed, rpm will tell me that there is a missing dependency.
So why is this technical RPM information interesting, given the growing demand for an overview of the licenses in e.g. an RPM distribution? Simply put, an RPM package can have a number of ‘tags’, one of which is the ‘License’ tag. Other tags are ‘Version’ and ‘Description’. As might be expected, the License tag of an RPM is used to state the license of the package. This, however, is where things get a little more complicated.
Getting the source code
As mentioned, the RPM package is already compiled from source code. The source code for a package will normally be downloaded from an ‘upstream’ source such as http://www.sourceforge.net. In general, the upstream source code will be provided in a compressed tarball. Using the example already provided in the previous paragraph, the source code package for mypackage-0.1.i586.rpm might be provided from Sourceforge as mypackage-0.1.tar.gz (a tarball compressed with gzip). This source code package, when unpacked, may contain numerous source code files, written by numerous authors at various stages. It may contain files copied from other open source projects. Thus, there are many different files, with potentially many different licenses, authored by potentially many different authors. Most upstream projects will have defined a ‘project license’. For example, in the case of mypackage-0.1, I might declare the license to be the General Public License Version 2.
Building the RPM package and declaring the license
The RPM ‘License’ tag is declared at the same time as all the other RPM tags: when you create the ‘spec’ file. The spec file is a text document which contains information about the package and instructions on how to build it. The first few lines of the spec file define parameters such as the name of the package, the version of the package, the license of the package and where the sources for building the package are located. Taking the example above, where I have downloaded the GPLv2 ‘mypackage’ from Sourceforge, I would enter something like ‘License: GPLv2’ in the appropriate line in the spec file.
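To make the idea of a spec file preamble concrete, here is a minimal sketch that writes one for the running example. Every value in it is hypothetical: the Release, Summary and description text are placeholders, and the ‘Requires’ line assumes the yourpackage dependency from the earlier example is expressed as "0.2 or newer". A real spec file also needs build and file-list sections before it can produce a package.

```shell
# Write a minimal, hypothetical spec preamble for the running example.
# All values are illustrative; a real spec also needs %prep, %build,
# %install and %files sections before rpmbuild will accept it.
cat > mypackage.spec <<'EOF'
Name:           mypackage
Version:        0.1
Release:        1
Summary:        Example package used in this article
License:        GPLv2
Source0:        mypackage-0.1.tar.gz
# Assumed form of the dependency on yourpackage described earlier.
Requires:       yourpackage >= 0.2

%description
Placeholder description for the example package.
EOF

# Show the license declared in the spec file.
grep '^License:' mypackage.spec
```

The ‘License’ line here is what later surfaces as the License tag of the built RPM.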
When the package has been successfully built, the RPM file ‘mypackage-0.1.i586.rpm’ will be produced. This is a binary file which can be installed on any RPM based system (e.g. SUSE, Red Hat, Mandriva). You don’t have to install the RPM in order to access its tags, but you do need the rpm program to access the information easily.
How to access the rpm information
It is at this stage that I can loop back to the original issue: given an ‘RPM Linux distribution’ such as SUSE Linux or Red Hat Linux, how does one go about getting an overview of the licenses of all of the RPMs in the distribution? It depends on whether you have installed the Linux system or not.
If you have already installed your Linux system, then you could use a command such as the following to get a list of all installed packages in the format Package Name, Version, License (e.g. mypackage,0.1,GPLv2):
rpm -qa --queryformat '%{name},%{version},%{license}\n'
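For an overview, it can help to condense that list into a count per license string. The following is a small sketch, not part of rpm itself: count_licenses is a hypothetical helper name, and it assumes the comma-separated format produced by the command above, with the license as the third field.

```shell
# Summarise "name,version,license" lines read from stdin: print each
# distinct license string with the number of packages that declare it,
# most common first. Everything from the third field onward is treated
# as the license, so license strings containing commas stay intact.
count_licenses() {
    cut -d, -f3- | sort | uniq -c | sort -rn
}

# Typical use on an installed system (requires the rpm program):
#   rpm -qa --queryformat '%{name},%{version},%{license}\n' | count_licenses
```

Because the counting is done on the literal tag text, two spellings of the same license are counted separately.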

If you would like to get license information for any particular package (i.e. where you already know the name of the package), you could use the following command for a package already installed on your system:
rpm -q --queryformat '%{name},%{version},%{license}\n' mypackage
A handy trick applies if you have a directory full of RPM files, for example if you have received an RPM Linux distribution such as Red Hat Linux or SUSE Linux. Change into the directory in which these RPM files are located (e.g. on a SUSE Linux DVD, they would be located in dvd://suse/) and query the files directly with the -p option:
find . -name "*.rpm" | xargs rpm -qp --queryformat '%{name},%{version},%{license}\n'
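A pipe into xargs splits its input on whitespace, so file names containing spaces would be broken apart. As a hedged alternative, the sketch below lets find hand the file names to rpm itself. The function name list_rpm_licenses and the DVD mount path in the comment are hypothetical.

```shell
# Print "name,version,license" for every RPM file below a directory
# (default: the current directory). With -exec ... {} + find passes the
# file names to rpm directly, so names containing spaces stay intact,
# and rpm is not started at all when no file matches.
list_rpm_licenses() {
    find "${1:-.}" -type f -name '*.rpm' \
        -exec rpm -qp --queryformat '%{name},%{version},%{license}\n' {} +
}

# Example: save the overview for a mounted DVD (path is hypothetical).
#   list_rpm_licenses /mnt/dvd/suse > licenses.csv
```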
If the package is not installed, but you have the package RPM file, you could use the same -p option to extract the license information directly from the file:
rpm -qp --queryformat '%{name},%{version},%{license}\n' package.rpm
Differences in license declarations?
If you perform the above commands on different RPM based systems, you will notice that sometimes, although the RPM package name and version may be the same, the license text output by RPM will not be the same. Take, for example, the “General Public License version 2 or later”. The Fedora project (who, incidentally, have a fantastic license resource at https://fedoraproject.org/wiki/Licensing#SoftwareLicenses) define this as GPLv2+. Some versions of SUSE Linux defined this as “GPL Version 2 or later”, though in more recent versions of openSUSE the tag syntax is often similar to the Fedora syntax. Thus, although the license being declared might be the same one, the syntax for declaring it might differ across distributions.
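When comparing overviews from different distributions, one option is to map the known spellings onto a single form before counting. The sketch below is only an illustration: normalize_license is a hypothetical helper, and its table contains nothing more than the one pair of spellings mentioned above. Any real mapping would need to be built and checked by hand.

```shell
# Map a distribution-specific license string to one chosen spelling.
# The table holds only the example pair from the text; unknown strings
# are passed through unchanged.
normalize_license() {
    case "$1" in
        "GPL Version 2 or later") printf '%s\n' "GPLv2+" ;;
        *)                        printf '%s\n' "$1" ;;
    esac
}

# Example:
#   normalize_license "GPL Version 2 or later"
```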

Hopefully some of the commands above may be of some help. It is important to note that the license overview produced by such commands is just a shortcut. It is often the license of the package as declared by the upstream developer. It may be more interesting and informative to browse through the source code of the package, to find out what licenses really are in the package. Some companies have developed software to help you to do so.
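As a very rough starting point for browsing an unpacked source tarball by hand, something like the following could be used. It is a sketch under stated assumptions: the function names are hypothetical, the file-name patterns are common conventions rather than a complete list, and a text search for the word "license" is no substitute for actually reading the files it finds.

```shell
# List files in an unpacked source tree (default: the current directory)
# whose names suggest that they contain license text.
find_license_files() {
    find "${1:-.}" -type f \
        \( -iname 'COPYING*' -o -iname 'LICENSE*' -o -iname 'LICENCE*' \)
}

# List files whose contents mention the word "license" in any letter case.
find_license_mentions() {
    grep -rli 'license' "${1:-.}"
}

# Example, after unpacking mypackage-0.1.tar.gz:
#   find_license_files mypackage-0.1
#   find_license_mentions mypackage-0.1
```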