Search your IT data with Splunk


Author: Anže Vidmar

When something goes wrong in an IT server farm, it can take system administrators days to find the root cause. Splunk is an enterprise-level search tool that indexes logs and other IT data, including server, network, and application events, from one or more servers or network devices. You can then search the data from all your servers in one place, using a single browser- or console-based tool. Splunk is designed for real-time data mining, so system administrators can quickly and easily find the cause of a problem on the network.

Splunk runs on Intel (x86), SPARC (Solaris), and PPC (Mac OS X) platforms under Linux, FreeBSD, Solaris, and Mac OS X. Although Splunk itself does not run on Microsoft Windows, it can collect data from Windows servers as well as from other network devices.

The Linux version is available as a tarball or as an .rpm or .deb package. Splunk is not open source software, but the company offers a free version that lets you index up to 500MB of data a day. If you need more than that, you'll have to purchase a professional license, but you can sign up for a free 30-day Splunk professional license to try it out first.

After downloading the software, extract the package contents and run the setup script to configure the Splunk daemon. Then point a Web browser at the server where Splunk is running, on the default port 8000, and finish the configuration there. Splunk runs as a service and needs little in the way of resources. When the service starts, it launches 18 splunkd processes that gather and index data. On my Debian Sarge system each process used less than 1% of the CPU, and all the splunkd processes together took up around 20MB of RAM.

You need only one installed instance of Splunk to have a functional Splunk server. If you want to send Splunk data from one server to another (for example, from one site to another), Splunk must be installed on the other server too. Distributed searches, however, don't require more than one installation of Splunk. The company offers a detailed installation manual.

Gathered data is indexed and stored in a single file; there’s no need for a separate database back end.
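Splunk's on-disk index format isn't described here, but the general idea of why an index makes searching fast can be sketched with a toy inverted index in Python. This is a hypothetical illustration, not Splunk's implementation: the names build_index and search, the tokenizing pattern, and the sample log lines are all invented. Each token maps to the set of events that contain it, so a query intersects a few small sets instead of rescanning every log line.

```python
import re
from collections import defaultdict

# Hypothetical illustration only: a tiny in-memory inverted index.
# Tokens are runs of letters, digits, and _ . / - so "I/O" stays one token.
TOKEN_RE = re.compile(r"[A-Za-z0-9_./-]+")

def build_index(events):
    """Map each lowercase token to the set of event ids that contain it."""
    index = defaultdict(set)
    for event_id, text in enumerate(events):
        for token in TOKEN_RE.findall(text.lower()):
            index[token].add(event_id)
    return index

def search(index, events, query):
    """Return the events that contain every query term (AND semantics)."""
    terms = TOKEN_RE.findall(query.lower())
    if not terms:
        return []
    # .get() avoids inserting empty entries into the defaultdict
    ids = set.intersection(*(index.get(t, set()) for t in terms))
    return [events[i] for i in sorted(ids)]

events = [
    "web01 sshd[411]: Failed password for root",
    "db02 kernel: end_request: I/O error, dev sda",
    "web01 kernel: end_request: I/O error, dev sdb",
]
index = build_index(events)
print(search(index, events, "I/O error web01"))
```

Searching for "I/O error web01" returns only the third sample line, the one event that contains all three terms.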

Splunk automatically recognizes the type of data it indexes, so users don't need to rely on pre-made templates. For example, if you set up a Cisco PIX firewall to send all its logs to the Splunk server, Splunk recognizes the data as Cisco PIX log data and classifies it under a specific source type (in this example, syslog). The same goes for other sources, such as Microsoft IIS, Microsoft Exchange, SNMP, and NFS.
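Splunk's own detection rules aren't documented in this article, but the general idea of recognizing a source type from the shape of a log line can be sketched in Python with a few ordered regular expressions. The labels and rules below are hypothetical. They rely on three well-known formats: Cisco PIX messages carry a %PIX-level-message_id tag, IIS W3C extended log files begin with a "#Software: Microsoft Internet Information Services" header, and traditional syslog lines start with a timestamp such as "Mar  3 10:15:02".

```python
import re

# Hypothetical rules for illustration only; Splunk's real source-type
# detection is not described in this article. First match wins, so the
# more specific PIX rule is checked before the generic syslog rule.
RULES = [
    # Cisco PIX messages carry a tag such as %PIX-6-302013
    ("cisco_pix", re.compile(r"%PIX-\d-\d{6}")),
    # IIS W3C extended log files start with a #Software header line
    ("iis", re.compile(r"^#Software: Microsoft Internet Information Services")),
    # Classic syslog lines start with a timestamp like "Mar  3 10:15:02 "
    ("syslog", re.compile(r"^[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} ")),
]

def guess_source_type(line: str) -> str:
    """Return the label of the first rule that matches, or 'unknown'."""
    for label, pattern in RULES:
        if pattern.search(line):
            return label
    return "unknown"

samples = [
    "Mar  3 10:15:02 fw1 %PIX-6-302013: Built outbound TCP connection",
    "Mar  3 10:15:07 web01 sshd[411]: Accepted password for admin",
    "#Software: Microsoft Internet Information Services 6.0",
]
for line in samples:
    print(guess_source_type(line), "<-", line)
```

Ordering the rules from most to least specific is the key design choice: a PIX message relayed over syslog also matches the syslog timestamp pattern, so the PIX rule must run first.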

Splunk indexes the data it collects in real time, even from remote servers, and the network connection used to transport that data is encrypted. The server's Web and command-line interfaces provide read-only access to the indexed data.

Splunk can send alerts or run shell scripts when specific events occur. For example, if you know that a dying hard disk fills the log files with I/O error events, you can create an alert that fires the moment the string “I/O error” is written to a log file. When an alert is triggered, Splunk can notify an administrator via email, add an entry to an RSS feed, or run a shell script. You can create triggers and alerts for practically any event that needs your attention.
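At its core, the string-match alert described above is a simple loop. Here is a minimal Python sketch, assuming the log lines are already available as an iterable. It is not Splunk's alerting engine, and the function name watch and the sample lines are hypothetical. The on_match callback stands in for whatever action you choose: an email, an RSS entry, or a shell script.

```python
from typing import Callable, Iterable, List

# Hypothetical sketch of a string-match alert, not Splunk's alerting engine.
def watch(lines: Iterable[str], needle: str,
          on_match: Callable[[str], None]) -> int:
    """Invoke on_match for each line containing needle; return the hit count."""
    hits = 0
    for line in lines:
        if needle in line:
            hits += 1
            on_match(line)
    return hits

# Collected alerts stand in for an email, an RSS entry, or a script call.
alerts: List[str] = []

log = [
    "Mar  3 10:15:02 db02 kernel: end_request: I/O error, dev sda, sector 1234",
    "Mar  3 10:15:05 db02 sshd[411]: Accepted password for admin",
    "Mar  3 10:15:09 db02 kernel: end_request: I/O error, dev sda, sector 1240",
]

count = watch(log, "I/O error", lambda line: alerts.append("ALERT: " + line))
print(count)  # 2
```

In a real deployment the lines would come from a log file being followed as it grows, and the callback could launch a script with Python's subprocess.run instead of appending to a list.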

Splunk also maintains its own online database of log events. If you find a strange event in your logs and want to know what it means, you can select the event and look it up on the Web instead of searching Google. You can also submit your own events to the database, sharing what you know about a specific event with other Splunkers.

Conclusion

Splunk can really help you solve problems quickly. Although it is likely to be used mainly in data centers and on the large server farms of mobile operators and telecom groups, it can be useful on any computer, even at home.