Linux.com

Community Blogs



How to set up the EPEL repository on CentOS 5/6

What is the EPEL repository? EPEL (Extra Packages for Enterprise Linux) is a project from the Fedora group that maintains a repository of software packages not already present in RHEL/CentOS. The repository is compatible with RHEL and all close derivatives such as CentOS and Scientific Linux. Using EPEL, we can easily install many packages (around 10,000) with the yum command that are not in the CentOS repositories. EPEL packages are usually based...
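As a quick, hedged sketch (the exact method varies by release; on some systems the epel-release RPM must instead be fetched from the Fedora download servers), enabling EPEL typically looks like this:

sudo yum install epel-release
sudo yum repolist   # verify that the epel repository now appears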
Read more...
 

Aptitude: Package Management in Debian-Based Operating Systems

Apt, whether invoked as apt-get or apt-cache, is a purely command-line (CLI) utility. If you prefer a more interactive environment, Aptitude is for you. The beauty of Aptitude is that it can be used in both CLI mode and a full-screen, menu-driven mode: run without any parameters or arguments, it opens its interactive interface, while its CLI use is similar to that of the apt-get command. If you are more comfortable with a true GUI, there is an alternative to Aptitude known as Synaptic. We will limit our discussion to Aptitude, and this article will help you understand its basic use in both modes.
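As a brief sketch of common invocations:

aptitude                     # no arguments: opens the interactive, menu-driven interface
sudo aptitude update         # refresh the package lists, like apt-get update
sudo aptitude install vlc    # install a package from the CLI (vlc is just an example)
aptitude search editor       # search available packages for a keyword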

 

Read more on YourOwnLinux...

 

Setting Up a Multi-Node Hadoop Cluster with Beagle Bone Black

Learning map/reduce frameworks like Hadoop is a useful skill to have, but not everyone has the resources to implement and test a full system. Thanks to cheap ARM-based boards, it is now more feasible for developers to set up a full Hadoop cluster. I coupled my existing knowledge of setting up and running single-node Hadoop installs with my BeagleBone cluster from my previous post to create my second project. This tutorial goes through the steps I took to set up and run Hadoop on my Ubuntu cluster. It may not be practical for everyone, but distributed map/reduce experience is currently a good skill to have. All the machines in my cluster were already set up with Java and SSH from my first project, so you may need to install them if you don't have them.

Set Up

The first step, naturally, is to download Hadoop from Apache's site on each machine in the cluster and untar it. I used version 1.2.1, but version 2.0 and above is now available. I placed the resulting files in /usr/local and named the directory hadoop. With the files on the system, we can create a new user called hduser to actually run our jobs, and a group for it called hadoop:

sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser

With the user created, we will make hduser the owner of the directory containing hadoop:

sudo chown -R hduser:hadoop /usr/local/hadoop

Then we create a temp directory to hold files and make hduser the owner of it as well:

sudo mkdir -p /hadooptemp
sudo chown hduser:hadoop /hadooptemp

With the directories set up, log in as hduser. We will start by updating the .bashrc file in the home directory. We need to add export lines at the top to point to our Hadoop and Java locations and to extend the PATH. My Java installation was OpenJDK 7, but yours may be different:

# Set Hadoop-related environment variables
export HADOOP_HOME=/usr/local/hadoop
# Set JAVA_HOME (we will also configure JAVA_HOME directly for Hadoop later on)
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-armhf
# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin
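
After saving the file, the new variables can be applied to the current shell without logging out and back in:

source ~/.bashrc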

Next we can navigate to our Hadoop installation directory and locate the conf directory. Once there, we need to edit the hadoop-env.sh file and uncomment the JAVA_HOME line so it points to the location of the Java installation again:

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-armhf

Next we can update core-site.xml to point to the temp location we created above and to specify the root of the file system. Note that the host name in the default URL is the name of the master node:


<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hadooptemp</value>
    <description>Root temporary directory.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://beaglebone1:54310</value>
    <description>URI pointing to the default file system.</description>
  </property>
</configuration>

Next we can edit mapred-site.xml:

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>beaglebone1:54311</value>
    <description>The host and port that the MapReduce job tracker runs at.</description>
  </property>
</configuration>

Finally we can edit hdfs-site.xml to list how many replication nodes we want. In this case I chose all 3:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
    <description>Default block replication.</description>
  </property>
</configuration>

Once this has been done on all machines, we can edit some configuration files on the master to tell it which nodes are slaves. Start by editing the masters file to make sure it contains just our master's host name:

beaglebone1

Now edit the slaves file to list all the nodes in our cluster:

beaglebone1
beaglebone2
beaglebone3

Next we need to make sure that our master can communicate with its slaves. Make sure the hosts file on each of your nodes contains the names of all the nodes in your cluster. Now, on the master, create a key and append it to the authorized_keys file:

ssh-keygen -t rsa -P ""
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

Then copy the key to other nodes:

ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@beaglebone2

Finally we can test the connections from the master to itself and others using ssh.
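
For example, each of the following, run from the master, should log in without prompting for a password:

ssh hduser@beaglebone1
ssh hduser@beaglebone2
ssh hduser@beaglebone3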

Starting Hadoop

The first step is to format HDFS on the master. From the bin directory in hadoop run the following:

hadoop namenode -format

Once that is finished, we can start the NameNode and JobTracker daemons on the master. We simply need to execute these two commands from the bin directory:

start-dfs.sh
start-mapred.sh
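
To confirm the daemons actually started, the jps tool that ships with the JDK can be run on each node; a sketch of what to expect:

jps
# master: NameNode, SecondaryNameNode and JobTracker (plus DataNode and
# TaskTracker, since beaglebone1 is also listed as a slave)
# slaves: DataNode and TaskTracker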

To stop them later we can use:

stop-mapred.sh
stop-dfs.sh

Running an Example

With Hadoop running on the master and slaves, we can test out one of the examples. First we need to create some input files. Create some text files in a directory of your choosing with a few words in each. We can then copy the files to HDFS. From the main hadoop directory, execute the following:

bin/hadoop dfs -mkdir /user/hduser/test
bin/hadoop dfs -copyFromLocal /tmp/*.txt /user/hduser/test

The first line calls mkdir on HDFS to create the directory /user/hduser/test. The second line copies the files I created in /tmp to the new HDFS directory. Now we can run the wordcount example against them:

bin/hadoop jar hadoop-examples-1.2.1.jar wordcount /user/hduser/test /user/hduser/testout

The jar file name will vary based on which version of Hadoop you downloaded. Once the job is finished, it will write the results in HDFS to /user/hduser/testout. To view the resulting files we can do this:

bin/hadoop dfs -ls /user/hduser/testout

We can then use the cat command to show the contents of the output:

bin/hadoop dfs -cat /user/hduser/testout/part-r-00000

This file will show us each word found and the number of times it was found. If we want to see proof that the job ran on all nodes, we can view the logs on the slaves in the hadoop/logs directory. For example, on the beaglebone2 node I can do this:

cat hadoop-hduser-datanode-beaglebone2.log

When I examined the file, I could see messages at the end showing the job name and the data received and sent, letting me know that all was well.

Conclusion

If you made it through all of this and it worked, congratulations on setting up a working cluster. Due to the slow performance of the BeagleBone's SD card, it is not the best device for getting actual work done. However, these steps will apply to faster ARM devices as they come along. In the meantime, the BeagleBone Black is a great platform for practicing and learning how to set up distributed systems.

 

How To Parse Squid Proxy Access.log File using Squid Analyzer

Squid provides access.log to record all user activity that passes through it. An IT administrator can parse the file to see what is happening there. But access.log is a raw file: you really need to read it carefully to get valuable information, so a third-party tool is needed to process it into human-readable form. Read more about the SquidAnalyzer parser tool posted at Linoxide.
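Even without a dedicated parser you can get a quick overview with standard tools; for example, this one-liner (assuming the default native log format and location) lists the most requested URLs:

awk '{print $7}' /var/log/squid/access.log | sort | uniq -c | sort -rn | head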

 

Install Google Chrome 31 in CentOS/RHEL 6, Fedora 19/18

Google Chrome is a freeware web browser developed by Google. It was released as a beta version for Microsoft Windows on September 2, 2008, and as a stable public release on December 11, 2008. Chrome 31 is the latest version released by Google.

Read the article Install Google Chrome on CentOS & RHEL 6 and Fedora 19/18 to install the latest Google Chrome on your Linux PC.

 

3 Tips to Get Your Linux Project Funded With Kickstarter

Editor's Note: This is a guest blog post contributed by writer Annie Delgado on behalf of BlueFire PR. 

Linux thrives on collaboration, and a new fundraising trend is raising capital for startups with that spirit in mind. The crowdfunding revolution is in full swing and Kickstarter is the leading crowdfunding platform, so far raising $862 million to fund 51,365 projects. Technology campaigns are at the forefront of this platform. According to Kickstarter, the average successful technology project raises more than $75,000, the most of any category.

Read more...
 

Install MariaDB on CentOS 6.4

MariaDB is the community-developed fork of MySQL and a great alternative to it. It's free and open source, and is developed by the original developers of MySQL. MariaDB is much superior to MySQL in terms of features; check out the comparison between MariaDB and MySQL. And the best thing is that it's a drop-in replacement for MySQL, which means you can just install MariaDB in place of MySQL and...
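As a rough sketch, assuming the MariaDB yum repository has already been added for CentOS 6, installation looks like this:

sudo yum install MariaDB-server MariaDB-client
sudo service mysql start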

Read more...
 

Building a Compute Cluster with the BeagleBone Black

As a developer, I've always been interested in learning about and developing for new technologies. Distributed and parallel computing are two topics I'm especially interested in, which led to my interest in creating a home cluster. Home clusters are of course nothing new and can easily be built using old desktops running Linux. But constantly running desktops (and laptops) consume space, use up a decent amount of power, cost money to set up and can emit a fair amount of heat. Thankfully, there has been a recent explosion of enthusiast interest in cheap ARM-based computers, the most popular of which is the Raspberry Pi. With a small size, extremely low power consumption and great Linux support, ARM-based boards are great for developer home projects. While the Raspberry Pi is a great little package and enjoys good community support, I decided to go with an alternative, the BeagleBone Black.

Launched in 2008, the original BeagleBoard was developed by Texas Instruments as an open source computer. It featured a 720 MHz ARM Cortex-A8 chip and 256MB of memory. The BeagleBoard-xM and BeagleBone were released in subsequent years, leading to the BeagleBone Black as the most recent release. Though its $45 price tag is a little higher than a Raspberry Pi's, it has a faster 1GHz Cortex-A8 chip, 512MB of RAM and extra USB connections. In addition to 2GB of onboard eMMC storage that comes with a pre-installed Linux distribution, there is a microSD card slot allowing you to load additional versions of Linux and boot to them instead. Thanks to existing support for multiple Linux distributions such as Debian and Ubuntu, the BeagleBone Black looked to me like a great inexpensive starting point for creating my very own home server cluster.

Setting up the Cluster

For my personal cluster I decided to start small and try it out with just three machines. The list of equipment that I bought is as follows:

1x 8-port gigabit switch
3x BeagleBone Blacks
3x Ethernet cables
3x 5V 2A power supplies
3x 4GB microSD cards

To keep it simple, I decided to build a command-line cluster that I would control through my laptop or desktop. The BeagleBone Black supports HDMI output, so you can use them as standalone computers, but I figured that would not be necessary for my needs. The simplest way to get a BeagleBone Black running is to use the supplied USB cable to hook it up to an existing computer and SSH to the pre-installed OS. For my project, though, I chose to use the SD card slot and start with a fresh install. To accomplish this, I had to first load a version of Linux onto each of the three SD cards. I used my existing Ubuntu Linux machine with a USB SD card reader for this task.

Initial searches for BeagleBone-compatible distributions reveal there are a few places to download them. I decided to go with Ubuntu and found a nice pre-built image at http://rcn-ee.net/deb/rootfs/raring/. At the time I searched and downloaded, the most recent image was from August, but there are now more recent builds. Once you have untarred the file, you will see a lot of files and directories inside the newly created folder. Included is a nice utility for loading the OS onto an SD card called setup_sdcard.sh. If you aren't sure which device Linux is reading your SD card as, you can use the following to show your devices:

sudo ./setup_sdcard.sh --probe-mmc

On my machine the SD card was listed as /dev/sdb, with its main partition showing as /dev/sdb1. If you see the partition listed as I did, you need to unmount it before you can install the image. Once the card was ready, I ran the following:

sudo ./setup_sdcard.sh --mmc /dev/sdb --uboot bone

This command took care of the full install of the OS onto the SD card. Once it was finished, I repeated it for the other two SD cards. The default user name for the installed distribution is ubuntu with password temppwd. I inserted the SD cards into the BeagleBones and then connected them to the Ethernet switch.

The last step was to power them up and boot them from the microSD cards. Doing this required holding down the user boot button while connecting the 5V power connector. The button is located on a little raised section near the USB port and tells the device to read from the SD card. Once you see the lights flashing repeatedly, you can release the button. Since each instance will have the same default hostname when it first boots, it is advisable to power them on one at a time and follow the steps below to set the IP and hostname before powering up the next one.

Configuring the BeagleBones

Once the hardware is set up and a machine is connected to the network, PuTTY or any other SSH client can be used to connect to it. The default hostname to connect to, using the above image, is ubuntu-armhf. My first task was to change the hostname. I chose to name mine beaglebone1, beaglebone2 and beaglebone3. First I used the hostname command:

sudo hostname beaglebone1

Next I edited /etc/hostname and placed the new hostname in the file. The next step was to hard-code the IP address so that I could reliably map it in the hosts file. I did this by editing /etc/network/interfaces to use static IPs. In my case I have a local network with a router at 192.168.1.1. I decided to start the IP addresses at 192.168.1.51, so the file on the first node looked like this:

    iface eth0 inet static
       address 192.168.1.51
       netmask 255.255.255.0
       network 192.168.1.0
       broadcast 192.168.1.255
       gateway 192.168.1.1

It is usually a good idea to pick addresses outside the range your router might assign if you are going to have a lot of devices; usually you can configure this range on your router. With that done, the final step was to edit /etc/hosts and list the name and IP address of each node that would be in the cluster. My file ended up looking like this on each of them:

127.0.0.1     localhost
192.168.1.51  beaglebone1
192.168.1.52  beaglebone2
192.168.1.53  beaglebone3

Creating a Compute Cluster With MPI

After setting up all 3 BeagleBones, I was ready to tackle my first compute project. I figured a good starting point was to set up MPI. MPI is a standardized system for passing messages between machines on a network. It is powerful in that it distributes a program across the nodes, so each instance has access to the local memory of its machine, and it is supported by several languages such as C, Python and Java. There are many implementations of MPI available, so I chose MPICH, which I was already familiar with. Installation was simple, consisting of the following three steps:

sudo apt-get update
sudo apt-get install gcc
sudo apt-get install libcr-dev mpich2 mpich2-doc

MPI works by using SSH to communicate between nodes and a shared folder to share data. The first step in allowing this was to install NFS. I picked beaglebone1 to act as the master node in the MPI cluster and installed the NFS server package on it:

sudo apt-get install nfs-kernel-server

With this done, I installed the client package on the other two nodes:

sudo apt-get install nfs-common

Next I created a user and folder on each node that would be used by MPI. I decided to call mine hpcuser and started with its folder:

sudo mkdir /hpcuser

Once it was created on all the nodes, I exported the folder by adding an entry to /etc/exports on the master node:

echo "/hpcuser *(rw,sync)" | sudo tee -a /etc/exports

Then I mounted the master's folder on each slave so they can see any files that are added on the master node:

sudo mount beaglebone1:/hpcuser /hpcuser

To make sure this is mounted on reboots I edited /etc/fstab and added the following:

beaglebone1:/hpcuser    /hpcuser    nfs    defaults    0 0

Finally I created the hpcuser account and assigned it the shared folder as its home directory:

sudo useradd -d /hpcuser hpcuser
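
One detail worth checking: since /hpcuser was created by root, hpcuser may not be able to write to it. If not, ownership can be granted the same way as before:

sudo chown hpcuser /hpcuser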

With network sharing set up across the machines, I installed SSH on all of them so that MPI could communicate with each one:

sudo apt-get install openssh-server

The next step was to generate a key to use for the SSH communication. First I switched to the hpcuser and then used ssh-keygen to create the key.

su - hpcuser
ssh-keygen -t rsa

When performing this step, for simplicity you can keep the passphrase blank and accept the default location. If you do want a passphrase, you will need to take extra steps to prevent SSH from prompting you for it; ssh-agent can store the key and prevent this. Once the key is generated, you simply append it to your authorized_keys file:

cd .ssh
cat id_rsa.pub >> authorized_keys

I then verified that the connections worked using ssh:

ssh hpcuser@beaglebone2

Testing MPI

Once the machines were able to successfully connect to each other, I wrote a simple program on the master node to try things out. While logged in as hpcuser, I created a simple program in its home directory, /hpcuser, called mpi1.c. MPI needs the program to exist in the shared folder so it can run on each machine. The program below simply displays the index (rank) of the current process, the total number of processes running, and the hostname of the current process. Finally, the main node receives the sum of all the process indexes from the other nodes and displays it:

#include <mpi.h>
#include <stdio.h>
#include <unistd.h> /* for gethostname() */
int main(int argc, char* argv[])
{
    int rank, size, total;
    char hostname[1024];
    gethostname(hostname, 1023);
    MPI_Init(&argc, &argv);
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);
    MPI_Comm_size (MPI_COMM_WORLD, &size);
    MPI_Reduce(&rank, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    printf("Testing MPI index %d of %d on hostname %s\n", rank, size, hostname);
    if (rank==0)
    {
        printf("Process sum is %d\n", total);
    }
    MPI_Finalize();
    return 0;
}

Next I created a file called machines.txt in the same directory and placed the names of the nodes in the cluster inside, one per line. This file tells MPI where it should run:

beaglebone1
beaglebone2
beaglebone3

With both files created, I finally compiled the program using mpicc and ran the test:

mpicc mpi1.c -o mpiprogram
mpiexec -n 8 -f machines.txt ./mpiprogram

This resulted in the following output demonstrating it ran on all 3 nodes:

Testing MPI index 4 of 8 on hostname beaglebone2
Testing MPI index 7 of 8 on hostname beaglebone2
Testing MPI index 5 of 8 on hostname beaglebone3
Testing MPI index 6 of 8 on hostname beaglebone1
Testing MPI index 1 of 8 on hostname beaglebone2
Testing MPI index 3 of 8 on hostname beaglebone1
Testing MPI index 2 of 8 on hostname beaglebone3
Testing MPI index 0 of 8 on hostname beaglebone1
Process sum is 28

Additional Projects

While MPI is a fairly straightforward starting point, a lot of people are more familiar with Hadoop. To test Hadoop compatibility, I downloaded version 1.2 of Hadoop (hadoop-1.2.1.tar.gz) from the Hadoop downloads at http://www.apache.org/dyn/closer.cgi/hadoop/common. After following the basic setup steps, I was able to get it running simple jobs on all nodes. Hadoop, however, exposes a major limitation of the BeagleBone: the speed of SD cards. As a result, using HDFS for jobs is especially slow, so you may have mixed luck running anything that is disk-IO heavy.

Another great use of the BeagleBones is as web servers and code repositories. It is very easy to install Git and Apache or Node.js, the same as you would on other Ubuntu servers. Additionally, you can install Jenkins or Hudson to create your own personal build server. Finally, you can utilize all the hookups of the BeagleBone and install XBMC to turn a BeagleBone into a full media server.
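
For instance, Git and Apache install exactly as they would on any other Ubuntu machine:

sudo apt-get install git apache2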

The Future

In addition to single-core boards such as the BeagleBone or Raspberry Pi, dual-core boards are starting to appear, such as the PandaBoard and Cubieboard, with likely more on the way. The latter is priced only a little higher than the BeagleBone, supports connecting a 2.5-inch SATA hard disk, and features a dual-core chip in its latest version. Steps similar to those performed here can be used to set them up, giving hobbyists like me some really good options for home server building. I encourage anyone with the time to try them out and see what you can create.

 

 

Linux Shell Script To Monitor Space Usage and Send Email

A Linux shell script to check the space used in /var and send an email if usage reaches 80%. It also prints the space usage of each directory inside /var, which is useful for finding out which folders use the most space under /var. This script really helps system administrators monitor their servers' space usage. Based on their requirements, administrators can change the directories they want to monitor.

 

#!/bin/bash

# Maximum allowed disk usage, in percent
LIMIT='80'

# Directory to monitor
DIR='/var'

# Email address to notify; set this to your own address
MAILTO='admin@example.com'

# Subject of the email
SUBJECT="$DIR disk usage"

# Command that will send the email
MAILX='mailx'

# Check whether the mailx command exists
which $MAILX > /dev/null 2>&1

# If the exit status of the previous command is not 0,
# mailx is not installed on this system
if ! [ $? -eq 0 ]
then
        echo "Please install $MAILX"
        exit 1
fi

# Navigate to the directory so df reports on the right partition
cd $DIR

# Get the used space (in %) of the partition we are currently on,
# using df, then cut the % sign from the value
USED=`df . | awk '{print $5}' | sed -ne 2p | cut -d"%" -f1`

# If used space is bigger than LIMIT, print the space used by each
# directory inside $DIR and have mailx send it with SUBJECT to MAILTO
if [ $USED -gt $LIMIT ]
then
        du -sh ${DIR}/* | $MAILX -s "$SUBJECT" "$MAILTO"
fi

Sample Output

./check_var.sh

37M     /var/cache
32K     /var/db
8.0K    /var/empty
4.0K    /var/games
70M     /var/lib
4.0K    /var/local
8.0K    /var/lock
38M     /var/log
0       /var/mail
4.0K    /var/nis
4.0K    /var/opt
4.0K    /var/preserve
88K     /var/run
220K    /var/spool
37M     /var/tmp
24M     /var/www
4.0K    /var/yp
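
To run the check automatically, the script can be scheduled with cron; a sketch, assuming it was saved as /usr/local/bin/check_var.sh (an example path):

# m h dom mon dow  command
0 * * * * /usr/local/bin/check_var.sh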

Read more Linux shell scripts

 

 

Eztables: simple yet powerful firewall configuration for Linux

Anyone who ever needs to set up a firewall on Linux may be interested in Eztables.

It doesn't matter if you need to protect a laptop or a server, or want to set up a network firewall: Eztables supports it all.

If you're not afraid to touch the command line and edit a text file, you may be quite pleased with Eztables. 

Some features:

  • Basic input / output filtering
  • Network address translation (NAT)
  • Port address translation (PAT)
  • Support for VLANs
  • Working with Groups / Objects to aggregate hosts and services
  • Logging to syslog
  • Support for plugins
  • Automatically detects all network interfaces

 

 

How to Install TeamViewer 9 on Linux

TeamViewer is a very useful app for connecting to remote systems with a graphical environment in a few easy steps. Until now most of its users ran it on Windows systems, but as desktop users switch to Linux distributions, they will require TeamViewer on the Linux desktop as well.

The article How to Install TeamViewer 9 on Linux Distributions provides easy steps to install it.
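
On a Debian-based distribution the steps are roughly as follows (treat the URL and package name as an illustration; they may change):

wget http://download.teamviewer.com/download/teamviewer_linux.deb
sudo dpkg -i teamviewer_linux.deb
sudo apt-get -f install   # resolve any missing dependencies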

 
