April 17, 2009

Is it possible to make a bulletproof cluster?

Is it possible to create a cluster in which any one computer can go down and the cluster will continue to operate? From what I understand about clusters, there is one master computer and the rest are slaves. If a slave goes down it really doesn't matter, but if the master goes down the cluster is toast. I need the ultimate reliability and I don't want to spend tons of money.

Does anyone know if this is possible?

This is a good question, but you may be missing some background.
You need to specify WHICH KIND of cluster you want to build.
A short answer like this can only point you toward some hints; please see http://en.wikipedia.org/wiki/Computer_cluster and work out which kind of cluster you need.
I'll give an example from my own production setup, which I think is close to your environment:
I have different services running on my company's HA (High Availability) cluster. Let's take the simplest one: SSH.
With an HA cluster (probably the kind you need) you have a domain controller (the master) and one or more "clients". When a client goes down nobody cares; the master sends you an alert about it (based on your notification settings). When the master goes down, the clients notice and hold "an election", so another node becomes your new master. The cluster isn't toast at all.
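To make the "election" idea concrete, here is a minimal Python sketch. It is only a toy model, not how Heartbeat itself chooses a node: the `Node` class, the `priority` field, and the node names are all made up for illustration, and I'm assuming the simplest possible rule (the live node with the highest priority wins).

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    priority: int       # hypothetical tie-breaker: higher value wins
    alive: bool = True


def elect_master(nodes):
    """Return the live node with the highest priority, or None if all are down."""
    candidates = [n for n in nodes if n.alive]
    if not candidates:
        return None
    # Break priority ties by name so every node reaches the same result.
    return max(candidates, key=lambda n: (n.priority, n.name))


nodes = [Node("node-a", 3), Node("node-b", 2), Node("node-c", 1)]
print(elect_master(nodes).name)   # node-a is master while it is alive

nodes[0].alive = False            # the master dies
print(elect_master(nodes).name)   # node-b takes over
```

The point is just that no node is special: any surviving machine can end up as master, so losing one box does not kill the cluster.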
I don't care which distro you prefer; if you want to build an HA cluster (I think it's the one you need), you need at least these two services:
heartbeat and drbd

Heartbeat, as the name suggests, lets the machines check on each other (clients and master monitor one another). I suggest buying a second NIC and running heartbeat, as well as drbd, over it.
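The checking itself boils down to "have I heard from that peer recently?". Below is a rough Python sketch of that idea, assuming a simple timeout rule. The `HeartbeatMonitor` class and the 3-second timeout are my own illustration, not Heartbeat's real code or configuration; the clock is injectable only so the example can run without waiting.

```python
import time


class HeartbeatMonitor:
    """Track when each peer was last heard from and flag the silent ones."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_seen = {}

    def beat(self, peer):
        # Called whenever a heartbeat packet arrives from `peer`.
        self.last_seen[peer] = self.clock()

    def dead_peers(self):
        # A peer is considered dead once it has been silent longer than the timeout.
        now = self.clock()
        return sorted(p for p, t in self.last_seen.items()
                      if now - t > self.timeout_s)


fake_now = [0.0]                          # fake clock so the demo is instant
monitor = HeartbeatMonitor(timeout_s=3.0, clock=lambda: fake_now[0])
monitor.beat("master")
monitor.beat("client1")

fake_now[0] = 2.0
monitor.beat("client1")                   # client1 keeps beating, master goes quiet

fake_now[0] = 4.0
print(monitor.dead_peers())               # ['master']
```

This is also why the dedicated NIC matters: if heartbeats share a congested link, late packets can look exactly like a dead node.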

DRBD is a must-have shared (replicated) disk. You have one or more partitions on each machine holding the data you want to keep available (in my case even virtual machines live on it!). The disk is managed by the master only; clients receive replication data over the network (use a second NIC, or the same one used by heartbeat!), so the data is well mirrored. When your master crashes (burn it, leave it unpowered, whatever you want...), another node takes over as master and mounts your data (home dirs, databases, files, whatever).
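If it helps, here is the same idea as a toy Python model. Real DRBD mirrors a block device inside the kernel; this sketch only imitates the behaviour described above (one writer, every node holds a full copy, a survivor is promoted after a crash). The `ReplicatedVolume` class, its methods, and the node names are hypothetical.

```python
class ReplicatedVolume:
    """Toy primary/secondary mirror: one writer, every node keeps a full copy."""

    def __init__(self, node_names):
        self.copies = {name: {} for name in node_names}
        self.primary = node_names[0]

    def write(self, node, key, value):
        # Only the current primary (the master) may touch the disk.
        if node != self.primary:
            raise PermissionError(f"{node} is not the primary")
        # Every write is mirrored to all nodes over the network.
        for copy in self.copies.values():
            copy[key] = value

    def read(self, node, key):
        if node != self.primary:
            raise PermissionError(f"{node} is not the primary")
        return self.copies[node][key]

    def promote(self, new_primary, failed):
        # Drop the dead node's copy and let a survivor take over the data.
        self.copies.pop(failed, None)
        if new_primary not in self.copies:
            raise KeyError(new_primary)
        self.primary = new_primary


volume = ReplicatedVolume(["master", "client1"])
volume.write("master", "/home/alice/notes.txt", "hello")

volume.promote("client1", failed="master")     # the master burns down
print(volume.read("client1", "/home/alice/notes.txt"))   # hello
```

Because the write was mirrored before the crash, the promoted node already has the data locally and can serve it straight away.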

I've built several HA clusters on no budget with "non-corporate" distros (Slackware, Gentoo, Debian). With RHEL or SUSE it's quite easy to set this up, but they're not required unless you need some sort of certification from software vendors (read: Oracle and others).
Three quick tips:
1- Check out the Wikipedia link above and learn from it which kind of cluster you need
2- If your cluster is HA, then you have two magic services to learn: heartbeat and drbd. Every common distro has them, and the HA services have quite good documentation and case studies. http://www.linux-ha.org/ is THE SITE, check everything on it
3- Mail me if you need help. I have some experience with this, and I can even publish an article/blog post if you request something

Hope it helps

These days (as said before) many different clustering facilities are available for Linux. They even have dynamic mastering, to keep you going through a lot of bad situations.

In addition to the above, there is also the OCFS2 filesystem (and its clustering layer), which provides a fast, highly reliable clustered filesystem for Linux (it is also in the mainline kernel). See http://oss.oracle.com/projects/ocfs2/

Moreover, Oracle has "Oracle Clusterware" (http://www.oracle.com/technology/products/database/clusterware/index.html), with which you can cluster any application (including but not limited to Oracle products) for enterprise-level tasks. The product is also bundled with Unbreakable Linux Support (http://www.oracle.com/technologies/linux/index.html)


Take a look at this:

First sorry for the spelling I am dislex...

If we remember the history of the Internet and TCP/IP, we will remember that part of its development came from a US military project (ARPANET) to build such a system. The idea was that if any center of operations was destroyed or cut off from the rest, IT and communications would still continue.

So from a network point of view the answer is YES. The real question is: what software apart from TCP/IP is needed?

There is plenty of free open source cluster software for GNU/Linux. It's also included in enterprise distros like Red Hat Enterprise Linux. A common setup is several nodes plus shared storage that is accessible to all of them. An application runs as a service on one node and can be moved to another. When a node fails, the cluster software automatically takes over the service on another node.

I don't know of any cluster software for GNU/Linux with master and slave nodes, because the current cluster state and configuration are also stored on the shared storage. If no shared storage is available, you can use data replication between the nodes instead.

For more information see:




