September 20, 2016

GlusterFS: Kernel Out of Memory

I'm running GlusterFS 3.7.6 on 3 nodes (3 CentOS servers), which serve as Apache web servers. Lately I keep getting errors like:

kernel: Out of memory: Kill process 30883 (glusterfs) score 99 or sacrifice child

... on all of them. It now happens about once a week.
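To pin down how often this really happens and which process gets killed each time, a rough sketch like the following could count the OOM kills per process name in the system log. It is only an illustration: the regular expression is modeled on the single kernel message quoted above, the function names are made up for this example, and the log path in the usage note is an assumption (the usual CentOS default), not something I have verified on these nodes.

```python
import re

# Pattern modeled on the kernel OOM-killer message quoted above.
# Other kernel versions may word this message differently.
OOM_RE = re.compile(
    r"Out of memory: Kill process (?P<pid>\d+) "
    r"\((?P<name>[^)]+)\) score (?P<score>\d+)"
)


def parse_oom_line(line):
    """Return (pid, process name, score) for an OOM-kill line, else None."""
    match = OOM_RE.search(line)
    if match is None:
        return None
    return int(match.group("pid")), match.group("name"), int(match.group("score"))


def count_oom_kills(lines):
    """Count OOM kills per process name across an iterable of log lines."""
    counts = {}
    for line in lines:
        parsed = parse_oom_line(line)
        if parsed is not None:
            name = parsed[1]
            counts[name] = counts.get(name, 0) + 1
    return counts
```

On each node this could be run as `count_oom_kills(open("/var/log/messages"))`, assuming that is where kernel messages are logged.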

Whenever it happens, CPU usage jumps to 100% (according to the monitors), and very soon afterwards one of the 3 nodes goes offline and the cluster seems to stall. Then, of course, the whole web farm goes down.

But based on the Apache logs and the load balancer (LB) logs, web traffic is still quite normal. There is no sudden peak in traffic that could trigger this panic. Also, my servers are more than big enough to handle the total traffic so far. (What I'm trying to say is that Apache panicking under high load is very unlikely to be the cause.)

Rather, it seems that something is wrong in GlusterFS.
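One way to test that suspicion would be to record the resident memory of the Gluster processes over time and see whether it grows steadily between the weekly crashes while traffic stays flat. Below is a minimal sketch, assuming Linux with `/proc` mounted and Python 3. The function names are hypothetical, and matching on process names that start with "gluster" is my own assumption for catching the client, brick, and management daemons.

```python
import os
import time


def parse_status(text):
    """Extract (name, rss_kb) from the text of /proc/<pid>/status.

    rss_kb is None when there is no VmRSS line (e.g. kernel threads).
    """
    name, rss_kb = None, None
    for line in text.splitlines():
        if line.startswith("Name:"):
            name = line.split(":", 1)[1].strip()
        elif line.startswith("VmRSS:"):
            rss_kb = int(line.split()[1])  # value is reported in kB
    return name, rss_kb


def gluster_rss(proc_root="/proc"):
    """Return {pid: (name, rss_kb)} for processes named 'gluster*'."""
    result = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                name, rss_kb = parse_status(fh.read())
        except OSError:
            continue  # process exited between listdir and open
        if name and name.startswith("gluster") and rss_kb is not None:
            result[int(entry)] = (name, rss_kb)
    return result


def log_rss(interval_seconds=60):
    """Print a timestamped RSS snapshot forever; stop with Ctrl-C."""
    while True:
        print(time.strftime("%Y-%m-%d %H:%M:%S"), gluster_rss())
        time.sleep(interval_seconds)
```

Running `log_rss()` on each node and redirecting the output to a file would show whether a `glusterfs` process keeps growing in the days before the kernel kills it.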

Please help by suggesting what the cause might be and where I should look.

Thanks, everyone.

