kernel.org: Setting the record straight

7

Author: JT Smith

H. Peter Anvin of Transmeta commented about the recent kernel.org outage in this post to the LKML: “It has come to my attention there are some unfounded rumours about
the
kernel.org outage from this past weekend, and I wanted to set the
record straight:”

In particular, the outage was *not* caused by any kind of failure in
the new Compaq server hardware.  It worries me a bit that this
particular rumour was circulating, since it reflects badly on a donor
who has just provided us with a very nice machine.

The approximate history of the failure is as such:

Back in December, we were already planning to replace the old server
hardware after repeated problems, probably age-related.  We were
originally planning to put the new server in production after the
holidays, however, when the old server really started to flake out on
us we decided to push it in service early.

ISC was very accomodating and arranged for us to put it in service on
short notice, despite several logistics problem.  One of those
problems was the lack of a configured port for the management card in
the new server.  As a result, it did not get wired up at that time.

This past Friday morning, the kernel on the server apparently stopped
servicing user-space processes; the details aren't known, other than
the fact that pings and TCP SYNs received replies, but we couldn't
get
any actual data across.  Futhermore, on and off there was as much as
95% packet loss in pings, although the machine didn't stop responding
to pings until it was power cycled on Sunday.

Due to a miscommunication between myself and the staff at ISC we
weren't able to get the machine power cycled until Sunday (it did not
return from the power cycle) and the management port connected until
Monday.  Once the management port got connected, it was a 5-minute
job
to bring the machine back to life.

Finally, I would like to thank the many people and organizations who
have offered to host another kernel.org server in one way or another.
I'm going to be evaluating our options, with the goal of getting at
least one additional server if at all possible.  If so, my preference
will definitely be to try to obtain identical hardware with the one
we
currently have.

          -=hpa

-- 
 at work,  in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt    

Category:

  • Linux