Author: JT Smith
– Updated at 2:40 p.m. eastern time June 25 –
On Saturday, June 23, the primary controller in the router that controls access to all OSDN servers hosted at the Exodus facility in Waltham, MA, suffered a catastrophic failure. The affected sites included Slashdot, freshmeat, NewsForge, and Mediabuilder, among others.

Update: The secondary controller did not automatically take over as it should have. OSDN network admins are trying to determine whether the outage was related to their configurations or to a problem at Web hosting service Exodus.
OSDN and Cisco staff, working through Saturday night, were unable to fix the problem. On Sunday afternoon, OSDN employee Kurt Gray and Cisco representative Scott, working by telephone, stepped through the router's configuration. As they worked to undo other changes that had been made, Kurt says, "on one reset everything came back."
OSDN network operations were already in the process of rebuilding the company’s network to eliminate the router as a potential single point of failure.
As of 7 p.m. US EDT, most of the sites were available at least part of the time, but full service had not yet been restored. There may still be slowdowns or intermittent failures until a permanent fix is made.
We’ll have a more complete story within a few days. Right now, OSDN network operations staff members are too busy working to talk.