On Monday, 22 Aug 2011, Carleton lost all internet connectivity, via all providers, including Internet2, for a period of approximately five and a half hours from around 5PM until near 10:30PM. This was caused by a hardware failure in our border router.
All service has been restored.
Here are more details:
Shortly after 5PM on Monday, ITS became aware that internet access had suddenly stopped. Upon investigation, we found that our border router (a Cisco 7206VXR) had failed and was attempting to reboot.
The router was unable to reboot successfully, so troubleshooting began at approximately 5:20PM. Chris Dlugosz, Carleton’s Network Manager, opened a case with Cisco TAC (Technical Assistance Center) around 5:40PM, and by 6:05PM, TAC had diagnosed a hardware failure in the router’s CPU. We maintain 24x7x4-hour service on this device, so TAC informed Chris that a replacement part would be arriving in Northfield around 9PM.
The replacement parts arrived as scheduled, and with the assistance of TAC, Chris attempted to salvage any meaningful crash or configuration data from the failed router components. After an unsuccessful attempt to retrieve the last known configuration from the failed device, Chris moved the memory and solid-state disks into the new hardware and brought the new device online using an older configuration backup. The router began passing internet traffic around 10:30PM.