PLoS Journal Outage on April 9
A number of factors contributed to today’s long outage. It was caused by the sabotage of fiber-optic cable lines in San Jose, which disrupted network traffic going to United Layer, our co-location facility. United Layer is supposed to have a redundant network line for failover in case something like this happens. I don’t know the details, but this redundant line wasn’t working. Their engineers finally rerouted their customers’ traffic around the San Jose disruption at 1:43pm PST.
During the outage, we were able to redirect journal traffic to the everyONE Blog, which Liz updated throughout the morning. We also launched an Amazon EC2 instance and were (literally) minutes away from having the sites running on EC2, albeit with a snapshot of production data from March 17.
United Layer will be held accountable for their part in the outage. We’ll also look to improve our disaster recovery plans to limit the downtime caused by future “catastrophic” outages.
A comment about the outage today left at sfgate.com: “The more complicated they make the plumbing, the easier it is to plug up the pipes.” – Lt. Com. Montgomery Scott