A botched software update to networking gear caused one of GitHub’s all-time worst outages last weekend, the second major disruption that customers of the popular social coding platform have suffered through in the past several weeks.
In a blog post, GitHub’s Mark Imbriaco explained that the December 22 outage came during a software update to its aggregation switches that was recommended by GitHub’s unnamed network vendor. The update was supposed to fix the problem that led to the company’s previous major outage in late November.
The problem was exacerbated by the subsequent impact on the fileservers, which, combined with the other problems, forced the company to put the service into maintenance mode. Service disruptions lasted about 19 hours. GitHub was not down the entire time, but latency and loss of access persisted over an extended period.
Data center networks can have a variety of switches…