Telenor released details yesterday about the causes of its major outage last Friday/Saturday. It seems that a server software upgrade and restart triggered a signalling storm between network servers that got so out of hand it knocked out voice and SMS services – Telenor’s customers were unable to call or send SMS for up to 18 hours.
Telenor is clearly baffled about how this happened – specifically how a signalling storm caused by a “mobile broadband server” could have a knock-on effect of taking out the voice and SMS networks.
The company said:
“When looking for faults we had indications that the controlled restart of the server for mobile broadband could have triggered the problems, but it was not clear how this could affect the voice traffic. It is not normal that a restart creates such heavy signalling traffic between network servers, which we experienced Friday. The signalling traffic also created disturbances in several network units and it therefore became difficult and time consuming to localise the cause of the breakdown and have it corrected.”
It seems that as the technical team went through its fault management processes, it found one voice server that was particularly overloaded. That server was reconfigured and then the situation “gradually improved”. However, it wasn’t until all the voice servers in the network were disconnected and reconfigured that things got back to normal.
Telenor now believes that the signalling traffic increased far more than normal after the reconfiguration because the signalling level in the network was already high. It says it has already implemented a number of measures to protect against similar events in the future. “Primarily, we have increased the capacity in the network. In addition, we have established protective mechanisms that come into force if abnormal increases in signalling traffic occur. Along with several other measures, this has reduced the likelihood that we will be facing the same kind of situation that we experienced on June 10,” said Ragnar Kårhus, CEO. Telenor will also conduct a complete analysis of the design of the mobile network and have the network structure evaluated by an external team.
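Telenor hasn't said what its "protective mechanisms" actually look like, so the following is only a generic illustration of the idea, not a description of its network. One common way to stop an abnormal surge from swamping a server is a token-bucket throttle: messages are admitted at a sustained rate with a small burst allowance, and anything beyond that is rejected early instead of piling up. The class name, parameters and numbers below are all hypothetical.

```python
class SignallingThrottle:
    """Token-bucket admission control (illustrative sketch only).

    rate_per_s and burst are made-up tuning parameters, not values
    from Telenor or any vendor.
    """

    def __init__(self, rate_per_s: float, burst: float) -> None:
        self.rate_per_s = rate_per_s  # sustained messages admitted per second
        self.burst = burst            # maximum tokens the bucket can hold
        self.tokens = burst           # start with a full bucket
        self.last_ts = 0.0            # timestamp of the previous call

    def admit(self, now: float) -> bool:
        """Return True if a message arriving at time `now` is admitted."""
        # Refill in proportion to elapsed time, capped at the burst size.
        elapsed = max(0.0, now - self.last_ts)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate_per_s)
        self.last_ts = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # shed the message rather than queue it


# A storm of 100 messages in the same instant: only the burst gets through.
throttle = SignallingThrottle(rate_per_s=10.0, burst=5.0)
admitted_in_storm = sum(throttle.admit(0.0) for _ in range(100))

# One second later the bucket has refilled, so normal traffic flows again.
admitted_after_pause = throttle.admit(1.0)
```

The point of shedding load at the edge is that a rejected message costs far less than one that is accepted, queued and then timed out, which tends to trigger retries and feed the storm.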
Whatever the cause, it’s a reminder that capacity isn’t just about access and backhaul. An overloaded signalling network will kill services as surely as an overloaded cell or transmission link.
By the way, Norwegian tech firm Simula has made an animation of the outage, so you can watch Telenor’s servers turn red as it happened (sort of).