Whatever happened to redundancies? Without knowing more, I'm just speculating, but (1) if this was a hardware failure, there should be redundancies to take over (extending as far as a complete secondary control centre that can take the load if the primary control centre goes offline), and (2) if this was a software failure, the problem should have been caught in testing.

I don't know what happened, but as a general rule it's impossible to test software for every single thing that could ever go wrong, and there is no such thing as a bug-free program. There will always be something out of the blue that no one ever thought of. Programmers have a saying: "Murphy was an optimist". Having said that, perhaps an unknown error could have been handled better.
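To illustrate the redundancy point, here is a minimal sketch of priority-ordered failover. It assumes the primary and secondary control centres can be modelled as simple callables; `run_with_failover`, `primary_control_centre` and `secondary_control_centre` are hypothetical names for illustration, not anything from the actual Metro system.

```python
from typing import Callable, Sequence


class AllControllersFailed(Exception):
    """Raised when every redundant controller has failed."""


def run_with_failover(controllers: Sequence[Callable[[], str]]) -> str:
    """Try each controller in priority order and return the first success."""
    errors = []
    for controller in controllers:
        try:
            return controller()
        except Exception as exc:  # assumption: any fault triggers failover
            errors.append(exc)
    # Only reached if the primary AND every standby failed.
    raise AllControllersFailed(errors)


def primary_control_centre() -> str:
    # Hypothetical primary that has suffered a hardware fault.
    raise RuntimeError("primary control centre offline")


def secondary_control_centre() -> str:
    # Hypothetical standby that takes over the load.
    return "secondary control centre in charge"


if __name__ == "__main__":
    print(run_with_failover([primary_control_centre, secondary_control_centre]))
```

The network only goes down entirely if every entry in the list fails, which is the "or they all failed as well" case.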
The computer failure has shut down the entire network, according to the map on the Metro website.
Of course, I totally agree: it's impossible to check for every single possibility. However, software is supposed to be designed to degrade or fail "gracefully". If you don't implement proper exception handling, your program can crash completely (possibly corrupting useful data in the process) when something unexpected happens.
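A rough sketch of what "degrading gracefully" can look like with exception handling. It assumes a hypothetical position report in the form `train_id,block`; the report format, `process_report` and the `SAFE_MODE` state are all made up for illustration. Expected bad input keeps the last known good state, and anything truly unexpected is logged and switches to a safe mode instead of crashing the whole program.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("signalling")

# Hypothetical fail-safe state used when something unknown goes wrong.
SAFE_MODE = {"status": "degraded", "trains": "hold at platforms"}


def parse_position_report(raw: str) -> dict:
    """Parse a hypothetical 'train_id,block' report; raises ValueError if malformed."""
    train_id, block = raw.split(",")
    return {"train": train_id.strip(), "block": int(block)}


def process_report(raw: str, last_good: dict) -> dict:
    """Return the new state, degrading instead of crashing on errors."""
    try:
        return {"status": "ok", **parse_position_report(raw)}
    except ValueError as exc:
        # Anticipated failure (bad input): keep the last known good state.
        log.warning("Malformed report %r (%s); keeping last good state", raw, exc)
        return last_good
    except Exception:
        # Unknown error: record the traceback and fail safe.
        log.exception("Unexpected error; entering safe mode")
        return dict(SAFE_MODE)
```

For example, `process_report("garbage", state)` returns the previous `state` unchanged, while passing something that isn't a string at all lands in the catch-all branch and returns the safe-mode state.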
The fact that this computer failure brought down the whole network suggests either that there were no redundancies (or they all failed as well) or that the system was not designed to recover quickly when something does cause a catastrophic error.

Edit: One more point: testing should cover the common scenarios in which a catastrophic failure can occur, and check the system's ability to handle and recover from them.
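As a sketch of testing recovery from a catastrophic failure, the test below injects a simulated crash and checks that the system comes back with its last good state. `ControlSystem`, its checkpointing and its `crash`/`recover` methods are hypothetical stand-ins, assumed purely for illustration; a real system would persist checkpoints outside the process.

```python
import unittest


class ControlSystem:
    """Hypothetical controller that checkpoints state so it can recover after a crash."""

    def __init__(self) -> None:
        self.state = {"trains_running": 0}
        self._checkpoint = dict(self.state)
        self.running = True

    def update(self, trains_running: int) -> None:
        self.state["trains_running"] = trains_running
        self._checkpoint = dict(self.state)  # save last known good state

    def crash(self) -> None:
        # Simulated catastrophic failure: in-memory state is lost.
        self.state = {}
        self.running = False

    def recover(self) -> None:
        # Restart and restore from the last checkpoint.
        self.state = dict(self._checkpoint)
        self.running = True


class CatastrophicFailureTest(unittest.TestCase):
    def test_recovers_last_good_state_after_crash(self) -> None:
        system = ControlSystem()
        system.update(trains_running=12)
        system.crash()  # inject the failure scenario
        self.assertFalse(system.running)
        system.recover()
        self.assertTrue(system.running)
        self.assertEqual(system.state, {"trains_running": 12})
```

Run it with `python -m unittest`. The idea is to deliberately trigger each failure mode you can anticipate and assert on recovery, not just on normal operation.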