We had some major issues with the production web server that runs Railpage yesterday. Halfway through the update process to bring it in line with our development platform, it suffered a catastrophic failure of the boot partition, which turned out to be a known issue with this version of Ubuntu. This meant waiting while a recovery image was uploaded to the datacentre, which took a loooong time. Eventually it got there. It's yet another issue we've had with Ubuntu and OS upgrades, so the decision has now been made to look at alternative platforms.
We've since discovered some more faults with the production web server, including one where lsof runs rampant and gobbles up all our CPU time. We have plans to replace this web server, but as we're running at capacity VM-wise, that involves a trip to Sydney with a new ESXi box.
It's very disappointing to see code that was so thoroughly tested on one web server, and that operated for months without failure, rendered useless because of faults with another web server.