V/Line Communications System - Complete Failure

 
  8502 Assistant Commissioner

The entire communications system for V/Line has failed leaving trains stranded and services across the network stopped.

Due to a communications fault, all V/Line train services will be held in place until further notice. As a result, customers can expect significant delays of up to 90 minutes. We apologise for the delay in your journey. [10:50 18/05]
Vline

Total system wide meltdown

  The Vinelander Minister for Railways

Location: Ballan, Victoria on the Ballarat Line
The entire communications system for V/Line has failed leaving trains stranded and services across the network stopped.

Due to a communications fault, all V/Line train services will be held in place until further notice. As a result, customers can expect significant delays of up to 90 minutes. We apologise for the delay in your journey. [10:50 18/05]

Total system wide meltdown
8502

And the reason this isn't posted here is...

https://www.railpage.com.au/f-t11398025.htm

M.
  8502 Assistant Commissioner



Total system wide meltdown
And the reason this isn't posted here is...

https://www.railpage.com.au/f-t11398025.htm

M.
The Vinelander

Hi Mike, I can't find a reference to today in that thread.
  bevans Site Admin

Location: Melbourne, Australia
Trains have been stuck at Sunshine for some time.



Imagine if you were on a metro service. Can they get through the metro area?
  freightgate Minister for Railways

Location: Albury, New South Wales
Is this a power failure or a systems failure at Centrol?
  The Vinelander Minister for Railways

Location: Ballan, Victoria on the Ballarat Line


Total system wide meltdown
And the reason this isn't posted here is...

https://www.railpage.com.au/f-t11398025.htm

M.
Hi Mike, I can't find a reference to today in that thread.
8502

That's where a disruption is meant to go.

I didn't bother posting in there...but, for future reference that's where these interruptions and cancellations are meant to be posted.

Trains were back to normal past my place by 13:00.

M.
  8502 Assistant Commissioner

With lines being managed individually from Geelong, Ballarat and Bendigo, why does a power outage at a control centre in Melbourne cause all lines to stop? Or did they all go down?
  Carnot Minister for Railways

No idea if it is at all connected, but METROL were meant to do a big software change this morning on core TCMS data at both the Primary and Disaster Recovery sites...
  jakar Deputy Commissioner

Location: Melbourne
With lines being managed individually from Geelong, Ballarat and Bendigo, why does a power outage at a control centre in Melbourne cause all lines to stop? Or did they all go down?
8502
Who has mentioned a power outage? Also those lines you mentioned are not 'managed' individually.

Centrol was unable to receive incoming calls. This obviously creates a safety issue if the train controllers can't receive calls so trains were held until it was rectified. Local comms (UHF) never went down AFAIK.
  SinickleBird Deputy Commissioner

Location: Qantas Club at Mudgee International Airport
Strange idea IMO to do a software upgrade on both Primary and Disaster Recovery sites on the same day.

Isn’t Disaster Recovery intended to cover situations where something goes wrong?

Give the guy who scheduled this a promotion. (rolling eyes)
  bevans Site Admin

Location: Melbourne, Australia
Strange idea IMO to do a software upgrade on both Primary and Disaster Recovery sites on the same day.

Isn’t Disaster Recovery intended to cover situations where something goes wrong?

Give the guy who scheduled this a promotion. (rolling eyes)
SinickleBird

Do we know if that was actually the cause?
  8502 Assistant Commissioner

With lines being managed individually from Geelong, Ballarat and Bendigo, why does a power outage at a control centre in Melbourne cause all lines to stop? Or did they all go down?
Who has mentioned a power outage? Also those lines you mentioned are not 'managed' individually.

Centrol was unable to receive incoming calls. This obviously creates a safety issue if the train controllers can't receive calls so trains were held until it was rectified. Local comms (UHF) never went down AFAIK.
jakar

Hi Jakar, that was my assumption.

Can you please explain the comment about the lines not being individually managed?
  historian Chief Commissioner

With lines being managed individually from Geelong, Ballarat and Bendigo, why does a power outage at a control centre in Melbourne cause all lines to stop? Or did they all go down?
Who has mentioned a power outage? Also those lines you mentioned are not 'managed' individually.

Centrol was unable to receive incoming calls. This obviously creates a safety issue if the train controllers can't receive calls so trains were held until it was rectified. Local comms (UHF) never went down AFAIK.

Can you please explain the comment about the lines not being individually managed?
8502

Train control functions on all V/Line lines are managed centrally from Centrol (as has always been the case).

Signalling functions are also largely centralised in Centrol; in particular the signallers were withdrawn from Geelong (except South Geelong), Ballarat, Bendigo, and Traralgon around a decade ago.

Centrol is consequently a single point of failure, both technically and physically.

Depending on the actual technical failure, if the regional signallers had still been controlling their corridors there may have been no need to stop trains just because Centrol failed.
  jakar Deputy Commissioner

Location: Melbourne
With lines being managed individually from Geelong, Ballarat and Bendigo, why does a power outage at a control centre in Melbourne cause all lines to stop? Or did they all go down?
Who has mentioned a power outage? Also those lines you mentioned are not 'managed' individually.

Centrol was unable to receive incoming calls. This obviously creates a safety issue if the train controllers can't receive calls so trains were held until it was rectified. Local comms (UHF) never went down AFAIK.

Hi Jakar, that was my assumption.

Can you please explain the comment about the lines not being individually managed?
8502
You said that the Geelong, Ballarat & Bendigo lines were 'managed' (I can only assume you mean signalled/controlled) from those locations. With the exception of a couple of signal boxes around the place + ground frames etc, all train control and the majority of the signalling around the state is done in Melbourne at Centrol.

Edit: beat me to it historian, only saw your post when I submitted mine!
  8502 Assistant Commissioner

I recall reading that, for the RFR lines, a decision was made to control them from regional centres like Geelong, Ballarat, Bendigo and Traralgon. This created jobs in regional areas, and now those jobs have been lost to a single point of failure for the entire network?
  Lockspike Chief Commissioner

Train control functions on all V/Line lines are managed centrally from Centrol (as has always been the case).

Signalling functions are also largely centralised in Centrol; in particular the signallers were withdrawn from Geelong (except South Geelong), Ballarat, Bendigo, and Traralgon around a decade ago.

Centrol is consequently a single point of failure, both technically and physically.

Depending on the actual technical failure, if the regional signallers had still been controlling their corridors there may have been no need to stop trains just because Centrol failed.
historian
Agreed Historian; why could trains not work on signal indication? Train Order sections excepted.

Over the history of railways, direct voice communication between control and trains is only a recent innovation. Trains used to run quite successfully without it.

Is V/Line so insecure about their signalling system that they don't have the confidence to let trains run without the metaphorical hand reaching out to stop them?
  Lockspike Chief Commissioner

Strange idea IMO to do a software upgrade on both Primary and Disaster Recovery sites on the same day.

Isn’t Disaster Recovery intended to cover situations where something goes wrong?

Give the guy who scheduled this a promotion. (rolling eyes)

Do we know if that was actually the cause?
bevans
Whether the software upgrade was a factor in the failure, or not, why wasn't such fiddling done in the small hours of the night, when a hiccup causes little concern, and the nerds not allowed to leave until it is proven to be fully functional?
  jakar Deputy Commissioner

Location: Melbourne
Train control functions on all V/Line lines are managed centrally from Centrol (as has always been the case).

Signalling functions are also largely centralised in Centrol; in particular the signallers were withdrawn from Geelong (except South Geelong), Ballarat, Bendigo, and Traralgon around a decade ago.

Centrol is consequently a single point of failure, both technically and physically.

Depending on the actual technical failure, if the regional signallers had still been controlling their corridors there may have been no need to stop trains just because Centrol failed.
Agreed Historian; why could trains not work on signal indication? Train Order sections excepted.

Over the history of railways, direct voice communication between control and trains is only a recent innovation. Trains used to run quite successfully without it.

Is V/Line so insecure about their signalling system that they don't have the confidence to let trains run without the metaphorical hand reaching out to stop them?
Lockspike
Safety standards and policies change over time, no matter what the industry. The inability for trains to contact Centrol in emergency situations, or external services such as Police and Metrol to pass on critical information that may save lives, is more than enough to halt services. It's not uncommon for individual services to be cancelled due to a radio/comms fault; it's just that you never hear about it.
  BrentonGolding Chief Commissioner

Location: Maldon Junction
Strange idea IMO to do a software upgrade on both Primary and Disaster Recovery sites on the same day.

Isn’t Disaster Recovery intended to cover situations where something goes wrong?

Give the guy who scheduled this a promotion. (rolling eyes)

Do we know if that was actually the cause?
Whether the software upgrade was a factor in the failure, or not, why wasn't such fiddling done in the small hours of the night, when a hiccup causes little concern, and the nerds not allowed to leave until it is proven to be fully functional?
Lockspike
I have a good friend who has spent the last 20 years doing this kind of stuff for large corporations, most recently with a big bank. The whole time I have known her this stuff has been done in the middle of the night, often on a Saturday night / Sunday morning, when "traffic" is the quietest.

It is quite often an "all-nighter", with prep starting in the evening, the "Gee-oh" button being hit at around midnight, and staff and contractors on hand for hours afterwards to monitor for faults and act on them should something go wrong, whether that means going back to the previous version or doing some serious fixing if it really goes badly.

I have no knowledge of what was done in this case, but if it is true that a software upgrade was attempted during a peak service period and that caused this outage, then someone needs to fall on their sword. That is definitely not up to industry standards.
  historian Chief Commissioner

I would be very surprised if it was planned work gone wrong.

The WON regularly notifies all concerned of work on the core telecommunications system. This is invariably scheduled in the wee small hours of the morning for exactly the reasons that BrentonGolding mentioned, and is notified in the WON precisely because of the impact on train movement.

It's more likely that something broke. The something need not even be under the direct control of V/Line if they lease comms from a telco.
  historian Chief Commissioner

Agreed Historian; why could trains not work on signal indication? Train Order sections excepted.

Over the history of railways, direct voice communication between control and trains is only a recent innovation. Trains used to run quite successfully without it.

Is V/Line so insecure about their signalling system that they don't have the confidence to let trains run without the metaphorical hand reaching out to stop them?
Safety standards and policies change over time, no matter what the industry. The inability for trains to contact Centrol in emergency situations, or external services such as Police and Metrol to pass on critical information that may save lives, is more than enough to halt services. It's not uncommon for individual services to be cancelled due to a radio/comms fault; it's just that you never hear about it.
jakar

I suspect it's more basic than that. The modern rulebook is based around the idea that the driver can always contact the signaller and train controller (on most of V/Line these roles are combined in one person). Anything happens, the signaller/train controller authorises/supervises the recovery. This goes down to the level of passing any signal (including automatics) at Stop. No telecommunications, no recovery.

With hindsight, this was actually a necessary safety change. Safeworking has, for over 100 years, been fundamentally based on the idea that two (or more) people need to make a mistake for an accident to happen. In the old days there was a team of three on the train (driver, fireman, & guard), and usually a signaller on the ground at the location (*). Now the driver is on their own, hence you need a second person involved. Hence a requirement for telecommunications.

Emergency situations are a bit of a furphy. In an emergency the train is full of mobile telephones; including the driver's. Yes, there are reception blackspots, but there are also emergencies where the driver is incapacitated or the radio system inoperable. And reception blackspots would normally be resolvable by moving a short distance out of the train. (It's an emergency; the train would be stopped.) This is not to say that radio is not very important for efficiently and safely handling emergencies; it's just not critical.

The problem, really, is the concentration in Centrol which makes a single point of failure. In the recent past the regional signallers could have kept their lines going even if Centrol went out (i.e. the signaller acting as the train controller). This was not maintained because it was cheaper in staff and accommodation to centralise the signallers/train controllers. What's the cost of a half hour total disruption versus regional signal boxes? It's worth noting in this context, that Metrol's strategy with the Melbourne Metro is to run the line from two dispersed signalling centres (Sunshine and Dandenong) supervised by Metrol.

(*) This is not to say it always worked; Sunshine is a good counter example.
