Very misleading thread title. It wasn't a data breach - it was a privacy breach. The data was provided willingly.
I'm still struggling to see what the issue is here.
Isn't the only reason the data could be used to identify a particular person that the person tweeted that they caught a particular train?
If PTV/DOT remove all personal details from the individual Myki transactions, then what is the problem? All the person interrogating the data knows is that some random person regularly travels from Point A to Point B. I am inclined to agree with DOT on this one and think that it is a media beat-up. Or am I missing something here?
Because it only takes a small amount of effort to pull out the travel of any specific individual from the released data set.
All you need is one or two known data points - times and places the person touched on or off. Let's say you know that your subject of interest touched on at Prahran around 10.30 today. There might be 20 Myki touch-ons at Prahran around that time. You also know that they went to North Richmond. Only one of those touch-ons touched off at North Richmond. Bingo, you've found your person. Or you know that yesterday, they touched on at Prahran at 9.30. Another 20 Myki touch-ons, but it's almost certain that only one of the Myki cards will appear in both sets. Again, bingo.
Once you know *one* Myki travel record, you have the Myki card number. You can then extract all the travel records of that person. That would tell you a lot - where do they usually travel, and what time. Roughly where do they live or work.
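To make that concrete, here is a rough Python sketch of the matching. It is only an illustration: the record layout (`Trip`), the card IDs, the sample trips and the 15-minute window are all made up, and the real released data set may be laid out quite differently. Times are minutes since midnight.

```python
from collections import namedtuple

# Hypothetical record layout; the real data set's columns may differ.
Trip = namedtuple("Trip", "card_id day on_stop on_time off_stop")

def candidates(trips, day, on_stop, approx_time, window_min=15, off_stop=None):
    """Card IDs with a touch-on that matches one known sighting."""
    matches = set()
    for t in trips:
        if t.day != day or t.on_stop != on_stop:
            continue
        if abs(t.on_time - approx_time) > window_min:
            continue
        if off_stop is not None and t.off_stop != off_stop:
            continue
        matches.add(t.card_id)
    return matches

def trips_for(trips, card_id):
    """Every travel record for one card, once it has been singled out."""
    return [t for t in trips if t.card_id == card_id]

# Invented sample data, for illustration only.
trips = [
    Trip("A", 1, "Prahran", 9 * 60 + 28, "Flinders Street"),
    Trip("B", 1, "Prahran", 9 * 60 + 31, "South Yarra"),
    Trip("C", 1, "Prahran", 9 * 60 + 35, "Richmond"),
    Trip("B", 2, "Prahran", 10 * 60 + 29, "North Richmond"),
    Trip("D", 2, "Prahran", 10 * 60 + 33, "Flinders Street"),
    Trip("B", 2, "North Richmond", 17 * 60 + 5, "Prahran"),
]

# Sighting 1: touched on at Prahran around 9.30 on day 1.
day1 = candidates(trips, 1, "Prahran", 9 * 60 + 30)
# Sighting 2: touched on at Prahran around 10.30 on day 2.
day2 = candidates(trips, 2, "Prahran", 10 * 60 + 30)

# Only a card that appears in both candidate sets fits both sightings.
both = day1 & day2

# Alternatively, one sighting plus a known destination can be enough.
with_destination = candidates(trips, 2, "Prahran", 10 * 60 + 30,
                              off_stop="North Richmond")

# With the card ID in hand, pull that card's whole travel history.
history = trips_for(trips, next(iter(both))) if len(both) == 1 else []
```

Each sighting on its own leaves several candidate cards, but the intersection shrinks very quickly, and nothing here needs a name or any other personal detail in the data.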
You might argue that no-one would bother with doing this. Stalkers would. Controlling ex-partners would.
Any set of timecoded geographic data points is very difficult to deidentify because an individual's movement over time is unique. This is a well-known problem with similar data sets - the log of the cells your mobile phone is registered in, fitness trackers that log your run, etc.