The recent hack of Ashley Madison, and the subsequent discussion, reminded me of something I’ve been meaning to talk about for some time.
Can a web site ever truly delete your data?
This is usually expressed, as my title suggests, by a user asking the web site who hosted that user’s account (and usually directly as a result of a data breach) why that web site still had the user’s data.
This can be because the user deliberately deleted their account, or simply because they haven’t used the service in a long time, and only remembered that they did by virtue of a breach notification letter (or a web site such as Troy Hunt’s haveibeenpwned.com).
1. Is there a ‘delete’ feature?
Web sites do not see it as a good idea to have a ‘delete’ feature for their user accounts – after all, what you’re asking is for a site to devote developer resources to a feature that specifically curtails the ability of that web site to continue to make money from the user.
To an accountant’s eye (or a shareholder’s), that’s money out the door with the prospect of reducing money coming in.
To a user’s eye, it’s a matter of security and trust. If the developer deliberately misses a known part of the user’s lifecycle (sunset and deprecation are both terms developers should be familiar with), it’s fairly clear that there are other things likely to be missing or skimped on. If a site allows users to disconnect themselves, to close their accounts, there’s a paradox that says more users will choose to continue their service, because they don’t feel trapped.
So, let’s assume there is a “delete” or “close my account” feature – and that it’s easy to use and functional.
2. Is there a ‘whoops’ feature for the delete?
In the aftermath of the Ashley Madison hack, I’m sure there’s going to be a few partners who are going to engage in retributive behaviours. Those behaviours could include connecting to any accounts that the partners have shared, and cause them to be closed, deleted and destroyed as much as possible. It’s the digital equivalent of cutting the sleeves off the cheating partner’s suit jackets. Probably.
Assuming you’ve finally settled down and broken/made up, you’ll want those accounts back under your control.
So there might need to be a feature to allow for ‘remorse’ over the deletion of an account. Maybe not for the jealous partner reason, even, but perhaps just because you forgot about a service you were making use of by that account, and which you want to resurrect.
OK, so many sites have a ‘resurrect’ function, or a ‘cool-down’ period before actually terminating an account.
Facebook, for instance, will not delete your account until you’ve been inactive for 30 days.
3. Warrants to search your history
Let’s say you’re a terrorist. Or a violent criminal, or a drug baron, or simply someone who needs to be sued for slanderous / libelous statements made online.
OK, in this case, you don’t WANT the server to keep your history – but to satisfy warrants of this sort, a lawyer is likely to tell the server’s operators that they have to keep history for a specific period of time before discarding them. This allows for court orders and the like to be executed against the server to enforce the rule of law.
So your server probably has to hold onto that data for more than the 30 day inactive period. Local laws are going to put some kind of statute on how long a service provider has to hold onto your data.
As an example, a retention notice served under the UK’s rather steep RIPA law could say the service provider has to hold on to some types of data for as much as 12 months after the data is created.
4. Financial and Business records
If you’ve paid for the service being provided, those transaction details have to be held for possible accounting audits for the next several years (in the US, between 3 and 7 years, depending on the nature of the business, last time I checked).
Obviously, you’re not going to expect an audit to go into complete investigations of all your individual service requests – unless you’re billed to that level. Still, this record is going to consist of personal details of every user in the system, amounts paid, service levels given, a la carte services charged for, and some kind of demonstration that service was indeed provided.
So, even if Ashley Madison, or whoever, provided a “full delete” service, there’s a record that they have to keep somewhere that says you paid them for a service at some time in the past.
Eternal data retention – is it inevitable?
I don’t think eternal data retention is appropriate or desirable. It’s important for developers to know data retention periods ahead of time, and to build them into the tools and services they provide.
Data retention shouldn’t be online
Hackers fetch data from online services. Offline services – truly offline services – are fundamentally impossible to steal over the network. An attacker would have to find the facility where they’re stored, or the truck the tapes/drives are traveling in, and steal the data physically.
Not that that’s impossible, but it’s a different proposition from guessing someone’s password and logging into their servers to steal data.
Once data is no longer required for online use, and can be stored, move it into a queue for offline archiving. Developers should make sure their archivist has a data destruction policy in place as well, to get rid of data that’s just too old to be of use. Occasionally (once a year, perhaps), they should practice a data recovery, just to make sure that they can do so when the auditors turn up. But they should also make sure that they have safeguards in place to prevent/limit illicit viewing / use of personal data while checking these backups.
Not everything has to be retained
Different classifications of data have different retention periods, something I alluded to above. Financial records are at the top end with seven years or so, and the minutiae of day-to-day conversations can probably be deleted remarkably quickly. Some services actually hype that as a value of the service itself, promising the messages will vanish in a snap, or like a ghost.
When developing a service, you should consider how you’re going to classify data so that you know what to keep and what to delete, and under what circumstances. You may need a lawyer to help with that.
Managing your data makes service easier
If you lay the frameworks in place when developing a service, so that data is classified and has a documented lifecycle, your service naturally becomes more loosely coupled. This makes it smoother to implement, easier to change, and more compartmentalised. This helps speed future development.
Providing user lifecycle engenders trust and loyalty
Users who know they can quit are more likely to remain loyal (Apple aside). If a user feels hemmed in and locked in place, all that’s required is for someone to offer them a reason to change, and they’ll do so. Often your own staff will provide the reason to change, because if you’re working hard to keep customers by locking them in, it demonstrates that you don’t feel like your customers like your service enough to stay on their own.
So, be careful who you give data to
Yeah, I know, “to whom you give data”, thanks, grammar pedants.
Remember some basic rules here:
1. Data wants to be free
Yeah, and Richard Stallmann’s windows want to be broken.
Data doesn’t want anything, but the appearance is that it does, because when data is disseminated, it essentially cannot be returned. Just like if you go to RMS’s house and break all his windows, you can’t then put the glass fragments back into the frames.
Developers want to possess and collect data – it’s an innate passion, it seems. So if you give data to a developer (or the developer’s proxy, any application they’ve developed), you can’t actually get it back – in the sense that you can’t tell if the developer no longer has it.
2. Sometimes developers are evil – or just naughty
Occasionally developers will collect and keep data that they know they shouldn’t. Sometimes they’ll go and see which famous celebrities used their service recently, or their ex-partners, or their ‘friends’ and acquaintances.
3. Outside of the EU, your data doesn’t belong to you
EU data protection laws start from the basic assumption that factual data describing a person is essentially the non-transferrable property of the person it describes. It can be held for that person by a data custodian, a business with whom the person has a business relationship, or which has a legal right or requirement to that data. But because the data belongs to the person, that person can ask what data is held about them, and can insist on corrections to factual mistakes.
The US, and many other countries, start from the premise that whoever has collected data about a person actually owns that data, or at least that copy of the data. As a result, there’s less emphasis on openness about what data is held about you, and less access to information about yourself.
Ideally, when the revolution comes and we have a socialist government (or something in that direction), the US will pick up this idea and make it clear that service providers are providing a service and acting only as a custodian of data about their customers.
Until then, remember that US citizens have no right to discover who’s holding their data, how wrong it might be, or to ask for it to be corrected.
4. No one can leak data that you don’t give them
Developers should also think about this – you can’t leak data you don’t hold. Similarly, if a user doesn’t give data, or gives incorrect or value-less data, if it leaks, that data is fundamentally worthless.
The fallout from the Ashley Madison leak is probably reduced significantly by the number of pseudonyms and fake names in use. Probably.
Hey, if you used your real name on a cheating web site, that’s hardly smart. But then, as I said earlier today, sometimes security is about protecting bad people from bad things happening to them.
5. Even pseudonyms have value
You might use the same nickname at several places; you might provide information that’s similar to your real information; you might link multiple pseudonymous accounts together. If your data leaks, can you afford to ‘burn’ the identity attached to the pseudonym?
If you have a long message history, you have almost certainly identified yourself pretty firmly in your pseudonymous posts, by spelling patterns, word usages, etc.
Leaks of pseudonymous data are less problematic than leaks of eponymous data, but they still have their problems. Unless you’re really good at OpSec.
Finally, I was disappointed earlier tonight to see that Troy had already covered some aspects of this topic in his weekly series at Windows IT Pro, but I think you’ll see that his thoughts are from a different direction than mine.