Personal data should not be published unless it is legal to do so.
Personal data can be published if individuals cannot be identified in the data. People’s identities can be hidden using various techniques, for example:
- removing columns such as name and address
- making data less precise (e.g. changing an age of 13 to an age range of 10-19 years)
- removing outlying data
- and many more sophisticated methods.
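As a rough illustration of the first three techniques, here is a minimal Python sketch. The records, the field names (`name`, `address`, `age`, `visits`) and the outlier cut-off of 89 are all assumptions made up for this example, not a recommended standard. Applying these steps does not by itself guarantee that nobody can be re-identified, since combinations of the remaining fields can still single people out.

```python
# Hypothetical records; field names and values are invented for illustration.
records = [
    {"name": "A. Example", "address": "1 Sample St", "age": 13, "visits": 2},
    {"name": "B. Example", "address": "2 Sample St", "age": 47, "visits": 5},
    {"name": "C. Example", "address": "3 Sample St", "age": 104, "visits": 1},
]

# Columns that directly identify a person (an assumption for this example).
DIRECT_IDENTIFIERS = {"name", "address"}


def drop_identifiers(record):
    """Remove columns such as name and address."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}


def age_band(age, width=10):
    """Make an age less precise, e.g. 13 becomes '10-19'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def deidentify(rows, max_age=89):
    """Drop outlying rows, remove identifiers, and band the ages.

    max_age is an arbitrary illustrative cut-off for "outlying" data.
    """
    result = []
    for row in rows:
        if row["age"] > max_age:
            continue  # remove outlying data
        cleaned = drop_identifiers(row)
        cleaned["age"] = age_band(cleaned["age"])
        result.append(cleaned)
    return result


published = deidentify(records)
```

With the sample records above, `published` keeps two rows, with ages shown as `10-19` and `40-49` and no name or address columns.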
De-identification takes effort; however, the resulting data can be extremely valuable to researchers, policy makers and others.
How much effort should publishers put into de-identifying data, compared with releasing other non-sensitive data or making other improvements?