NNSquad - Network Neutrality Squad


[ NNSquad ] Why Data Anonymization is Often Preferable to Data Deletion



           Why Data Anonymization is Often Preferable to Data Deletion

                 http://lauren.vortex.com/archive/000869.html


Earlier today in some other venues, I sent out a brief pointer to a
new report from the Canadian Information & Privacy Commissioner, a
study that explains how -- contrary to some recent memes that have
received a lot of attention -- it *is* possible to anonymize data in
ways that are not conducive to "re-identification" abuses 
( http://bit.ly/lbH5PE [IPC] ).

Several readers have asked me why I consider this report to be
important, recommended reading.  "Why not just delete the data and be
done with it?" they're essentially asking.

While it may seem neat and clean to just quickly delete data that has
the theoretical potential to be misused, that really is far too
simplistic an approach.

A primary way that we learn is by studying our own past.  This applies
in many aspects of life -- with the sort of data under discussion
being only one example.

Web activity log data can be crucial to the forensic analysis of
system errors, failures, illicit access events (and attempts) -- all
of which themselves may have significant privacy-related implications.
If we don't have enough detailed information to study, particularly in
terms of event sequencing and interactions over time, solving such
problems and protecting against future such events can be extremely
difficult, in some cases perhaps impossible.

In the health field, longitudinal (long-term) studies need ways to
analyze data in myriad forms and combinations, but obviously, we also
want to protect patient privacy appropriately.

Search quality -- finding the things that we want on the Web -- is a
rapidly evolving science and art, which would be hobbled in major ways
if it were not possible to study the kinds of searches and search
patterns in which users engage.  Such a "data starved" state of
affairs would be to the detriment of search service users in short
order.

And these are just a few examples of why quickly disposing of data is
in many cases impractical, undesirable, or both.

Fundamentally, to approach this area reasonably we need to consider
retained data "life cycles" in context.

There are some situations -- such as an anonymous tip line -- where,
to operate legitimately, no data regarding caller identities should
typically be maintained at all.

But in most cases involving conventional Web services, the need to
maintain completely intact data (e.g. server log records)
progressively decreases with the passage of time.  This suggests that
an appropriate approach is a defined process of gradually anonymizing
various data elements via suitable techniques and algorithms, while
still retaining enough structurally detailed intact and "hashed" data
fields to permit continuing analysis and study for as long as
possible.
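
As a rough illustration of such a staged process, here is a minimal
Python sketch.  Everything specific in it is an assumption made for
the example rather than something taken from any particular firm's
policy: the record fields (timestamp, ip, user_id, path), the 30-day
and 270-day thresholds, the choice of IP truncation followed by a
keyed hash (HMAC-SHA256), and the function names themselves.

```python
import hashlib
import hmac
import ipaddress
from datetime import datetime, timedelta, timezone

# Hypothetical stage thresholds, chosen only for illustration.
TRUNCATE_AFTER = timedelta(days=30)
HASH_AFTER = timedelta(days=270)


def truncate_ip(ip: str) -> str:
    """Zero the low-order bits of an address (/24 for IPv4, /48 for IPv6)."""
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 48
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


def keyed_hash(value: str, key: bytes) -> str:
    """Replace a value with a shortened HMAC-SHA256 digest.

    Equal inputs still map to equal outputs, so records can be grouped
    and sequenced, but recovering the input requires the secret key.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def anonymize(record: dict, now: datetime, key: bytes) -> dict:
    """Return a copy of a log record, anonymized according to its age."""
    age = now - record["timestamp"]
    out = dict(record)
    if age >= TRUNCATE_AFTER:
        # Stage 1: coarsen the IP address and hash the user identifier.
        out["ip"] = truncate_ip(record["ip"])
        out["user_id"] = keyed_hash(record["user_id"], key)
    if age >= HASH_AFTER:
        # Stage 2: hash the already-truncated IP address as well.
        out["ip"] = keyed_hash(out["ip"], key)
    # Timestamp and path stay intact so event sequencing can be studied.
    return out


if __name__ == "__main__":
    key = b"example-secret-key"  # assumption: stored apart from the logs
    record = {
        "timestamp": datetime(2011, 1, 1, tzinfo=timezone.utc),
        "ip": "203.0.113.77",
        "user_id": "u42",
        "path": "/search",
    }
    for days in (1, 60, 300):
        print(days, anonymize(record, record["timestamp"] + timedelta(days=days), key))
```

In this sketch the timestamp and request path are never altered, so
event sequencing remains available for analysis at every stage, while
the fields that most directly identify a user are progressively
coarsened and then hashed.  Note that a hash of a small value space
such as IPv4 addresses is only as protective as the secrecy of the
key, so a real policy would also have to address key handling and
eventual key destruction.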

This is in fact the way that many firms' data life-cycle retention
policies do operate.

Having appropriate policies in place to deal with these issues is
crucial, of course, and those policies need to extend to longer-term
backup and archival aspects as well.

Note, however, that various governments around the world, including
increasingly the U.S., have a rather "schizoid" view of these issues,
simultaneously pressing for companies to delete various user data, but
also to retain much other data (in fully identifiable, non-anonymized
forms) to be delivered on demand to government agencies for
retrospective surveillance and analysis by law enforcement and
intelligence operations.

So we see that, as usual, these are complicated matters, indeed.

But it does seem clear that there are many situations where
appropriate, effective data anonymization is not only extremely useful
for services and users alike, but obviously superior to simplistic
calls for the rapid deletion of data in its entirety.

With the increasing evidence that reports of anonymization's "death"
have been (as Mark Twain would have said) "greatly exaggerated," we
can continue to move forward toward the best technical and policy
approaches for handling retained data: approaches that maximize that
data's potential for improving our lives, while simultaneously
minimizing the risks of it being abused.

--Lauren--
Lauren Weinstein (lauren@vortex.com): http://www.vortex.com/lauren
Co-Founder: People For Internet Responsibility: http://www.pfir.org
Founder:
 - Network Neutrality Squad: http://www.nnsquad.org
 - Global Coalition for Transparent Internet Performance: http://www.gctip.org
 - PRIVACY Forum: http://www.vortex.com
Member: ACM Committee on Computers and Public Policy
Blog: http://lauren.vortex.com
Twitter: https://twitter.com/laurenweinstein 
Google Buzz: http://j.mp/laurenbuzz 
Tel: +1 (818) 225-2800 / Skype: vortex.com