Almost a decade back, I wrote a blog post to try to find a way to describe the many-headed beast that is the personal data eco-system, and specifically the scenario in which individuals are in data sharing relationships with organisations – sometimes knowingly, sometimes not. Having spent the last six months deep in the data weeds of GDPR, I thought I’d take time out and update that post to see what has changed, and what’s coming.
Firstly, data volumes and complexity have clearly gone through the roof over the period, and that shows no sign of slowing.
Awareness that there are problems in the space has also gone through the roof. After thousands of data breaches with no obvious direct consequence, Facebook and Cambridge Analytica have managed to ‘trump’ all of that by enabling significant attacks on democracy – a story that I expect will continue to rumble on for years.
The General Data Protection Regulation (GDPR) itself is clearly a line in the sand, and brings with it at least some sense of optimism. This New York Times article sums things up nicely: this problem has been building for a decade, and more of the same is not the answer. We need to migrate towards an architecture that places the individual at the centre of their own personal data eco-system, and gives them the tools to originate, integrate and share data about themselves in controlled, audited ways.
The definitions used a decade ago remain the same at the high level, but are worth re-stating and building out with more examples to aid clarity. They are shown in this diagram, and expanded upon below. This categorization is based on provenance, i.e. the nature and source of a specific piece of data. It is not about saying a data attribute has to live in one bucket or another; in fact, quite the opposite. For example, Alice’s home address can be found in each of these buckets, with different use characteristics and implications depending on which bucket it is sourced from for any specific use.
My Data – Is that personal data available to an individual that has not yet been shared, or has been shared but under terms / agreements that enable the individual to retain control over access to the data. Examples would include buying intentions, upcoming life events or changes in circumstance, likes, dislikes, preferences and many more. To date this data has not really existed in any solid, scalable way; most of the innovation in the past two decades has been around the other four data buckets. But My Data, when delivered at scale in standardized ways, is the true game changer.
Your Data – In the context of person to organization data sharing, ‘Your Data’ is that which is clearly brought to the party by the organization. That might seem a bit obscure, but it is actually an incredibly important data-set; no-one knows as much about an organisation’s products / services as the organization that produces and sells them. So, when the supply organization chooses to expose its product / service data along with its many attributes (spec, components, provenance, pricing, availability, quality issues, competitors) online via APIs, it has the chance to embed and leverage that knowledge and insight in many downstream places.
Our Data – Is that data at the nexus of the customer-supplier relationship; it includes the customer-product relationship (e.g. ownership), valuations, transactions, preferences, interactions and product / service use, including in the IoT data generation sense. However, this data-set carries many problems, almost all down to the current technical, contractual and legal architecture, in which data storage and the terms set around it are almost always defined by, and thus favour, the organization / supply side. A significant majority of the issues that GDPR attempts to improve upon are in this Our Data category; for example, ‘data access’ is about enabling an individual to see into the relevant Our Data bucket and understand what data is held, who it is shared with and what is being done with it. In turn, ‘data portability’ is about moving a specific data-set from one Our Data bucket to the related My Data one (or indeed to another Our Data bucket related to the same individual). One individual can easily have a hundred or more ‘our data’ relationships to manage, and herein lies the bulk of the problem and the opportunity discussed in more detail below.
Their Data – Is the data about a specific individual, but being held / used by an entity with no direct data sharing (or other) relationship with that individual. This space is largely occupied by the shadowy world of digital advertising networks (adtech), data brokers and credit bureaus, but also includes government / security surveillance activities. GDPR brings some clarity and potential relief in this area.
Everybody’s Data – Is the data typically known as ‘open data’, i.e. typically a curated data-set made available in the public interest. A good example of that in the context of this paper would be the register of UK Data Controllers made available by the Information Commissioner. Other examples include anything from geographic co-ordinates (latitude and longitude) to the ASOS product catalogue (a contrast to most who deploy this as ‘Your Data’). Many other examples can be found at the Open Data Directory. Best practice is to make this open data available online (consumable via an API), as this enables the data to be used without clunky upload and download processes.
Core Data – Is the core identity data that, like it or not, is readily available and thus spread across and central to all of these many other data-sets. Typically this will be ‘name’, ‘date of birth’ and ‘gender’, or proxies thereof; and often this will be attached to core location data such as work or home address.
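To make the provenance idea a little more concrete, here is a minimal Python sketch of how one attribute (Alice’s home address) might be tagged with the bucket it was sourced from. Everything in it is an assumption made for illustration: the `Bucket` and `Observation` names, the sources and the address are hypothetical and not part of any existing system.

```python
from dataclasses import dataclass
from enum import Enum


class Bucket(Enum):
    """The provenance buckets described above (Core Data cuts across them)."""
    MY_DATA = "My Data"
    YOUR_DATA = "Your Data"
    OUR_DATA = "Our Data"
    THEIR_DATA = "Their Data"
    EVERYBODYS_DATA = "Everybody's Data"


@dataclass(frozen=True)
class Observation:
    """One sighting of an attribute, tagged with where it came from."""
    subject: str
    attribute: str
    value: str
    bucket: Bucket
    source: str  # hypothetical label for whoever holds this copy


# Hypothetical example data: the same address seen via different routes.
EXAMPLES = [
    Observation("Alice", "home_address", "10 Example Street", Bucket.MY_DATA, "Alice herself"),
    Observation("Alice", "home_address", "10 Example Street", Bucket.OUR_DATA, "retailer account"),
    Observation("Alice", "home_address", "10 Example St", Bucket.THEIR_DATA, "data broker"),
    Observation("Alice", "home_address", "10 Example Street", Bucket.EVERYBODYS_DATA, "public register"),
]


def buckets_holding(observations, subject, attribute):
    """Return the set of buckets in which this attribute appears for this subject."""
    return {
        o.bucket
        for o in observations
        if o.subject == subject and o.attribute == attribute
    }
```

The point of the sketch is only that the bucket is a property of where a copy of the data came from, not of the attribute itself; the same address carries different terms and reliability depending on its source.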
The Upcoming My Data Game Changer
As noted above, the ability for individuals to manage and use ‘My Data’ at scale is the game changer – for both individuals and organisations. The current data / systems architecture, in which organisations are the holders of the customer-supplier relationship record, does not make sense when a solid alternative built around the individual is possible. In numerical terms, the current modus operandi means that even though there are only 7 billion people on earth, there are, in my estimation, about 1 trillion customer records. According to Dun & Bradstreet there are about 220 million organisations worldwide. Had individuals had smartphones 25 years ago, the model that evolved would have been one in which the individual held the relationship record and the relevant organisations subscribed and published to that; that’s by far the more technically efficient model.
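For a rough sense of scale, the figures quoted above can be turned into averages. This is back-of-envelope arithmetic in Python using only the estimates in this post (the 1 trillion figure is my own estimate, not a measurement); the function names are just illustrative.

```python
# Figures quoted in the post; all are rough estimates, not measured values.
PEOPLE = 7_000_000_000
CUSTOMER_RECORDS = 1_000_000_000_000  # the author's own estimate
ORGANISATIONS = 220_000_000           # figure attributed to Dun & Bradstreet


def records_per_person(records=CUSTOMER_RECORDS, people=PEOPLE):
    """Average number of customer records held about each person."""
    return records / people


def records_per_organisation(records=CUSTOMER_RECORDS, organisations=ORGANISATIONS):
    """Average number of customer records held by each organisation."""
    return records / organisations


print(round(records_per_person()))        # roughly 143
print(round(records_per_organisation()))  # roughly 4545
```

On these numbers the average person is described by well over a hundred separately maintained records, which is consistent with the ‘hundred or more’ Our Data relationships mentioned earlier.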
The upcoming change is therefore not going to be driven primarily by regulation (and the associated huge fines) or even advanced concepts such as Privacy by Design. It will be driven by simple economics and efficiency; the best data-set available on an individual will be that controlled by the individual; i.e. My Data. ‘Best data’ in this context means:
• Most accurate
• Most up to date
• Most compliant with relevant legislation globally
• Least costly to manage and use
• Readily accessible with modern technologies
• Future looking and future proof
What that means in practical terms for both parties is that there will be a migration from the Organisation Push model to Customer Pull; demand will drive supply, not the other way around. That will eliminate huge amounts of guesswork and waste from over-production of goods through to the enormous time and money sink that is direct marketing and online advertising.
An illustration of what that might look like in practice on the individual’s side is below. One way to think of it: not only do I have an account with my suppliers, but they have an account with me. We have a data sharing pipe between us through which data moves backwards and forwards in appropriate ways, governed by a contract that both agree to.
With this evolving tool-set, I can do the following in the same way across all of my data sharing relationships with organisations irrespective of their technical platform:
• Understand, before I share data, the basis and terms under which the organisation wants my data, what types of data they want, what they wish to do with it, and who they wish to share it with.
• Where I have gone ahead and shared data, get a receipt for that shared data showing what all those involved have signed up to.
• Have ongoing access to the data I have shared, and all data then related to it by my suppliers and ‘things’ so that I can re-use it, correct or update incorrect data, download or share a copy of it, withdraw consent for its use, express preferences around specific uses such as being profiled or subject to decisions made by machines, and ultimately ask for my data record to be deleted where that is a valid option.
• Aggregate, combine, voluntarily build upon and, should I wish to, share the combined data from across all or a sub-set of my data sharing relationships.
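The four capabilities above could be sketched, very roughly, as a small Python model. This is an illustration under stated assumptions rather than a description of any real product or standard: `SharingTerms`, `Receipt`, `PersonalDataStore`, their fields and the example organisation are all hypothetical names invented here.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SharingTerms:
    """What an organisation asks for, visible before any data is shared."""
    organisation: str
    legal_basis: str        # e.g. "consent" or "contract"
    data_types: tuple
    purposes: tuple
    shared_with: tuple = ()


@dataclass
class Receipt:
    """Record of what both parties signed up to at the time of sharing."""
    terms: SharingTerms
    issued_at: datetime
    consent_withdrawn: bool = False


class PersonalDataStore:
    """Hypothetical individual-side store, one entry per relationship."""

    def __init__(self):
        self._shared = {}    # organisation -> {data_type: value}
        self._receipts = {}  # organisation -> Receipt

    def share(self, terms, values):
        """Share only the data types named in the terms; return a receipt."""
        missing = [t for t in terms.data_types if t not in values]
        if missing:
            raise ValueError(f"missing data types: {missing}")
        self._shared[terms.organisation] = {t: values[t] for t in terms.data_types}
        receipt = Receipt(terms, datetime.now(timezone.utc))
        self._receipts[terms.organisation] = receipt
        return receipt

    def view(self, organisation):
        """Ongoing access: return a copy of what has been shared."""
        return dict(self._shared[organisation])

    def correct(self, organisation, data_type, value):
        """Correct or update a previously shared value."""
        if data_type not in self._shared[organisation]:
            raise KeyError(data_type)
        self._shared[organisation][data_type] = value

    def withdraw_consent(self, organisation):
        """Mark consent as withdrawn on the stored receipt."""
        self._receipts[organisation].consent_withdrawn = True

    def delete(self, organisation):
        """Remove the shared data record for this relationship."""
        self._shared.pop(organisation, None)

    def aggregate(self, organisations=None):
        """Combine data across all, or a chosen sub-set of, relationships."""
        chosen = self._shared if organisations is None else {
            org: self._shared[org] for org in organisations
        }
        return {org: dict(data) for org, data in chosen.items()}


store = PersonalDataStore()
terms = SharingTerms("Example Energy Ltd", "consent", ("home_address",), ("billing",))
receipt = store.share(terms, {"home_address": "10 Example Street", "phone": "not shared"})
```

Note that `share` passes on only the data types named in the terms, so the phone number in the example never leaves the individual’s side; a real system would also need the organisation’s side of the pipe to honour corrections, withdrawals and deletions.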
The My Data data-set also has huge value for each individual even without sharing. When Artificial Intelligence (AI) techniques are applied to this very rich data-set, many insights can be gleaned without sharing, and more still when the My Data-sets of many individuals are combined for their own or the common good.
So, how long is it going to take for all of that giant change to take shape? That’s a tricky one, and it depends on the speed at which early adopter firms emerge and deploy at scale. What I do know though is that the clock starts ticking in just a couple of days, on 25th May 2018.