Aug. 10, 2017, 3:25 p.m.
The Bengaluru-based lawyer, who specialises in technology, says Big Data and machine learning have broken the decades-old consent model, and that there is a need for a new paradigm
Privacy and data protection are hot topics in India today. The Supreme Court has just finished hearing arguments on whether privacy is a fundamental right. (It’s yet to give its decision.) The government has set up a committee under Justice BN Srikrishna to draft a data protection bill. For decades, across the world, data protection has been guided by the ‘consent principle’: the consent of an individual is required for the collection, use or disclosure of personal information. But is that enough in an age of social media, mobile phones, the internet of things, machine learning and Big Data?
Rahul Matthan, a partner at law firm Trilegal, says no. He recently published a discussion paper at Takshashila Institution, a Bengaluru-based think tank, offering a new model for data protection that’s based on rights and accountability. In short, Matthan’s argument is that those who collect, process, use or share an individual’s data cannot use one’s consent as a fig leaf to shield themselves from any harm they might cause. They should be held accountable irrespective of consent.
In the interview, Matthan talks about the limitations of consent, the concept of data auditors and IndiaStack’s consent architecture. Edited excerpts:
Q: You say the problem with the current model is that once a business or a government agency takes your consent to use or share data, it’s not liable for any harm it might cause. Can you explain?
I talk of different types of harm—financial, reputational, harm due to manipulation of choice. Financial harm is probably the easiest example, because the harm is quite direct. Let’s talk about how the harm is caused and get to consent in a bit.
When you look at micro-banking, microfinance and various other types of finance right now, lenders rely very heavily on various extraneous factors to assess whether a person is a good credit risk. We have social media-driven microfinance companies that will give you a loan based on factors other than those specified in banking regulations. They look at your history of paying your mobile phone bills, the kind of products you buy on Amazon and those kinds of things.
The benefit is that a lot of the unbanked can get banked, but the downside is that there will be additional grounds on which one could be denied a loan. So, you can think of this as a way in which more people get into the banking net. You can also think of it as a method by which many who are currently entitled to a loan suddenly become disqualified, because additional factors about their creditworthiness have come to light. When you are denied a loan, you automatically suffer financial harm: it affects your ability to function, to buy a car or whatever. The effect is not very pronounced in India yet, because all of this is only starting here. But you see a lot of bad effects in the US and Europe, where this kind of algorithmic processing has been happening for a long time.
The relationship to consent is not actually quite direct. Under the framework we have today, we need to give our consent to anyone who is processing all of this information and coming to a specific conclusion about us. If someone has come to a conclusion about my creditworthiness by taking information without my consent, then I can say, look, you have denied me a loan wrongfully, because you should not have used my information without my consent. Therefore, you have caused harm by violating my privacy.
In the current day and age, there are three reasons why consent is less meaningful. We give consent to everything. First, there is consent fatigue. We don’t read privacy policies, because they are incredibly dense and there are so many of them. There have been many instances where people have put all sorts of completely obnoxious things into consent forms just to prove that no one reads them. Because you don’t read them, you may be providing extremely broad consent for people to, for example, take your Facebook data and use it to make a decision on your creditworthiness and things like that. It may have been stated expressly, but you would never have seen it. If you ask them, they will point to the provision you have accepted. And of course you have accepted it, because without accepting it, you can’t use the service. So, this sort of artificial consent can be used against you.
The other point is that you are socially required to connect with your friends on WhatsApp. So, if you don’t sign up to WhatsApp’s terms and conditions, you miss out on interacting with them altogether. The argument that we can choose not to sign the terms and conditions doesn’t hold when you are denied a service that has, these days, become a necessity. It’s very hard to opt out of WhatsApp, because a lot of people interact only on WhatsApp. In many ways, that choice to opt out is not a real choice.
Second, a lot of databases today are designed to be interoperable; they are API-driven. You can interconnect your Fitbit data with your food data, purchasing data, GPS data and so on. The real power of data today, the Big Data, lies in the interconnection of all of these databases. My point is that while you consent to giving your data to each of those data sets, you may also consent to that data being interconnected with something else. It’s actually very difficult for us, even if we went through all the policies, to assess what the impact on privacy will be: if multiple databases are connected, how exactly could those combinations be used to harm us? Very often, new business models emerge at the interconnection of these kinds of data sets. Those business models may be privacy-invasive but still be legitimate under the consent framework, because we have consented to each of the separate parts of the data collection.
Third, there is the machine learning aspect. We have very powerful machine learning algorithms today. They can find patterns in what I call ambient data. They collect completely innocuous, non-personal data about you and find patterns in it, and those patterns can be potentially harmful. Consent covers personal data, and sometimes sensitive personal data. Now, if you are not required to give consent for some kind of data, and that data could be used to find out something deeply personal about you, then what is the point of having the consent filter? They can achieve everything by collecting data for which consent was not required in the first place.
Q: In the rights model that you suggest you say “there will be no restriction on the collection or processing of data” and there would be “no need to first obtain the consent of that person before collection”. Does it not go against some of the principles of privacy such as notice, purpose limitation, etc?
It does. All of those principles are actually based on the consent model. Take the principle of notice, which says that I should be notified when certain types of data are collected. But if that notice doesn’t give me the ability to assess what harm the collection could cause me, then what’s the point in being notified? I cannot do much with that information.
Take purpose limitation. A collector may say it is collecting the data for a particular purpose. But as you know, most of these purposes are really broad. They are broad so that the data can be used for things other than what was initially intended.
The point I am trying to make is, I don’t just want to say that consent is flawed but I do want to provide a solution. And that solution is accountability. Regardless of the fact that you have provided consent, regardless of the fact that you have done A, B and C, you are still accountable. You can’t use consent as a fig leaf to protect you in those circumstances.
We are also talking about global companies that operate in India as they operate elsewhere in the world. Facebook, for example, has to operate in a consent-based jurisdiction in Europe. It is unlikely to come up with a new interface just for India. All I am saying is that you may have to do all the things you do to protect privacy in Europe, which are equally applicable to India, but in India, the fact that you obtained consent is not going to save you. You must still be accountable, regardless of the fact that you have given notice, observed purpose limitation and done all those things. I really don’t care about all those things; I am only concerned with whether there will be harm.
Q. Would it be fair to say, the rights model ‘unbundles’ the benefits and harm of data sharing, and makes the data controller accountable for the harm caused?
The point really is that there could be unintended harm, [and] there could be intended harm. You want to capture both. Unintended harm is something that even the company doing the collection does not fully appreciate. If you are using a machine learning algorithm and it produces a result that is gender-biased, clearly the company that ran it did not intend it to be gender-biased. But because you controlled the process and it came up with a gender-biased result, I would still want to hold it against you, even if you didn’t intend it.
Q: You mentioned interoperability. What happens when there are multiple agencies collecting data and processing it? How will you establish who caused the harm? How does one apportion the guilt?
The principle is joint and several liability. Joint and several liability in law means that all of them, or any one of them, can be made liable. It’s a principle to ensure that every single person is as diligent as possible, particularly in their interactions with each other. I may choose to go after the richest of them, and then I can make that person liable for the entire harm that is caused. That person’s only defence is then that “the harm has not been caused by me, and I cannot be liable”. What I am trying to do with this model is to ensure that each person is absolutely liable for the data and that, if they pass the data on to someone else, they can’t pass on the liability. So, the point with all of these interconnected data sets is that as long as the data is completely anonymised, there can be no harm. I want companies that collect data with the intention of sharing it to be absolutely rigorous about the anonymisation and pseudonymisation of that data, so that no company receiving it downstream can do anything that will cause me harm. It’s the only mechanism I can think of, in this world of multiple data collectors, that ensures every single person, big or small, is equally rigorous about what they do.
If a small entity did something to a big data set that allowed it to cause harm to someone, then that small entity would be liable for fines potentially as large as those Facebook would face. In the model, we say the fine is up to 5% of global turnover. The point is that 5% is not based on profit. It is a turnover-based amount, so it can actually be quite significant.
Q: Do we have the institutional infrastructure in place to enforce such a model in India?
We will have to build it. I have suggested two structures. One is the structure of learned intermediaries. The concept of a learned intermediary is very similar to auditors, chartered accountants and people like that, who audit the financial records of every company every year. I am suggesting something similar, but for data. Think of it as a data auditor. At this stage, it’s being suggested without any level of detail. It’s the concept I want people to get comfortable with. The devil will be very much in the detail. Think of it as a group of auditors who, at some regular interval, will audit the data processes of every entity and say, yes, the way this entity handles data is not going to cause harm.
These data auditors will carry out three levels of scrutiny. The first is simply to spot bad actors. Take the state, for example. When the state is searching a database to identify people by religion, you need to see why it is doing so. If it is identifying Muslims to ensure safe and comfortable passage for the Haj, that’s acceptable. If it is identifying them for any other reason, these auditors can flag it and say, “Look, this is an inappropriate use of data”.
Second is to have the auditors actually examine the algorithm themselves in such a way that you don’t reveal the exact secret sauce of the algorithm, but still the workings are visible.
Third is the black box method of evaluation. You look at the input and the output, and you see how much the output deviates from the standard norm.
The idea behind the entire mechanism is that we need auditors who will recognise when something is going wrong with data models and correct it, as opposed to punishing people for inadvertent mistakes. Of course, punish those who are actively seeking to do harm. But inadvertent mistakes are quite likely and happen quite often.
I would envision a future where people would try to get certifications for high standards of data privacy. Eventually, we as consumers of these services will start moving to services that are certified AAA for privacy standards, or something like that, and choose not to use similar services that don’t have the same privacy ratings. For now, the audit mechanism is meant to check whether people are compliant. In time, I am hoping it will be a badge of honour to say that we have been rated AAA by such and such data auditor or learned intermediary for the last five years, and so we will never compromise on these kinds of things. That creates a whole currency of expectation.
Q: Do you expect a push back to this idea from businesses because it means they are more accountable, and they have to work harder?
My sense is that big, international companies today are already extremely accountable, and they have been accountable for many reasons. One, they are subject to the highest privacy standards. They are obliged to adhere to the level of privacy that the most stringent regulation among all the countries they operate in [provides for]. The internet is everywhere at the same time.
Two, there is a reputational harm. That is very difficult to walk away from—particularly abroad. Companies have been destroyed by evidence they are violating privacy and that they don’t have high standards of care. So, for the big companies I have no doubt this is going to be just marginal realignment, if at all.
In Europe there is an accountability provision that has been introduced in the GDPR (General Data Protection Regulation), so even though Europe has consent, there is increasing emphasis towards accountability. This is not a model I invented. People have been talking about it for a long time. To answer your specific question, they have been preparing for this for quite some time.
The only people this may affect are the small businessmen because they may find themselves exposed to greater risks than they have the ability to support. As it is, they are audited by chartered accountants once a year. If in addition they have to have a data audit, it’s an additional cost. But I think some of the worst leaks will happen from small players. It’s probably going to affect small and medium companies the most, but I think in a world that is so data-driven and in a world where they actually benefit from data I think it would probably be a cost that they would have to bear.
Q: How do you see the rights model co-existing with the consent architecture that’s being built as a part of IndiaStack? How do you see it going forward?
I don’t completely eliminate the concept of consent. What I am trying to preserve there is the principle of autonomy, which I want as an individual. I want to believe I have some rights over what people do with my data. I accept that, when people collect my data, I may not fully appreciate what they will do with it, and so on. But once I know what they are doing with it, and what harm it could do to me, I should have the ability to prevent them from doing it any further. That is where the consent architecture in IndiaStack plays a role. Being able to switch it on and off in a granular fashion, at any point after the collection, is very important. I think that consent architecture allows us to be granular in a sort of API-based way.
Very often it’s take it or leave it. These are my conditions—it’s a yes-no button—you accept it and you can go ahead and if you don’t accept it, you can’t use, say, WhatsApp. I am not sure about WhatsApp. Maybe it’s more granular. Facebook certainly is. LinkedIn is. All these companies are very granular. You can go deep into their privacy page and individually control privacy for various things. I think IndiaStack is offering that. I think that is its real benefit—that it allows us to exercise autonomy in a granular fashion once we are informed about what the consequences are. And of course, people can always completely switch off. You should always have the ability to do that.
The challenge of all that in this day and age is, how do we know who has collected data, who is building personal profile, given that data can be collected ambiently and through interconnections. But once we know, we should have the ability to switch it on and off.
Q: What has been the reaction to your discussion paper so far?
I am trying to disseminate it as far as possible, to as many people as possible. The government has set up the Srikrishna Committee to write the law. Whether this model is accepted or not, I would like it to be debated. I have taken it to a lot of people, largely academics. It went through an exercise at Takshashila as well, where they tried to tear it apart. We are going to do a round table this month or the next. I do want as many people as possible to hear the thoughts in the paper and see if it makes sense.
Every other person is suggesting a plain vanilla consent model. If you are a real student of technology and society it will be apparent to you that consent may be useful, but it is broken, for all the reasons I mentioned. We can’t just blindly copy a model that has been used for the last 40 years. We have to find something that is current to us.